From Nicolas.Rougier at inria.fr Fri Mar 1 02:30:52 2013 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Fri, 1 Mar 2013 08:30:52 +0100 Subject: [Numpy-discussion] Array indexing and repeated indices Message-ID: <6C2ED1E7-9630-4DC3-A6D4-6B149D8AB725@inria.fr> Hi, I'm trying to increment an array using indexing and a second array for increment values (since it might be a little tedious to explain, see below for a short example). Using "direct" indexing, the values in the example are incremented by 1 only, while I want to achieve the alternative behavior. My question is whether there is such a function in numpy, or if there is a better way to achieve the same result? (I would like to avoid the while statement) I found and adapted the alternative solution from: http://stackoverflow.com/questions/2004364/increment-numpy-array-with-repeated-indices but it is only for a fixed increment from what I've understood. Nicolas # ------------------------ import numpy as np n,p = 5,100 nodes = np.zeros( n, [('value', 'f4', 1)] ) links = np.zeros( p, [('source', 'i4', 1), ('target', 'i4', 1)]) links['source'] = np.random.randint(0, n, p) links['target'] = np.random.randint(0, n, p) targets = links['target'] # Indices can be repeated K = np.ones(len(targets)) # Note K could be anything # Direct indexing nodes['value'] = 0 nodes['value'][targets] += K print nodes # "Alternative" indexing nodes['value'] = 0 B = np.bincount(targets) while B.any(): I = np.argwhere(B>=1) nodes['value'][I] += K[I] B = np.maximum(B-1,0) print nodes From sebastian at sipsolutions.net Fri Mar 1 05:04:07 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 11:04:07 +0100 Subject: [Numpy-discussion] Array indexing and repeated indices In-Reply-To: <6C2ED1E7-9630-4DC3-A6D4-6B149D8AB725@inria.fr> References: <6C2ED1E7-9630-4DC3-A6D4-6B149D8AB725@inria.fr> Message-ID: <1362132247.9796.1.camel@sebastian-laptop> On Fri, 2013-03-01 at 08:30 +0100, Nicolas Rougier wrote: > Hi, > > 
I'm trying to increment an array using indexing and a second array for increment values (since it might be a little tedious to explain, see below for a short example). > > Using "direct" indexing, the values in the example are incremented by 1 only while I want to achieve the alternative behavior. My question is whether there is such function in numpy or if there a re better way to achieve the same result ? > (I would like to avoid the while statement) > > I found and adapted the alternative solution from: http://stackoverflow.com/questions/2004364/increment-numpy-array-with-repeated-indices but it is only for a fixed increment from what I've understood. > > > Nicolas > > > # ------------------------ > > import numpy as np > > n,p = 5,100 > nodes = np.zeros( n, [('value', 'f4', 1)] ) > links = np.zeros( p, [('source', 'i4', 1), > ('target', 'i4', 1)]) > links['source'] = np.random.randint(0, n, p) > links['target'] = np.random.randint(0, n, p) > > targets = links['target'] # Indices can be repeated > K = np.ones(len(targets)) # Note K could be anything > > # Direct indexing > nodes['value'] = 0 > nodes['value'][targets] += K > print nodes > > # "Alternative" indexing > nodes['value'] = 0 > B = np.bincount(targets) bincount takes a weights argument which should do exactly what you are looking for. 
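A minimal sketch of the weights approach, reusing the names from the example above (the random data here is purely illustrative):

```python
import numpy as np

n, p = 5, 100
targets = np.random.randint(0, n, p)   # indices, possibly repeated
K = np.random.uniform(1, 2, p)         # per-link increments; could be anything

# One bincount call accumulates every K[i] into bin targets[i],
# handling repeated indices correctly:
values = np.bincount(targets, weights=K, minlength=n)

# The equivalent explicit loop, for comparison:
expected = np.zeros(n)
for t, k in zip(targets, K):
    expected[t] += k
assert np.allclose(values, expected)
```

The `minlength=n` argument guarantees the result has one bin per node even when the highest indices happen not to occur in `targets`.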
- Sebastian > while B.any(): > I = np.argwhere(B>=1) > nodes['value'][I] += K[I] > B = np.maximum(B-1,0) > print nodes > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Nicolas.Rougier at inria.fr Fri Mar 1 05:21:41 2013 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Fri, 1 Mar 2013 11:21:41 +0100 Subject: [Numpy-discussion] Array indexing and repeated indices In-Reply-To: <1362132247.9796.1.camel@sebastian-laptop> References: <6C2ED1E7-9630-4DC3-A6D4-6B149D8AB725@inria.fr> <1362132247.9796.1.camel@sebastian-laptop> Message-ID: <990526F5-6078-4DDF-88F3-C8E4C92DCE03@inria.fr> > > bincount takes a weights argument which should do exactly what you are > looking for. Fantastic ! Thanks ! Nicolas From sebastian at sipsolutions.net Fri Mar 1 07:25:20 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 13:25:20 +0100 Subject: [Numpy-discussion] step paramter for linspace Message-ID: <1362140720.13987.0.camel@sebastian-laptop> Hi, there has been a request on the issue tracker for a step parameter to linspace. This is of course tricky with the imprecision of floating point numbers. As a trade-off, I was thinking of a step parameter that is used to calculate the integer number of steps. However, to be certain that it never misbehaves, this would be done strictly, up to the numerical precision of the (float) numbers. Effectively this means: In [9]: np.linspace(0, 1.2, step=0.3) Out[9]: array([ 0. , 0.3, 0.6, 0.9, 1.2]) In [10]: np.linspace(0, 1.2+5-5, step=0.3) Out[10]: array([ 0. , 0.3, 0.6, 0.9, 1.2]) In [11]: np.linspace(0, 1.2+500-500, step=0.3) ValueError: could not determine exact number of samples for given step I.e. the last fails, because 1.2 + 500 - 500 == 1.1999999999999886, which is an error that is larger than the imprecision of floating point numbers. 
Is this considered useful, or is it not, given that it can easily fail for calculated numbers and is thus only a convenience? Regards, Sebastian From heng at cantab.net Fri Mar 1 07:33:14 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 12:33:14 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362140720.13987.0.camel@sebastian-laptop> References: <1362140720.13987.0.camel@sebastian-laptop> Message-ID: <1362141194.7312.27.camel@farnsworth> On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: > there has been a request on the issue tracker for a step parameter to > linspace. This is of course tricky with the imprecision of floating > point numbers. How is that different to arange? Either you specify the number of points with linspace, or you specify the step with arange. Is there a third option? My usual hack to deal with the numerical bounds issue is to add/subtract half the step. Henry From sebastian at sipsolutions.net Fri Mar 1 07:44:05 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 13:44:05 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362141194.7312.27.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> Message-ID: <1362141845.13987.10.camel@sebastian-laptop> On Fri, 2013-03-01 at 12:33 +0000, Henry Gomersall wrote: > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: > > there has been a request on the issue tracker for a step parameter to > > linspace. This is of course tricky with the imprecision of floating > > point numbers. > > How is that different to arange? Either you specify the number of points > with linspace, or you specify the step with arange. Is there a third > option? > > My usual hack to deal with the numerical bounds issue is to add/subtract > half the step. > There is not much. 
It does that half step logic for you, and you actually know that the end point is exact (since linspace makes sure of that). In arange, the start and step are exact. In linspace the start and stop are exact (even with a given step, it would vary on the order of floating point accuracy). Maybe the larger point is the hope that by adding this to linspace it is easier to get new users to use it and avoid pitfalls of arange with floating points when you are not aware of that half step thing. > Henry > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Fri Mar 1 07:58:38 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 13:58:38 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362141845.13987.10.camel@sebastian-laptop> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362141845.13987.10.camel@sebastian-laptop> Message-ID: <1362142718.13987.13.camel@sebastian-laptop> On Fri, 2013-03-01 at 13:44 +0100, Sebastian Berg wrote: > On Fri, 2013-03-01 at 12:33 +0000, Henry Gomersall wrote: > > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: > > > there has been a request on the issue tracker for a step parameter to > > > linspace. This is of course tricky with the imprecision of floating > > > point numbers. > > > > How is that different to arange? Either you specify the number of points > > with linspace, or you specify the step with arange. Is there a third > > option? > > > > My usual hack to deal with the numerical bounds issue is to add/subtract > > half the step. > > > > There is not much. It does that half step logic for you, and you > actually know that the end point is exact (since linspace makes sure of > that). > > In arange, the start and step are exact. 
In linspace the start and stop > are exact (even with a given step, it would vary on the order of > floating point accuracy). > > Maybe the larger point is the hope that by adding this to linspace it is > easier to get new users to use it and avoid pitfalls of arange with > floating points when you are not aware of that half step thing. > That said, I am honestly not sure this is worth it. I guess I might use it once in a while, but overall probably hardly at all and it is easy to do something else... > > Henry > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Fri Mar 1 08:34:48 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 1 Mar 2013 13:34:48 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362141194.7312.27.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> Message-ID: On Fri, Mar 1, 2013 at 12:33 PM, Henry Gomersall wrote: > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: >> there has been a request on the issue tracker for a step parameter to >> linspace. This is of course tricky with the imprecision of floating >> point numbers. > > How is that different to arange? Either you specify the number of points > with linspace, or you specify the step with arange. Is there a third > option? arange is designed for ints and gives you a half-open interval, linspace is designed for floats and gives you a closed interval. This means that when arange is used on floats, it does weird things that linspace doesn't: In [11]: eps = np.finfo(float).eps In [12]: np.arange(0, 1, step=0.2) Out[12]: array([ 0. 
, 0.2, 0.4, 0.6, 0.8]) In [13]: np.arange(0, 1 + eps, step=0.2) Out[13]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) In [14]: np.linspace(0, 1, 6) Out[14]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) In [15]: np.linspace(0, 1 + eps, 6) Out[15]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) The half-open/closed thing also has effects on what kind of api is reasonable. arange(0, 1, step=0.8) makes perfect sense (it acts like python range(0, 10, step=8)). linspace(0, 1, step=0.8) is just incoherent, though, because linspace guarantees that both the start and end points are included. > My usual hack to deal with the numerical bounds issue is to add/subtract > half the step. Right. Which is exactly the sort of annoying, content-free code that a library is supposed to handle for you, so you can save mental energy for more important things :-). The problem is to figure out exactly how strict we should be. Like, presumably linspace(0, 1, step=0.8) should fail, rather than round 0.8 to 0.5 or 1. That would clearly violate "in the face of ambiguity, refuse the temptation to guess". OTOH, as Sebastian points out, requiring that the step be *exactly* a divisor of the value (stop - start), within 1 ULP, is probably obnoxious. Would anything bad happen if we just required that, say, (stop - start)/step had to be within "np.allclose" of an integer, i.e., to some reasonable relative and absolute precision, and then rounded the number of steps to match that integer exactly? -n From heng at cantab.net Fri Mar 1 09:14:35 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 14:14:35 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> Message-ID: <1362147275.7312.43.camel@farnsworth> On Fri, 2013-03-01 at 13:34 +0000, Nathaniel Smith wrote: > > My usual hack to deal with the numerical bounds issue is to > add/subtract > > half the step. > > Right. 
Which is exactly the sort of annoying, content-free code that a > library is supposed to handle for you, so you can save mental energy > for more important things :-). I agree with the sentiment (I sometimes wish a library could read my mind ;) but putting this sort of logic into the library seems dangerous to me. The point is that the coder _should_ understand the subtleties of floating point numbers. IMO arange _should_ be well specified and actually operate on the half open interval; continuing to add a step until >= the limit is clear and always unambiguous. Unfortunately, the docs tell me that this isn't the case: "For floating point arguments, the length of the result is ``ceil((stop - start)/step)``. Because of floating point overflow, this rule may result in the last element of `out` being greater than `stop`." In my jet-lag addled state, i can't see when this out[-1] > stop case will occur, but I can take it as true. It does seem to be problematic though. As soon as you allow freeform setting of the stop value, problems are going to be encountered. Who's to say that the stop - delta is actually _meant_ to be below the limit, or is meant to be the limit? Certainly not the library! It just seems to me that this will lead to lots of bad code in which the writer has glossed over an ambiguous case. Henry From warren.weckesser at gmail.com Fri Mar 1 09:24:57 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 1 Mar 2013 09:24:57 -0500 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362147275.7312.43.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> Message-ID: On 3/1/13, Henry Gomersall wrote: > On Fri, 2013-03-01 at 13:34 +0000, Nathaniel Smith wrote: >> > My usual hack to deal with the numerical bounds issue is to >> add/subtract >> > half the step. >> >> Right. 
Which is exactly the sort of annoying, content-free code that a >> library is supposed to handle for you, so you can save mental energy >> for more important things :-). > > I agree with the sentiment (I sometimes wish a library could read my > mind ;) but putting this sort of logic into the library seems dangerous > to me. > > The point is that the coder _should_ understand the subtleties of > floating point numbers. IMO arange _should_ be well specified and > actually operate on the half open interval; continuing to add a step > until >= the limit is clear and always unambiguous. > > Unfortunately, the docs tell me that this isn't the case: > "For floating point arguments, the length of the result is > ``ceil((stop - start)/step)``. Because of floating point overflow, > this rule may result in the last element of `out` being greater > than `stop`." > > In my jet-lag addled state, i can't see when this out[-1] > stop case > will occur, but I can take it as true. It does seem to be problematic > though. Here you go: In [32]: end = 2.2 In [33]: x = arange(0.1, end, 0.3) In [34]: x[-1] Out[34]: 2.2000000000000006 In [35]: x[-1] > end Out[35]: True Warren > > As soon as you allow freeform setting of the stop value, problems are > going to be encountered. Who's to say that the stop - delta is actually > _meant_ to be below the limit, or is meant to be the limit? Certainly > not the library! > > It just seems to me that this will lead to lots of bad code in which the > writer has glossed over an ambiguous case. 
> > Henry > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From heng at cantab.net Fri Mar 1 09:32:48 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 14:32:48 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> Message-ID: <1362148368.7312.48.camel@farnsworth> On Fri, 2013-03-01 at 09:24 -0500, Warren Weckesser wrote: > > In my jet-lag addled state, i can't see when this out[-1] > stop > case > > will occur, but I can take it as true. It does seem to be > problematic > > though. > > > Here you go: > > In [32]: end = 2.2 > > In [33]: x = arange(0.1, end, 0.3) Thanks! I'll assert then that there should be an equivalent for floats that unambiguously returns a range for the half open interval. IMO this is more useful than a hacky version of linspace. Henry From heng at cantab.net Fri Mar 1 09:35:38 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 14:35:38 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362148368.7312.48.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> Message-ID: <1362148538.7312.49.camel@farnsworth> On Fri, 2013-03-01 at 14:32 +0000, Henry Gomersall wrote: > I'll assert then that there should be an equivalent for floats that > unambiguously returns a range for the half open interval. IMO this is > more useful than a hacky version of linspace. And, no, I haven't thought carefully about how to handle a negative step. 
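One possible shape for such a helper (`float_range` is a hypothetical name, not a NumPy function; negative steps are left unhandled here too): generate the candidate grid once from `start + k*step` rather than by repeated addition, then keep only the points strictly below the stop value, so the last element can never overshoot the limit the way arange can.

```python
import numpy as np

def float_range(start, stop, step):
    """Half-open [start, stop) analogue of range() for floats.

    Hypothetical sketch, not a NumPy API. Note this pins one
    interpretation of the ambiguity discussed above: a stop that
    lands (up to rounding) exactly on the grid may or may not be
    excluded, depending on float noise in start + k*step.
    """
    # Upper bound on the number of grid points, with one spare:
    nmax = int(np.ceil((stop - start) / step)) + 1
    vals = start + step * np.arange(nmax)
    # Strict comparison guarantees vals[-1] < stop:
    return vals[vals < stop]

print(float_range(0.0, 1.0, 0.25))   # 1.0 itself is excluded
```

With exactly representable steps such as 0.25 the behavior is fully deterministic; with steps like 0.3 the helper still guarantees a half-open result, which is the "clear and always unambiguous" reading above.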
Henry From sebastian at sipsolutions.net Fri Mar 1 09:53:53 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 15:53:53 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> Message-ID: <1362149633.13987.41.camel@sebastian-laptop> On Fri, 2013-03-01 at 13:34 +0000, Nathaniel Smith wrote: > On Fri, Mar 1, 2013 at 12:33 PM, Henry Gomersall wrote: > > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: > >> there has been a request on the issue tracker for a step parameter to > >> linspace. This is of course tricky with the imprecision of floating > >> point numbers. > > > > How is that different to arange? Either you specify the number of points > > with linspace, or you specify the step with arange. Is there a third > > option? > > arange is designed for ints and gives you a half-open interval, > linspace is designed for floats and gives you a closed interval. This > means that when arange is used on floats, it does weird things that > linspace doesn't: > > In [11]: eps = np.finfo(float).eps > > In [12]: np.arange(0, 1, step=0.2) > Out[12]: array([ 0. , 0.2, 0.4, 0.6, 0.8]) > > In [13]: np.arange(0, 1 + eps, step=0.2) > Out[13]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) > > In [14]: np.linspace(0, 1, 6) > Out[14]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) > > In [15]: np.linspace(0, 1 + eps, 6) > Out[15]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) > > The half-open/closed thing also has effects on what kind of api is > reasonable. arange(0, 1, step=0.8) makes perfect sense (it acts like > python range(0, 10, step=8)). linspace(0, 1, step=0.8) is just > incoherent, though, because linspace guarantees that both the start > and end points are included. > > > My usual hack to deal with the numerical bounds issue is to add/subtract > > half the step. > > Right. 
Which is exactly the sort of annoying, content-free code that a > library is supposed to handle for you, so you can save mental energy > for more important things :-). > > The problem is to figure out exactly how strict we should be. Like, > presumably linspace(0, 1, step=0.8) should fail, rather than round 0.8 > to 0.5 or 1. That would clearly violate "in the face of ambiguity, > refuse the temptation to guess". > > OTOH, as Sebastian points out, requiring that the step be *exactly* a > divisor of the value (stop - start), within 1 ULP, is probably > obnoxious. > > Would anything bad happen if we just required that, say, (stop - > start)/step had to be within "np.allclose" of an integer, i.e., to > some reasonable relative and absolute precision, and then rounded the > number of steps to match that integer exactly? I was a bit worried about what happens for huge a number of steps. Have to rethink a bit about it, but I guess one should be able to relax it... or maybe someone here has a nice idea on how to relax it. It seems to me that there is a bit of a trade off if you get into the millions of steps range, because absolute errors that make sense for few steps are suddenly in the range integers. 
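The allclose-based relaxation suggested above can be sketched as follows (`linspace_step` is a hypothetical function, not an actual NumPy API; the tolerances are simply np.allclose defaults and purely illustrative):

```python
import numpy as np

def linspace_step(start, stop, step):
    """Derive linspace's sample count from a step size.

    Hypothetical sketch: the step is accepted only if
    (stop - start)/step is close to an integer (np.allclose
    tolerances); otherwise we refuse the temptation to guess.
    linspace then keeps start and stop exact, as it always does.
    """
    nsteps = (stop - start) / step
    num = int(round(nsteps))
    if num < 1 or not np.allclose(nsteps, num):
        raise ValueError(
            "could not determine exact number of samples for given step")
    return np.linspace(start, stop, num + 1)

print(linspace_step(0, 1.2, 0.3))   # array([0. , 0.3, 0.6, 0.9, 1.2])
```

With these default tolerances, the `1.2 + 500 - 500` case from the first message in this thread would be accepted and snapped to 4 steps, while `linspace_step(0, 1, 0.8)` still fails, since 1/0.8 = 1.25 is not close to any integer.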
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From alan.isaac at gmail.com Fri Mar 1 10:01:29 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 01 Mar 2013 10:01:29 -0500 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362148368.7312.48.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> Message-ID: <5130C2C9.3070603@gmail.com> On 3/1/2013 9:32 AM, Henry Gomersall wrote: > there should be an equivalent for floats that > unambiguously returns a range for the half open interval If I've understood you: start + stepsize*np.arange(nsteps) fwiw, Alan Isaac From heng at cantab.net Fri Mar 1 10:07:46 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 15:07:46 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <5130C2C9.3070603@gmail.com> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> Message-ID: <1362150466.7312.50.camel@farnsworth> On Fri, 2013-03-01 at 10:01 -0500, Alan G Isaac wrote: > On 3/1/2013 9:32 AM, Henry Gomersall wrote: > > there should be an equivalent for floats that > > unambiguously returns a range for the half open interval > > > If I've understood you: > start + stepsize*np.arange(nsteps) yes, except that nsteps is computed for you, otherwise you could just use linspace ;) hen From sebastian at sipsolutions.net Fri Mar 1 10:27:29 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 16:27:29 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362150466.7312.50.camel@farnsworth> References: 
<1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> Message-ID: <1362151649.13987.57.camel@sebastian-laptop> On Fri, 2013-03-01 at 15:07 +0000, Henry Gomersall wrote: > On Fri, 2013-03-01 at 10:01 -0500, Alan G Isaac wrote: > > On 3/1/2013 9:32 AM, Henry Gomersall wrote: > > > there should be an equivalent for floats that > > > unambiguously returns a range for the half open interval > > > > > > If I've understood you: > > start + stepsize*np.arange(nsteps) > > yes, except that nsteps is computed for you, otherwise you could just > use linspace ;) If you could just use linspace, you should use linspace (and give it a step argument) in my opinion, but I don't think you meant that ;). linspace holds start and stop exact and guarantees that you actually get to stop. Even a modified/new arange will never do that, but I think many use arange like that and giving linspace a step argument could migrate that usage (which is simply ill defined for arange) to it. That might give an error once in a while, but that should be much less often and much more enlightening then a sudden "one value too much". I think the accuracy requirements for the step for linspace can be relaxed enough probably, though I am not quite certain yet as to how (there is a bit of a trade off/problem when you get to a very large number of steps). 
> > hen > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From alan.isaac at gmail.com Fri Mar 1 10:49:00 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 01 Mar 2013 10:49:00 -0500 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362150466.7312.50.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> Message-ID: <5130CDEC.6010206@gmail.com> One motivation of this thread was that adding a step parameter to linspace might make things easier for beginners. I claim this thread has put the lie to that, starting with the initial post. So what is the persuasive case for the change? Imo, the current situation is good: use arange if you want to specify the stepsize, or use linspace if you want to specify the number of points. Nice and clean. Cheers, Alan Isaac From sebastian at sipsolutions.net Fri Mar 1 11:29:57 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 17:29:57 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <5130CDEC.6010206@gmail.com> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> <5130CDEC.6010206@gmail.com> Message-ID: <1362155397.13987.99.camel@sebastian-laptop> On Fri, 2013-03-01 at 10:49 -0500, Alan G Isaac wrote: > One motivation of this thread was that > adding a step parameter to linspace might make > things easier for beginners. > > I claim this thread has put the lie to that, > starting with the initial post. 
So what is the > persuasive case for the change? > > Imo, the current situation is good: > use arange if you want to specify the stepsize, > or use linspace if you want to specify the > number of points. Nice and clean. > Maybe you are right, and it is not easier. But there was a "please include an end_point=True/False option to arange" request, and that does not sense by arange logic. The fact that the initial example was overly strict is something that can be relaxed quite a bit I am sure, though I guess you may always have an odd case here or there with floats. I agree the difference is nice and clean right now, but I disagree that this would change much. Arange guarantees the step size. Linspace the end point. There is a bit of a shift, but if I thought it was less clean I would not have asked if it is deemed useful :). At this time it seems there is more sentiment against it and that is fine with me. I thought it might be useful for some who normally want the linspace behavior, but do not want to worry about the right num in some cases. Someone who actually wants an error if the step they put in quickly (and which they would have used to calculate num) was wrong. 
> Cheers, > Alan Isaac > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From heng at cantab.net Fri Mar 1 11:36:03 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 16:36:03 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362155397.13987.99.camel@sebastian-laptop> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> <5130CDEC.6010206@gmail.com> <1362155397.13987.99.camel@sebastian-laptop> Message-ID: <1362155763.7312.62.camel@farnsworth> On Fri, 2013-03-01 at 17:29 +0100, Sebastian Berg wrote: > At this time it seems there is more sentiment against it and that is > fine with me. I thought it might be useful for some who normally want > the linspace behavior, but do not want to worry about the right num in > some cases. Someone who actually wants an error if the step they put > in > quickly (and which they would have used to calculate num) was wrong. Actually, I buy this could be useful. I think it's helpful to think about the potential problems though. 
Henry From scollis.acrf at gmail.com Sat Mar 2 17:32:28 2013 From: scollis.acrf at gmail.com (Scott Collis) Date: Sat, 2 Mar 2013 16:32:28 -0600 Subject: [Numpy-discussion] feature tracking in numpy/scipy Message-ID: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> Good afternoon list, I am looking at feature tracking in a 2D numpy array, along the lines of Dixon and Wiener 1993 (for tracking precipitating storms) Identifying features based on threshold is quite trivial using ndimage.label b_fld=np.zeros(mygrid.fields['rain_rate_A']['data'].shape) rr=10 b_fld[mygrid.fields['rain_rate_A']['data'] > rr]=1.0 labels, numobjects = ndimage.label(b_fld[0,0,:,:]) (note mygrid.fields['rain_rate_A']['data'] is dimensions time,height, y, x) using the matplotlib contouring and fetching the vertices I can get a nice list of polygons of rain rate above a certain threshold? Now from here I can just go and implement the Dixon and Wiener methodology but I thought I would check here first to see if anyone know of a object/feature tracking algorithm in numpy/scipy or using numpy arrays (it just seems like something people would want to do!).. i.e. something that looks back and forward in time and identifies polygon movement and identifies objects with temporal persistence.. Cheers! Scott Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification, Tracking, Analysis, and Nowcasting?A Radar-based Methodology. Journal of Atmospheric and Oceanic Technology, 10, 785?797, doi:10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2. http://journals.ametsoc.org/doi/abs/10.1175/1520-0426%281993%29010%3C0785%3ATTITAA%3E2.0.CO%3B2 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sudheer.joseph at yahoo.com Sat Mar 2 21:03:11 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Sun, 3 Mar 2013 10:03:11 +0800 (SGT) Subject: [Numpy-discussion] reshaping arrays In-Reply-To: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> Message-ID: <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> Hi all, ? ? ? ? For a 3d array in matlab, I can do the below to reshape it before an eof analysis. Is there a way to do the same using numpy? Please help. [nlat,nlon,ntim ]=size(ssh); tssh=reshape(ssh,nlat*nlon,ntim); and afterwards eofout=[] eofout=reshape(eof1,nlat,nlon,ntime) with best regards, Sudheer -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Sat Mar 2 22:20:42 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Sat, 2 Mar 2013 19:20:42 -0800 Subject: [Numpy-discussion] reshaping arrays In-Reply-To: <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> Message-ID: On Sat, Mar 2, 2013 at 6:03 PM, Sudheer Joseph wrote: > Hi all, > For a 3d array in matlab, I can do the below to reshape it before > an eof analysis. Is there a way to do the same using numpy? Please help. > > [nlat,nlon,ntim ]=size(ssh); > tssh=reshape(ssh,nlat*nlon,ntim); > and afterwards > eofout=[] > eofout=reshape(eof1,nlat,nlon,ntime) > Yes, easy: nlat, nlon, ntim = ssh.shape tssh = ssh.reshape(nlat*nlon, ntim, order='F') and afterwards eofout = eofl.reshape(nlat, nlon, ntim, order='F') You probably want to go read through http://www.scipy.org/NumPy_for_Matlab_Users. Cheers, Brad -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sudheer.joseph at yahoo.com Sat Mar 2 22:49:09 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Sun, 3 Mar 2013 11:49:09 +0800 (SGT) Subject: [Numpy-discussion] reshaping arrays In-Reply-To: References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> Message-ID: <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> Thank you Brad, for the quick reply with solution,?special?thanks to the link for matlab users. with best regards, Sudheer ? *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** ________________________________ From: Bradley M. Froehle To: Discussion of Numerical Python Sent: Sunday, 3 March 2013 8:50 AM Subject: Re: [Numpy-discussion] reshaping arrays On Sat, Mar 2, 2013 at 6:03 PM, Sudheer Joseph wrote: Hi all, >? ? ? ? For a 3d array in matlab, I can do the below to reshape it before an eof analysis. Is there a way to do the same using numpy? Please help. > > >[nlat,nlon,ntim ]=size(ssh); >tssh=reshape(ssh,nlat*nlon,ntim); >and afterwards >eofout=[] >eofout=reshape(eof1,nlat,nlon,ntime) Yes, easy: nlat, nlon, ntim = ssh.shape tssh = ssh.reshape(nlat*nlon, ntim, order='F') and afterwards eofout = eofl.reshape(nlat, nlon, ntim, order='F') You probably want to go read through?http://www.scipy.org/NumPy_for_Matlab_Users. 
Cheers, Brad _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sudheer.joseph at yahoo.com Sat Mar 2 23:35:43 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Sun, 3 Mar 2013 12:35:43 +0800 (SGT) Subject: [Numpy-discussion] reshaping arrays In-Reply-To: <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> Message-ID: <1362285343.43953.YahooMailNeo@web193402.mail.sg3.yahoo.com> Hi Brad, ? ? ? ? ? ? ? ? I am not getting the attribute reshape for the array, are you having a different version of numpy than mine? I have? In [55]: np.__version__ Out[55]: '1.7.0' and detail of the shape details of variable? In [57]: ssh?? Type: ? ? ? NetCDFVariable String Form: Namespace: ?Interactive Length: ? ? 75 Docstring: ?NetCDF Variable In [58]: ssh.shape Out[58]: (75, 140, 180) ssh?? Type: ? ? ? NetCDFVariable String Form: Namespace: ?Interactive Length: ? ? 75 Docstring: ?NetCDF Variable In [66]: ssh.shape Out[66]: (75, 140, 180) In [67]: ssh.reshape(75,140*180) --------------------------------------------------------------------------- AttributeError ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call last) /home/sjo/RAMA_20120807/adcp/ in () ----> 1 ssh.reshape(75,140*180) AttributeError: reshape ? *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. 
Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** ________________________________ From: Sudheer Joseph To: Discussion of Numerical Python Sent: Sunday, 3 March 2013 9:19 AM Subject: Re: [Numpy-discussion] reshaping arrays Thank you Brad, for the quick reply with solution,?special?thanks to the link for matlab users. with best regards, Sudheer ? *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** ________________________________ From: Bradley M. Froehle To: Discussion of Numerical Python Sent: Sunday, 3 March 2013 8:50 AM Subject: Re: [Numpy-discussion] reshaping arrays On Sat, Mar 2, 2013 at 6:03 PM, Sudheer Joseph wrote: Hi all, >? ? ? ? For a 3d array in matlab, I can do the below to reshape it before an eof analysis. Is there a way to do the same using numpy? Please help. > > >[nlat,nlon,ntim ]=size(ssh); >tssh=reshape(ssh,nlat*nlon,ntim); >and afterwards >eofout=[] >eofout=reshape(eof1,nlat,nlon,ntime) Yes, easy: nlat, nlon, ntim = ssh.shape tssh = ssh.reshape(nlat*nlon, ntim, order='F') and afterwards eofout = eofl.reshape(nlat, nlon, ntim, order='F') You probably want to go read through?http://www.scipy.org/NumPy_for_Matlab_Users. 
Cheers, Brad _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From opossumnano at gmail.com Mon Mar 4 05:36:55 2013 From: opossumnano at gmail.com (Tiziano Zito) Date: Mon, 4 Mar 2013 11:36:55 +0100 Subject: [Numpy-discussion] EuroSciPy 2013 Call for Abstracts Message-ID: <20130304103653.GA30426@bio230.biologie.hu-berlin.de> Dear Scientist using Python, EuroSciPy 2013, the Sixth Annual Conference on Python in Science, takes place in Brussels on 21 - 24 August 2013. The conference features two days of tutorials followed by two days of scientific talks that start with our keynote speakers, Cameron Neylon and Peter Wang. The topics presented at EuroSciPy are very diverse, with a focus on advanced software engineering and original uses of Python and its scientific libraries, either in theoretical or experimental research, from both academia and industry. The program includes contributed talks and posters. Submissions for talks and posters are welcome on our website (http://www.euroscipy.org/). Authors must use the web interface to submit an abstract to the conference. In your abstract, please provide details on what Python tools are being employed, and how. The deadline for submission is 28 April 2013. Until 31 March 2013, you can apply for a sprint session on 25 August 2013. Also, potential organizers for EuroSciPy 2014 are welcome to contact the conference committee. SciPythonic Regards, The EuroSciPy 2013 Committee http://www.euroscipy.org/ Conference Chairs: Pierre de Buyl and Nicolas Pettiaux, Université
libre de Bruxelles, Belgium Tutorial Chair: Nicolas Rougier, INRIA, Nancy, France Program Chair: Tiziano Zito, Humboldt-Universität zu Berlin, Germany Program Committee Ralf Gommers, ASML, The Netherlands Emmanuelle Gouillart, Joint Unit CNRS/Saint-Gobain, France Kael Hanson, Université Libre de Bruxelles, Belgium Konrad Hinsen, Centre National de la Recherche Scientifique (CNRS), France Hans Petter Langtangen, Simula and University of Oslo, Norway Mike Müller, Python Academy, Germany Raphael Ritz, International Neuroinformatics Coordinating Facility, Stockholm, Sweden Stéfan van der Walt, Applied Mathematics, Stellenbosch University, South Africa Gaël Varoquaux, INRIA Parietal, Saclay, France Nelle Varoquaux, Mines ParisTech, France Pauli Virtanen, Aalto University, Finland Organizing Committee Nicolas Chauvat, Logilab, France Emmanuelle Gouillart, Joint Unit CNRS/Saint-Gobain, France Kael Hanson, Université Libre de Bruxelles, Belgium Renaud Lambiotte, University of Namur, Belgium Thomas Lecocq, Royal Observatory of Belgium Mike Müller, Python Academy, Germany Didrik Pinte, Enthought Europe Gaël Varoquaux, INRIA Parietal, Saclay, France Nelle Varoquaux, Mines ParisTech, France From ben.root at ou.edu Mon Mar 4 09:10:54 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 4 Mar 2013 09:10:54 -0500 Subject: [Numpy-discussion] reshaping arrays In-Reply-To: <1362285343.43953.YahooMailNeo@web193402.mail.sg3.yahoo.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> <1362285343.43953.YahooMailNeo@web193402.mail.sg3.yahoo.com> Message-ID: On Sat, Mar 2, 2013 at 11:35 PM, Sudheer Joseph wrote: > Hi Brad, > I am not getting the attribute reshape for the array, are > you having a different version of numpy than mine?
> > I have > In [55]: np.__version__ > Out[55]: '1.7.0' > and detail of the shape > > details of variable > > In [57]: ssh?? > Type: NetCDFVariable > String Form: > Namespace: Interactive > Length: 75 > Docstring: NetCDF Variable > > In [58]: ssh.shape > Out[58]: (75, 140, 180) > > ssh?? > Type: NetCDFVariable > String Form: > Namespace: Interactive > Length: 75 > Docstring: NetCDF Variable > > In [66]: ssh.shape > Out[66]: (75, 140, 180) > > In [67]: ssh.reshape(75,140*180) > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call last) > /home/sjo/RAMA_20120807/adcp/ in () > ----> 1 ssh.reshape(75,140*180) > > AttributeError: reshape > > > Ah, you have a NetCDF variable, which in many ways purposefully looks like a NumPy array, but isn't. Just keep in mind that a NetCDF variable is merely a way to have the data available without actually reading it in until you need it. If you do: ssh_data = ssh[:] Then the NetCDF variable will read all the data in the file and return it as a numpy array that can be manipulated as you wish. I hope that helps! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From sudheer.joseph at yahoo.com Mon Mar 4 09:43:06 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Mon, 4 Mar 2013 22:43:06 +0800 (SGT) Subject: [Numpy-discussion] reshaping arrays In-Reply-To: References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> <1362285343.43953.YahooMailNeo@web193402.mail.sg3.yahoo.com> Message-ID: <1362408186.10860.YahooMailNeo@web193405.mail.sg3.yahoo.com> Thanks a lot ?Benjamin,? ?it did the trick. I have another question, I have ?ocean section along latitude 0 ( equator) which is sampled at depths. 
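Ben's fix — slice the NetCDF variable with `[:]` to get a real ndarray, then reshape — can be sketched end to end. Real code would slice an actual `NetCDFVariable`; the tiny class below is a made-up stand-in that only mimics the two behaviours that matter here (`.shape` and `[:]`):

```python
import numpy as np

class FakeNetCDFVariable:
    """Minimal stand-in for a NetCDF variable: has .shape, supports [:],
    but deliberately has no .reshape method (like the real thing)."""
    def __init__(self, data):
        self._data = data
        self.shape = data.shape
    def __getitem__(self, index):
        return self._data[index]

ssh = FakeNetCDFVariable(np.random.rand(75, 140, 180))

# The variable itself has no .reshape; pull the data into memory first
ssh_data = ssh[:]                       # now a plain numpy ndarray
tssh = ssh_data.reshape(75, 140 * 180)  # reshape works as usual
```

This is why `ssh.reshape(...)` raised `AttributeError` while `ssh[:].reshape(...)` works.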
size of the array is 12x14 but this is just the index of the array I need to make a plot which shows depth value as one axis and longitude values as another axis. Is there a quick way to rescale the data to lat depth section by adding a new axis? depth=[0,10,20,30,40,50,60,70,80,90,100,120] lon=[ 40, ?45, ?50, ?55, ?60, ?65, ?70, ?75, ?80, ?85, ?90, ?95, 100, 105] In [20]: data.shape Out[20]: (12, 14) can you please advice me on what is the best way to?re-scale?the data to depth lat dimensions from the indices 1-12 and 1-14 With best regards, Sudheer From: Benjamin Root To: Discussion of Numerical Python Sent: Monday, 4 March 2013 7:40 PM Subject: Re: [Numpy-discussion] reshaping arrays On Sat, Mar 2, 2013 at 11:35 PM, Sudheer Joseph wrote: Hi Brad, >? ? ? ? ? ? ? ? I am not getting the attribute reshape for the array, are you having a different version of numpy than mine? > > >I have? >In [55]: np.__version__ >Out[55]: '1.7.0' >and detail of the shape > > >details of variable? > > >In [57]: ssh?? >Type: ? ? ? NetCDFVariable >String Form: >Namespace: ?Interactive >Length: ? ? 75 >Docstring: ?NetCDF Variable > > >In [58]: ssh.shape >Out[58]: (75, 140, 180) > > >ssh?? >Type: ? ? ? NetCDFVariable >String Form: >Namespace: ?Interactive >Length: ? ? 75 >Docstring: ?NetCDF Variable > > >In [66]: ssh.shape >Out[66]: (75, 140, 180) > > >In [67]: ssh.reshape(75,140*180) >--------------------------------------------------------------------------- >AttributeError ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call last) >/home/sjo/RAMA_20120807/adcp/ in () >----> 1 ssh.reshape(75,140*180) > > >AttributeError: reshape > > > Ah, you have a NetCDF variable, which in many ways purposefully looks like a NumPy array, but isn't.? Just keep in mind that a NetCDF variable is merely a way to have the data available without actually reading it in until you need it.? 
If you do: ssh_data = ssh[:] Then the NetCDF variable will read all the data in the file and return it as a numpy array that can be manipulated as you wish. I hope that helps! Ben Root _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Mar 4 13:04:09 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 4 Mar 2013 10:04:09 -0800 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362155763.7312.62.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> <5130CDEC.6010206@gmail.com> <1362155397.13987.99.camel@sebastian-laptop> <1362155763.7312.62.camel@farnsworth> Message-ID: <-9069729710388254232@unknownmsgid> On Mar 1, 2013, at 8:39 AM, Henry Gomersall wrote: > On Fri, 2013-03-01 at 17:29 > Actually, I buy this could be useful. Yes, it could. How about a "farange", designed for floating point values -- I imagine someone smarter than me about for could write one that would guarantee that end-point was exact, and steps were within For error of exact. CHB > I think it's helpful to think > about the potential problems though. > > Henry > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From paul.anton.letnes at gmail.com Mon Mar 4 15:39:10 2013 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Mon, 4 Mar 2013 21:39:10 +0100 Subject: [Numpy-discussion] Adding .abs() method to the array object In-Reply-To: References: Message-ID: On 24. feb. 
2013, at 02:20, josef.pktd at gmail.com wrote: > On Sat, Feb 23, 2013 at 3:33 PM, Robert Kern wrote: >> On Sat, Feb 23, 2013 at 7:25 PM, Nathaniel Smith wrote: >>> On Sat, Feb 23, 2013 at 3:38 PM, Till Stensitzki wrote: >>>> Hello, >>>> i know that the array object is already crowded, but i would like >>>> to see the abs method added, especially doing work on the console. >>>> Considering that many much less used functions are also implemented >>>> as a method, i don't think adding one more would be problematic. >>> >>> My gut feeling is that we have too many methods on ndarray, not too >>> few, but in any case, can you elaborate? What's the rationale for why >>> np.abs(a) is so much harder than a.abs(), and why this function and >>> not other unary functions? >> >> Or even abs(a). > > > my reason is that I often use > > arr.max() > but then decide I want to us abs and need > np.max(np.abs(arr)) > instead of arr.abs().max() (and often I write that first to see the > error message) > > I don't like > np.abs(arr).max() > because I have to concentrate to much on the braces, especially if arr > is a calculation > > I wrote several times > def maxabs(arr): > return np.max(np.abs(arr)) > > silly, but I use it often and np.is_close is not useful (doesn't show how close) > > Just a small annoyance, but I think it's the method that I miss most often. > > Josef Very well put. I wholeheartedly agree. I'd be sort of happy with all functions becoming np.xxx() in numpy 2.0, for consistency. Paul From ralf.gommers at gmail.com Mon Mar 4 15:41:46 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 4 Mar 2013 21:41:46 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Tue, Feb 26, 2013 at 11:17 AM, Todd wrote: > Is numpy planning to participate in GSOC this year, either on their own or > as a part of another group? > If we participate, it should be under the PSF organization. 
I suspect participation for NumPy (and SciPy) largely depends on mentors being available. > If so, should we start trying to get some project suggestions together? > That can't hurt - good project descriptions will be useful not just for GSOC but also for people new to the project looking for ways to contribute. I suggest to use the wiki on Github for that. Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Mon Mar 4 17:29:38 2013 From: toddrjen at gmail.com (Todd) Date: Mon, 4 Mar 2013 23:29:38 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 9:41 PM, Ralf Gommers wrote: > > > > On Tue, Feb 26, 2013 at 11:17 AM, Todd wrote: > >> Is numpy planning to participate in GSOC this year, either on their own >> or as a part of another group? >> > > If we participate, it should be under the PSF organization. I suspect > participation for NumPy (and SciPy) largely depends on mentors being > available. > > >> If so, should we start trying to get some project suggestions together? >> > > That can't hurt - good project descriptions will be useful not just for > GSOC but also for people new to the project looking for ways to contribute. > I suggest to use the wiki on Github for that. > > Ralf > > > > I have some ideas, but they may not be suitable for GSOC or may just be terrible ideas, so feel free to reject them: 1. A polar dtype. It would be similar to the complex dtype in that it would have two components, but instead of them being real and imaginary, they would be amplitude and angle. Besides the dtype, there should be either functions or methods to convert between complex and polar dtypes, and existing functions should be prepared to handle the new dtype. 
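Until such a dtype exists, the complex-to-polar conversion functions Todd describes can be approximated today with a structured dtype and NumPy's existing `abs`/`angle` machinery. This is only a sketch of the behaviour, not a proposal for the actual API:

```python
import numpy as np

# A two-field structured dtype standing in for a hypothetical polar dtype
polar = np.dtype([('r', 'f8'), ('theta', 'f8')])

def complex_to_polar(z):
    """Convert a complex array to (magnitude, angle) pairs."""
    out = np.empty(np.shape(z), dtype=polar)
    out['r'] = np.abs(z)
    out['theta'] = np.angle(z)
    return out

def polar_to_complex(p):
    """Inverse conversion: r * exp(i * theta)."""
    return p['r'] * np.exp(1j * p['theta'])

z = np.array([1 + 1j, -2 + 0j, 3j])
round_trip = polar_to_complex(complex_to_polar(z))
```

A real polar dtype would go further by letting ufuncs operate on the pairs directly, which the structured-dtype workaround cannot do.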
If it could be made to handle an arbitrary number of dimensions this would be better yet, but I don't know if this is possible, not to mention practical. There is a lot of mathematics, including both signal processing and vector analysis, that is often convenient to work with in this format. 2. We discussed this before, but right now subclasses of ndarray don't have any way to preserve their class attributes when using functions that work on multiple ndarrays, such as concatenate. The current __array_finalize__ method only takes a single array. This project would be to work out a method to handle this sort of situation, perhaps requiring a new method, and making sure numpy methods and functions properly invoke it. 3. Structured arrays are accessed in a manner similar to python dictionaries, using a key. However, they don't support the normal python dictionary methods like keys, values, items, iterkeys, itervalues, iteritems, etc. This project would be to implement as much of the dictionary (and ordereddict) API as possible in structured arrays (making sure that the resulting API presented to the user takes into account whether python 2 or python 3 is being used). 4. The numpy ndarray class stores data in a regular manner in memory. This makes many linear algebra operations easier, but makes changing the number of elements in an array nearly impossible in practice unless you are very careful. There are other data structures that make adding and removing elements easier, but are not as efficient at linear algebra operations. The purpose of this project would be to create such a class in numpy, one that is duck-type compatible with ndarray but makes resizing feasible. This would obviously come at a performance penalty for linear algebra related functions. They would still have consistent dtypes and could not be nested, unlike python lists. This could either be based on a new c-based type or be a subclass of list under the hood. 5.
Currently dtypes are limited to a set of fixed types, or combinations of these types. You can't have, say, a 48 bit float or a 1-bit bool. This project would be to allow users to create entirely new, non-standard dtypes based on simple rules, such as specifying the length of the sign, length of the exponent, and length of the mantissa for a custom floating-point number. Hopefully this would mostly be used for reading in non-standard data and not used that often, but for some situations it could be useful for storing data too (such as large amounts of boolean data, or genetic code which can be stored in 2 bits and is often very large). -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Mon Mar 4 18:21:05 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Mon, 4 Mar 2013 15:21:05 -0800 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 2:29 PM, Todd wrote: > > 5. Currently dtypes are limited to a set of fixed types, or combinations > of these types. You can't have, say, a 48 bit float or a 1-bit bool. This > project would be to allow users to create entirely new, non-standard dtypes > based on simple rules, such as specifying the length of the sign, length of > the exponent, and length of the mantissa for a custom floating-point > number. Hopefully this would mostly be used for reading in non-standard > data and not used that often, but for some situations it could be useful > for storing data too (such as large amounts of boolean data, or genetic > code which can be stored in 2 bits and is often very large). > I second this general idea. Simply having a pair of packbits/unpackbits functions that could work with 2 and 4 bit uints would make my life easier. If it were possible to have an array of dtype 'uint4' that used half the space of a 'uint8', but could have ufuncs an the like ran on it, it would be pure bliss. 
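The 2-bit packing Jaime wishes for can already be emulated with shifts and masks, at the cost of doing it by hand and losing ufunc support on the packed form. A sketch, assuming values fit in 2 bits and the array length is divisible by 4:

```python
import numpy as np

def pack2(a):
    """Pack an array of 2-bit values (0-3) into bytes, four per byte."""
    a = np.asarray(a, dtype=np.uint8).reshape(-1, 4)
    return (a[:, 0] << 6) | (a[:, 1] << 4) | (a[:, 2] << 2) | a[:, 3]

def unpack2(packed):
    """Inverse of pack2: expand each byte back into four 2-bit values."""
    p = np.asarray(packed, dtype=np.uint8)
    out = np.empty((p.size, 4), dtype=np.uint8)
    out[:, 0] = (p >> 6) & 3
    out[:, 1] = (p >> 4) & 3
    out[:, 2] = (p >> 2) & 3
    out[:, 3] = p & 3
    return out.ravel()

codes = np.array([0, 1, 2, 3, 3, 2, 1, 0], dtype=np.uint8)
packed = pack2(codes)   # 8 two-bit values stored in 2 bytes
```

This gives the 4x storage saving (e.g. for genetic codes) but, unlike a hypothetical 'uint4'/'uint2' dtype, arithmetic still requires unpacking first.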
Not that I'm complaining, but a man can dream... Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Mon Mar 4 19:23:46 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Mon, 4 Mar 2013 16:23:46 -0800 Subject: [Numpy-discussion] polyfit with fixed points Message-ID: A couple of days back, answering a question in StackExchange ( http://stackoverflow.com/a/15196628/110026), I found myself using Lagrange multipliers to fit a polynomial with least squares to data, making sure it went through some fixed points. This time it was relatively easy, because some 5 years ago I came across the same problem in real life, and spent the better part of a week banging my head against it. Even knowing what you are doing, it is far from simple, and in my own experience very useful: I think the only time ever I have fitted a polynomial to data with a definite purpose, it required that some points were fixed. Seeing that polyfit is entirely coded in python, it would be relatively straightforward to add support for fixed points. It is also something I feel capable, and willing, of doing. * Is such an additional feature something worthy of investigating, or will it never find its way into numpy.polyfit? * Any ideas on the best syntax for the extra parameters? Thanks, Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. -------------- next part -------------- An HTML attachment was scrubbed...
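The Lagrange-multiplier construction Jaime describes can be sketched for the ordinary polynomial basis: minimize ||Vc - y||^2 subject to V_f c = y_f (V and V_f being Vandermonde matrices), which leads to a linear KKT system in the coefficients and multipliers. This is an illustrative helper under those assumptions, not the API being proposed for numpy:

```python
import numpy as np

def polyfit_fixed(x, y, deg, x_fixed, y_fixed):
    """Least-squares polynomial fit constrained through fixed points.

    Solves the KKT system of
        minimize ||V c - y||^2  subject to  V_f c = y_f.
    Returns coefficients highest-degree-first, like np.polyfit.
    """
    V = np.vander(x, deg + 1)
    Vf = np.vander(x_fixed, deg + 1)
    n, m = deg + 1, len(x_fixed)
    # KKT matrix: [[2 V^T V, Vf^T], [Vf, 0]]
    A = np.zeros((n + m, n + m))
    A[:n, :n] = 2.0 * V.T @ V
    A[:n, n:] = Vf.T
    A[n:, :n] = Vf
    b = np.concatenate([2.0 * V.T @ y, y_fixed])
    sol = np.linalg.solve(A, b)
    return sol[:n]  # drop the Lagrange multipliers

# Fit a parabola to data that does NOT pass through the origin,
# while forcing the fit through (0, 0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2
coeffs = polyfit_fixed(x, y, 2, np.array([0.0]), np.array([0.0]))
```

The constraint is satisfied to machine precision even though the unconstrained fit would have an intercept of 1.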
URL: From aron at ahmadia.net Mon Mar 4 19:53:57 2013 From: aron at ahmadia.net (Aron Ahmadia) Date: Mon, 4 Mar 2013 19:53:57 -0500 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: Interesting, that question would probably have gotten a different response on scicomp, it is a pity we are not attracting more questions there! I know there are two polyfit modules in numpy, one in numpy.polyfit, the other in numpy.polynomial, the functionality you are suggesting seems to fit in either. What parameters/functionality are you considering adding? A On Mon, Mar 4, 2013 at 7:23 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > A couple of days back, answering a question in StackExchange ( > http://stackoverflow.com/a/15196628/110026), I found myself using > Lagrange multipliers to fit a polynomial with least squares to data, making > sure it went through some fixed points. This time it was relatively easy, > because some 5 years ago I came across the same problem in real life, and > spent the better part of a week banging my head against it. Even knowing > what you are doing, it is far from simple, and in my own experience very > useful: I think the only time ever I have fitted a polynomial to data with > a definite purpose, it required that some points were fixed. > > Seeing that polyfit is entirely coded in python, it would be relatively > straightforward to add support for fixed points. It is also something I > feel capable, and willing, of doing. > > * Is such an additional feature something worthy of investigating, or > will it never find its way into numpy.polyfit? > * Any ideas on the best syntax for the extra parameters? > > Thanks, > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail.till at gmx.de Mon Mar 4 20:09:00 2013 From: mail.till at gmx.de (Till Stensitzki) Date: Tue, 5 Mar 2013 01:09:00 +0000 (UTC) Subject: [Numpy-discussion] GSOC 2013 References: Message-ID: Todd gmail.com> writes: > > I have some ideas, but they may not be suitable for GSOC or may just be terrible ideas, so feel free to reject them: > I have also a possible (terrible?) idea in my mind: Including (maybe optional as blas) faster transcendental functions into numpy. Something like https://github.com/herumi/fmath or using the MKL. I think numpy just uses the standard std functions, whiche are not optimized for speed. greetings Till From jaime.frio at gmail.com Mon Mar 4 20:45:45 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Mon, 4 Mar 2013 17:45:45 -0800 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 4:53 PM, Aron Ahmadia wrote: > Interesting, that question would probably have gotten a different response > on scicomp, it is a pity we are not attracting more questions there! > > I know there are two polyfit modules in numpy, one in numpy.polyfit, the > other in numpy.polynomial, the functionality you are suggesting seems to > fit in either. > > What parameters/functionality are you considering adding? > Well, you need two more array-likes, x_fixed and y_fixed, which could probably be fed to polyfit as an optional tuple parameter: polyfit(x, y, deg, fixed_points=(x_fixed, y_fixed)) The standard return would still be the deg + 1 coefficients of the fitted polynomial, so the workings would be perfectly backwards compatible. 
An optional return, either when full=True, or by setting an additional lagrange_mult=True flag, could include the values of the Lagrange multipliers calculated during the fit. Jaime > A > > > On Mon, Mar 4, 2013 at 7:23 PM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> A couple of days back, answering a question in StackExchange ( >> http://stackoverflow.com/a/15196628/110026), I found myself using >> Lagrange multipliers to fit a polynomial with least squares to data, making >> sure it went through some fixed points. This time it was relatively easy, >> because some 5 years ago I came across the same problem in real life, and >> spent the better part of a week banging my head against it. Even knowing >> what you are doing, it is far from simple, and in my own experience very >> useful: I think the only time ever I have fitted a polynomial to data with >> a definite purpose, it required that some points were fixed. >> >> Seeing that polyfit is entirely coded in python, it would be relatively >> straightforward to add support for fixed points. It is also something I >> feel capable, and willing, of doing. >> >> * Is such an additional feature something worthy of investigating, or >> will it never find its way into numpy.polyfit? >> * Any ideas on the best syntax for the extra parameters? >> >> Thanks, >> >> Jaime >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >> de dominaci?n mundial. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Mar 4 23:10:21 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Mar 2013 21:10:21 -0700 Subject: [Numpy-discussion] Remove interactive setup Message-ID: In distutils there are three files that provide some interactive setup: 1. numpy/distutils/core.py 2. numpy/distutils/fcompiler/gnu.py 3. numpy/distutils/interactive.py In Python3 `raw_input` has been renamed 'input' and python2 'input' is gone. I propose that the easiest solution to this compatibility problem is to remove all support for interactive numpy setup. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Mon Mar 4 23:12:47 2013 From: aron at ahmadia.net (Aron Ahmadia) Date: Mon, 4 Mar 2013 23:12:47 -0500 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: I've built numpy on many different machines, including supercomputers, and I have never used interactive setup. I agree with the proposal to remove it. A On Mon, Mar 4, 2013 at 11:10 PM, Charles R Harris wrote: > In distutils there are three files that provide some interactive setup: > > > 1. numpy/distutils/core.py > 2. numpy/distutils/fcompiler/gnu.py > 3. numpy/distutils/interactive.py > > In Python3 `raw_input` has been renamed 'input' and python2 'input' is > gone. I propose that the easiest solution to this compatibility problem is > to remove all support for interactive numpy setup. > > Thoughts? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Mar 4 23:25:48 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Mar 2013 21:25:48 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 5:53 PM, Aron Ahmadia wrote: > Interesting, that question would probably have gotten a different response > on scicomp, it is a pity we are not attracting more questions there! > > I know there are two polyfit modules in numpy, one in numpy.polyfit, the > other in numpy.polynomial, the functionality you are suggesting seems to > fit in either. > > What parameters/functionality are you considering adding? > > A > The discussion list convention is to bottom post. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Mar 4 23:37:43 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Mar 2013 21:37:43 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 5:23 PM, Jaime Fernández del Río < jaime.frio at gmail.com> wrote: > A couple of days back, answering a question in StackExchange ( > http://stackoverflow.com/a/15196628/110026), I found myself using > Lagrange multipliers to fit a polynomial with least squares to data, making > sure it went through some fixed points. This time it was relatively easy, > because some 5 years ago I came across the same problem in real life, and > spent the better part of a week banging my head against it. Even knowing > what you are doing, it is far from simple, and in my own experience very > useful: I think the only time ever I have fitted a polynomial to data with > a definite purpose, it required that some points were fixed. > > Seeing that polyfit is entirely coded in python, it would be relatively > straightforward to add support for fixed points. It is also something I > feel capable, and willing, of doing. 
> > * Is such an additional feature something worthy of investigating, or > will it never find its way into numpy.polyfit? > * Any ideas on the best syntax for the extra parameters? > > There are actually seven versions of polynomial fit, two for the usual polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, and Laguerre ;) How do you propose to implement it? I think Lagrange multipliers is overkill, I'd rather see using the weights (approximate) or change of variable -- a permutation in this case -- followed by qr and lstsq. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwmsmith at gmail.com Mon Mar 4 23:49:30 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Mon, 4 Mar 2013 22:49:30 -0600 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 4:29 PM, Todd wrote: > > 3. Structured arrays are accessed in a manner similar to python dictionaries, > using a key. However, they don't support the normal python dictionary > methods like keys, values, items, iterkeys, itervalues, iteritems, etc. This > project would be to implement as much of the dictionary (and ordereddict) API > as possible in structured arrays (making sure that the resulting API > presented to the user takes into account whether python 2 or python 3 is > being used). Along these lines: what about implementing the new "memory friendly" dictionary [0] with a NumPy structured array backend for the dense array portion, and allowing any specified column of the array to be the dictionary keys? This would merge the strengths of NumPy structured arrays with Python dictionaries. Some thought would have to be given to mutability / immutability issues, but these are surmountable. Further enhancements would be to allow for multiple key columns -- analogous to multiple indexes into a database. [0] http://mail.python.org/pipermail/python-dev/2012-December/123028.html > > 4. 
The numpy ndarray class stores data in a regular manner in memory. This > makes many linear algebra operations easier, but makes changing the number > of elements in an array nearly impossible in practice unless you are very > careful. There are other data structures that make adding and removing > elements easier, but are not as efficient at linear algebra operations. The > purpose of this project would be to create such a class in numpy, one that > is duck type compatible with ndarray but makes resizing feasible. This > would obviously come at a performance penalty for linear algebra related > functions. They would still have consistent dtypes and could not be nested, > unlike python lists. This could either be based on a new c-based type or be > a subclass of list under the hood. This made me think of a serious performance limitation of structured dtypes: a structured dtype is always "packed", which may lead to terrible byte alignment for common types. For instance, `dtype([('a', 'u1'), ('b', 'u8')]).itemsize == 9`, meaning that the 8-byte integer is not aligned as an equivalent C-struct's would be, leading to all sorts of horrors at the cache and register level. Python's ctypes does the right thing here, and can be mined for ideas. For instance, the equivalent ctypes Structure adds pad bytes so the 8-byte integer is on the correct boundary: class Aligned(ctypes.Structure): _fields_ = [('a', ctypes.c_uint8), ('b', ctypes.c_uint64)] print ctypes.sizeof(Aligned()) # --> 16 I'd be surprised if someone hasn't already proposed fixing this, although perhaps this would be outside the scope of a GSOC project. I'm willing to wager that the performance improvements would be easily measurable. Just some more thoughts. 
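The packed-versus-aligned difference is easy to check directly; a small verification sketch, assuming a typical 64-bit platform where a C uint64 is 8-byte aligned (np.dtype also accepts an align flag — see the reply further down this thread):

```python
import ctypes
import numpy as np

# Packed layout: the u8 field starts at offset 1, giving an itemsize of 9.
packed = np.dtype([('a', 'u1'), ('b', 'u8')])

# With align=True numpy inserts pad bytes the way a C compiler would.
aligned = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)

class Aligned(ctypes.Structure):
    _fields_ = [('a', ctypes.c_uint8), ('b', ctypes.c_uint64)]

print(packed.itemsize, aligned.itemsize, ctypes.sizeof(Aligned))
# typically prints: 9 16 16
```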
Kurt From ralf.gommers at gmail.com Tue Mar 5 00:37:49 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 5 Mar 2013 06:37:49 +0100 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 5:12 AM, Aron Ahmadia wrote: > I've built numpy on many different machines, including supercomputers, and > I have never used interactive setup. I agree with the proposal to remove > it. > > A > > > On Mon, Mar 4, 2013 at 11:10 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> In distutils there are three files that provide some interactive setup: >> >> >> 1. numpy/distutils/core.py >> 2. numpy/distutils/fcompiler/gnu.py >> 3. numpy/distutils/interactive.py >> >> In Python3 `raw_input` has been renamed 'input' and python2 'input' is >> gone. I propose that the easiest solution to this compatibility problem is >> to remove all support for interactive numpy setup. >> >> Thoughts? >> > +1 for removing Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Mar 5 01:34:39 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Mar 2013 23:34:39 -0700 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 10:37 PM, Ralf Gommers wrote: > > > > On Tue, Mar 5, 2013 at 5:12 AM, Aron Ahmadia wrote: > >> I've built numpy on many different machines, including supercomputers, >> and I have never used interactive setup. I agree with the proposal to >> remove it. >> >> A >> >> >> On Mon, Mar 4, 2013 at 11:10 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> In distutils there are three files that provide some interactive setup: >>> >>> >>> 1. numpy/distutils/core.py >>> 2. numpy/distutils/fcompiler/gnu.py >>> 3. numpy/distutils/interactive.py >>> >>> In Python3 `raw_input` has been renamed 'input' and python2 'input' is >>> gone. 
I propose that the easiest solution to this compatibility problem is >>> to remove all support for interactive numpy setup. >>> >>> Thoughts? >>> >> > +1 for removing > > I note that the way to access it is to run python setup.py with no arguments. I wonder what the proper message should be in that case? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Tue Mar 5 01:59:33 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Mon, 4 Mar 2013 22:59:33 -0800 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 10:34 PM, Charles R Harris wrote: > I note that the way to access it is to run python setup.py with no > arguments. I wonder what the proper message should be in that case? > How about usage instructions and an error message, similar to what a basic distutils setup script would provide? -Brad -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nicolas.Rougier at inria.fr Tue Mar 5 02:01:32 2013 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Tue, 5 Mar 2013 08:01:32 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: > This made me think of a serious performance limitation of structured dtypes: a > structured dtype is always "packed", which may lead to terrible byte alignment > for common types. For instance, `dtype([('a', 'u1'), ('b', > 'u8')]).itemsize == 9`, > meaning that the 8-byte integer is not aligned as an equivalent C-struct's > would be, leading to all sorts of horrors at the cache and register level. > Python's ctypes does the right thing here, and can be mined for ideas. 
For > instance, the equivalent ctypes Structure adds pad bytes so the 8-byte integer > is on the correct boundary: > > class Aligned(ctypes.Structure): > _fields_ = [('a', ctypes.c_uint8), > ('b', ctypes.c_uint64)] > > print ctypes.sizeof(Aligned()) # --> 16 > > I'd be surprised if someone hasn't already proposed fixing this, although > perhaps this would be outside the scope of a GSOC project. I'm willing to > wager that the performance improvements would be easily measureable. I've been confronted to this very problem and ended up coding a "group class" which is a "split" structured array (each field is stored as a single array) offering the same interface as a regular structured array. http://www.loria.fr/~rougier/coding/software/numpy_group.py Nicolas From jaime.frio at gmail.com Tue Mar 5 02:41:01 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Mon, 4 Mar 2013 23:41:01 -0800 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris wrote: > > There are actually seven versions of polynomial fit, two for the usual > polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, > and Laguerre ;) > Correct me if I am wrong, but the fitted function is the same regardless of the polynomial basis used. I don't know if there can be numerical stability issues, but chebfit(x, y, n) returns the same as poly2cheb(polyfit(x, y, n)). In any case, with all the already existing support for these special polynomials, it wouldn't be too hard to set the problem up to calculate the right coefficients directly for each case. > How do you propose to implement it? I think Lagrange multipliers is > overkill, I'd rather see using the weights (approximate) or change of > variable -- a permutation in this case -- followed by qr and lstsq. 
> The weights method is already in place, but I find it rather inelegant and unsatisfactory as a solution to this problem. But if it is deemed sufficient, then there is of course no need to go any further. I hadn't thought of any other way than using Lagrange multipliers, but looking at it in more detail, I am not sure it will be possible to formulate it in a manner that can be fed to lstsq, as polyfit does today. And if it can't, it probably wouldn't make much sense to have two different methods which cannot produce the same full output running under the same hood. I can't figure out your "change of variable" method from the succinct description, could you elaborate a little more? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Tue Mar 5 02:45:20 2013 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 04 Mar 2013 21:45:20 -1000 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: <5135A290.2050405@hawaii.edu> On 2013/03/04 9:01 PM, Nicolas Rougier wrote: >> >This made me think of a serious performance limitation of structured dtypes: a >> >structured dtype is always "packed", which may lead to terrible byte alignment >> >for common types. For instance, `dtype([('a', 'u1'), ('b', >> >'u8')]).itemsize == 9`, >> >meaning that the 8-byte integer is not aligned as an equivalent C-struct's >> >would be, leading to all sorts of horrors at the cache and register level. Doesn't the "align" kwarg of np.dtype do what you want? 
In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), align=True) In [3]: dt.itemsize Out[3]: 16 Eric From robert.kern at gmail.com Tue Mar 5 04:09:44 2013 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 5 Mar 2013 09:09:44 +0000 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 6:34 AM, Charles R Harris wrote: > I note that the way to access it is to run python setup.py with no > arguments. I wonder what the proper message should be in that case? Just let distutils handle it. $ python setup.py usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...] or: setup.py --help [cmd1 cmd2 ...] or: setup.py --help-commands or: setup.py cmd --help error: no commands supplied Anyone who was expecting the interactive setup will probably complain here. -- Robert Kern From charlesr.harris at gmail.com Tue Mar 5 08:23:49 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Mar 2013 06:23:49 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fernández del Río < jaime.frio at gmail.com> wrote: > On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> There are actually seven versions of polynomial fit, two for the usual >> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >> and Laguerre ;) >> > > Correct me if I am wrong, but the fitted function is the same regardless > of the polynomial basis used. I don't know if there can be numerical > stability issues, but chebfit(x, y, n) returns the same as > poly2cheb(polyfit(x, y, n)). > > In any case, with all the already existing support for these special > polynomials, it wouldn't be too hard to set the problem up to calculate the > right coefficients directly for each case. > > >> How do you propose to implement it? 
I think Lagrange multipliers is >> overkill, I'd rather see using the weights (approximate) or change of >> variable -- a permutation in this case -- followed by qr and lstsq. >> > The weights method is already in place, but I find it rather inelegant and > unsatisfactory as a solution to this problem. But if it is deemed > sufficient, then there is of course no need to go any further. > > I hadn't thought of any other way than using Lagrange multipliers, but > looking at it in more detail, I am not sure it will be possible to > formulate it in a manner that can be fed to lstsq, as polyfit does today. > And if it can't, it probably wouldn't make much sense to have two different > methods which cannot produce the same full output running under the same > hood. > > I can't figure out your "change of variable" method from the succinct > description, could you elaborate a little more? > I think the place to add this is to lstsq as linear constraints. That is, the coefficients must satisfy B * c = y_c for some set of equations B. In the polynomial case the rows of B would be the powers of x at the points you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the design matrix of the unconstrained points A' = A * v.T so that B' becomes u * d. The coefficients are now replaced by new variables c' with the constraints in the first two columns. If there are, say, 2 constraints, u * d will be 2x2. Solve that equation for the first two constraints then multiply the first two columns of the design matrix A' by the result and put them on the rhs, i.e., y = y - A'[:, :2] * c'[:2] then solve the usual least squares thing with A[:, 2:] * c'[2:] = y to get the rest of the transformed coefficients c'. Put the coefficients altogether and multiply with v^T to get c = v^T * c' Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
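In numpy terms, that change-of-variable recipe might be sketched as follows (a hypothetical constrained_polyfit helper, generalising to k constraints instead of the 2 used in the prose; np.vander as the design matrix is an assumption of the sketch, not existing numpy code):

```python
import numpy as np

def constrained_polyfit(x, y, deg, x_fixed, y_fixed):
    # Minimize ||A c - y|| subject to B c = y_fixed via the substitution
    # c = v @ c', where B = u @ diag(s) @ vt is the full SVD of B.
    A = np.vander(x, deg + 1)          # design matrix at the free points
    B = np.vander(x_fixed, deg + 1)    # powers of x at the fixed points
    k = B.shape[0]                     # number of constraints
    u, s, vt = np.linalg.svd(B)        # vt is (deg+1) x (deg+1)
    v = vt.T
    cp = np.zeros(deg + 1)
    # The constraints become u @ (s * c'[:k]) = y_fixed, solved directly.
    cp[:k] = (u.T @ y_fixed) / s[:k]
    Ap = A @ v                         # transformed design matrix
    resid = y - Ap[:, :k] @ cp[:k]     # move the fixed part to the rhs
    cp[k:] = np.linalg.lstsq(Ap[:, k:], resid, rcond=None)[0]
    return v @ cp                      # back to the original coefficients
```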
URL: From charlesr.harris at gmail.com Tue Mar 5 08:41:55 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Mar 2013 06:41:55 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 6:23 AM, Charles R Harris wrote: > > > On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fernández del Río < > jaime.frio at gmail.com> wrote: > >> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> There are actually seven versions of polynomial fit, two for the usual >>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>> and Laguerre ;) >>> >> >> Correct me if I am wrong, but the fitted function is the same regardless >> of the polynomial basis used. I don't know if there can be numerical >> stability issues, but chebfit(x, y, n) returns the same as >> poly2cheb(polyfit(x, y, n)). >> >> In any case, with all the already existing support for these special >> polynomials, it wouldn't be too hard to set the problem up to calculate the >> right coefficients directly for each case. >> >> >>> How do you propose to implement it? I think Lagrange multipliers is >>> overkill, I'd rather see using the weights (approximate) or change of >>> variable -- a permutation in this case -- followed by qr and lstsq. >>> >> >> The weights method is already in place, but I find it rather inelegant >> and unsatisfactory as a solution to this problem. But if it is deemed >> sufficient, then there is of course no need to go any further. >> >> I hadn't thought of any other way than using Lagrange multipliers, but >> looking at it in more detail, I am not sure it will be possible to >> formulate it in a manner that can be fed to lstsq, as polyfit does today. >> And if it can't, it probably wouldn't make much sense to have two different >> methods which cannot produce the same full output running under the same >> hood. 
>> >> I can't figure out your "change of variable" method from the succinct >> description, could you elaborate a little more? >> > I think the place to add this is to lstsq as linear constraints. That is, > the coefficients must satisfy B * c = y_c for some set of equations B. In > the polynomial case the rows of B would be the powers of x at the points > you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the > design matrix of the unconstrained points A' = A * v.T so that B' becomes > u * d. The coefficients are now replaced by new variables c' with the > constraints in the first two columns. If there are, say, 2 constraints, u * > d will be 2x2. Solve that equation for the first two constraints then > multiply the first two columns of the design matrix A' by the result and > put them on the rhs, i.e., > > y = y - A'[:, :2] * c'[:2] > > then solve the usual least squares thing with > > A[:, 2:] * c'[2:] = y > > to get the rest of the transformed coefficients c'. Put the coefficients > altogether and multiply with v^T to get > > c = v^T * c' > > There are a few missing `'` in there, but I think you can get the idea, we are making the substitution c = v^T * c'. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pelson.pub at gmail.com Tue Mar 5 09:15:07 2013 From: pelson.pub at gmail.com (Phil Elson) Date: Tue, 5 Mar 2013 14:15:07 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function Message-ID: The ticket https://github.com/numpy/numpy/issues/2269 discusses the possibility of implementing a "find first" style function which can optimise the process of finding the first value(s) which match a predicate in a given 1D array. 
For example: >>> a = np.sin(np.linspace(0, np.pi, 200)) >>> print find_first(a, lambda a: a > 0.9) ((71, ), 0.900479032457) This has been discussed in several locations: https://github.com/numpy/numpy/issues/2269 https://github.com/numpy/numpy/issues/2333 http://stackoverflow.com/questions/7632963/numpy-array-how-to-find-index-of-first-occurrence-of-item *Rationale* For small arrays there is no real reason to avoid doing: >>> a = np.sin(np.linspace(0, np.pi, 200)) >>> ind = (a > 0.9).nonzero()[0][0] >>> print (ind, ), a[ind] (71,) 0.900479032457 But for larger arrays, this can lead to massive amounts of work even if the result is one of the first to be computed. Example: >>> a = np.arange(1e8) >>> print (a == 5).nonzero()[0][0] 5 So a function which terminates when the first matching value is found is desirable. As mentioned in #2269, it is possible to define a consistent ordering which allows this functionality for >1D arrays, but IMHO it overcomplicates the problem and was not a case that I personally needed, so I've limited the scope to 1D arrays only. *Implementation* My initial assumption was that to get any kind of performance I would need to write the *find* function in C, however after prototyping with some array chunking it became apparent that a trivial python function would be quick enough for my needs. The approach I've implemented in the code found in #2269 simply breaks the array into sub-arrays of maximum length *chunk_size* (2048 by default, though there is no real science to this number), applies the given predicating function, and yields the results from *nonzero()*. The given function should be a python function which operates on the whole of the sub-array element-wise (i.e. the function should be vectorized). Returning a generator also has the benefit of allowing users to get the first *n* matching values/indices. 
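The chunked generator itself is only a few lines; a minimal sketch of the approach (not the exact code from the linked comment):

```python
import numpy as np

def find(a, predicate, chunk_size=2048):
    # Scan the 1D array `a` chunk by chunk, applying the vectorized
    # `predicate` to each chunk and yielding ((index,), value) pairs,
    # so iteration can stop as soon as the first match is found.
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        for i in predicate(chunk).nonzero()[0]:
            yield (start + int(i),), chunk[i]
```

With this, next(find(a, lambda a: a > 0.9)) on the sine example above returns ((71,), 0.900479032457) without evaluating the predicate over the whole array.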
*Results* I timed the implementation of *find* found in my comment at https://github.com/numpy/numpy/issues/2269#issuecomment-14436725 with an obvious test: In [1]: from np_utils import find In [2]: import numpy as np In [3]: import numpy.random In [4]: np.random.seed(1) In [5]: a = np.random.randn(1e8) In [6]: a.min(), a.max() Out[6]: (-6.1194900990552776, 5.9632246301166321) In [7]: next(find(a, lambda a: np.abs(a) > 6)) Out[7]: ((33105441,), -6.1194900990552776) In [8]: (np.abs(a) > 6).nonzero() Out[8]: (array([33105441]),) In [9]: %timeit (np.abs(a) > 6).nonzero() 1 loops, best of 3: 1.51 s per loop In [10]: %timeit next(find(a, lambda a: np.abs(a) > 6)) 1 loops, best of 3: 912 ms per loop In [11]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=100000)) 1 loops, best of 3: 470 ms per loop In [12]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=1000000)) 1 loops, best of 3: 483 ms per loop This shows that picking a sensible *chunk_size* can yield massive speed-ups (nonzero is x3 slower in one case). A similar example with a much smaller 1D array shows similar promise: In [41]: a = np.random.randn(1e4) In [42]: %timeit next(find(a, lambda a: np.abs(a) > 3)) 10000 loops, best of 3: 35.8 us per loop In [43]: %timeit (np.abs(a) > 3).nonzero() 10000 loops, best of 3: 148 us per loop As I commented on the issue tracker, if you think this function is worth taking forward, I'd be happy to open up a pull request. Feedback gratefully received. Cheers, Phil -------------- next part -------------- An HTML attachment was scrubbed... URL: From gbuday at gmail.com Tue Mar 5 09:58:38 2013 From: gbuday at gmail.com (Gergely Buday) Date: Tue, 5 Mar 2013 15:58:38 +0100 Subject: [Numpy-discussion] scipy_distutils.fcompiler Message-ID: Hi there, I try to compile a program developed with scipy. 
It is installed on my Ubuntu 12.04 box but upon make I get: Traceback (most recent call last): File "/usr/local/bin/f2py", line 4, in f2py2e.main() File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line 677, in main run_compile() File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line 536, in run_compile import scipy_distutils.fcompiler ImportError: No module named scipy_distutils.fcompiler What should I do to fix this? I have scipy Version: 0.9.0+dfsg1-1ubuntu2 - Gergely From djpine at gmail.com Tue Mar 5 10:50:34 2013 From: djpine at gmail.com (David Pine) Date: Tue, 5 Mar 2013 10:50:34 -0500 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: Jaime, If you are going to work on this, you should also take a look at the recent thread http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065649.html, which is about the weighting function, which is in a confused state in the current version of polyfit. By the way, Numerical Recipes has a nice discussion both about fixing parameters and about weighting the data in different ways in polynomial least squares fitting. David On Mon, Mar 4, 2013 at 7:23 PM, Jaime Fernández del Río < jaime.frio at gmail.com> wrote: > A couple of days back, answering a question in StackExchange ( > http://stackoverflow.com/a/15196628/110026), I found myself using > Lagrange multipliers to fit a polynomial with least squares to data, making > sure it went through some fixed points. This time it was relatively easy, > because some 5 years ago I came across the same problem in real life, and > spent the better part of a week banging my head against it. Even knowing > what you are doing, it is far from simple, and in my own experience very > useful: I think the only time ever I have fitted a polynomial to data with > a definite purpose, it required that some points were fixed. 
> > Seeing that polyfit is entirely coded in python, it would be relatively > straightforward to add support for fixed points. It is also something I > feel capable, and willing, of doing. > > * Is such an additional feature something worthy of investigating, or > will it never find its way into numpy.polyfit? > * Any ideas on the best syntax for the extra parameters? > > Thanks, > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eirik.gjerlow at astro.uio.no Tue Mar 5 10:56:40 2013 From: eirik.gjerlow at astro.uio.no (=?ISO-8859-1?Q?Eirik_Gjerl=F8w?=) Date: Tue, 05 Mar 2013 15:56:40 +0000 Subject: [Numpy-discussion] scipy_distutils.fcompiler In-Reply-To: References: Message-ID: <513615B8.9080006@uio.no> Hey Gergely, On my box, the fcompiler module is in numpy.distutils, so import numpy.distutils.fcompiler works for me at least! Eirik On 05. mars 2013 14:58, Gergely Buday wrote: > Hi there, > > I try to compile a program developed with scipy. It is installed on my > Ubuntu 12.04 box but upon make I get: > > Traceback (most recent call last): > File "/usr/local/bin/f2py", line 4, in > f2py2e.main() > File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line > 677, in main > run_compile() > File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line > 536, in run_compile > import scipy_distutils.fcompiler > ImportError: No module named scipy_distutils.fcompiler > > What should I do to fix this? 
> > I have scipy Version: 0.9.0+dfsg1-1ubuntu2 > > - Gergely > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Tue Mar 5 11:24:31 2013 From: cournape at gmail.com (David Cournapeau) Date: Tue, 5 Mar 2013 16:24:31 +0000 Subject: [Numpy-discussion] scipy_distutils.fcompiler In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 2:58 PM, Gergely Buday wrote: > Hi there, > > I try to compile a program developed with scipy. It is installed on my > Ubuntu 12.04 box but upon make I get: > > Traceback (most recent call last): > File "/usr/local/bin/f2py", line 4, in > f2py2e.main() > File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line > 677, in main > run_compile() > File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line > 536, in run_compile > import scipy_distutils.fcompiler > ImportError: No module named scipy_distutils.fcompiler > Looks like you're having an ancient f2py in there. You may want to use the one included in numpy instead. David From andrew.collette at gmail.com Tue Mar 5 12:33:56 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Tue, 5 Mar 2013 10:33:56 -0700 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: >> 5. Currently dtypes are limited to a set of fixed types, or combinations >> of these types. You can't have, say, a 48 bit float or a 1-bit bool. This >> project would be to allow users to create entirely new, non-standard dtypes >> based on simple rules, such as specifying the length of the sign, length of >> the exponent, and length of the mantissa for a custom floating-point number. 
>> Hopefully this would mostly be used for reading in non-standard data and not >> used that often, but for some situations it could be useful for storing data >> too (such as large amounts of boolean data, or genetic code which can be >> stored in 2 bits and is often very large). > > > I second this general idea. Simply having a pair of packbits/unpackbits > functions that could work with 2 and 4 bit uints would make my life easier. > If it were possible to have an array of dtype 'uint4' that used half the > space of a 'uint8', but could have ufuncs an the like ran on it, it would be > pure bliss. Not that I'm complaining, but a man can dream... I also think this would make a great addition to NumPy. People may even be able to save some work by leveraging the HDF5 code base; the HDF5 guys have piles and piles of carefully tested C code for exactly this purpose; converting between the common IEEE float sizes and those with user-specified mantissa/exponents; 1, 2, 3 bit etc. integers and the like. It's all under a BSD-compatible license. You'd have to replace the bits which talk to the HDF5 type description system, but it might be a good place to start. Andrew From kwmsmith at gmail.com Tue Mar 5 13:14:22 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Tue, 5 Mar 2013 12:14:22 -0600 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: <5135A290.2050405@hawaii.edu> References: <5135A290.2050405@hawaii.edu> Message-ID: On Tue, Mar 5, 2013 at 1:45 AM, Eric Firing wrote: > On 2013/03/04 9:01 PM, Nicolas Rougier wrote: >>> >This made me think of a serious performance limitation of structured dtypes: a >>> >structured dtype is always "packed", which may lead to terrible byte alignment >>> >for common types. For instance, `dtype([('a', 'u1'), ('b', >>> >'u8')]).itemsize == 9`, >>> >meaning that the 8-byte integer is not aligned as an equivalent C-struct's >>> >would be, leading to all sorts of horrors at the cache and register level. 
> > Doesn't the "align" kwarg of np.dtype do what you want? > > In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), > align=True) > > In [3]: dt.itemsize > Out[3]: 16 Thanks! That's what I get for not checking before posting. Consider this my vote to make `align=True` the default. > > Eric > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Tue Mar 5 13:52:56 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Mar 2013 18:52:56 +0000 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On 4 Mar 2013 23:21, "Jaime Fernández del Río" wrote: > > On Mon, Mar 4, 2013 at 2:29 PM, Todd wrote: >> >> >> 5. Currently dtypes are limited to a set of fixed types, or combinations of these types. You can't have, say, a 48 bit float or a 1-bit bool. This project would be to allow users to create entirely new, non-standard dtypes based on simple rules, such as specifying the length of the sign, length of the exponent, and length of the mantissa for a custom floating-point number. Hopefully this would mostly be used for reading in non-standard data and not used that often, but for some situations it could be useful for storing data too (such as large amounts of boolean data, or genetic code which can be stored in 2 bits and is often very large). > > > I second this general idea. Simply having a pair of packbits/unpackbits functions that could work with 2 and 4 bit uints would make my life easier. If it were possible to have an array of dtype 'uint4' that used half the space of a 'uint8', but could have ufuncs and the like run on it, it would be pure bliss. Not that I'm complaining, but a man can dream... 
This would be quite difficult, since it would require reworking the guts of the ndarray data structure to store strides and buffer offsets in bits rather than bytes, and probably with endianness handling too. Indexing is all done at the ndarray buffer-of-bytes layer, without any involvement of the dtype. Consider: a = zeros(10, dtype=uint4) b = a[1::3] Now b is a view onto a discontiguous set of half-bytes within a... You could have a dtype that represented several uint4s that together added up to an integral number of bytes, sort of like a structured dtype. Or packbits()/unpackbits(), like you say. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Mar 6 03:38:12 2013 From: toddrjen at gmail.com (Todd) Date: Wed, 6 Mar 2013 09:38:12 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Mar 5, 2013 7:53 PM, "Nathaniel Smith" wrote: > > On 4 Mar 2013 23:21, "Jaime Fern?ndez del R?o" wrote: > > > > On Mon, Mar 4, 2013 at 2:29 PM, Todd wrote: > >> > >> > >> 5. Currently dtypes are limited to a set of fixed types, or combinations of these types. You can't have, say, a 48 bit float or a 1-bit bool. This project would be to allow users to create entirely new, non-standard dtypes based on simple rules, such as specifying the length of the sign, length of the exponent, and length of the mantissa for a custom floating-point number. Hopefully this would mostly be used for reading in non-standard data and not used that often, but for some situations it could be useful for storing data too (such as large amounts of boolean data, or genetic code which can be stored in 2 bits and is often very large). > > > > > > I second this general idea. Simply having a pair of packbits/unpackbits functions that could work with 2 and 4 bit uints would make my life easier. 
If it were possible to have an array of dtype 'uint4' that used half the space of a 'uint8', but could have ufuncs an the like ran on it, it would be pure bliss. Not that I'm complaining, but a man can dream... > > This would be quite difficult, since it would require reworking the guts of the ndarray data structure to store strides and buffer offsets in bits rather than bytes, and probably with endianness handling too. Indexing is all done at the ndarray buffer-of-bytes layer, without any involvement of the dtype. > > Consider: > > a = zeros(10, dtype=uint4) > b = a[1::3] > > Now b is a view onto a discontiguous set of half-bytes within a... > > You could have a dtype that represented several uint4s that together added up to an integral number of bytes, sort of like a structured dtype. Or packbits()/unpackbits(), like you say. > > -n Then perhaps such a project could be a four-stage thing. 1. Allow for the creation of int, uint, float, bool, and complex dtypes with an arbitrary number of bytes 2. Allow for the creation of dtypes which are integer fractions of a byte (1, 2, or 4 bits), and must be padded to a whole byte. 3. Have an optional internal value in an array that tells it to exclude the last n bits of the last byte. This would be used to hide the padding from step 2. This should be abstracted into a general-purpose method for excluding bits from the byte-to-dtype conversion so it can be used in step 4. 4. Allow for the creation of dtypes that are non-integer fractions of a byte or non-integer multiples of a byte (3, 5, 6, 7, 9, 10, 11, 12, etc. bits). Each element in the array would be stored as a certain number of bytes, with the method from 3 used to cut it down to the right number of bits. So a 3 bit dtype would have two elements per byte with 2 bits excluded. A 5 bit dtype would have 1 element per byte with 3 bits excluded. A 12 bit dtype would have one element in two bytes with 4 bits excluded from the second byte. 
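[Editor's note: as a rough sketch of the kind of packing stage 2 implies, 4-bit values can already be packed two per byte with plain uint8 arrays and shifts. The helper names below are hypothetical, not an existing NumPy API; only the 1-bit packbits/unpackbits exist today.]

```python
import numpy as np

def pack_uint4(values):
    """Pack an array of 4-bit values (0-15) into bytes, two per byte.
    Pads with a zero nibble if the length is odd."""
    v = np.asarray(values, dtype=np.uint8)
    if v.size % 2:
        v = np.append(v, np.uint8(0))
    return (v[0::2] << 4) | v[1::2]

def unpack_uint4(packed, count):
    """Recover `count` 4-bit values from a packed byte array."""
    p = np.asarray(packed, dtype=np.uint8)
    out = np.empty(p.size * 2, dtype=np.uint8)
    out[0::2] = p >> 4
    out[1::2] = p & 0x0F
    return out[:count]

vals = np.array([1, 15, 7, 3, 9], dtype=np.uint8)
packed = pack_uint4(vals)          # 3 bytes instead of 5
restored = unpack_uint4(packed, len(vals))
```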
This approach would allow for arbitrary numbers of bits without breaking the internal representation, would have each stage building off the previous stage, and we would still have something useful even if not all the stages are completed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Wed Mar 6 05:29:52 2013 From: francesc at continuum.io (Francesc Alted) Date: Wed, 06 Mar 2013 11:29:52 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: <5135A290.2050405@hawaii.edu> Message-ID: <51371AA0.1040808@continuum.io> On 3/5/13 7:14 PM, Kurt Smith wrote: > On Tue, Mar 5, 2013 at 1:45 AM, Eric Firing wrote: >> On 2013/03/04 9:01 PM, Nicolas Rougier wrote: >>>>> This made me think of a serious performance limitation of structured dtypes: a >>>>> structured dtype is always "packed", which may lead to terrible byte alignment >>>>> for common types. For instance, `dtype([('a', 'u1'), ('b', >>>>> 'u8')]).itemsize == 9`, >>>>> meaning that the 8-byte integer is not aligned as an equivalent C-struct's >>>>> would be, leading to all sorts of horrors at the cache and register level. >> Doesn't the "align" kwarg of np.dtype do what you want? >> >> In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), >> align=True) >> >> In [3]: dt.itemsize >> Out[3]: 16 > Thanks! That's what I get for not checking before posting. > > Consider this my vote to make `aligned=True` the default. I would not rush on this. The example above takes 9 bytes to host the structure, while an `aligned=True` one will take 16 bytes. I'd rather leave the default as it is, and in case performance is critical, you can always copy the unaligned field to a new (homogeneous) array. 
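[Editor's note: the trade-off Francesc describes is easy to demonstrate; this small sketch uses the same field spec as the thread, and the 9 vs. 16 byte sizes assume a common platform where 8-byte integers align to 8 bytes.]

```python
import numpy as np

spec = dict(names=['a', 'b'], formats=['u1', 'u8'])
packed = np.dtype(spec)               # NumPy default: no padding
aligned = np.dtype(spec, align=True)  # C-struct-style padding

assert packed.itemsize == 9
assert aligned.itemsize == 16

# If per-field performance matters, copy the field out of the packed
# array into a contiguous, naturally aligned homogeneous array:
arr = np.zeros(1000, dtype=packed)
b = np.ascontiguousarray(arr['b'])
```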
-- Francesc Alted From sunghwanchoi91 at gmail.com Wed Mar 6 07:22:23 2013 From: sunghwanchoi91 at gmail.com (Sunghwan Choi) Date: Wed, 6 Mar 2013 21:22:23 +0900 Subject: [Numpy-discussion] embedding numpy ImportError: numpy.core.multiarray failed to import Message-ID: <1ab301ce1a65$44961a70$cdc24f50$@gmail.com> Hi, I tried embedding numpy in C++ but I got ImportError: numpy.core.multiarray failed to import Do you know any ways to solve this problem? I've copied my code and the error message below. Makefile CXX= icpc all: exe clean: rm -rf *.o exe exe: test.o $(CXX) -o exe test.o -L/home/shchoi/program/epd/lib/ -lpython2.7 test.o : test.cpp $(CXX) -c test.cpp -I/home/shchoi/program/epd/lib/python2.7/site-packages/numpy/core/include/numpy/ -I/home/shchoi/program/epd/include/python2.7 tmp.cpp #include "Python.h" #include "arrayobject.h" #include <iostream> extern "C" void Py_Initialize(); extern "C" void PyErr_Print(); using namespace std; int main(int argc, char* argv[]) { double answer = 0; PyObject *modname, *mod, *mdict, *func, *stringarg, *args, *rslt; Py_Initialize(); import_array(); modname = PyString_FromString("numpy"); mod = PyImport_Import(modname); PyErr_Print(); cout << mod << endl; Py_Finalize(); return 0; } $ make icpc -c test.cpp -I/home/shchoi/program/epd/lib/python2.7/site-packages/numpy/core/include/numpy/ -I/home/shchoi/program/epd/include/python2.7 test.cpp(15): warning #117: non-void function "main" should return a value import_array(); ^ icpc -o exe test.o -L/home/shchoi/program/epd/lib/ -lpython2.7 #-L/home/shchoi/program/epd/lib/python2.7/site-packages/numpy/core/ $ ./exe ImportError: numpy.core.multiarray failed to import Sunghwan Choi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marcelcoding+numpy at gmail.com Wed Mar 6 11:14:22 2013 From: marcelcoding+numpy at gmail.com (Marcel Stimberg) Date: Wed, 6 Mar 2013 17:14:22 +0100 Subject: [Numpy-discussion] scipy.weave + nose testing no longer works with numpy 1.7.0 Message-ID: Hi, I noticed that our unit tests running under nose and using scipy.weave started to fail with numpy 1.7.0 because of a change in numpy.distutils.exec_command (called by scipy.weave) which now assumes that sys.stdout always provides a fileno function (which fails because nose redirects stdout to a cStringIO). I guess the combination of scipy.weave and nose is not that unusual for scientific software, maybe 1.7.1 could make the exec_command a bit more robust in that regard? I filed the issue as #2999 on GitHub (including a simple example triggering the error): https://github.com/numpy/numpy/issues/2999 Thanks Marcel From dan.blanchard at gmail.com Wed Mar 6 11:41:25 2013 From: dan.blanchard at gmail.com (Dan Blanchard) Date: Wed, 6 Mar 2013 11:41:25 -0500 Subject: [Numpy-discussion] Help trying to fix issue 368 on Github (math functions fail confusingly on long integers (and object arrays generally)) Message-ID: Hi, I've been trying to take a crack at fixing https://github.com/numpy/numpy/issues/368, and I think I've identified all of the affected functions and even a potential fix, but I'm new to the Python C API and the numpy source, so if anyone has time to look at the discussion on Github and chime in with suggestions, I'd be glad to help finish getting this patched up. It is currently very frustrating that many of the math functions do not work with longs. Thanks, Dan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dynamicgl at gmail.com Wed Mar 6 11:44:32 2013 From: dynamicgl at gmail.com (Gelin Yan) Date: Thu, 7 Mar 2013 00:44:32 +0800 Subject: [Numpy-discussion] a question about freeze on numpy 1.7.0 In-Reply-To: References: Message-ID: On Mon, Feb 25, 2013 at 4:09 PM, Bradley M. Froehle wrote: > I submitted a bug report (and patch) to cx_freeze. You can follow up with > them at http://sourceforge.net/p/cx-freeze/bugs/36/. > > -Brad > > > On Mon, Feb 25, 2013 at 12:06 AM, Gelin Yan wrote: > >> >> >> On Mon, Feb 25, 2013 at 3:53 PM, Bradley M. Froehle < >> brad.froehle at gmail.com> wrote: >> >>> I can reproduce with NumPy 1.7.0, but I'm not convinced the bug lies >>> within NumPy. >>> >>> The exception is not being raised on the `del sys` line. Rather it is >>> being raised in numpy.__init__: >>> >>> File >>> "/home/bfroehle/.local/lib/python2.7/site-packages/cx_Freeze/initscripts/Console.py", >>> line 27, in >>> exec code in m.__dict__ >>> File "numpytest.py", line 1, in >>> import numpy >>> File >>> "/home/bfroehle/.local/lib/python2.7/site-packages/numpy/__init__.py", line >>> 147, in >>> from core import * >>> AttributeError: 'module' object has no attribute 'sys' >>> >>> This is because, somehow, `'sys' in numpy.core.__all__` returns True in >>> the cx_Freeze context but False in the regular Python context. 
>>> >>> -Brad >>> >>> >>> On Sun, Feb 24, 2013 at 10:49 PM, Gelin Yan wrote: >>> >>>> >>>> >>>> On Mon, Feb 25, 2013 at 9:16 AM, Ond?ej ?ert?k >>> > wrote: >>>> >>>>> Hi Gelin, >>>>> >>>>> On Sun, Feb 24, 2013 at 12:08 AM, Gelin Yan >>>>> wrote: >>>>> > Hi All >>>>> > >>>>> > When I used numpy 1.7.0 with cx_freeze 4.3.1 on windows, I >>>>> quickly >>>>> > found out even a simple "import numpy" may lead to program failed >>>>> with >>>>> > following exception: >>>>> > >>>>> > "AttributeError: 'module' object has no attribute 'sys' >>>>> > >>>>> > After a poking around some codes I noticed /numpy/core/__init__.py >>>>> has a >>>>> > line 'del sys' at the bottom. After I commented this line, and >>>>> repacked the >>>>> > whole program, It ran fine. >>>>> > I also noticed this 'del sys' didn't exist on numpy 1.6.2 >>>>> > >>>>> > I am curious why this 'del sys' should be here and whether it is >>>>> safe to >>>>> > omit it. Thanks. >>>>> >>>>> The "del sys" line was introduced in the commit: >>>>> >>>>> >>>>> https://github.com/numpy/numpy/commit/4c0576fe9947ef2af8351405e0990cebd83ccbb6 >>>>> >>>>> and it seems to me that it is needed so that the numpy.core namespace >>>>> is not >>>>> cluttered by it. >>>>> >>>>> Can you post the full stacktrace of your program (and preferably some >>>>> instructions >>>>> how to reproduce the problem)? It should become clear where the >>>>> problem is. >>>>> >>>>> Thanks, >>>>> Ondrej >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>> >>>> Hi Ondrej >>>> >>>> I attached two files here for demonstration. you need cx_freeze to >>>> build a standalone executable file. simply running python setup.py build >>>> and try to run the executable file you may see this exception. This >>>> example works with numpy 1.6.2. Thanks. 
>>>> Regards >>>> >>>> gelin yan >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> Hi Bradley >> >> So is it supposed to be a bug of cx_freeze? Any work around for that >> except omit 'del sys'? If the answer is no, I may consider submit a ticket >> on cx_freeze site. Thanks >> >> Regards >> >> gelin yan >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Hi Brad, please feel free to check it: http://sourceforge.net/p/cx-freeze/bugs/36/ Someone from cx_freeze has replied to it. Thanks again. Regards gelin yan -------------- next part -------------- An HTML attachment was scrubbed... 
For instance, `dtype([('a', 'u1'), ('b', >>>>>> 'u8')]).itemsize == 9`, >>>>>> meaning that the 8-byte integer is not aligned as an equivalent C-struct's >>>>>> would be, leading to all sorts of horrors at the cache and register level. >>> Doesn't the "align" kwarg of np.dtype do what you want? >>> >>> In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), >>> align=True) >>> >>> In [3]: dt.itemsize >>> Out[3]: 16 >> Thanks! That's what I get for not checking before posting. >> >> Consider this my vote to make `aligned=True` the default. > > I would not run too much. The example above takes 9 bytes to host the > structure, while a `aligned=True` will take 16 bytes. I'd rather let > the default as it is, and in case performance is critical, you can > always copy the unaligned field to a new (homogeneous) array. Yes, I can absolutely see the case you're making here, and I made my "vote" with the understanding that `aligned=False` will almost certainly stay the default. Adding 'aligned=True' is simple for me to do, so no harm done. My case is based on what's the least surprising behavior: C structs / all C compilers, the builtin `struct` module, and ctypes `Structure` subclasses all use padding to ensure aligned fields by default. You can turn this off to get packed structures, but the default behavior in these other places is alignment, which is why I was surprised when I first saw that NumPy structured dtypes are packed by default. 
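[Editor's note: the defaults Kurt mentions are easy to confirm from the standard library; the sizes below assume a typical platform where 8-byte integers align to 8 bytes.]

```python
import ctypes
import struct

class S(ctypes.Structure):
    # padded by default, like a C struct
    _fields_ = [('a', ctypes.c_uint8), ('b', ctypes.c_uint64)]

native = struct.calcsize('BQ')   # native ('@') alignment -> 16
packed = struct.calcsize('=BQ')  # '=' prefix disables padding -> 9

assert ctypes.sizeof(S) == 16
assert native == 16
assert packed == 9
```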
> > -- > Francesc Alted > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From kwmsmith at gmail.com Wed Mar 6 13:42:52 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Wed, 6 Mar 2013 12:42:52 -0600 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013) Message-ID: On Wed, Mar 6, 2013 at 12:12 PM, Kurt Smith wrote: > On Wed, Mar 6, 2013 at 4:29 AM, Francesc Alted wrote: >> >> I would not run too much. The example above takes 9 bytes to host the >> structure, while a `aligned=True` will take 16 bytes. I'd rather let >> the default as it is, and in case performance is critical, you can >> always copy the unaligned field to a new (homogeneous) array. > > Yes, I can absolutely see the case you're making here, and I made my > "vote" with the understanding that `aligned=False` will almost > certainly stay the default. Adding 'aligned=True' is simple for me to > do, so no harm done. > > My case is based on what's the least surprising behavior: C structs / > all C compilers, the builtin `struct` module, and ctypes `Structure` > subclasses all use padding to ensure aligned fields by default. You > can turn this off to get packed structures, but the default behavior > in these other places is alignment, which is why I was surprised when > I first saw that NumPy structured dtypes are packed by default. 
> Some surprises with aligned / unaligned arrays: #----------------------------- import numpy as np packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False) aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True) packed_arr = np.ones((10**6,), dtype=packed_dt) aligned_arr = np.ones((10**6,), dtype=aligned_dt) print "all(packed_arr['a'] == aligned_arr['a'])", np.all(packed_arr['a'] == aligned_arr['a']) # True print "all(packed_arr['b'] == aligned_arr['b'])", np.all(packed_arr['b'] == aligned_arr['b']) # True print "all(packed_arr == aligned_arr)", np.all(packed_arr == aligned_arr) # False (!!) #----------------------------- I can understand what's likely going on under the covers that makes these arrays not compare equal, but I'd expect that if all columns of two structured arrays are everywhere equal, then the arrays themselves would be everywhere equal. Bug? And regarding performance, doing simple timings shows a 30%-ish slowdown for unaligned operations: In [36]: %timeit packed_arr['b']**2 100 loops, best of 3: 2.48 ms per loop In [37]: %timeit aligned_arr['b']**2 1000 loops, best of 3: 1.9 ms per loop Whereas summing shows just a 10%-ish slowdown: In [38]: %timeit packed_arr['b'].sum() 1000 loops, best of 3: 1.29 ms per loop In [39]: %timeit aligned_arr['b'].sum() 1000 loops, best of 3: 1.14 ms per loop From charlesr.harris at gmail.com Wed Mar 6 13:43:41 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Mar 2013 11:43:41 -0700 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases Message-ID: Hi All, There are now some 14 non-merge commits in the 1.7.x branch including the critical diagonal leak fix. I think there is maybe one more critical backport and perhaps several low priority fixes, documentation and such, but I think we should start up the release process with a goal of getting 1.7.1 out by the middle of April. 
The development branch has been accumulating stuff since last summer, I suggest we look to get it out in May, branching at the end of this month. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Wed Mar 6 13:56:47 2013 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 06 Mar 2013 08:56:47 -1000 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: <5135A290.2050405@hawaii.edu> Message-ID: <5137916F.5080708@hawaii.edu> On 2013/03/05 8:14 AM, Kurt Smith wrote: > On Tue, Mar 5, 2013 at 1:45 AM, Eric Firing wrote: >> On 2013/03/04 9:01 PM, Nicolas Rougier wrote: >>>>> This made me think of a serious performance limitation of structured dtypes: a >>>>> structured dtype is always "packed", which may lead to terrible byte alignment >>>>> for common types. For instance, `dtype([('a', 'u1'), ('b', >>>>> 'u8')]).itemsize == 9`, >>>>> meaning that the 8-byte integer is not aligned as an equivalent C-struct's >>>>> would be, leading to all sorts of horrors at the cache and register level. >> >> Doesn't the "align" kwarg of np.dtype do what you want? >> >> In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), >> align=True) >> >> In [3]: dt.itemsize >> Out[3]: 16 > > Thanks! That's what I get for not checking before posting. > > Consider this my vote to make `aligned=True` the default. I strongly oppose this, because it would break the common usage of structured dtypes for reading packed binary data from files. I see no reason to change the default. 
Eric > >> >> Eric >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ralf.gommers at gmail.com Wed Mar 6 14:42:45 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 6 Mar 2013 20:42:45 +0100 Subject: [Numpy-discussion] scipy.weave + nose testing no longer works with numpy 1.7.0 In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 5:14 PM, Marcel Stimberg < marcelcoding+numpy at gmail.com> wrote: > Hi, > > I noticed that our unit tests running under nose and using scipy.weave > started to fail with numpy 1.7.0 because of a change in > numpy.distutils.exec_command (called by scipy.weave) which now assumes > that sys.stdout always provides a fileno function (which fails because > nose redirect stdout to a cStringIO). I guess the combination of > scipy.weave and nose is not that unusual for scientific software, > maybe 1.7.1 could make the exec_command a bit more robust in that > regard? I filed the issue as #2999 in github (including a simple > example triggering the error): > https://github.com/numpy/numpy/issues/2999 > That ticket has been sitting in my inbox for the last 2 weeks, sorry for not replying earlier. It's yet again a case of a small and seemingly harmless change in distutils breaking a fair amount of things. I added it to the 1.7.1 milestone. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Wed Mar 6 15:06:16 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 20:06:16 +0000 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 6:43 PM, Charles R Harris wrote: > Hi All, > > There are now some 14 non-merge commits in the 1.7.x branch including the > critical diagonal leak fix. I think there is maybe one more critical > backport and perhaps several low priority fixes, documentation and such, but > I think we should start up the release process with a goal of getting 1.7.1 > out by the middle of April. What's the critical backport you're thinking of? This list shows just two backport PRs waiting to be merged, one trivial one that I just submitted, the other that needs a tweak but won't take long: https://github.com/numpy/numpy/issues?milestone=27&page=1&state=open But I agree, basically we should merge those two (today?) and then release the first RC as soon as Ondrej has a moment to do so... > The development branch has been accumulating stuff since last summer, I > suggest we look to get it out in May, branching at the end of this month. I would say "let's fix the blockers and then branch as soon as Ondrej has time to do it", but in practice I suspect this comes out the same as what you just said :-). I just pruned the list of blockers; here's what we've got: https://github.com/numpy/numpy/issues?milestone=1&page=1&state=open -n From njs at pobox.com Wed Mar 6 15:09:00 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 20:09:00 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule Message-ID: A number of items on the 1.8 todo list are reminders to remove things that we deprecated in 1.7, and said we would remove in 1.8, e.g.: https://github.com/numpy/numpy/issues/596 https://github.com/numpy/numpy/issues/294 But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. 
I suggest we switch to a time-based deprecation schedule, where instead of saying "this will be removed in N releases" we say "this will be removed in the first release on or after (now+N months)". I also suggest that we set N=12, because it's a round number, it roughly matches numpy's historical release cycle, and because AFAICT that's the number that python itself uses for core and stdlib deprecations. Thoughts? -n From nouiz at nouiz.org Wed Mar 6 15:21:46 2013 From: nouiz at nouiz.org (Frédéric Bastien) Date: Wed, 6 Mar 2013 15:21:46 -0500 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: That sounds good. To be sure, the "now" means the first release that includes the deprecation, in that case NumPy 1.7? Fred On Wed, Mar 6, 2013 at 3:09 PM, Nathaniel Smith wrote: > A number of items on the 1.8 todo list are reminders to remove things > that we deprecated in 1.7, and said we would remove in 1.8, e.g.: > https://github.com/numpy/numpy/issues/596 > https://github.com/numpy/numpy/issues/294 > > But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. > > I suggest we switch to a time-based deprecation schedule, where > instead of saying "this will be removed in N releases" we say "this > will be removed in the first release on or after (now+N months)". > > I also suggest that we set N=12, because it's a round number, it > roughly matches numpy's historical release cycle, and because AFAICT > that's the number that python itself uses for core and stdlib > deprecations. > > Thoughts? 
> -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Mar 6 15:38:47 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 20:38:47 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 8:21 PM, Fr?d?ric Bastien wrote: > That sound good. To be sure, the "now" mean the first release that > include the deprecation, in that case NumPy 1.7? Yes. -n From ralf.gommers at gmail.com Wed Mar 6 15:52:23 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 6 Mar 2013 21:52:23 +0100 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 9:06 PM, Nathaniel Smith wrote: > On Wed, Mar 6, 2013 at 6:43 PM, Charles R Harris > wrote: > > Hi All, > > > > There are now some 14 non-merge commits in the 1.7.x branch including the > > critical diagonal leak fix. I think there is maybe one more critical > > backport and perhaps several low priority fixes, documentation and such, > but > > I think we should start up the release process with a goal of getting > 1.7.1 > > out by the middle of April. > > What's the critical backport you're thinking of? This last shows just > two backport PRs waiting to be merged, one trivial one that I just > submitted, the other that needs a tweak but won't take long: > https://github.com/numpy/numpy/issues?milestone=27&page=1&state=open > But I agree, basically we should merge those two (today?) and then > release the first RC as soon as Ondrej has a moment to do so... > I added issue 2999, which I think should be taken along. Other than that, +1 for a quick release. > > The development branch has been accumulating stuff since last summer, I > > suggest we look to get it out in May, branching at the end of this month. 
> > I would say "let's fix the blockers and then branch as soon as Ondrej > has time to do it", but in practice I suspect this comes out the same > as what you just said :-). I just pruned the list of blockers; here's > what we've got: > https://github.com/numpy/numpy/issues?milestone=1&page=1&state=open > It looks like we're not doing so well with setting Milestones correctly. Only 4 closed issues for 1.8.... Release quickly after 1.7.1 sounds good. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Mar 6 16:16:19 2013 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 6 Mar 2013 16:16:19 -0500 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 9:15 AM, Phil Elson wrote: > The ticket https://github.com/numpy/numpy/issues/2269 discusses the > possibility of implementing a "find first" style function which can > optimise the process of finding the first value(s) which match a predicate > in a given 1D array. For example: > > > >>> a = np.sin(np.linspace(0, np.pi, 200)) > >>> print find_first(a, lambda a: a > 0.9) > ((71, ), 0.900479032457) > > > This has been discussed in several locations: > > https://github.com/numpy/numpy/issues/2269 > https://github.com/numpy/numpy/issues/2333 > > http://stackoverflow.com/questions/7632963/numpy-array-how-to-find-index-of-first-occurrence-of-item > > > *Rationale* > > For small arrays there is no real reason to avoid doing: > > >>> a = np.sin(np.linspace(0, np.pi, 200)) > >>> ind = (a > 0.9).nonzero()[0][0] > >>> print (ind, ), a[ind] > (71,) 0.900479032457 > > > But for larger arrays, this can lead to massive amounts of work even if > the result is one of the first to be computed. Example: > > >>> a = np.arange(1e8) > >>> print (a == 5).nonzero()[0][0] > 5 > > > So a function which terminates when the first matching value is found is > desirable. 
> > As mentioned in #2269, it is possible to define a consistent ordering > which allows this functionality for >1D arrays, but IMHO it overcomplicates > the problem and was not a case that I personally needed, so I've limited > the scope to 1D arrays only. > > > *Implementation* > > My initial assumption was that to get any kind of performance I would need > to write the *find* function in C, however after prototyping with some > array chunking it became apparent that a trivial python function would be > quick enough for my needs. > > The approach I've implemented in the code found in #2269 simply breaks the > array into sub-arrays of maximum length *chunk_size* (2048 by default, > though there is no real science to this number), applies the given > predicating function, and yields the results from *nonzero()*. The given > function should be a python function which operates on the whole of the > sub-array element-wise (i.e. the function should be vectorized). Returning > a generator also has the benefit of allowing users to get the first *n* matching values/indices. 
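[Editor's note: a minimal sketch of the chunked scan described above. This is illustrative only, not Phil's actual code; the 2048 default mirrors his choice of chunk size.]

```python
import numpy as np

def find(a, predicate, chunk_size=2048):
    """Scan a 1D array chunk by chunk, yielding ((index,), value) for
    each element where the vectorized predicate holds, so callers can
    stop after the first match instead of testing the whole array."""
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        for i in np.flatnonzero(predicate(chunk)):
            yield (start + int(i),), chunk[i]

a = np.arange(10**6)
idx, value = next(find(a, lambda c: c == 5))  # only the first chunk is tested
```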
> > > *Results* > > > I timed the implementation of *find* found in my comment at > https://github.com/numpy/numpy/issues/2269#issuecomment-14436725 with an > obvious test: > > > In [1]: from np_utils import find > > In [2]: import numpy as np > > In [3]: import numpy.random > > In [4]: np.random.seed(1) > > In [5]: a = np.random.randn(1e8) > > In [6]: a.min(), a.max() > Out[6]: (-6.1194900990552776, 5.9632246301166321) > > In [7]: next(find(a, lambda a: np.abs(a) > 6)) > Out[7]: ((33105441,), -6.1194900990552776) > > In [8]: (np.abs(a) > 6).nonzero() > Out[8]: (array([33105441]),) > > In [9]: %timeit (np.abs(a) > 6).nonzero() > 1 loops, best of 3: 1.51 s per loop > > In [10]: %timeit next(find(a, lambda a: np.abs(a) > 6)) > 1 loops, best of 3: 912 ms per loop > > In [11]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=100000)) > 1 loops, best of 3: 470 ms per loop > > In [12]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=1000000)) > 1 loops, best of 3: 483 ms per loop > > > This shows that picking a sensible *chunk_size* can yield massive > speed-ups (nonzero is x3 slower in one case). A similar example with a much > smaller 1D array shows similar promise: > > In [41]: a = np.random.randn(1e4) > > In [42]: %timeit next(find(a, lambda a: np.abs(a) > 3)) > 10000 loops, best of 3: 35.8 us per loop > > In [43]: %timeit (np.abs(a) > 3).nonzero() > 10000 loops, best of 3: 148 us per loop > > > As I commented on the issue tracker, if you think this function is worth > taking forward, I'd be happy to open up a pull request. > > Feedback greatfully received. > > Cheers, > > Phil > > > In the interest of generalizing code and such, could such approaches be used for functions like np.any() and np.all() for short-circuiting if True or False (respectively) are found? I wonder what other sort of functions in NumPy might benefit from this? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Wed Mar 6 16:24:11 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 6 Mar 2013 22:24:11 +0100 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 9:38 PM, Nathaniel Smith wrote: > On Wed, Mar 6, 2013 at 8:21 PM, Fr?d?ric Bastien wrote: > > That sound good. To be sure, the "now" mean the first release that > > include the deprecation, in that case NumPy 1.7? > > Yes. > +1 $ git add HOWTO_DEPRECATE.rst.txt ? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Mar 6 16:40:47 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 21:40:47 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 9:24 PM, Ralf Gommers wrote: > On Wed, Mar 6, 2013 at 9:38 PM, Nathaniel Smith wrote: >> >> On Wed, Mar 6, 2013 at 8:21 PM, Fr?d?ric Bastien wrote: >> > That sound good. To be sure, the "now" mean the first release that >> > include the deprecation, in that case NumPy 1.7? >> >> Yes. > > > +1 > > $ git add HOWTO_DEPRECATE.rst.txt ? +1 I'm vaguely intimidated by the doc structure, so I'm not sure where this would go, but... aside from a formal description of how one does a deprecation and the difference between DeprecationWarning and FutureWarning, etc., we might even want to just add a whole page in the manual that just lists the current status of all ongoing deprecations, the releases where each change was made, the date for the next change, etc., and use that as our canonical reference that we check before each release? Since this is information that end-users want to be able to see? ("I got this weird warning... what is it trying to tell me? Which release started issuing it? What's my deadline for fixing this?") And because this whole cycle of filing multiple bugs and then shunting them off to the next release is pretty awkward. 
-n From sebastian at sipsolutions.net Wed Mar 6 17:05:22 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 06 Mar 2013 23:05:22 +0100 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013) In-Reply-To: References: Message-ID: <1362607522.3944.7.camel@sebastian-laptop> On Wed, 2013-03-06 at 12:42 -0600, Kurt Smith wrote: > On Wed, Mar 6, 2013 at 12:12 PM, Kurt Smith wrote: > > On Wed, Mar 6, 2013 at 4:29 AM, Francesc Alted wrote: > >> > >> I would not run too much. The example above takes 9 bytes to host the > >> structure, while a `aligned=True` will take 16 bytes. I'd rather let > >> the default as it is, and in case performance is critical, you can > >> always copy the unaligned field to a new (homogeneous) array. > > > > Yes, I can absolutely see the case you're making here, and I made my > > "vote" with the understanding that `aligned=False` will almost > > certainly stay the default. Adding 'aligned=True' is simple for me to > > do, so no harm done. > > > > My case is based on what's the least surprising behavior: C structs / > > all C compilers, the builtin `struct` module, and ctypes `Structure` > > subclasses all use padding to ensure aligned fields by default. You > > can turn this off to get packed structures, but the default behavior > > in these other places is alignment, which is why I was surprised when > > I first saw that NumPy structured dtypes are packed by default. 
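The packed-vs-aligned difference Kurt describes shows up directly in the dtype itemsize (the example below uses the documented list-of-tuples dtype spec):

```python
import numpy as np

# Packed (NumPy's default) vs. C-style aligned layout for the same fields.
packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False)
aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)

print(packed_dt.itemsize)   # 9: the u8 starts immediately after the u1
print(aligned_dt.itemsize)  # 16: 7 pad bytes so 'b' sits on an 8-byte boundary
print(aligned_dt.fields['b'][1])  # 8: the offset of 'b' in the aligned layout
```

This is the trade-off Francesc points out: the aligned struct costs 16 bytes per element instead of 9.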
> > > > Some surprises with aligned / unaligned arrays: > > #----------------------------- > > import numpy as np > > packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False) > aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True) > > packed_arr = np.ones((10**6,), dtype=packed_dt) > aligned_arr = np.ones((10**6,), dtype=aligned_dt) > > print "all(packed_arr['a'] == aligned_arr['a'])", > np.all(packed_arr['a'] == aligned_arr['a']) # True > print "all(packed_arr['b'] == aligned_arr['b'])", > np.all(packed_arr['b'] == aligned_arr['b']) # True > print "all(packed_arr == aligned_arr)", np.all(packed_arr == > aligned_arr) # False (!!) > > #----------------------------- > > I can understand what's likely going on under the covers that makes > these arrays not compare equal, but I'd expect that if all columns of > two structured arrays are everywhere equal, then the arrays themselves > would be everywhere equal. Bug? > Yes and no... equal for structured types seems not implemented, you get the same (wrong) False also with (packed_arr == packed_arr). But if the types are equivalent but np.equal not implemented, just returning False is a bit dangerous I agree. Not sure what the solution is exactly, I think the == operator could really raise an error instead of eating them all though probably... 
- Sebastian > And regarding performance, doing simple timings shows a 30%-ish > slowdown for unaligned operations: > > In [36]: %timeit packed_arr['b']**2 > 100 loops, best of 3: 2.48 ms per loop > > In [37]: %timeit aligned_arr['b']**2 > 1000 loops, best of 3: 1.9 ms per loop > > Whereas summing shows just a 10%-ish slowdown: > > In [38]: %timeit packed_arr['b'].sum() > 1000 loops, best of 3: 1.29 ms per loop > > In [39]: %timeit aligned_arr['b'].sum() > 1000 loops, best of 3: 1.14 ms per loop > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Wed Mar 6 17:33:52 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 6 Mar 2013 22:33:52 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: > A number of items on the 1.8 todo list are reminders to remove things > that we deprecated in 1.7, and said we would remove in 1.8, e.g.: > https://github.com/numpy/numpy/issues/596 > https://github.com/numpy/numpy/issues/294 > > But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. > > I suggest we switch to a time-based deprecation schedule, where > instead of saying "this will be removed in N releases" we say "this > will be removed in the first release on or after (now+N months)". We can always delay removal if a particular release comes sooner than originally expected. The deprecation policy is just that we specify minimum version numbers at which the features can be removed. It's not really a firm schedule. I do take your suggestion to heart, though. We shouldn't remove stuff faster than 12 months or so. I just think that it should modify our release process, not our "marking for deprecation" process. 
-- Robert Kern From njs at pobox.com Wed Mar 6 17:45:53 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 22:45:53 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 10:33 PM, Robert Kern wrote: > On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: >> A number of items on the 1.8 todo list are reminders to remove things >> that we deprecated in 1.7, and said we would remove in 1.8, e.g.: >> https://github.com/numpy/numpy/issues/596 >> https://github.com/numpy/numpy/issues/294 >> >> But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. >> >> I suggest we switch to a time-based deprecation schedule, where >> instead of saying "this will be removed in N releases" we say "this >> will be removed in the first release on or after (now+N months)". > > We can always delay removal if a particular release comes sooner than > originally expected. The deprecation policy is just that we specify > minimum version numbers at which the features can be removed. It's not > really a firm schedule. > > I do take your suggestion to heart, though. We shouldn't remove stuff > faster than 12 months or so. I just think that it should modify our > release process, not our "marking for deprecation" process. I'm not sure what this means in practical terms, though? Take the stuff deprecated in 1.7, released 2013-02-10. From here it seems plausible that the first release after 2014-02-10 could be 1.9, 1.10, or even, if we end up really embracing the small-quick-release cycle, 1.11. So which should we write down as our expected version number for the 1.7 deprecations? 
-n From robert.kern at gmail.com Wed Mar 6 17:53:13 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 6 Mar 2013 22:53:13 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 10:45 PM, Nathaniel Smith wrote: > On Wed, Mar 6, 2013 at 10:33 PM, Robert Kern wrote: >> On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: >>> A number of items on the 1.8 todo list are reminders to remove things >>> that we deprecated in 1.7, and said we would remove in 1.8, e.g.: >>> https://github.com/numpy/numpy/issues/596 >>> https://github.com/numpy/numpy/issues/294 >>> >>> But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. >>> >>> I suggest we switch to a time-based deprecation schedule, where >>> instead of saying "this will be removed in N releases" we say "this >>> will be removed in the first release on or after (now+N months)". >> >> We can always delay removal if a particular release comes sooner than >> originally expected. The deprecation policy is just that we specify >> minimum version numbers at which the features can be removed. It's not >> really a firm schedule. >> >> I do take your suggestion to heart, though. We shouldn't remove stuff >> faster than 12 months or so. I just think that it should modify our >> release process, not our "marking for deprecation" process. > > I'm not sure what this means in practical terms, though? Take the > stuff deprecated in 1.7, released 2013-02-10. From here it seems > plausible that the first release after 2014-02-10 could be 1.9, 1.10, > or even, if we end up really embracing the small-quick-release cycle, > 1.11. So which should we write down as our expected version number for > the 1.7 deprecations? If. I would leave the policy alone until we consistently implement such a release cycle that makes it regularly problematic. 
-- Robert Kern From njs at pobox.com Wed Mar 6 17:56:37 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 22:56:37 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 10:53 PM, Robert Kern wrote: > On Wed, Mar 6, 2013 at 10:45 PM, Nathaniel Smith wrote: >> On Wed, Mar 6, 2013 at 10:33 PM, Robert Kern wrote: >>> On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: >>>> A number of items on the 1.8 todo list are reminders to remove things >>>> that we deprecated in 1.7, and said we would remove in 1.8, e.g.: >>>> https://github.com/numpy/numpy/issues/596 >>>> https://github.com/numpy/numpy/issues/294 >>>> >>>> But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. >>>> >>>> I suggest we switch to a time-based deprecation schedule, where >>>> instead of saying "this will be removed in N releases" we say "this >>>> will be removed in the first release on or after (now+N months)". >>> >>> We can always delay removal if a particular release comes sooner than >>> originally expected. The deprecation policy is just that we specify >>> minimum version numbers at which the features can be removed. It's not >>> really a firm schedule. >>> >>> I do take your suggestion to heart, though. We shouldn't remove stuff >>> faster than 12 months or so. I just think that it should modify our >>> release process, not our "marking for deprecation" process. >> >> I'm not sure what this means in practical terms, though? Take the >> stuff deprecated in 1.7, released 2013-02-10. From here it seems >> plausible that the first release after 2014-02-10 could be 1.9, 1.10, >> or even, if we end up really embracing the small-quick-release cycle, >> 1.11. So which should we write down as our expected version number for >> the 1.7 deprecations? > > If. I would leave the policy alone until we consistently implement > such a release cycle that makes it regularly problematic. 
It's being problematic right now, we need some process in place to handle these bugs through the 1.8 release and to make sure we don't drop them on the floor later... -n From robert.kern at gmail.com Wed Mar 6 18:02:08 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 6 Mar 2013 23:02:08 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 10:56 PM, Nathaniel Smith wrote: > On Wed, Mar 6, 2013 at 10:53 PM, Robert Kern wrote: >> On Wed, Mar 6, 2013 at 10:45 PM, Nathaniel Smith wrote: >>> On Wed, Mar 6, 2013 at 10:33 PM, Robert Kern wrote: >>>> On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: >>>>> A number of items on the 1.8 todo list are reminders to remove things >>>>> that we deprecated in 1.7, and said we would remove in 1.8, e.g.: >>>>> https://github.com/numpy/numpy/issues/596 >>>>> https://github.com/numpy/numpy/issues/294 >>>>> >>>>> But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. >>>>> >>>>> I suggest we switch to a time-based deprecation schedule, where >>>>> instead of saying "this will be removed in N releases" we say "this >>>>> will be removed in the first release on or after (now+N months)". >>>> >>>> We can always delay removal if a particular release comes sooner than >>>> originally expected. The deprecation policy is just that we specify >>>> minimum version numbers at which the features can be removed. It's not >>>> really a firm schedule. >>>> >>>> I do take your suggestion to heart, though. We shouldn't remove stuff >>>> faster than 12 months or so. I just think that it should modify our >>>> release process, not our "marking for deprecation" process. >>> >>> I'm not sure what this means in practical terms, though? Take the >>> stuff deprecated in 1.7, released 2013-02-10. 
From here it seems >>> plausible that the first release after 2014-02-10 could be 1.9, 1.10, >>> or even, if we end up really embracing the small-quick-release cycle, >>> 1.11. So which should we write down as our expected version number for >>> the 1.7 deprecations? >> >> If. I would leave the policy alone until we consistently implement >> such a release cycle that makes it regularly problematic. > > It's being problematic right now, Changing existing process is like automation: don't do it until the problem bites you twice. That's why I suggested that we don't change things until it's *regularly* problematic. > we need some process in place to > handle these bugs through the 1.8 release and to make sure we don't > drop them on the floor later... Bump the milestones to 1.9. -- Robert Kern From jaime.frio at gmail.com Wed Mar 6 18:52:11 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Wed, 6 Mar 2013 15:52:11 -0800 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 5:23 AM, Charles R Harris wrote: > > > On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> There are actually seven versions of polynomial fit, two for the usual >>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>> and Laguerre ;) >>> >> >> Correct me if I am wrong, but the fitted function is the same regardless >> of the polynomial basis used. I don't know if there can be numerical >> stability issues, but chebfit(x, y, n) returns the same as >> poly2cheb(polyfit(x, y, n)). >> >> In any case, with all the already existing support for these special >> polynomials, it wouldn't be too hard to set the problem up to calculate the >> right coefficients directly for each case. >> >> >>> How do you propose to implement it? 
I think Lagrange multipliers is >>> overkill, I'd rather see using the weights (approximate) or change of >>> variable -- a permutation in this case -- followed by qr and lstsq. >>> >> >> The weights method is already in place, but I find it rather inelegant >> and unsatisfactory as a solution to this problem. But if it is deemed >> sufficient, then there is of course no need to go any further. >> >> I hadn't thought of any other way than using Lagrange multipliers, but >> looking at it in more detail, I am not sure it will be possible to >> formulate it in a manner that can be fed to lstsq, as polyfit does today. >> And if it can't, it probably wouldn't make much sense to have two different >> methods which cannot produce the same full output running under the same >> hood. >> >> I can't figure out your "change of variable" method from the succinct >> description, could you elaborate a little more? >> > > I think the place to add this is to lstsq as linear constraints. That is, > the coefficients must satisfy B * c = y_c for some set of equations B. In > the polynomial case the rows of B would be the powers of x at the points > you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the > design matrix of the unconstrained points A' = A * v.T so that B' becomes > u * d. The coefficients are now replaced by new variables c' with the > contraints in the first two columns. If there are, say, 2 constraints, u * > d will be 2x2. Solve that equation for the first two constraints then > multiply the first two columns of the design matrix A' by the result and > put them on the rhs, i.e., > > y = y - A'[:, :2] * c'[:2] > > then solve the usual l least squares thing with > > A[:, 2:] * c'[2:] = y > > to get the rest of the transformed coefficients c'. Put the coefficients > altogether and multiply with v^T to get > > c = v^T * c' > Very nice, and works beautifully! I have tried the method you describe, and there are a few relevant observations: 1. 
It gives the exact same result as the Lagrange multiplier approach, which is probably expected, but I wasn't all that sure it would be the case. 2. The result also seems to be to what the sequence of fits giving increasing weights to the fixed points converges to. This image http://i1092.photobucket.com/albums/i412/jfrio/image.png is an example. In there: * blue crosses are the data points to fit to * red points are the fixed points * blue line is the standard polyfit * red line is the constrained polyfit * cyan, magenta, yellow and black are polyfits with weights of 2, 4, 8, 16 for the fixed points, 1 for the rest Seeing this last point, probably the cleanest, least disruptive implementation of this, would be to allow np.inf values in the weights parameter, which would get filtered out, and dealt with in the above manner. So I have two questions: 1. Does this make sense? Or will it be better to make it more explicit, with a 'fixed_points' keyword argument defaulting to None? 2. Once I have this implemented, documented and tested... How do I go about submitting it for consideration? Would a patch be the way to go, or should I fork? Thanks, Jaime > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... 
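Jaime's observation 2 (that fits with ever-larger weights on the fixed points converge to the constrained fit) can be checked numerically. The data and the `weighted_fit` helper below are made up for illustration; only `np.polyfit`'s documented `w` argument is relied on.

```python
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0.0, 1.0, 20)
y = x**2 + 0.05 * rng.randn(20)
xf = np.array([0.0, 1.0])   # points the fit should pass through
yf = np.array([0.0, 1.0])

def weighted_fit(w):
    # Append the fixed points with weight w, everything else with weight 1.
    xs, ys = np.r_[x, xf], np.r_[y, yf]
    ws = np.r_[np.ones_like(x), np.full_like(xf, w)]
    return np.polyfit(xs, ys, 2, w=ws)

for w in (1, 10, 100, 1000):
    c = weighted_fit(w)
    # Error at the fixed points shrinks as the weight grows.
    print(w, np.abs(np.polyval(c, xf) - yf).max())
```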
URL: From charlesr.harris at gmail.com Wed Mar 6 19:29:24 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Mar 2013 17:29:24 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 4:52 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Mar 5, 2013 at 5:23 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> There are actually seven versions of polynomial fit, two for the usual >>>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>>> and Laguerre ;) >>>> >>> >>> Correct me if I am wrong, but the fitted function is the same regardless >>> of the polynomial basis used. I don't know if there can be numerical >>> stability issues, but chebfit(x, y, n) returns the same as >>> poly2cheb(polyfit(x, y, n)). >>> >>> In any case, with all the already existing support for these special >>> polynomials, it wouldn't be too hard to set the problem up to calculate the >>> right coefficients directly for each case. >>> >>> >>>> How do you propose to implement it? I think Lagrange multipliers is >>>> overkill, I'd rather see using the weights (approximate) or change of >>>> variable -- a permutation in this case -- followed by qr and lstsq. >>>> >>> >>> The weights method is already in place, but I find it rather inelegant >>> and unsatisfactory as a solution to this problem. But if it is deemed >>> sufficient, then there is of course no need to go any further. >>> >>> I hadn't thought of any other way than using Lagrange multipliers, but >>> looking at it in more detail, I am not sure it will be possible to >>> formulate it in a manner that can be fed to lstsq, as polyfit does today. 
>>> And if it can't, it probably wouldn't make much sense to have two different >>> methods which cannot produce the same full output running under the same >>> hood. >>> >>> I can't figure out your "change of variable" method from the succinct >>> description, could you elaborate a little more? >>> >> >> I think the place to add this is to lstsq as linear constraints. That is, >> the coefficients must satisfy B * c = y_c for some set of equations B. In >> the polynomial case the rows of B would be the powers of x at the points >> you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the >> design matrix of the unconstrained points A' = A * v.T so that B' becomes >> u * d. The coefficients are now replaced by new variables c' with the >> contraints in the first two columns. If there are, say, 2 constraints, u * >> d will be 2x2. Solve that equation for the first two constraints then >> multiply the first two columns of the design matrix A' by the result and >> put them on the rhs, i.e., >> >> y = y - A'[:, :2] * c'[:2] >> >> then solve the usual l least squares thing with >> >> A[:, 2:] * c'[2:] = y >> >> to get the rest of the transformed coefficients c'. Put the coefficients >> altogether and multiply with v^T to get >> >> c = v^T * c' >> > > Very nice, and works beautifully! I have tried the method you describe, > and there are a few relevant observations: > > 1. It gives the exact same result as the Lagrange multiplier approach, > which is probably expected, but I wasn't all that sure it would be the case. > It's equivalent, but I'm thinking in algorithmic terms, which is somewhat more specific than the mathematical formulation. > 2. The result also seems to be to what the sequence of fits giving > increasing weights to the fixed points converges to. This image > http://i1092.photobucket.com/albums/i412/jfrio/image.png is an example. 
> In there: > * blue crosses are the data points to fit to > * red points are the fixed points > * blue line is the standard polyfit > * red line is the constrained polyfit > * cyan, magenta, yellow and black are polyfits with weights of 2, 4, > 8, 16 for the fixed points, 1 for the rest > > Seeing this last point, probably the cleanest, least disruptive > implementation of this, would be to allow np.inf values in the weights > parameter, which would get filtered out, and dealt with in the above manner. > > Interesting idea, I like it. It is less general, but probably all that is needed for polynomial fits. I suppose that after you pull out the relevant rows you can set the weights to zero so that they will have no (order of roundoff) effect on the remaining fit and you don't need to rewrite the design matrix. > So I have two questions: > > 1. Does this make sense? Or will it be better to make it more explicit, > with a 'fixed_points' keyword argument defaulting to None? > 2. Once I have this implemented, documented and tested... How do I go > about submitting it for consideration? Would a patch be the way to go, or > should I fork? > > A fork is definitely the way to go. That makes it easy for folks to review the code and tell you everything you did wrong ;) I think adding linear constraints to lstsq would be good, then the upper level routines can make use of them. Something like a new argument constraints=(B, y), with None the default. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Thu Mar 7 09:51:35 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 07 Mar 2013 09:51:35 -0500 Subject: [Numpy-discussion] scipy.optimize.fminbound bound violation Message-ID: <5138A977.1090001@gmail.com> Under what conditions should I expect fminbound to call the supplied function with argument values substantially outside the user-provided bounds? 
Thanks, Alan Isaac From e.antero.tammi at gmail.com Thu Mar 7 11:22:30 2013 From: e.antero.tammi at gmail.com (eat) Date: Thu, 7 Mar 2013 18:22:30 +0200 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: Hi, On Thu, Mar 7, 2013 at 1:52 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Mar 5, 2013 at 5:23 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> There are actually seven versions of polynomial fit, two for the usual >>>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>>> and Laguerre ;) >>>> >>> >>> Correct me if I am wrong, but the fitted function is the same regardless >>> of the polynomial basis used. I don't know if there can be numerical >>> stability issues, but chebfit(x, y, n) returns the same as >>> poly2cheb(polyfit(x, y, n)). >>> >>> In any case, with all the already existing support for these special >>> polynomials, it wouldn't be too hard to set the problem up to calculate the >>> right coefficients directly for each case. >>> >>> >>>> How do you propose to implement it? I think Lagrange multipliers is >>>> overkill, I'd rather see using the weights (approximate) or change of >>>> variable -- a permutation in this case -- followed by qr and lstsq. >>>> >>> >>> The weights method is already in place, but I find it rather inelegant >>> and unsatisfactory as a solution to this problem. But if it is deemed >>> sufficient, then there is of course no need to go any further. >>> >>> I hadn't thought of any other way than using Lagrange multipliers, but >>> looking at it in more detail, I am not sure it will be possible to >>> formulate it in a manner that can be fed to lstsq, as polyfit does today. 
>>> And if it can't, it probably wouldn't make much sense to have two different >>> methods which cannot produce the same full output running under the same >>> hood. >>> >>> I can't figure out your "change of variable" method from the succinct >>> description, could you elaborate a little more? >>> >> >> I think the place to add this is to lstsq as linear constraints. That is, >> the coefficients must satisfy B * c = y_c for some set of equations B. In >> the polynomial case the rows of B would be the powers of x at the points >> you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the >> design matrix of the unconstrained points A' = A * v.T so that B' becomes >> u * d. The coefficients are now replaced by new variables c' with the >> contraints in the first two columns. If there are, say, 2 constraints, u * >> d will be 2x2. Solve that equation for the first two constraints then >> multiply the first two columns of the design matrix A' by the result and >> put them on the rhs, i.e., >> >> y = y - A'[:, :2] * c'[:2] >> >> then solve the usual l least squares thing with >> >> A[:, 2:] * c'[2:] = y >> >> to get the rest of the transformed coefficients c'. Put the coefficients >> altogether and multiply with v^T to get >> >> c = v^T * c' >> > > Very nice, and works beautifully! I have tried the method you describe, > and there are a few relevant observations: > > 1. It gives the exact same result as the Lagrange multiplier approach, > which is probably expected, but I wasn't all that sure it would be the case. > 2. The result also seems to be to what the sequence of fits giving > increasing weights to the fixed points converges to. This image > http://i1092.photobucket.com/albums/i412/jfrio/image.png is an example. 
> In there: > * blue crosses are the data points to fit to > * red points are the fixed points > * blue line is the standard polyfit > * red line is the constrained polyfit > * cyan, magenta, yellow and black are polyfits with weights of 2, 4, > 8, 16 for the fixed points, 1 for the rest > > Seeing this last point, probably the cleanest, least disruptive > implementation of this, would be to allow np.inf values in the weights > parameter, which would get filtered out, and dealt with in the above manner. > Just to point out that a very simple approach is where one just multiply the constraints with big enough number M, like: In []: def V(x, n= None): ....: """Polynomial package compatible Vandermonde 'matrix'""" ....: return vander(x, n)[:, ::-1] ....: In []: def clsq(A, b, C, d, M= 1e5): ....: """A simple constrained least squared solution of Ax= b, s.t. Cx= d""" ....: return solve(dot(A.T, A)+ M* dot(C.T, C), dot(A.T, b)+ M* dot(C.T, d)) ....: In []: x= linspace(-6, 6, 23) In []: y= sin(x)+ 4e-1* rand(len(x))- 2e-1 In []: x_f, y_f= linspace(-(3./ 2)* pi, (3./ 2)* pi, 4), array([1, -1, 1, -1]) In []: n, x_s= 5, linspace(-6, 6, 123) In []: plot(x, y, 'bo', x_f, y_f, 'bs', x_s, sin(x_s), 'b--') Out[]: In []: for M in 7** (arange(5)): ....: p= Polynomial(clsq(V(x, n), y, V(x_f, n), y_f, M)) ....: plot(x_s, p(x_s)) ....: Out[]: In []: ylim([-2, 2]) Out[]: In []: show() Obviously this is not any 'silver bullet' solution, but simple enough ;-) My 2 cents, -eat > > So I have two questions: > > 1. Does this make sense? Or will it be better to make it more explicit, > with a 'fixed_points' keyword argument defaulting to None? > 2. Once I have this implemented, documented and tested... How do I go > about submitting it for consideration? Would a patch be the way to go, or > should I fork? 
> > Thanks, > > Jaime > > >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: clsq.png Type: image/png Size: 69891 bytes Desc: not available URL: From charlesr.harris at gmail.com Thu Mar 7 12:07:15 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Mar 2013 10:07:15 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Thu, Mar 7, 2013 at 9:22 AM, eat wrote: > Hi, > > On Thu, Mar 7, 2013 at 1:52 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Tue, Mar 5, 2013 at 5:23 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fern?ndez del R?o < >>> jaime.frio at gmail.com> wrote: >>> >>>> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >>>> charlesr.harris at gmail.com> wrote: >>>> >>>>> >>>>> There are actually seven versions of polynomial fit, two for the usual >>>>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>>>> and Laguerre ;) >>>>> >>>> >>>> Correct me if I am wrong, but the fitted function is the same >>>> regardless of the polynomial basis used. I don't know if there can be >>>> numerical stability issues, but chebfit(x, y, n) returns the same as >>>> poly2cheb(polyfit(x, y, n)). 
>>>> >>>> In any case, with all the already existing support for these special >>>> polynomials, it wouldn't be too hard to set the problem up to calculate the >>>> right coefficients directly for each case. >>>> >>>> >>>>> How do you propose to implement it? I think Lagrange multipliers are >>>>> overkill, I'd rather see using the weights (approximate) or change of >>>>> variable -- a permutation in this case -- followed by qr and lstsq. >>>>> >>>> >>>> The weights method is already in place, but I find it rather inelegant >>>> and unsatisfactory as a solution to this problem. But if it is deemed >>>> sufficient, then there is of course no need to go any further. >>>> >>>> I hadn't thought of any other way than using Lagrange multipliers, but >>>> looking at it in more detail, I am not sure it will be possible to >>>> formulate it in a manner that can be fed to lstsq, as polyfit does today. >>>> And if it can't, it probably wouldn't make much sense to have two different >>>> methods which cannot produce the same full output running under the same >>>> hood. >>>> >>>> I can't figure out your "change of variable" method from the succinct >>>> description, could you elaborate a little more? >>>> >>> >>> I think the place to add this is to lstsq as linear constraints. That >>> is, the coefficients must satisfy B * c = y_c for some set of equations B. >>> In the polynomial case the rows of B would be the powers of x at the points >>> you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the >>> design matrix of the unconstrained points A' = A * v.T so that B' becomes >>> u * d. The coefficients are now replaced by new variables c' with the >>> constraints in the first two columns. If there are, say, 2 constraints, u * >>> d will be 2x2. Solve that equation for the first two constraints, then >>> multiply the first two columns of the design matrix A' by the result and >>> put them on the rhs, i.e., >>> >>> y = y - A'[:, :2] * c'[:2] >>> >>> then solve the usual least squares problem >>> >>> A'[:, 2:] * c'[2:] = y >>> >>> to get the rest of the transformed coefficients c'. Put the coefficients >>> all together and multiply with v^T to get >>> >>> c = v^T * c' >>> >> >> Very nice, and works beautifully! I have tried the method you describe, >> and there are a few relevant observations: >> >> 1. It gives the exact same result as the Lagrange multiplier approach, >> which is probably expected, but I wasn't all that sure it would be the case. >> 2. The result also seems to be what the sequence of fits giving >> increasing weights to the fixed points converges to. This image >> http://i1092.photobucket.com/albums/i412/jfrio/image.png is an example. >> In there: >> * blue crosses are the data points to fit to >> * red points are the fixed points >> * blue line is the standard polyfit >> * red line is the constrained polyfit >> * cyan, magenta, yellow and black are polyfits with weights of 2, 4, >> 8, 16 for the fixed points, 1 for the rest >> >> Seeing this last point, probably the cleanest, least disruptive >> implementation of this would be to allow np.inf values in the weights >> parameter, which would get filtered out and dealt with in the above manner. >> > > Just to point out that a very simple approach is one where you just multiply > the constraints by a big enough number M, like: > > In []: def V(x, n= None): > ....: """Polynomial package compatible Vandermonde 'matrix'""" > ....: return vander(x, n)[:, ::-1] > Just to note, there is a polyvander in numpy.polynomial.polynomial, and a chebvander in numpy.polynomial.chebyshev, etc. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
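Chuck's recipe, spelled out for a general number of constraints k, might look like the sketch below (the function name `constrained_lstsq` and the cubic test data are made up for illustration; B's rows are assumed linearly independent):

```python
import numpy as np

def constrained_lstsq(A, y, B, d):
    """min ||A c - y|| subject to B c = d, following the SVD recipe above."""
    k, m = B.shape
    U, s, Vt = np.linalg.svd(B, full_matrices=True)  # B = U @ diag(s) @ Vt
    cp = np.zeros(m)
    cp[:k] = (U.T @ d) / s        # the constraints pin down the first k new variables
    Ap = A @ Vt.T                 # transformed design matrix A' = A v^T
    cp[k:] = np.linalg.lstsq(Ap[:, k:], y - Ap[:, :k] @ cp[:k], rcond=None)[0]
    return Vt.T @ cp              # back to the original coefficients c = v^T c'

# Fit a noisy cubic, forcing the curve through (0, 0) and (2, 8):
rng = np.random.default_rng(42)
x = np.linspace(0.0, 2.0, 50)
y = x**3 + 0.05 * rng.standard_normal(x.size)
A = np.vander(x, 4)[:, ::-1]                       # coefficients low-to-high
B = np.vander(np.array([0.0, 2.0]), 4)[:, ::-1]    # powers of x at the fixed points
c = constrained_lstsq(A, y, B, np.array([0.0, 8.0]))
print(np.abs(B @ c - [0.0, 8.0]).max())  # essentially zero: the constraints hold exactly
```

Unlike the big-M weighting trick, the constraints here are satisfied to machine precision regardless of any tuning parameter.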
URL: From francesc at continuum.io Thu Mar 7 12:47:12 2013 From: francesc at continuum.io (Francesc Alted) Date: Thu, 07 Mar 2013 18:47:12 +0100 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013) In-Reply-To: References: Message-ID: <5138D2A0.3080802@continuum.io> On 3/6/13 7:42 PM, Kurt Smith wrote: > And regarding performance, doing simple timings shows a 30%-ish > slowdown for unaligned operations: > > In [36]: %timeit packed_arr['b']**2 > 100 loops, best of 3: 2.48 ms per loop > > In [37]: %timeit aligned_arr['b']**2 > 1000 loops, best of 3: 1.9 ms per loop Hmm, that clearly depends on the architecture. On my machine: In [1]: import numpy as np In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) In [6]: baligned = aligned_arr['b'] In [7]: bpacked = packed_arr['b'] In [8]: %timeit baligned**2 1000 loops, best of 3: 1.96 ms per loop In [9]: %timeit bpacked**2 100 loops, best of 3: 7.84 ms per loop That is, the unaligned column is 4x slower (!). numexpr allows somewhat better results: In [11]: %timeit numexpr.evaluate('baligned**2') 1000 loops, best of 3: 1.13 ms per loop In [12]: %timeit numexpr.evaluate('bpacked**2') 1000 loops, best of 3: 865 us per loop Yes, in this case, the unaligned array goes faster (as much as 30%). I think the reason is that numexpr optimizes the unaligned access by doing a copy of the different chunks in internal buffers that fits in L1 cache. Apparently this is very beneficial in this case (not sure why, though). 
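The chunked-copy idea just described can be sketched in pure NumPy (a toy illustration of the buffering strategy, not numexpr's actual implementation):

```python
import numpy as np

def blocked_square(x, block=1024):
    """Square `x` by first copying block-sized chunks into a small,
    freshly allocated (hence aligned) scratch buffer -- a toy sketch of
    the cache-friendly buffering strategy described above."""
    out = np.empty(x.shape, dtype=x.dtype)
    buf = np.empty(block, dtype=x.dtype)          # small enough to live in L1
    for start in range(0, len(x), block):
        n = min(block, len(x) - start)
        buf[:n] = x[start:start + n]              # unaligned -> aligned copy
        np.multiply(buf[:n], buf[:n], out=out[start:start + n])
    return out

packed = np.zeros(10**5, dtype=np.dtype([('a', 'i1'), ('b', 'i8')]))
packed['b'] = np.arange(10**5)
b = packed['b']                                   # strided, unaligned field view
print(np.array_equal(blocked_square(b), b ** 2))  # True
```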
> > Whereas summing shows just a 10%-ish slowdown: > > In [38]: %timeit packed_arr['b'].sum() > 1000 loops, best of 3: 1.29 ms per loop > > In [39]: %timeit aligned_arr['b'].sum() > 1000 loops, best of 3: 1.14 ms per loop On my machine: In [14]: %timeit baligned.sum() 1000 loops, best of 3: 1.03 ms per loop In [15]: %timeit bpacked.sum() 100 loops, best of 3: 3.79 ms per loop Again, the 4x slowdown is here. Using numexpr: In [16]: %timeit numexpr.evaluate('sum(baligned)') 100 loops, best of 3: 2.16 ms per loop In [17]: %timeit numexpr.evaluate('sum(bpacked)') 100 loops, best of 3: 2.08 ms per loop Again, the unaligned case is slightly better. In this case numexpr is a bit slower than NumPy because sum() is not parallelized internally. Hmm, given that, I'm wondering whether some internal copies to L1 in NumPy could help improve unaligned performance. Worth a try? -- Francesc Alted From francesc at continuum.io Thu Mar 7 13:06:03 2013 From: francesc at continuum.io (Francesc Alted) Date: Thu, 07 Mar 2013 19:06:03 +0100 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: <5138D2A0.3080802@continuum.io> References: <5138D2A0.3080802@continuum.io> Message-ID: <5138D70B.2030606@continuum.io> On 3/7/13 6:47 PM, Francesc Alted wrote: > On 3/6/13 7:42 PM, Kurt Smith wrote: >> And regarding performance, doing simple timings shows a 30%-ish >> slowdown for unaligned operations: >> >> In [36]: %timeit packed_arr['b']**2 >> 100 loops, best of 3: 2.48 ms per loop >> >> In [37]: %timeit aligned_arr['b']**2 >> 1000 loops, best of 3: 1.9 ms per loop > > Hmm, that clearly depends on the architecture. 
On my machine: > > In [1]: import numpy as np > > In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) > > In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) > > In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) > > In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) > > In [6]: baligned = aligned_arr['b'] > > In [7]: bpacked = packed_arr['b'] > > In [8]: %timeit baligned**2 > 1000 loops, best of 3: 1.96 ms per loop > > In [9]: %timeit bpacked**2 > 100 loops, best of 3: 7.84 ms per loop > > That is, the unaligned column is 4x slower (!). numexpr allows > somewhat better results: > > In [11]: %timeit numexpr.evaluate('baligned**2') > 1000 loops, best of 3: 1.13 ms per loop > > In [12]: %timeit numexpr.evaluate('bpacked**2') > 1000 loops, best of 3: 865 us per loop Just for completeness, here it is what Theano gets: In [18]: import theano In [20]: a = theano.tensor.vector() In [22]: f = theano.function([a], a**2) In [23]: %timeit f(baligned) 100 loops, best of 3: 7.74 ms per loop In [24]: %timeit f(bpacked) 100 loops, best of 3: 12.6 ms per loop So yeah, Theano is also slower for the unaligned case (but less than 2x in this case). > > Yes, in this case, the unaligned array goes faster (as much as 30%). > I think the reason is that numexpr optimizes the unaligned access by > doing a copy of the different chunks in internal buffers that fits in > L1 cache. Apparently this is very beneficial in this case (not sure > why, though). > >> >> Whereas summing shows just a 10%-ish slowdown: >> >> In [38]: %timeit packed_arr['b'].sum() >> 1000 loops, best of 3: 1.29 ms per loop >> >> In [39]: %timeit aligned_arr['b'].sum() >> 1000 loops, best of 3: 1.14 ms per loop > > On my machine: > > In [14]: %timeit baligned.sum() > 1000 loops, best of 3: 1.03 ms per loop > > In [15]: %timeit bpacked.sum() > 100 loops, best of 3: 3.79 ms per loop > > Again, the 4x slowdown is here. 
Using numexpr: > > In [16]: %timeit numexpr.evaluate('sum(baligned)') > 100 loops, best of 3: 2.16 ms per loop > > In [17]: %timeit numexpr.evaluate('sum(bpacked)') > 100 loops, best of 3: 2.08 ms per loop And with Theano: In [26]: f2 = theano.function([a], a.sum()) In [27]: %timeit f2(baligned) 100 loops, best of 3: 2.52 ms per loop In [28]: %timeit f2(bpacked) 100 loops, best of 3: 7.43 ms per loop Again, the unaligned case is significantly slower (as much as 3x here!). -- Francesc Alted From nouiz at nouiz.org Thu Mar 7 13:26:27 2013 From: nouiz at nouiz.org (Frédéric Bastien) Date: Thu, 7 Mar 2013 13:26:27 -0500 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: <5138D70B.2030606@continuum.io> References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> Message-ID: Hi, It is normal that unaligned accesses are slower: the hardware has been optimized for aligned access, so this is a user's choice of space vs. speed. We can't get around that. We can only minimize the cost of unaligned access in some cases, but not all, and those optimizations depend on the CPU. Newer CPUs have, however, lowered the cost of unaligned access. I'm surprised that Theano worked with the unaligned input. I added some checks to make this raise an error, as we do not support that! Francesc, can you check if Theano gives the right result? It is possible that someone (maybe me) just copies the input to an aligned ndarray when we receive a non-aligned one. That could explain why it worked, but my memory tells me that we raise an error. As you saw in the numbers, this is a bad example for Theano, as the compiled function is too fast: there is more Theano overhead than computation time in that example. We have recently reduced the overhead, but we can do more to lower it. 
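Fred's check for unaligned input can be reproduced without Theano: NumPy exposes alignment via `ndarray.flags`, and a fresh copy is always aligned, so a library can either raise or silently repair its input:

```python
import numpy as np

packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
arr = np.ones(1000, dtype=packed_dt)
b = arr['b']                  # int64 view at byte offset 1, stride 9
print(b.strides)              # (9,)
print(b.flags['ALIGNED'])     # typically False where int64 wants 8-byte alignment

# A library that does not support unaligned input can raise here, or
# silently repair the input with a copy into a fresh (aligned) buffer:
b_fixed = np.ascontiguousarray(b) if not b.flags['ALIGNED'] else b
print(b_fixed.flags['ALIGNED'])  # True
```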
Fred On Thu, Mar 7, 2013 at 1:06 PM, Francesc Alted wrote: > On 3/7/13 6:47 PM, Francesc Alted wrote: >> On 3/6/13 7:42 PM, Kurt Smith wrote: >>> And regarding performance, doing simple timings shows a 30%-ish >>> slowdown for unaligned operations: >>> >>> In [36]: %timeit packed_arr['b']**2 >>> 100 loops, best of 3: 2.48 ms per loop >>> >>> In [37]: %timeit aligned_arr['b']**2 >>> 1000 loops, best of 3: 1.9 ms per loop >> >> Hmm, that clearly depends on the architecture. On my machine: >> >> In [1]: import numpy as np >> >> In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) >> >> In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) >> >> In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) >> >> In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) >> >> In [6]: baligned = aligned_arr['b'] >> >> In [7]: bpacked = packed_arr['b'] >> >> In [8]: %timeit baligned**2 >> 1000 loops, best of 3: 1.96 ms per loop >> >> In [9]: %timeit bpacked**2 >> 100 loops, best of 3: 7.84 ms per loop >> >> That is, the unaligned column is 4x slower (!). numexpr allows >> somewhat better results: >> >> In [11]: %timeit numexpr.evaluate('baligned**2') >> 1000 loops, best of 3: 1.13 ms per loop >> >> In [12]: %timeit numexpr.evaluate('bpacked**2') >> 1000 loops, best of 3: 865 us per loop > > Just for completeness, here it is what Theano gets: > > In [18]: import theano > > In [20]: a = theano.tensor.vector() > > In [22]: f = theano.function([a], a**2) > > In [23]: %timeit f(baligned) > 100 loops, best of 3: 7.74 ms per loop > > In [24]: %timeit f(bpacked) > 100 loops, best of 3: 12.6 ms per loop > > So yeah, Theano is also slower for the unaligned case (but less than 2x > in this case). > >> >> Yes, in this case, the unaligned array goes faster (as much as 30%). >> I think the reason is that numexpr optimizes the unaligned access by >> doing a copy of the different chunks in internal buffers that fits in >> L1 cache. 
Apparently this is very beneficial in this case (not sure >> why, though). >> >>> >>> Whereas summing shows just a 10%-ish slowdown: >>> >>> In [38]: %timeit packed_arr['b'].sum() >>> 1000 loops, best of 3: 1.29 ms per loop >>> >>> In [39]: %timeit aligned_arr['b'].sum() >>> 1000 loops, best of 3: 1.14 ms per loop >> >> On my machine: >> >> In [14]: %timeit baligned.sum() >> 1000 loops, best of 3: 1.03 ms per loop >> >> In [15]: %timeit bpacked.sum() >> 100 loops, best of 3: 3.79 ms per loop >> >> Again, the 4x slowdown is here. Using numexpr: >> >> In [16]: %timeit numexpr.evaluate('sum(baligned)') >> 100 loops, best of 3: 2.16 ms per loop >> >> In [17]: %timeit numexpr.evaluate('sum(bpacked)') >> 100 loops, best of 3: 2.08 ms per loop > > And with Theano: > > In [26]: f2 = theano.function([a], a.sum()) > > In [27]: %timeit f2(baligned) > 100 loops, best of 3: 2.52 ms per loop > > In [28]: %timeit f2(bpacked) > 100 loops, best of 3: 7.43 ms per loop > > Again, the unaligned case is significantly slower (as much as 3x here!). 
> > -- > Francesc Alted From ben.root at ou.edu Thu Mar 7 14:14:28 2013 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 7 Mar 2013 14:14:28 -0500 Subject: [Numpy-discussion] feature tracking in numpy/scipy In-Reply-To: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> Message-ID: On Sat, Mar 2, 2013 at 5:32 PM, Scott Collis wrote: > Good afternoon list, > I am looking at feature tracking in a 2D numpy array, along the lines of > Dixon and Wiener 1993 (for tracking precipitating storms) > > Identifying features based on a threshold is quite trivial using > ndimage.label > > b_fld=np.zeros(mygrid.fields['rain_rate_A']['data'].shape) > rr=10 > b_fld[mygrid.fields['rain_rate_A']['data'] > rr]=1.0 > labels, numobjects = ndimage.label(b_fld[0,0,:,:]) > (note mygrid.fields['rain_rate_A']['data'] has dimensions time, height, y, x) > > Using the matplotlib contouring and fetching the vertices I can get a nice > list of polygons of rain rate above a certain threshold... Now from here I > can just go and implement the Dixon and Wiener methodology, but I thought I > would check here first to see if anyone knows of an object/feature tracking > algorithm in numpy/scipy or using numpy arrays (it just seems like > something people would want to do!), i.e. something that looks back and > forward in time and identifies polygon movement and identifies objects with > temporal persistence. > > Cheers! > Scott > > Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification, > Tracking, Analysis, and Nowcasting--A Radar-based Methodology. *Journal of > Atmospheric and Oceanic Technology*, *10*, 785-797, > doi:10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2. 
> > http://journals.ametsoc.org/doi/abs/10.1175/1520-0426%281993%29010%3C0785%3ATTITAA%3E2.0.CO%3B2 > > > Say hello to my PhD project: https://github.com/WeatherGod/ZigZag In it, I have the centroid-tracking portion of the TITAN code, along with SCIT, and hooks into MHT. Several of the dependencies are also available in my repositories. Cheers! Ben P.S. - I have personally met Dr. Dixon on multiple occasions and he is a great guy to work with. Feel free to email him or myself with questions about TITAN. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagamayank at gmail.com Thu Mar 7 14:36:14 2013 From: dagamayank at gmail.com (Mayank Daga) Date: Thu, 7 Mar 2013 13:36:14 -0600 Subject: [Numpy-discussion] Definition of dot function Message-ID: Hi, Can someone point me to the definition of dot() in the numpy source? The only instance of 'def dot()' I found was in numpy/ma/extras.py but that does not seem to be the correct one. ~mayank -- Mayank Daga "Nothing Succeeds Like Success" -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Thu Mar 7 15:26:47 2013 From: heng at cantab.net (Henry Gomersall) Date: Thu, 07 Mar 2013 20:26:47 +0000 Subject: [Numpy-discussion] Definition of dot function In-Reply-To: References: Message-ID: <1362688007.3893.6.camel@farnsworth> On Thu, 2013-03-07 at 13:36 -0600, Mayank Daga wrote: > Can someone point me to the definition of dot() in the numpy source? > The only instance of 'def dot()' I found was in numpy/ma/extras.py but > that does not seem to be the correct one. It seems to be in a dynamic library. In [9]: numpy.dot.__module__ Out[9]: 'numpy.core.multiarray' In [10]: numpy.core.multiarray.__file__ Out[10]: '/usr/local/lib/python2.7/dist-packages/numpy/core/multiarray.so' so... in here perhaps? 
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multiarraymodule.c hen From njs at pobox.com Thu Mar 7 17:21:43 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 7 Mar 2013 22:21:43 +0000 Subject: [Numpy-discussion] Definition of dot function In-Reply-To: <1362688007.3893.6.camel@farnsworth> References: <1362688007.3893.6.camel@farnsworth> Message-ID: On 7 Mar 2013 20:27, "Henry Gomersall" wrote: > > On Thu, 2013-03-07 at 13:36 -0600, Mayank Daga wrote: > > Can someone point me to the definition of dot() in the numpy source? > > The only instance of 'def dot()' I found was in numpy/ma/extras.py but > > that does not seem to be the correct one. > > It seems to be in a dynamic library. > > In [9]: numpy.dot.__module__ > Out[9]: 'numpy.core.multiarray' > > In [10]: numpy.core.multiarray.__file__ > Out[10]: > '/usr/local/lib/python2.7/dist-packages/numpy/core/multiarray.so' > > so... in here perhaps? > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multiarraymodule.c The actual entry point is array_matrixproduct in that file, which then calls PyArray_MatrixProduct2, which either does the work or dispatches through a dtype-specific function pointer ('dotfunc'). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwmsmith at gmail.com Thu Mar 7 22:14:19 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Thu, 7 Mar 2013 21:14:19 -0600 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013) In-Reply-To: <5138D2A0.3080802@continuum.io> References: <5138D2A0.3080802@continuum.io> Message-ID: On Thu, Mar 7, 2013 at 11:47 AM, Francesc Alted wrote: > On 3/6/13 7:42 PM, Kurt Smith wrote: > > Hmm, that clearly depends on the architecture. On my machine: > ... > That is, the unaligned column is 4x slower (!). numexpr allows somewhat > better results: > ... > Yes, in this case, the unaligned array goes faster (as much as 30%). 
I > think the reason is that numexpr optimizes the unaligned access by doing > a copy of the different chunks in internal buffers that fits in L1 > cache. Apparently this is very beneficial in this case (not sure why, > though). > > On my machine: > ... > Again, the 4x slowdown is here. Using numexpr: > ... > Again, the unaligned case is (sligthly better). In this case numexpr is > a bit slower that NumPy because sum() is not parallelized internally. > Hmm, provided that, I'm wondering if some internal copies to L1 in NumPy > could help improving unaligned performance. Worth a try? > Very interesting -- thanks for sharing. > -- > Francesc Alted From kwmsmith at gmail.com Thu Mar 7 22:28:22 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Thu, 7 Mar 2013 21:28:22 -0600 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> Message-ID: On Thu, Mar 7, 2013 at 12:26 PM, Fr?d?ric Bastien wrote: > Hi, > > It is normal that unaligned access are slower. The hardware have been > optimized for aligned access. So this is a user choice space vs speed. The quantitative difference is still important, so this thread is useful for future reference, I think. If reading in data into a packed array is 3x faster than reading into an aligned array, but the core computation is 4x slower with a packed array...you get the idea. I would have benefitted years ago knowing (1) numpy structured dtypes are packed by default, and (2) computations with unaligned data can be several factors slower than aligned. That's strong motivation to always make sure I'm using 'aligned=True' except when memory usage is an issue, or for file IO with packed binary data, etc. > We can't go around that. We can only minimize the cost of unaligned > access in some cases, but not all and those optimization depend of the > CPU. But newer CPU have lowered in cost of unaligned access. 
> > I'm surprised that Theano worked with the unaligned input. I added > some check to make this raise an error, as we do not support that! > Francesc, can you check if Theano give the good result? It is possible > that someone (maybe me), just copy the input to an aligned ndarray > when we receive an not aligned one. That could explain why it worked, > but my memory tell me that we raise an error. > > As you saw in the number, this is a bad example for Theano as the > function compiled is too fast . Their is more Theano overhead then > computation time in that example. We have reduced recently the > overhead, but we can do more to lower it. > > Fred > > On Thu, Mar 7, 2013 at 1:06 PM, Francesc Alted wrote: >> On 3/7/13 6:47 PM, Francesc Alted wrote: >>> On 3/6/13 7:42 PM, Kurt Smith wrote: >>>> And regarding performance, doing simple timings shows a 30%-ish >>>> slowdown for unaligned operations: >>>> >>>> In [36]: %timeit packed_arr['b']**2 >>>> 100 loops, best of 3: 2.48 ms per loop >>>> >>>> In [37]: %timeit aligned_arr['b']**2 >>>> 1000 loops, best of 3: 1.9 ms per loop >>> >>> Hmm, that clearly depends on the architecture. On my machine: >>> >>> In [1]: import numpy as np >>> >>> In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) >>> >>> In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) >>> >>> In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) >>> >>> In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) >>> >>> In [6]: baligned = aligned_arr['b'] >>> >>> In [7]: bpacked = packed_arr['b'] >>> >>> In [8]: %timeit baligned**2 >>> 1000 loops, best of 3: 1.96 ms per loop >>> >>> In [9]: %timeit bpacked**2 >>> 100 loops, best of 3: 7.84 ms per loop >>> >>> That is, the unaligned column is 4x slower (!). 
numexpr allows >>> somewhat better results: >>> >>> In [11]: %timeit numexpr.evaluate('baligned**2') >>> 1000 loops, best of 3: 1.13 ms per loop >>> >>> In [12]: %timeit numexpr.evaluate('bpacked**2') >>> 1000 loops, best of 3: 865 us per loop >> >> Just for completeness, here it is what Theano gets: >> >> In [18]: import theano >> >> In [20]: a = theano.tensor.vector() >> >> In [22]: f = theano.function([a], a**2) >> >> In [23]: %timeit f(baligned) >> 100 loops, best of 3: 7.74 ms per loop >> >> In [24]: %timeit f(bpacked) >> 100 loops, best of 3: 12.6 ms per loop >> >> So yeah, Theano is also slower for the unaligned case (but less than 2x >> in this case). >> >>> >>> Yes, in this case, the unaligned array goes faster (as much as 30%). >>> I think the reason is that numexpr optimizes the unaligned access by >>> doing a copy of the different chunks in internal buffers that fits in >>> L1 cache. Apparently this is very beneficial in this case (not sure >>> why, though). >>> >>>> >>>> Whereas summing shows just a 10%-ish slowdown: >>>> >>>> In [38]: %timeit packed_arr['b'].sum() >>>> 1000 loops, best of 3: 1.29 ms per loop >>>> >>>> In [39]: %timeit aligned_arr['b'].sum() >>>> 1000 loops, best of 3: 1.14 ms per loop >>> >>> On my machine: >>> >>> In [14]: %timeit baligned.sum() >>> 1000 loops, best of 3: 1.03 ms per loop >>> >>> In [15]: %timeit bpacked.sum() >>> 100 loops, best of 3: 3.79 ms per loop >>> >>> Again, the 4x slowdown is here. Using numexpr: >>> >>> In [16]: %timeit numexpr.evaluate('sum(baligned)') >>> 100 loops, best of 3: 2.16 ms per loop >>> >>> In [17]: %timeit numexpr.evaluate('sum(bpacked)') >>> 100 loops, best of 3: 2.08 ms per loop >> >> And with Theano: >> >> In [26]: f2 = theano.function([a], a.sum()) >> >> In [27]: %timeit f2(baligned) >> 100 loops, best of 3: 2.52 ms per loop >> >> In [28]: %timeit f2(bpacked) >> 100 loops, best of 3: 7.43 ms per loop >> >> Again, the unaligned case is significantly slower (as much as 3x here!). 
>> >> -- >> Francesc Alted From francesc at continuum.io Fri Mar 8 05:22:20 2013 From: francesc at continuum.io (Francesc Alted) Date: Fri, 08 Mar 2013 11:22:20 +0100 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> Message-ID: <5139BBDC.6090202@continuum.io> On 3/7/13 7:26 PM, Frédéric Bastien wrote: > Hi, > > It is normal that unaligned access are slower. The hardware have been > optimized for aligned access. So this is a user choice space vs speed. > We can't go around that. Well, my benchmarks apparently say that numexpr can get better performance when tackling computations on unaligned arrays (30% faster). This puzzled me a bit yesterday, but after thinking a bit about what was happening, the explanation is clear to me now. The aligned and unaligned arrays were not contiguous, as they had a gap between elements (a consequence of the layout of structured arrays): 8 bytes for the aligned case and 1 byte for the packed one. The hardware of modern machines fetches a complete cache line (64 bytes typically) whenever an element is accessed, and that means that, even though we are only making use of one field in the computations, both fields are brought into cache. That means that, for the aligned object, 16 MB (16 bytes * 1 million elements) are transmitted to the cache, while the unaligned object only has to transmit 9 MB (9 bytes * 1 million). Of course, transmitting 16 MB is considerably more work than just 9 MB. 
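The 16-byte vs. 9-byte element layouts are easy to verify, and the per-field timings can be reproduced with a script along these lines (the absolute numbers will of course vary by machine):

```python
import numpy as np
import timeit

aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)
packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
print(aligned_dt.itemsize, packed_dt.itemsize)   # 16 9

aligned_arr = np.ones(10**6, dtype=aligned_dt)
packed_arr = np.ones(10**6, dtype=packed_dt)
for name, arr in [("aligned", aligned_arr), ("packed", packed_arr)]:
    # Time squaring the 'b' field; only the layout differs between the two.
    t = timeit.timeit(lambda arr=arr: arr['b'] ** 2, number=20)
    print(f"{name}: {1000 * t / 20:.2f} ms per loop")
```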
Now, the elements land in cache aligned for the aligned case and unaligned for the packed case, and as you say, unaligned access in cache is pretty slow for the CPU, and this is the reason why NumPy can take up to 4x more time to perform the computation. So why is numexpr performing much better for the packed case? Well, it turns out that numexpr has machinery to detect that an array is unaligned, and does an internal copy for every block that is brought to the cache to be computed. This block size is between 1024 elements (8 KB for double precision) and 4096 elements when linked with VML support, and that means that a copy normally happens at L1 or L2 cache speed, which is much faster than a memory-to-memory copy. After the copy numexpr can perform operations with aligned data at full CPU speed. The paradox is that, by doing more copies, you may end up performing faster computations. This is the joy of programming with the memory hierarchy in mind. This is to say that there is more in the equation than just whether an array is aligned or not. You must take into account how (and how much!) data travels from storage to CPU before making assumptions on the performance of your programs. > We can only minimize the cost of unaligned > access in some cases, but not all and those optimization depend of the > CPU. But newer CPU have lowered in cost of unaligned access. > > I'm surprised that Theano worked with the unaligned input. I added > some check to make this raise an error, as we do not support that! > Francesc, can you check if Theano give the good result? It is possible > that someone (maybe me), just copy the input to an aligned ndarray > when we receive an not aligned one. That could explain why it worked, > but my memory tell me that we raise an error. 
It seems to work for me: In [10]: f = theano.function([a], a**2) In [11]: f(baligned) Out[11]: array([ 1., 1., 1., ..., 1., 1., 1.]) In [12]: f(bpacked) Out[12]: array([ 1., 1., 1., ..., 1., 1., 1.]) In [13]: f2 = theano.function([a], a.sum()) In [14]: f2(baligned) Out[14]: array(1000000.0) In [15]: f2(bpacked) Out[15]: array(1000000.0) > > As you saw in the number, this is a bad example for Theano as the > function compiled is too fast . Their is more Theano overhead then > computation time in that example. We have reduced recently the > overhead, but we can do more to lower it. Yeah. I was mainly curious about how different packages handle unaligned arrays. -- Francesc Alted From ondrej.certik at gmail.com Fri Mar 8 06:07:02 2013 From: ondrej.certik at gmail.com (Ondřej Čertík) Date: Fri, 8 Mar 2013 12:07:02 +0100 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 9:52 PM, Ralf Gommers wrote: > > > > On Wed, Mar 6, 2013 at 9:06 PM, Nathaniel Smith wrote: >> >> On Wed, Mar 6, 2013 at 6:43 PM, Charles R Harris >> wrote: >> > Hi All, >> > >> > There are now some 14 non-merge commits in the 1.7.x branch including >> > the >> > critical diagonal leak fix. I think there is maybe one more critical >> > backport and perhaps several low priority fixes, documentation and such, >> > but >> > I think we should start up the release process with a goal of getting >> > 1.7.1 >> > out by the middle of April. 
Other than that, +1 > for a quick release. > > >> >> > The development branch has been accumulating stuff since last summer, I >> > suggest we look to get it out in May, branching at the end of this >> > month. >> >> I would say "let's fix the blockers and then branch as soon as Ondrej >> has time to do it", but in practice I suspect this comes out the same >> as what you just said :-). I just pruned the list of blockers; here's >> what we've got: >> https://github.com/numpy/numpy/issues?milestone=1&page=1&state=open > > > It looks like we're not doing so well with setting Milestones correctly. > Only 4 closed issues for 1.8.... > > Release quickly after 1.7.1 sounds good. I hope to finish the rest of issues for 1.7.1 today or tomorrow. Should I release 1.7.1rc1 first? I think that makes sense, just to be sure, right? Ondrej From mdroe at stsci.edu Fri Mar 8 09:33:24 2013 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 8 Mar 2013 09:33:24 -0500 Subject: [Numpy-discussion] SciPy John Hunter Excellence in Plotting Contest In-Reply-To: <513928AF.7010201@stsci.edu> References: <513928AF.7010201@stsci.edu> Message-ID: <5139F6B4.2010800@stsci.edu> Apologies for any accidental cross-posting. Email not displaying correctly? View it in your browser. Scientific Computing with Python-Austin, Texas-June 24-29, 2013 SciPy John Hunter Excellence in Plotting Contest In memory of John Hunter, we are pleased to announce the first SciPy John Hunter Excellence in Plotting Competition. This open competition aims to highlight the importance of quality plotting to scientific progress and showcase the capabilities of the current generation of plotting software. Participants are invited to submit scientific plots to be judged by a panel. The winning entries will be announced and displayed at the conference. 
NumFOCUS is graciously sponsoring cash prizes for the winners in the following amounts: * 1st prize: $500 * 2nd prize: $200 * 3rd prize: $100 Instructions * Entries must be submitted by April 3 via e-mail. * Plots may be produced with any combination of Python-based tools (it is not required that they use matplotlib, for example). * Source code for the plot must be provided, along with a rendering of the plot in a vector format (PDF, PS, etc.). If the data cannot be shared for reasons of size or licensing, "fake" data may be substituted, along with an image of the plot using real data. * Entries will be judged on their clarity, innovation and aesthetics, but most importantly for their effectiveness in illuminating real scientific work. Entrants are encouraged to submit plots that were used during the course of research, rather than merely being hypothetical. * SciPy reserves the right to display the entry at the conference and use it in any materials or on its website, with attribution to the original author(s). Important dates: * April 3rd: Plotting submissions due * Monday-Tuesday, June 24 - 25: SciPy 2013 Tutorials, Austin TX * Wednesday-Thursday, June 26 - 27: SciPy 2013 Conference, Austin TX * Winners will be announced during the conference days * Friday-Saturday, June 28 - 29: SciPy 2013 Sprints, Austin TX & remote We look forward to exciting submissions that push the boundaries of plotting, in this, our first attempt at this kind of competition. The SciPy Plotting Contest Organizer -Michael Droettboom, Space Telescope Science Institute You are receiving this email because you subscribed to the mailing list or registered for the SciPy 2010 or SciPy 2011 conference in Austin, TX. *Our mailing address is:* Enthought, Inc. 515 Congress Ave. Austin, TX 78701 /Copyright (C) 2013 Enthought, Inc.
All rights reserved./ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Fri Mar 8 10:16:43 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 8 Mar 2013 10:16:43 -0500 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: <5139BBDC.6090202@continuum.io> References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> <5139BBDC.6090202@continuum.io> Message-ID: On Fri, Mar 8, 2013 at 5:22 AM, Francesc Alted wrote: > On 3/7/13 7:26 PM, Frédéric Bastien wrote: >> I'm surprised that Theano worked with the unaligned input. I added >> some checks to make this raise an error, as we do not support that! >> Francesc, can you check if Theano gives the right result? It is possible >> that someone (maybe me) just copies the input to an aligned ndarray >> when we receive a non-aligned one. That could explain why it worked, >> but my memory tells me that we raise an error. > > It seems to work for me: > > In [10]: f = theano.function([a], a**2) > > In [11]: f(baligned) > Out[11]: array([ 1., 1., 1., ..., 1., 1., 1.]) > > In [12]: f(bpacked) > Out[12]: array([ 1., 1., 1., ..., 1., 1., 1.]) > > In [13]: f2 = theano.function([a], a.sum()) > > In [14]: f2(baligned) > Out[14]: array(1000000.0) > > In [15]: f2(bpacked) > Out[15]: array(1000000.0) I understand what happens. You declare the symbolic variable like this: a = theano.tensor.vector() This creates a symbolic variable with dtype floatX, which is float64 by default. baligned and bpacked are of dtype int64. When a Theano function receives as input an ndarray of the wrong dtype, we try to cast it to the expected dtype and check that we don't lose precision. As the inputs are all 1s, there is no loss of precision, so the input is silently accepted and copied. So when we later check the aligned flag, it passes.
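This cast-and-copy behaviour can be reproduced with plain NumPy alone. A minimal sketch (not Theano's actual code path; the variable names are made up):

```python
import numpy as np

# A packed struct: the 8-byte 'b' field starts at offset 1, so the
# column view is unaligned on most platforms.
packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
arr = np.zeros(10, dtype=packed_dt)
arr['b'] = 1
bpacked = arr['b']
print(bpacked.flags.aligned)        # False

# Casting int64 -> float64 (what the dtype mismatch triggers) allocates
# a fresh, well-behaved array, so a later alignment check passes.
as_floatx = bpacked.astype(np.float64)
print(as_floatx.flags.aligned)      # True
```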
If you change the symbolic variable to have a dtype of int64, there won't be a copy and we will see the error: a = theano.tensor.lvector() f = theano.function([a], a ** 2) f(bpacked) TypeError: ('Bad input argument to theano function at index 0(0-based)', 'The numpy.ndarray object is not aligned. Theano C code does not support that.', '', 'object shape', (1000000,), 'object strides', (9,)) If I now time this new function, I get: In [14]: timeit baligned**2 100 loops, best of 3: 7.5 ms per loop In [15]: timeit bpacked**2 100 loops, best of 3: 8.25 ms per loop In [16]: timeit f(baligned) 100 loops, best of 3: 7.36 ms per loop So the Theano overhead was the copy in this case. It is not the first time I have seen this. We added the automatic cast to allow specifying most Python ints/lists/reals as input. Fred From nouiz at nouiz.org Fri Mar 8 10:18:10 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 8 Mar 2013 10:18:10 -0500 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> Message-ID: I agree that documenting this better would be useful to many people. So if someone wants to summarize this and put it in the docs, I think many people will appreciate it. Fred On Thu, Mar 7, 2013 at 10:28 PM, Kurt Smith wrote: > On Thu, Mar 7, 2013 at 12:26 PM, Frédéric Bastien wrote: >> Hi, >> >> It is normal that unaligned accesses are slower. The hardware has been >> optimized for aligned access. So this is a user choice: space vs. speed. > > The quantitative difference is still important, so this thread is > useful for future reference, I think. If reading data into a > packed array is 3x faster than reading into an aligned array, but the > core computation is 4x slower with a packed array...you get the idea.
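For reference, the layout difference behind this space-vs-speed trade-off can be inspected directly. A small sketch using the standard dtype API (the variable names are made up):

```python
import numpy as np

aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)
packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)  # the default

# align=True pads after 'a' so that 'b' starts on an 8-byte boundary.
print(packed_dt.itemsize, packed_dt.fields['b'][1])    # 9 1
print(aligned_dt.itemsize, aligned_dt.fields['b'][1])  # 16 8

# The padding is what buys the fast path for computations on 'b'...
print(np.zeros(10, dtype=aligned_dt)['b'].flags.aligned)  # True
print(np.zeros(10, dtype=packed_dt)['b'].flags.aligned)   # False
# ...at the price of 7 extra bytes per record when reading/writing.
```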
> > I would have benefited years ago from knowing (1) numpy structured dtypes > are packed by default, and (2) computations with unaligned data can be > several factors slower than with aligned data. That's strong motivation to > always make sure I'm using 'aligned=True' except when memory usage is > an issue, or for file IO with packed binary data, etc. > >> We can't get around that. We can only minimize the cost of unaligned >> access in some cases, but not all, and those optimizations depend on the >> CPU. But newer CPUs have lowered the cost of unaligned access. >> >> I'm surprised that Theano worked with the unaligned input. I added >> some checks to make this raise an error, as we do not support that! >> Francesc, can you check if Theano gives the right result? It is possible >> that someone (maybe me) just copies the input to an aligned ndarray >> when we receive a non-aligned one. That could explain why it worked, >> but my memory tells me that we raise an error. >> >> As you saw in the numbers, this is a bad example for Theano, as the >> compiled function is too fast. There is more Theano overhead than >> computation time in that example. We have recently reduced the >> overhead, but we can do more to lower it. >> >> Fred >> >> On Thu, Mar 7, 2013 at 1:06 PM, Francesc Alted wrote: >>> On 3/7/13 6:47 PM, Francesc Alted wrote: >>>> On 3/6/13 7:42 PM, Kurt Smith wrote: >>>>> And regarding performance, doing simple timings shows a 30%-ish >>>>> slowdown for unaligned operations: >>>>> >>>>> In [36]: %timeit packed_arr['b']**2 >>>>> 100 loops, best of 3: 2.48 ms per loop >>>>> >>>>> In [37]: %timeit aligned_arr['b']**2 >>>>> 1000 loops, best of 3: 1.9 ms per loop >>>> >>>> Hmm, that clearly depends on the architecture.
On my machine: >>>> >>>> In [1]: import numpy as np >>>> >>>> In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) >>>> >>>> In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) >>>> >>>> In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) >>>> >>>> In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) >>>> >>>> In [6]: baligned = aligned_arr['b'] >>>> >>>> In [7]: bpacked = packed_arr['b'] >>>> >>>> In [8]: %timeit baligned**2 >>>> 1000 loops, best of 3: 1.96 ms per loop >>>> >>>> In [9]: %timeit bpacked**2 >>>> 100 loops, best of 3: 7.84 ms per loop >>>> >>>> That is, the unaligned column is 4x slower (!). numexpr allows >>>> somewhat better results: >>>> >>>> In [11]: %timeit numexpr.evaluate('baligned**2') >>>> 1000 loops, best of 3: 1.13 ms per loop >>>> >>>> In [12]: %timeit numexpr.evaluate('bpacked**2') >>>> 1000 loops, best of 3: 865 us per loop >>> >>> Just for completeness, here is what Theano gets: >>> >>> In [18]: import theano >>> >>> In [20]: a = theano.tensor.vector() >>> >>> In [22]: f = theano.function([a], a**2) >>> >>> In [23]: %timeit f(baligned) >>> 100 loops, best of 3: 7.74 ms per loop >>> >>> In [24]: %timeit f(bpacked) >>> 100 loops, best of 3: 12.6 ms per loop >>> >>> So yeah, Theano is also slower for the unaligned case (but less than 2x >>> in this case). >>> >>>> >>>> Yes, in this case, the unaligned array goes faster (as much as 30%). >>>> I think the reason is that numexpr optimizes the unaligned access by >>>> doing a copy of the different chunks into internal buffers that fit in >>>> the L1 cache. Apparently this is very beneficial in this case (not sure >>>> why, though).
>>>> >>>>> >>>>> Whereas summing shows just a 10%-ish slowdown: >>>>> >>>>> In [38]: %timeit packed_arr['b'].sum() >>>>> 1000 loops, best of 3: 1.29 ms per loop >>>>> >>>>> In [39]: %timeit aligned_arr['b'].sum() >>>>> 1000 loops, best of 3: 1.14 ms per loop >>>> >>>> On my machine: >>>> >>>> In [14]: %timeit baligned.sum() >>>> 1000 loops, best of 3: 1.03 ms per loop >>>> >>>> In [15]: %timeit bpacked.sum() >>>> 100 loops, best of 3: 3.79 ms per loop >>>> >>>> Again, the 4x slowdown is here. Using numexpr: >>>> >>>> In [16]: %timeit numexpr.evaluate('sum(baligned)') >>>> 100 loops, best of 3: 2.16 ms per loop >>>> >>>> In [17]: %timeit numexpr.evaluate('sum(bpacked)') >>>> 100 loops, best of 3: 2.08 ms per loop >>> >>> And with Theano: >>> >>> In [26]: f2 = theano.function([a], a.sum()) >>> >>> In [27]: %timeit f2(baligned) >>> 100 loops, best of 3: 2.52 ms per loop >>> >>> In [28]: %timeit f2(bpacked) >>> 100 loops, best of 3: 7.43 ms per loop >>> >>> Again, the unaligned case is significantly slower (as much as 3x here!). >>> >>> -- >>> Francesc Alted >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mdroe at stsci.edu Thu Mar 7 18:54:23 2013 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 7 Mar 2013 18:54:23 -0500 Subject: [Numpy-discussion] SciPy John Hunter Excellence in Plotting Contest In-Reply-To: References: Message-ID: <513928AF.7010201@stsci.edu> Apologies for any accidental cross-posting. Email not displaying correctly? View it in your browser. 
-- Michael Droettboom http://www.droettboom.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergio.callegari at gmail.com Fri Mar 8 11:23:33 2013 From: sergio.callegari at gmail.com (Sergio Callegari) Date: Fri, 8 Mar 2013 16:23:33 +0000 (UTC) Subject: [Numpy-discussion] Casting and promotion rules (e.g. int + uint64 => float) Message-ID: Hi, I have noticed that numpy introduces some unexpected type casts that are in some cases problematic. A very weird cast is int + uint64 -> float. For instance, consider the following snippet: import numpy as np a=np.uint64(1) a+1 -> 2.0 This cast is quite different from what other programming languages (e.g., C) would do in this case, so it is already unexpected. Furthermore, an int64 (or a uint64) is too large to fit into a float without loss of precision, hence this automatic conversion also results in data loss! For instance, consider: a=np.uint64(18446744073709551614) a+np.uint64(1) -> 18446744073709551615 # CORRECT!
a+1 -> 1.8446744073709552e+19 # Actually 1.84467440737095516160e+19 - LOSS OF DATA in fact np.uint64(a+1) -> 0 Weird, isn't it? Another issue is that variables unexpectedly change type with accumulation operators a=np.uint64(1) a+=1 now a is float I believe that some casting/promotion rules should be revised, since they now lead to difficult to catch, intermittent errors. In case this cannot be done immediately, I suggest at least documenting these promotions, providing examples on how to code many conventional tasks. E.g., incrementing an integer of unknown size b=a+type(a)(1) I have also reported this in https://github.com/numpy/numpy/issues/3118 Thanks! From pelson.pub at gmail.com Fri Mar 8 12:38:23 2013 From: pelson.pub at gmail.com (Phil Elson) Date: Fri, 8 Mar 2013 17:38:23 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: Interesting. I hadn't thought of those. I've implemented (very roughly without a sound logic check) and benchmarked: def my_any(a, predicate, chunk_size=2048): try: next(find(a, predicate, chunk_size)) return True except StopIteration: return False def my_all(a, predicate, chunk_size=2048): return not my_any(a, lambda a: ~predicate(a), chunk_size) With the following setup: import numpy as np import numpy.random np.random.seed(1) a = np.random.randn(1e8) For a low frequency *any*: In [12]: %timeit (np.abs(a) > 6).any() 1 loops, best of 3: 1.29 s per loop In [13]: %timeit my_any(a, lambda a: np.abs(a) > 6) 1 loops, best of 3: 792 ms per loop In [14]: %timeit my_any(a, lambda a: np.abs(a) > 6, chunk_size=10000) 1 loops, best of 3: 654 ms per loop For a False *any*: In [16]: %timeit (np.abs(a) > 7).any() 1 loops, best of 3: 1.22 s per loop In [17]: %timeit my_any(a, lambda a: np.abs(a) > 7) 1 loops, best of 3: 2.4 s per loop For a high probability *any*: In [28]: %timeit (np.abs(a) > 1).any() 1 loops, best of 3: 972 ms per loop In [27]: %timeit my_any(a, lambda a: np.abs(a) > 1) 
10000 loops, best of 3: 67 us per loop --------------- For a low probability *all*: In [18]: %timeit (np.abs(a) < 6).all() 1 loops, best of 3: 1.16 s per loop In [19]: %timeit my_all(a, lambda a: np.abs(a) < 6) 1 loops, best of 3: 880 ms per loop In [20]: %timeit my_all(a, lambda a: np.abs(a) < 6, chunk_size=10000) 1 loops, best of 3: 706 ms per loop For a True *all*: In [22]: %timeit (np.abs(a) < 7).all() 1 loops, best of 3: 1.47 s per loop In [23]: %timeit my_all(a, lambda a: np.abs(a) < 7) 1 loops, best of 3: 2.65 s per loop For a high probability *all*: In [25]: %timeit (np.abs(a) < 1).all() 1 loops, best of 3: 978 ms per loop In [26]: %timeit my_all(a, lambda a: np.abs(a) < 1) 10000 loops, best of 3: 73.6 us per loop On 6 March 2013 21:16, Benjamin Root wrote: > > > On Tue, Mar 5, 2013 at 9:15 AM, Phil Elson wrote: > >> The ticket https://github.com/numpy/numpy/issues/2269 discusses the >> possibility of implementing a "find first" style function which can >> optimise the process of finding the first value(s) which match a predicate >> in a given 1D array. For example: >> >> >> >>> a = np.sin(np.linspace(0, np.pi, 200)) >> >>> print find_first(a, lambda a: a > 0.9) >> ((71, ), 0.900479032457) >> >> >> This has been discussed in several locations: >> >> https://github.com/numpy/numpy/issues/2269 >> https://github.com/numpy/numpy/issues/2333 >> >> http://stackoverflow.com/questions/7632963/numpy-array-how-to-find-index-of-first-occurrence-of-item >> >> >> *Rationale* >> >> For small arrays there is no real reason to avoid doing: >> >> >>> a = np.sin(np.linspace(0, np.pi, 200)) >> >>> ind = (a > 0.9).nonzero()[0][0] >> >>> print (ind, ), a[ind] >> (71,) 0.900479032457 >> >> >> But for larger arrays, this can lead to massive amounts of work even if >> the result is one of the first to be computed. 
Example: >> >>> a = np.arange(1e8) >> >>> print (a == 5).nonzero()[0][0] >> 5 >> >> >> So a function which terminates when the first matching value is found is >> desirable. >> >> As mentioned in #2269, it is possible to define a consistent ordering >> which allows this functionality for >1D arrays, but IMHO it overcomplicates >> the problem and was not a case that I personally needed, so I've limited >> the scope to 1D arrays only. >> >> >> *Implementation* >> >> My initial assumption was that to get any kind of performance I would >> need to write the *find* function in C; however, after prototyping with >> some array chunking it became apparent that a trivial Python function would >> be quick enough for my needs. >> >> The approach I've implemented in the code found in #2269 simply breaks >> the array into sub-arrays of maximum length *chunk_size* (2048 by >> default, though there is no real science to this number), applies the given >> predicate function, and yields the results from *nonzero()*. The given >> function should be a Python function which operates on the whole of the >> sub-array element-wise (i.e. the function should be vectorized). Returning >> a generator also has the benefit of allowing users to get the first *n* matching values/indices.
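(The code itself lives in the linked comment on #2269; reconstructed from the description above, the chunked generator looks roughly like this — a sketch, not the exact implementation:)

```python
import numpy as np

def find(a, predicate, chunk_size=2048):
    """Yield ((index,), value) pairs where `predicate` (a vectorized
    function) is True, scanning the 1D array `a` one chunk at a time
    so the caller can stop as soon as the first match is produced."""
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        for idx in predicate(chunk).nonzero()[0]:
            yield (start + idx,), chunk[idx]

# A smaller array than the 1e8-element example, to keep the sketch light:
a = np.arange(1e6)
(ind,), val = next(find(a, lambda x: x == 5))
print(ind, val)  # 5 5.0 -- found after scanning a single chunk
```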
>> >> >> *Results* >> >> >> I timed the implementation of *find* found in my comment at >> https://github.com/numpy/numpy/issues/2269#issuecomment-14436725 with an >> obvious test: >> >> >> In [1]: from np_utils import find >> >> In [2]: import numpy as np >> >> In [3]: import numpy.random >> >> In [4]: np.random.seed(1) >> >> In [5]: a = np.random.randn(1e8) >> >> In [6]: a.min(), a.max() >> Out[6]: (-6.1194900990552776, 5.9632246301166321) >> >> In [7]: next(find(a, lambda a: np.abs(a) > 6)) >> Out[7]: ((33105441,), -6.1194900990552776) >> >> In [8]: (np.abs(a) > 6).nonzero() >> Out[8]: (array([33105441]),) >> >> In [9]: %timeit (np.abs(a) > 6).nonzero() >> 1 loops, best of 3: 1.51 s per loop >> >> In [10]: %timeit next(find(a, lambda a: np.abs(a) > 6)) >> 1 loops, best of 3: 912 ms per loop >> >> In [11]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=100000)) >> 1 loops, best of 3: 470 ms per loop >> >> In [12]: %timeit next(find(a, lambda a: np.abs(a) > 6, >> chunk_size=1000000)) >> 1 loops, best of 3: 483 ms per loop >> >> >> This shows that picking a sensible *chunk_size* can yield massive >> speed-ups (nonzero is 3x slower in one case). A similar example with a much >> smaller 1D array shows similar promise: >> >> In [41]: a = np.random.randn(1e4) >> >> In [42]: %timeit next(find(a, lambda a: np.abs(a) > 3)) >> 10000 loops, best of 3: 35.8 us per loop >> >> In [43]: %timeit (np.abs(a) > 3).nonzero() >> 10000 loops, best of 3: 148 us per loop >> >> >> As I commented on the issue tracker, if you think this function is worth >> taking forward, I'd be happy to open up a pull request. >> >> Feedback gratefully received. >> >> Cheers, >> >> Phil >> >> >> > In the interest of generalizing code and such, could such approaches be > used for functions like np.any() and np.all() for short-circuiting if True > or False (respectively) are found? I wonder what other sort of functions > in NumPy might benefit from this?
> > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Mar 8 17:23:11 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 8 Mar 2013 22:23:11 +0000 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases In-Reply-To: References: Message-ID: On Fri, Mar 8, 2013 at 11:07 AM, Ond?ej ?ert?k wrote: > I hope to finish the rest of issues for 1.7.1 today or tomorrow. > Should I release 1.7.1rc1 first? I think that makes sense, just to be > sure, right? Big +1 to doing an RC from me. I guess conceptually this is like we just jumped back in time to right before we released 1.7.0, and merged a bunch more bug-fixes. We'd definitely have done another RC for the new changes then, so we should do one now too :-). -n From sebastian at sipsolutions.net Sat Mar 9 11:17:39 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 09 Mar 2013 17:17:39 +0100 Subject: [Numpy-discussion] Compile time flag for numpy Message-ID: <1362845859.15128.2.camel@sebastian-laptop> Hey, how would I go about making a compile time flag for numpy to use as a macro? The reason is: https://github.com/numpy/numpy/pull/2735 so that it would be possible to compile numpy differently for debugging if software depending on numpy is broken by this change. 
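(For what it's worth, the usual pattern for this — read the environment variable in the build script and turn it into a C macro via `define_macros` — might look like the following sketch; the variable name here is made up for illustration, and this is not numpy's actual build code:)

```python
import os

def build_define_macros(env=None):
    """Translate a build-time environment variable into (name, value)
    pairs suitable for distutils' Extension(define_macros=...)."""
    if env is None:
        env = os.environ
    macros = []
    # If the flag is set when the build runs, the C sources will see a
    # corresponding preprocessor macro that they can #ifdef on.
    if env.get("NPY_DEBUG_COMPAT", "0") == "1":
        macros.append(("NPY_DEBUG_COMPAT", "1"))
    return macros

# In setup.py this would feed the extension definition, e.g.:
#   Extension('multiarray', sources=[...], define_macros=build_define_macros())
print(build_define_macros({"NPY_DEBUG_COMPAT": "1"}))  # [('NPY_DEBUG_COMPAT', '1')]
```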
Regards, Sebastian From sebastian at sipsolutions.net Sat Mar 9 11:30:51 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 09 Mar 2013 17:30:51 +0100 Subject: [Numpy-discussion] Compile time flag for numpy In-Reply-To: <1362845859.15128.2.camel@sebastian-laptop> References: <1362845859.15128.2.camel@sebastian-laptop> Message-ID: <1362846651.15128.3.camel@sebastian-laptop> On Sat, 2013-03-09 at 17:17 +0100, Sebastian Berg wrote: > Hey, > > how would I go about making a compile time flag for numpy to use as a > macro? > To be clear I mean an environment variable. > The reason is: https://github.com/numpy/numpy/pull/2735 > > so that it would be possible to compile numpy differently for debugging > if software depending on numpy is broken by this change. > > Regards, > > Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From siu at continuum.io Sun Mar 10 14:12:27 2013 From: siu at continuum.io (Siu Kwan Lam) Date: Sun, 10 Mar 2013 13:12:27 -0500 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? Message-ID: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Hi all, I am redirecting a discussion on github issue tracker here. My original post (https://github.com/numpy/numpy/issues/3137): "The current implementation of the RNG seems to be MT19937-32. Since 64-bit machines are common nowadays, I am suggesting adding or upgrading to MT19937-64. Thoughts?" Let me start by answering to njsmith's comments on the issue tracker: > Would it be faster? Although I have not benchmarked the 64-bit implementation, it is likely that it will be faster on a 64-bit machine since the number of iteration (controlled by NN and MM in the reference implementation http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c) is reduced by half. 
In addition, each generation in the 64-bit implementation produces a 64-bit random int which can be used to generate a double-precision random number, unlike the 32-bit implementation, which requires generating a pair of 32-bit random ints. But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit instructions; thus, it is likely to be slower. (1) > Use less memory? The amount of memory used will remain the same. The size of the RNG state is the same. > Provide higher quality randomness? My naive answer is that the 32-bit and 64-bit implementations have the same 2^19937-1 period. Need to do some research and experiments. > Would it change the output of this program: > import numpy > numpy.random.seed(0) > print numpy.random.random() > ? Unfortunately, yes. The 64-bit implementation generates a different random number sequence with the same seed. (2) My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default. numpy.random.set_state("MT19937_64", ?) # choose the 64-bit implementation Thoughts? Best, Siu -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdirective at gmail.com Sun Mar 10 21:24:59 2013 From: rdirective at gmail.com (QT) Date: Sun, 10 Mar 2013 20:24:59 -0500 Subject: [Numpy-discussion] Numpy 1.7.0 with Intel MKL 11.0.2.146 Message-ID: Dear all, I'm at my wits' end. I've followed Intel's own instructions on how to compile Numpy with Intel MKL. Everything compiled and linked fine and I've installed it locally in my user folder... There is one nasty problem. When one calls the numpy library to do some computation, it does not use all of the available threads. I have 8 "cores" on my machine and it only uses 4 of them. The MKL_NUM_THREADS environment variable can be set to tune the number of threads but setting it to 8 does not change anything.
Indeed, setting it to 3 does limit the threads to 3....What is going on? As a comparison, the numpy (version 1.4.1, installed from yum, which uses BLAS+ATLAS) uses all 8 threads. I do not get this. You can run this test program python -mtimeit -s'import numpy as np; a = np.random.randn(1e3,1e3)' 'np.dot(a, a)' There is one saving grace, the local numpy built with MKL is much faster than the system's numpy. I hope someone can help me. Searching the internet has been fruitless. Best, Quyen My site.cfg for numpy (1.7.0) [mkl] library_dirs = /opt/intel/mkl/lib/intel64 include_dirs = /opt/intel/mkl/include mkl_libs = mkl_rt lapack_libs = I've edited line 37 of numpy/distutils/intelcompiler.py self.cc_exe = 'icc -O3 -fPIC -fp-model strict -fomit-frame-pointer -openmp -parallel -DMKL_ILP64' Also line 54 of numpy/distutils/fcompiler/intel.py return ['-i8 -xhost -openmp -fp-model strict'] My .bash_profile also contains the lines: source /opt/intel/bin/compilervars.sh intel64 source /opt/intel/mkl/bin/mklvars.sh intel64 The above is needed to set the LD_LIBRARY_PATH so that Python can source the intel dynamic library when numpy is called. -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Mar 10 23:18:23 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 10 Mar 2013 23:18:23 -0400 Subject: [Numpy-discussion] Numpy 1.7.0 with Intel MKL 11.0.2.146 In-Reply-To: References: Message-ID: On 3/10/13, QT wrote: > Dear all, > > I'm at my wits end. I've followed Intel's own > instructionson > how to compile Numpy with Intel MKL. Everything compiled and linked > fine and I've installed it locally in my user folder...There is one nasty > problem. When one calls the numpy library to do some computation, it does > not use all of the available threads. I have 8 "cores" on my machine and > it only uses 4 of them. 
The MKL_NUM_THREADS environmental variable can be > set to tune the number of threads but setting it to 8 does not change > anything. Indeed, setting it to 3 does limit the threads to 3....What is > going on? Does your computer have 8 physical cores, or 4 cores that look like 8 because of hyperthreading? Warren > > As a comparison, the numpy (version 1.4.1, installed from yum, which uses > BLAS+ATLAS) uses all 8 threads. I do not get this. > > You can run this test program > > python -mtimeit -s'import numpy as np; a = np.random.randn(1e3,1e3)' > 'np.dot(a, a)' > > There is one saving grace, the local numpy built with MKL is much faster > than the system's numpy. > > I hope someone can help me. Searching the internet has been fruitless. > > Best, > Quyen > > My site.cfg for numpy (1.7.0) > [mkl] > library_dirs = /opt/intel/mkl/lib/intel64 > include_dirs = /opt/intel/mkl/include > mkl_libs = mkl_rt > lapack_libs = > > I've edited line 37 of numpy/distutils/intelcompiler.py > self.cc_exe = 'icc -O3 -fPIC -fp-model strict -fomit-frame-pointer -openmp > -parallel -DMKL_ILP64' > > Also line 54 of numpy/distutils/fcompiler/intel.py > return ['-i8 -xhost -openmp -fp-model strict'] > > My .bash_profile also contains the lines: > source /opt/intel/bin/compilervars.sh intel64 > source /opt/intel/mkl/bin/mklvars.sh intel64 > > The above is needed to set the LD_LIBRARY_PATH so that Python can source > the intel dynamic library when numpy is called. > From warren.weckesser at gmail.com Sun Mar 10 23:31:15 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 10 Mar 2013 23:31:15 -0400 Subject: [Numpy-discussion] Numpy 1.7.0 with Intel MKL 11.0.2.146 In-Reply-To: References: Message-ID: On 3/10/13, Warren Weckesser wrote: > On 3/10/13, QT wrote: >> Dear all, >> >> I'm at my wits end. I've followed Intel's own >> instructionson >> how to compile Numpy with Intel MKL. 
Everything compiled and linked >> fine and I've installed it locally in my user folder...There is one nasty >> problem. When one calls the numpy library to do some computation, it >> does >> not use all of the available threads. I have 8 "cores" on my machine and >> it only uses 4 of them. The MKL_NUM_THREADS environmental variable can >> be >> set to tune the number of threads but setting it to 8 does not change >> anything. Indeed, setting it to 3 does limit the threads to 3....What is >> going on? > > > Does your computer have 8 physical cores, or 4 cores that look like 8 > because of hyperthreading? > Here's why I ask this: http://software.intel.com/en-us/forums/topic/294954 > Warren > > >> >> As a comparison, the numpy (version 1.4.1, installed from yum, which uses >> BLAS+ATLAS) uses all 8 threads. I do not get this. >> >> You can run this test program >> >> python -mtimeit -s'import numpy as np; a = np.random.randn(1e3,1e3)' >> 'np.dot(a, a)' >> >> There is one saving grace, the local numpy built with MKL is much faster >> than the system's numpy. >> >> I hope someone can help me. Searching the internet has been fruitless. >> >> Best, >> Quyen >> >> My site.cfg for numpy (1.7.0) >> [mkl] >> library_dirs = /opt/intel/mkl/lib/intel64 >> include_dirs = /opt/intel/mkl/include >> mkl_libs = mkl_rt >> lapack_libs = >> >> I've edited line 37 of numpy/distutils/intelcompiler.py >> self.cc_exe = 'icc -O3 -fPIC -fp-model strict -fomit-frame-pointer >> -openmp >> -parallel -DMKL_ILP64' >> >> Also line 54 of numpy/distutils/fcompiler/intel.py >> return ['-i8 -xhost -openmp -fp-model strict'] >> >> My .bash_profile also contains the lines: >> source /opt/intel/bin/compilervars.sh intel64 >> source /opt/intel/mkl/bin/mklvars.sh intel64 >> >> The above is needed to set the LD_LIBRARY_PATH so that Python can source >> the intel dynamic library when numpy is called. 
>> > From rdirective at gmail.com Sun Mar 10 23:38:11 2013 From: rdirective at gmail.com (QT) Date: Sun, 10 Mar 2013 22:38:11 -0500 Subject: [Numpy-discussion] Numpy 1.7.0 with Intel MKL 11.0.2.146 In-Reply-To: References: Message-ID: Dear Warren, It's an Intel i7 950, 4 cores, 8 with hyper-threading. I used MKL 11.0.2.146, but I will read your link. It seems spot on. Best, Quyen On Sun, Mar 10, 2013 at 10:31 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > On 3/10/13, Warren Weckesser wrote: > > On 3/10/13, QT wrote: > >> Dear all, > >> > >> I'm at my wits end. I've followed Intel's own > >> instructions< > http://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl>on > >> how to compile Numpy with Intel MKL. Everything compiled and linked > >> fine and I've installed it locally in my user folder...There is one > nasty > >> problem. When one calls the numpy library to do some computation, it > >> does > >> not use all of the available threads. I have 8 "cores" on my machine > and > >> it only uses 4 of them. The MKL_NUM_THREADS environmental variable can > >> be > >> set to tune the number of threads but setting it to 8 does not change > >> anything. Indeed, setting it to 3 does limit the threads to 3....What > is > >> going on? > > > > > > Does your computer have 8 physical cores, or 4 cores that look like 8 > > because of hyperthreading? > > > > > Here's why I ask this: http://software.intel.com/en-us/forums/topic/294954 > > > > Warren > > > > > >> > >> As a comparison, the numpy (version 1.4.1, installed from yum, which > uses > >> BLAS+ATLAS) uses all 8 threads. I do not get this. > >> > >> You can run this test program > >> > >> python -mtimeit -s'import numpy as np; a = np.random.randn(1e3,1e3)' > >> 'np.dot(a, a)' > >> > >> There is one saving grace, the local numpy built with MKL is much faster > >> than the system's numpy. > >> > >> I hope someone can help me. Searching the internet has been fruitless. 
> >> > >> Best, > >> Quyen > >> > >> My site.cfg for numpy (1.7.0) > >> [mkl] > >> library_dirs = /opt/intel/mkl/lib/intel64 > >> include_dirs = /opt/intel/mkl/include > >> mkl_libs = mkl_rt > >> lapack_libs = > >> > >> I've edited line 37 of numpy/distutils/intelcompiler.py > >> self.cc_exe = 'icc -O3 -fPIC -fp-model strict -fomit-frame-pointer > >> -openmp > >> -parallel -DMKL_ILP64' > >> > >> Also line 54 of numpy/distutils/fcompiler/intel.py > >> return ['-i8 -xhost -openmp -fp-model strict'] > >> > >> My .bash_profile also contains the lines: > >> source /opt/intel/bin/compilervars.sh intel64 > >> source /opt/intel/mkl/bin/mklvars.sh intel64 > >> > >> The above is needed to set the LD_LIBRARY_PATH so that Python can source > >> the intel dynamic library when numpy is called. > >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Mar 11 05:46:54 2013 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Mar 2013 09:46:54 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: > Hi all, > > I am redirecting a discussion on github issue tracker here. My original > post (https://github.com/numpy/numpy/issues/3137): > > "The current implementation of the RNG seems to be MT19937-32. Since 64-bit > machines are common nowadays, I am suggesting adding or upgrading to > MT19937-64. Thoughts?" > > Let me start by answering to njsmith's comments on the issue tracker: > > Would it be faster? 
> > > Although I have not benchmarked the 64-bit implementation, it is likely that > it will be faster on a 64-bit machine since the number of iteration > (controlled by NN and MM in the reference implementation > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c) > is reduced by half. In addition, each generation in the 64-bit > implementation produces a 64-bit random int which can be used to generate > double precision random number. Unlike the 32-bit implementation which > requires generating a pair of 32-bit random int. >From the last time this was brought up, it looks like getting a single 64-bit integer out from MT19937-64 takes about the same amount of time as getting a single 32-bit integer from MT19937-32, perhaps a little slower, even on a 64-bit machine. http://comments.gmane.org/gmane.comp.python.numeric.general/27773 So getting a single double would be not quite twice as fast. > But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit > instructions; thus, it is likely to be slower. (1) > > Use less memory? > > > The amount of memory use will remain the same. The size of the RNG state is > the same. > > Provide higher quality randomness? > > > My naive answer is that 32-bit and 64-bit implementation have the same > 2^19937-1 period. Need to do some research and experiments. > > Would it change the output of this program: import numpy > numpy.random.seed(0) print numpy.random.random() ? > > > Unfortunately, yes. The 64-bit implementation generates a different random > number sequence with the same seed. (2) > > > My suggestion to overcome (1) and (2) is to allow the user to select between > the two implementations (and possibly different algorithms in the future). > If user does not provide a choice, we use the MT19937-32 by default. > > numpy.random.set_state("MT19937_64", ?) # choose the 64-bit > implementation Most likely, the different PRNGs should be different subclasses of RandomState. 
The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance. Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance. There is some low-level C work that needs to be done to allow the non-uniform distributions to be shared between implementations of the core uniform PRNG, but that's the same no matter how you organize the upper layer. -- Robert Kern From chris.barker at noaa.gov Mon Mar 11 13:07:05 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 11 Mar 2013 10:07:05 -0700 Subject: [Numpy-discussion] Casting and promotion rules (e.g. int + uint64 => float) In-Reply-To: References: Message-ID: On Fri, Mar 8, 2013 at 8:23 AM, Sergio Callegari wrote: > I have noticed that numpy introduces some unexpected type casts, that are > in some cases problematic. There has been a lot of discussion about casting on this list in the last couple months -- I suggest you peruse that discussion and see what conclusions it has lead to. > A very weird cast is > > int + uint64 -> float I think the idea here is that an int can hold negative numbers, so you can't put it in a uint64 -- but you can't put a uint64 into a signed int64. A float64 can hold the range of numbers of both a int and uint64, so it is used, even though it can't hold the full precision of a uint64 (far from it!) > Another issue is that variables unexpectedly change type with accumulation > operators > > a=np.uint64(1) > a+=1 > > now a is float yeah -- that should NEVER happen -- += is supposed to be an iin=place operator, it should never change the array! However, what you've crated here is not an array, but a numpy scalar, and the rules are different there (but should they be?). 
I suspect that part of the issue is that array scalars behave a bit more like the built-in numpy number types, and thus += is not an in-place operator, but rather, translates to: a = a + 1 and as you've seen, that casts to a float64. A little test: In [34]: d = np.int64(2) In [35]: e = d # e and d are the same object In [36]: d += 1 In [37]: e is d Out[37]: False # they are not longer the same object -- the += created a new object In [38]: type(d) Out[38]: numpy.int64 # even though it's still the same type (no casting needed) If you do use an array, you don't get casting with +=: In [39]: a = np.array((1,), dtype=np.uint64) In [40]: a Out[40]: array([1], dtype=uint64) In [41]: a + 1.0 Out[41]: array([ 2.]) # got a cast with the additon and creation of a new array In [42]: a += 1.0 In [43]: a Out[43]: array([2], dtype=uint64) # but no cast with the in-place operator. Personally, I think the "in-place" operators should be just that -- and only work for mutable objects, but I guess the ability to easily increment in integer was just too tempting! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sergio.callegari at gmail.com Mon Mar 11 14:17:22 2013 From: sergio.callegari at gmail.com (Sergio Callegari) Date: Mon, 11 Mar 2013 18:17:22 +0000 (UTC) Subject: [Numpy-discussion] Casting and promotion rules (e.g. int + uint64 => float) References: Message-ID: Thanks for the explanation. Chris Barker - NOAA Federal noaa.gov> writes: > There has been a lot of discussion about casting on this list in the > last couple months -- I suggest you peruse that discussion and see > what conclusions it has lead to. I'll look at it. My message to the ml followed an invitation to do so after I posted a bug about weird castings. 
> > int + uint64 -> float > > I think the idea here is that an int can hold negative numbers, so you > can't put it in a uint64 -- but you can't put a uint64 into a signed > int64. A float64 can hold the range of numbers of both a int and > uint64, so it is used, even though it can't hold the full precision > of a uint64 (far from it!) I understand the good intention. Yet, this does not follow the principle of least surprise. This is not what most other languages (possibly following C) would do and, most important, dealing with integers, one expects overflows and wraparounds, not certainly a loss of precision. Another issue is that the promotion rule breaks indexing a = np.uint64(1) b=[0,1,2,3,4,5] b[a] -> 1 # OK b[a+1] -> Error I really would like to suggest changing this behavior. Thanks Sergio From wfspotz at sandia.gov Mon Mar 11 23:55:58 2013 From: wfspotz at sandia.gov (Bill Spotz) Date: Mon, 11 Mar 2013 21:55:58 -0600 Subject: [Numpy-discussion] Request code review of numpy.i changes Message-ID: https://github.com/wfspotz/numpy/compare/numpy-swig ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** From soumendotganguly at gmail.com Tue Mar 12 03:20:21 2013 From: soumendotganguly at gmail.com (soumen ganguly) Date: Tue, 12 Mar 2013 12:50:21 +0530 Subject: [Numpy-discussion] unclear output format for numpy.argmax() Message-ID: Hello, There are some doubts that i have regarding the argmax() method of numpy.As described in reference doc's of numpy,argmax(axis=None,out=None) returns the indices of the maximum value along the given axis(In this case 0 is default). So, i tried to implement the method to a 2d array with elements say,[[1,2,3],[4,5,6]] along the axis 1.The output to this code is [2,2] and when i implement it along the axis 0,it outputs [1,1,1].I dont see the connection to this output with the scope of argmax method. 
I would appreciate a detailed insight to the argmax method. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjhnson at gmail.com Tue Mar 12 03:40:14 2013 From: tjhnson at gmail.com (T J) Date: Tue, 12 Mar 2013 02:40:14 -0500 Subject: [Numpy-discussion] Vectorize and ufunc attribute Message-ID: Prior to 1.7, I had working compatibility code such as the following: if has_good_functions: # http://projects.scipy.org/numpy/ticket/1096 from numpy import logaddexp, logaddexp2 else: logaddexp = vectorize(_logaddexp, otypes=[numpy.float64]) logaddexp2 = vectorize(_logaddexp2, otypes=[numpy.float64]) # Run these at least once so that .ufunc.reduce exists logaddexp([1.,2.,3.],[1.,2.,3.]) logaddexp2([1.,2.,3.],[1.,2.,3.]) # And then make reduce available at the top level logaddexp.reduce = logaddexp.ufunc.reduce logaddexp2.reduce = logaddexp2.ufunc.reduce The point was that I wanted to treat the output of vectorize as a hacky drop-in replacement for a ufunc. In 1.7, I discovered that vectorize had changed (https://github.com/numpy/numpy/pull/290), and now there is no longer a ufunc attribute at all. Should this be added back in? Besides hackish drop-in replacements, I see value in to being able to call reduce, accumulate, etc (when possible) on the output of vectorize(). -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Tue Mar 12 03:48:36 2013 From: toddrjen at gmail.com (Todd) Date: Tue, 12 Mar 2013 08:48:36 +0100 Subject: [Numpy-discussion] unclear output format for numpy.argmax() In-Reply-To: References: Message-ID: On Tue, Mar 12, 2013 at 8:20 AM, soumen ganguly wrote: > Hello, > > There are some doubts that i have regarding the argmax() method of > numpy.As described in reference doc's of numpy,argmax(axis=None,out=None) > returns the indices of the maximum value along the given axis(In this case > 0 is default). 
> > So, i tried to implement the method to a 2d array with elements > say,[[1,2,3],[4,5,6]] along the axis 1.The output to this code is [2,2] and > when i implement it along the axis 0,it outputs [1,1,1].I dont see the > connection to this output with the scope of argmax method. > > I would appreciate a detailed insight to the argmax method. > > I am not sure I understand the question. For axis 0 (the "outer" dimension in the way it is printed) the things being compared are argmax([1, 4]), argmax(([2, 5]), and argmax([3, 6]).. Amongst those, the second (index 1) is higher in each case, so it returns [1, 1, 1]. With axis 1 (the "inner" dimension in the way it is printed) , the things being compared are argmax([1, 2, 3]) and argmax([4, 5, 6]). In both case the third (index 2) is the highest, so it returns [2, 2]. What is unexpected about this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Tue Mar 12 09:01:59 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Tue, 12 Mar 2013 06:01:59 -0700 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" Message-ID: I've been using Numpy/Scipy for >5 years so know a little on how to get around them. Recently, I've needed to either freeze or create executables with tools such as PyInstaller, Cython, Py2exe and others on both Windows (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). The test program (which runs perfectly with the Python interpreter) is very simple: import numpy def main(): print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) return if __name__ == '__main__': main() The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. The "import numpy" causes an "ImportError: No module named multiarray". After endless Googling, I am none the wiser about what (really) causes the ImportError let alone what the solution is. 
The Traceback, similar to others found on the web, is: Traceback (most recent call last): File "test.py", ... File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in import add_newdocs File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in from type_check import * File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, in import numpy.core.numeric as _nx File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, in import multiarray ImportError: No module named multiarray. Could someone shed some light on this - please? Thx. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Tue Mar 12 09:17:34 2013 From: aron at ahmadia.net (Aron Ahmadia) Date: Tue, 12 Mar 2013 13:17:34 +0000 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: multiarray is an extension module that lives within numpy/core, that is, when, "import multiarray" is called, (and it's the first imported extension module in numpy), multiarray.ext (ext being dll on Windows I guess), gets dynamically loaded. "No module named multiarray" is indicating problems with your freeze setup. Most of these tools don't support locally imported extension modules. Does this help you get oriented on your problem? A On Tue, Mar 12, 2013 at 1:01 PM, Dinesh B Vadhia wrote: > ** > I've been using Numpy/Scipy for >5 years so know a little on how to get > around them. Recently, I've needed to either freeze or create executables > with tools such as PyInstaller, Cython, Py2exe and others on both Windows > (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). 
The test > program (which runs perfectly with the Python interpreter) is very simple: > > import numpy > > def main(): > print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) > return > > if __name__ == '__main__': > main() > > The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. The > "import numpy" causes an "ImportError: No module named multiarray". After > endless Googling, I am none the wiser about what (really) causes the > ImportError let alone what the solution is. The Traceback, similar to > others found on the web, is: > > Traceback (most recent call last): > File "test.py", ... > File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in > > import add_newdocs > File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in > > from numpy.lib import add_newdoc > File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in > > from type_check import * > File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, in > > import numpy.core.numeric as _nx > File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, in > > import multiarray > ImportError: No module named multiarray. > > Could someone shed some light on this - please? Thx. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ezindy at gmail.com Tue Mar 12 09:27:24 2013 From: ezindy at gmail.com (Egor Zindy) Date: Tue, 12 Mar 2013 13:27:24 +0000 Subject: [Numpy-discussion] Request code review of numpy.i changes In-Reply-To: References: Message-ID: Thanks Bill, I wasn't happy with my use of either PyCObject_FromVoidPtr or PyArray_BASE. Both are now deprecated. 
So I updated all the ARGOUTVIEWM_ definitions with %#ifdef SWIGPY_USE_CAPSULE PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, SWIG_Python_DestroyModule); %#else PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), SWIG_Python_DestroyModule); %#endif %#if NPY_API_VERSION < 0x00000007 PyArray_BASE(array) = cap; %#else PyArray_SetBaseObject(array,cap); %#endif This could probably be improved with the use of a macro, and checking the returned value of PyArray_SetBaseObject wouldn't hurt either. Anyway, it's a start. Hopefully I haven't messed my use of either SWIGPY_CAPSULE_NAME or SWIG_Python_DestroyModule here. Other changes I made relate to various warnings, in particular relating to the use of SWIG_Python_AppendOutput($result, XXX) where XXX should be a PyObject but was a PyArrayObject. In ARGOUTVIEW / ARGOUTVIEWM typedefs, I made sure there was a PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$1)); PyArrayObject* array = (PyArrayObject*) obj; which allows me to then use (instead of ,array) $result = SWIG_Python_AppendOutput($result,obj); In the other few other instances where this construct doesn't apply (ARGOUT_ARRAY1 for example) I used typecasting $result = SWIG_Python_AppendOutput($result,(PyObject*)array$argnum); I can't think of anything else at this stage. Kind regards, Egor On 12 March 2013 03:55, Bill Spotz wrote: > > https://github.com/wfspotz/numpy/compare/numpy-swig > > ** Bill Spotz ** > ** Sandia National Laboratories Voice: (505)845-0170 ** > ** P.O. Box 5800 Fax: (505)284-0154 ** > ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: numpy.i Type: application/octet-stream Size: 97352 bytes Desc: not available URL: From dineshbvadhia at hotmail.com Tue Mar 12 10:05:30 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Tue, 12 Mar 2013 07:05:30 -0700 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Does that mean numpy won't work with freeze/create_executable type of tools or is there a workaround? From: Aron Ahmadia Sent: Tuesday, March 12, 2013 6:17 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Yes,this one again "ImportError: No module named multiarray" multiarray is an extension module that lives within numpy/core, that is, when, "import multiarray" is called, (and it's the first imported extension module in numpy), multiarray.ext (ext being dll on Windows I guess), gets dynamically loaded. "No module named multiarray" is indicating problems with your freeze setup. Most of these tools don't support locally imported extension modules. Does this help you get oriented on your problem? A On Tue, Mar 12, 2013 at 1:01 PM, Dinesh B Vadhia wrote: I've been using Numpy/Scipy for >5 years so know a little on how to get around them. Recently, I've needed to either freeze or create executables with tools such as PyInstaller, Cython, Py2exe and others on both Windows (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). The test program (which runs perfectly with the Python interpreter) is very simple: import numpy def main(): print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) return if __name__ == '__main__': main() The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. The "import numpy" causes an "ImportError: No module named multiarray". After endless Googling, I am none the wiser about what (really) causes the ImportError let alone what the solution is. 
The Traceback, similar to others found on the web, is: Traceback (most recent call last): File "test.py", ... File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in import add_newdocs File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in from type_check import * File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, in import numpy.core.numeric as _nx File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, in import multiarray ImportError: No module named multiarray. Could someone shed some light on this - please? Thx. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Tue Mar 12 10:08:32 2013 From: aron at ahmadia.net (Aron Ahmadia) Date: Tue, 12 Mar 2013 14:08:32 +0000 Subject: [Numpy-discussion] (@Pat Marion) Re: Yes, this one again "ImportError: No module named multiarray" Message-ID: Pat Marion at Kitware did some work on this, I'm pinging him on the thread. A On Tue, Mar 12, 2013 at 2:05 PM, Dinesh B Vadhia wrote: > ** > Does that mean numpy won't work with freeze/create_executable type of > tools or is there a workaround? > > > *From:* Aron Ahmadia > *Sent:* Tuesday, March 12, 2013 6:17 AM > *To:* Discussion of Numerical Python > *Subject:* Re: [Numpy-discussion] Yes,this one again "ImportError: No > module named multiarray" > > multiarray is an extension module that lives within numpy/core, that is, > when, "import multiarray" is called, (and it's the first imported extension > module in numpy), multiarray.ext (ext being dll on Windows I guess), gets > dynamically loaded. > > "No module named multiarray" is indicating problems with your freeze > setup. 
Most of these tools don't support locally imported extension > modules. > > Does this help you get oriented on your problem? > > A > > > On Tue, Mar 12, 2013 at 1:01 PM, Dinesh B Vadhia < > dineshbvadhia at hotmail.com> wrote: > >> ** >> I've been using Numpy/Scipy for >5 years so know a little on how to get >> around them. Recently, I've needed to either freeze or create executables >> with tools such as PyInstaller, Cython, Py2exe and others on both Windows >> (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). The test >> program (which runs perfectly with the Python interpreter) is very simple: >> >> import numpy >> >> def main(): >> print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) >> return >> >> if __name__ == '__main__': >> main() >> >> The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. The >> "import numpy" causes an "ImportError: No module named multiarray". After >> endless Googling, I am none the wiser about what (really) causes the >> ImportError let alone what the solution is. The Traceback, similar to >> others found on the web, is: >> >> Traceback (most recent call last): >> File "test.py", ... >> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in >> >> import add_newdocs >> File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in >> >> from numpy.lib import add_newdoc >> File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in >> >> from type_check import * >> File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, >> in >> import numpy.core.numeric as _nx >> File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, in >> >> import multiarray >> ImportError: No module named multiarray. >> >> Could someone shed some light on this - please? Thx. 
>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pat.marion at kitware.com Tue Mar 12 10:23:53 2013 From: pat.marion at kitware.com (Pat Marion) Date: Wed, 13 Mar 2013 00:23:53 +1000 Subject: [Numpy-discussion] (@Pat Marion) Re: Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Thanks for copying me, Aron. Hi Dinesh, I have a github project which demonstrates how to use numpy with freeze. The project's readme includes more information: https://github.com/patmarion/NumpyBuiltinExample It does require a small patch to CPython's import.c file. I haven't tried posted this patch to the CPython developers, perhaps there'd be interest incorporating it upstream. Pat On Wed, Mar 13, 2013 at 12:08 AM, Aron Ahmadia wrote: > Pat Marion at Kitware did some work on this, I'm pinging him on the thread. > > A > > > On Tue, Mar 12, 2013 at 2:05 PM, Dinesh B Vadhia < > dineshbvadhia at hotmail.com> wrote: > >> ** >> Does that mean numpy won't work with freeze/create_executable type of >> tools or is there a workaround? >> >> >> *From:* Aron Ahmadia >> *Sent:* Tuesday, March 12, 2013 6:17 AM >> *To:* Discussion of Numerical Python >> *Subject:* Re: [Numpy-discussion] Yes,this one again "ImportError: No >> module named multiarray" >> >> multiarray is an extension module that lives within numpy/core, that is, >> when, "import multiarray" is called, (and it's the first imported extension >> module in numpy), multiarray.ext (ext being dll on Windows I guess), gets >> dynamically loaded. 
>> >> "No module named multiarray" is indicating problems with your freeze >> setup. Most of these tools don't support locally imported extension >> modules. >> >> Does this help you get oriented on your problem? >> >> A >> >> >> On Tue, Mar 12, 2013 at 1:01 PM, Dinesh B Vadhia < >> dineshbvadhia at hotmail.com> wrote: >> >>> ** >>> I've been using Numpy/Scipy for >5 years so know a little on how to get >>> around them. Recently, I've needed to either freeze or create executables >>> with tools such as PyInstaller, Cython, Py2exe and others on both Windows >>> (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). The test >>> program (which runs perfectly with the Python interpreter) is very simple: >>> >>> import numpy >>> >>> def main(): >>> print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) >>> return >>> >>> if __name__ == '__main__': >>> main() >>> >>> The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. >>> The "import numpy" causes an "ImportError: No module named multiarray". After >>> endless Googling, I am none the wiser about what (really) causes the >>> ImportError let alone what the solution is. The Traceback, similar to >>> others found on the web, is: >>> >>> Traceback (most recent call last): >>> File "test.py", ... >>> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in >>> >>> import add_newdocs >>> File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in >>> >>> from numpy.lib import add_newdoc >>> File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in >>> >>> from type_check import * >>> File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, >>> in >>> import numpy.core.numeric as _nx >>> File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, >>> in >>> import multiarray >>> ImportError: No module named multiarray. >>> >>> Could someone shed some light on this - please? Thx. 
>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Tue Mar 12 10:59:53 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Tue, 12 Mar 2013 07:59:53 -0700 Subject: [Numpy-discussion] Vectorize and ufunc attribute In-Reply-To: References: Message-ID: T J: You may want to look into `numpy.frompyfunc` ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.frompyfunc.html). -Brad On Tue, Mar 12, 2013 at 12:40 AM, T J wrote: > Prior to 1.7, I had working compatibility code such as the following: > > > if has_good_functions: > # http://projects.scipy.org/numpy/ticket/1096 > from numpy import logaddexp, logaddexp2 > else: > logaddexp = vectorize(_logaddexp, otypes=[numpy.float64]) > logaddexp2 = vectorize(_logaddexp2, otypes=[numpy.float64]) > > # Run these at least once so that .ufunc.reduce exists > logaddexp([1.,2.,3.],[1.,2.,3.]) > logaddexp2([1.,2.,3.],[1.,2.,3.]) > > # And then make reduce available at the top level > logaddexp.reduce = logaddexp.ufunc.reduce > logaddexp2.reduce = logaddexp2.ufunc.reduce > > > The point was that I wanted to treat the output of vectorize as a hacky > drop-in replacement for a ufunc. In 1.7, I discovered that vectorize had > changed (https://github.com/numpy/numpy/pull/290), and now there is no > longer a ufunc attribute at all. > > Should this be added back in? Besides hackish drop-in replacements, I see > value in to being able to call reduce, accumulate, etc (when possible) on > the output of vectorize(). 
> > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Mar 12 17:25:44 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 12 Mar 2013 21:25:44 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: > On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >> My suggestion to overcome (1) and (2) is to allow the user to select between >> the two implementations (and possibly different algorithms in the future). >> If user does not provide a choice, we use the MT19937-32 by default. >> >> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >> implementation > > Most likely, the different PRNGs should be different subclasses of > RandomState. The module-level convenience API should probably be left > alone. If you need to control the PRNG that you are using, you really > need to be passing around a RandomState instance and not relying on > reseeding the shared global instance. +1 > Aside: I really wish we hadn't > exposed `set_state()` in the module API. It's an attractive nuisance. And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347 I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers expectations about how APIs work... 
-n From njs at pobox.com Tue Mar 12 17:27:35 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 12 Mar 2013 21:27:35 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: > On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>> My suggestion to overcome (1) and (2) is to allow the user to select between >>> the two implementations (and possibly different algorithms in the future). >>> If user does not provide a choice, we use the MT19937-32 by default. >>> >>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>> implementation >> >> Most likely, the different PRNGs should be different subclasses of >> RandomState. The module-level convenience API should probably be left >> alone. If you need to control the PRNG that you are using, you really >> need to be passing around a RandomState instance and not relying on >> reseeding the shared global instance. > > +1 > >> Aside: I really wish we hadn't >> exposed `set_state()` in the module API. It's an attractive nuisance. > > And our own test suite is a serious offender in this regard, we have > tests that fail if you run the test suite in a non-default order... > https://github.com/numpy/numpy/issues/347 > > I wonder if we dare deprecate it? The whole idea of a global random > state is just a bad one, like every other sort of global shared state. > But it's one that's deeply baked into a lot of scientific programmers > expectations about how APIs work... (To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.) 
-n From chris.barker at noaa.gov Tue Mar 12 17:50:54 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 12 Mar 2013 14:50:54 -0700 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: On Tue, Mar 12, 2013 at 7:05 AM, Dinesh B Vadhia wrote: > Does that mean numpy won't work with freeze/create_executable type of tools > or is there a workaround? I've used numpy with py2exe and py2app out of the box with no issues ( actually, there is an issue with too much stuff getting bundled up, but it works) >> ImportError let alone what the solution is. The Traceback, similar to >> others found on the web, is: >> >> Traceback (most recent call last): >> File "test.py", ... >> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in >> This indicates that your code is importing the numpy that's inside the system installation -- it should be using one in your app bundle. What bundling tool are you using? How did you install python/numpy? What does your bundling tol config look like? And, of course, version numbers of everything. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Tue Mar 12 18:37:37 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 12 Mar 2013 18:37:37 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? 
In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Tue, Mar 12, 2013 at 5:27 PM, Nathaniel Smith wrote: > On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: >> On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >>> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>>> My suggestion to overcome (1) and (2) is to allow the user to select between >>>> the two implementations (and possibly different algorithms in the future). >>>> If user does not provide a choice, we use the MT19937-32 by default. >>>> >>>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>>> implementation >>> >>> Most likely, the different PRNGs should be different subclasses of >>> RandomState. The module-level convenience API should probably be left >>> alone. If you need to control the PRNG that you are using, you really >>> need to be passing around a RandomState instance and not relying on >>> reseeding the shared global instance. >> >> +1 >> >>> Aside: I really wish we hadn't >>> exposed `set_state()` in the module API. It's an attractive nuisance. Here is a recipe how to use it http://mail.scipy.org/pipermail/numpy-discussion/2010-September/052911.html (I'm just drawing a random number as seed that I can save, instead of the entire state.) Josef >> >> And our own test suite is a serious offender in this regard, we have >> tests that fail if you run the test suite in a non-default order... >> https://github.com/numpy/numpy/issues/347 >> >> I wonder if we dare deprecate it? The whole idea of a global random >> state is just a bad one, like every other sort of global shared state. >> But it's one that's deeply baked into a lot of scientific programmers >> expectations about how APIs work... > > (To be clear, by 'it' here I meant np.random.set_seed(), not the whole > np.random API. Probably. And by 'deprecate' I mean 'whine loudly in > some fashion when people use it', not 'rip out in a few releases'. I > think.) 
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ndbecker2 at gmail.com Tue Mar 12 18:38:54 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 12 Mar 2013 18:38:54 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Nathaniel Smith wrote: > On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: >> On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >>> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>>> My suggestion to overcome (1) and (2) is to allow the user to select >>>> between the two implementations (and possibly different algorithms in the >>>> future). If user does not provide a choice, we use the MT19937-32 by >>>> default. >>>> >>>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>>> implementation >>> >>> Most likely, the different PRNGs should be different subclasses of >>> RandomState. The module-level convenience API should probably be left >>> alone. If you need to control the PRNG that you are using, you really >>> need to be passing around a RandomState instance and not relying on >>> reseeding the shared global instance. >> >> +1 >> >>> Aside: I really wish we hadn't >>> exposed `set_state()` in the module API. It's an attractive nuisance. >> >> And our own test suite is a serious offender in this regard, we have >> tests that fail if you run the test suite in a non-default order... >> https://github.com/numpy/numpy/issues/347 >> >> I wonder if we dare deprecate it? The whole idea of a global random >> state is just a bad one, like every other sort of global shared state. >> But it's one that's deeply baked into a lot of scientific programmers >> expectations about how APIs work... > > (To be clear, by 'it' here I meant np.random.set_seed(), not the whole > np.random API. Probably. 
And by 'deprecate' I mean 'whine loudly in > some fashion when people use it', not 'rip out in a few releases'. I > think.) > > -n What do you mean that the idea of global shared state is a bad one? How would you prefer the API to look? An alternative is a stateless rng, where you have to pass it it's state on each invocation, which it would update and return. I hope you're not advocating that. From robert.kern at gmail.com Tue Mar 12 19:10:04 2013 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 Mar 2013 23:10:04 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker wrote: > Nathaniel Smith wrote: > >> On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: >>> On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >>>> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>>>> My suggestion to overcome (1) and (2) is to allow the user to select >>>>> between the two implementations (and possibly different algorithms in the >>>>> future). If user does not provide a choice, we use the MT19937-32 by >>>>> default. >>>>> >>>>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>>>> implementation >>>> >>>> Most likely, the different PRNGs should be different subclasses of >>>> RandomState. The module-level convenience API should probably be left >>>> alone. If you need to control the PRNG that you are using, you really >>>> need to be passing around a RandomState instance and not relying on >>>> reseeding the shared global instance. >>> >>> +1 >>> >>>> Aside: I really wish we hadn't >>>> exposed `set_state()` in the module API. It's an attractive nuisance. >>> >>> And our own test suite is a serious offender in this regard, we have >>> tests that fail if you run the test suite in a non-default order... >>> https://github.com/numpy/numpy/issues/347 >>> >>> I wonder if we dare deprecate it? 
The whole idea of a global random >>> state is just a bad one, like every other sort of global shared state. >>> But it's one that's deeply baked into a lot of scientific programmers >>> expectations about how APIs work... >> >> (To be clear, by 'it' here I meant np.random.set_seed(), not the whole >> np.random API. Probably. And by 'deprecate' I mean 'whine loudly in >> some fashion when people use it', not 'rip out in a few releases'. I >> think.) >> >> -n > > What do you mean that the idea of global shared state is a bad one? The words "global shared state" drives fear into the hearts of experienced programmers everywhere, whatever the context. :-) It's rarely a *good* idea. > How would > you prefer the API to look? There are two current APIs: 1. Instantiate RandomState and call it's methods 2. Just call the functions in numpy.random The latter has a shared global state. In fact, all of those "functions" are just references to the methods on a shared global RandomState instance. We advocate using the former API. Note that it already exists. It was the recommended API from day one. No one is recommending adding a new API. > An alternative is a stateless rng, where you have > to pass it it's state on each invocation, which it would update and return. I > hope you're not advocating that. No. This is a place where OOP solved the problem neatly. -- Robert Kern From ndbecker2 at gmail.com Tue Mar 12 20:16:12 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 12 Mar 2013 20:16:12 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. The c++ code could invoke via the python api, but that might be slower. 
I'm just rambling here, I'd have to see the API to get some ideas. From tjhnson at gmail.com Tue Mar 12 20:16:52 2013 From: tjhnson at gmail.com (T J) Date: Tue, 12 Mar 2013 19:16:52 -0500 Subject: [Numpy-discussion] Vectorize and ufunc attribute In-Reply-To: References: Message-ID: On Tue, Mar 12, 2013 at 9:59 AM, Bradley M. Froehle wrote: > T J: > > You may want to look into `numpy.frompyfunc` ( > http://docs.scipy.org/doc/numpy/reference/generated/numpy.frompyfunc.html > ). > > Yeah that's better, but it doesn't respect the output type of the function. Be nice if this supported the otypes keyword. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Mar 12 20:33:19 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 12 Mar 2013 20:33:19 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Neal Becker wrote: > I guess I talked to you about 100 years ago about sharing state between numpy > rng and code I have in c++ that wraps boost::random. So is there a C-api for > this RandomState object I could use to call from c++? Maybe I could do > something with that. > > The c++ code could invoke via the python api, but that might be slower. I'm > just rambling here, I'd have to see the API to get some ideas. I think if I could just grab a long int from the underlying mersenne twister, through some c api? From josef.pktd at gmail.com Tue Mar 12 20:48:14 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 12 Mar 2013 20:48:14 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? 
In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Tue, Mar 12, 2013 at 7:10 PM, Robert Kern wrote: > On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker wrote: >> Nathaniel Smith wrote: >> >>> On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: >>>> On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >>>>> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>>>>> My suggestion to overcome (1) and (2) is to allow the user to select >>>>>> between the two implementations (and possibly different algorithms in the >>>>>> future). If user does not provide a choice, we use the MT19937-32 by >>>>>> default. >>>>>> >>>>>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>>>>> implementation >>>>> >>>>> Most likely, the different PRNGs should be different subclasses of >>>>> RandomState. The module-level convenience API should probably be left >>>>> alone. If you need to control the PRNG that you are using, you really >>>>> need to be passing around a RandomState instance and not relying on >>>>> reseeding the shared global instance. >>>> >>>> +1 >>>> >>>>> Aside: I really wish we hadn't >>>>> exposed `set_state()` in the module API. It's an attractive nuisance. >>>> >>>> And our own test suite is a serious offender in this regard, we have >>>> tests that fail if you run the test suite in a non-default order... >>>> https://github.com/numpy/numpy/issues/347 >>>> >>>> I wonder if we dare deprecate it? The whole idea of a global random >>>> state is just a bad one, like every other sort of global shared state. >>>> But it's one that's deeply baked into a lot of scientific programmers >>>> expectations about how APIs work... >>> >>> (To be clear, by 'it' here I meant np.random.set_seed(), not the whole >>> np.random API. Probably. And by 'deprecate' I mean 'whine loudly in >>> some fashion when people use it', not 'rip out in a few releases'. I >>> think.) 
>>> >>> -n >> >> What do you mean that the idea of global shared state is a bad one? > > The words "global shared state" drives fear into the hearts of > experienced programmers everywhere, whatever the context. :-) It's > rarely a *good* idea. > >> How would >> you prefer the API to look? > > There are two current APIs: > > 1. Instantiate RandomState and call it's methods > 2. Just call the functions in numpy.random > > The latter has a shared global state. In fact, all of those > "functions" are just references to the methods on a shared global > RandomState instance. > > We advocate using the former API. Note that it already exists. It was > the recommended API from day one. No one is recommending adding a new > API. I never saw much advertising for the RandomState api, and until recently wasn't sure why using the global random state function np.random.norm, ... should be a bad idea. Learning by example, and seeing almost all examples using the global state, is not exactly conducive to figuring out that there is an issue. All of scipy.stats.distribution random numbers are using the global random state. (I guess I should open a ticket.) Josef > >> An alternative is a stateless rng, where you have >> to pass it it's state on each invocation, which it would update and return. I >> hope you're not advocating that. > > No. This is a place where OOP solved the problem neatly. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jaakko.luttinen at aalto.fi Wed Mar 13 05:15:36 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 13 Mar 2013 11:15:36 +0200 Subject: [Numpy-discussion] Bug in einsum? Message-ID: <514043B8.3010703@aalto.fi> Hi, I have encountered a very weird behaviour with einsum. I try to compute something like R*A*R', where * denotes a kind of "matrix multiplication". 
However, for particular shapes of R and A, the results are extremely bad. I compare two einsum results: First, I compute in two einsum calls as (R*A)*R'. Second, I compute the whole result in one einsum call. However, the results are significantly different for some shapes.

My test:

import numpy as np
for D in range(30):
    A = np.random.randn(100,D,D)
    R = np.random.randn(D,D)
    Y1 = np.einsum('...ik,...kj->...ij', R, A)
    Y1 = np.einsum('...ik,...kj->...ij', Y1, R.T)
    Y2 = np.einsum('...ik,...kl,...lj->...ij', R, A, R.T)
    print("D=%d" % D, np.allclose(Y1,Y2), np.linalg.norm(Y1-Y2))

Output:

D=0 True 0.0
D=1 True 0.0
D=2 True 8.40339658678e-15
D=3 True 8.09995399928e-15
D=4 True 3.59428803435e-14
D=5 False 34.755610184
D=6 False 28.3576558351
D=7 False 41.5402690906
D=8 True 2.31709582841e-13
D=9 False 36.0161112799
D=10 True 4.76237746912e-13
D=11 True 4.57944440782e-13
D=12 True 4.90302218301e-13
D=13 True 6.96175851271e-13
D=14 True 1.10067181384e-12
D=15 True 1.29095933163e-12
D=16 True 1.3466837332e-12
D=17 True 1.52265065763e-12
D=18 True 2.05407923852e-12
D=19 True 2.33327630748e-12
D=20 True 2.96849358082e-12
D=21 True 3.31063706175e-12
D=22 True 4.28163620455e-12
D=23 True 3.58951880681e-12
D=24 True 4.69973694769e-12
D=25 True 5.47385264567e-12
D=26 True 5.49643316347e-12
D=27 True 6.75132988402e-12
D=28 True 7.86435437892e-12
D=29 True 7.85453681029e-12

So, for D={5,6,7,9}, allclose returns False and the error norm is HUGE. It doesn't seem like just some small numerical inaccuracy because the error norm is so large. I don't know which one is correct (Y1 or Y2) but at least either one is wrong in my opinion. I ran the same test several times, and each time the same values of D fail. If I change the shapes somehow, the failing values of D might change too, but I usually have several failing values. I'm running the latest version from github (commit bd7104cef4) under Python 3.2.3.
With NumPy 1.6.1 under Python 2.7.3 the test crashes and Python exits printing "Floating point exception". This seems so weird to me that I wonder if I'm just doing something stupid.. Thanks a lot for any help! Jaakko From ndbecker2 at gmail.com Wed Mar 13 09:23:59 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 13 Mar 2013 09:23:59 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Neal Becker wrote: > Neal Becker wrote: > >> I guess I talked to you about 100 years ago about sharing state between numpy >> rng and code I have in c++ that wraps boost::random. So is there a C-api for >> this RandomState object I could use to call from c++? Maybe I could do >> something with that. >> >> The c++ code could invoke via the python api, but that might be slower. I'm >> just rambling here, I'd have to see the API to get some ideas. > > I think if I could just grab a long int from the underlying mersenne twister, > through some c api? Well, this at least appears to work - probably not the most efficient approach - calls the RandomState object via the python interface to get 4 bytes at a time:

int test1 (bp::object & rs) {
    // the template arguments here were eaten by the list archive's HTML
    // filter; call_method<bp::str> and reinterpret_cast<int*> are the
    // obvious readings
    bp::str bytes = call_method<bp::str> (rs.ptr(), "bytes", 4); // get 4 bytes
    return *reinterpret_cast<int*> (PyString_AS_STRING (bytes.ptr()));
}

BOOST_PYTHON_MODULE (numpy_rand) {
    boost::numpy::initialize();
    def ("test1", &test1);
}

From Andrea.Cimatoribus at nioz.nl Wed Mar 13 09:45:23 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 14:45:23 +0100 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks Message-ID: Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. I need to read some binary data, and I am using numpy.fromfile to do this.
Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). At the moment, I came up with the code below, which is then compiled using cython. Despite the significant performance increase from the pure python version, the function is still much slower than numpy.fromfile, and only reads one kind of data (in this case uint32), otherwise I do not know how to define the array type in advance. I have basically no experience with cython nor c, so I am a bit stuck. How can I try to make this more efficient and possibly more generic? Thanks

import numpy as np
#For cython!
cimport numpy as np
from libc.stdint cimport uint32_t

def cffskip32(fid, int count=1, int skip=0):
    cdef int k=0
    cdef np.ndarray[uint32_t, ndim=1] data = np.zeros(count, dtype=np.uint32)
    if skip>=0:
        while k<count:
            # (loop body is a reconstruction -- the archive dropped
            #  everything after the "<" above; per the description, read
            #  one record, then seek forward by the skip amount)
            data[k] = np.fromfile(fid, dtype=np.uint32, count=1)[0]
            fid.seek(skip, 1)
            k += 1
    return data

From njs at pobox.com (Nathaniel Smith) Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus wrote: > Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. > I need to read some binary data, and I am using numpy.fromfile to do this. Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). > At the moment, I came up with the code below, which is then compiled using cython.
If your data is stored as fixed-format binary (as it seems it is), then the easiest way is probably

# Exploit the operating system's virtual memory manager to get a "virtual copy" of the entire file in memory
# (This does not actually use any memory until accessed):
virtual_arr = np.memmap(path, np.uint32, "r")
# Get a numpy view onto every 20th entry:
virtual_arr_subsampled = virtual_arr[::20]
# Copy those bits into regular malloc'ed memory:
arr_subsampled = virtual_arr_subsampled.copy()

(Your data is probably large enough that this will only work if you're using a 64-bit system, because of address space limitations; but if you have data that's too large to fit into memory, then I assume you're using a 64-bit system anyway...) -n From nouiz at nouiz.org Wed Mar 13 10:03:10 2013 From: nouiz at nouiz.org (Frédéric Bastien) Date: Wed, 13 Mar 2013 10:03:10 -0400 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: Hi, I would suggest that you look at pytables[1]. It use a different file format, but it seam to do exactly what you want and give an object that have a very similar interface to numpy.ndarray (but fewer function). You would just ask for the slice/indices that you want and it return you a numpy.ndarray. HTH Frédéric [1] http://www.pytables.org/moin On Wed, Mar 13, 2013 at 9:54 AM, Nathaniel Smith wrote: > On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus > wrote: >> Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. >> I need to read some binary data, and I am using numpy.fromfile to do this. Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). >> At the moment, I came up with the code below, which is then compiled using cython.
Despite the significant performance increase from the pure python version, the function is still much slower than numpy.fromfile, and only reads one kind of data (in this case uint32), otherwise I do not know how to define the array type in advance. I have basically no experience with cython nor c, so I am a bit stuck. How can I try to make this more efficient and possibly more generic? > > If your data is stored as fixed-format binary (as it seems it is), > then the easiest way is probably > > # Exploit the operating system's virtual memory manager to get a > "virtual copy" of the entire file in memory > # (This does not actually use any memory until accessed): > virtual_arr = np.memmap(path, np.uint32, "r") > # Get a numpy view onto every 20th entry: > virtual_arr_subsampled = virtual_arr[::20] > # Copy those bits into regular malloc'ed memory: > arr_subsampled = virtual_arr_subsampled.copy() > > (Your data is probably large enough that this will only work if you're > using a 64-bit system, because of address space limitations; but if > you have data that's too large to fit into memory, then I assume > you're using a 64-bit system anyway...) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mpuecker at mit.edu Wed Mar 13 09:56:07 2013 From: mpuecker at mit.edu (Matt U) Date: Wed, 13 Mar 2013 13:56:07 +0000 (UTC) Subject: [Numpy-discussion] numpy reference array Message-ID: Is it possible to create a numpy array which points to the same data in a different numpy array (but in different order etc)? 
For example:

Code:
------------------------------------------------------------------------------
import numpy as np
a = np.arange(10)
ids = np.array([0,0,5,5,9,9,1,1])
b = a[ids]
a[0] = -1
b[0] #should be -1 if b[0] referenced the same data as a[0]
0
------------------------------------------------------------------------------

ctypes almost does it for me, but the access is inconvenient. I would like to access b as a regular numpy array:

Code:
------------------------------------------------------------------------------
import numpy as np
import ctypes
a = np.arange(10)
ids = np.array([0,0,5,5,9,9,1,1])
b = [a[id:id+1].ctypes.data_as(ctypes.POINTER(ctypes.c_long)) for id in ids]
a[0] = -1
b[0][0] #access is inconvenient
-1
------------------------------------------------------------------------------

Some more information: I've written a finite-element code, and I'm working on optimizing the python implementation. Profiling shows the slowest operation is the re-creation of an array that extracts edge degrees of freedom from the volume of the element (similar to b above). So, I'm trying to avoid copying the data every time, and just setting up 'b' once. The ctypes solution is sub-optimal since my code is mostly vectorized, that is, later I'd like to do something like

Code:
------------------------------------------------------------------------------
c[ids] = b[ids] + d[ids]
------------------------------------------------------------------------------

where c and d are the same shape as b but contain different data. Any thoughts? If it's not possible that will save me time searching.
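The behaviour Matt observes follows from a general rule: fancy (integer-array) indexing always copies, while only basic strided slicing returns a view onto the same buffer, which is why a fancy-indexed "reference array" like b cannot exist. A quick illustrative check (not code from the thread):

```python
import numpy as np

a = np.arange(10)
ids = np.array([0, 0, 5, 5, 9, 9, 1, 1])

b = a[ids]        # fancy indexing: b owns a fresh copy of the data
a[0] = -1
print(b[0])       # still 0 -- b did not see the change

c = a[::2]        # basic slicing: c is a strided view on a's buffer
a[2] = 99
print(c[1])       # 99 -- c[1] and a[2] are the same memory
```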
From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:18:53 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:18:53 +0100 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: This solution does not work for me since I have an offset before the data that is not a multiple of the datatype (it's a header containing various stuff). I'll look at pytables.

# Exploit the operating system's virtual memory manager to get a "virtual copy" of the entire file in memory
# (This does not actually use any memory until accessed):
virtual_arr = np.memmap(path, np.uint32, "r")
# Get a numpy view onto every 20th entry:
virtual_arr_subsampled = virtual_arr[::20]
# Copy those bits into regular malloc'ed memory:
arr_subsampled = virtual_arr_subsampled.copy()

From jaakko.luttinen at aalto.fi Wed Mar 13 10:21:13 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 13 Mar 2013 16:21:13 +0200 Subject: [Numpy-discussion] Dot/inner products with broadcasting? Message-ID: <51408B59.8090504@aalto.fi> Hi! How can I compute dot product (or similar multiply&sum operations) efficiently so that broadcasting is utilized? For multi-dimensional arrays, NumPy's inner and dot functions do not match the leading axes and use broadcasting, but instead the result has first the leading axes of the first input array and then the leading axes of the second input array.
For instance, I would like to compute the following inner-product:

np.sum(A*B, axis=-1)

But numpy.inner gives:

A = np.random.randn(2,3,4)
B = np.random.randn(3,4)
np.inner(A,B).shape # -> (2, 3, 3) instead of (2, 3)

Similarly for dot product, I would like to compute for instance:

np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2)

But numpy.dot gives:

In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5)
In [13]: np.dot(A,B).shape # -> (2, 3, 2, 5) instead of (2, 3, 5)

I could use einsum for these operations, but I'm not sure whether that's as efficient as using some BLAS-supported(?) dot products. I couldn't find any function which could perform this kind of operations. NumPy's functions seem to either flatten the input arrays (vdot, outer) or just use the axes of the input arrays separately (dot, inner, tensordot). Any help? Best regards, Jaakko From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:21:50 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:21:50 +0100 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: I see that pytables deals with hdf5 data. It would be very nice if the data were in such a standard format, but that is not the case, and that can't be changed. ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] on behalf of Frédéric Bastien [nouiz at nouiz.org] Sent: Wednesday, 13 March 2013 15:03 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks Hi, I would suggest that you look at pytables[1]. It use a different file format, but it seam to do exactly what you want and give an object that have a very similar interface to numpy.ndarray (but fewer function). You would just ask for the slice/indices that you want and it return you a numpy.ndarray.
HTH Frédéric [1] http://www.pytables.org/moin On Wed, Mar 13, 2013 at 9:54 AM, Nathaniel Smith wrote: > On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus > wrote: >> Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. >> I need to read some binary data, and I am using numpy.fromfile to do this. Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). >> At the moment, I came up with the code below, which is then compiled using cython. Despite the significant performance increase from the pure python version, the function is still much slower than numpy.fromfile, and only reads one kind of data (in this case uint32), otherwise I do not know how to define the array type in advance. I have basically no experience with cython nor c, so I am a bit stuck. How can I try to make this more efficient and possibly more generic? > > If your data is stored as fixed-format binary (as it seems it is), > then the easiest way is probably > > # Exploit the operating system's virtual memory manager to get a > "virtual copy" of the entire file in memory > # (This does not actually use any memory until accessed): > virtual_arr = np.memmap(path, np.uint32, "r") > # Get a numpy view onto every 20th entry: > virtual_arr_subsampled = virtual_arr[::20] > # Copy those bits into regular malloc'ed memory: > arr_subsampled = virtual_arr_subsampled.copy() > > (Your data is probably large enough that this will only work if you're > using a 64-bit system, because of address space limitations; but if > you have data that's too large to fit into memory, then I assume > you're using a 64-bit system anyway...) 
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Mar 13 10:32:29 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 Mar 2013 14:32:29 +0000 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus wrote: > This solution does not work for me since I have an offset before the data that is not a multiple of the datatype (it's a header containing various stuff). np.memmap takes an offset= argument. -n From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:37:54 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:37:54 +0100 Subject: [Numpy-discussion] R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. Indeed, this is silly. ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Nathaniel Smith [njs at pobox.com] Inviato: mercoledì 13 marzo 2013 15.32 A: Discussion of Numerical Python Oggetto: Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus wrote: > This solution does not work for me since I have an offset before the data that is not a multiple of the datatype (it's a header containing various stuff). np.memmap takes an offset= argument. 
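A minimal sketch of the offset= usage (the 100-byte header and uint32 records below are made-up stand-ins for the real file layout):

```python
import os
import tempfile

import numpy as np

# Build a demo file: a 100-byte header followed by uint32 records
# (header size and dtype are invented for the demo).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
records = np.arange(1000, dtype=np.uint32)
with open(path, "wb") as f:
    f.write(b"\x00" * 100)      # fake header
    f.write(records.tobytes())

# Map the file starting after the header, then view every 20th record;
# nothing is actually read until the copy at the end.
virtual = np.memmap(path, dtype=np.uint32, mode="r", offset=100)
subsampled = np.array(virtual[::20])  # copy into regular memory
assert np.array_equal(subsampled, records[::20])
```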
-n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:40:07 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:40:07 +0100 Subject: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , , Message-ID: On top of that, there is another issue: it can be that the data available itself is not a multiple of dtype, since there can be write errors at the end of the file, and I don't know how to deal with that. ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Andrea Cimatoribus Inviato: mercoledì 13 marzo 2013 15.37 A: Discussion of Numerical Python Oggetto: [Numpy-discussion] R: R: fast numpy.fromfile skipping data chunks Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. Indeed, this is silly. ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Nathaniel Smith [njs at pobox.com] Inviato: mercoledì 13 marzo 2013 15.32 A: Discussion of Numerical Python Oggetto: Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus wrote: > This solution does not work for me since I have an offset before the data that is not a multiple of the datatype (it's a header containing various stuff). np.memmap takes an offset= argument. 
-n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:46:30 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:46:30 +0100 Subject: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , , Message-ID: >Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. My mistake, sorry, even if the help says so, it seems that this is not the case in the actual code. Still, the problem with the size of the available data (which is not necessarily a multiple of dtype byte-size) remains. ac From njs at pobox.com Wed Mar 13 10:53:25 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 Mar 2013 14:53:25 +0000 Subject: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 2:46 PM, Andrea Cimatoribus wrote: >>Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. > > My mistake, sorry, even if the help says so, it seems that this is not the case in the actual code. Still, the problem with the size of the available data (which is not necessarily a multiple of dtype byte-size) remains. Worst case you can always work around such issues with an extra layer of view manipulation: # create a raw view onto the contents of the file file_bytes = np.memmap(path, dtype=np.uint8, ...) # cut out any arbitrary number of bytes from the beginning and end data_bytes = file_bytes[...some slice expression...] 
# switch to viewing the bytes as the proper data type data = data_bytes.view(dtype=np.uint32) # proceed as before -n From francesc at continuum.io Wed Mar 13 10:53:38 2013 From: francesc at continuum.io (Francesc Alted) Date: Wed, 13 Mar 2013 15:53:38 +0100 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: <514092F2.6@continuum.io> On 3/13/13 2:45 PM, Andrea Cimatoribus wrote: > Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. > I need to read some binary data, and I am using numpy.fromfile to do this. Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). [clip] You can do a fid.seek(offset) prior to np.fromfile() and then it will read from offset. See the docstrings for `file.seek()` on how to use it. -- Francesc Alted From francesc at continuum.io Wed Mar 13 11:04:31 2013 From: francesc at continuum.io (Francesc Alted) Date: Wed, 13 Mar 2013 16:04:31 +0100 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: <514092F2.6@continuum.io> References: <514092F2.6@continuum.io> Message-ID: <5140957F.4030708@continuum.io> On 3/13/13 3:53 PM, Francesc Alted wrote: > On 3/13/13 2:45 PM, Andrea Cimatoribus wrote: >> Hi everybody, I hope this has not been discussed before, I couldn't >> find a solution elsewhere. >> I need to read some binary data, and I am using numpy.fromfile to do >> this. Since the files are huge, and would make me run out of memory, >> I need to read data skipping some records (I am reading data recorded >> at high frequency, so basically I want to read subsampling). > [clip] > > You can do a fid.seek(offset) prior to np.fromfile() and then it will > read from offset. See the docstrings for `file.seek()` on how to use it. > Oops, you were already using file.seek(). Disregard, please. 
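Nathaniel's byte-view workaround from earlier in the thread can be made concrete as follows (the 7-byte header and 3-byte ragged tail are invented for the demo — deliberately NOT multiples of the 4-byte item size):

```python
import os
import tempfile

import numpy as np

# Demo file: 7 junk bytes, then uint32 records, then a 3-byte ragged tail
path = os.path.join(tempfile.mkdtemp(), "odd.bin")
records = np.arange(50, dtype=np.uint32)
with open(path, "wb") as f:
    f.write(b"\x01" * 7)
    f.write(records.tobytes())
    f.write(b"\x02" * 3)

file_bytes = np.memmap(path, dtype=np.uint8, mode="r")  # raw byte view
data_bytes = file_bytes[7:file_bytes.size - 3]          # trim header and tail
data = data_bytes.view(np.uint32)                       # reinterpret as uint32
assert np.array_equal(data, records)
```

The resulting view may be unaligned, which numpy handles transparently (possibly at some speed cost).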
-- Francesc Alted From Andrea.Cimatoribus at nioz.nl Wed Mar 13 11:13:50 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 16:13:50 +0100 Subject: [Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: Ok, this seems to be working (well, as soon as I get the right offset and things like that, but that's a different story). The problem is that it does not go any faster than my initial function compiled with cython, and it is still a lot slower than fromfile. Is there a reason why, even with compiled code, reading from a file skipping some records should be slower than reading the whole file? ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Nathaniel Smith [njs at pobox.com] Inviato: mercoledì 13 marzo 2013 15.53 A: Discussion of Numerical Python Oggetto: Re: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks On Wed, Mar 13, 2013 at 2:46 PM, Andrea Cimatoribus wrote: >>Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. > > My mistake, sorry, even if the help says so, it seems that this is not the case in the actual code. Still, the problem with the size of the available data (which is not necessarily a multiple of dtype byte-size) remains. Worst case you can always work around such issues with an extra layer of view manipulation: # create a raw view onto the contents of the file file_bytes = np.memmap(path, dtype=np.uint8, ...) # cut out any arbitrary number of bytes from the beginning and end data_bytes = file_bytes[...some slice expression...] 
# switch to viewing the bytes as the proper data type data = data_bytes.view(dtype=np.uint32) # proceed as before -n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Mar 13 11:43:02 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 Mar 2013 15:43:02 +0000 Subject: [Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On 13 Mar 2013 15:16, "Andrea Cimatoribus" wrote: > > Ok, this seems to be working (well, as soon as I get the right offset and things like that, but that's a different story). > The problem is that it does not go any faster than my initial function compiled with cython, and it is still a lot slower than fromfile. Is there a reason why, even with compiled code, reading from a file skipping some records should be slower than reading the whole file? Oh, in that case you're probably IO bound, not CPU bound, so Cython etc. can't help. Traditional spinning-disk hard drives can read quite quickly, but take a long time to find the right place to read from and start reading. Your OS has heuristics in it to detect sequential reads and automatically start the setup for the next read while you're processing the previous read, so you don't see the seek overhead. If your reads are widely separated enough, these heuristics will get confused and you'll drop back to doing a new disk seek on every call to read(), which is deadly. (And would explain what you're seeing.) If this is what's going on, your best bet is to just write a python loop that uses fromfile() to read some largeish (megabytes?) chunk, subsample those and throw away the rest, and repeat. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaakko.luttinen at aalto.fi Wed Mar 13 11:46:56 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 13 Mar 2013 17:46:56 +0200 Subject: [Numpy-discussion] Performance of einsum? Message-ID: <51409F70.800@aalto.fi> Hi, I was wondering if someone could provide some intuition on the performance of einsum? I have found that sometimes it is extremely efficient but sometimes it is several orders of magnitude slower compared to some other approaches, for instance, using multiple dot-calls. My intuition is that the computation time of einsum is linear with respect to the size of the "index space", that is, the product of the maximums of the indices. So, for instance computing the dot product of three matrices A*B*C would not be efficient computed as einsum('ij,jk,kl->il', A, B, C) because there are four indices i=1,...,I, j=1,...,J, k=1,...,K and l=1,...,L so the total computation time is O(I*J*K*L) which is much worse than with two dot products O(I*J*K+I*K*L), or with two einsum-calls for Y=A*B and Y*C. On the other hand, computing einsum('ij,ij,ij->i', A, B, C) would be "efficient" because the computation time is only O(I*J). Is this intuition roughly correct or how could I intuitively understand when the usage of einsum is bad? Best regards, Jaakko From Andrea.Cimatoribus at nioz.nl Wed Mar 13 11:54:24 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 16:54:24 +0100 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: Thanks a lot for the feedback, I'll try to modify my function to overcome this issue. Since I'm in the process of buying new hardware too, a slight OT (but definitely related). Does an ssd provide substantial improvement in these cases? ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Nathaniel Smith [njs at pobox.com] Inviato: mercoledì 
13 marzo 2013 16.43 A: Discussion of Numerical Python Oggetto: Re: [Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks On 13 Mar 2013 15:16, "Andrea Cimatoribus" > wrote: > > Ok, this seems to be working (well, as soon as I get the right offset and things like that, but that's a different story). > The problem is that it does not go any faster than my initial function compiled with cython, and it is still a lot slower than fromfile. Is there a reason why, even with compiled code, reading from a file skipping some records should be slower than reading the whole file? Oh, in that case you're probably IO bound, not CPU bound, so Cython etc. can't help. Traditional spinning-disk hard drives can read quite quickly, but take a long time to find the right place to read from and start reading. Your OS has heuristics in it to detect sequential reads and automatically start the setup for the next read while you're processing the previous read, so you don't see the seek overhead. If your reads are widely separated enough, these heuristics will get confused and you'll drop back to doing a new disk seek on every call to read(), which is deadly. (And would explain what you're seeing.) If this is what's going on, your best bet is to just write a python loop that uses fromfile() to read some largeish (megabytes?) chunk, subsample those and throw away the rest, and repeat. -n From dineshbvadhia at hotmail.com Wed Mar 13 11:59:12 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Wed, 13 Mar 2013 08:59:12 -0700 Subject: [Numpy-discussion] (@Pat Marion) Re: Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Many thanks Pat - the numpy discussion list is brill. Go ahead and see if the CPython developers would be interested as it is a problem that appears all the time on boards/lists. Best ... 
Dinesh From: Pat Marion Sent: Tuesday, March 12, 2013 7:23 AM To: Aron Ahmadia Cc: Discussion of Numerical Python Subject: Re: [Numpy-discussion] (@Pat Marion) Re: Yes,this one again "ImportError: No module named multiarray" Thanks for copying me, Aron. Hi Dinesh, I have a github project which demonstrates how to use numpy with freeze. The project's readme includes more information: https://github.com/patmarion/NumpyBuiltinExample It does require a small patch to CPython's import.c file. I haven't tried posted this patch to the CPython developers, perhaps there'd be interest incorporating it upstream. Pat -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Wed Mar 13 12:08:22 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Wed, 13 Mar 2013 09:08:22 -0700 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Hi Chris Darn! It worked this morning and I don't know why. Focused on PyInstaller because it creates a single executable. Testing on all major versions of Windows (32-bit and 64-bit), Linux and OSX. The problem OS is unsurprisingly, Windows XP (SP3). Numpy was upgraded to the mkl-version and maybe that did the trick. Tried to replicate on an identical Windows XP machine using the standard sourceforge distribution but that resulted in a pyinstaller error. Anyway, using the latest releases of all software ie. Python 2.7.3, Numpy 1.7.0, Scipy 0.11.0, PyInstaller 2.0. Will post back if run into problems again. Best ... 
-------------------------------------------------- From: "Chris Barker - NOAA Federal" Sent: Tuesday, March 12, 2013 2:50 PM To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] Yes,this one again "ImportError: No module named multiarray" > On Tue, Mar 12, 2013 at 7:05 AM, Dinesh B Vadhia > wrote: >> Does that mean numpy won't work with freeze/create_executable type of >> tools >> or is there a workaround? > > I've used numpy with py2exe and py2app out of the box with no issues ( > actually, there is an issue with too much stuff getting bundled up, > but it works) > >>> ImportError let alone what the solution is. The Traceback, similar to >>> others found on the web, is: >>> >>> Traceback (most recent call last): >>> File "test.py", ... >>> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in >>> > > This indicates that your code is importing the numpy that's inside the > system installation -- it should be using one in your app bundle. > > What bundling tool are you using? > How did you install python/numpy? > What does your bundling tol config look like? > And, of course, version numbers of everything. > > -Chris > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > From rhattersley at gmail.com Wed Mar 13 12:53:07 2013 From: rhattersley at gmail.com (Richard Hattersley) Date: Wed, 13 Mar 2013 16:53:07 +0000 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: > Since the files are huge, and would make me run out of memory, I need to read data skipping some records Is it possible to describe what you're doing with the data once you have subsampled it? And if there were a way to work with the full resolution data, would that be desirable? 
I ask because I've been dabbling with a pure-Python library for handling larger-than-memory datasets - https://github.com/SciTools/biggus, and it uses similar chunking techniques as mentioned in the other replies to process data at the full streaming I/O rate. It's still in the early stages of development so the design can be fluid, so maybe it's worth seeing if there's enough in common with your needs to warrant adding your use case. Richard On 13 March 2013 13:45, Andrea Cimatoribus wrote: > Hi everybody, I hope this has not been discussed before, I couldn't find a > solution elsewhere. > I need to read some binary data, and I am using numpy.fromfile to do this. > Since the files are huge, and would make me run out of memory, I need to > read data skipping some records (I am reading data recorded at high > frequency, so basically I want to read subsampling). > At the moment, I came up with the code below, which is then compiled using > cython. Despite the significant performance increase from the pure python > version, the function is still much slower than numpy.fromfile, and only > reads one kind of data (in this case uint32), otherwise I do not know how > to define the array type in advance. I have basically no experience with > cython nor c, so I am a bit stuck. How can I try to make this more > efficient and possibly more generic? > Thanks > > import numpy as np > #For cython! 
> cimport numpy as np > from libc.stdint cimport uint32_t > > def cffskip32(fid, int count=1, int skip=0): > > cdef int k=0 > cdef np.ndarray[uint32_t, ndim=1] data = np.zeros(count, > dtype=np.uint32) > > if skip>=0: > while k<count: > try: > data[k] = np.fromfile(fid, count=1, dtype=np.uint32) > fid.seek(skip, 1) > k +=1 > except ValueError: > data = data[:k] > break > return data > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Wed Mar 13 13:41:19 2013 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Wed, 13 Mar 2013 18:41:19 +0100 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On 13 March 2013 16:54, Andrea Cimatoribus wrote: > Since I'm in the process of buying new hardware too, a slight OT (but > definitely related). > Does an ssd provide substantial improvement in these cases? > It should help. Nevertheless, when talking about performance, it is difficult to predict, mainly because in a computer there are many things going on and many layers involved. I have a couple of computers equipped with SSD; if you want, send me some benchmarks and I can run them to see if I get any speedup. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Wed Mar 13 14:40:28 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 13 Mar 2013 14:40:28 -0400 Subject: [Numpy-discussion] can't run cython on mtrand.pyx Message-ID: Grabbed numpy-1.7.0 source. Cython is 0.18 cython mtrand.pyx produces lots of errors. 
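Returning to the fromfile-subsampling thread: Nathaniel's suggestion of reading large sequential chunks and subsampling in memory could be sketched like this (chunk size, dtype, and step are illustrative, not taken from the original files):

```python
import os
import tempfile

import numpy as np

def fromfile_subsampled(path, dtype=np.uint32, step=20,
                        chunk_items=1000000, offset=0):
    """Keep every `step`-th record by streaming large sequential chunks.

    Sequential reads keep the OS read-ahead heuristics happy; the
    subsampling happens in memory on each chunk.
    """
    pieces = []
    pos = 0  # index of the next record to keep, relative to chunk start
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            chunk = np.fromfile(f, dtype=dtype, count=chunk_items)
            if chunk.size == 0:
                break
            pieces.append(chunk[pos::step])
            # carry the phase of the subsampling into the next chunk
            pos = (pos - chunk.size) % step
    return np.concatenate(pieces) if pieces else np.empty(0, dtype=dtype)

# Demo on a small synthetic file
path = os.path.join(tempfile.mkdtemp(), "big.bin")
data = np.arange(10000, dtype=np.uint32)
data.tofile(path)
sub = fromfile_subsampled(path, step=20, chunk_items=333)
assert np.array_equal(sub, data[::20])
```

The modulo bookkeeping keeps the subsampling phase correct even when the chunk size is not a multiple of the step.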
From robert.kern at gmail.com Wed Mar 13 15:01:51 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 13 Mar 2013 19:01:51 +0000 Subject: [Numpy-discussion] can't run cython on mtrand.pyx In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 6:40 PM, Neal Becker wrote: > Grabbed numpy-1.7.0 source. > Cython is 0.18 > > cython mtrand.pyx produces lots of errors. It helps to copy-and-paste the errors that you are seeing. In any case, Cython 0.18 works okay on master's mtrand.pyx sources. -- Robert Kern From ndbecker2 at gmail.com Wed Mar 13 15:20:30 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 13 Mar 2013 15:20:30 -0400 Subject: [Numpy-discussion] can't run cython on mtrand.pyx References: Message-ID: Robert Kern wrote: > On Wed, Mar 13, 2013 at 6:40 PM, Neal Becker wrote: >> Grabbed numpy-1.7.0 source. >> Cython is 0.18 >> >> cython mtrand.pyx produces lots of errors. > > It helps to copy-and-paste the errors that you are seeing. > > In any case, Cython 0.18 works okay on master's mtrand.pyx sources. > Well, this is the first error: cython mtrand.pyx Error compiling Cython file: ------------------------------------------------------------ ... PyArray_DIMS(oa) , NPY_DOUBLE) length = PyArray_SIZE(array) array_data = <double *>PyArray_DATA(array) itera = <flatiter>PyArray_IterNew(<object>oa) for i from 0 <= i < length: array_data[i] = func(state, (<double *>(itera.dataptr))[0]) ^ ------------------------------------------------------------ mtrand.pyx:177:41: Python objects cannot be cast to pointers of primitive types From charlesr.harris at gmail.com Wed Mar 13 19:40:00 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Mar 2013 17:40:00 -0600 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 9:54 AM, Andrea Cimatoribus < Andrea.Cimatoribus at nioz.nl> wrote: > Thanks a lot for the feedback, I'll try to modify my function to overcome > this issue. 
> Since I'm in the process of buying new hardware too, a slight OT (but > definitely related). > Does an ssd provide substantial improvement in these cases? > It should. Seek time on an ssd is quite low, and readout is fast. Skipping over items will probably not be as fast as a sequential read but I expect it will be substantially faster than a disk. Nathaniel's loop idea will probably work faster also. The sequential readout rate of a modern ssd will be about 500 MB/sec, so you can probably just divide that into your file size to get an estimate of the time needed. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Mar 13 20:50:20 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 13 Mar 2013 17:50:20 -0700 Subject: [Numpy-discussion] numpy reference array In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 6:56 AM, Matt U wrote: > Is it possible to create a numpy array which points to the same data in a > different numpy array (but in different order etc)? You can do this (easily), but only if the "different order" can be defined in terms of strides. A simple example is a transpose: In [3]: a = np.arange(12).reshape((3,4)) In [4]: a Out[4]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) In [5]: b = a.T In [6]: b Out[6]: array([[ 0, 4, 8], [ 1, 5, 9], [ 2, 6, 10], [ 3, 7, 11]]) # b is the transpose of a # but a view on the same data block: # change a: In [7]: a[2,1] = 44 In [8]: a Out[8]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 44, 10, 11]]) # b is changed, too. In [9]: b Out[9]: array([[ 0, 4, 8], [ 1, 5, 44], [ 2, 6, 10], [ 3, 7, 11]]) check out "stride tricks" for clever things you can do. But numpy does require that the data in your array be a contiguous block, in order, so you can't arbitrarily re-arrange it while keeping a view. HTH, -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From birdada85 at gmail.com Wed Mar 13 21:25:28 2013 From: birdada85 at gmail.com (Birdada Simret) Date: Thu, 14 Mar 2013 02:25:28 +0100 Subject: [Numpy-discussion] Any help from Numpy community? Message-ID: Any help from Numpy community [[ 0. 1.54 0. 0. 0. 1.08 1.08 1.08 ] [ 1.54 0. 1.08 1.08 1.08 0. 0. 0. ] [ 0. 1.08 0. 0. 0. 0. 0. 0. ] [ 0. 1.08 0. 0. 0. 0. 0. 0. ] [ 0. 1.08 0. 0. 0. 0. 0. 0. ] [ 1.08 0. 0. 0. 0. 0. 0. 0. ] [ 1.08 0. 0. 0. 0. 0. 0. 0. ] [ 1.08 0. 0. 0. 0. 0. 0. 0. ]] the above is the numpy array matrix. the numbers represent: C-C: 1.54 and C-H=1.08 So I want to write this form as C of index i is connected to C of index j C of index i is connected to H of index j (C(i),C(j)) # key C(i) and value C(j) (C(i),H(j)) # key C(i) and value H(j) ; the key C(i) can be repeated, once for each of its H(j) values To summarize, the output may look like: C1 is connected to C2 C1 is connected to H1 C1 is connected to H3 C2 is connected to H2 etc.... Any guide is greatly appreciated, thanks birda -------------- next part -------------- An HTML attachment was scrubbed... URL: From pat.marion at kitware.com Wed Mar 13 22:33:01 2013 From: pat.marion at kitware.com (Pat Marion) Date: Thu, 14 Mar 2013 12:33:01 +1000 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Glad you got it working! For those who might be interested, the distinction between the example I linked to and packaging tools like PyInstaller or py2exe, is that NumpyBuiltinExample uses static linking to embed numpy as a builtin module. At runtime, there is no dynamic loading, and there is no filesystem access. 
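For the bond-matrix question above, one sketch uses the upper triangle of the matrix and np.nonzero to list each bond once (the atom labels below are assumed, since the question doesn't fix a naming scheme):

```python
import numpy as np

# Bond matrix from the question: 1.54 marks a C-C bond, 1.08 a C-H bond
M = np.zeros((8, 8))
M[0, 1] = M[1, 0] = 1.54                  # C1-C2
for h in (5, 6, 7):                       # H atoms bonded to C1 (index 0)
    M[0, h] = M[h, 0] = 1.08
for h in (2, 3, 4):                       # H atoms bonded to C2 (index 1)
    M[1, h] = M[h, 1] = 1.08

# Assumed labels for indices 0..7 (hypothetical naming)
labels = ['C1', 'C2', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6']

i_idx, j_idx = np.nonzero(np.triu(M))     # upper triangle: each bond once
bonds = [(labels[i], labels[j]) for i, j in zip(i_idx, j_idx)]
for a, b in bonds:
    print(a, 'is connected to', b)
```

Using np.nonzero on the full (symmetric) matrix instead of np.triu would report every bond twice, once in each direction.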
The technique is targeted at HPC or embedded systems where you might want to avoid touching the filesystem, or avoid dynamic loading. Pat On Thu, Mar 14, 2013 at 2:08 AM, Dinesh B Vadhia wrote: > Hi Chris > Darn! It worked this morning and I don't know why. > > Focused on PyInstaller because it creates a single executable. Testing on > all major versions of Windows (32-bit and 64-bit), Linux and OSX. The > problem OS is unsurprisingly, Windows XP (SP3). > > Numpy was upgraded to the mkl-version and maybe that did the trick. Tried > to replicate on an identical Windows XP machine using the standard > sourceforge distribution but that resulted in a pyinstaller error. > > Anyway, using the latest releases of all software ie. Python 2.7.3, Numpy > 1.7.0, Scipy 0.11.0, PyInstaller 2.0. > > Will post back if run into problems again. Best ... > > > -------------------------------------------------- > From: "Chris Barker - NOAA Federal" > Sent: Tuesday, March 12, 2013 2:50 PM > To: "Discussion of Numerical Python" > Subject: Re: [Numpy-discussion] Yes,this one again "ImportError: No module > named multiarray" > > > On Tue, Mar 12, 2013 at 7:05 AM, Dinesh B Vadhia > > wrote: > >> Does that mean numpy won't work with freeze/create_executable type of > >> tools > >> or is there a workaround? > > > > I've used numpy with py2exe and py2app out of the box with no issues ( > > actually, there is an issue with too much stuff getting bundled up, > > but it works) > > > >>> ImportError let alone what the solution is. The Traceback, similar to > >>> others found on the web, is: > >>> > >>> Traceback (most recent call last): > >>> File "test.py", ... > >>> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in > >>> > > > > This indicates that your code is importing the numpy that's inside the > > system installation -- it should be using one in your app bundle. > > > > What bundling tool are you using? > > How did you install python/numpy? 
> > What does your bundling tol config look like? > > And, of course, version numbers of everything. > > > > -Chris > > > > -- > > > > Christopher Barker, Ph.D. > > Oceanographer > > > > Emergency Response Division > > NOAA/NOS/OR&R (206) 526-6959 voice > > 7600 Sand Point Way NE (206) 526-6329 fax > > Seattle, WA 98115 (206) 526-6317 main reception > > > > Chris.Barker at noaa.gov > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Andrea.Cimatoribus at nioz.nl Thu Mar 14 04:48:08 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Thu, 14 Mar 2013 09:48:08 +0100 Subject: [Numpy-discussion] R: R: R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: Thanks for all the feedback (on the SSD too). As for the "biggus" library for working on larger-than-memory arrays, this is really interesting, but unfortunately I don't have time to test it at the moment; I will try to have a look at it in the future. I hope to see something like that implemented in numpy soon, though. From sudheer.joseph at yahoo.com Thu Mar 14 05:18:01 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Thu, 14 Mar 2013 17:18:01 +0800 (SGT) Subject: [Numpy-discussion] Numpy correlate In-Reply-To: References: , Message-ID: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> Dear Numpy/Scipy experts, Attached is a script which I made to test numpy.correlate (which is called by plt.xcorr) to see how the cross correlation is calculated. From this it appears that if I call plt.xcorr(x,y), y is slid back in time compared to x, i.e. if y is a process that causes a delayed response in x after 5 timesteps then there should be a high correlation at Lag 5. 
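The lag convention can be checked with a small self-contained experiment (synthetic series; the sizes and seed are arbitrary):

```python
import numpy as np

n = 200
rng = np.random.RandomState(0)
y = rng.randn(n)
x = np.roll(y, 5)   # x responds to y with a 5-step delay
x[:5] = 0.0

# Full cross-correlation of the demeaned series, and its lag axis
c = np.correlate(x - x.mean(), y - y.mean(), mode='full')
lags = np.arange(-(n - 1), n)
peak_lag = lags[np.argmax(c)]
assert peak_lag == 5   # peak at +5: x is delayed with respect to y
```

So with np.correlate(x, y) a positive peak lag means the first argument lags (responds after) the second.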
However in the attached plot the response is seen only on the -ve side of the lags. Can anyone advise me on how to see which way exactly the two series are slid back or forth, and so understand the cause-effect relation better? (I understand that by correlation alone one cannot assume a cause-effect relation, but it is important to know which series is older in time at a given lag.) with best regards, Sudheer *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** -------------- next part -------------- A non-text attachment was scrubbed... Name: plt_xcorr.py Type: text/x-python Size: 428 bytes Desc: not available URL:
From robert.kern at gmail.com Thu Mar 14 06:19:06 2013 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 14 Mar 2013 10:19:06 +0000 Subject: [Numpy-discussion] can't run cython on mtrand.pyx In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 7:20 PM, Neal Becker wrote: > Robert Kern wrote: > >> On Wed, Mar 13, 2013 at 6:40 PM, Neal Becker wrote: >>> Grabbed numpy-1.7.0 source. >>> Cython is 0.18 >>> >>> cython mtrand.pyx produces lots of errors. >> >> It helps to copy-and-paste the errors that you are seeing. >> >> In any case, Cython 0.18 works okay on master's mtrand.pyx sources. >> > > Well, this is the first error: > > cython mtrand.pyx > > Error compiling Cython file: > ------------------------------------------------------------ > ...
> PyArray_DIMS(oa) , NPY_DOUBLE)
> length = PyArray_SIZE(array)
> array_data = <double *>PyArray_DATA(array)
> itera = <flatiter>PyArray_IterNew(<object>oa)
> for i from 0 <= i < length:
>     array_data[i] = func(state, (<double *>(itera.dataptr))[0])
>                                               ^
> ------------------------------------------------------------
>
> mtrand.pyx:177:41: Python objects cannot be cast to pointers of primitive types

It looks like Cython 0.18 removed the members of flatiter in its copy of numpy.pxd in favor of the macros that are recommended for numpy 1.7. The irony is not lost on me. This should be (<double *>PyArray_ITER_DATA(itera))[0]. I'm not sure why it appears to work in master, since this code in mtrand.pyx did not change. https://github.com/numpy/numpy/issues/3144 -- Robert Kern
From robert.kern at gmail.com Thu Mar 14 06:24:41 2013 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 14 Mar 2013 10:24:41 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: > I guess I talked to you about 100 years ago about sharing state between numpy > rng and code I have in c++ that wraps boost::random. So is there a C-api for > this RandomState object I could use to call from c++? Maybe I could do > something with that. There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule. -- Robert Kern
From ndbecker2 at gmail.com Thu Mar 14 06:54:18 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 14 Mar 2013 06:54:18 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit?
References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Robert Kern wrote: > On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: >> I guess I talked to you about 100 years ago about sharing state between numpy >> rng and code I have in c++ that wraps boost::random. So is there a C-api for >> this RandomState object I could use to call from c++? Maybe I could do >> something with that. > > There is not one currently. Cython has provisions for sharing such > low-level access to other Cython extensions, but I'm not sure how well > it works for exporting data pointers and function pointers to general > C/++ code. We could probably package the necessities into a struct and > export a pointer to it via a PyCapsule. > I did find a way to do this, and the results are good enough. Timing is quite comparable to my pure c++ implementation. I used rk_ulong from mtrand.so. I also tried using rk_fill, but it was a bit slower. The boost::python c++ code is attached, for posterity. -------------- next part -------------- A non-text attachment was scrubbed... Name: pn64.cc Type: text/x-c++src Size: 7382 bytes Desc: not available URL: From ndbecker2 at gmail.com Thu Mar 14 07:00:39 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 14 Mar 2013 07:00:39 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Robert Kern wrote: > On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: >> I guess I talked to you about 100 years ago about sharing state between numpy >> rng and code I have in c++ that wraps boost::random. So is there a C-api for >> this RandomState object I could use to call from c++? Maybe I could do >> something with that. > > There is not one currently. Cython has provisions for sharing such > low-level access to other Cython extensions, but I'm not sure how well > it works for exporting data pointers and function pointers to general > C/++ code. 
We could probably package the necessities into a struct and > export a pointer to it via a PyCapsule. > One thing this code doesn't do: it requires construction of the wrapper class passing in a RandomState object. It doesn't verify you actually gave it a RandomState object. It's hard to do that. The problem as I see it is to perform this check, I need the RandomStateType object, which unfortunately mtrand.so does not export. The only way to do it is in c++ code: 1. import numpy.random 2. get RandomState class 3. call it to create RandomState instance 4. get the ob_type pointer. Pretty ugly: object mod = object (handle<> (borrowed((PyImport_ImportModule("numpy.random"))))); object rs_obj = mod.attr("RandomState"); object rs_inst = call (rs_obj.ptr(), 0); RandomStateTypeObj = rs_inst.ptr()->ob_type; From robert.kern at gmail.com Thu Mar 14 07:14:32 2013 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 14 Mar 2013 11:14:32 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker wrote: > Robert Kern wrote: > >> On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: >>> I guess I talked to you about 100 years ago about sharing state between numpy >>> rng and code I have in c++ that wraps boost::random. So is there a C-api for >>> this RandomState object I could use to call from c++? Maybe I could do >>> something with that. >> >> There is not one currently. Cython has provisions for sharing such >> low-level access to other Cython extensions, but I'm not sure how well >> it works for exporting data pointers and function pointers to general >> C/++ code. We could probably package the necessities into a struct and >> export a pointer to it via a PyCapsule. >> > > One thing this code doesn't do: it requires construction of the wrapper class > passing in a RandomState object. 
It doesn't verify you actually gave it a > RandomState object. It's hard to do that. The problem as I see it is to > perform this check, I need the RandomStateType object, which unfortunately > mtrand.so does not export. > > The only way to do it is in c++ code: > > 1. import numpy.random > 2. get RandomState class > 3. call it to create RandomState instance > 4. get the ob_type pointer. > > Pretty ugly: > > object mod = object (handle<> > (borrowed((PyImport_ImportModule("numpy.random"))))); > object rs_obj = mod.attr("RandomState"); > object rs_inst = call (rs_obj.ptr(), 0); > RandomStateTypeObj = rs_inst.ptr()->ob_type; PyObject_IsInstance() should be sufficient. http://docs.python.org/2/c-api/object.html#PyObject_IsInstance -- Robert Kern From jaakko.luttinen at aalto.fi Thu Mar 14 07:54:06 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 14 Mar 2013 13:54:06 +0200 Subject: [Numpy-discussion] Dot/inner products with broadcasting? In-Reply-To: <51408B59.8090504@aalto.fi> References: <51408B59.8090504@aalto.fi> Message-ID: <5141BA5E.2020704@aalto.fi> Answering to myself, this pull request seems to implement an inner product with broadcasting (inner1d) and many other useful functions: https://github.com/numpy/numpy/pull/2954/ -J On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: > Hi! > > How can I compute dot product (or similar multiply&sum operations) > efficiently so that broadcasting is utilized? > For multi-dimensional arrays, NumPy's inner and dot functions do not > match the leading axes and use broadcasting, but instead the result has > first the leading axes of the first input array and then the leading > axes of the second input array. 
> > For instance, I would like to compute the following inner-product: > np.sum(A*B, axis=-1) > > But numpy.inner gives: > A = np.random.randn(2,3,4) > B = np.random.randn(3,4) > np.inner(A,B).shape > # -> (2, 3, 3) instead of (2, 3) > > Similarly for dot product, I would like to compute for instance: > np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) > > But numpy.dot gives: > In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) > In [13]: np.dot(A,B).shape > # -> (2, 3, 2, 5) instead of (2, 3, 5) > > I could use einsum for these operations, but I'm not sure whether that's > as efficient as using some BLAS-supported(?) dot products. > > I couldn't find any function which could perform this kind of > operations. NumPy's functions seem to either flatten the input arrays > (vdot, outer) or just use the axes of the input arrays separately (dot, > inner, tensordot). > > Any help? > > Best regards, > Jaakko > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion >
From rnelsonchem at gmail.com Thu Mar 14 09:05:17 2013 From: rnelsonchem at gmail.com (Ryan) Date: Thu, 14 Mar 2013 13:05:17 +0000 (UTC) Subject: [Numpy-discussion] Any help from Numpy community? References: Message-ID: Birdada Simret gmail.com> writes:
>
> Any help from Numpy community
> [[ 0.    1.54  0.    0.    0.    1.08  1.08  1.08 ]
>  [ 1.54  0.    1.08  1.08  1.08  0.    0.    0.   ]
>  [ 0.    1.08  0.    0.    0.    0.    0.    0.   ]
>  [ 0.    1.08  0.    0.    0.    0.    0.    0.   ]
>  [ 0.    1.08  0.    0.    0.    0.    0.    0.   ]
>  [ 1.08  0.    0.    0.    0.    0.    0.    0.   ]
>  [ 1.08  0.    0.    0.    0.    0.    0.    0.   ]
>  [ 1.08  0.    0.    0.    0.    0.    0.    0.   ]]
>
> The above is the numpy array matrix; the numbers represent bond lengths: C-C: 1.54 and C-H: 1.08.
> So I want to write this in the form:
> C of index i is connected to C of index j
> C of index i is connected to H of index j
>
> (C(i),C(j)) # key C(i) and value C(j)
> (C(i),H(j)) # key C(i) and value H(j); the key C(i) can be repeated for as many values H(j) as needed
> To summarize, the output may look like:
>
> C1 is connected to C2
> C1 is connected to H1
> C1 is connected to H3
> C2 is connected to H2 etc....
>
> Any guide is greatly appreciated,
> thanks
> birda
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Birda, I think this will get you some of the way there:

import numpy as np
x = ... # Here's your 2D atomic distance array
# Create an indexing array
index = np.arange( x.size ).reshape( x.shape )
# Find the non-zero indices
items = index[ x != 0 ]
# You only need the first half because your array is symmetric
items = items[ : items.size/2]
rows = items / x.shape[0]
cols = items % x.shape[0]
print 'Rows: ', rows
print 'Columns:', cols
print 'Atomic Distances:', x[rows, cols]

Hope it helps. Ryan
From ndbecker2 at gmail.com Thu Mar 14 09:25:53 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 14 Mar 2013 09:25:53 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit?
References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Robert Kern wrote: > On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker wrote: >> Robert Kern wrote: >> >>> On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: >>>> I guess I talked to you about 100 years ago about sharing state between >>>> numpy >>>> rng and code I have in c++ that wraps boost::random. So is there a C-api >>>> for >>>> this RandomState object I could use to call from c++? Maybe I could do >>>> something with that. >>> >>> There is not one currently. Cython has provisions for sharing such >>> low-level access to other Cython extensions, but I'm not sure how well >>> it works for exporting data pointers and function pointers to general >>> C/++ code. We could probably package the necessities into a struct and >>> export a pointer to it via a PyCapsule. >>> >> >> One thing this code doesn't do: it requires construction of the wrapper class >> passing in a RandomState object. It doesn't verify you actually gave it a >> RandomState object. It's hard to do that. The problem as I see it is to >> perform this check, I need the RandomStateType object, which unfortunately >> mtrand.so does not export. >> >> The only way to do it is in c++ code: >> >> 1. import numpy.random >> 2. get RandomState class >> 3. call it to create RandomState instance >> 4. get the ob_type pointer. >> >> Pretty ugly: >> >> object mod = object (handle<> >> (borrowed((PyImport_ImportModule("numpy.random"))))); >> object rs_obj = mod.attr("RandomState"); >> object rs_inst = call (rs_obj.ptr(), 0); >> RandomStateTypeObj = rs_inst.ptr()->ob_type; > > PyObject_IsInstance() should be sufficient. > > http://docs.python.org/2/c-api/object.html#PyObject_IsInstance > > -- > Robert Kern Thanks! For the record, an updated version attached. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: pn.cc Type: text/x-c++src Size: 7852 bytes Desc: not available URL: From rnelsonchem at gmail.com Thu Mar 14 09:26:32 2013 From: rnelsonchem at gmail.com (Ryan) Date: Thu, 14 Mar 2013 13:26:32 +0000 (UTC) Subject: [Numpy-discussion] Any help from Numpy community? References: Message-ID: > > Birda, > > I think this will get you some of the way there: > > import numpy as np > x = ... # Here's your 2D atomic distance array > # Create an indexing array > index = np.arange( x.size ).reshape( x.shape ) > # Find the non-zero indices > items = index[ x != 0 ] > # You only need the first half because your array is symmetric > items = items[ : items.size/2] > rows = items / x.shape[0] > cols = items % x.shape[0] > print 'Rows: ', rows > print 'Columns:', cols > print 'Atomic Distances:', x[rows, cols] > > Hope it helps. > > Ryan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Whoops. That doesn't quite work. You shouldn't drop half the items array like that. This will work better (maybe ?): import numpy as np x = ... # Here's your 2D atomic distance array index = np.arange( x.size ).reshape( x.shape ) items = index[ x != 0 ] rows = items / x.shape[0] cols = items % x.shape[0] # This index mask should take better advantage of the array symmetry mask = rows < cols print 'Rows: ', rows[mask] print 'Columns:', cols[mask] print 'Atomic Distances:', x[rows[mask], cols[mask]] Ryan From birdada85 at gmail.com Thu Mar 14 10:51:15 2013 From: birdada85 at gmail.com (Birdada Simret) Date: Thu, 14 Mar 2013 15:51:15 +0100 Subject: [Numpy-discussion] Any help from Numpy community? In-Reply-To: References: Message-ID: Hi Ryan,Thank you very much indeed, I'm not sure if I well understood your code, let say, for the example array matrix given represents H3C-CH3 connection(bonding). the result from your code is: Rows: [0 0 0 0 1 1 1] # is these for C indices? 
Columns: [1 2 3 4 5 6 7] # are these the H indices? but shouldn't there be only 6 H's? Atomic Distances: [ 1. 1. 1. 1. 1. 1. 1.] # of course this is the number of connections or bonds. In fact, if I write it in the form of a dictionary, row indices as keys and column indices as values: {0:1, 0:2, 0:3, 0:4, 1:5, 1:6, 1:7}. So, does it mean C[0] is connected to H[1], C[0] is connected to H[2], ..., C[1] is connected to H[7]? But I have only 6 H's and two C's in this example (H3C-CH3). I have tried something like the following, but still no luck ;(

import numpy as np
from collections import defaultdict
dict = defaultdict(list)
x = ....2d numpy array
I = x.shape[0]
J = x.shape[1]
d = {}
for i in xrange(0, I, 1):
    for j in xrange(0, J, 1):
        if x[i,j] > 0:
            dict[i].append(j)

# the result is:
dict: {0: [1, 2, 3, 4], 1: [0, 5, 6, 7], 2: [0], 3: [0], 4: [0], 5: [1], 6: [1], 7: [1]}
keys: [0, 1, 2, 3, 4, 5, 6, 7]
values: [[1, 2, 3, 4], [0, 5, 6, 7], [0], [0], [0], [1], [1], [1]]

# The H indices can be found by
H_rows = np.nonzero(x.sum(axis=1) == 1)
result => H_rows: [2, 3, 4, 5, 6, 7] # six H's

I am trying to connect these indices with the dict result but I am confused! So, now I want to produce a dictionary or whatever to produce results like: H[2] is connected to C[?] H[3] is connected to C[?] H[4] is connected to C[?], ..... Thanks for any help.

On Thu, Mar 14, 2013 at 2:26 PM, Ryan wrote:
> Birda,
> I think this will get you some of the way there:
>
> import numpy as np
> x = ... # Here's your 2D atomic distance array
> # Create an indexing array
> index = np.arange( x.size ).reshape( x.shape )
> # Find the non-zero indices
> items = index[ x != 0 ]
> # You only need the first half because your array is symmetric
> items = items[ : items.size/2]
> rows = items / x.shape[0]
> cols = items % x.shape[0]
> print 'Rows: ', rows
> print 'Columns:', cols
> print 'Atomic Distances:', x[rows, cols]
>
> Hope it helps.
> > > > Ryan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > Whoops. > That doesn't quite work. You shouldn't drop half the items array like that. > This will work better (maybe ?): > > import numpy as np > x = ... # Here's your 2D atomic distance array > index = np.arange( x.size ).reshape( x.shape ) > items = index[ x != 0 ] > rows = items / x.shape[0] > cols = items % x.shape[0] > # This index mask should take better advantage of the array symmetry > mask = rows < cols > print 'Rows: ', rows[mask] > print 'Columns:', cols[mask] > print 'Atomic Distances:', x[rows[mask], cols[mask]] > > Ryan > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Mar 14 11:40:53 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 14 Mar 2013 08:40:53 -0700 Subject: [Numpy-discussion] R: R: R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Thu, Mar 14, 2013 at 1:48 AM, Andrea Cimatoribus wrote: > Thanks for all the feedback (on the SSD too). For what concerns "biggus" library, for working on larger-than-memory arrays, this is really interesting, but unfortunately I don't have time to test it at the moment, I will try to have a look at it in the future. I hope to see something like that implemented in numpy soon, though. You may also want to look at carray: https://github.com/FrancescAlted/carray I"ve never used it, but it stores the contents of the array in a compressed from in memory, so if you data compresses well, then it could be a slick solution. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From rnelsonchem at gmail.com Thu Mar 14 12:03:44 2013 From: rnelsonchem at gmail.com (Ryan) Date: Thu, 14 Mar 2013 16:03:44 +0000 (UTC) Subject: [Numpy-discussion] Any help from Numpy community? References: Message-ID: Birdada Simret gmail.com> writes: > > > > Hi Ryan,Thank you very much indeed, I'm not sure if I well understood your code, let say, for the example array matrix given represents ?H3C-CH3 connection(bonding). > the result from your code is: > Rows: ? ?[0 0 0 0 1 1 1] ?# is these for C indices? > Columns: [1 2 3 4 5 6 7] ? # is these for H indices? but it shouldn't be 6 H's? > Atomic Distances: [ 1. ?1. ?1. ?1. ?1. ?1. ?1.] # ofcourse this is the number of connections or bonds. > > In fact, if I write in the form of dictionary: row indices as keys and column indices as values, > {0:1, 0:2, 0:3, 0:4, 1:5, 1:6, 1:7}, So, does it mean C[0] is connected to H[1], C[0] is connected to H[2] , H[1],....,C[1] is connected to H[7]? ?But I have only 6 H's and two C's ?in this example (H3C-CH3)? > > I have tried some thing like: but still no luck ;( > import numpy as np > from collections import defaultdict? > dict = defaultdict(list) > x=....2d numpy array > > I = x.shape[0] > J = x.shape[1] > d={} > for i in xrange(0, I, 1):? > ? for j in xrange(0, J, 1): > ? ? ?if x[i,j] > 0: > ? ? ? ? dict[i].append(j)? > # the result is: > dict: ?{0: [1, 2, 3, 4], 1: [0, 5, 6, 7], 2: [0], 3: [0], 4: [0], 5: [1], 6: [1], 7: [1]}) > keys: [0, 1, 2, 3, 4, 5, 6, 7] > values: ?[[1, 2, 3, 4], [0, 5, 6, 7], [0], [0], [0], [1], [1], [1]] > > ? > #The H indices can be found by > ?H_rows = np.nonzero(x.sum(axis=1)== 1) ? > result=>H_rows : [2, 3, 4, 5, 6, 7] ?# six H's > I am trying to connect this indices with the dict result but I am confused! 
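As an aside (a sketch added here, not from the original thread), the same bond list can be pulled out with less index arithmetic: np.triu keeps only the upper triangle of the symmetric distance matrix, so each bond is counted once, and np.nonzero then returns the row/column index pairs directly.

```python
import numpy as np

x = np.array(
    [[0.,   1.54, 0.,   0.,   0.,   1.08, 1.08, 1.08],
     [1.54, 0.,   1.08, 1.08, 1.08, 0.,   0.,   0.  ],
     [0.,   1.08, 0.,   0.,   0.,   0.,   0.,   0.  ],
     [0.,   1.08, 0.,   0.,   0.,   0.,   0.,   0.  ],
     [0.,   1.08, 0.,   0.,   0.,   0.,   0.,   0.  ],
     [1.08, 0.,   0.,   0.,   0.,   0.,   0.,   0.  ],
     [1.08, 0.,   0.,   0.,   0.,   0.,   0.,   0.  ],
     [1.08, 0.,   0.,   0.,   0.,   0.,   0.,   0.  ]])
atoms = np.array(['C1', 'C2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8'])

# zero out everything below the diagonal, then read off the non-zero entries
rows, cols = np.nonzero(np.triu(x))
for r, c in zip(rows, cols):
    print('%s is connected to %s (%.2f)' % (atoms[r], atoms[c], x[r, c]))
```

This prints the same seven bonds as the fancy-indexing version, starting with "C1 is connected to C2 (1.54)".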
> So, now I want to produce a dictionary or what ever to produce results as: ?H[2] is connected to C[?] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? H[3] is connected to C[?] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? H[4] is connected to C[?], ..... > Thanks for any help > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?. > > > On Thu, Mar 14, 2013 at 2:26 PM, Ryan gmail.com> wrote: > > Birda, I don't know how your getting those values from my code. Here's a slightly modified and fully self-contained version that includes your bonding matrix: import numpy as np x = np.array( [[ 0., 1.54, 0., 0., 0., 1.08, 1.08, 1.08 ], [ 1.54, 0., 1.08, 1.08, 1.08, 0., 0., 0. ], [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], [ 1.08, 0., 0., 0., 0., 0., 0., 0. ], [ 1.08, 0., 0., 0., 0., 0., 0., 0. ], [ 1.08, 0., 0., 0., 0., 0., 0., 0. ]] ) atoms = np.array(['C1', 'C2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8']) index = np.arange( x.size ).reshape( x.shape ) items = index[ x != 0 ] rows = items / x.shape[0] cols = items % x.shape[0] mask = rows < cols print 'Rows: ', rows[mask] print 'Columns:', cols[mask] print 'Bond Atom 1: ', atoms[ rows[mask] ] print 'Bond Atom 2: ', atoms[ cols[mask] ] print 'Atomic Distances:', x[rows[mask], cols[mask]] If I copy that into a file and run it, I get the following output: Rows: [0 0 0 0 1 1 1] Columns: [1 5 6 7 2 3 4] Bond Atom 1: ['C1' 'C1' 'C1' 'C1' 'C2' 'C2' 'C2'] Bond Atom 2: ['C2' 'H6' 'H7' 'H8' 'H3' 'H4' 'H5'] Atomic Distances: [ 1.54 1.08 1.08 1.08 1.08 1.08 1.08] Honestly, I did not think about your code all that much. Too many 'for' loops for my taste. My code has quite a bit of fancy indexing, which I could imagine is also quite confusing. 
If you really want a dictionary type of interface that still let's you use Numpy magic, I would take a look at Pandas (http://pandas.pydata.org/) Ryan From birdada85 at gmail.com Thu Mar 14 12:50:34 2013 From: birdada85 at gmail.com (Birdada Simret) Date: Thu, 14 Mar 2013 17:50:34 +0100 Subject: [Numpy-discussion] Any help from Numpy community? In-Reply-To: References: Message-ID: Oh, thanks alot. can the " atoms = np.array(['C1', 'C2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8'])" able to make general? I mean, if I have a big molecule, it seems difficult to label each time. Ofcourse I'm new to python(even for programing) and I didn't had any knowhow about pandas, but i will try it. any ways, it is great help, many thanks Ryan Birda On Thu, Mar 14, 2013 at 5:03 PM, Ryan wrote: > Birdada Simret gmail.com> writes: > > > > > > > > > Hi Ryan,Thank you very much indeed, I'm not sure if I well understood > your > code, let say, for the example array matrix given represents H3C-CH3 > connection(bonding). > > the result from your code is: > > Rows: [0 0 0 0 1 1 1] # is these for C indices? > > Columns: [1 2 3 4 5 6 7] # is these for H indices? but it shouldn't be > 6 H's? > > Atomic Distances: [ 1. 1. 1. 1. 1. 1. 1.] # ofcourse this is the > number > of connections or bonds. > > > > In fact, if I write in the form of dictionary: row indices as keys and > column > indices as values, > > {0:1, 0:2, 0:3, 0:4, 1:5, 1:6, 1:7}, So, does it mean C[0] is connected > to > H[1], C[0] is connected to H[2] , H[1],....,C[1] is connected to H[7]? 
> But I > have only 6 H's and two C's in this example (H3C-CH3) > > > > I have tried some thing like: but still no luck ;( > > import numpy as np > > from collections import defaultdict > > dict = defaultdict(list) > > x=....2d numpy array > > > > I = x.shape[0] > > J = x.shape[1] > > d={} > > for i in xrange(0, I, 1): > > for j in xrange(0, J, 1): > > if x[i,j] > 0: > > dict[i].append(j) > > # the result is: > > dict: {0: [1, 2, 3, 4], 1: [0, 5, 6, 7], 2: [0], 3: [0], 4: [0], 5: > [1], 6: > [1], 7: [1]}) > > keys: [0, 1, 2, 3, 4, 5, 6, 7] > > values: [[1, 2, 3, 4], [0, 5, 6, 7], [0], [0], [0], [1], [1], [1]] > > > > > > #The H indices can be found by > > H_rows = np.nonzero(x.sum(axis=1)== 1) > > result=>H_rows : [2, 3, 4, 5, 6, 7] # six H's > > I am trying to connect this indices with the dict result but I am > confused! > > So, now I want to produce a dictionary or what ever to produce results > as: > H[2] is connected to C[?] > > > > H[3] is connected to C[?] > > > > H[4] is connected to C[?], ..... > > Thanks for any help > > . > > > > > > On Thu, Mar 14, 2013 at 2:26 PM, Ryan gmail.com> > wrote: > > > > > > Birda, > > I don't know how your getting those values from my code. Here's a slightly > modified and fully self-contained version that includes your bonding > matrix: > > import numpy as np > x = np.array( > [[ 0., 1.54, 0., 0., 0., 1.08, 1.08, 1.08 ], > [ 1.54, 0., 1.08, 1.08, 1.08, 0., 0., 0. ], > [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], > [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], > [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], > [ 1.08, 0., 0., 0., 0., 0., 0., 0. ], > [ 1.08, 0., 0., 0., 0., 0., 0., 0. ], > [ 1.08, 0., 0., 0., 0., 0., 0., 0. 
]] > ) > atoms = np.array(['C1', 'C2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8']) > index = np.arange( x.size ).reshape( x.shape ) > items = index[ x != 0 ] > rows = items / x.shape[0] > cols = items % x.shape[0] > mask = rows < cols > print 'Rows: ', rows[mask] > print 'Columns:', cols[mask] > print 'Bond Atom 1: ', atoms[ rows[mask] ] > print 'Bond Atom 2: ', atoms[ cols[mask] ] > print 'Atomic Distances:', x[rows[mask], cols[mask]] > > If I copy that into a file and run it, I get the following output: > > Rows: [0 0 0 0 1 1 1] > Columns: [1 5 6 7 2 3 4] > Bond Atom 1: ['C1' 'C1' 'C1' 'C1' 'C2' 'C2' 'C2'] > Bond Atom 2: ['C2' 'H6' 'H7' 'H8' 'H3' 'H4' 'H5'] > Atomic Distances: [ 1.54 1.08 1.08 1.08 1.08 1.08 1.08] > > Honestly, I did not think about your code all that much. Too many 'for' > loops > for my taste. My code has quite a bit of fancy indexing, which I could > imagine > is also quite confusing. > > If you really want a dictionary type of interface that still let's you use > Numpy > magic, I would take a look at Pandas (http://pandas.pydata.org/) > > Ryan > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ake.sandgren at hpc2n.umu.se Fri Mar 15 05:19:45 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Fri, 15 Mar 2013 10:19:45 +0100 Subject: [Numpy-discussion] Possible bug in numpy 1.6.1 Message-ID: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> Hi! 
Found this thing that looks like a bug in core/src/multiarray/dtype_transfer.c diff -ru site/numpy/core/src/multiarray/dtype_transfer.c amd64_ubuntu1004-intel-acml/numpy/core/src/multiarray/dtype_transfer.c --- site/numpy/core/src/multiarray/dtype_transfer.c 2011-07-20 20:25:28.000000000 +0200 +++ amd64_ubuntu1004-intel-acml/numpy/core/src/multiarray/dtype_transfer.c 2013-03-14 22:09:38.000000000 +0100 @@ -1064,7 +1064,7 @@ _one_to_n_data *d = (_one_to_n_data *)data; PyArray_StridedTransferFn *subtransfer = d->stransfer, *stransfer_finish_src = d->stransfer_finish_src; - void *subdata = d->data, *data_finish_src = data_finish_src; + void *subdata = d->data, *data_finish_src = d->data_finish_src; npy_intp subN = d->N, dst_itemsize = d->dst_itemsize; while (N > 0) { -- Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden Internet: ake at hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126 Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se From njs at pobox.com Fri Mar 15 05:44:30 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 15 Mar 2013 09:44:30 +0000 Subject: [Numpy-discussion] Possible bug in numpy 1.6.1 In-Reply-To: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> References: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> Message-ID: That does look unlikely yeah... Does this have any consequences that you've found? Is there a test case that fails before the patch but works after? -n On 15 Mar 2013 09:19, "Ake Sandgren" wrote: > Hi! 
> > Found this thing that looks like a bug in > core/src/multiarray/dtype_transfer.c > > diff -ru site/numpy/core/src/multiarray/dtype_transfer.c > amd64_ubuntu1004-intel-acml/numpy/core/src/multiarray/dtype_transfer.c > --- site/numpy/core/src/multiarray/dtype_transfer.c 2011-07-20 > 20:25:28.000000000 +0200 > +++ > amd64_ubuntu1004-intel-acml/numpy/core/src/multiarray/dtype_transfer.c > 2013-03-14 22:09:38.000000000 +0100 > @@ -1064,7 +1064,7 @@ > _one_to_n_data *d = (_one_to_n_data *)data; > PyArray_StridedTransferFn *subtransfer = d->stransfer, > *stransfer_finish_src = d->stransfer_finish_src; > - void *subdata = d->data, *data_finish_src = data_finish_src; > + void *subdata = d->data, *data_finish_src = d->data_finish_src; > npy_intp subN = d->N, dst_itemsize = d->dst_itemsize; > > while (N > 0) { > > > -- > Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden > Internet: ake at hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126 > Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ake.sandgren at hpc2n.umu.se Fri Mar 15 05:52:45 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Fri, 15 Mar 2013 10:52:45 +0100 Subject: [Numpy-discussion] Possible bug in numpy 1.6.1 In-Reply-To: References: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> Message-ID: <1363341165.25361.20.camel@lurvas.hpc2n.umu.se> On Fri, 2013-03-15 at 09:44 +0000, Nathaniel Smith wrote: > That does look unlikely yeah... Does this have any consequences that > you've found? Is there a test case that fails before the patch but > works after? No, just found it during compilation with the intel compiler. It complained about use before initialize on it. 
And it's still there in 1.7.0
From tmp50 at ukr.net Fri Mar 15 09:21:21 2013 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 15 Mar 2013 15:21:21 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 Message-ID: <1991.1363353681.6942528722304958464@ffe16.ukr.net> Hi all, I'm glad to inform you about the new OpenOpt Suite release 0.45 (2013-March-15):
* Essential improvements for FuncDesigner interval analysis (thus affecting interalg)
* Temporary workaround for a serious bug in the FuncDesigner automatic differentiation kernel, due to a bug in some versions of Python or NumPy; may affect optimization problems, including (MI)LP, (MI)NLP, TSP etc.
* Some other minor bugfixes and improvements
--------------------------- Regards, D. http://openopt.org/Dmitrey -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.villellas at continuum.io Fri Mar 15 10:22:21 2013 From: oscar.villellas at continuum.io (Oscar Villellas) Date: Fri, 15 Mar 2013 15:22:21 +0100 Subject: [Numpy-discussion] Dot/inner products with broadcasting?
In-Reply-To: <5141BA5E.2020704@aalto.fi> References: <51408B59.8090504@aalto.fi> <5141BA5E.2020704@aalto.fi> Message-ID: In fact, there is already an inner1d implemented in numpy.core.umath_tests.inner1d from numpy.core.umath_tests import inner1d It should do the trick :) On Thu, Mar 14, 2013 at 12:54 PM, Jaakko Luttinen wrote: > Answering to myself, this pull request seems to implement an inner > product with broadcasting (inner1d) and many other useful functions: > https://github.com/numpy/numpy/pull/2954/ > -J > > On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: >> Hi! >> >> How can I compute dot product (or similar multiply&sum operations) >> efficiently so that broadcasting is utilized? >> For multi-dimensional arrays, NumPy's inner and dot functions do not >> match the leading axes and use broadcasting, but instead the result has >> first the leading axes of the first input array and then the leading >> axes of the second input array. >> >> For instance, I would like to compute the following inner-product: >> np.sum(A*B, axis=-1) >> >> But numpy.inner gives: >> A = np.random.randn(2,3,4) >> B = np.random.randn(3,4) >> np.inner(A,B).shape >> # -> (2, 3, 3) instead of (2, 3) >> >> Similarly for dot product, I would like to compute for instance: >> np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) >> >> But numpy.dot gives: >> In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) >> In [13]: np.dot(A,B).shape >> # -> (2, 3, 2, 5) instead of (2, 3, 5) >> >> I could use einsum for these operations, but I'm not sure whether that's >> as efficient as using some BLAS-supported(?) dot products. >> >> I couldn't find any function which could perform this kind of >> operations. NumPy's functions seem to either flatten the input arrays >> (vdot, outer) or just use the axes of the input arrays separately (dot, >> inner, tensordot). >> >> Any help? 
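For reference, both products Jaakko asks about can also be written with np.einsum, whose ellipsis notation broadcasts over the leading axes (a sketch using the shapes from the examples above; inner1d covers the first case):

```python
import numpy as np

# Inner product over the last axis, broadcasting the leading axes:
A = np.random.randn(2, 3, 4)
B = np.random.randn(3, 4)
r1 = np.einsum('...i,...i->...', A, B)
assert r1.shape == (2, 3)
assert np.allclose(r1, np.sum(A * B, axis=-1))

# Matrix product over the last two axes, broadcasting the rest:
A2 = np.random.randn(2, 3, 4)
B2 = np.random.randn(2, 4, 5)
r2 = np.einsum('...ij,...jk->...ik', A2, B2)
assert r2.shape == (2, 3, 5)
assert np.allclose(
    r2,
    np.sum(A2[..., :, :, np.newaxis] * B2[..., np.newaxis, :, :], axis=-2))
```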
>> >> Best regards, >> Jaakko >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan.isaac at gmail.com Fri Mar 15 14:38:20 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 15 Mar 2013 14:38:20 -0400 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <1991.1363353681.6942528722304958464@ffe16.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> Message-ID: <51436A9C.7010905@gmail.com> On 3/15/2013 9:21 AM, Dmitrey wrote: > Temporary walkaround for a serious bug in FuncDesigner automatic differentiation kernel due to a bug in some versions of Python or NumPy, Are the suspected bugs documented somewhere? Alan PS The word 'banausic' is very rare in English. Perhaps you meant 'unsophisticated'? From warren.weckesser at gmail.com Fri Mar 15 14:47:43 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 15 Mar 2013 14:47:43 -0400 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. Message-ID: Hi all, In a recent scipy pull request (https://github.com/scipy/scipy/pull/459), I ran into the problem of ufuncs automatically generating a signature in the docstring using arguments such as 'x' or 'x1, x2'. scipy.special has a lot of ufuncs, and for most of them, there are much more descriptive or conventional argument names than 'x'. For now, we will include a nicer signature in the added docstring, and grudgingly put up with the one generated by the ufunc. In the long term, it would be nice to be able to disable the automatic generation of the signature. 
I submitted a pull request to numpy to allow that: https://github.com/numpy/numpy/pull/3149 Comments on the pull request would be appreciated. Thanks, Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Fri Mar 15 15:34:36 2013 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 15 Mar 2013 21:34:36 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <51436A9C.7010905@gmail.com> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> Message-ID: <27202.1363376076.16920329947726938112@ffe12.ukr.net> --- Original message --- From: "Alan G Isaac" Date: 15 March 2013, 20:38:38 On 3/15/2013 9:21 AM, Dmitrey wrote: > Temporary walkaround for a serious bug in FuncDesigner automatic differentiation kernel due to a bug in some versions of Python or NumPy, Are the suspected bugs documented somewhere? The suspected bugs are not documented yet; I guess it will be fixed in future versions of Python or numpy. The bug is hard to locate and isolate; it looks like this:

derivative_items = list(pointDerivative.items())

# temporary workaround for a bug in Python or numpy
derivative_items.sort(key=lambda elem: elem[0])
######################################

for key, val in derivative_items:
    indexes = oovarsIndDict[key]

    # this line is not reached in the involved buggy case
    if not involveSparse and isspmatrix(val): val = val.A

    if r.ndim == 1:
        r[indexes[0]:indexes[1]] = val.flatten() if type(val) == ndarray else val
    else:
        # this line is not reached in the involved buggy case
        r[:, indexes[0]:indexes[1]] = val if val.shape == r.shape else val.reshape((funcLen, prod(val.shape)/funcLen))

So, pointDerivative is a Python dict of pairs (F_i, N_i), where the F_i are hashable objects, and even for the case when the N_i are ordinary scalars (they can be numpy arrays or scipy sparse matrices) the results of this code differ depending on whether derivative_items.sort() was performed; the total number of
nonzero elements is the same for both cases. oovarsIndDict is a dict of pairs (F_i, (n_start_i, n_end_i)), and for the case when the N_i are all scalars, for all i n_end_i = n_start_i - 1. Alan PS The word 'banausic' is very rare in English. Perhaps you meant 'unsophisticated'? google translate tells me "banausic" is a more appropriate translation than "unsophisticated" for the sense I meant (those frameworks are aimed at modelling only numerical optimization problems, while FuncDesigner is suitable for modelling of systems of linear, nonlinear, ordinary differential equations, eigenvalue problems, interval analysis and much more). D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Mar 15 16:04:12 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 15 Mar 2013 20:04:12 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <27202.1363376076.16920329947726938112@ffe12.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> Message-ID: On Fri, Mar 15, 2013 at 7:34 PM, Dmitrey wrote: > --- Original message --- > From: "Alan G Isaac" > Date: 15 March 2013, 20:38:38 > > On 3/15/2013 9:21 AM, Dmitrey wrote: >> Temporary walkaround for a serious bug in FuncDesigner automatic >> differentiation kernel due to a bug in some versions of Python or NumPy, > > > Are the suspected bugs documented somewhere?
>
> the suspected bugs are not documented yet, I guess it will be fixed in
> future versions of Python or numpy
> the bug is hard to locate and isolate, it looks like this:
>
> derivative_items = list(pointDerivative.items())
>
> # temporary walkaround for a bug in Python or numpy
> derivative_items.sort(key=lambda elem: elem[0])
> ######################################
>
> for key, val in derivative_items:
>     indexes = oovarsIndDict[key]
>
>     # this line is not reached in the involved buggy case
>     if not involveSparse and isspmatrix(val): val = val.A
>
>     if r.ndim == 1:
>         r[indexes[0]:indexes[1]] = val.flatten() if type(val) == ndarray else val
>     else:
>         # this line is not reached in the involved buggy case
>         r[:, indexes[0]:indexes[1]] = val if val.shape == r.shape else val.reshape((funcLen, prod(val.shape)/funcLen))
>
> so, pointDerivative is Python dict of pairs (F_i, N_i), where F_i are
> hashable objects, and even for the case when N_i are ordinary scalars (they
> can be numpy arrays or scipy sparse matrices) results of this code are
> different wrt was or was not derivative_items.sort() performed; total number
> of nonzero elements is same for both cases. oovarsIndDict is dict of pairs
> (F_i, (n_start_i, n_end_i)), and for the case N_i are all scalars for all i
> n_end_i = n_start_i - 1.
If you can turn this into a minimal self-contained working example we can take a look... -n From njs at pobox.com Fri Mar 15 16:05:48 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 15 Mar 2013 20:05:48 +0000 Subject: [Numpy-discussion] Possible bug in numpy 1.6.1 In-Reply-To: <1363341165.25361.20.camel@lurvas.hpc2n.umu.se> References: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> <1363341165.25361.20.camel@lurvas.hpc2n.umu.se> Message-ID: On Fri, Mar 15, 2013 at 9:52 AM, Ake Sandgren wrote: > On Fri, 2013-03-15 at 09:44 +0000, Nathaniel Smith wrote: >> That does look unlikely yeah... Does this have any consequences that >> you've found?
Is there a test case that fails before the patch but >> works after? > > No, just found it during compilation with the intel compiler. It > complained about use before initialize on it. > > And it's still there in 1.7.0 Clever compiler. Since no-one has jumped up to investigate yet, can you file a bug on the github tracker, so at least it doesn't get lost entirely before someone finds the time to do that? -n From njs at pobox.com Fri Mar 15 16:39:45 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 15 Mar 2013 20:39:45 +0000 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. In-Reply-To: References: Message-ID: On Fri, Mar 15, 2013 at 6:47 PM, Warren Weckesser wrote: > Hi all, > > In a recent scipy pull request (https://github.com/scipy/scipy/pull/459), I > ran into the problem of ufuncs automatically generating a signature in the > docstring using arguments such as 'x' or 'x1, x2'. scipy.special has a lot > of ufuncs, and for most of them, there are much more descriptive or > conventional argument names than 'x'. For now, we will include a nicer > signature in the added docstring, and grudgingly put up with the one > generated by the ufunc. In the long term, it would be nice to be able to > disable the automatic generation of the signature. I submitted a pull > request to numpy to allow that: https://github.com/numpy/numpy/pull/3149 > > Comments on the pull request would be appreciated. The functionality seems obviously useful, but adding a magic public attribute to all ufuncs seems like a somewhat clumsy way to expose it? Esp. since ufuncs are always created through the C API, including docstring specification, but this can only be set at the Python level? Maybe it's the best option but it seems worth taking a few minutes to consider alternatives. 
Brainstorming:
- If the first line of the docstring starts with "(" and ends with ")", then that's a signature and we skip adding one (I think sphinx does something like this?). Kinda magic and implicit, but highly backwards compatible.
- Declare that henceforth, the signature generation will be disabled by default, and go through and add a special marker like "__SIGNATURE__" to all the existing ufunc docstrings, which gets replaced (if present) by the automagically generated signature.
- Give ufunc arguments actual names in general, that work for things like kwargs, and then use those in the automagically generated signature. This is the most work, but it would mean that people don't have to remember to update their non-magic signatures whenever numpy adds a new feature like out= or where=, and would make the docstrings actually accurate, which right now they aren't:

In [7]: np.add.__doc__.split("\n")[0]
Out[7]: 'add(x1, x2[, out])'

In [8]: np.add(x1=1, x2=2)
ValueError: invalid number of arguments

- Allow some special syntax to describe the argument names in the docstring: "__ARGNAMES__: a b\n" -> "add(a, b[, out])"
- Something else...

-n From alan.isaac at gmail.com Fri Mar 15 16:54:10 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 15 Mar 2013 16:54:10 -0400 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <27202.1363376076.16920329947726938112@ffe12.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> Message-ID: <51438A72.3000301@gmail.com> On 3/15/2013 3:34 PM, Dmitrey wrote: > the suspected bugs are not documented yet I'm going to guess that the state of the F_i changes when you use them as keys (i.e., when you call __le__). It is very hard to imagine that this is a Python or NumPy bug.
Cheers, Alan From pav at iki.fi Fri Mar 15 17:19:03 2013 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 15 Mar 2013 23:19:03 +0200 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. In-Reply-To: References: Message-ID: 15.03.2013 22:39, Nathaniel Smith kirjoitti: [clip] > - Something else... How about: scrap the automatic signatures altogether, and directly use the docstring provided to the ufunc creation function? I suspect ufuncs are not very widely used in 3rd party code, as it requires somewhat tricky messing with the C API. The backwards compatibility issue is also just a documentation issue, so nothing drastic. -- Pauli Virtanen From tmp50 at ukr.net Sat Mar 16 05:31:37 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 16 Mar 2013 11:31:37 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <51438A72.3000301@gmail.com> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> Message-ID: <6447.1363426297.5285784474692943872@ffe6.ukr.net> --- Original message --- From: "Alan G Isaac" Date: 15 March 2013, 22:54:21 On 3/15/2013 3:34 PM, Dmitrey wrote: > the suspected bugs are not documented yet I'm going to guess that the state of the F_i changes when you use them as keys (i.e., when you call __le__). no, their state doesn't change for operations like __le__. AFAIK searching a Python dict doesn't call __le__ on the object keys at all; it operates with the method .__hash__(), and the latter returns fixed integer numbers assigned to the objects earlier (at least in my case). It is very hard to imagine that this is a Python or NumPy bug.
Cheers, Alan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Sat Mar 16 05:33:32 2013 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 16 Mar 2013 09:33:32 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <6447.1363426297.5285784474692943872@ffe6.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> Message-ID: Hi, Different objects can have the same hash, so it compares to find the actual correct object. Usually when you store something in a dict and later you can't find it anymore, it is that the internal state changed and that the hash is not the same anymore. Matthieu 2013/3/16 Dmitrey > > > --- ???????? ????????? --- > ?? ????: "Alan G Isaac" > ????: 15 ????? 2013, 22:54:21 > > On 3/15/2013 3:34 PM, Dmitrey wrote: > > the suspected bugs are not documented yet > > > I'm going to guess that the state of the F_i changes > when you use them as keys (i.e., when you call __le__. > > no, their state doesn't change for operations like __le__ . AFAIK > searching Python dict doesn't calls __le__ on the object keys at all, it > operates with method .__hash__(), and latter returns fixed integer numbers > assigned to the objects earlier (at least in my case). > > > It is very hard to imagine that this is a Python or NumPy bug. 
> > Cheers, > Alan > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Sat Mar 16 06:36:10 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 16 Mar 2013 12:36:10 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> Message-ID: <90157.1363430170.9409858753789952@ffe8.ukr.net> --- ???????? ????????? --- ?? ????: "Matthieu Brucher" ????: 16 ????? 2013, 11:33:39 Hi, Different objects can have the same hash, so it compares to find the actual correct object. Usually when you store something in a dict and later you can't find it anymore, it is that the internal state changed and that the hash is not the same anymore. my objects (oofuns) definitely have different __hash__() results - it's just integers 1,2,3 etc assigned to the oofuns (stored in oofun._id field) when they are created. D. Matthieu 2013/3/16 Dmitrey --- ???????? ????????? --- ?? ????: "Alan G Isaac" ????: 15 ????? 2013, 22:54:21 On 3/15/2013 3:34 PM, Dmitrey wrote: > the suspected bugs are not documented yet I'm going to guess that the state of the F_i changes when you use them as keys (i.e., when you call __le__. no, their state doesn't change for operations like __le__ . 
AFAIK searching Python dict doesn't calls __le__ on the object keys at all, it operates with method .__hash__(), and latter returns fixed integer numbers assigned to the objects earlier (at least in my case). ?It is very hard to imagine that this is a Python or NumPy bug. Cheers, Alan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Sat Mar 16 06:39:05 2013 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 16 Mar 2013 10:39:05 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <90157.1363430170.9409858753789952@ffe8.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> Message-ID: Even if they have different hashes, they can be stored in the same underlying list before they are retrieved. Then, an actual comparison is done to check if the given key (i.e. object instance, not hash) is the same as one of the stored keys. 2013/3/16 Dmitrey > > > --- ???????? ????????? --- > ?? ????: "Matthieu Brucher" > ????: 16 ????? 
2013, 11:33:39 > > Hi, > > Different objects can have the same hash, so it compares to find the > actual correct object. > Usually when you store something in a dict and later you can't find it > anymore, it is that the internal state changed and that the hash is not the > same anymore. > > > my objects (oofuns) definitely have different __hash__() results - it's > just integers 1,2,3 etc assigned to the oofuns (stored in oofun._id field) > when they are created. > > > D. > > > > Matthieu > > > 2013/3/16 Dmitrey > > > > --- ???????? ????????? --- > ?? ????: "Alan G Isaac" > ????: 15 ????? 2013, 22:54:21 > > On 3/15/2013 3:34 PM, Dmitrey wrote: > > the suspected bugs are not documented yet > > > I'm going to guess that the state of the F_i changes > when you use them as keys (i.e., when you call __le__. > > no, their state doesn't change for operations like __le__ . AFAIK > searching Python dict doesn't calls __le__ on the object keys at all, it > operates with method .__hash__(), and latter returns fixed integer numbers > assigned to the objects earlier (at least in my case). > > > It is very hard to imagine that this is a Python or NumPy bug. > > Cheers, > Alan > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > Information System Engineer, Ph.D. > Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > Music band: http://liliejay.com/ > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Sat Mar 16 07:48:59 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 16 Mar 2013 13:48:59 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> Message-ID: <10909.1363434539.16610087277036634112@ffe16.ukr.net> --- Original message --- From: "Matthieu Brucher" Date: 16 March 2013, 12:39:07 Even if they have different hashes, they can be stored in the same underlying list before they are retrieved. Then, an actual comparison is done to check if the given key (i.e. object instance, not hash) is the same as one of the stored keys. but, as I have already mentioned, comparison of oofun(s) via __le__, __eq__ etc doesn't change their inner state (though the methods can create additional oofun(s)). I have checked via debugger - my methods __le__, __eq__, __lt__, __gt__, __ge__ are not called from the buggy place of the code, only __hash__ is called from there. Python could check key objects' equivalence via id(), but I don't see any possible bug source from using id(). D. 2013/3/16 Dmitrey --- Original message --- From: "Matthieu Brucher" Date: 16 March 2013, 11:33:39 Hi, Different objects can have the same hash, so it compares to find the actual correct object. Usually when you store something in a dict and later you can't find it anymore, it is that the internal state changed and that the hash is not the same anymore.
my objects (oofuns) definitely have different __hash__() results - it's just integers 1,2,3 etc assigned to the oofuns (stored in oofun._id field) when they are created. D. Matthieu 2013/3/16 Dmitrey --- ???????? ????????? --- ?? ????: "Alan G Isaac" ????: 15 ????? 2013, 22:54:21 On 3/15/2013 3:34 PM, Dmitrey wrote: > the suspected bugs are not documented yet I'm going to guess that the state of the F_i changes when you use them as keys (i.e., when you call __le__. no, their state doesn't change for operations like __le__ . AFAIK searching Python dict doesn't calls __le__ on the object keys at all, it operates with method .__hash__(), and latter returns fixed integer numbers assigned to the objects earlier (at least in my case). ?It is very hard to imagine that this is a Python or NumPy bug. Cheers, Alan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sat Mar 16 08:11:47 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Mar 2013 12:11:47 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <10909.1363434539.16610087277036634112@ffe16.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> <10909.1363434539.16610087277036634112@ffe16.ukr.net> Message-ID: On 16 Mar 2013 11:49, "Dmitrey" wrote: > > > > > --- ???????? ????????? --- > ?? ????: "Matthieu Brucher" > ????: 16 ????? 2013, 12:39:07 > >> Even if they have different hashes, they can be stored in the same underlying list before they are retrieved. Then, an actual comparison is done to check if the given key (i.e. object instance, not hash) is the same as one of the stored keys. > > >> > but, as I have already mentioned, comparison of oofun(s) via __le__, __eq__ etc doesn't change their inner state (but the methods can create additional oofun(s), although). > I have checked via debugger - my methods __le__, __eq__, __lt__, __gt__, __ge__ are not called from the buggy place of code, only __hash__ is called from there. Python could check key objects equivalence via id(), although, but I don't see any possible bug source from using id(). Dict lookup always calls both __hash__ and __eq__. I guess it might use id() to shortcut the __eq__ call in some cases - there are some places in python that do. Anyway there's no point trying to debug this code by ESP... It's not even clear from what's been said whether dict lookups have anything to do with the problem. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chaoyuejoy at gmail.com Sat Mar 16 12:40:51 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 16 Mar 2013 17:40:51 +0100 Subject: [Numpy-discussion] indexing of arbitrary axis and arbitrary slice? Message-ID: Dear all, Is there some way to index the numpy array by specifying arbitrary axis and arbitrary slice, while not knowing the actual shape of the data? For example, I have a 3-dim data, data.shape = (3,4,5) Is there a way to retrieve data[:,0,:] by using something like np.retrieve_data(data,axis=2,slice=0), by this way you don't have to know the actual shape of the array. for for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually be data[:,0,:,:] thanks in advance, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Mar 16 12:49:13 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Mar 2013 16:49:13 +0000 Subject: [Numpy-discussion] indexing of arbitrary axis and arbitrary slice? In-Reply-To: References: Message-ID: On 16 Mar 2013 16:41, "Chao YUE" wrote: > > Dear all, > > Is there some way to index the numpy array by specifying arbitrary axis and arbitrary slice, while > not knowing the actual shape of the data? > For example, I have a 3-dim data, data.shape = (3,4,5) > Is there a way to retrieve data[:,0,:] by using something like np.retrieve_data(data,axis=2,slice=0), > by this way you don't have to know the actual shape of the array. 
> for for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually be data[:,0,:,:]

I don't know of anything quite like that, but it's easy to fake it:

def retrieve_data(a, ax, idx):
    full_idx = [slice(None)] * a.ndim
    full_idx[ax] = idx
    return a[tuple(full_idx)]

Or for the specific case where you do know the axis in advance, you just don't know how many trailing axes there are, use a[:, :, 0, ...] and the ... will expand to represent the appropriate number of :'s. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Mar 16 13:54:28 2013 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 16 Mar 2013 17:54:28 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> Message-ID: On Sat, Mar 16, 2013 at 10:39 AM, Matthieu Brucher wrote: > Even if they have different hashes, they can be stored in the same > underlying list before they are retrieved. Then, an actual comparison is > done to check if the given key (i.e. object instance, not hash) is the same > as one of the stored keys. Right. And the rule is that if two objects compare equal, then they must also hash equal. Unfortunately, it looks like `oofun` objects do not obey this property. oofun.__eq__() seems to return a Constraint rather than a bool, so oofun objects should simply not be used as dictionary keys. That's quite possibly the source of the bug. Or at least, that's a bug that needs to get fixed first before attempting to debug anything else or attribute bugs to Python or numpy. Also, the lack of a bool-returning __eq__() will prevent proper sorting, which also seems to be used in the code snippet that Dmitrey showed.
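The failure mode described here is easy to reproduce with a toy class (a hypothetical stand-in for oofun, not the actual FuncDesigner code):

```python
class BadKey:
    """Hashable, but __eq__ returns a truthy non-bool (like a Constraint)."""
    def __init__(self, id_):
        self._id = id_
    def __hash__(self):
        return 0  # force every key into the same hash bucket
    def __eq__(self, other):
        return ("constraint", self, other)  # truthy, regardless of the ids

d = {BadKey(1): "first"}

# On a hash collision the dict falls back to __eq__, and the truthy
# tuple makes *any* BadKey compare "equal" to the stored key:
wrong = d[BadKey(2)]  # returns "first" instead of raising KeyError
assert wrong == "first"
```

With distinct per-object hashes, a lookup by the very same instance can still succeed because dict lookup short-circuits on identity before calling __eq__, which is why the breakage only shows up intermittently; and if __lt__ behaves the same way, sorted() will happily return a meaningless order.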
-- Robert Kern From tmp50 at ukr.net Sat Mar 16 14:19:26 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 16 Mar 2013 20:19:26 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> Message-ID: <29942.1363457966.17802425454193213440@ffe11.ukr.net> --- Original message --- From: "Robert Kern" Date: 16 March 2013, 19:54:51 On Sat, Mar 16, 2013 at 10:39 AM, Matthieu Brucher < matthieu.brucher at gmail.com > wrote: > Even if they have different hashes, they can be stored in the same > underlying list before they are retrieved. Then, an actual comparison is > done to check if the given key (i.e. object instance, not hash) is the same > as one of the stored keys. Right. And the rule is that if two objects compare equal, then they must also hash equal. Unfortunately, it looks like `oofun` objects do not obey this property. oofun.__eq__() seems to return a Constraint rather than a bool, so oofun objects should simply not be used as dictionary keys. It is one of several base features FuncDesigner is built on and is used extremely often and widely; otherwise the whole of FuncDesigner would work incorrectly. Meanwhile, it is used intensively and solves many problems better than its competitors. That's quite possibly the source of the bug. Or at least, that's a bug that needs to get fixed first before attempting to debug anything else or attribute bugs to Python or numpy. Also, the lack of a bool-returning __eq__() will prevent proper sorting, which also seems to be used in the code snippet that Dmitrey showed. As I have already mentioned, I ensured via debugger that my __eq__, __le__ etc. are not involved from the buggy place of the code; only __hash__ is involved from there.
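Whether the dict machinery really consults __eq__ can be settled with a counting probe (again a hypothetical toy class, not oofun itself): CPython calls the stored key's __eq__ whenever hashes collide and the identity check fails, even though no user code calls it directly — which is why a debugger breakpoint in user code can miss it.

```python
class Probe:
    eq_calls = 0
    def __hash__(self):
        return 0                      # collide on purpose
    def __eq__(self, other):
        Probe.eq_calls += 1           # count every comparison the dict makes
        return self is other

d = {Probe(): 1, Probe(): 2}          # second insertion hits the same bucket
print(Probe.eq_calls)                 # at least 1 -- the dict called __eq__
print(len(d))                         # 2: the keys compared unequal
```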
-- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Mar 16 16:14:46 2013 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 16 Mar 2013 20:14:46 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <29942.1363457966.17802425454193213440@ffe11.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> <29942.1363457966.17802425454193213440@ffe11.ukr.net> Message-ID: On Sat, Mar 16, 2013 at 6:19 PM, Dmitrey wrote: > > > --- Original message --- > From: "Robert Kern" > Date: 16 March 2013, 19:54:51 > > On Sat, Mar 16, 2013 at 10:39 AM, Matthieu Brucher > wrote: >> Even if they have different hashes, they can be stored in the same >> underlying list before they are retrieved. Then, an actual comparison is >> done to check if the given key (i.e. object instance, not hash) is the >> same >> as one of the stored keys. > > Right. And the rule is that if two objects compare equal, then they > must also hash equal. Unfortunately, it looks like `oofun` objects do > not obey this property. oofun.__eq__() seems to return a Constraint > rather than a bool, so oofun objects should simply not be used as > dictionary keys. > > It is one of several base features FuncDesigner is build on and is used > extremely often and wide; then whole FuncDesigner would work incorrectly > while it is used intensively and solves many problems better than its > competitors. I understand. It just means that you can't use oofun objects as dictionary keys. Adding a __hash__() method is not enough to make that work. > That's quite possibly the source of the bug. 
Or at > least, that's a bug that needs to get fixed first before attempting to > debug anything else or attribute bugs to Python or numpy. Also, the > lack of a bool-returning __eq__() will prevent proper sorting, which > also seems to be used in the code snippet that Dmitrey showed. > > as I have already mentioned, I ensured via debugger that my __eq__, __le__ > etc are not involved from the buggy place of the code, only __hash__ is > involved from there. oofun.__lt__() will certainly be called, and it too is problematic. If pointDerivates is a dict mapping oofun objects to other objects as you say, then derivative_items will be a list of (oofun, object) tuples. If you sort derivative_items by the first element, the oofun objects, then oofun.__lt__() *will* be called. That's how list.sort() works. I was wrong: oofun.__eq__() won't be called by the sorting. You are probably not seeing the oofun.__eq__() problem in that code because of an implementation detail in Python: dicts will check identity first before trying to compare with __eq__(). You may be having problems in the construction of the pointDerivates dict or ooVarsIndDict outside of this code snippet, so if you just ran your debugger over this code snippet, you would not detect those calls. However, if you are not seeing the oofun.__lt__() calls from the sorting with your debugger, then your debugger may be missing the oofun.__eq__() calls, too. By all means, if you still think the bug is in someone else's code, please post a short example that other people can run that will demonstrate the problem. -- Robert Kern From njs at pobox.com Sat Mar 16 17:23:35 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Mar 2013 21:23:35 +0000 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. 
In-Reply-To: References: Message-ID: On Fri, Mar 15, 2013 at 9:19 PM, Pauli Virtanen wrote: > 15.03.2013 22:39, Nathaniel Smith kirjoitti: > [clip] >> - Something else... > > How about: scrap the automatic signatures altogether, and directly use > the docstring provided to the ufunc creation function? > > I suspect ufuncs are not very widely used in 3rd party code, as it > requires somewhat tricky messing with the C API. The backwards > compatibility issue is also just a documentation issue, so nothing drastic. True enough. I guess a question is how much it bothers us that there are tons of ufunc arguments that are just not mentioned in the interpreter docstrings: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#optional-keyword-arguments Obviously not a huge amount of we'd have altered the auto-generation already to include them :-) But IMHO it would be kind of nice if ?np.add mentioned the existence of things like where= and dtype=... and if we decide that docstrings ought to mention such things, then it's going to be a right hassle updating them all by hand every time some new ufunc feature is added. -n From chaoyuejoy at gmail.com Mon Mar 18 05:25:41 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Mon, 18 Mar 2013 10:25:41 +0100 Subject: [Numpy-discussion] indexing of arbitrary axis and arbitrary slice? In-Reply-To: References: Message-ID: Hi Nathaniel, thanks for your reply, it works fine and suffice for my purpose. cheers, Chao On Sat, Mar 16, 2013 at 5:49 PM, Nathaniel Smith wrote: > On 16 Mar 2013 16:41, "Chao YUE" wrote: > > > > Dear all, > > > > Is there some way to index the numpy array by specifying arbitrary axis > and arbitrary slice, while > > not knowing the actual shape of the data? > > For example, I have a 3-dim data, data.shape = (3,4,5) > > Is there a way to retrieve data[:,0,:] by using something like > np.retrieve_data(data,axis=2,slice=0), > > by this way you don't have to know the actual shape of the array. 
> > for for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually > be data[:,0,:,:] > > I don't know of anything quite like that, but it's easy to fake it: > > def retrieve_data(a, ax, idx): > full_idx = [slice(None)] * a.ndim > full_idx[ax] = idx > return a[tuple(full_idx)] > > Or for the specific case where you do know the axis in advance, you just > don't know how many trailing axes there are, use > a[:, :, 0, ...] > and the ... will expand to represent the appropriate number of :'s. > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.robitaille at gmail.com Mon Mar 18 05:56:52 2013 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Mon, 18 Mar 2013 10:56:52 +0100 Subject: [Numpy-discussion] Memory issue with memory-mapped array assignment Message-ID: Hi everyone, I've come across a memory issue when trying to assign data to slices of a Numpy memory-mapped array. The short story is that if I create a memory mapped array and try to add data to subsets of the array many times in a loop, the memory usage of my code grows over time, suggesting there is some kind of memory leak. 
More specifically, if I run the following script: import random import numpy as np image = np.memmap('image.np', mode='w+', dtype=np.float32, shape=(10000, 10000)) print("Before assignment") for i in range(1000): x = random.uniform(1000, 9000) y = random.uniform(1000, 9000) imin = int(x) - 128 imax = int(x) + 128 jmin = int(y) - 128 jmax = int(y) + 128 data = np.random.random((256,256)) image[imin:imax, jmin:jmax] = image[imin:imax, jmin:jmax] + data del x, y, imin, imax, jmin, jmax, data the memory usage goes up to ~300Mb after 1000 iterations (and proportionally more if I increase the number of iterations). I've written up a more detailed overview of the issue on stackoverflow (with memory profiling): http://stackoverflow.com/questions/15473377/memory-issue-with-numpy-memory-mapped-array-assignment Does anyone have any idea what is going on, and how I can avoid this issue? Thanks! Tom From mpuecker at mit.edu Mon Mar 18 09:42:09 2013 From: mpuecker at mit.edu (Matt U) Date: Mon, 18 Mar 2013 13:42:09 +0000 (UTC) Subject: [Numpy-discussion] numpy reference array References: Message-ID: Chris Barker - NOAA Federal noaa.gov> writes: > check out "stride tricks" for clever things you can do. > > But numpy does require that the data in your array be a contiguous > block, in order, so you can't arbitrarily re-arrange it while keeping > a view. > > HTH, > -Chris > Hi Chris, Thanks for the reply, you've just saved me a lot of time. I did run across 'views' but it looked like I couldn't have my data arbitrarily arranged. Thank you for confirming that. Unfortunately my desired view does not fit a neat striding pattern. 
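For layouts that do follow a regular stride pattern, a copy-free view can be built; a minimal sketch with numpy's as_strided (the array and window size are illustrative, not the actual data discussed above):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(8)
w = 3  # window length
# Overlapping windows of length w: each row starts one element later,
# reusing the same underlying buffer instead of copying.
windows = as_strided(a,
                     shape=(a.size - w + 1, w),
                     strides=(a.strides[0], a.strides[0]))
print(windows[0], windows[-1])        # [0 1 2] [5 6 7]
print(np.shares_memory(windows, a))   # True -- it is a view, not a copy
```

Note that writing through such a view touches aliased elements, so as_strided views are best treated as read-only.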
Cheers, Matt From pierre.haessig at crans.org Mon Mar 18 13:00:02 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 18 Mar 2013 18:00:02 +0100 Subject: [Numpy-discussion] Numpy correlate In-Reply-To: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> References: , <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> Message-ID: <51474812.2060001@crans.org> Hi Sudheer, Le 14/03/2013 10:18, Sudheer Joseph a ?crit : > Dear Numpy/Scipy experts, > Attached is a script > which I made to test the numpy.correlate ( which is called py > plt.xcorr) to see how the cross correlation is calculated. From this > it appears the if i call plt.xcorr(x,y) > Y is slided back in time compared to x. ie if y is a process that > causes a delayed response in x after 5 timesteps then there should be > a high correlation at Lag 5. However in attached plot the response is > seen in only -ve side of the lags. > Can any one advice me on how to see which way exactly the 2 series > are slided back or forth.? and understand the cause result relation > better?( I understand merely by correlation one cannot assume cause > and result relation, but it is important to know which series is older > in time at a given lag. You indeed pointed out a lack of documentation of in matplotlib.xcorr function because the definition of covariance can be ambiguous. The way I would try to get an interpretation of xcorr function (& its friends) is to go back to the theoretical definition of cross-correlation, which is a normalized version of the covariance. In your example you've created a time series X(k) and a lagged one : Y(k) = X(k-5) Now, the covariance function of X and Y is commonly defined as : Cov_{X,Y}(h) = E(X(k+h) * Y(k)) where E is the expectation (assuming that X and Y are centered for the sake of clarity). If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This yields naturally the fact that the covariance is indeed maximal at h=-5 and not h=+5. 
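The sign convention derived above is easy to confirm numerically (a sketch; the series length, the seed and the lag of 5 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 5)            # y(k) = x(k-5): y is x delayed by 5 steps

c = np.correlate(x, y, mode="full")
lags = np.arange(-len(x) + 1, len(x))
print(lags[c.argmax()])      # -5: the peak sits on the negative side,
                             # matching Cov(h) = E(X(k+h) X(k-5))
```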
Note that this reasoning does yield the opposite result with a different definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h)) (and that's what I first did !). Therefore, I think there should be a definition in of cross correlation in matplotlib xcorr docstring. In R's acf doc, there is this mention : "The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]. " (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) Now I believe, this upper discussion really belongs to matplotlib ML. I'll put an issue on github (I just spotted a mistake the definition of normalization anyway) Coming back to numpy : There's a strange thing, the definition of numpy.correlate seems to give the other definition "z[k] = sum_n a[n] * conj(v[n+k])" ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html) although its usage prooves otherwise. What did I miss ? best, Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From jsseabold at gmail.com Mon Mar 18 13:10:16 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 18 Mar 2013 13:10:16 -0400 Subject: [Numpy-discussion] Numpy correlate In-Reply-To: <51474812.2060001@crans.org> References: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> Message-ID: On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig wrote: > Hi Sudheer, > > Le 14/03/2013 10:18, Sudheer Joseph a ?crit : > > Dear Numpy/Scipy experts, > Attached is a script which I > made to test the numpy.correlate ( which is called py plt.xcorr) to see how > the cross correlation is calculated. From this it appears the if i call > plt.xcorr(x,y) > Y is slided back in time compared to x. 
ie if y is a process that causes a > delayed response in x after 5 timesteps then there should be a high > correlation at Lag 5. However in attached plot the response is seen in only > -ve side of the lags. > Can any one advice me on how to see which way exactly the 2 series > are slided back or forth.? and understand the cause result relation > better?( I understand merely by correlation one cannot assume cause and > result relation, but it is important to know which series is older in time > at a given lag. > > You indeed pointed out a lack of documentation of in matplotlib.xcorr > function because the definition of covariance can be ambiguous. > > The way I would try to get an interpretation of xcorr function (& its > friends) is to go back to the theoretical definition of cross-correlation, > which is a normalized version of the covariance. > > In your example you've created a time series X(k) and a lagged one : Y(k) > = X(k-5) > > Now, the covariance function of X and Y is commonly defined as : > Cov_{X,Y}(h) = E(X(k+h) * Y(k)) where E is the expectation > (assuming that X and Y are centered for the sake of clarity). > > If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This > yields naturally the fact that the covariance is indeed maximal at h=-5 and > not h=+5. > > Note that this reasoning does yield the opposite result with a different > definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h)) (and > that's what I first did !). > > > Therefore, I think there should be a definition in of cross correlation in > matplotlib xcorr docstring. In R's acf doc, there is this mention : "The > lag k value returned by ccf(x, y) estimates the correlation between x[t+k] > and y[t]. " > (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) > > Now I believe, this upper discussion really belongs to matplotlib ML. 
I'll > put an issue on github (I just spotted a mistake the definition of > normalization anyway) > You might be interested in the statsmodels implementation which should be similar to the R functionality. http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html Skipper -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Mar 18 16:21:35 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 18 Mar 2013 16:21:35 -0400 Subject: [Numpy-discussion] Numpy correlate In-Reply-To: References: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> Message-ID: On Mon, Mar 18, 2013 at 1:10 PM, Skipper Seabold wrote: > On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig > wrote: >> >> Hi Sudheer, >> >> Le 14/03/2013 10:18, Sudheer Joseph a ?crit : >> >> Dear Numpy/Scipy experts, >> Attached is a script which I >> made to test the numpy.correlate ( which is called py plt.xcorr) to see how >> the cross correlation is calculated. From this it appears the if i call >> plt.xcorr(x,y) >> Y is slided back in time compared to x. ie if y is a process that causes a >> delayed response in x after 5 timesteps then there should be a high >> correlation at Lag 5. However in attached plot the response is seen in only >> -ve side of the lags. >> Can any one advice me on how to see which way exactly the 2 series are >> slided back or forth.? and understand the cause result relation better?( I >> understand merely by correlation one cannot assume cause and result >> relation, but it is important to know which series is older in time at a >> given lag. 
>> >> You indeed pointed out a lack of documentation of in matplotlib.xcorr >> function because the definition of covariance can be ambiguous. >> >> The way I would try to get an interpretation of xcorr function (& its >> friends) is to go back to the theoretical definition of cross-correlation, >> which is a normalized version of the covariance. >> >> In your example you've created a time series X(k) and a lagged one : Y(k) >> = X(k-5) >> >> Now, the covariance function of X and Y is commonly defined as : >> Cov_{X,Y}(h) = E(X(k+h) * Y(k)) where E is the expectation >> (assuming that X and Y are centered for the sake of clarity). >> >> If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This >> yields naturally the fact that the covariance is indeed maximal at h=-5 and >> not h=+5. >> >> Note that this reasoning does yield the opposite result with a different >> definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h)) (and >> that's what I first did !). >> >> >> Therefore, I think there should be a definition in of cross correlation in >> matplotlib xcorr docstring. In R's acf doc, there is this mention : "The lag >> k value returned by ccf(x, y) estimates the correlation between x[t+k] and >> y[t]. " >> (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) >> >> Now I believe, this upper discussion really belongs to matplotlib ML. I'll >> put an issue on github (I just spotted a mistake the definition of >> normalization anyway) > > > > You might be interested in the statsmodels implementation which should be > similar to the R functionality. > > http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb > http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html > http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html we don't have any cross-correlation xcorr, AFAIR but I guess it should work the same way. 
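Until such a helper exists, a normalized cross-correlation can be hand-rolled on top of np.correlate; a sketch (the `xcorr` function below is hypothetical, not statsmodels API) following the R ccf convention quoted earlier, where lag k estimates corr(x[t+k], y[t]):

```python
import numpy as np

def xcorr(x, y, maxlag):
    """Normalized cross-correlation of x and y for lags -maxlag..maxlag.

    Lag k estimates corr(x[t+k], y[t]) -- the convention of R's ccf.
    Uses the biased 1/n normalization."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    n = len(x)
    c = np.correlate(x, y, mode="full") / (n * x.std() * y.std())
    mid = n - 1                         # index of lag 0 in the full output
    lags = np.arange(-maxlag, maxlag + 1)
    return lags, c[mid - maxlag: mid + maxlag + 1]
```

With y a copy of x delayed by 5 steps, the peak lands at lag -5, consistent with the covariance argument earlier in the thread.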
Josef > > Skipper > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sudheer.joseph at yahoo.com Tue Mar 19 03:07:57 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Tue, 19 Mar 2013 15:07:57 +0800 (SGT) Subject: [Numpy-discussion] Numpy correlate In-Reply-To: References: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> Message-ID: <1363676877.47397.YahooMailNeo@web193403.mail.sg3.yahoo.com> Thank you All for the response, ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?acf do not accept 2 variables so naturally? http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb >?http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html >?http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html These may not work for me. ? *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** ________________________________ From: "josef.pktd at gmail.com" To: Discussion of Numerical Python Sent: Tuesday, 19 March 2013 1:51 AM Subject: Re: [Numpy-discussion] Numpy correlate On Mon, Mar 18, 2013 at 1:10 PM, Skipper Seabold wrote: > On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig > wrote: >> >> Hi Sudheer, >> >> Le 14/03/2013 10:18, Sudheer Joseph a ?crit : >> >> Dear Numpy/Scipy experts, >>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
Attached is a script which I >> made to test the numpy.correlate ( which is called py plt.xcorr) to see how >> the cross correlation is calculated. From this it appears the if i call >> plt.xcorr(x,y) >> Y is slided back in time compared to x. ie if y is a process that causes a >> delayed response in x after 5 timesteps then there should be a high >> correlation at Lag 5. However in attached plot the response is seen in only >> -ve side of the lags. >> Can any one advice me on how to see which way exactly the 2 series are >> slided back or forth.? and understand the cause result relation better?( I >> understand merely by correlation one cannot assume cause and result >> relation, but it is important to know which series is older in time at a >> given lag. >> >> You indeed pointed out a lack of documentation of in matplotlib.xcorr >> function because the definition of covariance can be ambiguous. >> >> The way I would try to get an interpretation of xcorr function (& its >> friends) is to go back to the theoretical definition of cross-correlation, >> which is a normalized version of the covariance. >> >> In your example you've created a time series X(k) and a lagged one : Y(k) >> = X(k-5) >> >> Now, the covariance function of X and Y is commonly defined as : >>? Cov_{X,Y}(h) = E(X(k+h) * Y(k))? where E is the expectation >>? (assuming that X and Y are centered for the sake of clarity). >> >> If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This >> yields naturally the fact that the covariance is indeed maximal at h=-5 and >> not h=+5. >> >> Note that this reasoning does yield the opposite result with a different >> definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h))? (and >> that's what I first did !). >> >> >> Therefore, I think there should be a definition in of cross correlation in >> matplotlib xcorr docstring. 
In R's acf doc, there is this mention : "The lag >> k value returned by ccf(x, y) estimates the correlation between x[t+k] and >> y[t]. " >> (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) >> >> Now I believe, this upper discussion really belongs to matplotlib ML. I'll >> put an issue on github (I just spotted a mistake the definition of >> normalization anyway) > > > > You might be interested in the statsmodels implementation which should be > similar to the R functionality. > > http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb > http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html > http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html we don't have any cross-correlation xcorr, AFAIR but I guess it should work the same way. Josef > > Skipper > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sudheer.joseph at yahoo.com Tue Mar 19 03:12:00 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Tue, 19 Mar 2013 15:12:00 +0800 (SGT) Subject: [Numpy-discussion] Numpy correlate In-Reply-To: <51474812.2060001@crans.org> References: , <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> Message-ID: <1363677120.65039.YahooMailNeo@web193405.mail.sg3.yahoo.com> Thank you Pierre, ? ? ? ? ? ? ? ? ? ? ? ? It appears the numpy.correlate uses the frequency domain method for getting the ccf. I would like to know how serious or exactly what is the issue with normalization?. 
I have computed cross correlation using the function and interpreting the results based on it. It will be helpful if you could tell me if there is a significant bug in the function with best regards, Sudheer From: Pierre Haessig To: numpy-discussion at scipy.org Sent: Monday, 18 March 2013 10:30 PM Subject: Re: [Numpy-discussion] Numpy correlate Hi Sudheer, Le 14/03/2013 10:18, Sudheer Joseph a ?crit?: Dear Numpy/Scipy experts, >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Attached is a script which I made to test the numpy.correlate ( which is called py plt.xcorr) to see how the cross correlation is calculated. From this it appears the if i call plt.xcorr(x,y) >Y is?slided back in time compared to x. ie if y is a process that causes a delayed response in x after 5 timesteps then there should be a high correlation at Lag 5. However in attached plot the response is seen in only -ve side of the lags. >Can any one advice me on how to see which way exactly the 2 series are?slided?back or forth.? and understand the cause result relation better?( I understand merely by correlation one cannot assume cause and result relation, but it is important to know which series is older in time at a given lag. You indeed pointed out a lack of documentation of in matplotlib.xcorr function because the definition of covariance can be ambiguous. The way I would try to get an interpretation of xcorr function (& its friends) is to go back to the theoretical definition of cross-correlation, which is a normalized version of the covariance. In your example you've created a time series X(k) and a lagged one : Y(k) = X(k-5) Now, the covariance function of X and Y is commonly defined as : ?Cov_{X,Y}(h) = E(X(k+h) * Y(k))?? where E is the expectation ?(assuming that X and Y are centered for the sake of clarity). If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This yields naturally the fact that the covariance is indeed maximal at h=-5 and not h=+5. 
Note that this reasoning does yield the opposite result with a different definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h))? (and that's what I first did !). Therefore, I think there should be a definition in of cross correlation in matplotlib xcorr docstring. In R's acf doc, there is this mention : "The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]. " (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) Now I believe, this upper discussion really belongs to matplotlib ML. I'll put an issue on github (I just spotted a mistake the definition of normalization anyway) Coming back to numpy : There's a strange thing, the definition of numpy.correlate seems to give the other definition "z[k] = sum_n a[n] * conj(v[n+k])" ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html) although its usage prooves otherwise. What did I miss ? best, Pierre _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.haessig at crans.org Wed Mar 20 04:30:59 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Wed, 20 Mar 2013 09:30:59 +0100 Subject: [Numpy-discussion] Numpy correlate In-Reply-To: <1363677120.65039.YahooMailNeo@web193405.mail.sg3.yahoo.com> References: , <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> <1363677120.65039.YahooMailNeo@web193405.mail.sg3.yahoo.com> Message-ID: <514973C3.9000905@crans.org> Hi, Le 19/03/2013 08:12, Sudheer Joseph a ?crit : > *Thank you Pierre,* > It appears the numpy.correlate uses the > frequency domain method for getting the ccf. I would like to know how > serious or exactly what is the issue with normalization?. I have > computed cross correlation using the function and interpreting the > results based on it. 
It will be helpful if you could tell me if there > is a significant bug in the function > with best regards, > Sudheer np.correlate works in the time domain. I started a discussion about a month ago about the way it's implemented http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065562.html Unfortunately I didn't find time to dig deeper in the matter which needs working in the C code of numpy which I'm not familiar with. Concerning the normalization of mpl.xcorr, I think that what is computed is just fine. It's just the way this normalization is described in the docstring which I think is weird. https://github.com/matplotlib/matplotlib/issues/1835 best, Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From jaakko.luttinen at aalto.fi Wed Mar 20 09:33:50 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 20 Mar 2013 15:33:50 +0200 Subject: [Numpy-discussion] Dot/inner products with broadcasting? In-Reply-To: References: <51408B59.8090504@aalto.fi> <5141BA5E.2020704@aalto.fi> Message-ID: <5149BABE.1080306@aalto.fi> I tried using this inner1d as an alternative to dot because it uses broadcasting. However, I found something surprising: Not only is inner1d much much slower than dot, it is also slower than einsum which is much more general: In [68]: import numpy as np In [69]: import numpy.core.gufuncs_linalg as gula In [70]: K = np.random.randn(1000,1000) In [71]: %timeit gula.inner1d(K[:,np.newaxis,:], np.swapaxes(K,-1,-2)[np.newaxis,:,:]) 1 loops, best of 3: 6.05 s per loop In [72]: %timeit np.dot(K,K) 1 loops, best of 3: 392 ms per loop In [73]: %timeit np.einsum('ik,kj->ij', K, K) 1 loops, best of 3: 1.24 s per loop Why is it so? 
I thought that the performance of inner1d would be somewhere in between dot and einsum, probably closer to dot. Now I don't see any reason to use inner1d instead of einsum.. -Jaakko On 03/15/2013 04:22 PM, Oscar Villellas wrote: > In fact, there is already an inner1d implemented in > numpy.core.umath_tests.inner1d > > from numpy.core.umath_tests import inner1d > > It should do the trick :) > > On Thu, Mar 14, 2013 at 12:54 PM, Jaakko Luttinen > wrote: >> Answering to myself, this pull request seems to implement an inner >> product with broadcasting (inner1d) and many other useful functions: >> https://github.com/numpy/numpy/pull/2954/ >> -J >> >> On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: >>> Hi! >>> >>> How can I compute dot product (or similar multiply&sum operations) >>> efficiently so that broadcasting is utilized? >>> For multi-dimensional arrays, NumPy's inner and dot functions do not >>> match the leading axes and use broadcasting, but instead the result has >>> first the leading axes of the first input array and then the leading >>> axes of the second input array. >>> >>> For instance, I would like to compute the following inner-product: >>> np.sum(A*B, axis=-1) >>> >>> But numpy.inner gives: >>> A = np.random.randn(2,3,4) >>> B = np.random.randn(3,4) >>> np.inner(A,B).shape >>> # -> (2, 3, 3) instead of (2, 3) >>> >>> Similarly for dot product, I would like to compute for instance: >>> np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) >>> >>> But numpy.dot gives: >>> In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) >>> In [13]: np.dot(A,B).shape >>> # -> (2, 3, 2, 5) instead of (2, 3, 5) >>> >>> I could use einsum for these operations, but I'm not sure whether that's >>> as efficient as using some BLAS-supported(?) dot products. >>> >>> I couldn't find any function which could perform this kind of >>> operations. 
NumPy's functions seem to either flatten the input arrays >>> (vdot, outer) or just use the axes of the input arrays separately (dot, >>> inner, tensordot). >>> >>> Any help? >>> >>> Best regards, >>> Jaakko >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cjw at ncf.ca Wed Mar 20 09:46:35 2013 From: cjw at ncf.ca (Colin J. Williams) Date: Wed, 20 Mar 2013 09:46:35 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy Message-ID: <5149BDBB.6060509@ncf.ca> An HTML attachment was scrubbed... URL: From pierre.barbierdereuille at gmail.com Wed Mar 20 09:57:59 2013 From: pierre.barbierdereuille at gmail.com (Pierre Barbier de Reuille) Date: Wed, 20 Mar 2013 14:57:59 +0100 Subject: [Numpy-discussion] Bug in np.records? Message-ID: Hey, I am trying to use titles for the record arrays. In the documentation, it is specified that any column's title can be set to "None". However, trying this fails on numpy 1.6.2 because in np.core.records, on line 195, the "strip" method is called on the title object. This is really annoying. Could we fix this by replacing line 195 with: self._titles = [n.strip() if n is not None else None for n in titles[:self._nfields]] ? Thank you, -- Barbier de Reuille Pierre -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sd at syntonetic.com Wed Mar 20 09:59:33 2013 From: sd at syntonetic.com (Søren) Date: Wed, 20 Mar 2013 14:59:33 +0100 Subject: [Numpy-discussion] numpy array to C API Message-ID: <5149C0C5.6050801@syntonetic.com> Greetings, I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. It already works like a charm calling Python with the C API. But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? I came across PyArray, but I can see from the compiler warnings that it is deprecated, and I don't want to start from scratch on legacy facilities. Going forward, what is the intended way of doing this with neat code on both sides and with a minimum of memory-copy overhead? thanks in advance Søren From davidmenhur at gmail.com Wed Mar 20 10:14:06 2013 From: davidmenhur at gmail.com (Daπid) Date: Wed, 20 Mar 2013 15:14:06 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <5149BDBB.6060509@ncf.ca> References: <5149BDBB.6060509@ncf.ca> Message-ID: Without much detailed knowledge of the topic, I would expect both versions to give very similar timing, as it is essentially a call to an ATLAS function; not much is done in Python. Given this, maybe the difference is in ATLAS itself. How have you installed it? When you compile ATLAS, it will do some machine-specific optimisation, but if you have installed a binary, chances are that your version is optimised for a machine quite different from yours. So, two different installations could have been compiled on different machines, and so one is more suited to your machine. If you want to be sure, I would try to compile ATLAS (this may be difficult) or check the same on a very different machine (like an AMD processor, different architecture...). Just for reference, on Linux Python 2.7 64 bits can deal with these matrices easily. 
%timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); res = np.dot(mat, matinv); diff= res-np.eye(6143); print np.sum(np.abs(diff)) 2.41799631031e-05 1.13955868701e-05 3.64338191541e-05 1.13484781021e-05 1 loops, best of 3: 156 s per loop Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository (I don't run heavy stuff on this computer). On 20 March 2013 14:46, Colin J. Williams wrote: > I have a small program which builds random matrices for increasing matrix > orders, inverts the matrix and checks the precision of the product. At some > point, one would expect operations to fail, when the memory capacity is > exceeded. In both Python 2.7 and 3.2, matrices of order 3,071 are handled, > but not 6,143. > > Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7. > The profiler indicates a problem in the solver. > > Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of free > disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. > > The results are shown below. > > Colin W. 
> > aaaa_ssss > 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] > order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= > 0.004143 > order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= > 0.001514 > order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= > 0.001455 > order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= > 0.001608 > order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= > 0.002339 > order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= > 0.005747 > order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= > 0.029974 > order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= > 0.145339 > order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= > 0.841142 > order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= > 5.793630 > order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= > 39.559540 > order= 6143 Process terminated by a MemoryError > > Above: 2.7.3 Below: Python 3.2.3 > > bbb_bbb > 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] > order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= > 0.113930 > order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= > 0.001373 > order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= > 0.001468 > order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= > 0.001609 > order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= > 0.002687 > order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= > 0.013510 > order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= > 0.061560 > order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= > 0.418490 > order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= > 3.815713 > order= 1535 measure ofimprecision= 8.735 Time elapsed (seconds)= > 27.877270 > order= 3071 measure ofimprecision= 9.996 Time elapsed > (seconds)=212.545610 > order= 6143 Process terminated by a MemoryError > > > > 
_______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jenshnielsen at gmail.com Wed Mar 20 10:29:37 2013 From: jenshnielsen at gmail.com (Jens Nielsen) Date: Wed, 20 Mar 2013 14:29:37 +0000 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: Hi, It could also be that they are linked against different libraries, such as ATLAS and the standard BLAS. What is the output of numpy.show_config() in the two different python versions? Jens On Wed, Mar 20, 2013 at 2:14 PM, Daπid wrote: > Without much detailed knowledge of the topic, I would expect both > versions to give very similar timing, as it is essentially a call to > ATLAS function, not much is done in Python. > > Given this, maybe the difference is in ATLAS itself. How have you > installed it? When you compile ATLAS, it will do some machine-specific > optimisation, but if you have installed a binary chances are that your > version is optimised for a machine quite different from yours. So, two > different installations could have been compiled in different machines > and so one is more suited for your machine. If you want to be sure, I > would try to compile ATLAS (this may be difficult) or check the same > on a very different machine (like an AMD processor, different > architecture...). > > > > Just for reference, on Linux Python 2.7 64 bits can deal with these > matrices easily. > > %timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); > res = np.dot(mat, matinv); diff= res-np.eye(6143); print > np.sum(np.abs(diff)) > 2.41799631031e-05 > 1.13955868701e-05 > 3.64338191541e-05 > 1.13484781021e-05 > 1 loops, best of 3: 156 s per loop > > Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository > (I don't run heavy stuff on this computer). > > On 20 March 2013 14:46, Colin J. 
Williams wrote: > > I have a small program which builds random matrices for increasing matrix > > orders, inverts the matrix and checks the precision of the product. At > some > > point, one would expect operations to fail, when the memory capacity is > > exceeded. In both Python 2.7 and 3.2 matrices of order 3,071 area > handled, > > but not 6,143. > > > > Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7. > > The profiler indicates a problem in the solver. > > > > Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of free > > disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. > > > > The results are show below. > > > > Colin W. > > > > aaaa_ssss > > 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] > > order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= > > 0.004143 > > order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= > > 0.001514 > > order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= > > 0.001455 > > order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= > > 0.001608 > > order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= > > 0.002339 > > order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= > > 0.005747 > > order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= > > 0.029974 > > order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= > > 0.145339 > > order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= > > 0.841142 > > order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= > > 5.793630 > > order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= > > 39.559540 > > order= 6143 Process terminated by a MemoryError > > > > Above: 2.7.3 Below: Python 3.2.3 > > > > bbb_bbb > > 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] > > order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= > > 0.113930 > > order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= > > 
0.001373 > > order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= > > 0.001468 > > order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= > > 0.001609 > > order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= > > 0.002687 > > order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= > > 0.013510 > > order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= > > 0.061560 > > order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= > > 0.418490 > > order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= > > 3.815713 > > order= 1535 measure ofimprecision= 8.735 Time elapsed (seconds)= > > 27.877270 > > order= 3071 measure ofimprecision= 9.996 Time elapsed > > (seconds)=212.545610 > > order= 6143 Process terminated by a MemoryError > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Wed Mar 20 10:30:48 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 20 Mar 2013 10:30:48 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: Hi, win32 do not mean it is a 32 bits windows. sys.platform always return win32 on 32bits and 64 bits windows even for python 64 bits. But that is a good question, is your python 32 or 64 bits? Fred On Wed, Mar 20, 2013 at 10:14 AM, Da?id wrote: > Without much detailed knowledge of the topic, I would expect both > versions to give very similar timing, as it is essentially a call to > ATLAS function, not much is done in Python. 
> > Given this, maybe the difference is in ATLAS itself. How have you > installed it? When you compile ATLAS, it will do some machine-specific > optimisation, but if you have installed a binary chances are that your > version is optimised for a machine quite different from yours. So, two > different installations could have been compiled in different machines > and so one is more suited for your machine. If you want to be sure, I > would try to compile ATLAS (this may be difficult) or check the same > on a very different machine (like an AMD processor, different > architecture...). > > > > Just for reference, on Linux Python 2.7 64 bits can deal with these > matrices easily. > > %timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); > res = np.dot(mat, matinv); diff= res-np.eye(6143); print > np.sum(np.abs(diff)) > 2.41799631031e-05 > 1.13955868701e-05 > 3.64338191541e-05 > 1.13484781021e-05 > 1 loops, best of 3: 156 s per loop > > Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository > (I don't run heavy stuff on this computer). > > On 20 March 2013 14:46, Colin J. Williams wrote: >> I have a small program which builds random matrices for increasing matrix >> orders, inverts the matrix and checks the precision of the product. At some >> point, one would expect operations to fail, when the memory capacity is >> exceeded. In both Python 2.7 and 3.2 matrices of order 3,071 area handled, >> but not 6,143. >> >> Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7. >> The profiler indicates a problem in the solver. >> >> Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of free >> disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. >> >> The results are show below. >> >> Colin W. 
>> >> aaaa_ssss >> 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] >> order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= >> 0.004143 >> order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= >> 0.001514 >> order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= >> 0.001455 >> order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= >> 0.001608 >> order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= >> 0.002339 >> order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= >> 0.005747 >> order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= >> 0.029974 >> order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= >> 0.145339 >> order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= >> 0.841142 >> order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= >> 5.793630 >> order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= >> 39.559540 >> order= 6143 Process terminated by a MemoryError >> >> Above: 2.7.3 Below: Python 3.2.3 >> >> bbb_bbb >> 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] >> order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= >> 0.113930 >> order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= >> 0.001373 >> order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= >> 0.001468 >> order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= >> 0.001609 >> order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= >> 0.002687 >> order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= >> 0.013510 >> order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= >> 0.061560 >> order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= >> 0.418490 >> order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= >> 3.815713 >> order= 1535 measure ofimprecision= 8.735 Time elapsed (seconds)= >> 27.877270 >> order= 3071 measure ofimprecision= 9.996 Time elapsed >> (seconds)=212.545610 >> order= 
6143 Process terminated by a MemoryError >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cjwilliams43 at gmail.com Wed Mar 20 10:49:58 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Wed, 20 Mar 2013 10:49:58 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: <5149CC96.7090006@gmail.com> An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Wed Mar 20 10:59:26 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Wed, 20 Mar 2013 10:59:26 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: <5149CECE.6020107@gmail.com> An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Wed Mar 20 11:01:33 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Wed, 20 Mar 2013 11:01:33 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: <5149CF4D.6090906@gmail.com> On 20/03/2013 10:30 AM, Fr?d?ric Bastien wrote: > Hi, > > win32 do not mean it is a 32 bits windows. sys.platform always return > win32 on 32bits and 64 bits windows even for python 64 bits. > > But that is a good question, is your python 32 or 64 bits? 32 bits. Colin W. > > Fred > > On Wed, Mar 20, 2013 at 10:14 AM, Da?id wrote: >> Without much detailed knowledge of the topic, I would expect both >> versions to give very similar timing, as it is essentially a call to >> ATLAS function, not much is done in Python. 
>> >> Given this, maybe the difference is in ATLAS itself. How have you >> installed it? When you compile ATLAS, it will do some machine-specific >> optimisation, but if you have installed a binary chances are that your >> version is optimised for a machine quite different from yours. So, two >> different installations could have been compiled in different machines >> and so one is more suited for your machine. If you want to be sure, I >> would try to compile ATLAS (this may be difficult) or check the same >> on a very different machine (like an AMD processor, different >> architecture...). >> >> >> >> Just for reference, on Linux Python 2.7 64 bits can deal with these >> matrices easily. >> >> %timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); >> res = np.dot(mat, matinv); diff= res-np.eye(6143); print >> np.sum(np.abs(diff)) >> 2.41799631031e-05 >> 1.13955868701e-05 >> 3.64338191541e-05 >> 1.13484781021e-05 >> 1 loops, best of 3: 156 s per loop >> >> Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository >> (I don't run heavy stuff on this computer). >> >> On 20 March 2013 14:46, Colin J. Williams wrote: >>> I have a small program which builds random matrices for increasing matrix >>> orders, inverts the matrix and checks the precision of the product. At some >>> point, one would expect operations to fail, when the memory capacity is >>> exceeded. In both Python 2.7 and 3.2 matrices of order 3,071 area handled, >>> but not 6,143. >>> >>> Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7. >>> The profiler indicates a problem in the solver. >>> >>> Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of free >>> disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. >>> >>> The results are show below. >>> >>> Colin W. 
>>> >>> aaaa_ssss >>> 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] >>> order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= >>> 0.004143 >>> order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= >>> 0.001514 >>> order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= >>> 0.001455 >>> order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= >>> 0.001608 >>> order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= >>> 0.002339 >>> order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= >>> 0.005747 >>> order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= >>> 0.029974 >>> order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= >>> 0.145339 >>> order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= >>> 0.841142 >>> order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= >>> 5.793630 >>> order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= >>> 39.559540 >>> order= 6143 Process terminated by a MemoryError >>> >>> Above: 2.7.3 Below: Python 3.2.3 >>> >>> bbb_bbb >>> 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] >>> order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= >>> 0.113930 >>> order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= >>> 0.001373 >>> order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= >>> 0.001468 >>> order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= >>> 0.001609 >>> order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= >>> 0.002687 >>> order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= >>> 0.013510 >>> order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= >>> 0.061560 >>> order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= >>> 0.418490 >>> order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= >>> 3.815713 >>> order= 1535 measure ofimprecision= 8.735 Time elapsed (seconds)= >>> 27.877270 >>> order= 3071 measure ofimprecision= 
9.996 Time elapsed >>> (seconds)=212.545610 >>> order= 6143 Process terminated by a MemoryError >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jenshnielsen at gmail.com Wed Mar 20 11:06:47 2013 From: jenshnielsen at gmail.com (Jens Nielsen) Date: Wed, 20 Mar 2013 15:06:47 +0000 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <5149CF4D.6090906@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: The python3 version is compiled without any optimised library and is falling back on a slow version. Where did you get this installation from? Jens On Wed, Mar 20, 2013 at 3:01 PM, Colin J. Williams wrote: > On 20/03/2013 10:30 AM, Fr?d?ric Bastien wrote: > > Hi, > > > > win32 do not mean it is a 32 bits windows. sys.platform always return > > win32 on 32bits and 64 bits windows even for python 64 bits. > > > > But that is a good question, is your python 32 or 64 bits? > 32 bits. > > Colin W. > > > > Fred > > > > On Wed, Mar 20, 2013 at 10:14 AM, Da?id wrote: > >> Without much detailed knowledge of the topic, I would expect both > >> versions to give very similar timing, as it is essentially a call to > >> ATLAS function, not much is done in Python. > >> > >> Given this, maybe the difference is in ATLAS itself. How have you > >> installed it? 
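As a quick diagnostic for both questions raised in this thread (which BLAS each numpy build is linked against, and whether the interpreter is 32- or 64-bit), everything can be checked from within Python; a minimal sketch:

```python
import sys
import numpy as np

print(sys.version)                      # which interpreter is running
print('64-bit:', sys.maxsize > 2**32)   # True only on a 64-bit Python
np.show_config()                        # BLAS/LAPACK libraries numpy was built against
```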
When you compile ATLAS, it will do some machine-specific > >> optimisation, but if you have installed a binary chances are that your > >> version is optimised for a machine quite different from yours. So, two > >> different installations could have been compiled in different machines > >> and so one is more suited for your machine. If you want to be sure, I > >> would try to compile ATLAS (this may be difficult) or check the same > >> on a very different machine (like an AMD processor, different > >> architecture...). > >> > >> > >> > >> Just for reference, on Linux Python 2.7 64 bits can deal with these > >> matrices easily. > >> > >> %timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); > >> res = np.dot(mat, matinv); diff= res-np.eye(6143); print > >> np.sum(np.abs(diff)) > >> 2.41799631031e-05 > >> 1.13955868701e-05 > >> 3.64338191541e-05 > >> 1.13484781021e-05 > >> 1 loops, best of 3: 156 s per loop > >> > >> Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository > >> (I don't run heavy stuff on this computer). > >> > >> On 20 March 2013 14:46, Colin J. Williams wrote: > >>> I have a small program which builds random matrices for increasing > matrix > >>> orders, inverts the matrix and checks the precision of the product. > At some > >>> point, one would expect operations to fail, when the memory capacity is > >>> exceeded. In both Python 2.7 and 3.2 matrices of order 3,071 area > handled, > >>> but not 6,143. > >>> > >>> Using wall-clock times, with win32, Python 3.2 is slower than Python > 2.7. > >>> The profiler indicates a problem in the solver. > >>> > >>> Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of > free > >>> disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. > >>> > >>> The results are show below. > >>> > >>> Colin W. 
> >>> > >>> aaaa_ssss > >>> 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] > >>> order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= > >>> 0.004143 > >>> order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= > >>> 0.001514 > >>> order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= > >>> 0.001455 > >>> order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= > >>> 0.001608 > >>> order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= > >>> 0.002339 > >>> order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= > >>> 0.005747 > >>> order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= > >>> 0.029974 > >>> order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= > >>> 0.145339 > >>> order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= > >>> 0.841142 > >>> order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= > >>> 5.793630 > >>> order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= > >>> 39.559540 > >>> order= 6143 Process terminated by a MemoryError > >>> > >>> Above: 2.7.3 Below: Python 3.2.3 > >>> > >>> bbb_bbb > >>> 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] > >>> order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= > >>> 0.113930 > >>> order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= > >>> 0.001373 > >>> order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= > >>> 0.001468 > >>> order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= > >>> 0.001609 > >>> order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= > >>> 0.002687 > >>> order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= > >>> 0.013510 > >>> order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= > >>> 0.061560 > >>> order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= > >>> 0.418490 > >>> order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= > >>> 3.815713 > >>> order= 1535 measure 
ofimprecision= 8.735 Time elapsed (seconds)= > >>> 27.877270 > >>> order= 3071 measure ofimprecision= 9.996 Time elapsed > >>> (seconds)=212.545610 > >>> order= 6143 Process terminated by a MemoryError > >>> > >>> > >>> > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaakko.luttinen at aalto.fi Wed Mar 20 11:10:02 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 20 Mar 2013 17:10:02 +0200 Subject: [Numpy-discussion] Dot/inner products with broadcasting? In-Reply-To: <5149BABE.1080306@aalto.fi> References: <51408B59.8090504@aalto.fi> <5141BA5E.2020704@aalto.fi> <5149BABE.1080306@aalto.fi> Message-ID: <5149D14A.4020402@aalto.fi> Well, thanks to seberg, I finally noticed that there is a dot product function in this new module numpy.core.gufuncs_linalg, it was just named differently (matrix_multiply instead of dot). 
However, I may have found a bug in it: import numpy.core.gufuncs_linalg as gula A = np.arange(2*2).reshape((2,2)) B = np.arange(2*1).reshape((2,1)) gula.matrix_multiply(A, B) ---- ValueError: On entry to DGEMM parameter number 10 had an illegal value -Jaakko On 03/20/2013 03:33 PM, Jaakko Luttinen wrote: > I tried using this inner1d as an alternative to dot because it uses > broadcasting. However, I found something surprising: Not only is inner1d > much much slower than dot, it is also slower than einsum which is much > more general: > > In [68]: import numpy as np > > In [69]: import numpy.core.gufuncs_linalg as gula > > In [70]: K = np.random.randn(1000,1000) > > In [71]: %timeit gula.inner1d(K[:,np.newaxis,:], > np.swapaxes(K,-1,-2)[np.newaxis,:,:]) > 1 loops, best of 3: 6.05 s per loop > > In [72]: %timeit np.dot(K,K) > 1 loops, best of 3: 392 ms per loop > > In [73]: %timeit np.einsum('ik,kj->ij', K, K) > 1 loops, best of 3: 1.24 s per loop > > Why is it so? I thought that the performance of inner1d would be > somewhere in between dot and einsum, probably closer to dot. Now I don't > see any reason to use inner1d instead of einsum.. > > -Jaakko > > On 03/15/2013 04:22 PM, Oscar Villellas wrote: >> In fact, there is already an inner1d implemented in >> numpy.core.umath_tests.inner1d >> >> from numpy.core.umath_tests import inner1d >> >> It should do the trick :) >> >> On Thu, Mar 14, 2013 at 12:54 PM, Jaakko Luttinen >> wrote: >>> Answering to myself, this pull request seems to implement an inner >>> product with broadcasting (inner1d) and many other useful functions: >>> https://github.com/numpy/numpy/pull/2954/ >>> -J >>> >>> On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: >>>> Hi! >>>> >>>> How can I compute dot product (or similar multiply&sum operations) >>>> efficiently so that broadcasting is utilized? 
>>>> For multi-dimensional arrays, NumPy's inner and dot functions do not >>>> match the leading axes and use broadcasting, but instead the result has >>>> first the leading axes of the first input array and then the leading >>>> axes of the second input array. >>>> >>>> For instance, I would like to compute the following inner-product: >>>> np.sum(A*B, axis=-1) >>>> >>>> But numpy.inner gives: >>>> A = np.random.randn(2,3,4) >>>> B = np.random.randn(3,4) >>>> np.inner(A,B).shape >>>> # -> (2, 3, 3) instead of (2, 3) >>>> >>>> Similarly for dot product, I would like to compute for instance: >>>> np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) >>>> >>>> But numpy.dot gives: >>>> In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) >>>> In [13]: np.dot(A,B).shape >>>> # -> (2, 3, 2, 5) instead of (2, 3, 5) >>>> >>>> I could use einsum for these operations, but I'm not sure whether that's >>>> as efficient as using some BLAS-supported(?) dot products. >>>> >>>> I couldn't find any function which could perform this kind of >>>> operations. NumPy's functions seem to either flatten the input arrays >>>> (vdot, outer) or just use the axes of the input arrays separately (dot, >>>> inner, tensordot). >>>> >>>> Any help? 
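The broadcasting behaviour asked for above can be written with einsum's ellipsis notation — a sketch (whether it matches the speed of BLAS-backed dot is exactly the open question in this thread; later NumPy releases also added np.matmul for the stacked matrix case):

```python
import numpy as np

A = np.random.randn(2, 3, 4)
B = np.random.randn(3, 4)

# Broadcasting inner product over the last axis:
# equivalent to np.sum(A * B, axis=-1), result shape (2, 3)
inner = np.einsum('...i,...i->...', A, B)

A2 = np.random.randn(2, 3, 4)
B2 = np.random.randn(2, 4, 5)

# Matrix product broadcast over the leading axis:
# one dot per leading index, result shape (2, 3, 5)
prod = np.einsum('...ik,...kj->...ij', A2, B2)
```

The `...` in the subscripts stands for any leading axes, which einsum broadcasts against each other like an ordinary elementwise operation.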
>>>> >>>> Best regards, >>>> Jaakko >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Wed Mar 20 11:12:17 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 20 Mar 2013 11:12:17 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <5149CF4D.6090906@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: On Wed, Mar 20, 2013 at 11:01 AM, Colin J. Williams wrote: > On 20/03/2013 10:30 AM, Frédéric Bastien wrote: >> >> Hi, >> >> win32 does not mean it is a 32-bit Windows. sys.platform always returns >> win32 on 32-bit and 64-bit Windows, even for 64-bit Python. >> >> But that is a good question: is your Python 32 or 64 bits? > > 32 bits. That explains why you run into memory problems while other people with 64-bit versions do not. So if you want to work with bigger inputs, switch to a 64-bit Python. Fred From cjwilliams43 at gmail.com Wed Mar 20 11:16:05 2013 From: cjwilliams43 at gmail.com (Colin J. 
Williams) Date: Wed, 20 Mar 2013 11:16:05 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: <5149D2B5.8000907@gmail.com> An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Wed Mar 20 11:18:23 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Wed, 20 Mar 2013 11:18:23 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: <5149D33F.9080907@gmail.com> An HTML attachment was scrubbed... URL: From lists at hilboll.de Wed Mar 20 11:31:04 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Wed, 20 Mar 2013 16:31:04 +0100 Subject: [Numpy-discussion] how to efficiently select multiple slices from an array? Message-ID: <5149D638.9050000@hilboll.de> Cross-posting a question I asked on SO (http://stackoverflow.com/q/15527666/152439): Given an array d = np.random.randn(100) and an index array i = np.random.random_integers(low=3, high=d.size - 5, size=20) how can I efficiently create a 2d array r with r.shape = (20, 8) such that for all j=0..19, r[j] = d[i[j]-3:i[j]+5] In my case, the arrays are quite large (~200000 instead of 100 and 20), so something quick would be useful. Cheers, Andreas. From sebastian at sipsolutions.net Wed Mar 20 11:43:17 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 20 Mar 2013 16:43:17 +0100 Subject: [Numpy-discussion] how to efficiently select multiple slices from an array? 
In-Reply-To: <5149D638.9050000@hilboll.de> References: <5149D638.9050000@hilboll.de> Message-ID: <1363794197.22391.9.camel@sebastian-laptop> Hey, On Wed, 2013-03-20 at 16:31 +0100, Andreas Hilboll wrote: > Cross-posting a question I asked on SO > (http://stackoverflow.com/q/15527666/152439): > > > Given an array > > d = np.random.randn(100) > > and an index array > > i = np.random.random_integers(low=3, high=d.size - 5, size=20) > > how can I efficiently create a 2d array r with > > r.shape = (20, 8) > > such that for all j=0..19, > > r[j] = d[i[j]-3:i[j]+5] > > In my case, the arrays are quite large (~200000 instead of 100 and 20), > so something quick would be useful. You can use stride tricks, it's simple to do by hand, but since I got it, maybe just use this: https://gist.github.com/seberg/3866040 d = np.random.randn(100) windowed_d = rolling_window(d, 8) i = np.random.randint(0, len(windowed_d), 20) r = windowed_d[i,:] Or use stride_tricks by hand, with: windowed_d = np.lib.stride_tricks.as_strided(d, (d.shape[0]-7, 8), (d.strides[0],)*2) The fancy indexing will create a copy, so while windowed_d views the same data as the original array, that is of course not the case for the end result. Regards, Sebastian > > Cheers, Andreas. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Wed Mar 20 12:03:36 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 20 Mar 2013 16:03:36 +0000 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <5149C0C5.6050801@syntonetic.com> References: <5149C0C5.6050801@syntonetic.com> Message-ID: On Wed, Mar 20, 2013 at 1:59 PM, Søren wrote: > Greetings > > I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. > It already works like a charm calling python with the C API. 
> > But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? > > I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. What is this `PyArray` that you are referring to? There is nothing named just `PyArray` to my knowledge. Do you mean direct access to the `data` member of the PyArrayObject struct? Yes, that is deprecated. Use the PyArray_DATA() macro to get a `void*` pointer to the start of the data. http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA -- Robert Kern From warren.weckesser at gmail.com Wed Mar 20 13:11:10 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 20 Mar 2013 13:11:10 -0400 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. In-Reply-To: References: Message-ID: On Fri, Mar 15, 2013 at 4:39 PM, Nathaniel Smith wrote: > On Fri, Mar 15, 2013 at 6:47 PM, Warren Weckesser > wrote: > > Hi all, > > > > In a recent scipy pull request (https://github.com/scipy/scipy/pull/459), > I > > ran into the problem of ufuncs automatically generating a signature in > the > > docstring using arguments such as 'x' or 'x1, x2'. scipy.special has a > lot > > of ufuncs, and for most of them, there are much more descriptive or > > conventional argument names than 'x'. For now, we will include a nicer > > signature in the added docstring, and grudgingly put up with the one > > generated by the ufunc. In the long term, it would be nice to be able to > > disable the automatic generation of the signature. I submitted a pull > > request to numpy to allow that: https://github.com/numpy/numpy/pull/3149 > > > > Comments on the pull request would be appreciated. > > The functionality seems obviously useful, but adding a magic public > attribute to all ufuncs seems like a somewhat clumsy way to expose it? > Esp. 
since ufuncs are always created through the C API, including > docstring specification, but this can only be set at the Python level? > Maybe it's the best option but it seems worth taking a few minutes to > consider alternatives. > Agreed; exposing the flag as part of the public Python ufunc API is unnecessary, since this is something that would rarely, if ever, be changed during the life of the ufunc. > Brainstorming: > > - If the first line of the docstring starts with "(" and > ends with ")", then that's a signature and we skip adding one (I think > sphinx does something like this?) Kinda magic and implicit, but highly > backwards compatible. > > - Declare that henceforth, the signature generation will be disabled > by default, and go through and add a special marker like > "__SIGNATURE__" to all the existing ufunc docstrings, which gets > replaced (if present) by the automagically generated signature. > > - Give ufunc arguments actual names in general, that work for things > like kwargs, and then use those in the automagically generated > signature. This is the most work, but it would mean that people don't > have to remember to update their non-magic signatures whenever numpy > adds a new feature like out= or where=, and would make the docstrings > actually accurate, which right now they aren't: > > I'm leaning towards this option. I don't know if there would still be a need to disable the automatic generation of the docstring if it was good enough. In [7]: np.add.__doc__.split("\n")[0] > Out[7]: 'add(x1, x2[, out])' > > In [8]: np.add(x1=1, x2=2) > ValueError: invalid number of arguments > > - Allow some special syntax to describe the argument names in the > docstring: "__ARGNAMES__: a b\n" -> "add(a, b[, out])" > > - Something else... 
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.villellas at continuum.io Wed Mar 20 13:14:41 2013 From: oscar.villellas at continuum.io (Oscar Villellas) Date: Wed, 20 Mar 2013 18:14:41 +0100 Subject: [Numpy-discussion] Dot/inner products with broadcasting? In-Reply-To: <5149D14A.4020402@aalto.fi> References: <51408B59.8090504@aalto.fi> <5141BA5E.2020704@aalto.fi> <5149BABE.1080306@aalto.fi> <5149D14A.4020402@aalto.fi> Message-ID: Reproduced it. I will take a look at it. That error comes direct from BLAS and shouldn't be happening. I will also look why inner1d is not performing well. Note: inner1d is implemented with calls to BLAS (dot). I will get back to you later :) On Wed, Mar 20, 2013 at 4:10 PM, Jaakko Luttinen wrote: > Well, thanks to seberg, I finally noticed that there is a dot product > function in this new module numpy.core.gufuncs_linalg, it was just named > differently (matrix_multiply instead of dot). > > However, I may have found a bug in it: > > import numpy.core.gufuncs_linalg as gula > A = np.arange(2*2).reshape((2,2)) > B = np.arange(2*1).reshape((2,1)) > gula.matrix_multiply(A, B) > ---- > ValueError: On entry to DGEMM parameter number 10 had an illegal value > > -Jaakko > > On 03/20/2013 03:33 PM, Jaakko Luttinen wrote: >> I tried using this inner1d as an alternative to dot because it uses >> broadcasting. 
However, I found something surprising: Not only is inner1d >> much much slower than dot, it is also slower than einsum which is much >> more general: >> >> In [68]: import numpy as np >> >> In [69]: import numpy.core.gufuncs_linalg as gula >> >> In [70]: K = np.random.randn(1000,1000) >> >> In [71]: %timeit gula.inner1d(K[:,np.newaxis,:], >> np.swapaxes(K,-1,-2)[np.newaxis,:,:]) >> 1 loops, best of 3: 6.05 s per loop >> >> In [72]: %timeit np.dot(K,K) >> 1 loops, best of 3: 392 ms per loop >> >> In [73]: %timeit np.einsum('ik,kj->ij', K, K) >> 1 loops, best of 3: 1.24 s per loop >> >> Why is it so? I thought that the performance of inner1d would be >> somewhere in between dot and einsum, probably closer to dot. Now I don't >> see any reason to use inner1d instead of einsum.. >> >> -Jaakko >> >> On 03/15/2013 04:22 PM, Oscar Villellas wrote: >>> In fact, there is already an inner1d implemented in >>> numpy.core.umath_tests.inner1d >>> >>> from numpy.core.umath_tests import inner1d >>> >>> It should do the trick :) >>> >>> On Thu, Mar 14, 2013 at 12:54 PM, Jaakko Luttinen >>> wrote: >>>> Answering to myself, this pull request seems to implement an inner >>>> product with broadcasting (inner1d) and many other useful functions: >>>> https://github.com/numpy/numpy/pull/2954/ >>>> -J >>>> >>>> On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: >>>>> Hi! >>>>> >>>>> How can I compute dot product (or similar multiply&sum operations) >>>>> efficiently so that broadcasting is utilized? >>>>> For multi-dimensional arrays, NumPy's inner and dot functions do not >>>>> match the leading axes and use broadcasting, but instead the result has >>>>> first the leading axes of the first input array and then the leading >>>>> axes of the second input array. 
>>>>> >>>>> For instance, I would like to compute the following inner-product: >>>>> np.sum(A*B, axis=-1) >>>>> >>>>> But numpy.inner gives: >>>>> A = np.random.randn(2,3,4) >>>>> B = np.random.randn(3,4) >>>>> np.inner(A,B).shape >>>>> # -> (2, 3, 3) instead of (2, 3) >>>>> >>>>> Similarly for dot product, I would like to compute for instance: >>>>> np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) >>>>> >>>>> But numpy.dot gives: >>>>> In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) >>>>> In [13]: np.dot(A,B).shape >>>>> # -> (2, 3, 2, 5) instead of (2, 3, 5) >>>>> >>>>> I could use einsum for these operations, but I'm not sure whether that's >>>>> as efficient as using some BLAS-supported(?) dot products. >>>>> >>>>> I couldn't find any function which could perform this kind of >>>>> operations. NumPy's functions seem to either flatten the input arrays >>>>> (vdot, outer) or just use the axes of the input arrays separately (dot, >>>>> inner, tensordot). >>>>> >>>>> Any help? 
>>>>> >>>>> Best regards, >>>>> Jaakko >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Mar 20 13:16:30 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 20 Mar 2013 17:16:30 +0000 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. In-Reply-To: References: Message-ID: On 20 Mar 2013 17:11, "Warren Weckesser" wrote: > > > > On Fri, Mar 15, 2013 at 4:39 PM, Nathaniel Smith wrote: >> >> On Fri, Mar 15, 2013 at 6:47 PM, Warren Weckesser >> wrote: >> > Hi all, >> > >> > In a recent scipy pull request (https://github.com/scipy/scipy/pull/459), I >> > ran into the problem of ufuncs automatically generating a signature in the >> > docstring using arguments such as 'x' or 'x1, x2'. scipy.special has a lot >> > of ufuncs, and for most of them, there are much more descriptive or >> > conventional argument names than 'x'. For now, we will include a nicer >> > signature in the added docstring, and grudgingly put up with the one >> > generated by the ufunc. 
In the long term, it would be nice to be able to >> > disable the automatic generation of the signature. I submitted a pull >> > request to numpy to allow that: https://github.com/numpy/numpy/pull/3149 >> > >> > Comments on the pull request would be appreciated. >> >> The functionality seems obviously useful, but adding a magic public >> attribute to all ufuncs seems like a somewhat clumsy way to expose it? >> Esp. since ufuncs are always created through the C API, including >> docstring specification, but this can only be set at the Python level? >> Maybe it's the best option but it seems worth taking a few minutes to >> consider alternatives. > > > > Agreed; exposing the flag as part of the public Python ufunc API is unnecessary, since this is something that would rarely, if ever, be changed during the life of the ufunc. > > >> >> Brainstorming: >> >> - If the first line of the docstring starts with "(" and >> ends with ")", then that's a signature and we skip adding one (I think >> sphinx does something like this?) Kinda magic and implicit, but highly >> backwards compatible. >> >> - Declare that henceforth, the signature generation will be disabled >> by default, and go through and add a special marker like >> "__SIGNATURE__" to all the existing ufunc docstrings, which gets >> replaced (if present) by the automagically generated signature. >> >> - Give ufunc arguments actual names in general, that work for things >> like kwargs, and then use those in the automagically generated >> signature. This is the most work, but it would mean that people don't >> have to remember to update their non-magic signatures whenever numpy >> adds a new feature like out= or where=, and would make the docstrings >> actually accurate, which right now they aren't: >> > > I'm leaning towards this option. I don't know if there would still be a need to disable the automatic generation of the docstring if it was good enough. 
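The docstring-heuristic option from the brainstorm above — skip the auto-generated signature when the first docstring line already reads like `name(...)` — can be sketched in a few lines (the helper name and exact rule here are illustrative, not what the pull request implements):

```python
def looks_like_signature(doc, name):
    """Heuristic sketch: does the first docstring line read as 'name(...)'?"""
    if not doc:
        return False
    first = doc.lstrip().split('\n', 1)[0].strip()
    return first.startswith(name + '(') and first.endswith(')')

print(looks_like_signature('add(x1, x2[, out])\n\nAdd arguments elementwise.', 'add'))  # True
print(looks_like_signature('Add arguments elementwise.', 'add'))                        # False
```

As noted in the thread, this kind of implicit detection is backwards compatible but a little magic: any docstring that happens to open with a parenthesised line would suppress the generated signature.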
Certainly it would be nice for ufunc argument handling to better match python argument handling! Just needs someone willing to do the work... *cough* ;-) -n >> In [7]: np.add.__doc__.split("\n")[0] >> Out[7]: 'add(x1, x2[, out])' >> >> In [8]: np.add(x1=1, x2=2) >> ValueError: invalid number of arguments >> >> - Allow some special syntax to describe the argument names in the >> docstring: "__ARGNAMES__: a b\n" -> "add(a, b[, out])" >> >> - Something else... >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Wed Mar 20 13:59:22 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Wed, 20 Mar 2013 18:59:22 +0100 Subject: [Numpy-discussion] how to efficiently select multiple slices from an array? In-Reply-To: <1363794197.22391.9.camel@sebastian-laptop> References: <5149D638.9050000@hilboll.de> <1363794197.22391.9.camel@sebastian-laptop> Message-ID: <5149F8FA.6040707@hilboll.de> > Hey, > > On Wed, 2013-03-20 at 16:31 +0100, Andreas Hilboll wrote: >> Cross-posting a question I asked on SO >> (http://stackoverflow.com/q/15527666/152439): >> >> >> Given an array >> >> d = np.random.randn(100) >> >> and an index array >> >> i = np.random.random_integers(low=3, high=d.size - 5, size=20) >> >> how can I efficiently create a 2d array r with >> >> r.shape = (20, 8) >> >> such that for all j=0..19, >> >> r[j] = d[i[j]-3:i[j]+5] >> >> In my case, the arrays are quite large (~200000 instead of 100 and 20), >> so something quick would be useful. 
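The stride-trick approach suggested earlier in the thread, written out end to end (a sketch: `randint` stands in for `random_integers`, and the row for index `i[j]` is the window starting at `i[j] - 3`):

```python
import numpy as np

d = np.random.randn(100)
i = np.random.randint(3, d.size - 4, size=20)  # stand-in for random_integers(low=3, high=d.size - 5)

# Zero-copy view of every length-8 window of d: row k is d[k:k+8]
windowed_d = np.lib.stride_tricks.as_strided(
    d, shape=(d.size - 7, 8), strides=(d.strides[0],) * 2)

# One fancy index picks (and copies) the 20 requested windows
r = windowed_d[i - 3]

# Now r[j] == d[i[j]-3 : i[j]+5] for every j, and r.shape == (20, 8)
```

Only the final fancy-indexing step copies data, so the cost is proportional to the output size rather than to building each slice separately.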
> > > You can use stride tricks, it's simple to do by hand, but since I got it, > maybe just use this: https://gist.github.com/seberg/3866040 > > d = np.random.randn(100) > windowed_d = rolling_window(d, 8) > i = np.random.randint(0, len(windowed_d), 20) > r = windowed_d[i,:] > > Or use stride_tricks by hand, with: > windowed_d = np.lib.stride_tricks.as_strided(d, (d.shape[0]-7, 8), > (d.strides[0],)*2) > > The fancy indexing will create a copy, so while windowed_d views the > same data as the original array, that is of course not the case for the > end result. > > Regards, > > Sebastian > >> >> Cheers, Andreas. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion cool, thanks! From chris.barker at noaa.gov Wed Mar 20 14:25:20 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 20 Mar 2013 11:25:20 -0700 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: References: <5149C0C5.6050801@syntonetic.com> Message-ID: On Wed, Mar 20, 2013 at 9:03 AM, Robert Kern wrote: I highly recommend using an existing tool to write this interface, to take care of the reference counting, etc. for you. Cython is particularly nice. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sd at syntonetic.com Thu Mar 21 04:11:35 2013 From: sd at syntonetic.com (=?UTF-8?B?U8O4cmVu?=) Date: Thu, 21 Mar 2013 09:11:35 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: References: <5149C0C5.6050801@syntonetic.com> Message-ID: <514AC0B7.6020500@syntonetic.com> Thanks Robert, for making that clear. I got a deprecated warning the second I added #include and I got scared off too fast in my exploring phase. Cheers Søren On 20/03/2013 17:03, Robert Kern wrote: > On Wed, Mar 20, 2013 at 1:59 PM, Søren wrote: >> Greetings >> >> I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. >> It already works like a charm calling python with the C API. >> >> But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? >> >> I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. > What is this `PyArray` that you are referring to? There is nothing > named just `PyArray` to my knowledge. Do you mean direct access to the > `data` member of the PyArrayObject struct? Yes, that is deprecated. > Use the PyArray_DATA() macro to get a `void*` pointer to the start of > the data. 
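For getting a feel from the Python side for what PyArray_DATA() hands back, the same raw data pointer is reachable through the array's ctypes interface — a rough sketch for exploration, not a replacement for the C-API call:

```python
import ctypes
import numpy as np

a = np.arange(4, dtype=np.float64)

# a.ctypes.data_as(...) wraps the same address PyArray_DATA(arr) returns in C
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

ptr[0] = 42.0        # a write through the raw pointer...
print(a[0])          # ...is visible in the array itself: 42.0
```

This only works as shown for a contiguous array of the matching dtype; in C, PyArray_DATA() comes with the same caveat, which is why conversion helpers like PyArray_FROM_OTF are usually used first.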
> > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From valentin at haenel.co Thu Mar 21 04:45:21 2013 From: valentin at haenel.co (Valentin Haenel) Date: Thu, 21 Mar 2013 09:45:21 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <514AC0B7.6020500@syntonetic.com> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> Message-ID: <20130321084521.GG7842@kudu.in-berlin.de> Dear S?ren, if you are new to interfacing python/numpy with C/C++, you may want to check out: http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html Disclaimer: I am the author of this chapter, so this response is a bit of a shameless plug :D Hope it helps none the less. V- * S?ren [2013-03-21]: > Thanks Robert, for making that clear. > > I got a deprecated warning the second I added > #include > and I got scared off too fast in my exploring phase. > > Cheers > S?ren > > On 20/03/2013 17:03, Robert Kern wrote: > > On Wed, Mar 20, 2013 at 1:59 PM, S?ren wrote: > >> Greetings > >> > >> I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. > >> It already works like a charm calling python with the C API . > >> > >> But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? > >> > >> I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. > > What is this `PyArray` that you are referring to? There is nothing > > named just `PyArray` to my knowledge. Do you mean direct access to the > > `data` member of the PyArrayObject struct? Yes, that is deprecated. > > Use the PyArray_DATA() macro to get a `void*` pointer to the start of > > the data. 
> > > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA > > > > -- > > Robert Kern > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From daniele at grinta.net Thu Mar 21 05:04:39 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 21 Mar 2013 10:04:39 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <20130321084521.GG7842@kudu.in-berlin.de> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084521.GG7842@kudu.in-berlin.de> Message-ID: <514ACD27.3060504@grinta.net> On 21/03/2013 09:45, Valentin Haenel wrote: > if you are new to interfacing python/numpy with C/C++, you may want to > check out: > > http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > > Disclaimer: I am the author of this chapter, so this response is a bit > of a shameless plug :D Hello Valentin, I had a quick look at the chapter. It looks good! Thanks for sharing it. However I have a small comment on the way you implement the Cython-Numpy solution. I would have written the loop over the array element in Cython itself rather than in a separately compiled C function. This would have the advantage of presenting more capabilities of Cython and would slightly decrease the complexity of the solution (one source file instead of two). 
Cheers, Daniele From valentin at haenel.co Thu Mar 21 05:16:50 2013 From: valentin at haenel.co (Valentin Haenel) Date: Thu, 21 Mar 2013 10:16:50 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <514ACD27.3060504@grinta.net> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084521.GG7842@kudu.in-berlin.de> <514ACD27.3060504@grinta.net> Message-ID: <20130321091649.GA12061@kudu.in-berlin.de> Dear Daniele * Daniele Nicolodi [2013-03-21]: > On 21/03/2013 09:45, Valentin Haenel wrote: > > if you are new to interfacing python/numpy with C/C++, you may want to > > check out: > > > > http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > > > > Disclaimer: I am the author of this chapter, so this response is a bit > > of a shameless plug :D > > Hello Valentin, > > I had a quick look at the chapter. It looks good! Thanks for sharing it. > > However I have a small comment on the way you implement the Cython-Numpy > solution. I would have written the loop over the array element in Cython > itself rather than in a separately compiled C function. This would have > the advantage of presenting more capabilities of Cython and would > slightly decrease the complexity of the solution (one source file > instead of two). Thanks very much for your feedback! Since the chapter in under a CC licence, you are welcome to submit your proposal as a Pull-Request. :D The reason why I wrote the loop in C is so that the cython example synergieses with the others. The ideas is, that you have an already existing code-base that has a function which has such a signature. Ideally, your proposal would be an improvement, where the original example stays in place and you develop the improvement including reasons as to why it is better. 
;) best V- From daniele at grinta.net Thu Mar 21 06:14:16 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 21 Mar 2013 11:14:16 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <20130321091649.GA12061@kudu.in-berlin.de> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084521.GG7842@kudu.in-berlin.de> <514ACD27.3060504@grinta.net> <20130321091649.GA12061@kudu.in-berlin.de> Message-ID: <514ADD78.4010604@grinta.net> On 21/03/2013 10:16, Valentin Haenel wrote: > Dear Daniele > > * Daniele Nicolodi [2013-03-21]: >> On 21/03/2013 09:45, Valentin Haenel wrote: >>> if you are new to interfacing python/numpy with C/C++, you may want to >>> check out: >>> >>> http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html >>> >>> Disclaimer: I am the author of this chapter, so this response is a bit >>> of a shameless plug :D >> >> Hello Valentin, >> >> I had a quick look at the chapter. It looks good! Thanks for sharing it. >> >> However I have a small comment on the way you implement the Cython-Numpy >> solution. I would have written the loop over the array element in Cython >> itself rather than in a separately compiled C function. This would have >> the advantage of presenting more capabilities of Cython and would >> slightly decrease the complexity of the solution (one source file >> instead of two). > > Thanks very much for your feedback! Since the chapter in under a CC > licence, you are welcome to submit your proposal as a Pull-Request. :D > > The reason why I wrote the loop in C is so that the cython example > synergieses with the others. The ideas is, that you have an already > existing code-base that has a function which has such a signature. > Ideally, your proposal would be an improvement, where the original > example stays in place and you develop the improvement including reasons > as to why it is better. ;) I understand the reasoning behind your choice. 
I'm adding sending you a patch with this addition to my todo list, but I don't really know when I will have time to work on it... Cheers, Daniele From valentin at haenel.co Thu Mar 21 06:19:12 2013 From: valentin at haenel.co (Valentin Haenel) Date: Thu, 21 Mar 2013 11:19:12 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <514ADD78.4010604@grinta.net> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084521.GG7842@kudu.in-berlin.de> <514ACD27.3060504@grinta.net> <20130321091649.GA12061@kudu.in-berlin.de> <514ADD78.4010604@grinta.net> Message-ID: <20130321101912.GB12061@kudu.in-berlin.de> * Daniele Nicolodi [2013-03-21]: > On 21/03/2013 10:16, Valentin Haenel wrote: > > Dear Daniele > > > > * Daniele Nicolodi [2013-03-21]: > >> On 21/03/2013 09:45, Valentin Haenel wrote: > >>> if you are new to interfacing python/numpy with C/C++, you may want to > >>> check out: > >>> > >>> http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > >>> > >>> Disclaimer: I am the author of this chapter, so this response is a bit > >>> of a shameless plug :D > >> > >> Hello Valentin, > >> > >> I had a quick look at the chapter. It looks good! Thanks for sharing it. > >> > >> However I have a small comment on the way you implement the Cython-Numpy > >> solution. I would have written the loop over the array element in Cython > >> itself rather than in a separately compiled C function. This would have > >> the advantage of presenting more capabilities of Cython and would > >> slightly decrease the complexity of the solution (one source file > >> instead of two). > > > > Thanks very much for your feedback! Since the chapter in under a CC > > licence, you are welcome to submit your proposal as a Pull-Request. :D > > > > The reason why I wrote the loop in C is so that the cython example > > synergieses with the others. 
The ideas is, that you have an already > > existing code-base that has a function which has such a signature. > > Ideally, your proposal would be an improvement, where the original > > example stays in place and you develop the improvement including reasons > > as to why it is better. ;) > > I understand the reasoning behind your choice. I'm adding sending you a > patch with this addition to my todo list, but I don't really know when I > will have time to work on it... Aye, that would be great! No need to rush -- you can also throw a feature request into the project issue tracker, maybe someone else will grab it. V- From sd at syntonetic.com Thu Mar 21 12:17:15 2013 From: sd at syntonetic.com (=?windows-1252?Q?S=F8ren?=) Date: Thu, 21 Mar 2013 17:17:15 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <20130321084147.GF7842@kudu.in-berlin.de> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084147.GF7842@kudu.in-berlin.de> Message-ID: <514B328B.40107@syntonetic.com> Thanks Valentin Your article fell in dry spot when a newbie in C/Python interfacing. Python-C-API fits perfectly with my current use-case. I got currious about the Ctypes approach as well as "Ga?l Varoquaux?s blog post about avoiding data copies", but the link in the article didn't seem to work. (Under "Further Reading and References") cheers S?ren On 21/03/2013 09:41, Valentin Haenel wrote: > Dear S?ren, > > if you are new to interfacing python/numpy with C/C++, you may want to > check out: > > http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > > Disclaimer: I am the author of this chapter, so this response is a bit > of a shameless plug :D > > Hope it helps. > > V- > > * S?ren [2013-03-21]: >> Thanks Robert, for making that clear. >> >> I got a deprecated warning the second I added >> #include >> and I got scared off too fast in my exploring phase. 
>> >> Cheers >> S?ren >> >> On 20/03/2013 17:03, Robert Kern wrote: >>> On Wed, Mar 20, 2013 at 1:59 PM, S?ren wrote: >>>> Greetings >>>> >>>> I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. >>>> It already works like a charm calling python with the C API . >>>> >>>> But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? >>>> >>>> I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. >>> What is this `PyArray` that you are referring to? There is nothing >>> named just `PyArray` to my knowledge. Do you mean direct access to the >>> `data` member of the PyArrayObject struct? Yes, that is deprecated. >>> Use the PyArray_DATA() macro to get a `void*` pointer to the start of >>> the data. >>> >>> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA >>> >>> -- >>> Robert Kern >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From valentin at haenel.co Thu Mar 21 12:34:51 2013 From: valentin at haenel.co (Valentin Haenel) Date: Thu, 21 Mar 2013 17:34:51 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <514B328B.40107@syntonetic.com> Message-ID: <20130321163451.GE12061@kudu.in-berlin.de> Dear S?ren, * S?ren [2013-03-21]: > Your article fell in dry spot when a newbie in C/Python interfacing. > Python-C-API fits perfectly with my current use-case. > > I got currious about the Ctypes approach as well as "Ga?l Varoquaux?s > blog post about avoiding data copies", but the link in the article > didn't seem to work. 
(Under "Further Reading and References") There seems to be something wrong with Gaël's website. I have CC'd him, maybe he can fix it. best V- > > cheers > Søren > > On 21/03/2013 09:41, Valentin Haenel wrote: > > Dear Søren, > > > > if you are new to interfacing python/numpy with C/C++, you may want to > > check out: > > > > http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > > > > Disclaimer: I am the author of this chapter, so this response is a bit > > of a shameless plug :D > > > > Hope it helps. > > > > V- > > > > * Søren [2013-03-21]: > >> Thanks Robert, for making that clear. > >> > >> I got a deprecated warning the second I added > >> #include > >> and I got scared off too fast in my exploring phase. > >> > >> Cheers > >> Søren > >> > >> On 20/03/2013 17:03, Robert Kern wrote: > >>> On Wed, Mar 20, 2013 at 1:59 PM, Søren wrote: > >>>> Greetings > >>>> > >>>> I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. > >>>> It already works like a charm calling python with the C API . > >>>> > >>>> But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? > >>>> > >>>> I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. > >>> What is this `PyArray` that you are referring to? There is nothing > >>> named just `PyArray` to my knowledge. Do you mean direct access to the > >>> `data` member of the PyArrayObject struct? Yes, that is deprecated. > >>> Use the PyArray_DATA() macro to get a `void*` pointer to the start of > >>> the data. 
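The `PyArray_DATA()` macro Robert mentions lives on the C side; from pure Python, the same raw buffer address can be previewed through the array's `ctypes` attribute, which is a handy sanity check on what a C extension will actually receive. A minimal sketch (no compiled code involved):

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)
# C code usually expects one contiguous block; this is a no-op if the
# array is already C-contiguous, and makes a copy otherwise.
a = np.ascontiguousarray(a)

ptr = a.ctypes.data             # integer address of the first element
print(hex(ptr))
print(a.flags['C_CONTIGUOUS'])  # True
```

On the C side, `PyArray_DATA()` on the same object would yield a `void*` to that address.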
> >>> > >>> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA > >>> > >>> -- > >>> Robert Kern > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at gmail.com Thu Mar 21 17:20:31 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 21 Mar 2013 22:20:31 +0100 Subject: [Numpy-discussion] NumPy/SciPy participation in GSoC 2013 Message-ID: Hi all, It is the time of the year for Google Summer of Code applications. If we want to participate with Numpy and/or Scipy, we need two things: enough mentors and ideas for projects. If we get those, we'll apply under the PSF umbrella. They've outlined the timeline they're working by and guidelines at http://pyfound.blogspot.nl/2013/03/get-ready-for-google-summer-of-code.html. We should be able to come up with some interesting project ideas I'd think, let's put those at http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. Preferably with enough detail to be understandable for people new to the projects and a proposed mentor. We need at least 3 people willing to mentor a student. Ideally we'd have enough mentors this week, so we can apply to the PSF on time. If you're willing to be a mentor, please send me the following: name, email address, phone nr, and what you're interested in mentoring. If you have time constaints and have doubts about being able to be a primary mentor, being a backup mentor would also be helpful. Cheers, Ralf P.S. 
as you can probably tell from the above, I'm happy to coordinate the GSoC applications for Numpy and Scipy -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Mar 21 20:00:37 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Mar 2013 18:00:37 -0600 Subject: [Numpy-discussion] Videos of PyCon talks. Message-ID: Here . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Mar 21 20:02:49 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Mar 2013 18:02:49 -0600 Subject: [Numpy-discussion] Numpy 1.7.1 Message-ID: The Numpy 1.7.1 release process seems to have stalled. What do we need to finish up to get it going again? I think it would be nice to shoot for a release maybe the weekend after next. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ake.sandgren at hpc2n.umu.se Fri Mar 22 03:47:13 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Fri, 22 Mar 2013 08:47:13 +0100 Subject: [Numpy-discussion] Numpy 1.7.1 In-Reply-To: References: Message-ID: <1363938433.3343.43.camel@lurvas.hpc2n.umu.se> On Thu, 2013-03-21 at 18:02 -0600, Charles R Harris wrote: > The Numpy 1.7.1 release process seems to have stalled. What do we need > to finish up to get it going again? I think it would be nice to shoot > for a release maybe the weekend after next. Talking about 1.7.1, I have a couple of bug fixes for 1.7.0 at git://github.com/akesandgren/numpy.git in the v1.7.0-hpc2n branch They are quite small. 
-- Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden Internet: ake at hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126 Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se From pierre.barbierdereuille at gmail.com Fri Mar 22 06:41:13 2013 From: pierre.barbierdereuille at gmail.com (Pierre Barbier de Reuille) Date: Fri, 22 Mar 2013 11:41:13 +0100 Subject: [Numpy-discussion] Problem with numpy.records module Message-ID: Hello, I am trying to use titles for record arrays. In the documentation, it is specified that any column can set to "None". However, using 'None' fails on numpy 1.6.2 because in np.core.records, on line 195, the "strip" method is called on the title object. First, I hope I understood the documentation correctly. If so, is it possible to replace the line 195 with: self._titles = [n.strip() if n is not None else None for n in titles[:self._nfields]] so 'None' elements are handled properly? Thanks, -- Barbier de Reuille Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Mar 22 07:42:51 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 22 Mar 2013 11:42:51 +0000 Subject: [Numpy-discussion] Numpy 1.7.1 In-Reply-To: <1363938433.3343.43.camel@lurvas.hpc2n.umu.se> References: <1363938433.3343.43.camel@lurvas.hpc2n.umu.se> Message-ID: On Fri, Mar 22, 2013 at 7:47 AM, Ake Sandgren wrote: > On Thu, 2013-03-21 at 18:02 -0600, Charles R Harris wrote: >> The Numpy 1.7.1 release process seems to have stalled. What do we need >> to finish up to get it going again? I think it would be nice to shoot >> for a release maybe the weekend after next. > > Talking about 1.7.1 i have a couple of bug fixes for 1.7.0 at > git://github.com/akesandgren/numpy.git in the v1.7.0-hpc2n branch > > They are quite small. Please send as PRs against master, so we can review and merge them? 
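For the `numpy.records` titles question above: titles attach a second, alternative key to each field, which is why `np.core.records` calls `.strip()` on them. A short sketch of how titles behave when every column has one (the reported bug is specifically about passing `None` for a column, which this sketch avoids; names and behavior follow the documented dtype dict form):

```python
import numpy as np

# A dtype where each field carries both a name and a title.
dt = np.dtype({'names':   ['value', 'count'],
               'formats': ['f4', 'i4'],
               'titles':  ['Value (float)', 'Count (int)']})

arr = np.zeros(3, dtype=dt)
arr['value'] = [1.0, 2.0, 3.0]

# Fields are reachable by name or by title.
print(arr['value'])
print(arr['Value (float)'])
```

Pierre's proposed one-line fix simply skips the `.strip()` call for entries that are `None`.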
-n From ake.sandgren at hpc2n.umu.se Fri Mar 22 08:16:50 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Fri, 22 Mar 2013 13:16:50 +0100 Subject: [Numpy-discussion] Numpy 1.7.1 In-Reply-To: References: <1363938433.3343.43.camel@lurvas.hpc2n.umu.se> Message-ID: <1363954610.3343.66.camel@lurvas.hpc2n.umu.se> On Fri, 2013-03-22 at 11:42 +0000, Nathaniel Smith wrote: > On Fri, Mar 22, 2013 at 7:47 AM, Ake Sandgren wrote: > > On Thu, 2013-03-21 at 18:02 -0600, Charles R Harris wrote: > >> The Numpy 1.7.1 release process seems to have stalled. What do we need > >> to finish up to get it going again? I think it would be nice to shoot > >> for a release maybe the weekend after next. > > > > Talking about 1.7.1 i have a couple of bug fixes for 1.7.0 at > > git://github.com/akesandgren/numpy.git in the v1.7.0-hpc2n branch > > > > They are quite small. > > Please send as PRs against master, so we can review and merge them? Done. From ndbecker2 at gmail.com Fri Mar 22 09:59:52 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 22 Mar 2013 09:59:52 -0400 Subject: [Numpy-discussion] howto apply-along-axis? Message-ID: I frequently find I have my 1d function that performs some reduction that I'd like to apply-along some axis of an n-d array. As a trivial example, def sum(u): return np.sum (u) In this case the function is probably C/C++ code, but that is irrelevant (I think). Is there a reasonably efficient way to do this within numpy? From njs at pobox.com Fri Mar 22 10:21:03 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 22 Mar 2013 14:21:03 +0000 Subject: [Numpy-discussion] howto apply-along-axis? In-Reply-To: References: Message-ID: On 22 Mar 2013 14:09, "Neal Becker" wrote: > > I frequently find I have my 1d function that performs some reduction that I'd > like to apply-along some axis of an n-d array. 
> > As a trivial example, > > def sum(u): > return np.sum (u) > > In this case the function is probably C/C++ code, but that is irrelevant (I > think). > > Is there a reasonably efficient way to do this within numpy? The core infrastructure for this sort of thing is there - search on "generalized ufuncs". There's no python-level api as far as I know, though, yet. You could write a reasonable facsimile of np.vectorize for such functions using nditer. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Fri Mar 22 17:39:35 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Fri, 22 Mar 2013 17:39:35 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: <514CCF97.7030902@gmail.com> An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Sat Mar 23 00:05:11 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 22 Mar 2013 21:05:11 -0700 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <514CCF97.7030902@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: On Fri, Mar 22, 2013 at 2:39 PM, Colin J. Williams wrote: > I have updated to numpy 1.7.0 for each of the Pythons 2.7.3, 3.2.3 and > 3.3.0. ... > The tests, which are available > here(http://web.ncf.ca/cjw/FP%20Summary%20over%20273-323-330.txt), show that > 3.2 is slower, but not to the same degree reported before. Have posted your test code anywhere? Anyway, depending on how you did your timings, that looks to me like 3.* is a bit faster with small data, and pretty much within measurement error for the large datasets. 
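For the apply-along-axis question above: numpy does ship `np.apply_along_axis`, which wraps exactly this pattern, though it loops in Python rather than at C speed (the generalized-ufunc machinery Nathaniel mentions is the fast path). A short sketch:

```python
import numpy as np

def my_reduce(u):
    # stand-in for an arbitrary 1-d reduction (possibly C/C++ under the hood)
    return np.sum(u)

a = np.arange(12).reshape(3, 4)

by_col = np.apply_along_axis(my_reduce, 0, a)   # reduce down each column
by_row = np.apply_along_axis(my_reduce, 1, a)   # reduce across each row

print(by_col)   # [12 15 18 21]
print(by_row)   # [ 6 22 38]
```

For a reduction that is just a sum, `a.sum(axis=...)` is of course faster; `apply_along_axis` earns its keep when the 1-d function is a black box.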
And if the large ones are doing things with really big arrays (I'm assuming pretty big, as you're getting close to 32 bit memory limits...), then it's really hard to imagine how python version could make a noticeable difference -- the real work would be in the numpy code, and that's exactly the same on all python versions. If you are using BLAS or LAPACK stuff, then there might be some differences with the different builds, though I wouldn't expect so if you ar getting them from the same source. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ralf.gommers at gmail.com Sat Mar 23 07:21:21 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 23 Mar 2013 12:21:21 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <514CCF97.7030902@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: On Fri, Mar 22, 2013 at 10:39 PM, Colin J. Williams wrote: > On 20/03/2013 11:12 AM, Fr?d?ric Bastien wrote: > > On Wed, Mar 20, 2013 at 11:01 AM, Colin J. Williams wrote: > > On 20/03/2013 10:30 AM, Fr?d?ric Bastien wrote: > > Hi, > > win32 do not mean it is a 32 bits windows. sys.platform always return > win32 on 32bits and 64 bits windows even for python 64 bits. > > But that is a good question, is your python 32 or 64 bits? > > 32 bits. > > That explain why you have memory problem but not other people with 64 > bits version. So if you want to work with bigger input, change to a > python 64 bits. > > Fred > > > Thanks to the people who responded to my report that numpy, with Python > 3.2 was significantly slower than with Python 2.7. > > I have updated to numpy 1.7.0 for each of the Pythons 2.7.3, 3.2.3 and > 3.3.0. > > The Pythons came from python.org and the Numpys from PyPi. 
The SciPy > site still points to Source Forge, I gathered from the responses that > Source Forge is no longer recommended for downloads. > That's not the case. The official binaries for NumPy and SciPy are on SourceForge. The Windows installers on PyPI are there to make easy_install work, but they're likely slower than the SF installers (no SSE2/SSE3 instructions). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Sat Mar 23 07:23:10 2013 From: toddrjen at gmail.com (Todd) Date: Sat, 23 Mar 2013 12:23:10 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: On Sat, Mar 23, 2013 at 12:21 PM, Ralf Gommers wrote: > > That's not the case. The official binaries for NumPy and SciPy are on > SourceForge. The Windows installers on PyPI are there to make easy_install > work, but they're likely slower than the SF installers (no SSE2/SSE3 > instructions). > > Ralf > > Is there a reason why the same binaries can't be used for both? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Mar 23 08:17:28 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 23 Mar 2013 13:17:28 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: On Sat, Mar 23, 2013 at 12:23 PM, Todd wrote: > On Sat, Mar 23, 2013 at 12:21 PM, Ralf Gommers wrote: > >> >> That's not the case. The official binaries for NumPy and SciPy are on >> SourceForge. The Windows installers on PyPI are there to make easy_install >> work, but they're likely slower than the SF installers (no SSE2/SSE3 >> instructions). 
>> >> Ralf >> >> > Is there a reason why the same binaries can't be used for both? > The SF .exe superpack installers contains three installers: plain, SSE2 and SSE3 support. easy_install doesn't know what to do with such an installer. See http://thread.gmane.org/gmane.comp.python.numeric.general/29395/focus=29582for the discussion on why things are as they are now. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Sat Mar 23 10:39:34 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Sat, 23 Mar 2013 10:39:34 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: <514DBEA6.6080504@gmail.com> An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Sat Mar 23 11:17:57 2013 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Sat, 23 Mar 2013 16:17:57 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <514DBEA6.6080504@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> <514DBEA6.6080504@gmail.com> Message-ID: I am a bit worried about the differences in results. Just to be sure you are comparing apples with apples, it may be a good idea to set the seed at the beginning: np.random.seed( SEED ) where SEED is an int. This way, you will be inverting always the same matrix, regardless of the Python version. I think, even if the timing is different, the results should be the same. http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html#numpy.random.seed David. On 23 March 2013 15:39, Colin J. Williams wrote: > On 23/03/2013 7:21 AM, Ralf Gommers wrote: > > > > > On Fri, Mar 22, 2013 at 10:39 PM, Colin J. 
Williams > wrote: >> >> On 20/03/2013 11:12 AM, Fr?d?ric Bastien wrote: >> >> On Wed, Mar 20, 2013 at 11:01 AM, Colin J. Williams >> wrote: >> >> On 20/03/2013 10:30 AM, Fr?d?ric Bastien wrote: >> >> Hi, >> >> win32 do not mean it is a 32 bits windows. sys.platform always return >> win32 on 32bits and 64 bits windows even for python 64 bits. >> >> But that is a good question, is your python 32 or 64 bits? >> >> 32 bits. >> >> That explain why you have memory problem but not other people with 64 >> bits version. So if you want to work with bigger input, change to a >> python 64 bits. >> >> Fred >> >> Thanks to the people who responded to my report that numpy, with Python >> 3.2 was significantly slower than with Python 2.7. >> >> I have updated to numpy 1.7.0 for each of the Pythons 2.7.3, 3.2.3 and >> 3.3.0. >> >> The Pythons came from python.org and the Numpys from PyPi. The SciPy site >> still points to Source Forge, I gathered from the responses that Source >> Forge is no longer recommended for downloads. > > > That's not the case. The official binaries for NumPy and SciPy are on > SourceForge. The Windows installers on PyPI are there to make easy_install > work, but they're likely slower than the SF installers (no SSE2/SSE3 > instructions). > > Ralf > > Thanks, I'll read over Robert Kern's comments. PyPi is the simpler process, > but, if the result is unoptimized code, then easy_install is not the way to > go. > > The code is available here(http://web.ncf.ca/cjw/testFPSpeed.py) > and the most recent test results are > here(http://web.ncf.ca/cjw/FP%2023-Mar-13%20Test%20Summary.txt). These are > using PyPi, I'll look into SourceForge. > > Colin W. 
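David's seeding advice above makes the timing runs comparable across interpreters: each seeded run then generates, and inverts, the same random matrix. A minimal sketch of the reproducibility it buys (the seed value here is arbitrary):

```python
import numpy as np

SEED = 12345

np.random.seed(SEED)
a = np.random.random((4, 4))

np.random.seed(SEED)   # re-seed: the next draw repeats exactly
b = np.random.random((4, 4))

print(np.array_equal(a, b))   # True
```

With identical inputs, any remaining difference between the Python 2.7 and 3.x runs can only come from timing, not from the data.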
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jsseabold at gmail.com Sat Mar 23 14:19:42 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 23 Mar 2013 14:19:42 -0400 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils Message-ID: Some help on this would be greatly appreciated. It's been recommended to use OpenBlas over ATLAS, so I've been trying to build numpy with openblas and have run into a few problems. 1) Build fails using bento master and waf 1.7.9, see below. 2) Distutils doesn't seem to be able to find lapack as part of atlas. I tried to skip a site.cfg and define environmental variables. No idea what I missed. I followed instructions found scattered over the internet and only understand vaguely the issues. Maybe someone can help. I'd be happy to update the wiki with any answers. To truly support OpenBlas, is it maybe necessary to make some additions to numpy/distutils/system_info.py? Thanks for having a look, Skipper Install OpenBlas ----------------------------- git clone git://github.com/xianyi/OpenBLAS cd OpenBlas Edit c_check to look for libpthreads in the right place (Kubuntu 12.10) |4 $ git diff c_check ``` diff --git a/c_check b/c_check index 4d82237..de0fd33 100644 --- a/c_check +++ b/c_check @@ -241,7 +241,7 @@ print CONFFILE "#define FUNDERSCORE\t$need_fu\n" if $need_fu if ($os eq "LINUX") { - @pthread = split(/\s+/, `nm /lib/libpthread.so* | grep _pthread_create`); + @pthread = split(/\s+/, `nm /lib/x86_64-linux-gnu/libpthread.so* | grep _pthread_create`); if ($pthread[2] ne "") { print CONFFILE "#define PTHREAD_CREATE_FUNC $pthread[2]\n"; ``` make fc=gfortran make PREFIX=~/.local install Everything looks ok, so far. 
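A quick post-build check, once numpy itself is installed, is to ask it which BLAS/LAPACK it recorded at build time; if the OpenBLAS paths above were picked up, they should appear in this output. A sketch:

```python
import numpy as np

# Dumps the BLAS/LAPACK libraries and search paths recorded at build time.
np.__config__.show()
```

If OpenBLAS is absent here, numpy fell back to its unoptimized internal routines even though the build "succeeded".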
Install NumPy --------------------------- Using numpy master I tried to use bento master and waf 1.7.9, following instructions from David's blog bentomaker configure --prefix=/home/skipper/.local --with-blas-lapack-libdir=/home/skipper/.local/lib --blas-lapack-type=openblas .. bentomaker build -j4 ``` [101/104] cshlib: build/numpy/core/src/umath/umath_tests.c.11.o -> build/numpy/core/umath_tests.so /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: final link failed: Bad value collect2: error: ld returned 1 exit status /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: final link failed: Bad value collect2: error: ld returned 1 exit status ``` No idea, so, let's try distutils export LAPACK=~/.local/lib/libopenblas.a export BLAS=~/.local/lib/libopenblas.a export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib/ echo $LD_LIBRARY_PATH ``` :/usr/local/lib64/R/bin:/home/skipper/.local/lib/ ``` This step seems to be necessary? python setup.py config ``` Running from numpy source directory. non-existing path in 'numpy/distutils': 'site.cfg' F2PY Version 2 numpy/core/setup_common.py:88: MismatchCAPIWarning: API mismatch detected, the C API version numbers have to be updated. Current C api version is 8, with checksum f4362353e2d72f889fda0128aa015037, but recorded checksum for C API version 8 in codegen_dir/cversions.txt is 17321775fc884de0b1eda478cd61c74b. If functions were added in the C API, you have to update C_API_VERSION in numpy/core/setup_common.py. 
MismatchCAPIWarning) blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] NOT AVAILABLE atlas_blas_info: libraries f77blas,cblas,atlas not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] NOT AVAILABLE /home/skipper/src/numpy-skipper/numpy/distutils/system_info.py:1501: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) blas_info: Replacing _lib_names[0]=='blas' with 'openblas' Replacing _lib_names[0]=='openblas' with 'openblas' FOUND: libraries = ['openblas'] library_dirs = ['/home/skipper/.local/lib'] language = f77 FOUND: libraries = ['openblas'] library_dirs = ['/home/skipper/.local/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 non-existing path in 'numpy/lib': 'benchmarks' lapack_opt_info: lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 libraries lapack_atlas not found in /usr/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib libraries 
ptf77blas,ptcblas,atlas not found in /usr/lib/x86_64-linux-gnu libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu numpy.distutils.system_info.atlas_threads_info NOT AVAILABLE atlas_info: libraries f77blas,cblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries f77blas,cblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries f77blas,cblas,atlas not found in /usr/lib64 libraries lapack_atlas not found in /usr/lib64 libraries f77blas,cblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib libraries f77blas,cblas,atlas not found in /usr/lib/x86_64-linux-gnu libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu numpy.distutils.system_info.atlas_info NOT AVAILABLE /home/skipper/src/numpy-skipper/numpy/distutils/system_info.py:1415: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) lapack_info: Replacing _lib_names[0]=='lapack' with 'openblas' Replacing _lib_names[0]=='openblas' with 'openblas' FOUND: libraries = ['openblas'] library_dirs = ['/home/skipper/.local/lib'] language = f77 FOUND: libraries = ['openblas', 'openblas'] library_dirs = ['/home/skipper/.local/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 running config ``` python setup.py build &> build.log Build log is here. Obviously it didn't go well, but I don't see anything to indicate problems. Sometimes I am able to get _dotblas.so built, though I don't know what causes it. This time I wasn't. 
https://gist.github.com/jseabold/7054ba9d85eae09eb402#file-numpy_build-log sudo python setup.py install &> install.log https://gist.github.com/jseabold/a0f5638b65d44aeff598#file-numpy_install-log >>> import numpy as np Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line 138, in import add_newdocs File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line 13, in from numpy.lib import add_newdoc File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", line 15, in from polynomial import * File "/usr/local/lib/python2.7/dist-packages/numpy/lib/polynomial.py", line 19, in from numpy.linalg import eigvals, lstsq, inv File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/__init__.py", line 50, in from linalg import * File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 25, in from numpy.linalg import lapack_lite ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory -------------- next part -------------- An HTML attachment was scrubbed... URL: From klemm at phys.ethz.ch Sat Mar 23 14:32:10 2013 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Sat, 23 Mar 2013 19:32:10 +0100 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: References: Message-ID: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> Skipper, this looks like a problem that I had in the bad old days with ATLAS, as well. Try compiling openblas with the -fPIC flag that used to help. Best of luck, Hanno hanno.klemm at me.com Sent from my mobile device, please excuse my brevity. On 23.03.2013, at 19:19, Skipper Seabold wrote: > Some help on this would be greatly appreciated. It's been recommended to use OpenBlas over ATLAS, so I've been trying to build numpy with openblas and have run into a few problems. > > 1) Build fails using bento master and waf 1.7.9, see below. 
> 2) Distutils doesn't seem to be able to find lapack as part of atlas. I tried to skip a site.cfg and define environmental variables. No idea what I missed. > > I followed instructions found scattered over the internet and only understand vaguely the issues. Maybe someone can help. I'd be happy to update the wiki with any answers. > > To truly support OpenBlas, is it maybe necessary to make some additions to numpy/distutils/system_info.py? > > Thanks for having a look, > > Skipper > > Install OpenBlas > ----------------------------- > git clone git://github.com/xianyi/OpenBLAS > cd OpenBlas > > Edit c_check to look for libpthreads in the right place (Kubuntu 12.10) > > |4 $ git diff c_check > ``` > diff --git a/c_check b/c_check > index 4d82237..de0fd33 100644 > --- a/c_check > +++ b/c_check > @@ -241,7 +241,7 @@ print CONFFILE "#define FUNDERSCORE\t$need_fu\n" if $need_fu > > if ($os eq "LINUX") { > > - @pthread = split(/\s+/, `nm /lib/libpthread.so* | grep _pthread_create`); > + @pthread = split(/\s+/, `nm /lib/x86_64-linux-gnu/libpthread.so* | grep _pthread_create`); > > if ($pthread[2] ne "") { > print CONFFILE "#define PTHREAD_CREATE_FUNC $pthread[2]\n"; > ``` > > make fc=gfortran > make PREFIX=~/.local install > > Everything looks ok, so far. > > Install NumPy > --------------------------- > Using numpy master > > I tried to use bento master and waf 1.7.9, following instructions from David's blog > > bentomaker configure --prefix=/home/skipper/.local --with-blas-lapack-libdir=/home/skipper/.local/lib --blas-lapack-type=openblas .. 
> bentomaker build -j4 > > ``` > > [101/104] cshlib: build/numpy/core/src/umath/umath_tests.c.11.o -> build/numpy/core/umath_tests.so > /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used when making a shared object; recompile with -fPIC > /usr/bin/ld: final link failed: Bad value > collect2: error: ld returned 1 exit status > /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used when making a shared object; recompile with -fPIC > /usr/bin/ld: final link failed: Bad value > collect2: error: ld returned 1 exit status > ``` > > No idea, so, let's try distutils > > export LAPACK=~/.local/lib/libopenblas.a > export BLAS=~/.local/lib/libopenblas.a > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib/ > echo $LD_LIBRARY_PATH > ``` > :/usr/local/lib64/R/bin:/home/skipper/.local/lib/ > ``` > > This step seems to be necessary? > > python setup.py config > ``` > Running from numpy source directory. > non-existing path in 'numpy/distutils': 'site.cfg' > F2PY Version 2 > numpy/core/setup_common.py:88: MismatchCAPIWarning: API mismatch detected, the C API version numbers have to be updated. Current C api version is 8, with checksum f4362353e2d72f889fda0128aa015037, but recorded checksum for C API version 8 in codegen_dir/cversions.txt is 17321775fc884de0b1eda478cd61c74b. If functions were added in the C API, you have to update C_API_VERSION in numpy/core/setup_common.py. 
> MismatchCAPIWarning) > blas_opt_info: > blas_mkl_info: > libraries mkl,vml,guide not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] > NOT AVAILABLE > > atlas_blas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] > NOT AVAILABLE > > atlas_blas_info: > libraries f77blas,cblas,atlas not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] > NOT AVAILABLE > > /home/skipper/src/numpy-skipper/numpy/distutils/system_info.py:1501: UserWarning: > Atlas (http://math-atlas.sourceforge.net/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [atlas]) or by setting > the ATLAS environment variable. > warnings.warn(AtlasNotFoundError.__doc__) > blas_info: > Replacing _lib_names[0]=='blas' with 'openblas' > Replacing _lib_names[0]=='openblas' with 'openblas' > FOUND: > libraries = ['openblas'] > library_dirs = ['/home/skipper/.local/lib'] > language = f77 > > FOUND: > libraries = ['openblas'] > library_dirs = ['/home/skipper/.local/lib'] > define_macros = [('NO_ATLAS_INFO', 1)] > language = f77 > > non-existing path in 'numpy/lib': 'benchmarks' > lapack_opt_info: > lapack_mkl_info: > mkl_info: > libraries mkl,vml,guide not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] > NOT AVAILABLE > > NOT AVAILABLE > > atlas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 > libraries lapack_atlas not found in /usr/local/lib64 > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/local/lib > libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 > libraries lapack_atlas not found in /usr/lib64 > libraries 
ptf77blas,ptcblas,atlas not found in /usr/lib > libraries lapack_atlas not found in /usr/lib > libraries ptf77blas,ptcblas,atlas not found in /usr/lib/x86_64-linux-gnu > libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu > numpy.distutils.system_info.atlas_threads_info > NOT AVAILABLE > > atlas_info: > libraries f77blas,cblas,atlas not found in /usr/local/lib64 > libraries lapack_atlas not found in /usr/local/lib64 > libraries f77blas,cblas,atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/local/lib > libraries f77blas,cblas,atlas not found in /usr/lib64 > libraries lapack_atlas not found in /usr/lib64 > libraries f77blas,cblas,atlas not found in /usr/lib > libraries lapack_atlas not found in /usr/lib > libraries f77blas,cblas,atlas not found in /usr/lib/x86_64-linux-gnu > libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu > numpy.distutils.system_info.atlas_info > NOT AVAILABLE > > /home/skipper/src/numpy-skipper/numpy/distutils/system_info.py:1415: UserWarning: > Atlas (http://math-atlas.sourceforge.net/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [atlas]) or by setting > the ATLAS environment variable. > warnings.warn(AtlasNotFoundError.__doc__) > lapack_info: > Replacing _lib_names[0]=='lapack' with 'openblas' > Replacing _lib_names[0]=='openblas' with 'openblas' > FOUND: > libraries = ['openblas'] > library_dirs = ['/home/skipper/.local/lib'] > language = f77 > > FOUND: > libraries = ['openblas', 'openblas'] > library_dirs = ['/home/skipper/.local/lib'] > define_macros = [('NO_ATLAS_INFO', 1)] > language = f77 > > running config > ``` > > python setup.py build &> build.log > > Build log is here. Obviously it didn't go well, but I don't see anything to indicate problems. Sometimes I am able to get _dotblas.so built, though I don't know what causes it. This time I wasn't. 
> > https://gist.github.com/jseabold/7054ba9d85eae09eb402#file-numpy_build-log > > sudo python setup.py install &> install.log > > https://gist.github.com/jseabold/a0f5638b65d44aeff598#file-numpy_install-log > > >>> import numpy as np > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line 138, in > import add_newdocs > File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line 13, in > from numpy.lib import add_newdoc > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", line 15, in > from polynomial import * > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/polynomial.py", line 19, in > from numpy.linalg import eigvals, lstsq, inv > File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/__init__.py", line 50, in > from linalg import * > File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 25, in > from numpy.linalg import lapack_lite > ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Sat Mar 23 15:19:43 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 23 Mar 2013 15:19:43 -0400 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> References: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> Message-ID: On Sat, Mar 23, 2013 at 2:32 PM, Hanno Klemm wrote: > Skipper, > this looks like a problem that I had in the bad old days with ATLAS, as > well. Try compiling openblas with the -fPIC flag that used to help. > > Thanks for having a look. 
I checked after seeing that odd bento failure (see here [1]), and it looks to me like OpenBlas uses the -fPIC flag in all of the gcc and gfortran calls. Possible related? [2] Skipper [1] https://github.com/cournape/Bento/issues/116 [2] https://github.com/cournape/Bento/issues/128 > Best of luck, > Hanno > > hanno.klemm at me.com > > Sent from my mobile device, please excuse my brevity. > > On 23.03.2013, at 19:19, Skipper Seabold wrote: > > Some help on this would be greatly appreciated. It's been recommended to > use OpenBlas over ATLAS, so I've been trying to build numpy with openblas > and have run into a few problems. > > 1) Build fails using bento master and waf 1.7.9, see below. > 2) Distutils doesn't seem to be able to find lapack as part of atlas. I > tried to skip a site.cfg and define environmental variables. No idea what I > missed. > > I followed instructions found scattered over the internet and only > understand vaguely the issues. Maybe someone can help. I'd be happy to > update the wiki with any answers. > > To truly support OpenBlas, is it maybe necessary to make some additions to > numpy/distutils/system_info.py? > > Thanks for having a look, > > Skipper > > Install OpenBlas > ----------------------------- > git clone git://github.com/xianyi/OpenBLAS > cd OpenBlas > > Edit c_check to look for libpthreads in the right place (Kubuntu 12.10) > > |4 $ git diff c_check > ``` > diff --git a/c_check b/c_check > index 4d82237..de0fd33 100644 > --- a/c_check > +++ b/c_check > @@ -241,7 +241,7 @@ print CONFFILE "#define FUNDERSCORE\t$need_fu\n" if > $need_fu > > if ($os eq "LINUX") { > > - @pthread = split(/\s+/, `nm /lib/libpthread.so* | grep > _pthread_create`); > + @pthread = split(/\s+/, `nm /lib/x86_64-linux-gnu/libpthread.so* | > grep _pthread_create`); > > if ($pthread[2] ne "") { > print CONFFILE "#define PTHREAD_CREATE_FUNC $pthread[2]\n"; > ``` > > make fc=gfortran > make PREFIX=~/.local install > > Everything looks ok, so far. 
> [...]
> > https://gist.github.com/jseabold/7054ba9d85eae09eb402#file-numpy_build-log > > sudo python setup.py install &> install.log > > > https://gist.github.com/jseabold/a0f5638b65d44aeff598#file-numpy_install-log > > >>> import numpy as np > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line > 138, in > import add_newdocs > File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line > 13, in > from numpy.lib import add_newdoc > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", > line 15, in > from polynomial import * > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/polynomial.py", > line 19, in > from numpy.linalg import eigvals, lstsq, inv > File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/__init__.py", > line 50, in > from linalg import * > File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.py", > line 25, in > from numpy.linalg import lapack_lite > ImportError: libopenblas.so.0: cannot open shared object file: No such > file or directory > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Sat Mar 23 15:36:32 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Sat, 23 Mar 2013 15:36:32 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> <514DBEA6.6080504@gmail.com> Message-ID: <514E0440.5010404@gmail.com> An HTML attachment was scrubbed... 
URL: From cjwilliams43 at gmail.com Sat Mar 23 15:47:53 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Sat, 23 Mar 2013 15:47:53 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: <514E06E9.3050001@gmail.com> An HTML attachment was scrubbed... URL: From ake.sandgren at hpc2n.umu.se Sat Mar 23 19:26:46 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Sun, 24 Mar 2013 00:26:46 +0100 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: References: Message-ID: <1364081206.2948.75.camel@skalman.ydc.se> On Sat, 2013-03-23 at 14:19 -0400, Skipper Seabold wrote: > Some help on this would be greatly appreciated. It's been recommended > to use OpenBlas over ATLAS, so I've been trying to build numpy with > openblas and have run into a few problems. > > To truly support OpenBlas, is it maybe necessary to make some > additions to numpy/distutils/system_info.py? Here is how. https://github.com/akesandgren/numpy/commit/363339dd3a9826f3e3e7dc4248c258d3c4dfcd7c From jsseabold at gmail.com Sat Mar 23 20:44:24 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 23 Mar 2013 20:44:24 -0400 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: <1364081206.2948.75.camel@skalman.ydc.se> References: <1364081206.2948.75.camel@skalman.ydc.se> Message-ID: On Sat, Mar 23, 2013 at 7:26 PM, Ake Sandgren wrote: > On Sat, 2013-03-23 at 14:19 -0400, Skipper Seabold wrote: >> Some help on this would be greatly appreciated. It's been recommended >> to use OpenBlas over ATLAS, so I've been trying to build numpy with >> openblas and have run into a few problems. > >> >> To truly support OpenBlas, is it maybe necessary to make some >> additions to numpy/distutils/system_info.py? > > Here is how. 
> > https://github.com/akesandgren/numpy/commit/363339dd3a9826f3e3e7dc4248c258d3c4dfcd7c

Thanks, that works well for numpy. Tests pass. I hope that makes it into a pull request. My site.cfg looks like this (I don't know about the lapack_opt section; it doesn't seem to work):

[DEFAULT]
library_dirs = /home/skipper/.local/lib
include_dirs = /home/skipper/.local/include

[openblas]
libraries = openblas

[lapack_opt]
libraries = openblas

Do you have any idea how to get scipy working too? I have a similar site.cfg, but it does not find lapack, which is rolled into libopenblas from what I understand. I can do

export LAPACK=~/.local/lib/libopenblas.a
python setup.py build &> build.log
sudo -E python setup.py install

There are no obvious failures in the build.log, but scipy is still broken because it needs lapack from numpy I guess.
import blas File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/blas.py", line 113, in from scipy.linalg import _fblas ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory Skipper From jsseabold at gmail.com Sat Mar 23 21:06:48 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 23 Mar 2013 21:06:48 -0400 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: References: <1364081206.2948.75.camel@skalman.ydc.se> Message-ID: On Sat, Mar 23, 2013 at 8:44 PM, Skipper Seabold wrote: > On Sat, Mar 23, 2013 at 7:26 PM, Ake Sandgren wrote: >> On Sat, 2013-03-23 at 14:19 -0400, Skipper Seabold wrote: >>> Some help on this would be greatly appreciated. It's been recommended >>> to use OpenBlas over ATLAS, so I've been trying to build numpy with >>> openblas and have run into a few problems. >> >>> >>> To truly support OpenBlas, is it maybe necessary to make some >>> additions to numpy/distutils/system_info.py? >> >> Here is how. >> >> https://github.com/akesandgren/numpy/commit/363339dd3a9826f3e3e7dc4248c258d3c4dfcd7c >> > > > Thanks that works well for numpy. Test pass. I hope that makes it into > a pull request. My site.cfg looks like this. I don't know about the > lapack_opt section. It doesn't seem to work. > > [DEFAULT] > library_dirs = /home/skipper/.local/lib > include_dirs = /home/skipper/.local/include > > [openblas] > libraries = openblas > > [lapack_opt] > libraries = openblas > > Do you have any idea how to get scipy working too. I have a similar > site.cfg, but it does not find lapack, which is rolled into > libopenblas from what I understand. I can do > > export LAPACK=~/.local/lib/libopenblas.a > python setup.py build &> build.log > sudo -E python setup.py install > > There are no obvious failures in the build.log, but scipy is still > broken because it needs lapack from numpy I guess. 
The answer is to

export BLAS=~/.local/lib/libopenblas.a
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib/

before building and installing. Now everything works. Whew. Thanks a lot for the help.

> [...]
>
> Skipper

From sebastian at sipsolutions.net Sun Mar 24 08:21:30 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 24 Mar 2013 13:21:30 +0100 Subject: [Numpy-discussion] NumPy/SciPy participation in GSoC 2013 In-Reply-To: References: Message-ID: <1364127690.12566.39.camel@sebastian-laptop> On Thu, 2013-03-21 at 22:20 +0100, Ralf Gommers wrote:
> Hi all,
>
> It is the time of the year for Google Summer of Code applications.
> If we want to participate with Numpy and/or Scipy, we need two things:
> enough mentors and ideas for projects. If we get those, we'll apply
> under the PSF umbrella. They've outlined the timeline they're working
> by and guidelines at
> http://pyfound.blogspot.nl/2013/03/get-ready-for-google-summer-of-code.html.
>
> We should be able to come up with some interesting project ideas I'd
> think, let's put those at
> http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. Preferably
> with enough detail to be understandable for people new to the projects
> and a proposed mentor.

Just some more ideas for numpy. I did not think much about whether they fit the GSoC format well, but maybe a possible mentor likes one:

1. Speed improvements for scalars/small arrays. This would start with ideas along the lines of two current pull requests for numpy that try to improve array + python scalar speed by circumventing costly scalar -> array conversions, etc., and would continue with improving the speed of finding the correct ufunc (which I believe Nathaniel timed to be a pretty big factor). But it would probably touch a lot of numpy internals, so the learning curve may be pretty steep.

2. Implement stable summation. Basically it would be about creating generalized ufuncs (if that is possible) implementing different kinds of stable summation algorithms for the inexact types, and then adding that as an option to np.sum.

3. This has been suggested before in some way or another: improving the subclassing of arrays. Though I am unsure whether user code might dislike changes, even if they are improvements... It would start off with checking which Python-side functions should explicitly call __array_wrap__ (possibly writing more helpers to do it) and calling it more consistently, plus adding the context information where it is currently not added (only simple ufunc calls add it, not even reductions I think).
I am sure you can dig a lot deeper into it all, but it would require some serious thinking and is not straightforward.

4. Partial sorting. This would be about implementing partial sorting and O(N) median calculation in numpy, plus maybe new functions that can make use of it (though I don't know exactly what those would be, and they could also have a home in scipy rather than numpy).

> We need at least 3 people willing to mentor a student. Ideally we'd
> have enough mentors this week, so we can apply to the PSF on time. If
> you're willing to be a mentor, please send me the following: name,
> email address, phone nr, and what you're interested in mentoring. If
> you have time constraints and have doubts about being able to be a
> primary mentor, being a backup mentor would also be helpful.
>
> Cheers,
> Ralf
>
> P.S. as you can probably tell from the above, I'm happy to coordinate
> the GSoC applications for Numpy and Scipy
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From marc.gronle at ito.uni-stuttgart.de Sun Mar 24 10:53:40 2013 From: marc.gronle at ito.uni-stuttgart.de (Marc Gronle) Date: Sun, 24 Mar 2013 15:53:40 +0100 Subject: [Numpy-discussion] C-API: Subclassing array for numpy 1.7 (with #define NPY_NO_DEPRECATED_API 0x00000007) Message-ID:

Hello everyone,

we embedded Python 3 in a C++ environment. In this application I created a new class that is a subclass of numpy's array type. Until now (numpy 1.6, or numpy 1.7 without the deprecation define NPY_NO_DEPRECATED_API), the typedef describing my class was something like

typedef struct {
    PyArrayObject numpyArray;
    int myMember1;
    int myMember2;
    ...
} PySubclassObject;

This always worked for me, and a call to PyArray_NDIM(obj) returned the number of dimensions, where obj is of type PySubclassObject*.
With numpy 1.7 this also works, as long as the line #define NPY_NO_DEPRECATED_API 0x00000007 is not set. However, once I removed all deprecated content, the first runtime error occurs when creating an instance of PySubclassObject. This is because PyArrayObject is now only a tiny typedef of the following form:

typedef struct tagPyArrayObject {
    PyObject_HEAD
} PyArrayObject;

Previously it was something like:

typedef struct tagPyArrayObject {
    PyObject_HEAD
    char *data;
    int nd;
    npy_intp *dimensions;
    ...
} PyArrayObject;

Usually, when creating a plain np.array, extra space is allocated depending on the size of PyArrayObject_fields. In my subclass, however, I don't know how to add that extra space between the members numpyArray and myMember1. Finally, when calling PyArray_NDIM(obj) as above, the obj pointer is cast inside the macro to PyArrayObject_fields*. This yields an access conflict between myMember1, myMember2, ... and the members of PyArrayObject_fields. I hope this description is clear enough. Does anybody have an idea what I need to change so that the subclassing also works with the new numpy structure? Thanks for any answer.

Cheers,
Marc

-------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Sun Mar 24 12:50:59 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Sun, 24 Mar 2013 09:50:59 -0700 Subject: [Numpy-discussion] Generalized inner?
Message-ID:

The other day I found myself finding trailing edges in binary images doing something like this:

import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.random.randint(2, size=1000).astype(np.int8)
pattern = np.array([1, 1, 1, 1, 0, 0])
# Map {0, 1} to {-1, +1} so an inner product counts matching positions.
arr_match = 2*arr - 1
pat_match = 2*pattern - 1
# Sliding windows over the last axis, one window per possible offset.
arr_win = as_strided(arr_match,
                     shape=arr.shape[:-1] + (arr.shape[-1]-len(pattern)+1, len(pattern)),
                     strides=arr.strides+arr.strides[-1:])
# A window matches exactly when every position contributes +1.
matches = np.einsum('...i, i', arr_win, pat_match) == len(pattern)

While this works fine, it led me to thinking that all these functions (inner, dot, einsum, tensordot...) could be generalized to ufuncs other than a pointwise np.multiply followed by an np.add reduction. It would be great if there were an np.gen_inner that allowed something like:

np.gen_inner(arr_win, pattern, pointwise=np.equal, reduce=np.logical_and)

I would like to think that such a generalization would be useful in other settings (although I can't think of any right now), and that it could find its place in numpy, rather than in scipy.ndimage or the like. Does this make any sense? Is there an already existing way of doing this that I'm overlooking?

Jaime

-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

-------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Sun Mar 24 13:36:23 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sun, 24 Mar 2013 18:36:23 +0100 Subject: [Numpy-discussion] Numpy 1.7.1 In-Reply-To: References: Message-ID: On Fri, Mar 22, 2013 at 1:02 AM, Charles R Harris wrote:
> The Numpy 1.7.1 release process seems to have stalled.

My apologies for that.

> What do we need to finish up to get it going again? I think it would be
> nice to shoot for a release maybe the weekend after next.

I think just the release notes need to be written, which I am doing right now. Then I release 1.7.1rc1.
If more things need to be merged, I can do 1.7.1rc2. Or we can later release 1.7.2, depending on the result of the discussion of the plan here: https://github.com/numpy/numpy/issues/3158 Ondrej From nadavh at visionsense.com Sun Mar 24 14:32:03 2013 From: nadavh at visionsense.com (Nadav Horesh) Date: Sun, 24 Mar 2013 18:32:03 +0000 Subject: [Numpy-discussion] Generalized inner? In-Reply-To: References: Message-ID: <3vuqt5ng51j5gts026ebpk53.1364149918865@email.android.com> This is what APL's . operator does, and I found it useful from time to time (but I was much younger then). Nadav Jaime Fernández del Río wrote: The other day I found myself finding trailing edges in binary images doing something like this: arr = np.random.randint(2, size=1000).astype(np.int8) pattern = np.array([1, 1, 1, 1, 0, 0]) arr_match = 2*arr - 1 pat_match = 2*pattern - 1 from numpy.lib.stride_tricks import as_strided arr_win = as_strided(arr_match, shape=arr.shape[:-1] + (arr.shape[-1]-len(pattern)+1, len(pattern)), strides=arr.strides+arr.strides[-1:]) matches = np.einsum('...i, i', arr_win, pat_match) == len(pattern) While this works fine, this led me to thinking that all these functions (inner, dot, einsum, tensordot...) could be generalized to any other ufuncs apart from a pointwise np.multiply followed by an np.add reduction. It would be great if there were an np.gen_inner that allowed something like: np.gen_inner(arr_win, pattern, pointwise=np.equal, reduce=np.logical_and) I would like to think that such a generalization would be useful in other settings (although I can't think of any right now), and that it could find its place in numpy, rather than in scipy.ndimage or the like. Does this make any sense? Is there any already existing way of doing this that I'm overlooking? Jaime -- (\__/) ( O.o) ( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ondrej.certik at gmail.com Sun Mar 24 17:02:53 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sun, 24 Mar 2013 22:02:53 +0100 Subject: [Numpy-discussion] ANN: NumPy 1.7.1rc1 release Message-ID: Hi, I'm pleased to announce the availability of the first release candidate of NumPy 1.7.1rc1. Sources and binary installers can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.7.1rc1/ Please test it and report any bugs. It fixes a few bugs, listed below. I would like to thank everybody who contributed patches to this release: Nathaniel J. Smith, Sebastian Berg, Charles Harris, Bradley M. Froehle, Ralf Gommers, Christoph Gohlke, Mark Wiebe and Maximilian Albert. Cheers, Ondrej ========================= NumPy 1.7.1 Release Notes ========================= This is a bugfix only release in the 1.7.x series. Issues fixed ------------ gh-2973 Fix `1` is printed during numpy.test() gh-2983 BUG: gh-2969: Backport memory leak fix 80b3a34. gh-3007 Backport gh-3006 gh-2984 Backport fix complex polynomial fit gh-2982 BUG: Make nansum work with booleans. gh-2985 Backport large sort fixes gh-3039 Backport object take gh-3105 Backport nditer fix op axes initialization gh-3108 BUG: npy-pkg-config ini files were missing after Bento build. gh-3124 BUG: PyArray_LexSort allocates too much temporary memory. gh-3131 BUG: Exported f2py_size symbol prevents linking multiple f2py modules. gh-3117 Backport gh-2992 gh-3135 DOC: Add mention of PyArray_SetBaseObject stealing a reference gh-3134 DOC: Fix typo in fft docs (the indexing variable is 'm', not 'n'). 
gh-3136 Backport #3128 Checksums ========= 28c3f3e71b5eaa6bfab6e8340dbd35e7 release/installers/numpy-1.7.1rc1.tar.gz 436f416dee10d157314bd9da7ab95c9c release/installers/numpy-1.7.1rc1-win32-superpack-python2.7.exe a543c8cf69f66ff2b4c9565646105863 release/installers/numpy-1.7.1rc1-win32-superpack-python2.6.exe 6dfcbbd449b7fe4e841c5fd1bfa7af7c release/installers/numpy-1.7.1rc1-win32-superpack-python2.5.exe 22912792a1b6155ae2bdbc30bee8fadc release/installers/numpy-1.7.1rc1-win32-superpack-python3.2.exe 95bc5a5fcce9fcbc2717a774dccae31b release/installers/numpy-1.7.1rc1-win32-superpack-python3.3.exe 33cf283765a148846b49b89fb96d67d5 release/installers/numpy-1.7.1rc1-win32-superpack-python3.1.exe 9761de4b35493fed38c5d177da9c3b37 release/installers/numpy-1.7.1rc1.zip From sergio.pasra at gmail.com Sun Mar 24 17:46:56 2013 From: sergio.pasra at gmail.com (Sergio Pascual) Date: Sun, 24 Mar 2013 22:46:56 +0100 Subject: [Numpy-discussion] howto apply-along-axis? In-Reply-To: References: Message-ID: This is the closest I got to what you describe: http://numpy-discussion.10968.n7.nabble.com/Reductions-with-nditer-working-only-with-the-last-axis-td8157.html It converts a 3D array to 2D, but only works on the last axis. Any improvement would be welcome. 2013/3/22 Neal Becker > I frequently find I have a 1d function that performs some reduction that > I'd > like to apply along some axis of an n-d array. > > As a trivial example, > > def sum(u): > return np.sum (u) > > In this case the function is probably C/C++ code, but that is irrelevant (I > think). > > Is there a reasonably efficient way to do this within numpy? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From charlesr.harris at gmail.com Sun Mar 24 23:00:47 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 24 Mar 2013 21:00:47 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.7.1rc1 release In-Reply-To: References: Message-ID: On Sun, Mar 24, 2013 at 3:02 PM, Ondřej Čertík wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.7.1rc1. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.7.1rc1/ > > Please test it and report any bugs. It fixes a few bugs, listed below. > > I would like to thank everybody who contributed patches to this release: > Nathaniel J. Smith, Sebastian Berg, Charles Harris, Bradley M. Froehle, > Ralf Gommers, Christoph Gohlke, Mark Wiebe and Maximilian Albert. > > Cheers, > Ondrej > > > > ========================= > NumPy 1.7.1 Release Notes > ========================= > > This is a bugfix only release in the 1.7.x series. > > > Issues fixed > ------------ > > gh-2973 Fix `1` is printed during numpy.test() > gh-2983 BUG: gh-2969: Backport memory leak fix 80b3a34. > gh-3007 Backport gh-3006 > gh-2984 Backport fix complex polynomial fit > gh-2982 BUG: Make nansum work with booleans. > gh-2985 Backport large sort fixes > gh-3039 Backport object take > gh-3105 Backport nditer fix op axes initialization > gh-3108 BUG: npy-pkg-config ini files were missing after Bento build. > gh-3124 BUG: PyArray_LexSort allocates too much temporary memory. > gh-3131 BUG: Exported f2py_size symbol prevents linking multiple f2py > modules. > gh-3117 Backport gh-2992 > gh-3135 DOC: Add mention of PyArray_SetBaseObject stealing a reference > gh-3134 DOC: Fix typo in fft docs (the indexing variable is 'm', not > 'n').
> gh-3136 Backport #3128 > > Checksums > ========= > > 28c3f3e71b5eaa6bfab6e8340dbd35e7 release/installers/numpy-1.7.1rc1.tar.gz > 436f416dee10d157314bd9da7ab95c9c > release/installers/numpy-1.7.1rc1-win32-superpack-python2.7.exe > a543c8cf69f66ff2b4c9565646105863 > release/installers/numpy-1.7.1rc1-win32-superpack-python2.6.exe > 6dfcbbd449b7fe4e841c5fd1bfa7af7c > release/installers/numpy-1.7.1rc1-win32-superpack-python2.5.exe > 22912792a1b6155ae2bdbc30bee8fadc > release/installers/numpy-1.7.1rc1-win32-superpack-python3.2.exe > 95bc5a5fcce9fcbc2717a774dccae31b > release/installers/numpy-1.7.1rc1-win32-superpack-python3.3.exe > 33cf283765a148846b49b89fb96d67d5 > release/installers/numpy-1.7.1rc1-win32-superpack-python3.1.exe > 9761de4b35493fed38c5d177da9c3b37 release/installers/numpy-1.7.1rc1.zip Great. The fix for the memory leak should make some folks happy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgohlke at uci.edu Sun Mar 24 23:40:58 2013 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sun, 24 Mar 2013 20:40:58 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.7.1rc1 release In-Reply-To: References: Message-ID: <514FC74A.90705@uci.edu> On 3/24/2013 2:02 PM, Ondřej Čertík wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.7.1rc1. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.7.1rc1/ > > Please test it and report any bugs. It fixes a few bugs, listed below. > > I would like to thank everybody who contributed patches to this release: > Nathaniel J. Smith, Sebastian Berg, Charles Harris, Bradley M. Froehle, > Ralf Gommers, Christoph Gohlke, Mark Wiebe and Maximilian Albert. > > Cheers, > Ondrej > > > > ========================= > NumPy 1.7.1 Release Notes > ========================= > > This is a bugfix only release in the 1.7.x series.
> > > Issues fixed > ------------ > > gh-2973 Fix `1` is printed during numpy.test() > gh-2983 BUG: gh-2969: Backport memory leak fix 80b3a34. > gh-3007 Backport gh-3006 > gh-2984 Backport fix complex polynomial fit > gh-2982 BUG: Make nansum work with booleans. > gh-2985 Backport large sort fixes > gh-3039 Backport object take > gh-3105 Backport nditer fix op axes initialization > gh-3108 BUG: npy-pkg-config ini files were missing after Bento build. > gh-3124 BUG: PyArray_LexSort allocates too much temporary memory. > gh-3131 BUG: Exported f2py_size symbol prevents linking multiple f2py > modules. > gh-3117 Backport gh-2992 > gh-3135 DOC: Add mention of PyArray_SetBaseObject stealing a reference > gh-3134 DOC: Fix typo in fft docs (the indexing variable is 'm', not 'n'). > gh-3136 Backport #3128 > > Checksums > ========= > > 28c3f3e71b5eaa6bfab6e8340dbd35e7 release/installers/numpy-1.7.1rc1.tar.gz > 436f416dee10d157314bd9da7ab95c9c > release/installers/numpy-1.7.1rc1-win32-superpack-python2.7.exe > a543c8cf69f66ff2b4c9565646105863 > release/installers/numpy-1.7.1rc1-win32-superpack-python2.6.exe > 6dfcbbd449b7fe4e841c5fd1bfa7af7c > release/installers/numpy-1.7.1rc1-win32-superpack-python2.5.exe > 22912792a1b6155ae2bdbc30bee8fadc > release/installers/numpy-1.7.1rc1-win32-superpack-python3.2.exe > 95bc5a5fcce9fcbc2717a774dccae31b > release/installers/numpy-1.7.1rc1-win32-superpack-python3.3.exe > 33cf283765a148846b49b89fb96d67d5 > release/installers/numpy-1.7.1rc1-win32-superpack-python3.1.exe > 9761de4b35493fed38c5d177da9c3b37 release/installers/numpy-1.7.1rc1.zip Hello, test_exec_command_stderr fails on Python 3.x for Windows (msvc/MKL builds): https://github.com/numpy/numpy/issues/3165 -- Christoph From ndbecker2 at gmail.com Mon Mar 25 08:24:23 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 25 Mar 2013 08:24:23 -0400 Subject: [Numpy-discussion] picking elements with boolean masks Message-ID: starting with a NxM array, I want to select elements of the 
array using a set of boolean masks. The masks are simply where the indexes have a 0 or 1 in the corresponding bit position. For example, consider the case where M = 4. all_syms = np.arange (4) all_bits = np.arange (2) bit_mask = (all_syms[:,np.newaxis] >> all_bits) & 1 mask0 = bit_mask == 0 mask1 = bit_mask == 1 Maybe there's a more straightforward way to generate these masks. That's not my question. In [331]: mask1 Out[331]: array([[False, False], [ True, False], [False, True], [ True, True]], dtype=bool) OK, now I want to use this mask on D In [333]: D.shape Out[333]: (32400, 4) Just to simplify, let's just try the first row of D In [336]: D[0] Out[336]: array([ 0., 2., 2., 4.]) In [335]: D[0][mask1[...,0]] Out[335]: array([ 2., 4.]) that worked fine. But I want not just to apply one of the masks in the set (mask1 is [4,2], it has 2 masks), I want the results of applying all the masks (2 in this case) In [334]: D[0][mask1] --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 D[0][mask1] ValueError: boolean index array should have 1 dimension Any ideas what's the best approach here? From ndbecker2 at gmail.com Mon Mar 25 08:50:42 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 25 Mar 2013 08:50:42 -0400 Subject: [Numpy-discussion] picking elements with boolean masks References: Message-ID: Neal Becker wrote: > starting with a NxM array, I want to select elements of the array using a set > of > boolean masks. The masks are simply where the indexes have a 0 or 1 in the > corresponding bit position. For example, consider the case where M = 4. > > all_syms = np.arange (4) > all_bits = np.arange (2) > bit_mask = (all_syms[:,np.newaxis] >> all_bits) & 1 > mask0 = bit_mask == 0 > mask1 = bit_mask == 1 > > Maybe there's a more straightforward way to generate these masks. That's not > my question. 
> > In [331]: mask1 > Out[331]: > array([[False, False], > [ True, False], > [False, True], > [ True, True]], dtype=bool) > > OK, now I want to use this mask on D > In [333]: D.shape > Out[333]: (32400, 4) > > Just to simplify, let's just try the first row of D > > In [336]: D[0] > Out[336]: array([ 0., 2., 2., 4.]) > > In [335]: D[0][mask1[...,0]] > Out[335]: array([ 2., 4.]) > > that worked fine. But I want not just to apply one of the masks in the set > (mask1 is [4,2], it has 2 masks), I want the results of applying all the masks > (2 in this case) > > > In [334]: D[0][mask1] > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > in () > ----> 1 D[0][mask1] > > ValueError: boolean index array should have 1 dimension > > Any ideas what's the best approach here? Perhaps what I need is to use integer indexing, rather than boolean. all_syms = np.arange (const.size) all_bits = np.arange (BITS_PER_SYM) bit_mask = (all_syms[:,np.newaxis] >> all_bits) & 1 ind = np.array ([np.nonzero (bit_mask[...,i])[0] for i in range (BITS_PER_SYM)]) In [366]: ind Out[366]: array([[1, 3], [2, 3]]) So now we have the 1-d indexes of the elements we want to select from D. D = np.arange (4)+1 In [376]: D Out[376]: array([1, 2, 3, 4]) In [377]: D[ind] Out[377]: array([[2, 4], [3, 4]]) Looks like that does the job From dineshbvadhia at hotmail.com Mon Mar 25 11:23:52 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Mon, 25 Mar 2013 08:23:52 -0700 Subject: [Numpy-discussion] variables not defined in numpy.random __init.py__ ? 
Message-ID: Using PyInstaller, the following error occurs: Traceback (most recent call last): File "", line 9, in <module> File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in init __import__(f, globals(), locals(), []) File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line 23, in <module> import os, tempfile File "/usr/lib/python2.7/tempfile.py", line 34, in <module> from random import Random as _Random File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", line 90, in <module> ranf = random = sample = random_sample NameError: name 'random_sample' is not defined Is line 90 in __init.py__ valid? -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Mon Mar 25 13:04:01 2013 From: gael.varoquaux at normalesup.org (=?iso-8859-1?Q?Ga=EBl?= Varoquaux) Date: Mon, 25 Mar 2013 18:04:01 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <20130321163451.GE12061@kudu.in-berlin.de> References: <514B328B.40107@syntonetic.com> <20130321163451.GE12061@kudu.in-berlin.de> Message-ID: <20130325170401.GF20550@phare.normalesup.org> On Thu, Mar 21, 2013 at 05:34:51PM +0100, Valentin Haenel wrote: > > I got curious about the Ctypes approach as well as "Gaël Varoquaux's > > blog post about avoiding data copies", but the link in the article > > didn't seem to work. (Under "Further Reading and References") > There seems to be something wrong with Gaël's website. I have CC'd him, > maybe he can fix it. Thanks Valentin! I believe that I have fixed the problem. Soren, if you still have difficulties accessing the material, please complain.
Cheers, Gaël From dineshbvadhia at hotmail.com Mon Mar 25 14:40:58 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Mon, 25 Mar 2013 11:40:58 -0700 Subject: [Numpy-discussion] Unable to build numpy with openblas using bento or distutils In-Reply-To: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> References: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> Message-ID: Caveat: Not tested but it did look interesting: http://osdf.github.com/blog/numpyscipy-with-openblas-for-ubuntu-1204-second-try.html. Would be interested to know if it worked out, as I want to try out OpenBLAS in the future. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon Mar 25 14:52:41 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 25 Mar 2013 14:52:41 -0400 Subject: [Numpy-discussion] Unable to build numpy with openblas using bento or distutils In-Reply-To: References: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> Message-ID: On Mon, Mar 25, 2013 at 2:40 PM, Dinesh B Vadhia wrote: > ** > Caveat: Not tested but it did look interesting: > http://osdf.github.com/blog/numpyscipy-with-openblas-for-ubuntu-1204-second-try.html > . > Would be interested to know if it worked out, as I want to try out OpenBLAS > in the future. > > Yes, this is one of the sources I used. I needed to change the c_check file in openblas as described up thread, and I didn't like the half-distutils/half-bento hack, but with Ake's patch to numpy's distutils, and my site.cfg, this works as described for me (Kubuntu 12.10) using just the usual setup.py. Skipper -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jrocher at enthought.com Mon Mar 25 14:56:32 2013 From: jrocher at enthought.com (Jonathan Rocher) Date: Mon, 25 Mar 2013 13:56:32 -0500 Subject: [Numpy-discussion] Growing the contributor base of Numpy Message-ID: Dear all, One recurring question is how to *grow the contributor base* of NumPy and provide help and relief to core developers and maintainers. One way to do this would be to *leverage the upcoming SciPy conference* in 2 ways: 1. Provide an intermediate or advanced level tutorial on NumPy focusing on teaching the C-API and the architecture of the package, to help people navigate the source code and find answers to precise deep questions. I think that many users would be interested in being better able to understand the underlayers, to become powerful users (and contributors if they want to). 2. Organize a Numpy sprint to let all these freshly graduated students apply what they learned to tackle some of the work under the guidance of core developers. This would be a great occasion to share and grow knowledge that is fundamental to our community. And the fact that the underlayers are in C is fine IMHO: SciPy is about scientific programming in Python and that is done with a lot of C. *Thoughts? Anyone interested in leading a tutorial (can be a team of people)? Anyone willing to coordinate the sprint? Who would be willing to be present and help during the sprint? * Note that there is less than 1 week left until the tutorial submission deadline. I am happy to help brainstorm on this to make it happen. Thanks, Jonathan and Andy, for the SciPy2013 organizers -- Jonathan Rocher, PhD Scientific software developer SciPy2013 conference co-chair Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ralf.gommers at gmail.com Mon Mar 25 15:51:13 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 25 Mar 2013 20:51:13 +0100 Subject: [Numpy-discussion] variables not defined in numpy.random __init.py__ ? In-Reply-To: References: Message-ID: On Mon, Mar 25, 2013 at 4:23 PM, Dinesh B Vadhia wrote: > ** > Using PyInstaller, the following error occurs: > > Traceback (most recent call last): > File "", line 9, in <module> > File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in init > __import__(f, globals(), locals(), []) > File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line > 23, in <module> > import os, tempfile > File "/usr/lib/python2.7/tempfile.py", line 34, in <module> > from random import Random as _Random > File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", line > 90, in <module> > ranf = random = sample = random_sample > NameError: name 'random_sample' is not defined > > Is line 90 in __init.py__ valid? > It is. Above the failing line you see "from info import __all__", and in random/info.py you'll see that `random_sample` is in the __all__ list. Somehow it disappeared for you, you'll need to do some debugging to find out why. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Mon Mar 25 16:26:03 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Mon, 25 Mar 2013 13:26:03 -0700 Subject: [Numpy-discussion] variables not defined in numpy.random __init.py__ ?
In-Reply-To: References: Message-ID: On Mon, Mar 25, 2013 at 12:51 PM, Ralf Gommers wrote: > On Mon, Mar 25, 2013 at 4:23 PM, Dinesh B Vadhia < > dineshbvadhia at hotmail.com> wrote: > >> ** >> Using PyInstaller, the following error occurs: >> >> Traceback (most recent call last): >> File "", line 9, in >> File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in init >> __import__(f, globals(), locals(), []) >> File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line >> 23, in >> import os, tempfile >> File "/usr/lib/python2.7/tempfile.py", line 34, in >> from random import Random as _Random >> File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", line >> 90, in >> ranf = random = sample = random_sample >> NameError: name 'random_sample' is not defined >> >> Is line 90 in __init.py__ valid? >> > > It is. > In my reading of this the main problem is that `tempfile` is trying to import `random` from the Python standard library but instead is importing the one from within NumPy (i.e., `numpy.random`). I suspect that somehow `sys.path` is being set incorrectly --- perhaps because of the `PYTHONPATH` environment variable. -Brad -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Mar 25 19:27:35 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 26 Mar 2013 00:27:35 +0100 Subject: [Numpy-discussion] NumPy/SciPy participation in GSoC 2013 In-Reply-To: References: Message-ID: On Thu, Mar 21, 2013 at 10:20 PM, Ralf Gommers wrote: > Hi all, > > It is the time of the year for Google Summer of Code applications. If we > want to participate with Numpy and/or Scipy, we need two things: enough > mentors and ideas for projects. If we get those, we'll apply under the PSF > umbrella. They've outlined the timeline they're working by and guidelines > at > http://pyfound.blogspot.nl/2013/03/get-ready-for-google-summer-of-code.html. 
> > > We should be able to come up with some interesting project ideas I'd > think, let's put those at > http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. Preferably with > enough detail to be understandable for people new to the projects and a > proposed mentor. > > We need at least 3 people willing to mentor a student. Ideally we'd have > enough mentors this week, so we can apply to the PSF on time. If you're > willing to be a mentor, please send me the following: name, email address, > phone nr, and what you're interested in mentoring. If you have time > constraints and have doubts about being able to be a primary mentor, being a > backup mentor would also be helpful. > So far we've only got one primary mentor (thanks Chuck!); most core devs do not seem to have the bandwidth this year. If there are other people interested in mentoring please let me know. If not, then it looks like we're not participating this year. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Mar 26 03:16:35 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 26 Mar 2013 08:16:35 +0100 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Mon, Mar 25, 2013 at 7:56 PM, Jonathan Rocher wrote: > Dear all, > > One recurring question is how to *grow the contributor base* of NumPy and > provide help and relief to core developers and maintainers. > > One way to do this would be to *leverage the upcoming SciPy conference* in 2 ways: > > 1. Provide an intermediate or advanced level tutorial on NumPy > focusing on teaching the C-API and the architecture of the package, to help > people navigate the source code and find answers to precise deep > questions. I think that many users would be interested in being better able > to understand the underlayers, to become powerful users (and contributors if > they want to). > > 2.
Organize a Numpy sprint to let all these freshly graduated > students apply what they learned to tackle some of the work under the > guidance of core developers. > > This would be a great occasion to share and grow knowledge that is > fundamental to our community. And the fact that the underlayers are in C is > fine IMHO: SciPy is about scientific programming in Python and that is done > with a lot of C. > > *Thoughts? Anyone interested in leading a tutorial (can be a team of > people)? Anyone willing to coordinate the sprint? Who would be willing to > be present and help during the sprint? * > First thought: excellent initiative. I'm not going to be at SciPy, but I'm happy to coordinate a numpy/scipy sprint at EuroScipy. Going to email the organizers right now. Ralf > Note that there is less than 1 week left until the tutorial submission > deadline. I am happy to help brainstorm on this to make it happen. > > Thanks, > Jonathan and Andy, for the SciPy2013 organizers > > -- > Jonathan Rocher, PhD > Scientific software developer > SciPy2013 conference co-chair > Enthought, Inc. > jrocher at enthought.com > 1-512-536-1057 > http://www.enthought.com > > -- > You received this message because you are subscribed to the Google Groups > "NumFOCUS" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to numfocus+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Tue Mar 26 04:46:30 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Tue, 26 Mar 2013 01:46:30 -0700 Subject: [Numpy-discussion] variables not defined in numpy.random __init.py__ ? In-Reply-To: References: Message-ID: @ Ralf. I missed info.py at the top and it is a valid statement. @ Brad. My project is using Numpy and Scipy and falls over at this point when using PyInstaller.
One of the project source files has an "import random" from the Standard Library. As you say, at this point in tempfile.py, it is attempting to "import random" from the Standard Library but instead is importing the one from Numpy (numpy.random). How can this be fixed? Or, is it something for PyInstaller to fix? Thx. From: Bradley M. Froehle Sent: Monday, March 25, 2013 1:26 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] variables not defined in numpy.random__init.py__ ? On Mon, Mar 25, 2013 at 12:51 PM, Ralf Gommers wrote: On Mon, Mar 25, 2013 at 4:23 PM, Dinesh B Vadhia wrote: Using PyInstaller, the following error occurs: Traceback (most recent call last): File "", line 9, in File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in init __import__(f, globals(), locals(), []) File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line 23, in import os, tempfile File "/usr/lib/python2.7/tempfile.py", line 34, in from random import Random as _Random File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", line 90, in ranf = random = sample = random_sample NameError: name 'random_sample' is not defined Is line 90 in __init.py__ valid? It is. In my reading of this the main problem is that `tempfile` is trying to import `random` from the Python standard library but instead is importing the one from within NumPy (i.e., `numpy.random`). I suspect that somehow `sys.path` is being set incorrectly --- perhaps because of the `PYTHONPATH` environment variable. -Brad -------------- next part -------------- An HTML attachment was scrubbed... URL: From pelson.pub at gmail.com Tue Mar 26 05:20:34 2013 From: pelson.pub at gmail.com (Phil Elson) Date: Tue, 26 Mar 2013 09:20:34 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: Bump. I'd be interested to know if this is a desirable feature for numpy? 
(specifically the 1D "find" functionality rather than the "any"/"all" also discussed) If so, I'd be more than happy to submit a PR, but I don't want to put in the effort if the principle isn't desirable in the core of numpy. Cheers, On 8 March 2013 17:38, Phil Elson wrote: > Interesting. I hadn't thought of those. I've implemented (very roughly > without a sound logic check) and benchmarked: > > def my_any(a, predicate, chunk_size=2048): > try: > next(find(a, predicate, chunk_size)) > return True > except StopIteration: > return False > > def my_all(a, predicate, chunk_size=2048): > return not my_any(a, lambda a: ~predicate(a), chunk_size) > > > With the following setup: > > import numpy as np > import numpy.random > > np.random.seed(1) > a = np.random.randn(1e8) > > > For a low frequency *any*: > > In [12]: %timeit (np.abs(a) > 6).any() > 1 loops, best of 3: 1.29 s per loop > > In [13]: %timeit my_any(a, lambda a: np.abs(a) > 6) > > 1 loops, best of 3: 792 ms per loop > > In [14]: %timeit my_any(a, lambda a: np.abs(a) > 6, chunk_size=10000) > 1 loops, best of 3: 654 ms per loop > > For a False *any*: > > In [16]: %timeit (np.abs(a) > 7).any() > 1 loops, best of 3: 1.22 s per loop > > In [17]: %timeit my_any(a, lambda a: np.abs(a) > 7) > 1 loops, best of 3: 2.4 s per loop > > For a high probability *any*: > > In [28]: %timeit (np.abs(a) > 1).any() > 1 loops, best of 3: 972 ms per loop > > In [27]: %timeit my_any(a, lambda a: np.abs(a) > 1) > 10000 loops, best of 3: 67 us per loop > > --------------- > > For a low probability *all*: > > In [18]: %timeit (np.abs(a) < 6).all() > 1 loops, best of 3: 1.16 s per loop > > In [19]: %timeit my_all(a, lambda a: np.abs(a) < 6) > 1 loops, best of 3: 880 ms per loop > > In [20]: %timeit my_all(a, lambda a: np.abs(a) < 6, chunk_size=10000) > 1 loops, best of 3: 706 ms per loop > > For a True *all*: > > In [22]: %timeit (np.abs(a) < 7).all() > 1 loops, best of 3: 1.47 s per loop > > In [23]: %timeit my_all(a, lambda a: np.abs(a) 
< 7) > 1 loops, best of 3: 2.65 s per loop > > For a high probability *all*: > > In [25]: %timeit (np.abs(a) < 1).all() > 1 loops, best of 3: 978 ms per loop > > In [26]: %timeit my_all(a, lambda a: np.abs(a) < 1) > 10000 loops, best of 3: 73.6 us per loop > > > > > > > > On 6 March 2013 21:16, Benjamin Root wrote: > >> >> >> On Tue, Mar 5, 2013 at 9:15 AM, Phil Elson wrote: >> >>> The ticket https://github.com/numpy/numpy/issues/2269 discusses the >>> possibility of implementing a "find first" style function which can >>> optimise the process of finding the first value(s) which match a predicate >>> in a given 1D array. For example: >>> >>> >>> >>> a = np.sin(np.linspace(0, np.pi, 200)) >>> >>> print find_first(a, lambda a: a > 0.9) >>> ((71, ), 0.900479032457) >>> >>> >>> This has been discussed in several locations: >>> >>> https://github.com/numpy/numpy/issues/2269 >>> https://github.com/numpy/numpy/issues/2333 >>> >>> http://stackoverflow.com/questions/7632963/numpy-array-how-to-find-index-of-first-occurrence-of-item >>> >>> >>> *Rationale* >>> >>> For small arrays there is no real reason to avoid doing: >>> >>> >>> a = np.sin(np.linspace(0, np.pi, 200)) >>> >>> ind = (a > 0.9).nonzero()[0][0] >>> >>> print (ind, ), a[ind] >>> (71,) 0.900479032457 >>> >>> >>> But for larger arrays, this can lead to massive amounts of work even if >>> the result is one of the first to be computed. Example: >>> >>> >>> a = np.arange(1e8) >>> >>> print (a == 5).nonzero()[0][0] >>> 5 >>> >>> >>> So a function which terminates when the first matching value is found is >>> desirable. >>> >>> As mentioned in #2269, it is possible to define a consistent ordering >>> which allows this functionality for >1D arrays, but IMHO it overcomplicates >>> the problem and was not a case that I personally needed, so I've limited >>> the scope to 1D arrays only. 
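The chunked scan described in this thread can be sketched in a few self-contained lines. This is a rough illustration only, not the actual code attached to gh-2269; the helper name `find` and its exact signature here are assumptions:

```python
import numpy as np

def find(a, predicate, chunk_size=2048):
    """Walk a 1-D array in chunks, yielding (index, value) pairs for
    elements where `predicate` (a vectorized function) is True.
    Because it is a generator, the scan stops as soon as the caller
    has seen enough matches."""
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        # flatnonzero gives the in-chunk offsets of matching elements
        for offset in np.flatnonzero(predicate(chunk)):
            yield start + offset, chunk[offset]

a = np.arange(1_000_000)
idx, val = next(find(a, lambda x: x == 5))  # stops inside the first chunk
```

Only the first chunk's predicate is ever evaluated here, which is the whole point of the early-exit design being discussed.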
>>> >>> >>> *Implementation* >>> >>> My initial assumption was that to get any kind of performance I would >>> need to write the *find* function in C, however after prototyping with >>> some array chunking it became apparent that a trivial python function would >>> be quick enough for my needs. >>> >>> The approach I've implemented in the code found in #2269 simply breaks >>> the array into sub-arrays of maximum length *chunk_size* (2048 by >>> default, though there is no real science to this number), applies the given >>> predicating function, and yields the results from *nonzero()*. The >>> given function should be a python function which operates on the whole of >>> the sub-array element-wise (i.e. the function should be vectorized). >>> Returning a generator also has the benefit of allowing users to get the >>> first *n* matching values/indices. >>> >>> >>> *Results* >>> >>> >>> I timed the implementation of *find* found in my comment at >>> https://github.com/numpy/numpy/issues/2269#issuecomment-14436725 with >>> an obvious test: >>> >>> >>> In [1]: from np_utils import find >>> >>> In [2]: import numpy as np >>> >>> In [3]: import numpy.random >>> >>> In [4]: np.random.seed(1) >>> >>> In [5]: a = np.random.randn(1e8) >>> >>> In [6]: a.min(), a.max() >>> Out[6]: (-6.1194900990552776, 5.9632246301166321) >>> >>> In [7]: next(find(a, lambda a: np.abs(a) > 6)) >>> Out[7]: ((33105441,), -6.1194900990552776) >>> >>> In [8]: (np.abs(a) > 6).nonzero() >>> Out[8]: (array([33105441]),) >>> >>> In [9]: %timeit (np.abs(a) > 6).nonzero() >>> 1 loops, best of 3: 1.51 s per loop >>> >>> In [10]: %timeit next(find(a, lambda a: np.abs(a) > 6)) >>> 1 loops, best of 3: 912 ms per loop >>> >>> In [11]: %timeit next(find(a, lambda a: np.abs(a) > 6, >>> chunk_size=100000)) >>> 1 loops, best of 3: 470 ms per loop >>> >>> In [12]: %timeit next(find(a, lambda a: np.abs(a) > 6, >>> chunk_size=1000000)) >>> 1 loops, best of 3: 483 ms per loop >>> >>> >>> This shows that picking a 
sensible *chunk_size* can yield massive >>> speed-ups (nonzero is x3 slower in one case). A similar example with a much >>> smaller 1D array shows similar promise: >>> >>> In [41]: a = np.random.randn(1e4) >>> >>> In [42]: %timeit next(find(a, lambda a: np.abs(a) > 3)) >>> 10000 loops, best of 3: 35.8 us per loop >>> >>> In [43]: %timeit (np.abs(a) > 3).nonzero() >>> 10000 loops, best of 3: 148 us per loop >>> >>> >>> As I commented on the issue tracker, if you think this function is worth >>> taking forward, I'd be happy to open up a pull request. >>> >>> Feedback gratefully received. >>> >>> Cheers, >>> >>> Phil >>> >>> >>> >> In the interest of generalizing code and such, could such approaches be >> used for functions like np.any() and np.all() for short-circuiting if True >> or False (respectively) are found? I wonder what other sort of functions >> in NumPy might benefit from this? >> >> Ben Root >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Mar 26 09:07:07 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 26 Mar 2013 09:07:07 -0400 Subject: [Numpy-discussion] howto reduce along arbitrary axis Message-ID: In the following code, the function maxstar is applied along the last axis. Can anyone suggest how to modify this to apply reduction along a user-specified axis?
def maxstar2 (a, b):
    return max (a, b) + log1p (exp (-abs (a - b)))

def maxstar (u):
    s = u.shape[-1]
    if s == 1:
        return u[...,0]
    elif s == 2:
        return maxstar2 (u[...,0], u[...,1])
    else:
        return maxstar2 (maxstar (u[...,:s/2]), maxstar (u[...,s/2:]))

From chaoyuejoy at gmail.com Tue Mar 26 10:23:16 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 26 Mar 2013 15:23:16 +0100 Subject: [Numpy-discussion] howto reduce along arbitrary axis Message-ID: Hi Neal, I forward you this mail which I think might be of help to your question. Chao ---------- Forwarded message ---------- From: Chao YUE Date: Sat, Mar 16, 2013 at 5:40 PM Subject: indexing of arbitrary axis and arbitrary slice? To: Discussion of Numerical Python Dear all, Is there some way to index the numpy array by specifying arbitrary axis and arbitrary slice, while not knowing the actual shape of the data? For example, I have a 3-dim data, data.shape = (3,4,5) Is there a way to retrieve data[:,0,:] by using something like np.retrieve_data(data,axis=2,slice=0), by this way you don't have to know the actual shape of the array.
for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually be data[:,0,:,:] thanks in advance, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Tue Mar 26 10:59:58 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 26 Mar 2013 15:59:58 +0100 Subject: [Numpy-discussion] howto reduce along arbitrary axis In-Reply-To: References: Message-ID: Oh sorry, my fault... here is the answer by Nathaniel Smith:

def retrieve_data(a, ax, idx):
    full_idx = [slice(None)] * a.ndim
    full_idx[ax] = idx
    return a[tuple(full_idx)]

Or for the specific case where you do know the axis in advance, you just don't know how many trailing axes there are, use a[:, :, 0, ...] and the ... will expand to represent the appropriate number of :'s. probably you can use something similar. Chao On Tue, Mar 26, 2013 at 3:33 PM, Neal Becker wrote: > Thank you, but do you also have an answer to this question? I only see > the question. > > > On Tue, Mar 26, 2013 at 10:23 AM, Chao YUE wrote: > >> Hi Neal, >> >> I forward you this mail which I think might be of help to your question.
>> >> Chao >> >> ---------- Forwarded message ---------- >> From: Chao YUE >> Date: Sat, Mar 16, 2013 at 5:40 PM >> Subject: indexing of arbitrary axis and arbitrary slice? >> To: Discussion of Numerical Python >> >> >> Dear all, >> >> Is there some way to index the numpy array by specifying arbitrary axis >> and arbitrary slice, while >> not knowing the actual shape of the data? >> For example, I have a 3-dim data, data.shape = (3,4,5) >> Is there a way to retrieve data[:,0,:] by using something like >> np.retrieve_data(data,axis=2,slice=0), >> by this way you don't have to know the actual shape of the array. >> for for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually >> be data[:,0,:,:] >> >> thanks in advance, >> >> Chao >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> >> >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was 
scrubbed... URL: From cournape at gmail.com Tue Mar 26 15:06:11 2013 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Mar 2013 19:06:11 +0000 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Mon, Mar 25, 2013 at 6:56 PM, Jonathan Rocher wrote: > Dear all, > > One recurring question is how to grow the contributor base to NumPy and > provide help and relief to core developers and maintainers. > > One way to do this would be to leverage the upcoming SciPy conference in 2 > ways: > > Provide an intermediate or advanced level tutorial on NumPy focusing on > teaching the C-API and the architecture of the package to help people > navigate the source code, and find answers to precise deep questions. I > think that many users would be interested in being better able to understand > the underlayers to become powerful users (and contributors if they want to). > > Organize a Numpy sprint to leverage all this freshly graduated students > apply what they learned to tackle some of the work under the guidance of > core developers. > > This would be a great occasion to share and grow knowledge that is > fundamental to our community. And the fact that the underlayers are in C is > fine IMHO: SciPy is about scientific programming in Python and that is done > with a lot of C. > > Thoughts? Anyone interested in leading a tutorial (can be a team of people)? > Anyone willing to coordinate the sprint? Who would be willing to be present > and help during the sprint? 
I would be happy to be part of the team doing it, David From ondrej.certik at gmail.com Tue Mar 26 20:32:06 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 26 Mar 2013 17:32:06 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.7.1rc1 release In-Reply-To: References: Message-ID: On Sun, Mar 24, 2013 at 8:00 PM, Charles R Harris wrote: > > > On Sun, Mar 24, 2013 at 3:02 PM, Ond?ej ?ert?k > wrote: >> >> Hi, >> >> I'm pleased to announce the availability of the first release candidate of >> NumPy 1.7.1rc1. >> >> Sources and binary installers can be found at >> https://sourceforge.net/projects/numpy/files/NumPy/1.7.1rc1/ >> >> Please test it and report any bugs. It fixes a few bugs, listed below. >> >> I would like to thank everybody who contributed patches to this release: >> Nathaniel J. Smith, Sebastian Berg, Charles Harris, Bradley M. Froehle, >> Ralf Gommers, Christoph Gohlke, Mark Wiebe and Maximilian Albert. >> >> Cheers, >> Ondrej >> >> >> >> ========================= >> NumPy 1.7.1 Release Notes >> ========================= >> >> This is a bugfix only release in the 1.7.x series. >> >> >> Issues fixed >> ------------ >> >> gh-2973 Fix `1` is printed during numpy.test() >> gh-2983 BUG: gh-2969: Backport memory leak fix 80b3a34. >> gh-3007 Backport gh-3006 >> gh-2984 Backport fix complex polynomial fit >> gh-2982 BUG: Make nansum work with booleans. >> gh-2985 Backport large sort fixes >> gh-3039 Backport object take >> gh-3105 Backport nditer fix op axes initialization >> gh-3108 BUG: npy-pkg-config ini files were missing after Bento build. >> gh-3124 BUG: PyArray_LexSort allocates too much temporary memory. >> gh-3131 BUG: Exported f2py_size symbol prevents linking multiple f2py >> modules. >> gh-3117 Backport gh-2992 >> gh-3135 DOC: Add mention of PyArray_SetBaseObject stealing a reference >> gh-3134 DOC: Fix typo in fft docs (the indexing variable is 'm', not >> 'n'). 
>> gh-3136 Backport #3128 >> >> Checksums >> ========= >> >> 28c3f3e71b5eaa6bfab6e8340dbd35e7 release/installers/numpy-1.7.1rc1.tar.gz >> 436f416dee10d157314bd9da7ab95c9c >> release/installers/numpy-1.7.1rc1-win32-superpack-python2.7.exe >> a543c8cf69f66ff2b4c9565646105863 >> release/installers/numpy-1.7.1rc1-win32-superpack-python2.6.exe >> 6dfcbbd449b7fe4e841c5fd1bfa7af7c >> release/installers/numpy-1.7.1rc1-win32-superpack-python2.5.exe >> 22912792a1b6155ae2bdbc30bee8fadc >> release/installers/numpy-1.7.1rc1-win32-superpack-python3.2.exe >> 95bc5a5fcce9fcbc2717a774dccae31b >> release/installers/numpy-1.7.1rc1-win32-superpack-python3.3.exe >> 33cf283765a148846b49b89fb96d67d5 >> release/installers/numpy-1.7.1rc1-win32-superpack-python3.1.exe >> 9761de4b35493fed38c5d177da9c3b37 release/installers/numpy-1.7.1rc1.zip >> __ > > > Great. The fix for the memory leak should make some folks happy. Yes. I created an issue here for them to test it: https://github.com/scikit-learn/scikit-learn/issues/1809 Just to make sure. Ondrej From matthew.brett at gmail.com Tue Mar 26 20:48:00 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 Mar 2013 17:48:00 -0700 Subject: [Numpy-discussion] Any plans for windows 64-bit installer for 1.7? In-Reply-To: References: <51119007.6090806@uci.edu> <5113399F.3090803@astro.uio.no> Message-ID: Hi Ondrej, On Thu, Feb 7, 2013 at 3:18 PM, Ond?ej ?ert?k wrote: > On Thu, Feb 7, 2013 at 12:29 PM, Chris Barker - NOAA Federal > wrote: >> On Thu, Feb 7, 2013 at 11:38 AM, Matthew Brett wrote: >>> a) If we cannot build Scipy now, it may or may not be acceptable to >>> release numpy now. I think it is, you (Ralf) think it isn't, we >>> haven't discussed that. It may not come up. >> >> Is anyone suggesting we hold the whole release for this? I fnot, then > > Just to make it clear, I do not plan to hold the whole release because of this. > Previous releases also didn't have this official 64bit Windows binary, > so there is > no regression. 
> > Once we figure out how to create 64bit binaries, then we'll start > uploading them. Did you make any progress with this? Worth making some notes? Anything we can do to help? Cheers, Matthew From njs at pobox.com Wed Mar 27 08:19:02 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Mar 2013 12:19:02 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: On Tue, Mar 26, 2013 at 9:20 AM, Phil Elson wrote: > Bump. > > I'd be interested to know if this is a desirable feature for numpy? > (specifically the 1D "find" functionality rather than the "any"/"all" also > discussed) > If so, I'd be more than happy to submit a PR, but I don't want to put in the > effort if the principle isn't desirable in the core of numpy. I don't think anyone has a strong opinion either way :-). It seems like a fairly general interface that people might find useful, so I don't see an immediate objection to including it in principle. It would help to see the actual numbers from a tuned version though to know how much benefit there is to get... -n From mdroe at stsci.edu Wed Mar 27 08:51:11 2013 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 27 Mar 2013 08:51:11 -0400 Subject: [Numpy-discussion] ANN: matplotlib 1.2.1 release Message-ID: <5152EB3F.5090902@stsci.edu> I'm pleased to announce the release of matplotlib 1.2.1. This is a bug release and improves stability and quality over the 1.2.0 release from four months ago. All users on 1.2.0 are encouraged to upgrade. 
Since github no longer provides download hosting, our tarballs and binaries are back on SourceForge, and we have a master index of downloads here: http://matplotlib.org/downloads Highlights include: - Usage of deprecated APIs in matplotlib are now displayed by default on all Python versions - Agg backend: Cleaner rendering of rectilinear lines when snapping to pixel boundaries, and fixes rendering bugs when using clip paths - Python 3: Fixes a number of missed Python 3 compatibility problems - Histograms and stacked histograms have a number of important bugfixes - Compatibility with more 3rd-party TrueType fonts - SVG backend: Image support in SVG output is consistent with other backends - Qt backend: Fixes leaking of window objects in Qt backend - hexbin with a log scale now works correctly - autoscaling works better on 3D plots - ...and numerous others. Enjoy! As always, there are number of good ways to get help with matplotlib listed on the homepage at http://matplotlib.org/ and I thank everyone for their continued support of this project. Mike Droettboom -------------- next part -------------- An HTML attachment was scrubbed... URL: From Andrea.Cimatoribus at nioz.nl Wed Mar 27 10:41:59 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 27 Mar 2013 15:41:59 +0100 Subject: [Numpy-discussion] Growing the contributor base of Numpy Message-ID: Not sure if this is really relevant to the original message, but here is my opinion. I think that the numpy/scipy community would greatly benefit from a platform enabling easy sharing of code written by users. This would provide a database of solved problems, where people could dig without having to ask. I think that something like this exists for matlab, but I have no experience with it. If it exists for python, then it must be seriously under-advertised. 
The web provides many answers, but they are scattered in all sorts of places, and it is often impossible to contribute improvements to code found online. If such a database could enable some sort of collaborative development it would be a great added value for numpy, and would provide a natural source of new features or improvements for scipy and numpy. From njs at pobox.com Wed Mar 27 10:59:09 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Mar 2013 14:59:09 +0000 Subject: [Numpy-discussion] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Wed, Mar 27, 2013 at 2:41 PM, Andrea Cimatoribus wrote: > > Not sure if this is really relevant to the original message, but here is my opinion. I think that the numpy/scipy community would greatly benefit from a platform enabling easy sharing of code written by users. This would provide a database of solved problems, where people could dig without having to ask. I think that something like this exists for matlab, but I have no experience with it. If it exists for python, then it must be seriously under-advertised. The web provides many answers, but they are scattered in all sorts of places, and it is often impossible to contribute improvements to code found online. If such a database could enable some sort of collaborative development it would be a great added value for numpy, and would provide a natural source of new features or improvements for scipy and numpy. Supposedly that's what scipy-central is for, but it's somehow not yet reached critical mass and become a household name; I haven't looked hard enough to have any hypotheses about why not. Surya Kasturi is working on spiffing it up (see discussion on scipy-dev); I bet they could use some help if you want to scratch this itch. 
-n From Andrea.Cimatoribus at nioz.nl Wed Mar 27 13:12:02 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 27 Mar 2013 18:12:02 +0100 Subject: [Numpy-discussion] Growing the contributor base of Numpy Message-ID: Oh, I didn't even know it existed! > > Not sure if this is really relevant to the original message, but here is my opinion. I think that the numpy/scipy community would greatly benefit from a platform enabling easy sharing of code written by users. This would provide a database of solved problems, where people could dig without having to ask. I think that something like this exists for matlab, but I have no experience with it. If it exists for python, then it must be seriously under-advertised. The web provides many answers, but they are scattered in all sorts of places, and it is often impossible to contribute improvements to code found online. If such a database could enable some sort of collaborative development it would be a great added value for numpy, and would provide a natural source of new features or improvements for scipy and numpy. Supposedly that's what scipy-central is for, but it's somehow not yet reached critical mass and become a household name; I haven't looked hard enough to have any hypotheses about why not. Surya Kasturi is working on spiffing it up (see discussion on scipy-dev); I bet they could use some help if you want to scratch this itch. From ralf.gommers at gmail.com Wed Mar 27 16:09:21 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 27 Mar 2013 21:09:21 +0100 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Tue, Mar 26, 2013 at 8:16 AM, Ralf Gommers wrote: > > > > On Mon, Mar 25, 2013 at 7:56 PM, Jonathan Rocher wrote: > >> Dear all, >> >> One recurring question is how to *grow the contributor base* to NumPy >> and provide help and relief to core developers and maintainers. 
>> >> One way to do this would be to *leverage the upcoming SciPy conference*in 2 ways: >> >> 1. Provide an intermediate or advanced level tutorial on NumPy >> focusing on teaching the C-API and the architecture of the package to help >> people navigate the source code, and find answers to precise deep >> questions. I think that many users would be interested in being better able >> to understand the underlayers to become powerful users (and contributors if >> they want to). >> >> 2. Organize a Numpy sprint to leverage all this freshly graduated >> students apply what they learned to tackle some of the work under the >> guidance of core developers. >> >> This would be a great occasion to share and grow knowledge that is >> fundamental to our community. And the fact that the underlayers are in C is >> fine IMHO: SciPy is about scientific programming in Python and that is done >> with a lot of C. >> >> *Thoughts? Anyone interested in leading a tutorial (can be a team of >> people)? Anyone willing to coordinate the sprint? Who would be willing to >> be present and help during the sprint? * >> > > First thought: excellent initiative. I'm not going to be at SciPy, but I'm > happy to coordinate a numpy/scipy sprint at EuroScipy. Going to email the > organizers right now. > The EuroScipy organizers have accepted our sprint, so we'll have a room available. If you're going to the conference, think about reserving Sun 25 Aug to attend this sprint. I've put up a page where people can add topics and more details: http://projects.scipy.org/scipy/wiki/EuroSciPy2013Sprint Ralf > > Ralf > > > > >> Note that there is less than 1 week left until the tutorial submission >> deadline. I am happy to help brainstorm on this to make it happen. >> >> Thanks, >> Jonathan and Andy, for the SciPy2013 organizers >> >> -- >> Jonathan Rocher, PhD >> Scientific software developer >> SciPy2013 conference co-chair >> Enthought, Inc. 
>> jrocher at enthought.com >> 1-512-536-1057 >> http://www.enthought.com >> >> -- >> You received this message because you are subscribed to the Google Groups >> "NumFOCUS" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to numfocus+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrocher at enthought.com Wed Mar 27 16:45:56 2013 From: jrocher at enthought.com (Jonathan Rocher) Date: Wed, 27 Mar 2013 15:45:56 -0500 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: Awesome Ralf! And thanks David C. for being available for the US one. When you say you would like to be part of it, did you mean an advanced tutorial or a sprint? Other people available to contribute to this or coordinate this? Thanks, Jonathan On Wed, Mar 27, 2013 at 3:09 PM, Ralf Gommers wrote: > > > > On Tue, Mar 26, 2013 at 8:16 AM, Ralf Gommers wrote: > >> >> >> >> On Mon, Mar 25, 2013 at 7:56 PM, Jonathan Rocher wrote: >> >>> Dear all, >>> >>> One recurring question is how to *grow the contributor base* to NumPy >>> and provide help and relief to core developers and maintainers. >>> >>> One way to do this would be to *leverage the upcoming SciPy conference*in 2 ways: >>> >>> 1. Provide an intermediate or advanced level tutorial on NumPy >>> focusing on teaching the C-API and the architecture of the package to help >>> people navigate the source code, and find answers to precise deep >>> questions. I think that many users would be interested in being better able >>> to understand the underlayers to become powerful users (and contributors if >>> they want to). >>> >>> 2. Organize a Numpy sprint to leverage all this freshly graduated >>> students apply what they learned to tackle some of the work under the >>> guidance of core developers. 
>>> >>> This would be a great occasion to share and grow knowledge that is >>> fundamental to our community. And the fact that the underlayers are in C is >>> fine IMHO: SciPy is about scientific programming in Python and that is done >>> with a lot of C. >>> >>> *Thoughts? Anyone interested in leading a tutorial (can be a team of >>> people)? Anyone willing to coordinate the sprint? Who would be willing to >>> be present and help during the sprint? * >>> >> >> First thought: excellent initiative. I'm not going to be at SciPy, but >> I'm happy to coordinate a numpy/scipy sprint at EuroScipy. Going to email >> the organizers right now. >> > > The EuroScipy organizers have accepted our sprint, so we'll have a room > available. If you're going to the conference, think about reserving Sun 25 > Aug to attend this sprint. I've put up a page where people can add topics > and more details: http://projects.scipy.org/scipy/wiki/EuroSciPy2013Sprint > > Ralf > > > >> >> Ralf >> >> >> >> >>> Note that there is less than 1 week left until the tutorial submission >>> deadline. I am happy to help brainstorm on this to make it happen. >>> >>> Thanks, >>> Jonathan and Andy, for the SciPy2013 organizers >>> >>> -- >>> Jonathan Rocher, PhD >>> Scientific software developer >>> SciPy2013 conference co-chair >>> Enthought, Inc. >>> jrocher at enthought.com >>> 1-512-536-1057 >>> http://www.enthought.com >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "NumFOCUS" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to numfocus+unsubscribe at googlegroups.com. >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >>> >>> >> >> > -- Jonathan Rocher, PhD Scientific software developer SciPy2013 conference co-chair Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Wed Mar 27 18:11:12 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 27 Mar 2013 23:11:12 +0100 Subject: [Numpy-discussion] variables not defined in numpy.random__init.py__ ? In-Reply-To: References: Message-ID: On Tue, Mar 26, 2013 at 9:46 AM, Dinesh B Vadhia wrote: > ** > @ Ralf. I missed info.py at the top and it is a valid statement. > > @ Brad. My project is using Numpy and Scipy and falls over at this point > when using PyInstaller. One of the project source files has an "import > random" from the Standard Library. As you say, at this point in > tempfile.py, it is attempting to "import random" from the Standard Library > but instead is importing the one from Numpy (numpy.random). How can this > be fixed? Or, is it something for PyInstaller to fix? Thx. > Probably the latter. Check your PYTHONPATH is not set and you're not doing anything to sys.path somehow. Then probably best to ask on the PyInstaller mailing list. Ralf > > > *From:* Bradley M. Froehle > *Sent:* Monday, March 25, 2013 1:26 PM > *To:* Discussion of Numerical Python > *Subject:* Re: [Numpy-discussion] variables not defined in > numpy.random__init.py__ ? 
> > On Mon, Mar 25, 2013 at 12:51 PM, Ralf Gommers wrote: > >> On Mon, Mar 25, 2013 at 4:23 PM, Dinesh B Vadhia < >> dineshbvadhia at hotmail.com> wrote: >> >>> ** >>> Using PyInstaller, the following error occurs: >>> >>> Traceback (most recent call last): >>> File "", line 9, in >>> File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in >>> init >>> __import__(f, globals(), locals(), []) >>> File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line >>> 23, in >>> import os, tempfile >>> File "/usr/lib/python2.7/tempfile.py", line 34, in >>> from random import Random as _Random >>> File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", >>> line 90, in >>> ranf = random = sample = random_sample >>> NameError: name 'random_sample' is not defined >>> >>> Is line 90 in __init.py__ valid? >>> >> >> It is. >> > > In my reading of this the main problem is that `tempfile` is trying to > import `random` from the Python standard library but instead is importing > the one from within NumPy (i.e., `numpy.random`). I suspect that somehow > `sys.path` is being set incorrectly --- perhaps because of the `PYTHONPATH` > environment variable. > > -Brad > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pelson.pub at gmail.com Thu Mar 28 07:04:15 2013 From: pelson.pub at gmail.com (Phil Elson) Date: Thu, 28 Mar 2013 11:04:15 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: I've specifically not tuned it, primarily because to get the best tuning you need to know the likelihood of finding a match.
One option would be to allow users to specify a "probability" parameter which would chunk the array into size*probability chunks - an additional parameter could then be exposed to limit the maximum chunk size to give the user control of the maximum memory overhead that the routine could use. I'll submit a PR and we can discuss inline. Thanks for the response Nathaniel. On 27 March 2013 12:19, Nathaniel Smith wrote: > On Tue, Mar 26, 2013 at 9:20 AM, Phil Elson wrote: > > Bump. > > > > I'd be interested to know if this is a desirable feature for numpy? > > (specifically the 1D "find" functionality rather than the "any"/"all" > also > > discussed) > > If so, I'd be more than happy to submit a PR, but I don't want to put in > the > > effort if the principle isn't desirable in the core of numpy. > > I don't think anyone has a strong opinion either way :-). It seems > like a fairly general interface that people might find useful, so I > don't see an immediate objection to including it in principle. It > would help to see the actual numbers from a tuned version though to > know how much benefit there is to get... > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Thu Mar 28 12:47:05 2013 From: cournape at gmail.com (David Cournapeau) Date: Thu, 28 Mar 2013 16:47:05 +0000 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Wed, Mar 27, 2013 at 8:45 PM, Jonathan Rocher wrote: > Awesome Ralf! > > And thanks David C. for being available for the US one. When you say you > would like to be part of it, did you mean an advanced tutorial or a sprint? I meant I would be happy to contribute to a tutorial in the spirit of "dive into numpy code". 
I would prefer if we were two doing it, though. David From irving at naml.us Thu Mar 28 14:16:48 2013 From: irving at naml.us (Geoffrey Irving) Date: Thu, 28 Mar 2013 11:16:48 -0700 Subject: [Numpy-discussion] inheriting from recarray with nested dtypes Message-ID: I have the following two structured dtypes: rotation (quaternion) = dtype([('s','f8'),('v','3f8')]) frame = dtype([('t','3f8'),('r',rotation)]) For various reasons, I usually store rotation arrays in a class Rotations deriving from ndarray, and frames in a class Frames deriving from ndarray. Currently I am defining .s, .v, .t, and .r properties manually, and I'd like to switch to inheriting from recarray. However, the .r property should return an array of class Rotations. I.e., f = Frames(...) # f.dtype = frame r = f.r # r.dtype = rotation, type(r) = Rotations Is there a clean way to tell recarray to adjust the type returned? It already has a bit of intelligence there, since it returns ndarray vs. recarray based on whether the returned dtype has fields. The full code is here in case anyone is curious: https://github.com/otherlab/core/blob/master/vector/Frame.py https://github.com/otherlab/core/blob/master/vector/Rotation.py Thanks, Geoffrey From ondrej.certik at gmail.com Thu Mar 28 21:31:07 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 28 Mar 2013 18:31:07 -0700 Subject: [Numpy-discussion] Any plans for windows 64-bit installer for 1.7? In-Reply-To: References: <51119007.6090806@uci.edu> <5113399F.3090803@astro.uio.no> Message-ID: Hi Matthew, On Tue, Mar 26, 2013 at 5:48 PM, Matthew Brett wrote: > Hi Ondrej, > > On Thu, Feb 7, 2013 at 3:18 PM, Ond?ej ?ert?k wrote: >> On Thu, Feb 7, 2013 at 12:29 PM, Chris Barker - NOAA Federal >> wrote: >>> On Thu, Feb 7, 2013 at 11:38 AM, Matthew Brett wrote: >>>> a) If we cannot build Scipy now, it may or may not be acceptable to >>>> release numpy now. I think it is, you (Ralf) think it isn't, we >>>> haven't discussed that. 
It may not come up. >>> >>> Is anyone suggesting we hold the whole release for this? I fnot, then >> >> Just to make it clear, I do not plan to hold the whole release because of this. >> Previous releases also didn't have this official 64bit Windows binary, >> so there is >> no regression. >> >> Once we figure out how to create 64bit binaries, then we'll start >> uploading them. > > Did you make any progress with this? Worth making some notes? > Anything we can do to help? Unfortunately I've been too busy the last month to push this through, so right now I am just concentrating on getting 1.7.1 out of the door, as that is higher priority. I am starting a new job on Monday, so once things settle down, I should be able to get back to this. I will post notes once I get to this again. Ondrej From toddrjen at gmail.com Fri Mar 29 11:15:07 2013 From: toddrjen at gmail.com (Todd) Date: Fri, 29 Mar 2013 16:15:07 +0100 Subject: [Numpy-discussion] Polar/spherical coordinates handling Message-ID: >From what I can see, numpy doesn't have any functions for handling polar or spherical coordinate to/from cartesian coordinate conversion. I think such methods would be pretty useful. I am looking now and it doesn't look that hard to create functions to convert between n-dimensional cartesian and n-spherical coordinates. Would anyone be interested in me adding methods for this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From amcmorl at gmail.com Fri Mar 29 11:33:13 2013 From: amcmorl at gmail.com (Angus McMorland) Date: Fri, 29 Mar 2013 11:33:13 -0400 Subject: [Numpy-discussion] Polar/spherical coordinates handling In-Reply-To: References: Message-ID: On 29 March 2013 11:15, Todd wrote: > From what I can see, numpy doesn't have any functions for handling polar or > spherical coordinate to/from cartesian coordinate conversion. I think such > methods would be pretty useful. 
I am looking now and it doesn't look that > hard to create functions to convert between n-dimensional cartesian and > n-spherical coordinates. Would anyone be interested in me adding methods > for this? I use these co-ordinate transforms often. I wonder if it wouldn't be preferable to create a scikit focused on spherical or, more generally, geometric operations rather than adding to the already hefty number of functions in numpy. I'd be interested to contribute to such a scikit. Angus -- AJC McMorland Research Associate Neurobiology, University of Pittsburgh From toddrjen at gmail.com Fri Mar 29 13:27:13 2013 From: toddrjen at gmail.com (Todd) Date: Fri, 29 Mar 2013 18:27:13 +0100 Subject: [Numpy-discussion] Polar/spherical coordinates handling In-Reply-To: References: Message-ID: On Fri, Mar 29, 2013 at 4:33 PM, Angus McMorland wrote: > On 29 March 2013 11:15, Todd wrote: > > From what I can see, numpy doesn't have any functions for handling polar > or > > spherical coordinate to/from cartesian coordinate conversion. I think > such > > methods would be pretty useful. I am looking now and it doesn't look > that > > hard to create functions to convert between n-dimensional cartesian and > > n-spherical coordinates. Would anyone be interested in me adding methods > > for this? > > I use these co-ordinate transforms often. I wonder if it wouldn't be > preferable to create a scikit focused on spherical or, more generally, > geometric operations rather than adding to the already hefty number of > functions in numpy. I'd be interested to contribute to such a scikit. > The reason I think these particular functions belong in numpy is that they are closely tied to signal processing and linear algebra, far more than any other coordinate systems. It is really just a generalization of the complex number processing that is already available from numpy. 
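For the 2-D case the conversions under discussion are only a few lines on top of existing ufuncs. A rough sketch (the names `cart2pol` and `pol2cart` are invented here for illustration, not a proposed API):

```python
import numpy as np

def cart2pol(x, y):
    """Cartesian (x, y) -> polar (r, theta), theta in radians."""
    return np.hypot(x, y), np.arctan2(y, x)

def pol2cart(r, theta):
    """Polar (r, theta) -> Cartesian (x, y)."""
    return r * np.cos(theta), r * np.sin(theta)

x = np.array([1.0, 0.0, -1.0])
y = np.array([0.0, 2.0, 1.0])
r, theta = cart2pol(x, y)
x2, y2 = pol2cart(r, theta)
print(np.allclose(x, x2) and np.allclose(y, y2))  # True
```

The n-spherical generalization follows the same pattern, with cumulative products of sines supplying the higher angular coordinates.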
Also, although numpy has methods to convert complex values to magnitude and angle, it doesn't have any methods to go the other way. Again, such a function would just be a special 2-D case of the more general n-dimensional functions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Mar 29 22:08:23 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 29 Mar 2013 19:08:23 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, We were teaching today, and found ourselves getting very confused about ravel and shape in numpy. Summary -------------- There are two separate ideas needed to understand ordering in ravel and reshape: Idea 1): ravel / reshape can proceed from the last axis to the first, or the first to the last. This is "ravel index ordering" Idea 2) The physical layout of the array (on disk or in memory) can be "C" or "F" contiguous or neither. This is "memory ordering" The index ordering is usually (but see below) orthogonal to the memory ordering. The 'ravel' and 'reshape' commands use "C" and "F" in the sense of index ordering, and this mixes the two ideas and is confusing. What the current situation looks like ---------------------------------------------------- Specifically, we've been rolling this around 4 experienced numpy users and we all predicted at least one of the results below wrongly. This was what we knew, or should have known: In [2]: import numpy as np In [3]: arr = np.arange(10).reshape((2, 5)) In [5]: arr.ravel() Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) So, the 'ravel' operation unravels over the last axis (1) first, followed by axis 0. So far so good (even if the opposite to MATLAB, Octave). 
Then we found the 'order' flag to ravel: In [10]: arr.flags Out[10]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [11]: arr.ravel('C') Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) But we soon got confused. How about this? In [12]: arr_F = np.array(arr, order='F') In [13]: arr_F.flags Out[13]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [16]: arr_F Out[16]: array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]) In [17]: arr_F.ravel('C') Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) Right - so the flag 'C' to ravel, has got nothing to do with *memory* ordering, but is to do with *index* ordering. And in fact, we can ask for memory ordering specifically: In [22]: arr.ravel('K') Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [23]: arr_F.ravel('K') Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) In [24]: arr.ravel('A') Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [25]: arr_F.ravel('A') Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) There are some confusions to get into with the 'order' flag to reshape as well, of the same type. Ravel and reshape use the terms 'C' and 'F' in the sense of index ordering. This is very confusing. We think the index ordering and memory ordering ideas need to be separated, and specifically, we should avoid using "C" and "F" to refer to index ordering. Proposal ------------- * Deprecate the use of "C" and "F" meaning backwards and forwards index ordering for ravel, reshape * Prefer "Z" and "N", being graphical representations of unraveling in 2 dimensions, axis1 first and axis0 first respectively (excellent naming idea by Paul Ivanov) What do y'all think?
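One quick check that these flags name index order rather than memory order (a sketch using the same `arr` and `arr_F` as above): for a 2-D array, raveling with 'F' walks the first axis fastest, which is the same as transposing and then raveling in the default 'C' order, whatever the memory layout of the input:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous
arr_F = np.array(arr, order='F')      # same values, Fortran memory layout

print(arr.ravel('F'))    # [0 5 1 6 2 7 3 8 4 9]
print(arr.T.ravel())     # [0 5 1 6 2 7 3 8 4 9], i.e. transpose then 'C' ravel
print(arr_F.ravel('F'))  # [0 5 1 6 2 7 3 8 4 9], memory layout is irrelevant
```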
Cheers, Matthew Paul Ivanov JB Poline From josef.pktd at gmail.com Sat Mar 30 07:14:51 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 07:14:51 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: > > Hi, > > We were teaching today, and found ourselves getting very confused > about ravel and shape in numpy. > > Summary > -------------- > > There are two separate ideas needed to understand ordering in ravel and reshape: > > Idea 1): ravel / reshape can proceed from the last axis to the first, > or the first to the last. This is "ravel index ordering" > Idea 2) The physical layout of the array (on disk or in memory) can be > "C" or "F" contiguous or neither. > This is "memory ordering" > > The index ordering is usually (but see below) orthogonal to the memory ordering. > > The 'ravel' and 'reshape' commands use "C" and "F" in the sense of > index ordering, and this mixes the two ideas and is confusing. > > What the current situation looks like > ---------------------------------------------------- > > Specifically, we've been rolling this around 4 experienced numpy users > and we all predicted at least one of the results below wrongly. > > This was what we knew, or should have known: > > In [2]: import numpy as np > > In [3]: arr = np.arange(10).reshape((2, 5)) > > In [5]: arr.ravel() > Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > So, the 'ravel' operation unravels over the last axis (1) first, > followed by axis 0. > > So far so good (even if the opposite to MATLAB, Octave). 
> > Then we found the 'order' flag to ravel: > > In [10]: arr.flags > Out[10]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [11]: arr.ravel('C') > Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > But we soon got confused. How about this? > > In [12]: arr_F = np.array(arr, order='F') > > In [13]: arr_F.flags > Out[13]: > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [16]: arr_F > Out[16]: > array([[0, 1, 2, 3, 4], > [5, 6, 7, 8, 9]]) > > In [17]: arr_F.ravel('C') > Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > Right - so the flag 'C' to ravel, has got nothing to do with *memory* > ordering, but is to do with *index* ordering. > > And in fact, we can ask for memory ordering specifically: > > In [22]: arr.ravel('K') > Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [23]: arr_F.ravel('K') > Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) > > In [24]: arr.ravel('A') > Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [25]: arr_F.ravel('A') > Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) > > There are some confusions to get into with the 'order' flag to reshape > as well, of the same type. > > Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. > > This is very confusing. We think the index ordering and memory > ordering ideas need to be separated, and specifically, we should avoid > using "C" and "F" to refer to index ordering. > > Proposal > ------------- > > * Deprecate the use of "C" and "F" meaning backwards and forwards > index ordering for ravel, reshape > * Prefer "Z" and "N", being graphical representations of unraveling in > 2 dimensions, axis1 first and axis0 first respectively (excellent > naming idea by Paul Ivanov) > > What do y'all think? 
> > Cheers, > > Matthew > Paul Ivanov > JB Poline > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I always thought "F" and "C" are easy to understand, I always thought about the content and never about the memory when using it. In my numpy htmlhelp for version 1.5, I don't have a K or A option >>> np.__version__ '1.5.1' >>> np.arange(5).ravel("K") Traceback (most recent call last): File "", line 1, in TypeError: order not understood >>> np.arange(5).ravel("A") array([0, 1, 2, 3, 4]) >>> the C, F in ravel have their twins in reshape >>> arr = np.arange(10).reshape(2,5, order="C").copy() >>> arr array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]) >>> arr.ravel() array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>> arr = np.arange(10).reshape(2,5, order="F").copy() >>> arr array([[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]) >>> arr.ravel("F") array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) For example we use it when we get raveled arrays from R, and F for column order and C for row order indexing are pretty obvious names when coming from another package (Matlab, R, Gauss) Josef From josef.pktd at gmail.com Sat Mar 30 07:48:19 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 07:48:19 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 7:14 AM, wrote: > On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >> >> Hi, >> >> We were teaching today, and found ourselves getting very confused >> about ravel and shape in numpy. >> >> Summary >> -------------- >> >> There are two separate ideas needed to understand ordering in ravel and reshape: >> >> Idea 1): ravel / reshape can proceed from the last axis to the first, >> or the first to the last.
This is "ravel index ordering" >> Idea 2) The physical layout of the array (on disk or in memory) can be >> "C" or "F" contiguous or neither. >> This is "memory ordering" >> >> The index ordering is usually (but see below) orthogonal to the memory ordering. >> >> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >> index ordering, and this mixes the two ideas and is confusing. >> >> What the current situation looks like >> ---------------------------------------------------- >> >> Specifically, we've been rolling this around 4 experienced numpy users >> and we all predicted at least one of the results below wrongly. >> >> This was what we knew, or should have known: >> >> In [2]: import numpy as np >> >> In [3]: arr = np.arange(10).reshape((2, 5)) >> >> In [5]: arr.ravel() >> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> So, the 'ravel' operation unravels over the last axis (1) first, >> followed by axis 0. >> >> So far so good (even if the opposite to MATLAB, Octave). >> >> Then we found the 'order' flag to ravel: >> >> In [10]: arr.flags >> Out[10]: >> C_CONTIGUOUS : True >> F_CONTIGUOUS : False >> OWNDATA : False >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [11]: arr.ravel('C') >> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> But we soon got confused. How about this? >> >> In [12]: arr_F = np.array(arr, order='F') >> >> In [13]: arr_F.flags >> Out[13]: >> C_CONTIGUOUS : False >> F_CONTIGUOUS : True >> OWNDATA : True >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [16]: arr_F >> Out[16]: >> array([[0, 1, 2, 3, 4], >> [5, 6, 7, 8, 9]]) >> >> In [17]: arr_F.ravel('C') >> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> Right - so the flag 'C' to ravel, has got nothing to do with *memory* >> ordering, but is to do with *index* ordering. 
>> >> And in fact, we can ask for memory ordering specifically: >> >> In [22]: arr.ravel('K') >> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [23]: arr_F.ravel('K') >> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> In [24]: arr.ravel('A') >> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [25]: arr_F.ravel('A') >> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> There are some confusions to get into with the 'order' flag to reshape >> as well, of the same type. >> >> Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. >> >> This is very confusing. We think the index ordering and memory >> ordering ideas need to be separated, and specifically, we should avoid >> using "C" and "F" to refer to index ordering. >> >> Proposal >> ------------- >> >> * Deprecate the use of "C" and "F" meaning backwards and forwards >> index ordering for ravel, reshape >> * Prefer "Z" and "N", being graphical representations of unraveling in >> 2 dimensions, axis1 first and axis0 first respectively (excellent >> naming idea by Paul Ivanov) >> >> What do y'all think? >> >> Cheers, >> >> Matthew >> Paul Ivanov >> JB Poline >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > I always thought "F" and "C" are easy to understand, I always thought about > the content and never about the memory when using it. 
> > In my numpy htmlhelp for version 1.5, I don't have a K or A option > >>>> np.__version__ > '1.5.1' > >>>> np.arange(5).ravel("K") > Traceback (most recent call last): > File "", line 1, in > TypeError: order not understood > >>>> np.arange(5).ravel("A") > array([0, 1, 2, 3, 4]) >>>> > > the C, F in ravel have their twins in reshape > >>>> arr = np.arange(10).reshape(2,5, order="C").copy() >>>> arr > array([[0, 1, 2, 3, 4], > [5, 6, 7, 8, 9]]) >>>> arr.ravel() > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> arr = np.arange(10).reshape(2,5, order="F").copy() >>>> arr > array([[0, 2, 4, 6, 8], > [1, 3, 5, 7, 9]]) >>>> arrarr.ravel("F") > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > For example we use it when we get raveled arrays from R, > and F for column order and C for row order indexing are pretty > obvious names when coming from another package (Matlab, R, Gauss) just a quick search to get an idea: in statsmodels, 19 out of 135 ravels are ravel('F'), and 50 out of 270 reshapes specify reshape.*order='F' (regular expression) Josef > > Josef From ivan.oseledets at gmail.com Sat Mar 30 14:01:38 2013 From: ivan.oseledets at gmail.com (Ivan Oseledets) Date: Sat, 30 Mar 2013 22:01:38 +0400 Subject: [Numpy-discussion] Indexing bug? Message-ID: I am using numpy 1.6.1, and encountered a weird fancy indexing bug: import numpy as np c = np.random.randn(10,200,10); In [29]: print c[[0,1],:200,:2].shape (2, 200, 2) In [30]: print c[[0,1],:200,[0,1]].shape (2, 200) It means that here fancy indexing is not working right for a 3d array. Is this bug fixed with higher versions of numpy? I have not checked, since mine is from EPD and is compiled with MKL (and I can consider recompiling myself only under strong circumstances) Ivan From jaime.frio at gmail.com Sat Mar 30 14:13:35 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Sat, 30 Mar 2013 11:13:35 -0700 Subject: [Numpy-discussion] Indexing bug?
In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets wrote: > I am using numpy 1.6.1, > and encountered a wierd fancy indexing bug: > > import numpy as np > c = np.random.randn(10,200,10); > > In [29]: print c[[0,1],:200,:2].shape > (2, 200, 2) > > In [30]: print c[[0,1],:200,[0,1]].shape > (2, 200) > > It means, that here fancy indexing is not working right for a 3d array. > It is working fine, review the docs: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing In your return, item[0, :] is c[0, :, 0] and item[1, :] is c[1, :, 1]. If you want a return of shape (2, 200, 2) where item[i, :, j] is c[i, :, j] you could use slicing: c[:2, :200, :2] or something more elaborate like: c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)] Jaime > > Is this bug fixed with higher versions of numpy? > I do not check, since mine is from EPD and is compiled with MKL (and I > can consider recompiling myself only under strong circumstances) > > Ivan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sat Mar 30 14:55:23 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 30 Mar 2013 19:55:23 +0100 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: <1364669723.2556.19.camel@sebastian-laptop> On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote: > Hi, > > We were teaching today, and found ourselves getting very confused > about ravel and shape in numpy.
> > Summary > -------------- > > There are two separate ideas needed to understand ordering in ravel and reshape: > > Idea 1): ravel / reshape can proceed from the last axis to the first, > or the first to the last. This is "ravel index ordering" > Idea 2) The physical layout of the array (on disk or in memory) can be > "C" or "F" contiguous or neither. > This is "memory ordering" > > The index ordering is usually (but see below) orthogonal to the memory ordering. > > The 'ravel' and 'reshape' commands use "C" and "F" in the sense of > index ordering, and this mixes the two ideas and is confusing. > > What the current situation looks like > ---------------------------------------------------- > > Specifically, we've been rolling this around 4 experienced numpy users > and we all predicted at least one of the results below wrongly. > > This was what we knew, or should have known: > > In [2]: import numpy as np > > In [3]: arr = np.arange(10).reshape((2, 5)) > > In [5]: arr.ravel() > Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > So, the 'ravel' operation unravels over the last axis (1) first, > followed by axis 0. > > So far so good (even if the opposite to MATLAB, Octave). > > Then we found the 'order' flag to ravel: > > In [10]: arr.flags > Out[10]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [11]: arr.ravel('C') > Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > But we soon got confused. How about this? > > In [12]: arr_F = np.array(arr, order='F') > > In [13]: arr_F.flags > Out[13]: > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [16]: arr_F > Out[16]: > array([[0, 1, 2, 3, 4], > [5, 6, 7, 8, 9]]) > > In [17]: arr_F.ravel('C') > Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > Right - so the flag 'C' to ravel, has got nothing to do with *memory* > ordering, but is to do with *index* ordering. 
> > And in fact, we can ask for memory ordering specifically: > > In [22]: arr.ravel('K') > Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [23]: arr_F.ravel('K') > Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) > > In [24]: arr.ravel('A') > Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [25]: arr_F.ravel('A') > Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) > > There are some confusions to get into with the 'order' flag to reshape > as well, of the same type. > > Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. > > This is very confusing. We think the index ordering and memory > ordering ideas need to be separated, and specifically, we should avoid > using "C" and "F" to refer to index ordering. > > Proposal > ------------- > > * Deprecate the use of "C" and "F" meaning backwards and forwards > index ordering for ravel, reshape > * Prefer "Z" and "N", being graphical representations of unraveling in > 2 dimensions, axis1 first and axis0 first respectively (excellent > naming idea by Paul Ivanov) > > What do y'all think? > Personally I think it is clear enough and that "Z" and "N" would confuse me just as much (though I am used to the other names). Also "Z" and "N" would seem more like aliases, which would also make sense in the memory order context. If anything, I would prefer renaming the arguments iteration_order and memory_order, but it seems overdoing it... Maybe the documentation could just be checked if it is always clear though. I.e. maybe it does not use "iteration" or "memory" order consistently (though I somewhat feel it is usually clear that it must be iteration order, since no numpy function cares about the input memory order as they will just do a copy if necessary). 
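Sebastian's point can be made concrete: order='C' and order='F' depend only on iteration order, while 'A' and 'K' are the variants that consult the memory layout. A short sketch reproducing the outputs quoted above:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous
arr_F = np.array(arr, order='F')      # F-contiguous copy, same values

# Iteration-order flags ignore how the input is laid out in memory:
assert (arr.ravel('C') == arr_F.ravel('C')).all()
assert (arr.ravel('F') == arr_F.ravel('F')).all()

# 'A' (and 'K') consult the memory layout, so the results differ:
print(arr.ravel('A'))    # [0 1 2 3 4 5 6 7 8 9]
print(arr_F.ravel('A'))  # [0 5 1 6 2 7 3 8 4 9]
```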
Regards, Sebastian > Cheers, > > Matthew > Paul Ivanov > JB Poline > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Sat Mar 30 15:45:52 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 12:45:52 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: <1364669723.2556.19.camel@sebastian-laptop> References: <1364669723.2556.19.camel@sebastian-laptop> Message-ID: Hi, On Sat, Mar 30, 2013 at 11:55 AM, Sebastian Berg wrote: > On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote: >> Hi, >> >> We were teaching today, and found ourselves getting very confused >> about ravel and shape in numpy. >> >> Summary >> -------------- >> >> There are two separate ideas needed to understand ordering in ravel and reshape: >> >> Idea 1): ravel / reshape can proceed from the last axis to the first, >> or the first to the last. This is "ravel index ordering" >> Idea 2) The physical layout of the array (on disk or in memory) can be >> "C" or "F" contiguous or neither. >> This is "memory ordering" >> >> The index ordering is usually (but see below) orthogonal to the memory ordering. >> >> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >> index ordering, and this mixes the two ideas and is confusing. >> >> What the current situation looks like >> ---------------------------------------------------- >> >> Specifically, we've been rolling this around 4 experienced numpy users >> and we all predicted at least one of the results below wrongly. 
>> >> This was what we knew, or should have known: >> >> In [2]: import numpy as np >> >> In [3]: arr = np.arange(10).reshape((2, 5)) >> >> In [5]: arr.ravel() >> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> So, the 'ravel' operation unravels over the last axis (1) first, >> followed by axis 0. >> >> So far so good (even if the opposite to MATLAB, Octave). >> >> Then we found the 'order' flag to ravel: >> >> In [10]: arr.flags >> Out[10]: >> C_CONTIGUOUS : True >> F_CONTIGUOUS : False >> OWNDATA : False >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [11]: arr.ravel('C') >> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> But we soon got confused. How about this? >> >> In [12]: arr_F = np.array(arr, order='F') >> >> In [13]: arr_F.flags >> Out[13]: >> C_CONTIGUOUS : False >> F_CONTIGUOUS : True >> OWNDATA : True >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [16]: arr_F >> Out[16]: >> array([[0, 1, 2, 3, 4], >> [5, 6, 7, 8, 9]]) >> >> In [17]: arr_F.ravel('C') >> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> Right - so the flag 'C' to ravel, has got nothing to do with *memory* >> ordering, but is to do with *index* ordering. >> >> And in fact, we can ask for memory ordering specifically: >> >> In [22]: arr.ravel('K') >> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [23]: arr_F.ravel('K') >> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> In [24]: arr.ravel('A') >> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [25]: arr_F.ravel('A') >> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> There are some confusions to get into with the 'order' flag to reshape >> as well, of the same type. >> >> Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. >> >> This is very confusing. We think the index ordering and memory >> ordering ideas need to be separated, and specifically, we should avoid >> using "C" and "F" to refer to index ordering. 
>> >> Proposal >> ------------- >> >> * Deprecate the use of "C" and "F" meaning backwards and forwards >> index ordering for ravel, reshape >> * Prefer "Z" and "N", being graphical representations of unraveling in >> 2 dimensions, axis1 first and axis0 first respectively (excellent >> naming idea by Paul Ivanov) >> >> What do y'all think? >> > > Personally I think it is clear enough and that "Z" and "N" would confuse > me just as much (though I am used to the other names). Also "Z" and "N" > would seem more like aliases, which would also make sense in the memory > order context. > If anything, I would prefer renaming the arguments iteration_order and > memory_order, but it seems overdoing it... I am not sure what you mean - at the moment there is one argument called 'order' that can refer to iteration order or memory order. Are you proposing two arguments? > Maybe the documentation could just be checked if it is always clear > though. I.e. maybe it does not use "iteration" or "memory" order > consistently (though I somewhat feel it is usually clear that it must be > iteration order, since no numpy function cares about the input memory > order as they will just do a copy if necessary). Do you really mean this? Numpy is full of 'order=' flags that refer to memory. Cheers, Matthew From matthew.brett at gmail.com Sat Mar 30 15:51:17 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 12:51:17 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 4:14 AM, wrote: > On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >> >> Hi, >> >> We were teaching today, and found ourselves getting very confused >> about ravel and shape in numpy. 
>> >> Summary >> -------------- >> >> There are two separate ideas needed to understand ordering in ravel and reshape: >> >> Idea 1): ravel / reshape can proceed from the last axis to the first, >> or the first to the last. This is "ravel index ordering" >> Idea 2) The physical layout of the array (on disk or in memory) can be >> "C" or "F" contiguous or neither. >> This is "memory ordering" >> >> The index ordering is usually (but see below) orthogonal to the memory ordering. >> >> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >> index ordering, and this mixes the two ideas and is confusing. >> >> What the current situation looks like >> ---------------------------------------------------- >> >> Specifically, we've been rolling this around 4 experienced numpy users >> and we all predicted at least one of the results below wrongly. >> >> This was what we knew, or should have known: >> >> In [2]: import numpy as np >> >> In [3]: arr = np.arange(10).reshape((2, 5)) >> >> In [5]: arr.ravel() >> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> So, the 'ravel' operation unravels over the last axis (1) first, >> followed by axis 0. >> >> So far so good (even if the opposite to MATLAB, Octave). >> >> Then we found the 'order' flag to ravel: >> >> In [10]: arr.flags >> Out[10]: >> C_CONTIGUOUS : True >> F_CONTIGUOUS : False >> OWNDATA : False >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [11]: arr.ravel('C') >> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> But we soon got confused. How about this? 
>> >> In [12]: arr_F = np.array(arr, order='F') >> >> In [13]: arr_F.flags >> Out[13]: >> C_CONTIGUOUS : False >> F_CONTIGUOUS : True >> OWNDATA : True >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [16]: arr_F >> Out[16]: >> array([[0, 1, 2, 3, 4], >> [5, 6, 7, 8, 9]]) >> >> In [17]: arr_F.ravel('C') >> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> Right - so the flag 'C' to ravel, has got nothing to do with *memory* >> ordering, but is to do with *index* ordering. >> >> And in fact, we can ask for memory ordering specifically: >> >> In [22]: arr.ravel('K') >> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [23]: arr_F.ravel('K') >> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> In [24]: arr.ravel('A') >> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [25]: arr_F.ravel('A') >> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> There are some confusions to get into with the 'order' flag to reshape >> as well, of the same type. >> >> Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. >> >> This is very confusing. We think the index ordering and memory >> ordering ideas need to be separated, and specifically, we should avoid >> using "C" and "F" to refer to index ordering. >> >> Proposal >> ------------- >> >> * Deprecate the use of "C" and "F" meaning backwards and forwards >> index ordering for ravel, reshape >> * Prefer "Z" and "N", being graphical representations of unraveling in >> 2 dimensions, axis1 first and axis0 first respectively (excellent >> naming idea by Paul Ivanov) >> >> What do y'all think? >> >> Cheers, >> >> Matthew >> Paul Ivanov >> JB Poline >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > I always thought "F" and "C" are easy to understand, I always thought about > the content and never about the memory when using it. 
I can only say that 4 out of 4 experienced numpy developers found themselves unable to predict the behavior of these functions before they saw the output. The problem is always that explaining something makes it clearer for a moment, but, for those who do not have the explanation or who have forgotten it, at least among us here, the outputs were generating groans and / or high fives as we incorrectly or correctly guessed what was going to happen. I think the only way to find out whether this really is confusing or not, is to put someone in front of these functions without any explanation and ask them to predict what is going to come out of the various inputs and flags. Or to try and teach it, which was the problem we were having. Cheers, Matthew From josef.pktd at gmail.com Sat Mar 30 16:57:36 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 16:57:36 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 4:14 AM, wrote: >> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >>> >>> Hi, >>> >>> We were teaching today, and found ourselves getting very confused >>> about ravel and shape in numpy. >>> >>> Summary >>> -------------- >>> >>> There are two separate ideas needed to understand ordering in ravel and reshape: >>> >>> Idea 1): ravel / reshape can proceed from the last axis to the first, >>> or the first to the last. This is "ravel index ordering" >>> Idea 2) The physical layout of the array (on disk or in memory) can be >>> "C" or "F" contiguous or neither. >>> This is "memory ordering" >>> >>> The index ordering is usually (but see below) orthogonal to the memory ordering. >>> >>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >>> index ordering, and this mixes the two ideas and is confusing. 
> > The problem is always that explaining something makes it clearer for a > moment, but, for those who do not have the explanation or who have > forgotten it, at least among us here, the outputs were generating > groans and / or high fives as we incorrectly or correctly guessed what > was going to happen. > > I think the only way to find out whether this really is confusing or > not, is to put someone in front of these functions without any > explanation and ask them to predict what is going to come out of the > various inputs and flags. Or to try and teach it, which was the > problem we were having. changing the names doesn't make it easier to understand. I think the confusion is because the new A and K refer to existing memory ``ravel`` is just stacking columns ('F') or stacking rows ('C'), I don't remember having seen any weird cases. ------------ I always thought of "order" in array creation is the way we want to have the memory layout of the *target* array and has nothing to do with existing memory layout (creating view or copy as needed). reshape, and ravel are *views* if possible, memory might just be some weird strides (and can be ignored unless you want to do some memory optimization, keeping track of the memory is difficult. I don't think I will start to use A and K after upgrading numpy.) 
>>> a1 = np.ones((10,4)) not contiguous >>> arr2 = a1[:, 2:4] >>> arr2.flags C_CONTIGUOUS : False F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False stack columns (needs to make a copy) >>> arr3 = arr2.ravel('F') >>> arr3.flags C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False stack columns or rows with reshape (I have no idea what it did with the memory) >>> arr2.reshape(-1,1).flags C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> arr2.reshape(-1,1, order='F').flags C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> arr2.reshape(-1, order='F').flags C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False ------------------- one case where I do pay attention to memory layout is column slicing >>> arr = np.ones((10, 5), order='F') >>> for i in range(1, 5): print arr[:, :i+2].ravel('C').flags['OWNDATA'] ??? >>> for i in range(1,5): print arr[:, :i+2].ravel('F').flags['OWNDATA'] ??? 
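A minimal sketch of what that last loop is probing, assuming modern NumPy (the variable names here are illustrative, not from the original session): when the requested ravel order matches the slice's memory layout, ravel can hand back a view (OWNDATA False); when it conflicts, a copy is forced (OWNDATA True).

```python
import numpy as np

arr = np.ones((10, 5), order='F')
cols = arr[:, :3]                  # column slice of an F-ordered array
assert cols.flags['F_CONTIGUOUS']  # the slice stays Fortran-contiguous

flat_f = cols.ravel('F')  # order matches the memory layout -> a view suffices
flat_c = cols.ravel('C')  # order conflicts with the layout -> forces a copy

print(flat_f.flags['OWNDATA'])  # False: a view onto arr's buffer
print(flat_c.flags['OWNDATA'])  # True: freshly copied data
```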
Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Sat Mar 30 17:20:19 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 17:20:19 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 4:57 PM, wrote: > On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >>>> >>>> Hi, >>>> >>>> We were teaching today, and found ourselves getting very confused >>>> about ravel and shape in numpy. >>>> >>>> Summary >>>> -------------- >>>> >>>> There are two separate ideas needed to understand ordering in ravel and reshape: >>>> >>>> Idea 1): ravel / reshape can proceed from the last axis to the first, >>>> or the first to the last. This is "ravel index ordering" >>>> Idea 2) The physical layout of the array (on disk or in memory) can be >>>> "C" or "F" contiguous or neither. >>>> This is "memory ordering" >>>> >>>> The index ordering is usually (but see below) orthogonal to the memory ordering. >>>> >>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >>>> index ordering, and this mixes the two ideas and is confusing. >>>> >>>> What the current situation looks like >>>> ---------------------------------------------------- >>>> >>>> Specifically, we've been rolling this around 4 experienced numpy users >>>> and we all predicted at least one of the results below wrongly. 
>>>> >>>> This was what we knew, or should have known: >>>> >>>> In [2]: import numpy as np >>>> >>>> In [3]: arr = np.arange(10).reshape((2, 5)) >>>> >>>> In [5]: arr.ravel() >>>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> So, the 'ravel' operation unravels over the last axis (1) first, >>>> followed by axis 0. >>>> >>>> So far so good (even if the opposite to MATLAB, Octave). >>>> >>>> Then we found the 'order' flag to ravel: >>>> >>>> In [10]: arr.flags >>>> Out[10]: >>>> C_CONTIGUOUS : True >>>> F_CONTIGUOUS : False >>>> OWNDATA : False >>>> WRITEABLE : True >>>> ALIGNED : True >>>> UPDATEIFCOPY : False >>>> >>>> In [11]: arr.ravel('C') >>>> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> But we soon got confused. How about this? >>>> >>>> In [12]: arr_F = np.array(arr, order='F') >>>> >>>> In [13]: arr_F.flags >>>> Out[13]: >>>> C_CONTIGUOUS : False >>>> F_CONTIGUOUS : True >>>> OWNDATA : True >>>> WRITEABLE : True >>>> ALIGNED : True >>>> UPDATEIFCOPY : False >>>> >>>> In [16]: arr_F >>>> Out[16]: >>>> array([[0, 1, 2, 3, 4], >>>> [5, 6, 7, 8, 9]]) >>>> >>>> In [17]: arr_F.ravel('C') >>>> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> Right - so the flag 'C' to ravel, has got nothing to do with *memory* >>>> ordering, but is to do with *index* ordering. >>>> >>>> And in fact, we can ask for memory ordering specifically: >>>> >>>> In [22]: arr.ravel('K') >>>> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> In [23]: arr_F.ravel('K') >>>> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >>>> >>>> In [24]: arr.ravel('A') >>>> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> In [25]: arr_F.ravel('A') >>>> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >>>> >>>> There are some confusions to get into with the 'order' flag to reshape >>>> as well, of the same type. >>>> >>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. >>>> >>>> This is very confusing. 
We think the index ordering and memory >>>> ordering ideas need to be separated, and specifically, we should avoid >>>> using "C" and "F" to refer to index ordering. >>>> >>>> Proposal >>>> ------------- >>>> >>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>> index ordering for ravel, reshape >>>> * Prefer "Z" and "N", being graphical representations of unraveling in >>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>> naming idea by Paul Ivanov) >>>> >>>> What do y'all think? >>>> >>>> Cheers, >>>> >>>> Matthew >>>> Paul Ivanov >>>> JB Poline >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> I always thought "F" and "C" are easy to understand, I always thought about >>> the content and never about the memory when using it. >> >> I can only say that 4 out of 4 experienced numpy developers found >> themselves unable to predict the behavior of these functions before >> they saw the output. >> >> The problem is always that explaining something makes it clearer for a >> moment, but, for those who do not have the explanation or who have >> forgotten it, at least among us here, the outputs were generating >> groans and / or high fives as we incorrectly or correctly guessed what >> was going to happen. >> >> I think the only way to find out whether this really is confusing or >> not, is to put someone in front of these functions without any >> explanation and ask them to predict what is going to come out of the >> various inputs and flags. Or to try and teach it, which was the >> problem we were having. > > changing the names doesn't make it easier to understand. > I think the confusion is because the new A and K refer to existing memory > > > ``ravel`` is just stacking columns ('F') or stacking rows ('C'), I > don't remember having seen any weird cases. 
example from our statistics use: rows are observations/time periods, columns are variables/individuals using "F" or "C", we can stack either by time-periods (observations) or individuals (cross-section units) that's easy to understand. "A" and "K" are pretty useless for us, because we don't know which stacking we would get (we don't try to control the memory layout) The only reason to use "A" or "K", in my opinion, is to use the existing memory efficiently. Since the order in the array is unpredictable, it only makes sense if we don't care about it, for example when we only have elementwise operations. Josef From matthew.brett at gmail.com Sat Mar 30 18:19:33 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 15:19:33 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 1:57 PM, wrote: > On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >>>> >>>> Hi, >>>> >>>> We were teaching today, and found ourselves getting very confused >>>> about ravel and shape in numpy. >>>> >>>> Summary >>>> -------------- >>>> >>>> There are two separate ideas needed to understand ordering in ravel and reshape: >>>> >>>> Idea 1): ravel / reshape can proceed from the last axis to the first, >>>> or the first to the last. This is "ravel index ordering" >>>> Idea 2) The physical layout of the array (on disk or in memory) can be >>>> "C" or "F" contiguous or neither. >>>> This is "memory ordering" >>>> >>>> The index ordering is usually (but see below) orthogonal to the memory ordering. >>>> >>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >>>> index ordering, and this mixes the two ideas and is confusing. 
>> >> The problem is always that explaining something makes it clearer for a
>> moment, but, for those who do not have the explanation or who have
>> forgotten it, at least among us here, the outputs were generating
>> groans and / or high fives as we incorrectly or correctly guessed what
>> was going to happen.
>>
>> I think the only way to find out whether this really is confusing or
>> not, is to put someone in front of these functions without any
>> explanation and ask them to predict what is going to come out of the
>> various inputs and flags. Or to try and teach it, which was the
>> problem we were having.
>
> changing the names doesn't make it easier to understand.
> I think the confusion is because the new A and K refer to existing memory
>
> ``ravel`` is just stacking columns ('F') or stacking rows ('C'), I
> don't remember having seen any weird cases.
> ------------
>
> I always thought of "order" in array creation is the way we want to
> have the memory layout of the *target* array and has nothing to do
> with existing memory layout (creating view or copy as needed).

In the case of ravel of course F and C in memory aren't relevant.

'F' and 'C' don't refer to target memory layout at all in 'reshape':

In [26]: a = np.arange(10).reshape((2, 5))

In [28]: a.reshape((2, 5), order='F').flags
Out[28]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

So I think that distinction is actively confusing in this case, and
more evidence that this is not the right name for what we mean.
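To spell out the contrast shown in that reshape session (a sketch in current Python/NumPy, not part of the original message): the order flag to reshape only controls the index order in which elements are read off, while a separate call such as np.asfortranarray is what actually requests a Fortran memory layout.

```python
import numpy as np

a = np.arange(10).reshape((2, 5))  # C-contiguous

# order='F' here only changes how indices are traversed; for a
# same-shape reshape the result is still a C-contiguous view.
b = a.reshape((2, 5), order='F')
print(b.flags['C_CONTIGUOUS'])  # True
print(b.flags['F_CONTIGUOUS'])  # False

# Requesting an actual Fortran memory layout is a different operation.
c = np.asfortranarray(a)
print(c.flags['F_CONTIGUOUS'])  # True
```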
Cheers, Matthew From matthew.brett at gmail.com Sat Mar 30 18:21:45 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 15:21:45 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 2:20 PM, wrote: > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >>>>> >>>>> Hi, >>>>> >>>>> We were teaching today, and found ourselves getting very confused >>>>> about ravel and shape in numpy. >>>>> >>>>> Summary >>>>> -------------- >>>>> >>>>> There are two separate ideas needed to understand ordering in ravel and reshape: >>>>> >>>>> Idea 1): ravel / reshape can proceed from the last axis to the first, >>>>> or the first to the last. This is "ravel index ordering" >>>>> Idea 2) The physical layout of the array (on disk or in memory) can be >>>>> "C" or "F" contiguous or neither. >>>>> This is "memory ordering" >>>>> >>>>> The index ordering is usually (but see below) orthogonal to the memory ordering. >>>>> >>>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >>>>> index ordering, and this mixes the two ideas and is confusing. >>>>> >>>>> What the current situation looks like >>>>> ---------------------------------------------------- >>>>> >>>>> Specifically, we've been rolling this around 4 experienced numpy users >>>>> and we all predicted at least one of the results below wrongly. >>>>> >>>>> This was what we knew, or should have known: >>>>> >>>>> In [2]: import numpy as np >>>>> >>>>> In [3]: arr = np.arange(10).reshape((2, 5)) >>>>> >>>>> In [5]: arr.ravel() >>>>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>>> >>>>> So, the 'ravel' operation unravels over the last axis (1) first, >>>>> followed by axis 0. 
> > example from our statistics use: > rows are observations/time periods, columns are variables/individuals > > using "F" or "C", we can stack either by time-periods (observations) > or individuals (cross-section units) > that's easy to understand. I disagree, I think it's confusing, but I have evidence, and that is that four out of four of us tested ourselves and got it wrong. Perhaps we are particularly dumb or poorly informed, but I think it's rash to assert there is no problem here. Cheers, Matthew From sebastian at sipsolutions.net Sat Mar 30 19:28:49 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 31 Mar 2013 00:28:49 +0100 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: <1364669723.2556.19.camel@sebastian-laptop> Message-ID: <1364686129.2556.65.camel@sebastian-laptop> On Sat, 2013-03-30 at 12:45 -0700, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 11:55 AM, Sebastian Berg > wrote: > > On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote: > >> Hi, > >> > >> We were teaching today, and found ourselves getting very confused > >> about ravel and shape in numpy. > >> > >> > >> What do y'all think? > >> > > > > Personally I think it is clear enough and that "Z" and "N" would confuse > > me just as much (though I am used to the other names). Also "Z" and "N" > > would seem more like aliases, which would also make sense in the memory > > order context. > > If anything, I would prefer renaming the arguments iteration_order and > > memory_order, but it seems overdoing it... > > I am not sure what you mean - at the moment there is one argument > called 'order' that can refer to iteration order or memory order. Are > you proposing two arguments? > Yes that is what I meant. The reason that it is not convincing to me is that if I write `np.reshape(arr, ..., order='Z')`, I may be tempted to also write `np.copy(arr, order='Z')`. 
I don't see anything against allowing 'Z' as a more memorable 'C' (I also used to forget which was which), but I don't really see enforcing a different _value_ on the same named argument making it clearer. Renaming the argument itself would seem more sensible to me right now, but I cannot think of a decent name, so I would prefer trying to clarify the documentation if necessary. > > Maybe the documentation could just be checked if it is always clear > > though. I.e. maybe it does not use "iteration" or "memory" order > > consistently (though I somewhat feel it is usually clear that it must be > > iteration order, since no numpy function cares about the input memory > > order as they will just do a copy if necessary). > > Do you really mean this? Numpy is full of 'order=' flags that refer to memory. > I somewhat imagined there were more iteration order flags and I basically count empty/ones/.../copy as basically one "array creation" monster... > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From brad.froehle at gmail.com Sat Mar 30 19:31:53 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Sat, 30 Mar 2013 16:31:53 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett wrote: > On Sat, Mar 30, 2013 at 2:20 PM, wrote: > > On Sat, Mar 30, 2013 at 4:57 PM, wrote: > >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett > wrote: > >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: > >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett < > matthew.brett at gmail.com> wrote: > >>>>> > >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index > ordering. > >>>>> > >>>>> This is very confusing. 
We think the index ordering and memory > >>>>> ordering ideas need to be separated, and specifically, we should > avoid > >>>>> using "C" and "F" to refer to index ordering. > >>>>> > >>>>> Proposal > >>>>> ------------- > >>>>> > >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards > >>>>> index ordering for ravel, reshape > >>>>> * Prefer "Z" and "N", being graphical representations of unraveling > in > >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent > >>>>> naming idea by Paul Ivanov) > >>>>> > >>>>> What do y'all think? > >>>> > >>>> I always thought "F" and "C" are easy to understand, I always thought > about > >>>> the content and never about the memory when using it. > >> > >> changing the names doesn't make it easier to understand. > >> I think the confusion is because the new A and K refer to existing > memory > >> > > I disagree, I think it's confusing, but I have evidence, and that is > that four out of four of us tested ourselves and got it wrong. > > Perhaps we are particularly dumb or poorly informed, but I think it's > rash to assert there is no problem here. > I got all four correct. I think the concept --- at least for ravel --- is pretty simple: would you like to read the data off in C ordering or Fortran ordering. Since the output array is one-dimensional, its ordering is irrelevant. I don't understand the 'Z' / 'N' suggestion at all. Are they part of some pneumonic? I'd STRONGLY advise against deprecating the 'F' and 'C' options. NumPy already suffers from too much bikeshedding with names --- I rarely am able to pull out a script I wrote using NumPy even a few years ago and have it immediately work. Cheers, Brad -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Sat Mar 30 19:42:38 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 16:42:38 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 4:31 PM, Bradley M. Froehle wrote: > On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett > wrote: >> >> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >> >> wrote: >> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >> >>>> wrote: >> >>>>> >> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >> >>>>> ordering. >> >>>>> >> >>>>> This is very confusing. We think the index ordering and memory >> >>>>> ordering ideas need to be separated, and specifically, we should >> >>>>> avoid >> >>>>> using "C" and "F" to refer to index ordering. >> >>>>> >> >>>>> Proposal >> >>>>> ------------- >> >>>>> >> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >> >>>>> index ordering for ravel, reshape >> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >> >>>>> in >> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >> >>>>> naming idea by Paul Ivanov) >> >>>>> >> >>>>> What do y'all think? >> >>>> >> >>>> I always thought "F" and "C" are easy to understand, I always thought >> >>>> about >> >>>> the content and never about the memory when using it. >> >> >> >> changing the names doesn't make it easier to understand. >> >> I think the confusion is because the new A and K refer to existing >> >> memory >> >> >> >> I disagree, I think it's confusing, but I have evidence, and that is >> that four out of four of us tested ourselves and got it wrong. 
>>
>> Perhaps we are particularly dumb or poorly informed, but I think it's
>> rash to assert there is no problem here.
>
>
> I got all four correct.

Then you are smarter and/or better informed than we were. I hope you
didn't read my explanation before you tested yourself. Of course if you
did read my email first I'd expect you and I to get the answer right
first time.

If you didn't read my email first, and didn't think too hard about it,
and still got all the examples right, and you'd get other more confusing
examples right that use reshape, then I'd add you as a data point on the
other side to the four data points we got yesterday.

> I think the concept --- at least for ravel --- is
> pretty simple: would you like to read the data off in C ordering or Fortran
> ordering. Since the output array is one-dimensional, its ordering is
> irrelevant.

Right - hence my confidence that Josef's way of thinking of 'C' and 'F'
as describing the target array was not a good way to think of it in this
case. It is in the case of arr.tostring() though.

> I don't understand the 'Z' / 'N' suggestion at all. Are they part of some
> pneumonic?

Think of the way you'd read off the elements using reverse (last-first)
index order for a 2D array - you might imagine something like a Z.

> I'd STRONGLY advise against deprecating the 'F' and 'C' options. NumPy
> already suffers from too much bikeshedding with names --- I rarely am able
> to pull out a script I wrote using NumPy even a few years ago and have it
> immediately work.

I wish we could drop the word bike-shedding - it's a useless word,
because one person's bike-shedding is another person's necessary
clarification. You think this clarification isn't necessary, and so you
call the discussion bike-shedding.

I'm not suggesting dropping the 'F' and 'C', obviously - can I call that
a 'straw man'?
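To make the mnemonic concrete - a minimal sketch, my own toy example:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

# Last-index-fastest ('C') reads 1, 2, 3, 4 - across the top row, then
# across the bottom row; the path your eye takes traces a 'Z'.
z_read = a.ravel('C')

# First-index-fastest ('F') reads 1, 3, 2, 4 - down the first column,
# then down the second; the path traces an 'N'.
n_read = a.ravel('F')

print(z_read)  # [1 2 3 4]
print(n_read)  # [1 3 2 4]
```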
I am suggesting changing the name to something much clearer, leaving that name clearly explained in the docs, and leaving 'C' and 'F" as functional synonyms for a very long time. Cheers, Matthew From josef.pktd at gmail.com Sat Mar 30 19:50:53 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 19:50:53 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle wrote: > On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett > wrote: >> >> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >> >> wrote: >> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >> >>>> wrote: >> >>>>> >> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >> >>>>> ordering. >> >>>>> >> >>>>> This is very confusing. We think the index ordering and memory >> >>>>> ordering ideas need to be separated, and specifically, we should >> >>>>> avoid >> >>>>> using "C" and "F" to refer to index ordering. >> >>>>> >> >>>>> Proposal >> >>>>> ------------- >> >>>>> >> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >> >>>>> index ordering for ravel, reshape >> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >> >>>>> in >> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >> >>>>> naming idea by Paul Ivanov) >> >>>>> >> >>>>> What do y'all think? >> >>>> >> >>>> I always thought "F" and "C" are easy to understand, I always thought >> >>>> about >> >>>> the content and never about the memory when using it. >> >> >> >> changing the names doesn't make it easier to understand. 
>> >> I think the confusion is because the new A and K refer to existing >> >> memory >> >> >> >> I disagree, I think it's confusing, but I have evidence, and that is >> that four out of four of us tested ourselves and got it wrong. >> >> Perhaps we are particularly dumb or poorly informed, but I think it's >> rash to assert there is no problem here. I think you are overcomplicating things or phrased it as a "trick question" ravel F and C have *nothing* to do with memory layout. I think it's not confusing for beginners that have no idea and never think about memory layout. I've never seen any problems with it in statsmodels and I have seen many developers (GSOC) that are pretty new to python and numpy. (I didn't check the repo history to verify, so IIRC) Even if N, Z were clearer in this case (which I don't think it is and which I have no idea what it should stand for), you would have to go for every use of ``order`` in numpy to check whether it should be N or F or Z or C, and then users would have to check which order name convention is used in a specific function. Josef > > > I got all four correct. I think the concept --- at least for ravel --- is > pretty simple: would you like to read the data off in C ordering or Fortran > ordering. Since the output array is one-dimensional, its ordering is > irrelevant. > > I don't understand the 'Z' / 'N' suggestion at all. Are they part of some > pneumonic? > > I'd STRONGLY advise against deprecating the 'F' and 'C' options. NumPy > already suffers from too much bikeshedding with names --- I rarely am able > to pull out a script I wrote using NumPy even a few years ago and have it > immediately work. 
> > Cheers, > Brad > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Sat Mar 30 20:29:53 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 20:29:53 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 7:50 PM, wrote: > On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle > wrote: >> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >> wrote: >>> >>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>> >> wrote: >>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>> >>>> wrote: >>> >>>>> >>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>> >>>>> ordering. >>> >>>>> >>> >>>>> This is very confusing. We think the index ordering and memory >>> >>>>> ordering ideas need to be separated, and specifically, we should >>> >>>>> avoid >>> >>>>> using "C" and "F" to refer to index ordering. >>> >>>>> >>> >>>>> Proposal >>> >>>>> ------------- >>> >>>>> >>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>> >>>>> index ordering for ravel, reshape >>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>> >>>>> in >>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>> >>>>> naming idea by Paul Ivanov) >>> >>>>> >>> >>>>> What do y'all think? >>> >>>> >>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>> >>>> about >>> >>>> the content and never about the memory when using it. >>> >> >>> >> changing the names doesn't make it easier to understand. 
>>> >> I think the confusion is because the new A and K refer to existing
>>> >> memory
>>> >>
>>>
>>> I disagree, I think it's confusing, but I have evidence, and that is
>>> that four out of four of us tested ourselves and got it wrong.
>>>
>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>>> rash to assert there is no problem here.
>
> I think you are overcomplicating things or phrased it as a "trick question"

I don't know what you mean by trick question - was there something
over-complicated in the example? I deliberately didn't include various
much more confusing examples in "reshape".

> ravel F and C have *nothing* to do with memory layout.

We do agree on this of course - but you said in an earlier mail that you
thought of 'C' and 'F' as referring to target memory layout (which they
don't in this case), so I think we also agree that 'C' and 'F' do often
refer to memory layout elsewhere in numpy.

> I think it's not confusing for beginners that have no idea and never think
> about memory layout.
> I've never seen any problems with it in statsmodels and I have seen
> many developers (GSOC) that are pretty new to python and numpy.
> (I didn't check the repo history to verify, so IIRC)

Usually you don't need to know what reshape or ravel did, because you are
likely to reshape again and that will use the same algorithm.

For example, I didn't know that ravel worked in reverse index order,
started explaining it wrong, and had to check. I use ravel and reshape a
lot, and have not run into this problem because either a) I didn't test
my code properly or b) I did a reshape after ravel / reshape and it
reversed what I did the first time. So I don't think "we haven't noticed
any problems" is a good argument in the face of "several experienced
developers got it wrong when trying to guess what it did".
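A short sketch (my example) of why the round trip hides the choice:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Ravel and reshape with the *same* order always round-trip, so code
# that unravels and re-ravels consistently never reveals which order
# it used.
for order in ('C', 'F'):
    restored = a.ravel(order).reshape(a.shape, order=order)
    assert np.array_equal(restored, a)

# The difference only surfaces when the two orders get mixed.
mixed = a.ravel('F').reshape(a.shape, order='C')
print(np.array_equal(mixed, a))  # False
```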
> Even if N, Z were clearer in this case (which I don't think it is and which > I have no idea what it should stand for), you would have to go for every > use of ``order`` in numpy to check whether it should be N or F or Z or C, > and then users would have to check which order name convention is > used in a specific function. Right - and this would be silly if and only if it made sense to conflate memory layout and index ordering. Cheers, Matthew From josef.pktd at gmail.com Sat Mar 30 22:02:42 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 22:02:42 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 7:50 PM, wrote: >> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >> wrote: >>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>> wrote: >>>> >>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>> >> wrote: >>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>> >>>> wrote: >>>> >>>>> >>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>> >>>>> ordering. >>>> >>>>> >>>> >>>>> This is very confusing. We think the index ordering and memory >>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>> >>>>> avoid >>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>> >>>>> >>>> >>>>> Proposal >>>> >>>>> ------------- >>>> >>>>> >>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>> >>>>> index ordering for ravel, reshape >>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>> >>>>> in >>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>> >>>>> naming idea by Paul Ivanov) >>>> >>>>> >>>> >>>>> What do y'all think? >>>> >>>> >>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>> >>>> about >>>> >>>> the content and never about the memory when using it. >>>> >> >>>> >> changing the names doesn't make it easier to understand. >>>> >> I think the confusion is because the new A and K refer to existing >>>> >> memory >>>> >> >>>> >>>> I disagree, I think it's confusing, but I have evidence, and that is >>>> that four out of four of us tested ourselves and got it wrong. >>>> >>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>> rash to assert there is no problem here. >> >> I think you are overcomplicating things or phrased it as a "trick question" > > I don't know what you mean by trick question - was there something > over-complicated in the example? I deliberately didn't include > various much more confusing examples in "reshape". I meant making the "candidates" think about memory instead of just column versus row stacking. I don't think I ever get confused about reshape "F" in 2d. But when I work with 3d or larger ndim nd-arrays, I always have to try an example to check my intuition (in general not just reshape). > >> ravel F and C have *nothing* to do with memory layout. > > We do agree on this of course - but you said in an earlier mail that > you thought of 'C" and 'F' as referring to target memory layout (which > they don't in this case) so I think we also agree that "C" and "F" do > often refer to memory layout elsewhere in numpy. I guess that wasn't so helpful. 
(emphasis on *target*, There are very few places where an order keyword refers to *existing* memory layout. So I'm not tempted to think about existing memory layout when I see ``order``. Also my examples might have confused the issue: ravel and reshape, with C and F are easy to understand without ever looking at memory issues. memory only comes into play when we want to know whether we get a view or copy. The examples were only for the cases when I do care about this. ) > >> I think it's not confusing for beginners that have no idea and never think >> about memory layout. >> I've never seen any problems with it in statsmodels and I have seen >> many developers (GSOC) that are pretty new to python and numpy. >> (I didn't check the repo history to verify, so IIRC) > > Usually you don't need to know what reshape or ravel did because you > are likely to reshape again and that will use the same algorithm. > > For example, I didn't know that that ravel worked in reverse index > order, started explaining it wrong, and had to check. I use ravel and > reshape a lot, and have not run into this problem because either a) I > didn't test my code properly or b) I did reshape after ravel / reshape > and it reversed what I did first time. So, I don't think it's "we > haven't noticed any problems" is a good argument in the face of > "several experienced developers got it wrong when trying to guess what > it did". What's reverse index order? In the case of statsmodels, we do care about the stacking order. When we use reshape(..., order='F') or ravel('F'), it's only because we want to have a specific array (not memory) layout (and/or because the raveled array came from R) (aside: 2 cases - for 2d parameter vectors, we ravel and reshape often, and we changed our convention to Fortran order, (parameter in rows, equations in columns, IIRC) The interpretation of the results depends on which way we ravel or reshape. 
- for panel data (time versus individuals), we need to build matching kronecker product arrays which are block-diagonal if the stacking/``order`` is the right way. None of the cases cares about memory layout, it's just: Do we stack by columns or by rows, i.e. fortran- or c-order? Do we want this in rows or in columns? ) > >> Even if N, Z were clearer in this case (which I don't think it is and which >> I have no idea what it should stand for), you would have to go for every >> use of ``order`` in numpy to check whether it should be N or F or Z or C, >> and then users would have to check which order name convention is >> used in a specific function. > > Right - and this would be silly if and only if it made sense to > conflate memory layout and index ordering. I see the two things, but never saw it as a problem arr2 = np.asarray(arr1, order='F') give me an array with Fortran memory layout, I need it (never used in statsmodels, there might be a few places where we used other ways to control the memory layout, but not much.) arr2 = arr1.reshape(-1, 5, order='F') unstack this array by columns, I want 5 of them arr1 = arr2.ravel('F') go back, stack them again by columns (used quite a bit as described before) Cheers, Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sat Mar 30 23:43:17 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 20:43:17 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 7:02 PM, wrote: > On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. 
Froehle >>> wrote: >>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>> wrote: >>>>> >>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>> >> wrote: >>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>> >>>> wrote: >>>>> >>>>> >>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>> >>>>> ordering. >>>>> >>>>> >>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>> >>>>> avoid >>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>> >>>>> >>>>> >>>>> Proposal >>>>> >>>>> ------------- >>>>> >>>>> >>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>> >>>>> index ordering for ravel, reshape >>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>> >>>>> in >>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>> >>>>> naming idea by Paul Ivanov) >>>>> >>>>> >>>>> >>>>> What do y'all think? >>>>> >>>> >>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>> >>>> about >>>>> >>>> the content and never about the memory when using it. >>>>> >> >>>>> >> changing the names doesn't make it easier to understand. >>>>> >> I think the confusion is because the new A and K refer to existing >>>>> >> memory >>>>> >> >>>>> >>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>> that four out of four of us tested ourselves and got it wrong. >>>>> >>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>> rash to assert there is no problem here. 
>>>
>>> I think you are overcomplicating things or phrased it as a "trick question"
>>
>> I don't know what you mean by trick question - was there something
>> over-complicated in the example? I deliberately didn't include
>> various much more confusing examples in "reshape".
>
> I meant making the "candidates" think about memory instead of just
> column versus row stacking.
> I don't think I ever get confused about reshape "F" in 2d.
> But when I work with 3d or larger ndim nd-arrays, I always have to
> try an example to check my intuition (in general not just reshape).
>
>>
>>> ravel F and C have *nothing* to do with memory layout.
>>
>> We do agree on this of course - but you said in an earlier mail that
>> you thought of 'C" and 'F' as referring to target memory layout (which
>> they don't in this case) so I think we also agree that "C" and "F" do
>> often refer to memory layout elsewhere in numpy.
>
> I guess that wasn't so helpful.
> (emphasis on *target*, There are very few places where an order
> keyword refers to *existing* memory layout.

It is helpful because it shows how easy it is to get confused between
memory order and index order.

> What's reverse index order?

I am not being clear, sorry about that:

import numpy as np

def ravel_iter_last_fastest(arr):
    res = []
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            for k in range(arr.shape[2]):
                # Iterating over last dimension fastest
                res.append(arr[i, j, k])
    return np.array(res)


def ravel_iter_first_fastest(arr):
    res = []
    for k in range(arr.shape[2]):
        for j in range(arr.shape[1]):
            for i in range(arr.shape[0]):
                # Iterating over first dimension fastest
                res.append(arr[i, j, k])
    return np.array(res)


a = np.arange(24).reshape((2, 3, 4))

print np.all(a.ravel('C') == ravel_iter_last_fastest(a))
print np.all(a.ravel('F') == ravel_iter_first_fastest(a))

By 'reverse index ordering' I mean 'ravel_iter_last_fastest' above.
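A compact restatement of the two loops above, if it helps (my
formulation): first-index-fastest over an array is the same as
last-index-fastest over its transpose.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# 'F' (first-index-fastest) over `a` is 'C' (last-index-fastest) over
# the axis-reversed view a.T, and vice versa.
assert np.array_equal(a.ravel('F'), a.T.ravel('C'))
assert np.array_equal(a.ravel('C'), a.T.ravel('F'))
```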
I guess one could argue that this was not 'reverse' but 'forward' index ordering, but I am not arguing about which is better, or those names, only that it's the order of indices that differs, not the memory layout, and that these ideas need to be kept separate. Cheers, Matthew From matthew.brett at gmail.com Sun Mar 31 00:04:49 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 21:04:49 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 7:02 PM, wrote: > On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>> wrote: >>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>> wrote: >>>>> >>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>> >> wrote: >>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>> >>>> wrote: >>>>> >>>>> >>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>> >>>>> ordering. >>>>> >>>>> >>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>> >>>>> avoid >>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>> >>>>> >>>>> >>>>> Proposal >>>>> >>>>> ------------- >>>>> >>>>> >>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>> >>>>> index ordering for ravel, reshape >>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>> >>>>> in >>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>> >>>>> naming idea by Paul Ivanov) >>>>> >>>>> >>>>> >>>>> What do y'all think? 
>>>>> >>>> >>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>> >>>> about >>>>> >>>> the content and never about the memory when using it. >>>>> >> >>>>> >> changing the names doesn't make it easier to understand. >>>>> >> I think the confusion is because the new A and K refer to existing >>>>> >> memory >>>>> >> >>>>> >>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>> that four out of four of us tested ourselves and got it wrong. >>>>> >>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>> rash to assert there is no problem here. >>> >>> I think you are overcomplicating things or phrased it as a "trick question" >> >> I don't know what you mean by trick question - was there something >> over-complicated in the example? I deliberately didn't include >> various much more confusing examples in "reshape". > > I meant making the "candidates" think about memory instead of just > column versus row stacking. To be specific, we were teaching about reshaping a (I, J, K, N) 4D array, it was an image, with time as the 4th dimension (N time points). Raveling and reshaping 3D and 4D arrays is a common thing to do in neuroimaging, as you can imagine. A student asked what he would get back from raveling this array, a concatenated time series, or something spatial? We showed (I'd worked it out by this time) that the first N values were the time series given by [0, 0, 0, :]. He said - "Oh - I see - so the data is stored as a whole lot of time series one by one, I thought it would be stored as a series of images'. Ironically, this was a Fortran-ordered array in memory, and he was wrong. So, I think the idea of memory ordering and index ordering is very easy to confuse, and comes up naturally. 
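The classroom example can be checked directly - a sketch with made-up
dimensions (2x3x4 voxels, N=5 time points):

```python
import numpy as np

# A hypothetical (I, J, K, N) image: 2x3x4 voxels, N=5 time points,
# Fortran-contiguous in memory, as in the classroom example.
arr = np.asfortranarray(np.random.randn(2, 3, 4, 5))

# Raveling in 'C' (last-index-fastest) index order starts with the full
# time series of the first voxel - regardless of the memory layout.
flat_c = arr.ravel('C')
assert np.array_equal(flat_c[:5], arr[0, 0, 0, :])

# Raveling in 'F' (first-index-fastest) index order starts with a
# spatial run instead: the first column of voxels at time 0.
flat_f = arr.ravel('F')
assert np.array_equal(flat_f[:2], arr[:, 0, 0, 0])
```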
I would like, as a teacher, to be able to say something like:

This is what C memory layout is (it's the memory layout that gives
arr.flags.C_CONTIGUOUS=True)

This is what F memory layout is (it's the memory layout that gives
arr.flags.F_CONTIGUOUS=True)

It's rather easy to get something that is neither C nor F memory
layout. Numpy supports many memory layouts.

Ravel and reshape, and numpy in general, do not care (normally) about C
or F layouts, they only care about index ordering.

My point, that I'm repeating, is that my job is made harder by
'arr.ravel('F')'.

Cheers,

Matthew

From josef.pktd at gmail.com Sun Mar 31 00:05:20 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 31 Mar 2013 00:05:20 -0400
Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering
In-Reply-To: References: Message-ID: 

On Sat, Mar 30, 2013 at 11:43 PM, Matthew Brett wrote:
> Hi,
>
> On Sat, Mar 30, 2013 at 7:02 PM, wrote:
>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote:
>>> Hi,
>>>
>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote:
>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>>>> wrote:
>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett
>>>>> wrote:
>>>>>>
>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote:
>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote:
>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>>>>>> >> wrote:
>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote:
>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>>>>>> >>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index
>>>>>> >>>>> ordering.
>>>>>> >>>>>
>>>>>> >>>>> This is very confusing. We think the index ordering and memory
>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should
>>>>>> >>>>> avoid
>>>>>> >>>>> using "C" and "F" to refer to index ordering.
>>>>>> >>>>> >>>>>> >>>>> Proposal >>>>>> >>>>> ------------- >>>>>> >>>>> >>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>> >>>>> index ordering for ravel, reshape >>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>> >>>>> in >>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>> >>>>> naming idea by Paul Ivanov) >>>>>> >>>>> >>>>>> >>>>> What do y'all think? >>>>>> >>>> >>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>> >>>> about >>>>>> >>>> the content and never about the memory when using it. >>>>>> >> >>>>>> >> changing the names doesn't make it easier to understand. >>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>> >> memory >>>>>> >> >>>>>> >>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>> >>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>> rash to assert there is no problem here. >>>> >>>> I think you are overcomplicating things or phrased it as a "trick question" >>> >>> I don't know what you mean by trick question - was there something >>> over-complicated in the example? I deliberately didn't include >>> various much more confusing examples in "reshape". >> >> I meant making the "candidates" think about memory instead of just >> column versus row stacking. >> I don't think I ever get confused about reshape "F" in 2d. >> But when I work with 3d or larger ndim nd-arrays, I always have to >> try an example to check my intuition (in general not just reshape). >> >>> >>>> ravel F and C have *nothing* to do with memory layout. 
>>> >>> We do agree on this of course - but you said in an earlier mail that >>> you thought of 'C" and 'F' as referring to target memory layout (which >>> they don't in this case) so I think we also agree that "C" and "F" do >>> often refer to memory layout elsewhere in numpy. >> >> I guess that wasn't so helpful. >> (emphasis on *target*, There are very few places where an order >> keyword refers to *existing* memory layout. > > It is helpful because it shows how easy it is to get confused between > memory order and index order. > >> What's reverse index order? > > I am not being clear, sorry about that: > > import numpy as np > > def ravel_iter_last_fastest(arr): > res = [] > for i in range(arr.shape[0]): > for j in range(arr.shape[1]): > for k in range(arr.shape[2]): > # Iterating over last dimension fastest > res.append(arr[i, j, k]) > return np.array(res) > > > def ravel_iter_first_fastest(arr): > res = [] > for k in range(arr.shape[2]): > for j in range(arr.shape[1]): > for i in range(arr.shape[0]): > # Iterating over first dimension fastest > res.append(arr[i, j, k]) > return np.array(res) good example that's just C and F order in the terminology of numpy http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#controlling-iteration-order (independent of memory) http://docs.scipy.org/doc/numpy/reference/generated/numpy.flatiter.html#numpy.flatiter I don't think we want to rename a large part of the basic terminology of numpy Josef > > > a = np.arange(24).reshape((2, 3, 4)) > > print np.all(a.ravel('C') == ravel_iter_last_fastest(a)) > print np.all(a.ravel('F') == ravel_iter_first_fastest(a)) > > By 'reverse index ordering' I mean 'ravel_iter_last_fastest' above. I > guess one could argue that this was not 'reverse' but 'forward' index > ordering, but I am not arguing about which is better, or those names, > only that it's the order of indices that differs, not the memory > layout, and that these ideas need to be kept separate. 
> > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sun Mar 31 00:12:51 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 21:12:51 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 9:05 PM, wrote: > On Sat, Mar 30, 2013 at 11:43 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>> wrote: >>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>> wrote: >>>>>>> >>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>> >> wrote: >>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>> >>>> wrote: >>>>>>> >>>>> >>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>> >>>>> ordering. >>>>>>> >>>>> >>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>> >>>>> avoid >>>>>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>>>>> >>>>> >>>>>>> >>>>> Proposal >>>>>>> >>>>> ------------- >>>>>>> >>>>> >>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>> >>>>> index ordering for ravel, reshape >>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>> >>>>> in >>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>> >>>>> >>>>>>> >>>>> What do y'all think? >>>>>>> >>>> >>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>> >>>> about >>>>>>> >>>> the content and never about the memory when using it. >>>>>>> >> >>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>> >> memory >>>>>>> >> >>>>>>> >>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>> >>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>> rash to assert there is no problem here. >>>>> >>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>> >>>> I don't know what you mean by trick question - was there something >>>> over-complicated in the example? I deliberately didn't include >>>> various much more confusing examples in "reshape". >>> >>> I meant making the "candidates" think about memory instead of just >>> column versus row stacking. >>> I don't think I ever get confused about reshape "F" in 2d. >>> But when I work with 3d or larger ndim nd-arrays, I always have to >>> try an example to check my intuition (in general not just reshape). >>> >>>> >>>>> ravel F and C have *nothing* to do with memory layout. 
>>>> >>>> We do agree on this of course - but you said in an earlier mail that >>>> you thought of 'C" and 'F' as referring to target memory layout (which >>>> they don't in this case) so I think we also agree that "C" and "F" do >>>> often refer to memory layout elsewhere in numpy. >>> >>> I guess that wasn't so helpful. >>> (emphasis on *target*, There are very few places where an order >>> keyword refers to *existing* memory layout. >> >> It is helpful because it shows how easy it is to get confused between >> memory order and index order. >> >>> What's reverse index order? >> >> I am not being clear, sorry about that: >> >> import numpy as np >> >> def ravel_iter_last_fastest(arr): >> res = [] >> for i in range(arr.shape[0]): >> for j in range(arr.shape[1]): >> for k in range(arr.shape[2]): >> # Iterating over last dimension fastest >> res.append(arr[i, j, k]) >> return np.array(res) >> >> >> def ravel_iter_first_fastest(arr): >> res = [] >> for k in range(arr.shape[2]): >> for j in range(arr.shape[1]): >> for i in range(arr.shape[0]): >> # Iterating over first dimension fastest >> res.append(arr[i, j, k]) >> return np.array(res) > > good example > > that's just C and F order in the terminology of numpy > http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#controlling-iteration-order > (independent of memory) > http://docs.scipy.org/doc/numpy/reference/generated/numpy.flatiter.html#numpy.flatiter > > I don't think we want to rename a large part of the basic terminology of numpy Sometimes two ideas get conflated together, and it seems natural to keep together, until people get confused, and you realize that there are two separate ideas. For example here's a quote from the 'flatiter' doc : Iteration is done in C-contiguous style Now - that seems really ugly to me. For example, 'contiguous' should not be in that sentence, although it's easy to see why it is, and it seems to me to be a sign of the confusion between the ideas. 
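[Editor's note: the conflation Matthew objects to in the flatiter docstring is easy to demonstrate — `.flat` iterates in C *index* order even when the array is not C-contiguous in memory:]

```python
import numpy as np

f = np.asfortranarray(np.arange(6).reshape(2, 3))
assert f.flags.f_contiguous and not f.flags.c_contiguous

# .flat follows C index order (last axis fastest),
# regardless of the underlying memory layout.
assert list(f.flat) == [0, 1, 2, 3, 4, 5]
```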
Cheers, Matthew From josef.pktd at gmail.com Sun Mar 31 00:37:50 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Mar 2013 00:37:50 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 7:02 PM, wrote: >> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>> wrote: >>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>> wrote: >>>>>> >>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>> >> wrote: >>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>> >>>> wrote: >>>>>> >>>>> >>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>> >>>>> ordering. >>>>>> >>>>> >>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>> >>>>> avoid >>>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>>> >>>>> >>>>>> >>>>> Proposal >>>>>> >>>>> ------------- >>>>>> >>>>> >>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>> >>>>> index ordering for ravel, reshape >>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>> >>>>> in >>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>> >>>>> naming idea by Paul Ivanov) >>>>>> >>>>> >>>>>> >>>>> What do y'all think? >>>>>> >>>> >>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>> >>>> about >>>>>> >>>> the content and never about the memory when using it. 
>>>>>> >> >>>>>> >> changing the names doesn't make it easier to understand. >>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>> >> memory >>>>>> >> >>>>>> >>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>> >>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>> rash to assert there is no problem here. >>>> >>>> I think you are overcomplicating things or phrased it as a "trick question" >>> >>> I don't know what you mean by trick question - was there something >>> over-complicated in the example? I deliberately didn't include >>> various much more confusing examples in "reshape". >> >> I meant making the "candidates" think about memory instead of just >> column versus row stacking. > > To be specific, we were teaching about reshaping a (I, J, K, N) 4D > array, it was an image, with time as the 4th dimension (N time > points). Raveling and reshaping 3D and 4D arrays is a common thing > to do in neuroimaging, as you can imagine. > > A student asked what he would get back from raveling this array, a > concatenated time series, or something spatial? > > We showed (I'd worked it out by this time) that the first N values > were the time series given by [0, 0, 0, :]. > > He said - "Oh - I see - so the data is stored as a whole lot of time > series one by one, I thought it would be stored as a series of > images'. > > Ironically, this was a Fortran-ordered array in memory, and he was wrong. > > So, I think the idea of memory ordering and index ordering is very > easy to confuse, and comes up naturally. 
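[Editor's note: the classroom example can be reproduced with a small stand-in shape (the sizes here are hypothetical, chosen only for illustration). The first N raveled values are the time series at [0, 0, 0, :] because the default ravel order is C *index* order — even though this array is Fortran-ordered in memory:]

```python
import numpy as np

I, J, K, N = 2, 3, 4, 5
arr = np.zeros((I, J, K, N), order='F')  # Fortran memory layout
arr[0, 0, 0, :] = np.arange(N)           # a recognizable "time series"

# Default ravel is 'C' index order: last axis (time) varies fastest,
# so the first N values are exactly arr[0, 0, 0, :].
assert np.array_equal(arr.ravel()[:N], arr[0, 0, 0, :])
```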
> > I would like, as a teacher, to be able to say something like: > > This is what C memory layout is (it's the memory layout that gives > arr.flags.C_CONTIGUOUS=True) > This is what F memory layout is (it's the memory layout that gives > arr.flags.F_CONTIGUOUS=True) > It's rather easy to get something that is neither C or F memory layout > Numpy does many memory layouts. > Ravel and reshape and numpy in general do not care (normally) about C > or F layouts, they only care about index ordering. > > My point, that I'm repeating, is that my job is made harder by > 'arr.ravel('F')'. But once you know that ravel and reshape don't care about memory, the ravel is easy to predict (maybe not easy to visualize in 4-D): order=C: stack the last dimension, N, time series of one 3d pixels, then stack the time series of the next pixel... process pixels by depth and the row by row (like old TVs) I assume you did this because your underlying array is C contiguous. so your ravel('C') is a c-contiguous view (instead of some weird strides or a copy) I usually prefer time in the first dimension, and stack order=F, then I can start at the front, stack all time periods of the first pixel, keep going and work pixels down the columns, first page, next page, ... (and I hope I have a F-contiguous array, so my raveled array is also F-contiguous.) 
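[Editor's note: the optimization Josef hopes for here — getting a contiguous view rather than a copy — happens exactly when the requested index order matches the memory layout. A quick check using np.may_share_memory:]

```python
import numpy as np

a_c = np.zeros((3, 4))              # C-contiguous
a_f = np.asfortranarray(a_c)        # F-contiguous copy

# ravel returns a view only when the requested index order
# matches the memory layout; otherwise it must copy.
assert np.may_share_memory(a_c.ravel('C'), a_c)       # view
assert not np.may_share_memory(a_c.ravel('F'), a_c)   # copy
assert np.may_share_memory(a_f.ravel('F'), a_f)       # view
```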
(note: I'm bringing memory back in as optimization, but not to predict the stacking) Josef (I think brains are designed for Fortran order and C-ordering in numpy is a accident, except, reading a Western language book is neither) > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sun Mar 31 00:50:00 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 21:50:00 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 9:37 PM, wrote: > On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>> wrote: >>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>> wrote: >>>>>>> >>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>> >> wrote: >>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>> >>>> wrote: >>>>>>> >>>>> >>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>> >>>>> ordering. >>>>>>> >>>>> >>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>> >>>>> avoid >>>>>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>>>>> >>>>> >>>>>>> >>>>> Proposal >>>>>>> >>>>> ------------- >>>>>>> >>>>> >>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>> >>>>> index ordering for ravel, reshape >>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>> >>>>> in >>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>> >>>>> >>>>>>> >>>>> What do y'all think? >>>>>>> >>>> >>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>> >>>> about >>>>>>> >>>> the content and never about the memory when using it. >>>>>>> >> >>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>> >> memory >>>>>>> >> >>>>>>> >>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>> >>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>> rash to assert there is no problem here. >>>>> >>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>> >>>> I don't know what you mean by trick question - was there something >>>> over-complicated in the example? I deliberately didn't include >>>> various much more confusing examples in "reshape". >>> >>> I meant making the "candidates" think about memory instead of just >>> column versus row stacking. >> >> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >> array, it was an image, with time as the 4th dimension (N time >> points). Raveling and reshaping 3D and 4D arrays is a common thing >> to do in neuroimaging, as you can imagine. >> >> A student asked what he would get back from raveling this array, a >> concatenated time series, or something spatial? 
>> >> We showed (I'd worked it out by this time) that the first N values >> were the time series given by [0, 0, 0, :]. >> >> He said - "Oh - I see - so the data is stored as a whole lot of time >> series one by one, I thought it would be stored as a series of >> images'. >> >> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >> >> So, I think the idea of memory ordering and index ordering is very >> easy to confuse, and comes up naturally. >> >> I would like, as a teacher, to be able to say something like: >> >> This is what C memory layout is (it's the memory layout that gives >> arr.flags.C_CONTIGUOUS=True) >> This is what F memory layout is (it's the memory layout that gives >> arr.flags.F_CONTIGUOUS=True) >> It's rather easy to get something that is neither C or F memory layout >> Numpy does many memory layouts. >> Ravel and reshape and numpy in general do not care (normally) about C >> or F layouts, they only care about index ordering. >> >> My point, that I'm repeating, is that my job is made harder by >> 'arr.ravel('F')'. > > But once you know that ravel and reshape don't care about memory, the > ravel is easy to predict (maybe not easy to visualize in 4-D): But this assumes that you already know that there's such a thing as memory layout, and there's such a thing as index ordering, and that 'C' and 'F' in ravel refer to index ordering. Once you have that, you're golden. I'm arguing it's markedly harder to get this distinction, and keep it in mind, and teach it, if we are using the 'C' and 'F" names for both things. > order=C: stack the last dimension, N, time series of one 3d pixels, > then stack the time series of the next pixel... > process pixels by depth and the row by row (like old TVs) > > I assume you did this because your underlying array is C contiguous. > so your ravel('C') is a c-contiguous view (instead of some weird > strides or a copy) Sorry - what do you mean by 'this' in 'did this'? Reshape? 
Why would it matter what my underlying array memory layout was? > I usually prefer time in the first dimension, and stack order=F, then > I can start at the front, stack all time periods of the first pixel, > keep going and work pixels down the columns, first page, next page, > ... > (and I hope I have a F-contiguous array, so my raveled array is also > F-contiguous.) > > (note: I'm bringing memory back in as optimization, but not to predict > the stacking) > > Josef > (I think brains are designed for Fortran order and C-ordering in numpy > is a accident, > except, reading a Western language book is neither) Yes, I find first axis fastest changing easier to think about, and I came from MATLAB (about 8 years ago mind), so that also made it more natural. I had (until yesterday) simply assumed that numpy unraveled that way, because it seemed more obvious to me, and knew that the unravel index order need have nothing to do with the memory order, or the fact that arrays are C contiguous by default. Not so of course. That's not my complaint as you know - it's just a convention, I guessed the convention wrong. Cheers, Matthew From ivan.oseledets at gmail.com Sun Mar 31 01:14:37 2013 From: ivan.oseledets at gmail.com (Ivan Oseledets) Date: Sun, 31 Mar 2013 09:14:37 +0400 Subject: [Numpy-discussion] Indexing bug Message-ID: Message: 2 Date: Sat, 30 Mar 2013 11:13:35 -0700 From: Jaime Fern?ndez del R?o Subject: Re: [Numpy-discussion] Indexing bug? To: Discussion of Numerical Python Message-ID: Content-Type: text/plain; charset="iso-8859-1" On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets wrote: > I am using numpy 1.6.1, > and encountered a wierd fancy indexing bug: > > import numpy as np > c = np.random.randn(10,200,10); > > In [29]: print c[[0,1],:200,:2].shape > (2, 200, 2) > > In [30]: print c[[0,1],:200,[0,1]].shape > (2, 200) > > It means, that here fancy indexing is not working right for a 3d array. 
> On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets wrote: > I am using numpy 1.6.1, > and encountered a wierd fancy indexing bug: > > import numpy as np > c = np.random.randn(10,200,10); > > In [29]: print c[[0,1],:200,:2].shape > (2, 200, 2) > > In [30]: print c[[0,1],:200,[0,1]].shape > (2, 200) > > It means, that here fancy indexing is not working right for a 3d array. > --> It is working fine, review the docs: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing In your return, item [0, :] is c[0, :, 0] and item[1, :]is c[1, :, 1]. If you want a return of shape (2, 200, 2) where item [i, :, j] is c[i, :, j] you could use slicing: c[:2, :200, :2] or something more elaborate like: c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)] Jaime ---> Oh! So it is not a bug, it is a feature, which is completely incompatible with other array based languages (MATLAB and Fortran). To me, I can not find a single explanation why it is so in numpy. Taking submatrices from a matrix is a common operation and the syntax above is very natural to take submatrices, not a weird diagonal stuff. i.e., c = np.random.randn(100,100) d = c[[0,3],[2,3]] should NOT produce two numbers! (and you can not do it using slices!) In MATLAB and Fortran c(indi,indj) will produce a 2 x 2 matrix. How it can be done in numpy (and why the complications?) So, please consider this message as a feature request. 
Ivan From josef.pktd at gmail.com Sun Mar 31 01:38:09 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Mar 2013 01:38:09 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 9:37 PM, wrote: >> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>>> Hi, >>>>> >>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>>> wrote: >>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>>> wrote: >>>>>>>> >>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>>> >> wrote: >>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>>> >>>> wrote: >>>>>>>> >>>>> >>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>>> >>>>> ordering. >>>>>>>> >>>>> >>>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>>> >>>>> avoid >>>>>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>>>>> >>>>> >>>>>>>> >>>>> Proposal >>>>>>>> >>>>> ------------- >>>>>>>> >>>>> >>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>>> >>>>> index ordering for ravel, reshape >>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>>> >>>>> in >>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>>> >>>>> >>>>>>>> >>>>> What do y'all think? 
>>>>>>>> >>>> >>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>>> >>>> about >>>>>>>> >>>> the content and never about the memory when using it. >>>>>>>> >> >>>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>>> >> memory >>>>>>>> >> >>>>>>>> >>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>>> >>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>>> rash to assert there is no problem here. >>>>>> >>>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>>> >>>>> I don't know what you mean by trick question - was there something >>>>> over-complicated in the example? I deliberately didn't include >>>>> various much more confusing examples in "reshape". >>>> >>>> I meant making the "candidates" think about memory instead of just >>>> column versus row stacking. >>> >>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >>> array, it was an image, with time as the 4th dimension (N time >>> points). Raveling and reshaping 3D and 4D arrays is a common thing >>> to do in neuroimaging, as you can imagine. >>> >>> A student asked what he would get back from raveling this array, a >>> concatenated time series, or something spatial? >>> >>> We showed (I'd worked it out by this time) that the first N values >>> were the time series given by [0, 0, 0, :]. >>> >>> He said - "Oh - I see - so the data is stored as a whole lot of time >>> series one by one, I thought it would be stored as a series of >>> images'. >>> >>> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >>> >>> So, I think the idea of memory ordering and index ordering is very >>> easy to confuse, and comes up naturally. 
>>> >>> I would like, as a teacher, to be able to say something like: >>> >>> This is what C memory layout is (it's the memory layout that gives >>> arr.flags.C_CONTIGUOUS=True) >>> This is what F memory layout is (it's the memory layout that gives >>> arr.flags.F_CONTIGUOUS=True) >>> It's rather easy to get something that is neither C or F memory layout >>> Numpy does many memory layouts. >>> Ravel and reshape and numpy in general do not care (normally) about C >>> or F layouts, they only care about index ordering. >>> >>> My point, that I'm repeating, is that my job is made harder by >>> 'arr.ravel('F')'. >> >> But once you know that ravel and reshape don't care about memory, the >> ravel is easy to predict (maybe not easy to visualize in 4-D): > > But this assumes that you already know that there's such a thing as > memory layout, and there's such a thing as index ordering, and that > 'C' and 'F' in ravel refer to index ordering. Once you have that, > you're golden. I'm arguing it's markedly harder to get this > distinction, and keep it in mind, and teach it, if we are using the > 'C' and 'F" names for both things. No, I think you are still missing my point. I think explaining ravel and reshape F and C is easy (kind of) because the students don't need to know at that stage about memory layouts. All they need to know is that we look at n-dimensional objects in C-order or in F-order (whichever index runs fastest) > >> order=C: stack the last dimension, N, time series of one 3d pixels, >> then stack the time series of the next pixel... >> process pixels by depth and the row by row (like old TVs) >> >> I assume you did this because your underlying array is C contiguous. >> so your ravel('C') is a c-contiguous view (instead of some weird >> strides or a copy) > > Sorry - what do you mean by 'this' in 'did this'? Reshape? Why > would it matter what my underlying array memory layout was? `this` was use ravel('C') and have time series as last index. 
Because if we have a few gigabytes of video recordings, we better match the ravel order with the memory order. I thought you picked time N in the last axis, so you can have fast access to time series (assuming you didn't specify F-contiguous). (it's not confusing: we have two orders, index/iterator and memory, and to get a nice view, the two should match) rereading: since you had F-ordered memory, ravel('F') gives the nice view (a picture at a time instead of a timeseries at a time) > >> I usually prefer time in the first dimension, and stack order=F, then >> I can start at the front, stack all time periods of the first pixel, >> keep going and work pixels down the columns, first page, next page, >> ... >> (and I hope I have a F-contiguous array, so my raveled array is also >> F-contiguous.) >> >> (note: I'm bringing memory back in as optimization, but not to predict >> the stacking) >> >> Josef >> (I think brains are designed for Fortran order and C-ordering in numpy >> is a accident, >> except, reading a Western language book is neither) > > Yes, I find first axis fastest changing easier to think about, and I > came from MATLAB (about 8 years ago mind), so that also made it more > natural. > > I had (until yesterday) simply assumed that numpy unraveled that way, > because it seemed more obvious to me, and knew that the unravel index > order need have nothing to do with the memory order, or the fact that > arrays are C contiguous by default. Not so of course. That's not my > complaint as you know - it's just a convention, I guessed the > convention wrong. 
Numpy was written by C developers, and one of the first things I learned about numpy is the ``order``: Default is always C (except for linalg) and axis=None (except in scipy.stats), and dimensions disappear in reduce Cheers, Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Sun Mar 31 05:30:35 2013 From: cournape at gmail.com (David Cournapeau) Date: Sun, 31 Mar 2013 10:30:35 +0100 Subject: [Numpy-discussion] Indexing bug In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 6:14 AM, Ivan Oseledets wrote: > Message: 2 > Date: Sat, 30 Mar 2013 11:13:35 -0700 > From: Jaime Fern?ndez del R?o > Subject: Re: [Numpy-discussion] Indexing bug? > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets > wrote: > >> I am using numpy 1.6.1, >> and encountered a wierd fancy indexing bug: >> >> import numpy as np >> c = np.random.randn(10,200,10); >> >> In [29]: print c[[0,1],:200,:2].shape >> (2, 200, 2) >> >> In [30]: print c[[0,1],:200,[0,1]].shape >> (2, 200) >> >> It means, that here fancy indexing is not working right for a 3d array. >> > > On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets > wrote: > >> I am using numpy 1.6.1, >> and encountered a wierd fancy indexing bug: >> >> import numpy as np >> c = np.random.randn(10,200,10); >> >> In [29]: print c[[0,1],:200,:2].shape >> (2, 200, 2) >> >> In [30]: print c[[0,1],:200,[0,1]].shape >> (2, 200) >> >> It means, that here fancy indexing is not working right for a 3d array. >> > --> > It is working fine, review the docs: > > http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing > > In your return, item [0, :] is c[0, :, 0] and item[1, :]is c[1, :, 1]. 
> > If you want a return of shape (2, 200, 2) where item [i, :, j] is c[i, :, > j] you could use slicing: > > c[:2, :200, :2] > > or something more elaborate like: > > c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)] > > Jaime > ---> > > > Oh! So it is not a bug, it is a feature, which is completely > incompatible with other array based languages (MATLAB and Fortran). To > me, I can not find a single explanation why it is so in numpy. > Taking submatrices from a matrix is a common operation and the syntax > above is very natural to take submatrices, not a weird diagonal stuff. It is not a weird diagonal stuff, but a well define operation: when you use fancy indexing, the indexing numbers become coordinate ( > i.e., > > c = np.random.randn(100,100) > d = c[[0,3],[2,3]] > > should NOT produce two numbers! (and you can not do it using slices!) > > In MATLAB and Fortran > c(indi,indj) > will produce a 2 x 2 matrix. > How it can be done in numpy (and why the complications?) in your example, it is simple enough: c[[0, 3], 2:4] (return the first row limited to columns 3, 4, and the 4th row limiter to columns 3, 4). Numpy's syntax is' biased' toward fancy indexing, and you need more typing if you want to extract 'irregular' submatrices. Matlab has a different tradeoff (extracting irregular sub-matrices is sligthly easier, but selecting a few points is harder as you need sub2index to use linear indexing). 
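[Editor's note: not mentioned in the thread, but numpy already ships the MATLAB-style behavior Ivan asks for — np.ix_ builds the open-mesh "outer product" of the index lists, while plain fancy indexing pairs the index arrays up as coordinates:]

```python
import numpy as np

c = np.arange(100 * 100).reshape(100, 100)

# Fancy indexing pairs the index arrays elementwise: this picks the
# two points c[0, 2] and c[3, 3], not a 2x2 submatrix.
d = c[[0, 3], [2, 3]]
assert d.shape == (2,)

# np.ix_ gives the MATLAB/Fortran-style c(indi, indj) submatrix.
sub = c[np.ix_([0, 3], [2, 3])]
assert sub.shape == (2, 2)
assert sub[1, 0] == c[3, 2]
```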
David From matthew.brett at gmail.com Sun Mar 31 15:54:29 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 31 Mar 2013 12:54:29 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 10:38 PM, wrote: > On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 9:37 PM, wrote: >>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>>>> Hi, >>>>>> >>>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>>>> wrote: >>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>>>> >> wrote: >>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>>>> >>>> wrote: >>>>>>>>> >>>>> >>>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>>>> >>>>> ordering. >>>>>>>>> >>>>> >>>>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>>>> >>>>> avoid >>>>>>>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>>>>>>> >>>>> >>>>>>>>> >>>>> Proposal >>>>>>>>> >>>>> ------------- >>>>>>>>> >>>>> >>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>>>> >>>>> index ordering for ravel, reshape >>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>>>> >>>>> in >>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>>>> >>>>> >>>>>>>>> >>>>> What do y'all think? >>>>>>>>> >>>> >>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>>>> >>>> about >>>>>>>>> >>>> the content and never about the memory when using it. >>>>>>>>> >> >>>>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>>>> >> memory >>>>>>>>> >> >>>>>>>>> >>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>>>> >>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>>>> rash to assert there is no problem here. >>>>>>> >>>>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>>>> >>>>>> I don't know what you mean by trick question - was there something >>>>>> over-complicated in the example? I deliberately didn't include >>>>>> various much more confusing examples in "reshape". >>>>> >>>>> I meant making the "candidates" think about memory instead of just >>>>> column versus row stacking. >>>> >>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >>>> array, it was an image, with time as the 4th dimension (N time >>>> points). Raveling and reshaping 3D and 4D arrays is a common thing >>>> to do in neuroimaging, as you can imagine. >>>> >>>> A student asked what he would get back from raveling this array, a >>>> concatenated time series, or something spatial? 
>>>> >>>> We showed (I'd worked it out by this time) that the first N values >>>> were the time series given by [0, 0, 0, :]. >>>> >>>> He said - "Oh - I see - so the data is stored as a whole lot of time >>>> series one by one, I thought it would be stored as a series of >>>> images'. >>>> >>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >>>> >>>> So, I think the idea of memory ordering and index ordering is very >>>> easy to confuse, and comes up naturally. >>>> >>>> I would like, as a teacher, to be able to say something like: >>>> >>>> This is what C memory layout is (it's the memory layout that gives >>>> arr.flags.C_CONTIGUOUS=True) >>>> This is what F memory layout is (it's the memory layout that gives >>>> arr.flags.F_CONTIGUOUS=True) >>>> It's rather easy to get something that is neither C or F memory layout >>>> Numpy does many memory layouts. >>>> Ravel and reshape and numpy in general do not care (normally) about C >>>> or F layouts, they only care about index ordering. >>>> >>>> My point, that I'm repeating, is that my job is made harder by >>>> 'arr.ravel('F')'. >>> >>> But once you know that ravel and reshape don't care about memory, the >>> ravel is easy to predict (maybe not easy to visualize in 4-D): >> >> But this assumes that you already know that there's such a thing as >> memory layout, and there's such a thing as index ordering, and that >> 'C' and 'F' in ravel refer to index ordering. Once you have that, >> you're golden. I'm arguing it's markedly harder to get this >> distinction, and keep it in mind, and teach it, if we are using the >> 'C' and 'F" names for both things. > > No, I think you are still missing my point. > I think explaining ravel and reshape F and C is easy (kind of) because the > students don't need to know at that stage about memory layouts. 
> > All they need to know is that we look at n-dimensional objects in > C-order or in F-order > (whichever index runs fastest) Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy? You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout. As evidence: * My student, without any prompting about memory layouts, is asking about it * Travis' numpy book has a very early section on this (section 2.3 - memory layout) * I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often. * The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout. * The current docstring of 'reshape' cannot be explained without referring to memory order. Cheers, Matthew From josef.pktd at gmail.com Sun Mar 31 16:43:36 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Mar 2013 16:43:36 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 10:38 PM, wrote: >> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Mar 30, 2013 at 9:37 PM, wrote: >>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >>>>> Hi, >>>>> >>>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. 
Froehle >>>>>>>> wrote: >>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>>>>> >> wrote: >>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>>>>> >>>> wrote: >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>>>>> >>>>> ordering. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>>>>> >>>>> avoid >>>>>>>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Proposal >>>>>>>>>> >>>>> ------------- >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>>>>> >>>>> index ordering for ravel, reshape >>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>>>>> >>>>> in >>>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> What do y'all think? >>>>>>>>>> >>>> >>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>>>>> >>>> about >>>>>>>>>> >>>> the content and never about the memory when using it. >>>>>>>>>> >> >>>>>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>>>>> >> memory >>>>>>>>>> >> >>>>>>>>>> >>>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>>>>> that four out of four of us tested ourselves and got it wrong. 
>>>>>>>>>> >>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>>>>> rash to assert there is no problem here. >>>>>>>> >>>>>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>>>>> >>>>>>> I don't know what you mean by trick question - was there something >>>>>>> over-complicated in the example? I deliberately didn't include >>>>>>> various much more confusing examples in "reshape". >>>>>> >>>>>> I meant making the "candidates" think about memory instead of just >>>>>> column versus row stacking. >>>>> >>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >>>>> array, it was an image, with time as the 4th dimension (N time >>>>> points). Raveling and reshaping 3D and 4D arrays is a common thing >>>>> to do in neuroimaging, as you can imagine. >>>>> >>>>> A student asked what he would get back from raveling this array, a >>>>> concatenated time series, or something spatial? >>>>> >>>>> We showed (I'd worked it out by this time) that the first N values >>>>> were the time series given by [0, 0, 0, :]. >>>>> >>>>> He said - "Oh - I see - so the data is stored as a whole lot of time >>>>> series one by one, I thought it would be stored as a series of >>>>> images'. >>>>> >>>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >>>>> >>>>> So, I think the idea of memory ordering and index ordering is very >>>>> easy to confuse, and comes up naturally. >>>>> >>>>> I would like, as a teacher, to be able to say something like: >>>>> >>>>> This is what C memory layout is (it's the memory layout that gives >>>>> arr.flags.C_CONTIGUOUS=True) >>>>> This is what F memory layout is (it's the memory layout that gives >>>>> arr.flags.F_CONTIGUOUS=True) >>>>> It's rather easy to get something that is neither C or F memory layout >>>>> Numpy does many memory layouts. 
>>>>> Ravel and reshape and numpy in general do not care (normally) about C >>>>> or F layouts, they only care about index ordering. >>>>> >>>>> My point, that I'm repeating, is that my job is made harder by >>>>> 'arr.ravel('F')'. >>>> >>>> But once you know that ravel and reshape don't care about memory, the >>>> ravel is easy to predict (maybe not easy to visualize in 4-D): >>> >>> But this assumes that you already know that there's such a thing as >>> memory layout, and there's such a thing as index ordering, and that >>> 'C' and 'F' in ravel refer to index ordering. Once you have that, >>> you're golden. I'm arguing it's markedly harder to get this >>> distinction, and keep it in mind, and teach it, if we are using the >>> 'C' and 'F" names for both things. >> >> No, I think you are still missing my point. >> I think explaining ravel and reshape F and C is easy (kind of) because the >> students don't need to know at that stage about memory layouts. >> >> All they need to know is that we look at n-dimensional objects in >> C-order or in F-order >> (whichever index runs fastest) > > Would you accept that it may or may not be true that it is desirable > or practical not to mention memory layouts when teaching numpy? I think they should be in two different sections. basic usage: ravel, reshape in pure index order, and indexing, broadcasting, ... advanced usage: memory layout and some ability to predict when you get a view and when you get a copy. And I still think words can mean different things in different context (with a qualifier maybe) indexing in fortran order memory in fortran order Disclaimer: I never tried to teach numpy and with GSOC students my explanations only went a little bit beyond what they needed to know for the purpose at hand (I hope) > > You believe it is desirable, I believe that it is not - that teaching > numpy naturally involves some discussion of memory layout. 
> > As evidence: > > * My student, without any prompting about memory layouts, is asking about it > * Travis' numpy book has a very early section on this (section 2.3 - > memory layout) > * I often think about memory layouts, and from your discussion, you do > too. It's uncommon that you don't have to teach something that > experienced users think about often. I'm mentioning memory layout because I'm talking to you. I wouldn't talk about memory layout if I would try to explain ravel, reshape and indexing for the first time to a student. > * The most common use of 'order' only refers to memory layout. For > example np.array "order" doesn't refer to index ordering but to memory > layout. No, as I tried to show with the statsmodels example. I don't require GSOC students (that are relatively new to numpy) to understand much about memory layout. The only use of ``order`` in statsmodels refers to *index* order in ravel and reshape. > * The current docstring of 'reshape' cannot be explained without > referring to memory order. really ? I thought reshape only refers to *index* order for "F" and "C" I don't think I can express my preference for reshape order="F" any better than I did, so maybe it's time for some additional users/developers to chime in. 
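Josef's claim here — that "F" and "C" in ravel and reshape are pure *index* order, independent of memory layout — can be checked with a small sketch (a toy 2x3 array; only standard numpy calls):

```python
import numpy as np

# Same values, two memory layouts.
a_c = np.arange(6).reshape(2, 3)   # C-contiguous: [[0, 1, 2], [3, 4, 5]]
a_f = np.asfortranarray(a_c)       # F-contiguous copy, identical values

# ravel's 'C'/'F' keyword is pure *index* order: the result does not
# depend on how the array is laid out in memory.
assert list(a_c.ravel('C')) == [0, 1, 2, 3, 4, 5]      # last index fastest
assert list(a_c.ravel('F')) == [0, 3, 1, 4, 2, 5]      # first index fastest
assert np.array_equal(a_c.ravel('F'), a_f.ravel('F'))  # layout is irrelevant

# The same holds for reshape with order='F'.
assert np.array_equal(a_c.reshape(6, order='F'), a_f.reshape(6, order='F'))
```

Memory layout only shows up at the "advanced" level Josef describes — for instance in whether the reshape above returns a view or a copy — not in the values produced.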
Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at gmail.com Sun Mar 31 17:03:05 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 31 Mar 2013 23:03:05 +0200 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 10:43 PM, wrote: > On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett > wrote: > > Hi, > > > > On Sat, Mar 30, 2013 at 10:38 PM, wrote: > >> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett < > matthew.brett at gmail.com> wrote: > >>> Hi, > >>> > >>> On Sat, Mar 30, 2013 at 9:37 PM, wrote: > >>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett < > matthew.brett at gmail.com> wrote: > >>>>> Hi, > >>>>> > >>>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: > >>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett < > matthew.brett at gmail.com> wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: > >>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle > >>>>>>>> wrote: > >>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett < > matthew.brett at gmail.com> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: > >>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, > wrote: > >>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett > >>>>>>>>>> >> wrote: > >>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, > wrote: > >>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett > >>>>>>>>>> >>>> wrote: > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense > of index > >>>>>>>>>> >>>>> ordering. > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> This is very confusing. 
We think the index ordering and > memory > >>>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we > should > >>>>>>>>>> >>>>> avoid > >>>>>>>>>> >>>>> using "C" and "F" to refer to index ordering. > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> Proposal > >>>>>>>>>> >>>>> ------------- > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and > forwards > >>>>>>>>>> >>>>> index ordering for ravel, reshape > >>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of > unraveling > >>>>>>>>>> >>>>> in > >>>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively > (excellent > >>>>>>>>>> >>>>> naming idea by Paul Ivanov) > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> What do y'all think? > >>>>>>>>>> >>>> > >>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I > always thought > >>>>>>>>>> >>>> about > >>>>>>>>>> >>>> the content and never about the memory when using it. > >>>>>>>>>> >> > >>>>>>>>>> >> changing the names doesn't make it easier to understand. > >>>>>>>>>> >> I think the confusion is because the new A and K refer to > existing > >>>>>>>>>> >> memory > >>>>>>>>>> >> > >>>>>>>>>> > >>>>>>>>>> I disagree, I think it's confusing, but I have evidence, and > that is > >>>>>>>>>> that four out of four of us tested ourselves and got it wrong. > >>>>>>>>>> > >>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I > think it's > >>>>>>>>>> rash to assert there is no problem here. > >>>>>>>> > >>>>>>>> I think you are overcomplicating things or phrased it as a "trick > question" > >>>>>>> > >>>>>>> I don't know what you mean by trick question - was there something > >>>>>>> over-complicated in the example? I deliberately didn't include > >>>>>>> various much more confusing examples in "reshape". > >>>>>> > >>>>>> I meant making the "candidates" think about memory instead of just > >>>>>> column versus row stacking. 
> >>>>> > >>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D > >>>>> array, it was an image, with time as the 4th dimension (N time > >>>>> points). Raveling and reshaping 3D and 4D arrays is a common thing > >>>>> to do in neuroimaging, as you can imagine. > >>>>> > >>>>> A student asked what he would get back from raveling this array, a > >>>>> concatenated time series, or something spatial? > >>>>> > >>>>> We showed (I'd worked it out by this time) that the first N values > >>>>> were the time series given by [0, 0, 0, :]. > >>>>> > >>>>> He said - "Oh - I see - so the data is stored as a whole lot of time > >>>>> series one by one, I thought it would be stored as a series of > >>>>> images'. > >>>>> > >>>>> Ironically, this was a Fortran-ordered array in memory, and he was > wrong. > >>>>> > >>>>> So, I think the idea of memory ordering and index ordering is very > >>>>> easy to confuse, and comes up naturally. > >>>>> > >>>>> I would like, as a teacher, to be able to say something like: > >>>>> > >>>>> This is what C memory layout is (it's the memory layout that gives > >>>>> arr.flags.C_CONTIGUOUS=True) > >>>>> This is what F memory layout is (it's the memory layout that gives > >>>>> arr.flags.F_CONTIGUOUS=True) > >>>>> It's rather easy to get something that is neither C or F memory > layout > >>>>> Numpy does many memory layouts. > >>>>> Ravel and reshape and numpy in general do not care (normally) about C > >>>>> or F layouts, they only care about index ordering. > >>>>> > >>>>> My point, that I'm repeating, is that my job is made harder by > >>>>> 'arr.ravel('F')'. > >>>> > >>>> But once you know that ravel and reshape don't care about memory, the > >>>> ravel is easy to predict (maybe not easy to visualize in 4-D): > >>> > >>> But this assumes that you already know that there's such a thing as > >>> memory layout, and there's such a thing as index ordering, and that > >>> 'C' and 'F' in ravel refer to index ordering. 
Once you have that, > >>> you're golden. I'm arguing it's markedly harder to get this > >>> distinction, and keep it in mind, and teach it, if we are using the > >>> 'C' and 'F" names for both things. > >> > >> No, I think you are still missing my point. > >> I think explaining ravel and reshape F and C is easy (kind of) because > the > >> students don't need to know at that stage about memory layouts. > >> > >> All they need to know is that we look at n-dimensional objects in > >> C-order or in F-order > >> (whichever index runs fastest) > > > > Would you accept that it may or may not be true that it is desirable > > or practical not to mention memory layouts when teaching numpy? > > I think they should be in two different sections. > > basic usage: > ravel, reshape in pure index order, and indexing, broadcasting, ... > > advanced usage: > memory layout and some ability to predict when you get a view and > when you get a copy. > > And I still think words can mean different things in different context > (with a qualifier maybe) > indexing in fortran order > memory in fortran order > > Disclaimer: I never tried to teach numpy > and with GSOC students my explanations only went a little bit > beyond what they needed to know for the purpose at hand (I hope) > > > > > You believe it is desirable, I believe that it is not - that teaching > > numpy naturally involves some discussion of memory layout. > > > > As evidence: > > > > * My student, without any prompting about memory layouts, is asking > about it > > * Travis' numpy book has a very early section on this (section 2.3 - > > memory layout) > > * I often think about memory layouts, and from your discussion, you do > > too. It's uncommon that you don't have to teach something that > > experienced users think about often. > > I'm mentioning memory layout because I'm talking to you. > I wouldn't talk about memory layout if I would try to explain ravel, > reshape and indexing for the first time to a student. 
> > > * The most common use of 'order' only refers to memory layout. For > > example np.array "order" doesn't refer to index ordering but to memory > > layout. > > No, as I tried to show with the statsmodels example. > I don't require GSOC students (that are relatively new to numpy) to > understand > much about memory layout. > The only use of ``order`` in statsmodels refers to *index* order in > ravel and reshape. > > > * The current docstring of 'reshape' cannot be explained without > > referring to memory order. > > really ? > I thought reshape only refers to *index* order for "F" and "C" > > I don't think I can express my preference for reshape order="F" any > better than I did, so maybe it's time for some additional users/developers > to chime in. My 2cents: while I can't go back and un-read earlier emails in this thread, I don't see what's ambiguous in the case of ravel. For reshape I can see though that it's possible to interpret it in two ways. In such cases I open up IPython and play with a 2x3 array to check my understanding. That's OK, and certainly better than adding duplicate names now for C/F even if that would solve the issue (which it probably wouldn't). Therefore I'm -1 on the initial proposal. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Sun Mar 31 17:04:46 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 31 Mar 2013 14:04:46 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sun, Mar 31, 2013 at 1:43 PM, wrote: > On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 10:38 PM, wrote: >>> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Mar 30, 2013 at 9:37 PM, wrote: >>>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >>>>>> Hi, >>>>>> >>>>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>>>>>> wrote: >>>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>>>>>> >> wrote: >>>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>>>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>>>>>> >>>>> ordering. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>>>>>> >>>>> avoid >>>>>>>>>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Proposal >>>>>>>>>>> >>>>> ------------- >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>>>>>> >>>>> index ordering for ravel, reshape >>>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>>>>>> >>>>> in >>>>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> What do y'all think? >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>>>>>> >>>> about >>>>>>>>>>> >>>> the content and never about the memory when using it. >>>>>>>>>>> >> >>>>>>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>>>>>> >> memory >>>>>>>>>>> >> >>>>>>>>>>> >>>>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>>>>>> >>>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>>>>>> rash to assert there is no problem here. >>>>>>>>> >>>>>>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>>>>>> >>>>>>>> I don't know what you mean by trick question - was there something >>>>>>>> over-complicated in the example? I deliberately didn't include >>>>>>>> various much more confusing examples in "reshape". >>>>>>> >>>>>>> I meant making the "candidates" think about memory instead of just >>>>>>> column versus row stacking. >>>>>> >>>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >>>>>> array, it was an image, with time as the 4th dimension (N time >>>>>> points). Raveling and reshaping 3D and 4D arrays is a common thing >>>>>> to do in neuroimaging, as you can imagine. 
>>>>>> >>>>>> A student asked what he would get back from raveling this array, a >>>>>> concatenated time series, or something spatial? >>>>>> >>>>>> We showed (I'd worked it out by this time) that the first N values >>>>>> were the time series given by [0, 0, 0, :]. >>>>>> >>>>>> He said - "Oh - I see - so the data is stored as a whole lot of time >>>>>> series one by one, I thought it would be stored as a series of >>>>>> images'. >>>>>> >>>>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >>>>>> >>>>>> So, I think the idea of memory ordering and index ordering is very >>>>>> easy to confuse, and comes up naturally. >>>>>> >>>>>> I would like, as a teacher, to be able to say something like: >>>>>> >>>>>> This is what C memory layout is (it's the memory layout that gives >>>>>> arr.flags.C_CONTIGUOUS=True) >>>>>> This is what F memory layout is (it's the memory layout that gives >>>>>> arr.flags.F_CONTIGUOUS=True) >>>>>> It's rather easy to get something that is neither C or F memory layout >>>>>> Numpy does many memory layouts. >>>>>> Ravel and reshape and numpy in general do not care (normally) about C >>>>>> or F layouts, they only care about index ordering. >>>>>> >>>>>> My point, that I'm repeating, is that my job is made harder by >>>>>> 'arr.ravel('F')'. >>>>> >>>>> But once you know that ravel and reshape don't care about memory, the >>>>> ravel is easy to predict (maybe not easy to visualize in 4-D): >>>> >>>> But this assumes that you already know that there's such a thing as >>>> memory layout, and there's such a thing as index ordering, and that >>>> 'C' and 'F' in ravel refer to index ordering. Once you have that, >>>> you're golden. I'm arguing it's markedly harder to get this >>>> distinction, and keep it in mind, and teach it, if we are using the >>>> 'C' and 'F" names for both things. >>> >>> No, I think you are still missing my point. 
>>> I think explaining ravel and reshape F and C is easy (kind of) because the >>> students don't need to know at that stage about memory layouts. >>> >>> All they need to know is that we look at n-dimensional objects in >>> C-order or in F-order >>> (whichever index runs fastest) >> >> Would you accept that it may or may not be true that it is desirable >> or practical not to mention memory layouts when teaching numpy? > > I think they should be in two different sections. > > basic usage: > ravel, reshape in pure index order, and indexing, broadcasting, ... > > advanced usage: > memory layout and some ability to predict when you get a view and > when you get a copy. Right - that is what you think - but I was asking - do you agree that it's possible that that is not best way to teach it? What evidence would you give that it was the best way to teach it? > And I still think words can mean different things in different context > (with a qualifier maybe) > indexing in fortran order > memory in fortran order Right - but you'd probably also accept that using the same word for different and related things is likely to cause confusion? I'm sure we could come up with some experimental evidence for that if you do doubt it. > Disclaimer: I never tried to teach numpy > and with GSOC students my explanations only went a little bit > beyond what they needed to know for the purpose at hand (I hope) > >> >> You believe it is desirable, I believe that it is not - that teaching >> numpy naturally involves some discussion of memory layout. >> >> As evidence: >> >> * My student, without any prompting about memory layouts, is asking about it >> * Travis' numpy book has a very early section on this (section 2.3 - >> memory layout) >> * I often think about memory layouts, and from your discussion, you do >> too. It's uncommon that you don't have to teach something that >> experienced users think about often. > > I'm mentioning memory layout because I'm talking to you. 
> I wouldn't talk about memory layout if I would try to explain ravel,
> reshape and indexing for the first time to a student.
>
>> * The most common use of 'order' only refers to memory layout. For
>> example np.array "order" doesn't refer to index ordering but to memory
>> layout.
>
> No, as I tried to show with the statsmodels example.
> I don't require GSOC students (that are relatively new to numpy) to understand
> much about memory layout.
> The only use of ``order`` in statsmodels refers to *index* order in
> ravel and reshape.
>
>> * The current docstring of 'reshape' cannot be explained without
>> referring to memory order.
>
> really ?
> I thought reshape only refers to *index* order for "F" and "C"

Here's the docstring for 'reshape':

    order : {'C', 'F', 'A'}, optional
        Determines whether the array data should be viewed as in C
        (row-major) order, FORTRAN (column-major) order, or the
        C/FORTRAN order should be preserved.

The 'A' option cannot be explained without reference to 'C' or 'F'
*memory* layout - i.e. a different meaning of the 'C' and 'F' in the
indexing interpretation.

Actually, as a matter of interest - how would you explain the behavior
of 'A' when the array is neither 'C' nor 'F' memory layout? Maybe that
could be a good test case?

Here's the docstring for 'ravel':

    order : {'C','F', 'A', 'K'}, optional
        The elements of ``a`` are read in this order. 'C' means to view
        the elements in C (row-major) order. 'F' means to view the
        elements in Fortran (column-major) order. 'A' means to view the
        elements in 'F' order if a is Fortran contiguous, 'C' order
        otherwise. 'K' means to view the elements in the order they
        occur in memory, except for reversing the data when strides are
        negative. By default, 'C' order is used.

Cheers,

Matthew
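The "good test case" Matthew suggests — an array that is neither C- nor F-contiguous — can be worked through directly against the quoted docstrings; a minimal sketch (standard numpy only, values chosen for illustration):

```python
import numpy as np

# An array that is neither C- nor F-contiguous: a strided column slice.
base = np.arange(12).reshape(3, 4)
a = base[:, ::2]                   # [[0, 2], [4, 6], [8, 10]]
assert not a.flags.c_contiguous and not a.flags.f_contiguous

# Per the ravel docstring, 'A' means F index order only when the array
# *is* F-contiguous; anything else falls back to C index order:
assert np.array_equal(a.ravel('A'), a.ravel('C'))
assert list(a.ravel('A')) == [0, 2, 4, 6, 8, 10]

# 'K' reads elements in the order they occur in memory; on the
# transpose that differs from C index order:
assert list(a.T.ravel('C')) == [0, 4, 8, 2, 6, 10]
assert list(a.T.ravel('K')) == [0, 2, 4, 6, 8, 10]
```

So for non-contiguous input, 'A' is just 'C' — which is exactly the kind of memory-dependent special case the thread is arguing about: 'A' and 'K' cannot be described in index-order terms alone.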