From ralf.gommers at googlemail.com Mon Jan 2 15:21:14 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 2 Jan 2012 21:21:14 +0100 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Sat, Dec 31, 2011 at 6:43 AM, Jaidev Deshpande < deshpande.jaidev at gmail.com> wrote: > Hi Chris > > > Documentation is specificsly excluded from GSoC (at least it was a > > couple years ago when I last was involved) > > Documentation wasn't excluded last year from GSoC, there were quite a > few projects that required a lot of documentation. > But yes, there was no "documentation only" project. > > Anyhow, it seems reasonable that testing alone can't be a project. > What about benchmarking and the related statistics? Does that qualify > as a worthwhile project (again, GSoC or otherwise)? > > That's certainly worth doing, and doing well. You could start with investigating what Wes has done with vbench so far, and look at how to get the output of that into http://speed.pypy.org/. I have the feeling it's not enough work for a GSoC project though, and with a project like starting scikits.signal you'd have a better chance. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 2 21:44:02 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2012 19:44:02 -0700 Subject: [Numpy-discussion] polynomial package update Message-ID: Hi All, I've made a pull request for a rather large update of the polynomial package. The new features are 1) Bug fixes 2) Improved documentation in the numpy reference 3) Preliminary support for multi-dimensional coefficient arrays 4) Support for NA in the fitting routines 5) Improved testing and test coverage 6) Gauss quadrature 7) Weight functions 8) (Mostly) Symmetrized companion matrices 9) Add cast and basis as static functions of convenience classes 10) Remove deprecated import from package *init*.py If anyone has an interest in that package, please take some time and review it here . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 3 00:46:16 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 3 Jan 2012 00:46:16 -0500 Subject: [Numpy-discussion] polynomial package update In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 9:44 PM, Charles R Harris wrote: > Hi All, > > I've made a pull request for a? rather large update of the polynomial > package. The new features are > > 1) Bug fixes > 2) Improved documentation in the numpy reference > 3) Preliminary support for multi-dimensional coefficient arrays > 4) Support for NA in the fitting routines > 5) Improved testing and test coverage > 6) Gauss quadrature > 7) Weight functions > 8) (Mostly) Symmetrized companion matrices > 9) Add cast and basis as static functions of convenience classes > 10) Remove deprecated import from package init.py > > If anyone has an interest in that package, please take some time and review > it here. (Since I'm not setup for compiling numpy I cannot try it out. Just some spotty reading of the code.) The two things I'm most interested in are the 2d, 3d enhancements and the quadrature. What's the return of the 2d vander functions? 
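(To make the question concrete, this is roughly the call I have in mind -- I am
guessing the name polyvander2d and the deg=[degx, degy] argument from my spotty
reading of the branch, so the final API may well differ:

>>> import numpy as np
>>> from numpy.polynomial import polynomial as P
>>> x = np.array([1., 2., 3.])
>>> y = np.array([4., 5., 6.])
>>> V = P.polyvander2d(x, y, [2, 4])   # columns are the products x**i * y**j
>>> V.shape                            # one row per sample point, (2+1)*(4+1) columns
(3, 15)

so each row would hold the flattened product basis evaluated at one (x, y) point.)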
If I read it correctly, it's: >>> xyn = np.array([['x^%d*y^%d'%(px,py) for py in range(5)] for px in range(3)]) >>> xyn array([['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4'], ['x^1*y^0', 'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4'], ['x^2*y^0', 'x^2*y^1', 'x^2*y^2', 'x^2*y^3', 'x^2*y^4']], dtype='|S7') >>> xyn.ravel() array(['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4', 'x^1*y^0', 'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4', 'x^2*y^0', 'x^2*y^1', 'x^2*y^2', 'x^2*y^3', 'x^2*y^4'], dtype='|S7') Are the normalization constants available in explicit form to get an orthonormal basis? The test_100 look like good recipes for getting the normalization and the integration constants. Are the quads weights and points the same as in scipy.special (up to floating point differences)? Looks very useful and I'm looking forward to trying it out, and I will borrow some code like test_100 as recipes. (For densities, I still need mostly orthonormal basis and integration normalized to 1.) Josef > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From chaoyuejoy at gmail.com Tue Jan 3 04:01:39 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 3 Jan 2012 10:01:39 +0100 Subject: [Numpy-discussion] strange nan in np.ma.average Message-ID: Dear all numpy users, I have 10 90X720 arrays. let's say they are in a list 'a' with each element a 90X720 numpy masked array. then I create a new empty ndarray: data data=np.empty([10,90,720]) ##first I store all the 10 ndarray in a 10X90X720 array: for i,d in enumerate(a): data[i]=a data.shape=(10, 90, 720) then I use data_av=np.ma.average(data, axis=0) to get the average. The strange thing is, I don't have any 'nan' in all the 10 90X720 array, but I have nan value in the final data_av. how does this come? In [26]: np.nonzero(np.isnan(data_av)) Out[26]: (array([ 0, 0, 2, 2, 3, 5, 5, 6, 6, 6, 6, 7, 8, 8, 8, 9, 10, 10, 10, 11, 11, 12, 13, 13, 14, 17, 17, 19, 22, 22, 44, 63, 64, 64, 67, 68, 71, 72, 73, 76, 77, 77, 78, 79, 80, 80, 81, 82, 82, 84, 85, 85, 86, 86, 87, 87, 88, 89, 89, 89]), array([159, 541, 497, 548, 90, 97, 170, 244, 267, 587, 590, 150, 126, 168, 477, 240, 271, 277, 588, 99, 179, 528, 52, 256, 230, 109, 190, 617, 377, 389, 707, 539, 193, 361, 262, 465, 100, 232, 206, 90, 87, 93, 522, 229, 200, 482, 325, 195, 239, 228, 159, 194, thanks, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jan 3 08:21:04 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 3 Jan 2012 06:21:04 -0700 Subject: [Numpy-discussion] polynomial package update In-Reply-To: References: Message-ID: On Mon, Jan 2, 2012 at 10:46 PM, wrote: > On Mon, Jan 2, 2012 at 9:44 PM, Charles R Harris > wrote: > > Hi All, > > > > I've made a pull request for a rather large update of the polynomial > > package. 
The new features are > > > > 1) Bug fixes > > 2) Improved documentation in the numpy reference > > 3) Preliminary support for multi-dimensional coefficient arrays > > 4) Support for NA in the fitting routines > > 5) Improved testing and test coverage > > 6) Gauss quadrature > > 7) Weight functions > > 8) (Mostly) Symmetrized companion matrices > > 9) Add cast and basis as static functions of convenience classes > > 10) Remove deprecated import from package init.py > > > > If anyone has an interest in that package, please take some time and > review > > it here. > > (Since I'm not setup for compiling numpy I cannot try it out. Just > some spotty reading of the code.) > > The two things I'm most interested in are the 2d, 3d enhancements and > the quadrature. > > What's the return of the 2d vander functions? > > If I read it correctly, it's: > > >>> xyn = np.array([['x^%d*y^%d'%(px,py) for py in range(5)] for px in > range(3)]) > >>> xyn > array([['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4'], > ['x^1*y^0', 'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4'], > ['x^2*y^0', 'x^2*y^1', 'x^2*y^2', 'x^2*y^3', 'x^2*y^4']], > dtype='|S7') > >>> xyn.ravel() > array(['x^0*y^0', 'x^0*y^1', 'x^0*y^2', 'x^0*y^3', 'x^0*y^4', 'x^1*y^0', > 'x^1*y^1', 'x^1*y^2', 'x^1*y^3', 'x^1*y^4', 'x^2*y^0', 'x^2*y^1', > 'x^2*y^2', 'x^2*y^3', 'x^2*y^4'], > dtype='|S7') > > Yes, that's right. > Are the normalization constants available in explicit form to get an > orthonormal basis? > No, not at the moment. I haven't quite figured out how I want to expose them but I agree that they should be available. > The test_100 look like good recipes for getting the normalization and > the integration constants. > > Yes, that works. There are also explicit formulas, but I don't know that they would work better. Some of the factors get very large, for Laguerre of degree 100 the can be up in the 10^100 range > Are the quads weights and points the same as in scipy.special (up to > floating point differences)? > > Yes, but more accurate. For instance, the scipy.special values for Gauss-Laguerre integration die at around degree 40. > Looks very useful and I'm looking forward to trying it out, and I will > borrow some code like test_100 as recipes. > (For densities, I still need mostly orthonormal basis and integration > normalized to 1.) > > Let me know what would be useful and I'll try to put it in. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Jan 3 09:42:26 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 03 Jan 2012 09:42:26 -0500 Subject: [Numpy-discussion] choose -> segfault Message-ID: I made 2 mistakes here, the 1st argument had the wrong shape, and I really wanted to use 'where', not 'choose'. But shouldn't segfault: ValueError: Need between 2 and (32) array objects (inclusive). Segmentation fault (core dumped) From robert.kern at gmail.com Tue Jan 3 09:44:37 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 3 Jan 2012 14:44:37 +0000 Subject: [Numpy-discussion] choose -> segfault In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 14:42, Neal Becker wrote: > I made 2 mistakes here, the 1st argument had the wrong shape, and I really > wanted to use 'where', not 'choose'. ?But shouldn't segfault: > > ValueError: Need between 2 and (32) array objects (inclusive). > Segmentation fault (core dumped) Can you provide an example that replicates the crash? 
Since it looks like you have a core dump handy, can you get a gdb backtrace to show us where the crash is? Platform details would also be handy. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From ognen at enthought.com Tue Jan 3 12:46:29 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Tue, 3 Jan 2012 11:46:29 -0600 Subject: [Numpy-discussion] Enum type Message-ID: Hello, I am playing with adding an enum dtype to numpy (to get my feet wet in numpy really). I have looked at the https://github.com/martinling/numpy_quaternion and I feel comfortable with my understanding of adding a simple type to numpy in technical terms. I am mostly a C programmer and have programmed in Python but not at the level where my code wcould be considered "pretty" or maybe even "pythonic". I know enums from C and have browsed around a few python enum implementations online. Most of them use hash tables or lists to associate names to numbers - these approaches just feel "heavy" to me. What would be a proper "numpy approach" to this? I am looking mostly for direction and advice as I would like to do the work myself :-) Any input appreciated :-) Ognen From jsalvati at u.washington.edu Tue Jan 3 12:49:48 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 3 Jan 2012 09:49:48 -0800 Subject: [Numpy-discussion] nested_iters does not accept length zero nest (also doesn't have documentation) Message-ID: Hellow, while using the nested_iters function, I've noticed that it does not accept length zero nestings. For example, the following fails: nested_iters([ones(3),ones(3)], [[], [0]]) with "ValueError: If 'op_axes' or 'itershape' is not NULL in theiterator constructor, 'oa_ndim' must be greater than zero" This makes a certain amount of sense to me, but I think having the iterator with the empty axes have a single iteration would be more useful. For example, if you are using nested_iters to ally a function along a specific set of axes, you'll otherwise have to special case the case where those axes take up the whole array (which is my use case). This is not much of a hassle for me, but I thought other people might like to know. Also, I could not find any nested_iters documentation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Tue Jan 3 13:05:10 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 3 Jan 2012 19:05:10 +0100 Subject: [Numpy-discussion] strange nan in np.ma.average In-Reply-To: References: Message-ID: the problem is here, data=np.empty([10,90,720]) you should always use np.ma.empty if you want to construct a masked empty array. Chao 2012/1/3 Chao YUE > Dear all numpy users, > > I have 10 90X720 arrays. let's say they are in a list 'a' with each > element a 90X720 numpy masked array. > then I create a new empty ndarray: data > > data=np.empty([10,90,720]) > > ##first I store all the 10 ndarray in a 10X90X720 array: > for i,d in enumerate(a): > data[i]=a > > data.shape=(10, 90, 720) > then I use data_av=np.ma.average(data, axis=0) to get the average. > > > The strange thing is, I don't have any 'nan' in all the 10 90X720 array, > but I have nan value in the final data_av. > how does this come? 
> > > In [26]: np.nonzero(np.isnan(data_av)) > Out[26]: > (array([ 0, 0, 2, 2, 3, 5, 5, 6, 6, 6, 6, 7, 8, 8, 8, 9, 10, > 10, 10, 11, 11, 12, 13, 13, 14, 17, 17, 19, 22, 22, 44, 63, 64, 64, > 67, 68, 71, 72, 73, 76, 77, 77, 78, 79, 80, 80, 81, 82, 82, 84, 85, > 85, 86, 86, 87, 87, 88, 89, 89, 89]), > array([159, 541, 497, 548, 90, 97, 170, 244, 267, 587, 590, 150, 126, > 168, 477, 240, 271, 277, 588, 99, 179, 528, 52, 256, 230, 109, > 190, 617, 377, 389, 707, 539, 193, 361, 262, 465, 100, 232, 206, > 90, 87, 93, 522, 229, 200, 482, 325, 195, 239, 228, 159, 194, > > thanks, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Tue Jan 3 13:05:37 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 3 Jan 2012 13:05:37 -0500 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 12:46 PM, Ognen Duzlevski wrote: > Hello, > > I am playing with adding an enum dtype to numpy (to get my feet wet in > numpy really). I have looked at the > https://github.com/martinling/numpy_quaternion and I feel comfortable > with my understanding of adding a simple type to numpy in technical > terms. > > I am mostly a C programmer and have programmed in Python but not at > the level where my code wcould be considered "pretty" or maybe even > "pythonic". I know enums from C and have browsed around a few python > enum implementations online. Most of them use hash tables or lists to > associate names to numbers - these approaches just feel "heavy" to me. > > What would be a proper "numpy approach" to this? I am looking mostly > for direction and advice as I would like to do the work myself :-) > > Any input appreciated :-) > Ognen > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion You should use a hash table internally in my opinion. I've started using khash from klib (https://github.com/attractivechaos/klib) which has excellent memory usage (more than 50% less than Python dict with large hash tables) and good performance characteristics. With the enum dtype you can avoid reference counting with primitive types, not sure about object dtype. If enum arrays are mutable this will be very tricky. - Wes From jim.vickroy at noaa.gov Tue Jan 3 13:06:39 2012 From: jim.vickroy at noaa.gov (Jim Vickroy) Date: Tue, 03 Jan 2012 11:06:39 -0700 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: <4F0343AF.9010103@noaa.gov> On 1/3/2012 10:46 AM, Ognen Duzlevski wrote: > Hello, > > I am playing with adding an enum dtype to numpy (to get my feet wet in > numpy really). 
I have looked at the > https://github.com/martinling/numpy_quaternion and I feel comfortable > with my understanding of adding a simple type to numpy in technical > terms. > > I am mostly a C programmer and have programmed in Python but not at > the level where my code wcould be considered "pretty" or maybe even > "pythonic". I know enums from C and have browsed around a few python > enum implementations online. Most of them use hash tables or lists to > associate names to numbers - these approaches just feel "heavy" to me. > > What would be a proper "numpy approach" to this? I am looking mostly > for direction and advice as I would like to do the work myself :-) > > Any input appreciated :-) > Ognen Does "enumerate" (http://docs.python.org/library/functions.html#enumerate) work for you? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Tue Jan 3 13:41:28 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 3 Jan 2012 10:41:28 -0800 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Fri, Dec 30, 2011 at 9:43 PM, Jaidev Deshpande wrote: >> Documentation is specificsly excluded from GSoC (at least it was a >> couple years ago when I last was involved) > > Documentation wasn't excluded last year from GSoC, there were quite a > few projects that required a lot of documentation. sure -- it's certanly encouraged to docuemnt code that gets written, but... > But yes, there was no "documentation only" project. exactly -- from the 2011 GSoC FAQ: 12. Are proposals for documentation work eligible for Google Summer of Code? While we greatly appreciate the value of documentation, this program is an exercise in developing code; we can't accept proposals for documentation-only work at this time. > Anyhow, it seems reasonable that testing alone can't be a project. > What about benchmarking and the related statistics? Does that qualify > as a worthwhile project (again, GSoC or otherwise)? I didn't find a specific RAQ, but from the above, I suspect that all projects must be primarily about producing code: not documenting, testing, or benchmarking. Those, of course, should all be part of code development, but not the focus. - Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From wesmckinn at gmail.com Tue Jan 3 13:46:11 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 3 Jan 2012 13:46:11 -0500 Subject: [Numpy-discussion] Enum type In-Reply-To: <4F0343AF.9010103@noaa.gov> References: <4F0343AF.9010103@noaa.gov> Message-ID: On Tue, Jan 3, 2012 at 1:06 PM, Jim Vickroy wrote: > On 1/3/2012 10:46 AM, Ognen Duzlevski wrote: >> Hello, >> >> I am playing with adding an enum dtype to numpy (to get my feet wet in >> numpy really). I have looked at the >> https://github.com/martinling/numpy_quaternion and I feel comfortable >> with my understanding of adding a simple type to numpy in technical >> terms. >> >> I am mostly a C programmer and have programmed in Python but not at >> the level where my code wcould be considered "pretty" or maybe even >> "pythonic". I know enums from C and have browsed around a few python >> enum implementations online. 
Most of them use hash tables or lists to >> associate names to numbers - these approaches just feel "heavy" to me. >> >> What would be a proper "numpy approach" to this? I am looking mostly >> for direction and advice as I would like to do the work myself :-) >> >> Any input appreciated :-) >> Ognen > > Does "enumerate" > (http://docs.python.org/library/functions.html#enumerate) work for you? > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion That's not exactly what he means. The R lingo for this concept is "factor" or a bit more common "categorical variable": http://stat.ethz.ch/R-manual/R-patched/library/base/html/factor.html FWIW R's factor type is implemented using hash tables. I do the same in pandas. - Wes From ognen at enthought.com Tue Jan 3 13:52:30 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Tue, 3 Jan 2012 12:52:30 -0600 Subject: [Numpy-discussion] Enum type In-Reply-To: References: <4F0343AF.9010103@noaa.gov> Message-ID: On Tue, Jan 3, 2012 at 12:46 PM, Wes McKinney wrote: > On Tue, Jan 3, 2012 at 1:06 PM, Jim Vickroy wrote: >> On 1/3/2012 10:46 AM, Ognen Duzlevski wrote: >>> Hello, >>> >>> I am playing with adding an enum dtype to numpy (to get my feet wet in >>> numpy really). I have looked at the >>> https://github.com/martinling/numpy_quaternion and I feel comfortable >>> with my understanding of adding a simple type to numpy in technical >>> terms. >>> >>> I am mostly a C programmer and have programmed in Python but not at >>> the level where my code wcould be considered "pretty" or maybe even >>> "pythonic". I know enums from C and have browsed around a few python >>> enum implementations online. Most of them use hash tables or lists to >>> associate names to numbers - these approaches just feel "heavy" to me. >>> >>> What would be a proper "numpy approach" to this? I am looking mostly >>> for direction and advice as I would like to do the work myself :-) >>> >>> Any input appreciated :-) >>> Ognen >> >> Does "enumerate" >> (http://docs.python.org/library/functions.html#enumerate) work for you? > That's not exactly what he means. The R lingo for this concept is > "factor" or a bit more common "categorical variable": > > http://stat.ethz.ch/R-manual/R-patched/library/base/html/factor.html > > FWIW R's factor type is implemented using hash tables. I do the same in pandas. > > - Wes Wes, You are right, "categorical variable" is what I am after. Thanks for the pointer, I will go the klib route you suggested and see what comes out. I may be "old fashioned" a bit in the sense that adding dependencies on external libraries is something I am reluctant to do - this is why I said using hashes may have felt a bit "heavy". But that may be my shortcoming :-) Ognen From d.s.seljebotn at astro.uio.no Tue Jan 3 14:29:44 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 03 Jan 2012 20:29:44 +0100 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: <4F035728.4060509@astro.uio.no> On 01/03/2012 06:46 PM, Ognen Duzlevski wrote: > Hello, > > I am playing with adding an enum dtype to numpy (to get my feet wet in > numpy really). 
I have looked at the > https://github.com/martinling/numpy_quaternion and I feel comfortable > with my understanding of adding a simple type to numpy in technical > terms. > > I am mostly a C programmer and have programmed in Python but not at > the level where my code wcould be considered "pretty" or maybe even > "pythonic". I know enums from C and have browsed around a few python > enum implementations online. Most of them use hash tables or lists to > associate names to numbers - these approaches just feel "heavy" to me. If you want the enum values to be stored efficiently (using 1, 2 or 4-byte integers), and want a mapping between string names and such integers, then you need to map between them somehow, right? I.e., when printing the repr() of each element, you at least need a list in order to go from enum values to names (and that doesn't feel 'heavy' to me -- it's the minimal possible solution for the job!) It's unclear whether you mean heavy on the CPU, in the API, in the C code, or whatever, so difficult to give more feedback. As far as the API goes, you could probably do something like: colors = np.enum(['red', 'green', 'blue']) arr = np.asarray([colors.red, colors.red, colors.red, colors.blue]) assert arr[0] == colors.red assert np.all(arr.view(np.int8) == [0, 0, 0, 2]) So the strings are only needed in the API in the constructor of the enum type. They are needed there though. Dag Sverre From nitroamos at gmail.com Tue Jan 3 14:58:13 2012 From: nitroamos at gmail.com (Amos Anderson) Date: Tue, 3 Jan 2012 11:58:13 -0800 Subject: [Numpy-discussion] numpy dgemm link error Message-ID: Hello -- I've been having some problems building numpy. The first problem I had "error: unrecognizable insn", I was able to fix by following these instructions: http://www.mail-archive.com/numpy-discussion at scipy.org/msg34238.html > Could you try the following, at line 38, to add the following: > > #define EINSUM_USE_SSE1 0 > #define EINSUM_USE_SSE2 0 But now when I compile, I get a linker error against dgemm. I've pasted the full build output below. Amos. Running from numpy source directory.non-existing path in 'numpy/distutils': 'site.cfg' usage: svnversion [OPTIONS] WC_PATH [TRAIL_URL] Produce a compact "version number" for the working copy path WC_PATH. TRAIL_URL is the trailing portion of the URL used to determine if WC_PATH itself is switched (detection of switches within WC_PATH does not rely on TRAIL_URL). The version number is written to standard output. For example: $ svnversion . /repos/svn/trunk 4168 The version number will be a single number if the working copy is single revision, unmodified, not switched and with an URL that matches the TRAIL_URL argument. If the working copy is unusual the version number will be more complex: 4123:4168 mixed revision working copy 4168M modified working copy 4123S switched working copy 4123:4168MS mixed revision, modified, switched working copy If invoked on a directory that is not a working copy, an exported directory say, the program will output "exported". 
Valid options: -n [--no-newline] : do not output the trailing newline -c [--committed] : last changed rather than current revisions --version : show version information F2PY Version 2 blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in /home/amosa/triad/tools/python/lib libraries mkl,vml,guide not found in /usr/local/lib64 libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib64 libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /home/amosa/triad/tools/python/lib libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib NOT AVAILABLE atlas_blas_info: libraries f77blas,cblas,atlas not found in /home/amosa/triad/tools/python/lib libraries f77blas,cblas,atlas not found in /usr/local/lib64 libraries f77blas,cblas,atlas not found in /usr/local/lib customize GnuFCompiler Found executable /usr/bin/g77 gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler using config compiling '_configtest.c': /* This file is generated from numpy/distutils/system_info.py */ void ATL_buildinfo(void); int main(void) { ATL_buildinfo(); return 0; } C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-c' gcc: _configtest.c gcc -pthread _configtest.o -L/usr/lib64 -lf77blas -lcblas -latlas -o _configtest ATLAS version 3.7.11 built by root on Mon Jun 5 10:14:12 EDT 2006: UNAME : Linux intel1.lsf.platform.com 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux INSTFLG : MMDEF : /export/madison/src/roll/hpc/BUILD/ATLAS/CONFIG/ARCHS/P4E64SSE3/gcc/gemm ARCHDEF : /export/madison/src/roll/hpc/BUILD/ATLAS/CONFIG/ARCHS/P4E64SSE3/gcc/misc F2CDEFS : -DAdd__ -DStringSunStyle CACHEEDGE: 393216 F77 : /usr/bin/g77, version GNU Fortran (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2) F77FLAGS : -fomit-frame-pointer -O -m64 CC : /usr/bin/gcc, version gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2) CC FLAGS : -fomit-frame-pointer -O3 -funroll-all-loops -m64 MCC : /usr/bin/gcc, version gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2) MCCFLAGS : -fomit-frame-pointer -O -m64 success! removing: _configtest.c _configtest.o _configtest FOUND: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib64'] language = c define_macros = [('ATLAS_INFO', '"\\"3.7.11\\""')] FOUND: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib64'] language = c define_macros = [('ATLAS_INFO', '"\\"3.7.11\\""')] usage: svnversion [OPTIONS] WC_PATH [TRAIL_URL] Produce a compact "version number" for the working copy path WC_PATH. TRAIL_URL is the trailing portion of the URL used to determine if WC_PATH itself is switched (detection of switches within WC_PATH does not rely on TRAIL_URL). The version number is written to standard output. For example: $ svnversion . /repos/svn/trunk 4168 The version number will be a single number if the working copy is single revision, unmodified, not switched and with an URL that matches the TRAIL_URL argument. 
If the working copy is unusual the version number will be more complex: 4123:4168 mixed revision working copy 4168M modified working copy 4123S switched working copy 4123:4168MS mixed revision, modified, switched working copy If invoked on a directory that is not a working copy, an exported directory say, the program will output "exported". Valid options: -n [--no-newline] : do not output the trailing newline -c [--committed] : last changed rather than current revisions --version : show version information lapack_opt_info: lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in /home/amosa/triad/tools/python/lib libraries mkl,vml,guide not found in /usr/local/lib64 libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib64 libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /home/amosa/triad/tools/python/lib libraries lapack_atlas not found in /home/amosa/triad/tools/python/lib libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 libraries lapack_atlas not found in /usr/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 libraries lapack_atlas not found in /usr/lib/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib numpy.distutils.system_info.atlas_threads_info NOT AVAILABLE atlas_info: libraries f77blas,cblas,atlas not found in /home/amosa/triad/tools/python/lib libraries lapack_atlas not found in /home/amosa/triad/tools/python/lib libraries f77blas,cblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries f77blas,cblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/lib64 numpy.distutils.system_info.atlas_info FOUND: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib64'] language = f77 define_macros = [('ATLAS_INFO', '"\\"3.7.11\\""')] FOUND: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib64'] language = f77 define_macros = [('ATLAS_INFO', '"\\"3.7.11\\""')] running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src build_src building py_modules sources building library "npymath" sources customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler using config C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -c' gcc: _configtest.c gcc -pthread _configtest.o -o _configtest success! 
removing: _configtest.c _configtest.o _configtest C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function 'exp' gcc -pthread _configtest.o -o _configtest _configtest.o(.text+0x5): In function `main': /home/amosa/triad/tools/numpy/numpy-1.6.1/_configtest.c:6: undefined reference to `exp' collect2: ld returned 1 exit status _configtest.o(.text+0x5): In function `main': /home/amosa/triad/tools/numpy/numpy-1.6.1/_configtest.c:6: undefined reference to `exp' collect2: ld returned 1 exit status failure. removing: _configtest.c _configtest.o C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function 'exp' gcc -pthread _configtest.o -lm -o _configtest success! removing: _configtest.c _configtest.o _configtest building extension "numpy.core._sort" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h' to sources. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h'] building extension "numpy.core.multiarray" sources conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/arraytypes.c conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/nditer.c conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/lowlevel_strided_loops.c conv_template:> build/src.linux-x86_64-2.7/numpy/core/src/multiarray/einsum.c adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h' to sources. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h'] building extension "numpy.core.umath" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_ufunc_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/src/umath' to include_dirs. 
numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/src/umath/funcs.inc', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h'] building extension "numpy.core.scalarmath" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h' to sources. executing numpy/core/code_generators/generate_ufunc_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h' to sources. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h'] building extension "numpy.core._dotblas" sources adding 'numpy/core/blasdot/_dotblas.c' to sources. building extension "numpy.core.umath_tests" sources building extension "numpy.core.multiarray_tests" sources building extension "numpy.lib._compiled_base" sources building extension "numpy.numarray._capi" sources building extension "numpy.fft.fftpack_lite" sources building extension "numpy.linalg.lapack_lite" sources adding 'numpy/linalg/lapack_litemodule.c' to sources. adding 'numpy/linalg/python_xerbla.c' to sources. building extension "numpy.random.mtrand" sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -c' gcc: _configtest.c gcc -pthread _configtest.o -o _configtest _configtest failure. 
removing: _configtest.c _configtest.o _configtest building data_files sources build_src: building npy-pkg config files running build_py copying numpy/version.py -> build/lib.linux-x86_64-2.7/numpy copying build/src.linux-x86_64-2.7/numpy/__config__.py -> build/lib.linux-x86_64-2.7/numpy copying build/src.linux-x86_64-2.7/numpy/distutils/__config__.py -> build/lib.linux-x86_64-2.7/numpy/distutils running build_clib customize UnixCCompiler customize UnixCCompiler using build_clib running build_ext customize UnixCCompiler customize UnixCCompiler using build_ext customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler using build_ext building 'numpy.core.multiarray' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' gcc: numpy/core/src/multiarray/multiarraymodule_onefile.c In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:42: numpy/core/src/multiarray/einsum.c.src:39:1: warning: "EINSUM_USE_SSE1" redefined numpy/core/src/multiarray/einsum.c.src:25:1: warning: this is the location of the previous definition numpy/core/src/multiarray/einsum.c.src:40:1: warning: "EINSUM_USE_SSE2" redefined numpy/core/src/multiarray/einsum.c.src:34:1: warning: this is the location of the previous definition numpy/core/src/multiarray/descriptor.c: In function `_convert_divisor_to_multiple': numpy/core/src/multiarray/descriptor.c:606: warning: 'q' might be used uninitialized in this function numpy/core/src/multiarray/nditer.c.src: In function `npyiter_allocate_arrays': numpy/core/src/multiarray/nditer.c.src:4923: warning: 'innershape' might be used uninitialized in this function numpy/core/src/multiarray/multiarraymodule_onefile.c: At top level: numpy/core/src/multiarray/scalartypes.c.src:2550: warning: 'longlong_arrtype_hash' defined but not used numpy/core/src/multiarray/mapping.c:75: warning: '_array_ass_item' defined but not used build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h:227: warning: '_import_umath' defined but not used gcc -pthread -shared build/temp.linux-x86_64-2.7/numpy/core/src/multiarray/multiarraymodule_onefile.o -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/multiarray.so building 'numpy.core.umath' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC creating build/temp.linux-x86_64-2.7/numpy/core/src/umath compile options: '-Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray 
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' gcc: numpy/core/src/umath/umathmodule_onefile.c numpy/core/include/numpy/npy_3kcompat.h:392: warning: 'simple_capsule_dtor' defined but not used numpy/core/src/private/lowlevel_strided_loops.h:36: warning: 'PyArray_FreeStridedTransferData' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:43: warning: 'PyArray_CopyStridedTransferData' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:64: warning: 'PyArray_GetStridedCopyFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:78: warning: 'PyArray_GetStridedCopySwapFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:92: warning: 'PyArray_GetStridedCopySwapPairFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:109: warning: 'PyArray_GetStridedZeroPadCopyFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:120: warning: 'PyArray_GetStridedNumericCastFn' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:174: warning: 'PyArray_GetDTypeTransferFunction' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:227: warning: 'PyArray_TransferNDimToStrided' declared `static' but never defined numpy/core/src/private/lowlevel_strided_loops.h:237: warning: 'PyArray_TransferStridedToNDim' declared `static' but never defined gcc -pthread -shared build/temp.linux-x86_64-2.7/numpy/core/src/umath/umathmodule_onefile.o -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/umath.so building 'numpy.core.scalarmath' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' gcc: build/src.linux-x86_64-2.7/numpy/core/src/scalarmathmodule.c numpy/core/src/scalarmathmodule.c.src:1054: warning: function declaration isn't a prototype numpy/core/include/numpy/npy_3kcompat.h:392: warning: 'simple_capsule_dtor' defined but not used gcc -pthread -shared build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/scalarmathmodule.o -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/scalarmath.so building 'numpy.core._dotblas' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC creating build/temp.linux-x86_64-2.7/numpy/core/blasdot compile options: '-DATLAS_INFO="\"3.7.11\"" -Inumpy/core/blasdot -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/amosa/triad/tools/python/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c' gcc: 
numpy/core/blasdot/_dotblas.c numpy/core/blasdot/_dotblas.c: In function `dotblas_matrixproduct': numpy/core/blasdot/_dotblas.c:239: warning: comparison of distinct pointer types lacks a cast numpy/core/blasdot/_dotblas.c:257: warning: passing arg 3 of pointer to function from incompatible pointer type numpy/core/blasdot/_dotblas.c:292: warning: passing arg 3 of pointer to function from incompatible pointer type gcc -pthread -shared build/temp.linux-x86_64-2.7/numpy/core/blasdot/_dotblas.o -L/usr/lib64 -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lf77blas -lcblas -latlas -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/_dotblas.so /usr/bin/ld: /usr/lib64/libcblas.a(cblas_dgemm.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC /usr/lib64/libcblas.a: could not read symbols: Bad value collect2: ld returned 1 exit status /usr/bin/ld: /usr/lib64/libcblas.a(cblas_dgemm.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC /usr/lib64/libcblas.a: could not read symbols: Bad value collect2: ld returned 1 exit status error: Command "gcc -pthread -shared build/temp.linux-x86_64-2.7/numpy/core/blasdot/_dotblas.o -L/usr/lib64 -L/state/partition1/home/amosa/triad/tools/python/lib -Lbuild/temp.linux-x86_64-2.7 -lf77blas -lcblas -latlas -lpython2.7 -o build/lib.linux-x86_64-2.7/numpy/core/_dotblas.so" failed with exit status 1 From njs at pobox.com Tue Jan 3 15:02:24 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 3 Jan 2012 12:02:24 -0800 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: On Tue, Jan 3, 2012 at 9:46 AM, Ognen Duzlevski wrote: > Hello, > > I am playing with adding an enum dtype to numpy (to get my feet wet in > numpy really). I have looked at the > https://github.com/martinling/numpy_quaternion and I feel comfortable > with my understanding of adding a simple type to numpy in technical > terms. Hi Ognen, I'm in the middle of an intercontinental move, so I can't help much, but I'd also love to see a proper enum/categorical type in numpy, so here are a few notes: - I wrote a simple cython implementation of this last year, which might be useful -- code attached. - The barrier I ran into, which you'll surely run into as well, is a flaw in the ufunc API in numpy. Currently, ufunc inner loops do not have any way to access the dtype of the array they are being called on. For most dtypes, this isn't an issue -- the inner loop for adding together int32's knows that it is being called on an array of int32's, it doesn't need to see the dtype to figure that out. But with enums, each array has a different set of possible categories, and these will be attached to the dtype object somehow. So if you want to do, say, equality comparison between an enum-array and a string-array: np.enumarray(["a"", "b", "c"]) == ["a", "c", "b"] -> np.array([True, False, True]) ...you can't actually make this work in current numpy. The solution is that the ufunc API needs to be changed to make dtype's somehow available to inner loops. (Probably by passing a pointer to the array object, like all the PyArray_ArrFuncs do.) See this thread: http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052401.html - Both the statistical folk (pandas, statsmodels) and the hdf5 folk (pytables, h5py) have reasons to want better enum support. (Maybe there are other use cases too -- anyone I'm forgetting?) 
You should make sure to talk to both groups to make sure what you come up with will work for them. Cheers, -- Nathaniel > I am mostly a C programmer and have programmed in Python but not at > the level where my code wcould be considered "pretty" or maybe even > "pythonic". I know enums from C and have browsed around a few python > enum implementations online. Most of them use hash tables or lists to > associate names to numbers - these approaches just feel "heavy" to me. > > What would be a proper "numpy approach" to this? I am looking mostly > for direction and advice as I would like to do the work myself :-) > > Any input appreciated :-) > Ognen > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: npenum.pyx Type: application/octet-stream Size: 12481 bytes Desc: not available URL: From ognen at enthought.com Tue Jan 3 17:34:42 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Tue, 3 Jan 2012 16:34:42 -0600 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: Nathaniel, On Tue, Jan 3, 2012 at 2:02 PM, Nathaniel Smith wrote: > On Tue, Jan 3, 2012 at 9:46 AM, Ognen Duzlevski wrote: >> Hello, >> >> I am playing with adding an enum dtype to numpy (to get my feet wet in >> numpy really). I have looked at the >> https://github.com/martinling/numpy_quaternion and I feel comfortable >> with my understanding of adding a simple type to numpy in technical >> terms. > > Hi Ognen, > > I'm in the middle of an intercontinental move, so I can't help much, > but I'd also love to see a proper enum/categorical type in numpy, so > here are a few notes: > > - I wrote a simple cython implementation of this last year, which > might be useful -- code attached. > > - The barrier I ran into, which you'll surely run into as well, is a > flaw in the ufunc API in numpy. Currently, ufunc inner loops do not > have any way to access the dtype of the array they are being called > on. For most dtypes, this isn't an issue -- the inner loop for adding > together int32's knows that it is being called on an array of int32's, > it doesn't need to see the dtype to figure that out. But with enums, > each array has a different set of possible categories, and these will > be attached to the dtype object somehow. So if you want to do, say, > equality comparison between an enum-array and a string-array: > ?np.enumarray(["a"", "b", "c"]) == ["a", "c", "b"] -> np.array([True, > False, True]) > ...you can't actually make this work in current numpy. The solution is > that the ufunc API needs to be changed to make dtype's somehow > available to inner loops. (Probably by passing a pointer to the array > object, like all the PyArray_ArrFuncs do.) > > See this thread: > http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052401.html > > - Both the statistical folk (pandas, statsmodels) and the hdf5 folk > (pytables, h5py) have reasons to want better enum support. (Maybe > there are other use cases too -- anyone I'm forgetting?) You should > make sure to talk to both groups to make sure what you come up with > will work for them. > > Cheers, > -- Nathaniel Thanks! The above input is exactly what I was looking for (in addition to my original question). 
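(For the "categorical variable" case Wes and Nathaniel describe, the behaviour I
want to support is roughly what np.unique with return_inverse=True already gives
today -- integer codes plus a small table of labels. This is only a sketch of the
semantics, not the dtype API I would propose:

>>> import numpy as np
>>> raw = np.array(['red', 'green', 'red', 'blue', 'green'])
>>> categories, codes = np.unique(raw, return_inverse=True)
>>> categories                  # each label stored once
array(['blue', 'green', 'red'], dtype='|S5')
>>> codes                       # compact integer representation of the data
array([2, 1, 2, 0, 1])
>>> categories[codes]           # maps back to the original strings
array(['red', 'green', 'red', 'blue', 'green'], dtype='|S5')

The open question is how to attach the categories to the dtype so the codes array
can stand on its own.)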
This "corner case" knowledge is good to have ;) Ognen From wonjunchoi001 at gmail.com Tue Jan 3 22:56:52 2012 From: wonjunchoi001 at gmail.com (Wonjun, Choi) Date: Tue, 3 Jan 2012 19:56:52 -0800 (PST) Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython? Message-ID: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com> hello, what is the best way to pass c, c++ array to numpy in cython? or what is the best way to pass fortran multi-dimensional array to numpy in cython? Wonjun, Choi From questions.anon at gmail.com Tue Jan 3 23:10:43 2012 From: questions.anon at gmail.com (questions anon) Date: Wed, 4 Jan 2012 15:10:43 +1100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: References: Message-ID: Thanks for your responses but I am still having difficuties with this problem. Using argmax gives me one very large value and I am not sure what it is. There shouldn't be any issues with the shape. The latitude and longitude are the same shape always (covering a state) and the temperature (TSFC) data are hourly for a whole month. Are there any other ideas for finding the location and time of the maximum value in an array? Thanks On Wed, Dec 21, 2011 at 3:38 PM, Benjamin Root wrote: > > > On Tuesday, December 20, 2011, questions anon > wrote: > > ok thanks, a quick try at using it resulted in: > > IndexError: index out of bounds > > but I may need to do abit more investigating to understand how it works. > > thanks > > The assumption is that these arrays are all the same shape. If not, then > extra work is needed to figure out how to map indices of the temperature > array to the indices of the lat and Lon arrays. > > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Wed Jan 4 01:07:31 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Wed, 4 Jan 2012 00:07:31 -0600 Subject: [Numpy-discussion] Enum type In-Reply-To: References: Message-ID: A categorical type (or enum type) is an important dtype to add to NumPy. It would be very nice if the option existed to make the categorical dtype "dynamic" in that the categories can grow as more data is added or inserted into the array. This would effectively allow binning of data on insertion into the array. The option would need to exist to have both "fixed" and "dynamic" dtypes because there are important use-cases for both. -Travis On Jan 3, 2012, at 2:02 PM, Nathaniel Smith wrote: > On Tue, Jan 3, 2012 at 9:46 AM, Ognen Duzlevski wrote: >> Hello, >> >> I am playing with adding an enum dtype to numpy (to get my feet wet in >> numpy really). I have looked at the >> https://github.com/martinling/numpy_quaternion and I feel comfortable >> with my understanding of adding a simple type to numpy in technical >> terms. > > Hi Ognen, > > I'm in the middle of an intercontinental move, so I can't help much, > but I'd also love to see a proper enum/categorical type in numpy, so > here are a few notes: > > - I wrote a simple cython implementation of this last year, which > might be useful -- code attached. > > - The barrier I ran into, which you'll surely run into as well, is a > flaw in the ufunc API in numpy. Currently, ufunc inner loops do not > have any way to access the dtype of the array they are being called > on. 
For most dtypes, this isn't an issue -- the inner loop for adding > together int32's knows that it is being called on an array of int32's, > it doesn't need to see the dtype to figure that out. But with enums, > each array has a different set of possible categories, and these will > be attached to the dtype object somehow. So if you want to do, say, > equality comparison between an enum-array and a string-array: > np.enumarray(["a"", "b", "c"]) == ["a", "c", "b"] -> np.array([True, > False, True]) > ...you can't actually make this work in current numpy. The solution is > that the ufunc API needs to be changed to make dtype's somehow > available to inner loops. (Probably by passing a pointer to the array > object, like all the PyArray_ArrFuncs do.) > > See this thread: > http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052401.html > > - Both the statistical folk (pandas, statsmodels) and the hdf5 folk > (pytables, h5py) have reasons to want better enum support. (Maybe > there are other use cases too -- anyone I'm forgetting?) You should > make sure to talk to both groups to make sure what you come up with > will work for them. > > Cheers, > -- Nathaniel > >> I am mostly a C programmer and have programmed in Python but not at >> the level where my code wcould be considered "pretty" or maybe even >> "pythonic". I know enums from C and have browsed around a few python >> enum implementations online. Most of them use hash tables or lists to >> associate names to numbers - these approaches just feel "heavy" to me. >> >> What would be a proper "numpy approach" to this? I am looking mostly >> for direction and advice as I would like to do the work myself :-) >> >> Any input appreciated :-) >> Ognen >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From dhruvkaran at gmail.com Wed Jan 4 01:25:10 2012 From: dhruvkaran at gmail.com (Dhruvkaran Mehta) Date: Tue, 3 Jan 2012 22:25:10 -0800 Subject: [Numpy-discussion] SParse feature vector generation Message-ID: Hi numpy users, *Is there a convenient way in numpy to go from "string" features like:* "uc_berkeley", "google", 1 "stanford", "intel", 1 . . . "uiuc", "texas_instruments", 0 *to a numpy matrix like:* "uc_berkeley", "stanford", ..., "uiuc", "google", "intel", "texas_instruments", "bool" 1 0 ... 0 1 0 0 1 0 1 ... 0 0 1 0 1 : 0 0 ... 1 0 0 1 0 I really appreciate you taking the time to help! Thanks! --Dhruv -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Jan 4 01:28:45 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 4 Jan 2012 07:28:45 +0100 Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython? In-Reply-To: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com> References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com> Message-ID: <20120104062845.GB22809@phare.normalesup.org> On Tue, Jan 03, 2012 at 07:56:52PM -0800, Wonjun, Choi wrote: > what is the best way to pass c, c++ array to numpy in cython? 
I don't know if it is the best way, but I wrote a self-contained example a little while ago, to explain to people one way of doing it: http://gael-varoquaux.info/blog/?p=157 For multidimensional arrays, all you have to do is to pass in the full shape and number of dimensions in the call to PyArray_SimpleNewFromData. Hope this helps, Gael From xantares09 at hotmail.com Wed Jan 4 05:22:39 2012 From: xantares09 at hotmail.com (xantares 09) Date: Wed, 4 Jan 2012 10:22:39 +0000 Subject: [Numpy-discussion] PyInt and Numpy's int64 conversion In-Reply-To: References: , , , Message-ID: > From: wesmckinn at gmail.com > Date: Sat, 24 Dec 2011 19:51:06 -0500 > To: numpy-discussion at scipy.org > Subject: Re: [Numpy-discussion] PyInt and Numpy's int64 conversion > > On Sat, Dec 24, 2011 at 3:11 AM, xantares 09 wrote: > > > > > >> From: wesmckinn at gmail.com > >> Date: Fri, 23 Dec 2011 12:31:45 -0500 > >> To: numpy-discussion at scipy.org > >> Subject: Re: [Numpy-discussion] PyInt and Numpy's int64 conversion > > > >> > >> On Fri, Dec 23, 2011 at 4:37 AM, xantares 09 > >> wrote: > >> > Hi, > >> > > >> > I'm using Numpy from the C python api side while tweaking my SWIG > >> > interface > >> > to work with numpy array types. > >> > I want to convert a numpy array of integers (whose elements are numpy's > >> > 'int64') > >> > The problem is that it this int64 type is not compatible with the > >> > standard > >> > python integer type: > >> > I cannot use PyInt_Check, and PyInt_AsUnsignedLongMask to check and > >> > convert > >> > from int64: basically PyInt_Check returns false. > >> > I checked the numpy config header and npy_int64 does have a size of 8o, > >> > which should be the same as int on my x86_64. > >> > What is the correct way to do that ? > >> > I checked for a Int64_Check function and didn't find any in numpy > >> > headers. > >> > > >> > Regards, > >> > > >> > x. > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> > >> hello, > >> > >> I think you'll want to use the C macro PyArray_IsIntegerScalar, e.g. > >> in pandas I have the following function exposed to my Cython code: > >> > >> PANDAS_INLINE int > >> is_integer_object(PyObject* obj) { > >> return PyArray_IsIntegerScalar(obj); > >> } > >> > >> last time I checked that macro detects Python int, long, and all of > >> the NumPy integer hierarchy (int8, 16, 32, 64). If you ONLY want to > >> check for int64 I am not 100% sure the best way. > >> > >> - Wes > > > > Hi, > > > > Thank you for your reply ! > > > > That's the thing : I want to check/convert every type of integer, numpy's > > int64 and also python standard ints. > > Is there a way to avoid to use only the python api ? ( and avoid to depend > > on numpy's PyArray_* functions ) > > > > Regards. > > > > x. > > > > > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > No. All of the PyTypeObject objects for the NumPy array scalars are > explicitly part of the NumPy C API so you have no choice but to depend > on that (to get the best performance). 
If you want to ONLY check for > int64 at the C API level, I did a bit of digging and the relevant type > definitions are in > > https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/npy_common.h > > so you'll want to do: > > int is_int64(PyObject* obj){ > return PyObject_TypeCheck(obj, &PyInt64ArrType_Type); > } > > and that will *only* detect np.int64 > > - Wes Ok many thanks ! One last thing, do you happen to know how to actually convert an np int64 to a C int ? - x. -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Wed Jan 4 06:29:36 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Wed, 4 Jan 2012 12:29:36 +0100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: References: Message-ID: On 04.01.2012, at 5:10AM, questions anon wrote: > Thanks for your responses but I am still having difficuties with this problem. Using argmax gives me one very large value and I am not sure what it is. > There shouldn't be any issues with the shape. The latitude and longitude are the same shape always (covering a state) and the temperature (TSFC) data are hourly for a whole month. There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == LON.shape One needs more information on the structure of these data to say anything definite, but if e.g. your TSFC data have a time and a location dimension, argmax will per default return the index for the flattened array (see the argmax documentation for details, and how to use the axis keyword to get a different output). This might be the very large value you mention, and if your location data have fewer dimensions, the index will easily be out of range. As Ben wrote, you'd need extra work to find the maximum location, depending on what maximum you are actually looking for. As a speculative example, let's assume you have the temperature data in an array(ntime, nloc) and the position data in array(nloc). Then TSFC.argmax(axis=1) would give you the index for the hottest place for each hour of the month (i.e. actually an array of ntime indices, and pointer to so many different locations). To locate the maximum temperature for the entire month, your best way would probably be to first extract the array of (monthly) maximum temperatures in each location as tmax = TSFC.max(axis=0) which would have (in this example) the shape (nloc,), so you could directly use it to index LAT[tmax.argmax()] etc. Cheers, Derek From wonjunchoi001 at gmail.com Wed Jan 4 21:26:05 2012 From: wonjunchoi001 at gmail.com (=?EUC-KR?B?w9a/+MHY?=) Date: Thu, 5 Jan 2012 11:26:05 +0900 Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython? In-Reply-To: <20120104062845.GB22809@phare.normalesup.org> References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com> <20120104062845.GB22809@phare.normalesup.org> Message-ID: it seems like you recommend below way. Cython example of exposing C-computed arrays in Python without data copies https://gist.github.com/1249305 but it uses malloc. isn't it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu Jan 5 02:03:38 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 5 Jan 2012 08:03:38 +0100 Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython? 
In-Reply-To: 
References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com>
	<20120104062845.GB22809@phare.normalesup.org>
Message-ID: <20120105070338.GD21804@phare.normalesup.org>

On Thu, Jan 05, 2012 at 11:26:05AM +0900, Wonjun Choi wrote:
> it seems like you recommend below way.
> Cython example of exposing C-computed arrays in Python without data copies
> [1]https://gist.github.com/1249305
> but it uses malloc. isn't it?

In this example, the data can be allocated the way you want in C. Malloc is
just an implementation detail, you can write the code you want instead.

Gael

From wonjunchoi001 at gmail.com  Thu Jan  5 02:04:52 2012
From: wonjunchoi001 at gmail.com (=?EUC-KR?B?w9a/+MHY?=)
Date: Thu, 5 Jan 2012 16:04:52 +0900
Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython?
In-Reply-To: <20120105070338.GD21804@phare.normalesup.org>
References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com>
	<20120104062845.GB22809@phare.normalesup.org>
	<20120105070338.GD21804@phare.normalesup.org>
Message-ID: 

can I pass the array without malloc?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gael.varoquaux at normalesup.org  Thu Jan  5 04:08:44 2012
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Thu, 5 Jan 2012 10:08:44 +0100
Subject: [Numpy-discussion] what is the best way to pass c, c++ array to numpy in cython?
In-Reply-To: 
References: <3e298b7c-e22f-4734-8b53-cd70e3f80fe8@i8g2000vbh.googlegroups.com>
	<20120104062845.GB22809@phare.normalesup.org>
	<20120105070338.GD21804@phare.normalesup.org>
Message-ID: <20120105090844.GA17920@phare.normalesup.org>

On Thu, Jan 05, 2012 at 04:04:52PM +0900, Wonjun Choi wrote:
> can I pass the array without malloc?

An array is a pointer in C, so yes you can do what you want.

G

From barthelemy at crans.org  Thu Jan  5 09:02:30 2012
From: barthelemy at crans.org (=?ISO-8859-1?Q?S=E9bastien_Barth=E9l=E9my?=)
Date: Thu, 5 Jan 2012 15:02:30 +0100
Subject: [Numpy-discussion] Error in numpy.load example?
Message-ID: 

Hi all,

the doc http://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html
contains the following example:

Store compressed data to disk, and load it again:

>>> np.savez('/tmp/123.npz', a=np.array([[1, 2, 3], [4, 5, 6]]), b=np.array([1, 2]))
>>> data = np.load('/tmp/123.npy')

However
http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html
says:

numpy.savez(file, *args, **kwds)

Save several arrays into a single file in uncompressed .npz format.

Moreover, this last page points to an undocumented numpy.savez_compressed
function, which is also non-existent in my version of numpy (1.5.1-2ubuntu2).

That's quite confusing.

From the following thread, it seems the arrays are stored uncompressed in a
zipfile.
http://article.gmane.org/gmane.comp.python.numeric.general/38378

Can somebody confirm that/fix the docs?

Cheers

-- 
Sébastien

From jsalvati at u.washington.edu  Thu Jan  5 14:22:50 2012
From: jsalvati at u.washington.edu (John Salvatier)
Date: Thu, 5 Jan 2012 11:22:50 -0800
Subject: [Numpy-discussion] "Symbol table not found" compiling numpy from git repository on Windows
Message-ID: 

Hello, I'm trying to compile numpy on Windows 7 using the command:

"python setup.py config --compiler=mingw32 build" but I get an error about
a symbol table not found. Anyone know how to work around this or what to
look into?
building library "npymath" sources Building msvcr library: "C:\Python26\libs\libmsvcr90.a" (from C:\Windows\winsxs\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.21022.8_none_750b37ff97f4f68b\msvcr90.dll) objdump.exe: C:\Windows\winsxs\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.21022.8_none_750b37ff97f4f68b\msvcr90.dll: File format not recognized Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Python26\lib\distutils\core.py", line 152, in setup dist.run_commands() File "C:\Python26\lib\distutils\dist.py", line 975, in run_commands self.run_command(cmd) File "C:\Python26\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "C:\Python26\lib\distutils\command\build.py", line 134, in run self.run_command(cmd_name) File "C:\Python26\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Python26\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 646, in get_mathlib_info st = config_cmd.try_link('int main(void) { return 0;}') File "C:\Python26\lib\distutils\command\config.py", line 257, in try_link self._check_compiler() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\config.py", line 45, in _check_compiler old_config._check_compiler(self) File "C:\Python26\lib\distutils\command\config.py", line 107, in _check_compiler dry_run=self.dry_run, force=1) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\ccompiler.py", line 560, in new_compiler compiler = klass(None, dry_run, force) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py", line 94, in __init__ msvcr_success = build_msvcr_library() File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py", line 362, in build_msvcr_library generate_def(dll_file, def_file) File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py", line 282, in generate_def raise ValueError("Symbol table not found") ValueError: Symbol table not found Thank you, John -------------- next part -------------- An HTML attachment was scrubbed... URL: From schut at sarvision.nl Fri Jan 6 04:52:41 2012 From: schut at sarvision.nl (Vincent Schut) Date: Fri, 6 Jan 2012 09:52:41 +0000 (UTC) Subject: [Numpy-discussion] find location of maximum values References: Message-ID: On Wed, 04 Jan 2012 12:29:36 +0100, Derek Homeier wrote: > On 04.01.2012, at 5:10AM, questions anon wrote: > >> Thanks for your responses but I am still having difficuties with this >> problem. Using argmax gives me one very large value and I am not sure >> what it is. 
it is the index in the flattened array. To translate this into a multidimensional index, use numpy.unravel_index(i, original_shape). Cheers, Vincent. From dkoepfer at gmx.de Fri Jan 6 07:15:22 2012 From: dkoepfer at gmx.de (=?iso-8859-1?Q?=22David_K=F6pfer=22?=) Date: Fri, 06 Jan 2012 13:15:22 +0100 Subject: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method In-Reply-To: References: Message-ID: <20120106121522.203240@gmx.net> Dear numpy community, I'm trying to create an array of type object. A = empty(9, dtype=object) A[ array(0,1,2) ] = MyObject(1) A[ array(3,4,5) ] = MyObject(2) A[ array(6,7,8) ] = MyObject(3) This has worked well until MyObject has gotten an __getitem__ method. Now python (as it is usually supposed to) assigns A[0] to MyObject(1)[0], [1] to MyObject(1)[1] and so on. Is there any way to just get a reference of the instance of MyObject into every entry of the array slice? Thank you for any help on this problem David From ralf.gommers at googlemail.com Fri Jan 6 10:15:09 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 6 Jan 2012 16:15:09 +0100 Subject: [Numpy-discussion] numpy 1.7.0 release? In-Reply-To: References: Message-ID: On Tue, Dec 20, 2011 at 9:28 PM, Ralf Gommers wrote: > > > On Tue, Dec 20, 2011 at 3:18 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi Ralf, >> >> On Mon, Dec 5, 2011 at 12:43 PM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> Hi all, >>> >>> It's been a little over 6 months since the release of 1.6.0 and the NA >>> debate has quieted down, so I'd like to ask your opinion on the timing of >>> 1.7.0. It looks to me like we have a healthy amount of bug fixes and small >>> improvements, plus three larger chucks of work: >>> >>> - datetime >>> - NA >>> - Bento support >>> >>> My impression is that both datetime and NA are releasable, but should be >>> labeled "tech preview" or something similar, because they may still see >>> significant changes. Please correct me if I'm wrong. >>> >>> There's still some maintenance work to do and pull requests to merge, >>> but a beta release by Christmas should be feasible. What do you all think? >>> >>> >> I'm now thinking that is too optimistic. There are a fair number of >> tickets that need to be looked at, including some for einsum and the >> iterator, and I think the number of pull requests needs to be reduced. How >> about sometime in the beginning of January? >> >> > Yes, it certainly was. Besides the tickets and pull requests, we also need > the support for MinGW 4.x that David is looking at. If that goes smoothly > then the first week of January may be feasible, otherwise it'll have to be > February (I'm traveling for most of Jan). Or someone else has to volunteer > to be the release manager for this release. > There isn't really much progress here. Besides a few smaller issues that still need attention, I think the MinGW 4.x issue is a blocker and needs to be resolved. This can be done either by making it work, or deciding to stick with 3.x. In the latter case numpy.datetime should be fixed somehow. For the next three weeks I'm traveling and won't be able to do any work on numpy. I propose to keep master in a state that's (close to being) releasable until the blocker issue is resolved and we can create a 1.7.x branch. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wesmckinn at gmail.com Fri Jan 6 16:00:00 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 6 Jan 2012 16:00:00 -0500 Subject: [Numpy-discussion] PyInt and Numpy's int64 conversion In-Reply-To: References: Message-ID: On Wed, Jan 4, 2012 at 5:22 AM, xantares 09 wrote: > > >> From: wesmckinn at gmail.com >> Date: Sat, 24 Dec 2011 19:51:06 -0500 > >> To: numpy-discussion at scipy.org >> Subject: Re: [Numpy-discussion] PyInt and Numpy's int64 conversion >> >> On Sat, Dec 24, 2011 at 3:11 AM, xantares 09 >> wrote: >> > >> > >> >> From: wesmckinn at gmail.com >> >> Date: Fri, 23 Dec 2011 12:31:45 -0500 >> >> To: numpy-discussion at scipy.org >> >> Subject: Re: [Numpy-discussion] PyInt and Numpy's int64 conversion >> > >> >> >> >> On Fri, Dec 23, 2011 at 4:37 AM, xantares 09 >> >> wrote: >> >> > Hi, >> >> > >> >> > I'm using Numpy from the C python api side while tweaking my SWIG >> >> > interface >> >> > to work with numpy array types. >> >> > I want to convert a numpy array of integers (whose elements are >> >> > numpy's >> >> > 'int64') >> >> > The problem is that it this int64 type is not compatible with the >> >> > standard >> >> > python integer type: >> >> > I cannot use PyInt_Check, and PyInt_AsUnsignedLongMask to check and >> >> > convert >> >> > from int64: basically PyInt_Check returns false. >> >> > I checked the numpy config header and npy_int64 does have a size of >> >> > 8o, >> >> > which should be the same as int on my x86_64. >> >> > What is the correct way to do that ? >> >> > I checked for a Int64_Check function and didn't find any in numpy >> >> > headers. >> >> > >> >> > Regards, >> >> > >> >> > x. >> >> > >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> >> >> hello, >> >> >> >> I think you'll want to use the C macro PyArray_IsIntegerScalar, e.g. >> >> in pandas I have the following function exposed to my Cython code: >> >> >> >> PANDAS_INLINE int >> >> is_integer_object(PyObject* obj) { >> >> return PyArray_IsIntegerScalar(obj); >> >> } >> >> >> >> last time I checked that macro detects Python int, long, and all of >> >> the NumPy integer hierarchy (int8, 16, 32, 64). If you ONLY want to >> >> check for int64 I am not 100% sure the best way. >> >> >> >> - Wes >> > >> > Hi, >> > >> > Thank you for your reply ! >> > >> > That's the thing : I want to check/convert every type of integer, >> > numpy's >> > int64 and also python standard ints. >> > Is there a way to avoid to use only the python api ? ( and avoid to >> > depend >> > on numpy's PyArray_* functions ) >> > >> > Regards. >> > >> > x. >> > >> > >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> No. All of the PyTypeObject objects for the NumPy array scalars are >> explicitly part of the NumPy C API so you have no choice but to depend >> on that (to get the best performance). 
If you want to ONLY check for >> int64 at the C API level, I did a bit of digging and the relevant type >> definitions are in >> >> >> https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/npy_common.h >> >> so you'll want to do: >> >> int is_int64(PyObject* obj){ >> return PyObject_TypeCheck(obj, &PyInt64ArrType_Type); >> } >> >> and that will *only* detect np.int64 >> >> - Wes > > Ok many thanks ! > > One last thing, do you happen to know how to actually convert an np int64 to > a C int ? > > - x. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Not sure off-hand. You'll have to look at the NumPy scalar API in the C code From travis at continuum.io Fri Jan 6 16:31:45 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 6 Jan 2012 15:31:45 -0600 Subject: [Numpy-discussion] PyInt and Numpy's int64 conversion In-Reply-To: References: Message-ID: >>> >>> No. All of the PyTypeObject objects for the NumPy array scalars are >>> explicitly part of the NumPy C API so you have no choice but to depend >>> on that (to get the best performance). If you want to ONLY check for >>> int64 at the C API level, I did a bit of digging and the relevant type >>> definitions are in >>> >>> >>> https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/npy_common.h >>> >>> so you'll want to do: >>> >>> int is_int64(PyObject* obj){ >>> return PyObject_TypeCheck(obj, &PyInt64ArrType_Type); >>> } >>> >>> and that will *only* detect np.int64 >>> >>> - Wes >> >> Ok many thanks ! >> >> One last thing, do you happen to know how to actually convert an np int64 to >> a C int ? >> >> - x. > > Not sure off-hand. You'll have to look at the NumPy scalar API in the C code What is it you want to do? Do you want to get the C int out of the np.int64 *Python* object? If so, you do: npy_int64 val PyArray_ScalarAsCtype(obj, &val); If you want to get the C int of a *different* type out of the scalar Python object, you do: npy_int32 val PyArray_Descr * outcode = PyArray_DescrFromType(NPY_INT32); PyArray_CastScalarToCtype(obj, &val, outcode); Py_DECREF(outcode); -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Fri Jan 6 20:45:31 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 6 Jan 2012 19:45:31 -0600 Subject: [Numpy-discussion] numpy 1.7.0 release? In-Reply-To: References: Message-ID: On Fri, Jan 6, 2012 at 9:15 AM, Ralf Gommers wrote: > > > On Tue, Dec 20, 2011 at 9:28 PM, Ralf Gommers > wrote: >> >> >> >> On Tue, Dec 20, 2011 at 3:18 PM, Charles R Harris >> wrote: >>> >>> Hi Ralf, >>> >>> On Mon, Dec 5, 2011 at 12:43 PM, Ralf Gommers >>> wrote: >>>> >>>> Hi all, >>>> >>>> It's been a little over 6 months since the release of 1.6.0 and the NA >>>> debate has quieted down, so I'd like to ask your opinion on the timing of >>>> 1.7.0. It looks to me like we have a healthy amount of bug fixes and small >>>> improvements, plus three larger chucks of work: >>>> >>>> - datetime >>>> - NA >>>> - Bento support >>>> >>>> My impression is that both datetime and NA are releasable, but should be >>>> labeled "tech preview" or something similar, because they may still see >>>> significant changes. Please correct me if I'm wrong. >>>> >>>> There's still some maintenance work to do and pull requests to merge, >>>> but a beta release by Christmas should be feasible. What do you all think? 
>>>> >>> >>> I'm now thinking that is too optimistic. There are a fair number of >>> tickets that need to be looked at, including some for einsum and the >>> iterator, and I think the number of pull requests needs to be reduced. How >>> about sometime in the beginning of January? >>> >> >> Yes, it certainly was. Besides the tickets and pull requests, we also need >> the support for MinGW 4.x that David is looking at. If that goes smoothly >> then the first week of January may be feasible, otherwise it'll have to be >> February (I'm traveling for most of Jan). Or someone else has to volunteer >> to be the release manager for this release. > > > There isn't really much progress here. Besides a few smaller issues that > still need attention, I think the MinGW 4.x issue is a blocker and needs to > be resolved. This can be done either by making it work, or deciding to stick > with 3.x. In the latter case numpy.datetime should be fixed somehow. > > For the next three weeks I'm traveling and won't be able to do any work on > numpy. I propose to keep master in a state that's (close to being) > releasable until the blocker issue is resolved and we can create a 1.7.x > branch. > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > I think that my ticket 1973 (http://projects.scipy.org/numpy/ticket/1973) "Can not display a masked array containing np.NA values even if masked" that is due to the astype function not handling the NA object is also a blocker. Bruce From ralf.gommers at googlemail.com Sat Jan 7 03:11:15 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 7 Jan 2012 09:11:15 +0100 Subject: [Numpy-discussion] numpy 1.7.0 release? In-Reply-To: References: Message-ID: On Sat, Jan 7, 2012 at 2:45 AM, Bruce Southey wrote: > On Fri, Jan 6, 2012 at 9:15 AM, Ralf Gommers > wrote: > > > > > > On Tue, Dec 20, 2011 at 9:28 PM, Ralf Gommers < > ralf.gommers at googlemail.com> > > wrote: > >> > >> > >> > >> On Tue, Dec 20, 2011 at 3:18 PM, Charles R Harris > >> wrote: > >>> > >>> Hi Ralf, > >>> > >>> On Mon, Dec 5, 2011 at 12:43 PM, Ralf Gommers > >>> wrote: > >>>> > >>>> Hi all, > >>>> > >>>> It's been a little over 6 months since the release of 1.6.0 and the NA > >>>> debate has quieted down, so I'd like to ask your opinion on the > timing of > >>>> 1.7.0. It looks to me like we have a healthy amount of bug fixes and > small > >>>> improvements, plus three larger chucks of work: > >>>> > >>>> - datetime > >>>> - NA > >>>> - Bento support > >>>> > >>>> My impression is that both datetime and NA are releasable, but should > be > >>>> labeled "tech preview" or something similar, because they may still > see > >>>> significant changes. Please correct me if I'm wrong. > >>>> > >>>> There's still some maintenance work to do and pull requests to merge, > >>>> but a beta release by Christmas should be feasible. What do you all > think? > >>>> > >>> > >>> I'm now thinking that is too optimistic. There are a fair number of > >>> tickets that need to be looked at, including some for einsum and the > >>> iterator, and I think the number of pull requests needs to be reduced. > How > >>> about sometime in the beginning of January? > >>> > >> > >> Yes, it certainly was. Besides the tickets and pull requests, we also > need > >> the support for MinGW 4.x that David is looking at. 
If that goes > smoothly > >> then the first week of January may be feasible, otherwise it'll have to > be > >> February (I'm traveling for most of Jan). Or someone else has to > volunteer > >> to be the release manager for this release. > > > > > > There isn't really much progress here. Besides a few smaller issues that > > still need attention, I think the MinGW 4.x issue is a blocker and needs > to > > be resolved. This can be done either by making it work, or deciding to > stick > > with 3.x. In the latter case numpy.datetime should be fixed somehow. > > > > For the next three weeks I'm traveling and won't be able to do any work > on > > numpy. I propose to keep master in a state that's (close to being) > > releasable until the blocker issue is resolved and we can create a 1.7.x > > branch. > > > > Ralf > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > I think that my ticket 1973 > (http://projects.scipy.org/numpy/ticket/1973) "Can not display a > masked array containing np.NA values even if masked" that is due to > the astype function not handling the NA object is also a blocker. > > I've set it to Milestone 1.7.0. This should be done for all tickets that are important for this release, so we can keep track of it. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at gmail.com Sun Jan 8 05:47:26 2012 From: faltet at gmail.com (Francesc Alted) Date: Sun, 8 Jan 2012 11:47:26 +0100 Subject: [Numpy-discussion] ANN: Numexpr 2.0 released In-Reply-To: References: <201111271400.48560.faltet@pytables.org> Message-ID: Hi srean, Sorry for being late answering, the latest weeks have been really crazy for me. See my comments below. 2011/12/13 srean : > This is great news, I hope this gets included in the epd distribution soon. > > I had mailed a few questions about numexpr sometime ago. I am still > curious about those. I have included the relevant parts below. In > addition, I have another question. There was a numexpr branch that > allows a "out=blah" parameer to build the output in place, has that > been merged or its functionality incorporated ? Yes, the `out` parameter is fully supported in 2.0 series, as well as new `order` and `casting` ones. These are fully documented in docstrings in forthcoming 2.0.1, as well as in the new User's Guide wiki page at: http://code.google.com/p/numexpr/wiki/UsersGuide Thanks for pointing this out! > This goes without saying, but, thanks for numexpr. > > -- ?from old mail -- > > What I find somewhat encumbering is that there is no single piece of > document that lists all the operators and functions that numexpr can > parse. For a new user this will be very useful There is a list in the > wiki page entitled "overview" but it seems incomplete (for instance it > does not describe the reduction operations available). I do not know > enough to know how incomplete it is. The reduction functions are just `sum()` and `prod()` and are fully documented in the new User's Guide. > > Is there any plan to implement the reduction like enhancements that > ufuncs provide: namely reduce_at, accumulate, reduce ? It is entirely > possible that they are already in there but I could not figure out how > to use them. If they aren't it would be great to have them. 
No, these are not implemented, but we will gladly accept contributions ;) -- Francesc Alted From faltet at gmail.com Sun Jan 8 05:49:53 2012 From: faltet at gmail.com (Francesc Alted) Date: Sun, 8 Jan 2012 11:49:53 +0100 Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released Message-ID: ========================== Announcing Numexpr 2.0.1 ========================== Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's VML library, which allows for squeezing the last drop of performance out of your multi-core processors. What's new ========== In this release, better docstrings for `evaluate` and reduction methods (`sum`, `prod`) is in place. Also, compatibility with Python 2.5 has been restored (2.4 is definitely not supported anymore). In case you want to know more in detail what has changed in this version, see: http://code.google.com/p/numexpr/wiki/ReleaseNotes or have a look at RELEASE_NOTES.txt in the tarball. Where I can find Numexpr? ========================= The project is hosted at Google code in: http://code.google.com/p/numexpr/ You can get the packages from PyPI as well: http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy! -- Francesc Alted From nadavh at visionsense.com Sun Jan 8 06:11:47 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Sun, 8 Jan 2012 03:11:47 -0800 Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released In-Reply-To: References: Message-ID: <26FC23E7C398A64083C980D16001012D261E514754@VA3DIAXVS361.RED001.local> What about python3 support? Thanks Nadav. ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Francesc Alted [faltet at gmail.com] Sent: 08 January 2012 12:49 To: Discussion of Numerical Python; numexpr Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released ========================== Announcing Numexpr 2.0.1 ========================== Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's VML library, which allows for squeezing the last drop of performance out of your multi-core processors. What's new ========== In this release, better docstrings for `evaluate` and reduction methods (`sum`, `prod`) is in place. Also, compatibility with Python 2.5 has been restored (2.4 is definitely not supported anymore). In case you want to know more in detail what has changed in this version, see: http://code.google.com/p/numexpr/wiki/ReleaseNotes or have a look at RELEASE_NOTES.txt in the tarball. Where I can find Numexpr? ========================= The project is hosted at Google code in: http://code.google.com/p/numexpr/ You can get the packages from PyPI as well: http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy! 
-- 
Francesc Alted
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

From faltet at gmail.com  Sun Jan  8 08:08:11 2012
From: faltet at gmail.com (Francesc Alted)
Date: Sun, 8 Jan 2012 14:08:11 +0100
Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released
In-Reply-To: <26FC23E7C398A64083C980D16001012D261E514754@VA3DIAXVS361.RED001.local>
References: <26FC23E7C398A64083C980D16001012D261E514754@VA3DIAXVS361.RED001.local>
Message-ID: 

Python3 is not on my radar yet. Perhaps others might be interested in
doing the port.

Francesc

2012/1/8 Nadav Horesh :
> What about python3 support?
>
>  Thanks
>
>    Nadav.
>
> ________________________________________
> From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Francesc Alted [faltet at gmail.com]
> Sent: 08 January 2012 12:49
> To: Discussion of Numerical Python; numexpr
> Subject: [Numpy-discussion] ANN: Numexpr 2.0.1 released
>
> ==========================
>  Announcing Numexpr 2.0.1
> ==========================
>
> Numexpr is a fast numerical expression evaluator for NumPy.  With it,
> expressions that operate on arrays (like "3*a+4*b") are accelerated
> and use less memory than doing the same calculation in Python.
>
> It wears multi-threaded capabilities, as well as support for Intel's
> VML library, which allows for squeezing the last drop of performance
> out of your multi-core processors.
>
> What's new
> ==========
>
> In this release, better docstrings for `evaluate` and reduction
> methods (`sum`, `prod`) is in place.  Also, compatibility with Python
> 2.5 has been restored (2.4 is definitely not supported anymore).
>
> In case you want to know more in detail what has changed in this
> version, see:
>
> http://code.google.com/p/numexpr/wiki/ReleaseNotes
>
> or have a look at RELEASE_NOTES.txt in the tarball.
>
> Where I can find Numexpr?
> =========================
>
> The project is hosted at Google code in:
>
> http://code.google.com/p/numexpr/
>
> You can get the packages from PyPI as well:
>
> http://pypi.python.org/pypi/numexpr
>
> Share your experience
> =====================
>
> Let us know of any bugs, suggestions, gripes, kudos, etc. you may
> have.
>
>
> Enjoy!
>
> --
> Francesc Alted
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- 
Francesc Alted

From pierre.haessig at crans.org  Sun Jan  8 08:56:36 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Sun, 08 Jan 2012 14:56:36 +0100
Subject: [Numpy-discussion] Error in numpy.load example?
In-Reply-To: 
References: 
Message-ID: <4F09A094.9050304@crans.org>

Hi Sebastien,

On 05/01/2012 15:02, Sébastien Barthélémy wrote:
> However http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html
> says:
>
> numpy.savez(file, *args, **kwds)
>
> Save several arrays into a single file in uncompressed .npz format.
>
> Moreover, this last page points to an undocumented numpy.savez_compressed
> function, which is also non-existent in my version of numpy (1.5.1-2ubuntu2).
Indeed, this online doc is not consistent with the numpy.savez *version
1.5*... [3] It seems there was an API update between 1.5 and 1.6 ([1][2]).
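For readers hitting the same confusion, here is a minimal sketch of the
1.6-era behaviour. The file names are made up, and `savez_compressed` is
assumed to be available (i.e. NumPy >= 1.6; on 1.5.1 only `savez` exists):

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([1, 2])

# savez writes a single .npz (zip) archive; the arrays inside are *not* compressed
np.savez('/tmp/arrays.npz', a=a, b=b)

# savez_compressed writes the same kind of archive with compressed members
# (assumed available only in NumPy >= 1.6)
np.savez_compressed('/tmp/arrays_small.npz', a=a, b=b)

# load on a .npz file returns an NpzFile object, indexed by the keyword names
data = np.load('/tmp/arrays.npz')
print(data.files)    # the names of the stored arrays, e.g. ['a', 'b']
print(data['a'])     # the 2x3 array saved above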
(As a Debian testing user, I was also unaware of this change. Thanks for pointing that out, at least for me ! ) Now, I think Sebastien's question raises an interesting practical issue : If I'm correct, there is no place in the HTML page where the numpy version is written, except in (and except the root of the manual http://docs.scipy.org/doc/numpy/reference/index.html ) Maybe, there should be some "top page version indicator". And possibly some way (a drop down menu ??) to switch between versions. (I don't know how this would fit in the Sphinx template system however...) What do other people think ? Best, Pierre [1] http://docs.scipy.org/doc/numpy-1.5.x/reference/generated/numpy.savez.html#numpy.savez [2] http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.savez.html#numpy.savez [3] There is even a ticket about this ! http://projects.scipy.org/numpy/ticket/1696 From shish at keba.be Sun Jan 8 16:16:33 2012 From: shish at keba.be (Olivier Delalleau) Date: Sun, 8 Jan 2012 16:16:33 -0500 Subject: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method In-Reply-To: <20120106121522.203240@gmx.net> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <je6g99$mbf$1@dough.gmane.org> <20120106121522.203240@gmx.net> Message-ID: <CAFXk4boAw1gAQA=PFbBsza0rTKsydQbn=k3NWPj9roUk+TdzAw@mail.gmail.com> You could try A[...].fill(MyObject(...)). I haven't tried it myself, so not sure it would work though... -=- Olivier 2012/1/6 "David K?pfer" <dkoepfer at gmx.de> > Dear numpy community, > > I'm trying to create an array of type object. > > A = empty(9, dtype=object) > A[ array(0,1,2) ] = MyObject(1) > A[ array(3,4,5) ] = MyObject(2) > A[ array(6,7,8) ] = MyObject(3) > > This has worked well until MyObject has gotten an __getitem__ method. Now > python (as it is usually supposed to) assigns A[0] to MyObject(1)[0], [1] > to MyObject(1)[1] and so on. > > Is there any way to just get a reference of the instance of MyObject into > every entry of the array slice? > > Thank you for any help on this problem > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120108/7d434a03/attachment.html> From dkoepfer at gmx.de Mon Jan 9 03:21:35 2012 From: dkoepfer at gmx.de (=?iso-8859-1?Q?=22David_K=F6pfer=22?=) Date: Mon, 09 Jan 2012 09:21:35 +0100 Subject: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method In-Reply-To: <CAFXk4boAw1gAQA=PFbBsza0rTKsydQbn=k3NWPj9roUk+TdzAw@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <je6g99$mbf$1@dough.gmane.org> <20120106121522.203240@gmx.net> <CAFXk4boAw1gAQA=PFbBsza0rTKsydQbn=k3NWPj9roUk+TdzAw@mail.gmail.com> Message-ID: <20120109082135.203220@gmx.net> Hi Oliver, thank you very much for your reply, sadly it is not working as you and I hoped. The array still stays at None even after the code. I've also tried A[X] = [MyObject(...)]*len(X) but that just results in a Memory error. So is there really no way to avoid this broadcasting? David -------- Original-Nachricht -------- > Datum: Sun, 8 Jan 2012 16:16:33 -0500 > Von: Olivier Delalleau <shish at keba.be> > An: Discussion of Numerical Python <numpy-discussion at scipy.org> > Betreff: Re: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method > You could try A[...].fill(MyObject(...)). I haven't tried it myself, so > not > sure it would work though... > > -=- Olivier > > 2012/1/6 "David K?pfer" <dkoepfer at gmx.de> > > > Dear numpy community, > > > > I'm trying to create an array of type object. > > > > A = empty(9, dtype=object) > > A[ array(0,1,2) ] = MyObject(1) > > A[ array(3,4,5) ] = MyObject(2) > > A[ array(6,7,8) ] = MyObject(3) > > > > This has worked well until MyObject has gotten an __getitem__ method. > Now > > python (as it is usually supposed to) assigns A[0] to MyObject(1)[0], > [1] > > to MyObject(1)[1] and so on. > > > > Is there any way to just get a reference of the instance of MyObject > into > > every entry of the array slice? 
> > > > Thank you for any help on this problem > > David > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From shish at keba.be Mon Jan 9 10:39:21 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 9 Jan 2012 10:39:21 -0500 Subject: [Numpy-discussion] filling an alice of array of object with a reference to an object that has a __getitem__ method In-Reply-To: <20120109082135.203220@gmx.net> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <je6g99$mbf$1@dough.gmane.org> <20120106121522.203240@gmx.net> <CAFXk4boAw1gAQA=PFbBsza0rTKsydQbn=k3NWPj9roUk+TdzAw@mail.gmail.com> <20120109082135.203220@gmx.net> Message-ID: <CAFXk4bo20OFa1i17cXGRJN-nAjk-HvAMWEQo+KaPufaJ-+h0nA@mail.gmail.com> Oh, sorry, I hadn't paid enough attention to the way you are indexing A: if you are using an array to index, it creates a copy, so using ".fill" will fill the copy and you won't see the result. Instead, use A[0:3], A[3:6], etc. -=- Olivier 2012/1/9 "David K?pfer" <dkoepfer at gmx.de> > Hi Oliver, > > thank you very much for your reply, sadly it is not working as you and I > hoped. The array still stays at None even after the code. > > I've also tried A[X] = [MyObject(...)]*len(X) but that just results in a > Memory error. > > So is there really no way to avoid this broadcasting? > > David > > > -------- Original-Nachricht -------- > > Datum: Sun, 8 Jan 2012 16:16:33 -0500 > > Von: Olivier Delalleau <shish at keba.be> > > An: Discussion of Numerical Python <numpy-discussion at scipy.org> > > Betreff: Re: [Numpy-discussion] filling an alice of array of object with > a reference to an object that has a __getitem__ method > > > You could try A[...].fill(MyObject(...)). I haven't tried it myself, so > > not > > sure it would work though... > > > > -=- Olivier > > > > 2012/1/6 "David K?pfer" <dkoepfer at gmx.de> > > > > > Dear numpy community, > > > > > > I'm trying to create an array of type object. > > > > > > A = empty(9, dtype=object) > > > A[ array(0,1,2) ] = MyObject(1) > > > A[ array(3,4,5) ] = MyObject(2) > > > A[ array(6,7,8) ] = MyObject(3) > > > > > > This has worked well until MyObject has gotten an __getitem__ method. > > Now > > > python (as it is usually supposed to) assigns A[0] to MyObject(1)[0], > > [1] > > > to MyObject(1)[1] and so on. > > > > > > Is there any way to just get a reference of the instance of MyObject > > into > > > every entry of the array slice? > > > > > > Thank you for any help on this problem > > > David > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120109/787b5e5a/attachment.html> From questions.anon at gmail.com Mon Jan 9 17:31:41 2012 From: questions.anon at gmail.com (questions anon) Date: Tue, 10 Jan 2012 09:31:41 +1100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> Message-ID: <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> thanks for the responses. Unfortunately they are not matching shapes >>> print TSFC.shape, TIME.shape, LAT.shape, LON.shape (721, 106, 193) (721,) (106,) (193,) So I still receive index out of bounds error: >>>tmax=TSFC.max(axis=0) numpy array of max values for the month >>>maxindex=tmax.argmax() 2928 >>>maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() 35.5 (degrees celcius) >>>latloc=LAT[tmax.argmax()] IndexError: index out of bounds lonloc=LON[tmax.argmax()] timeloc=TIME[tmax.argmax()] Any other ideas for this type of situation? thanks On Wed, Jan 4, 2012 at 10:29 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > On 04.01.2012, at 5:10AM, questions anon wrote: > > > Thanks for your responses but I am still having difficuties with this > problem. Using argmax gives me one very large value and I am not sure what > it is. > > There shouldn't be any issues with the shape. The latitude and longitude > are the same shape always (covering a state) and the temperature (TSFC) > data are hourly for a whole month. > > There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == > LON.shape > > One needs more information on the structure of these data to say anything > definite, > but if e.g. your TSFC data have a time and a location dimension, argmax > will > per default return the index for the flattened array (see the argmax > documentation > for details, and how to use the axis keyword to get a different output). > This might be the very large value you mention, and if your location data > have fewer > dimensions, the index will easily be out of range. As Ben wrote, you'd > need extra work to > find the maximum location, depending on what maximum you are actually > looking for. > > As a speculative example, let's assume you have the temperature data in an > array(ntime, nloc) and the position data in array(nloc). Then > > TSFC.argmax(axis=1) > > would give you the index for the hottest place for each hour of the month > (i.e. actually an array of ntime indices, and pointer to so many different > locations). > > To locate the maximum temperature for the entire month, your best way > would probably > be to first extract the array of (monthly) maximum temperatures in each > location as > > tmax = TSFC.max(axis=0) > > which would have (in this example) the shape (nloc,), so you could > directly use it to index > > LAT[tmax.argmax()] etc. 
> > Cheers, > Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/6ab1563a/attachment.html> From ben.root at ou.edu Mon Jan 9 18:22:39 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 9 Jan 2012 17:22:39 -0600 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> Message-ID: <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> On Monday, January 9, 2012, questions anon <questions.anon at gmail.com> wrote: > thanks for the responses. > Unfortunately they are not matching shapes >>>> print TSFC.shape, TIME.shape, LAT.shape, LON.shape > (721, 106, 193) (721,) (106,) (193,) > > So I still receive index out of bounds error: >>>>tmax=TSFC.max(axis=0) > numpy array of max values for the month >>>>maxindex=tmax.argmax() > 2928 >>>>maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() > 35.5 (degrees celcius) > >>>>latloc=LAT[tmax.argmax()] > IndexError: index out of bounds > > lonloc=LON[tmax.argmax()] > timeloc=TIME[tmax.argmax()] > > > Any other ideas for this type of situation? > thanks Right, we realize they are not the same shape. When you use argmax on the temperature data, take that index number and use unravel_index(index, TSFC.shape) to get a three-element tuple, each being the index in the TIME, LAT, LON arrays, respectively. Cheers, Ben Root > > On Wed, Jan 4, 2012 at 10:29 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: >> >> On 04.01.2012, at 5:10AM, questions anon wrote: >> >> > Thanks for your responses but I am still having difficuties with this problem. Using argmax gives me one very large value and I am not sure what it is. >> > There shouldn't be any issues with the shape. The latitude and longitude are the same shape always (covering a state) and the temperature (TSFC) data are hourly for a whole month. >> >> There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == LON.shape >> >> One needs more information on the structure of these data to say anything definite, >> but if e.g. your TSFC data have a time and a location dimension, argmax will >> per default return the index for the flattened array (see the argmax documentation >> for details, and how to use the axis keyword to get a different output). >> This might be the very large value you mention, and if your location data have fewer >> dimensions, the index will easily be out of range. As Ben wrote, you'd need extra work to >> find the maximum location, depending on what maximum you are actually looking for. >> >> As a speculative example, let's assume you have the temperature data in an >> array(ntime, nloc) and the position data in array(nloc). 
Then >> >> TSFC.argmax(axis=1) >> >> would give you the index for the hottest place for each hour of the month >> (i.e. actually an array of ntime indices, and pointer to so many different locations). >> >> To locate the maximum temperature for the entire month, your best way would probably >> be to first extract the array of (monthly) maximum temperatures in each location as >> >> tmax = TSFC.max(axis=0) >> >> which would have (in this example) the shape (nloc,), so you could directly use it to index >> >> LAT[tmax.argmax()] etc. >> >> Cheers, >> Derek >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120109/4b1e0db4/attachment.html> From questions.anon at gmail.com Mon Jan 9 20:59:00 2012 From: questions.anon at gmail.com (questions anon) Date: Tue, 10 Jan 2012 12:59:00 +1100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> Message-ID: <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> thank you, I seem to have made some progress (with lots of help)!! I still seem to be having trouble with the time. Because it is hourly data for a whole month I assume that is where my problem lies. When I run the following code I alwayes receive the first timestamp of the file. Not sure how to get around this: tmax=TSFC.max(axis=0) maxindex=tmax.argmax() maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() print maxindex, maxtemp val=N.unravel_index(maxindex, TSFC.shape) listval=list(val) print listval timelocation=TIME[listval[0]] latlocation=LAT[listval[1]] lonlocation=LON[listval[2]] print latlocation, lonlocation cdftime=utime('seconds since 1970-01-01 00:00:00') ncfiletime=cdftime.num2date(timelocation) print ncfiletime On Tue, Jan 10, 2012 at 10:22 AM, Benjamin Root <ben.root at ou.edu> wrote: > > > On Monday, January 9, 2012, questions anon <questions.anon at gmail.com> > wrote: > > thanks for the responses. > > Unfortunately they are not matching shapes > >>>> print TSFC.shape, TIME.shape, LAT.shape, LON.shape > > (721, 106, 193) (721,) (106,) (193,) > > > > So I still receive index out of bounds error: > >>>>tmax=TSFC.max(axis=0) > > numpy array of max values for the month > >>>>maxindex=tmax.argmax() > > 2928 > >>>>maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() > > 35.5 (degrees celcius) > > > >>>>latloc=LAT[tmax.argmax()] > > IndexError: index out of bounds > > > > lonloc=LON[tmax.argmax()] > > timeloc=TIME[tmax.argmax()] > > > > > > Any other ideas for this type of situation? > > thanks > > Right, we realize they are not the same shape. 
When you use argmax on the > temperature data, take that index number and use unravel_index(index, > TSFC.shape) to get a three-element tuple, each being the index in the TIME, > LAT, LON arrays, respectively. > > Cheers, > Ben Root > > > > > > On Wed, Jan 4, 2012 at 10:29 PM, Derek Homeier < > derek at astro.physik.uni-goettingen.de> wrote: > >> > >> On 04.01.2012, at 5:10AM, questions anon wrote: > >> > >> > Thanks for your responses but I am still having difficuties with this > problem. Using argmax gives me one very large value and I am not sure what > it is. > >> > There shouldn't be any issues with the shape. The latitude and > longitude are the same shape always (covering a state) and the temperature > (TSFC) data are hourly for a whole month. > >> > >> There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == > LON.shape > >> > >> One needs more information on the structure of these data to say > anything definite, > >> but if e.g. your TSFC data have a time and a location dimension, argmax > will > >> per default return the index for the flattened array (see the argmax > documentation > >> for details, and how to use the axis keyword to get a different output). > >> This might be the very large value you mention, and if your location > data have fewer > >> dimensions, the index will easily be out of range. As Ben wrote, you'd > need extra work to > >> find the maximum location, depending on what maximum you are actually > looking for. > >> > >> As a speculative example, let's assume you have the temperature data in > an > >> array(ntime, nloc) and the position data in array(nloc). Then > >> > >> TSFC.argmax(axis=1) > >> > >> would give you the index for the hottest place for each hour of the > month > >> (i.e. actually an array of ntime indices, and pointer to so many > different locations). > >> > >> To locate the maximum temperature for the entire month, your best way > would probably > >> be to first extract the array of (monthly) maximum temperatures in each > location as > >> > >> tmax = TSFC.max(axis=0) > >> > >> which would have (in this example) the shape (nloc,), so you could > directly use it to index > >> > >> LAT[tmax.argmax()] etc. > >> > >> Cheers, > >> Derek > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/f538bc81/attachment.html> From shish at keba.be Mon Jan 9 22:39:52 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 9 Jan 2012 22:39:52 -0500 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> Message-ID: <CAFXk4bqfLCrNWyTTxMCszu+K2KpRDHucjxN636be_UcPTOySSg@mail.gmail.com> Do you mean that listval[0] is systematically equal to 0, or is it something else? -=- Olivier 2012/1/9 questions anon <questions.anon at gmail.com> > thank you, I seem to have made some progress (with lots of help)!! > I still seem to be having trouble with the time. Because it is hourly data > for a whole month I assume that is where my problem lies. > When I run the following code I alwayes receive the first timestamp of the > file. Not sure how to get around this: > > tmax=TSFC.max(axis=0) > maxindex=tmax.argmax() > maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() > print maxindex, maxtemp > val=N.unravel_index(maxindex, TSFC.shape) > listval=list(val) > print listval > timelocation=TIME[listval[0]] > latlocation=LAT[listval[1]] > lonlocation=LON[listval[2]] > print latlocation, lonlocation > > cdftime=utime('seconds since 1970-01-01 00:00:00') > ncfiletime=cdftime.num2date(timelocation) > print ncfiletime > > > > On Tue, Jan 10, 2012 at 10:22 AM, Benjamin Root <ben.root at ou.edu> wrote: > >> >> >> On Monday, January 9, 2012, questions anon <questions.anon at gmail.com> >> wrote: >> > thanks for the responses. >> > Unfortunately they are not matching shapes >> >>>> print TSFC.shape, TIME.shape, LAT.shape, LON.shape >> > (721, 106, 193) (721,) (106,) (193,) >> > >> > So I still receive index out of bounds error: >> >>>>tmax=TSFC.max(axis=0) >> > numpy array of max values for the month >> >>>>maxindex=tmax.argmax() >> > 2928 >> >>>>maxtemp=tmax.ravel()[maxindex] #or maxtemp=TSFC.max() >> > 35.5 (degrees celcius) >> > >> >>>>latloc=LAT[tmax.argmax()] >> > IndexError: index out of bounds >> > >> > lonloc=LON[tmax.argmax()] >> > timeloc=TIME[tmax.argmax()] >> > >> > >> > Any other ideas for this type of situation? >> > thanks >> >> Right, we realize they are not the same shape. When you use argmax on >> the temperature data, take that index number and use unravel_index(index, >> TSFC.shape) to get a three-element tuple, each being the index in the TIME, >> LAT, LON arrays, respectively. >> >> Cheers, >> Ben Root >> >> >> > >> > On Wed, Jan 4, 2012 at 10:29 PM, Derek Homeier < >> derek at astro.physik.uni-goettingen.de> wrote: >> >> >> >> On 04.01.2012, at 5:10AM, questions anon wrote: >> >> >> >> > Thanks for your responses but I am still having difficuties with >> this problem. Using argmax gives me one very large value and I am not sure >> what it is. >> >> > There shouldn't be any issues with the shape. 
The latitude and >> longitude are the same shape always (covering a state) and the temperature >> (TSFC) data are hourly for a whole month. >> >> >> >> There will be an issue if not TSFC.shape == TIME.shape == LAT.shape == >> LON.shape >> >> >> >> One needs more information on the structure of these data to say >> anything definite, >> >> but if e.g. your TSFC data have a time and a location dimension, >> argmax will >> >> per default return the index for the flattened array (see the argmax >> documentation >> >> for details, and how to use the axis keyword to get a different >> output). >> >> This might be the very large value you mention, and if your location >> data have fewer >> >> dimensions, the index will easily be out of range. As Ben wrote, you'd >> need extra work to >> >> find the maximum location, depending on what maximum you are actually >> looking for. >> >> >> >> As a speculative example, let's assume you have the temperature data >> in an >> >> array(ntime, nloc) and the position data in array(nloc). Then >> >> >> >> TSFC.argmax(axis=1) >> >> >> >> would give you the index for the hottest place for each hour of the >> month >> >> (i.e. actually an array of ntime indices, and pointer to so many >> different locations). >> >> >> >> To locate the maximum temperature for the entire month, your best way >> would probably >> >> be to first extract the array of (monthly) maximum temperatures in >> each location as >> >> >> >> tmax = TSFC.max(axis=0) >> >> >> >> which would have (in this example) the shape (nloc,), so you could >> directly use it to index >> >> >> >> LAT[tmax.argmax()] etc. >> >> >> >> Cheers, >> >> Derek >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120109/4bed4563/attachment.html> From aronne.merrelli at gmail.com Mon Jan 9 23:28:52 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Mon, 9 Jan 2012 22:28:52 -0600 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> Message-ID: <CAHNdQ4JUCbNXszRMhTXAQF_+51Xdf6+x3FFNpa7J=RiTZ--zVw@mail.gmail.com> On Mon, Jan 9, 2012 at 7:59 PM, questions anon <questions.anon at gmail.com>wrote: > thank you, I seem to have made some progress (with lots of help)!! > I still seem to be having trouble with the time. Because it is hourly data > for a whole month I assume that is where my problem lies. > When I run the following code I alwayes receive the first timestamp of the > file. Not sure how to get around this: > > tmax=TSFC.max(axis=0) > maxindex=tmax.argmax() > You are computing max(axis=0) first. So, tmax is an array containing the maximum temperature at each lat/lon grid point, over the set of 721 months. It will be a [106, 193] array. So the argmax of tmax is an element in a shape [106,193] array (the number of latitude/number of longitude) not the original three dimension [721, 106, 193] array. Thus when you unravel it you can only get the first time value. I re-read your original post but I don't understand what number you need. Are you trying to get the single max value over the entire array? Or max value for each month? (a 721 element vector)? or something else? Cheers, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120109/3d31eaaf/attachment.html> From questions.anon at gmail.com Mon Jan 9 23:40:02 2012 From: questions.anon at gmail.com (questions anon) Date: Tue, 10 Jan 2012 15:40:02 +1100 Subject: [Numpy-discussion] find location of maximum values In-Reply-To: <CAHNdQ4JUCbNXszRMhTXAQF_+51Xdf6+x3FFNpa7J=RiTZ--zVw@mail.gmail.com> References: <CAN_=ogvx95yXcg-WibYD1cWsZoHxd61G7UNwBQQYYneTJQ3hsQ@mail.gmail.com> <CAFXk4brvyEhxf0t_9AtvkxmZWS8dgvH+Q1NV6qn7MAZvzXg_qg@mail.gmail.com> <CAN_=ogvoqjKPnt+xXptaEpJ-JmCq_y2m+7nyOm5kiTH3rnZ7NA@mail.gmail.com> <CANNq6F=B1Y_8jf-qFGqT8d8iYmGriy-2dQqa7O-fJH6ugNTnPQ@mail.gmail.com> <CAN_=ogtx9Ujd2yd7TQC87rELy33GtjQBjeszLkZc81T2-+7a3Q@mail.gmail.com> <C0A87C84-31A9-4E79-91CB-11A402443AB2@astro.physik.uni-goettingen.de> <CAN_=ogu6Amjt3NJMWwtGtTGsYMmg_5DHZRvpzo+GyAL3y4beUw@mail.gmail.com> <CANNq6F=aemGHHd7i4ieBD2iFUKHg4=conx3etBReCf7k96MuqQ@mail.gmail.com> <CAN_=ogvC9nryZR9M2C9+8X1X+2goHQKyXpAcro2Bsq7bKw2o6g@mail.gmail.com> <CAHNdQ4JUCbNXszRMhTXAQF_+51Xdf6+x3FFNpa7J=RiTZ--zVw@mail.gmail.com> Message-ID: <CAN_=ogvBwmYDduqF2EK3TiVxWkL2bcmZk9v4ZQ9oozV5qA9CzQ@mail.gmail.com> Thank you, thank you, thank you! 
I needed to find the max value (and corresponding TIME and LAT, LON) for the entire month but I shouldn't have been using the tmax, instead I needed to use the entire array. Below code works for those needing to do something similar. Thanks for all your help everyone! tmax=TSFC.max(axis=0) maxindex=TSFC.argmax() maxtemp=TSFC.ravel()[maxindex] #or maxtemp=TSFC.max() print maxindex, maxtemp val=N.unravel_index(maxindex, TSFC.shape) listval=list(val) print listval timelocation=TIME[listval[0]] latlocation=LAT[listval[1]] lonlocation=LON[listval[2]] print latlocation, lonlocation cdftime=utime('seconds since 1970-01-01 00:00:00') ncfiletime=cdftime.num2date(timelocation) print ncfiletime On Tue, Jan 10, 2012 at 3:28 PM, Aronne Merrelli <aronne.merrelli at gmail.com>wrote: > > > On Mon, Jan 9, 2012 at 7:59 PM, questions anon <questions.anon at gmail.com>wrote: > >> thank you, I seem to have made some progress (with lots of help)!! >> I still seem to be having trouble with the time. Because it is hourly >> data for a whole month I assume that is where my problem lies. >> When I run the following code I alwayes receive the first timestamp of >> the file. Not sure how to get around this: >> >> tmax=TSFC.max(axis=0) >> maxindex=tmax.argmax() >> > > You are computing max(axis=0) first. So, tmax is an array containing the > maximum temperature at each lat/lon grid point, over the set of 721 months. > It will be a [106, 193] array. > > So the argmax of tmax is an element in a shape [106,193] array (the number > of latitude/number of longitude) not the original three dimension [721, > 106, 193] array. Thus when you unravel it you can only get the first time > value. > > I re-read your original post but I don't understand what number you need. > Are you trying to get the single max value over the entire array? Or max > value for each month? (a 721 element vector)? or something else? > > > Cheers, > Aronne > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/2ace75cc/attachment.html> From scipy at samueljohn.de Tue Jan 10 10:24:41 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 10 Jan 2012 16:24:41 +0100 Subject: [Numpy-discussion] SParse feature vector generation In-Reply-To: <CAO=18DY=fpfWjX0axfce-GZhwZOAffwC+48CdxUdAp5hV4U3Hw@mail.gmail.com> References: <CAO=18DY=fpfWjX0axfce-GZhwZOAffwC+48CdxUdAp5hV4U3Hw@mail.gmail.com> Message-ID: <11FDB528-2255-4D76-90A7-0FC013E4E12A@samueljohn.de> I would just use a lookup dict: names = [ "uc_berkeley", "stanford", "uiuc", "google", "intel", "texas_instruments", "bool"] lookup = dict( zip( names, range(len(names)) ) ) Now, given you have n entries: S = numpy.zeros( (n, len(names)) ,dtype=numpy.int32) for k in ["uc_berkeley", "google", "bool"]: S[0,lookup[k]] += 1 for k in ["stanford", "intel","bool"]: S[1,lookup[k]] += 1 ... and so forth. so lookup[k] returns the index to use. Hope this helps. I am not aware of an automatic that does this. I may be wrong. cheers, Samuel On 04.01.2012, at 07:25, Dhruvkaran Mehta wrote: > Hi numpy users, > > Is there a convenient way in numpy to go from "string" features like: > > "uc_berkeley", "google", 1 > "stanford", "intel", 1 > . > . > .
> "uiuc", "texas_instruments", 0 > > to a numpy matrix like: > > "uc_berkeley", "stanford", ..., "uiuc", "google", "intel", "texas_instruments", "bool" > 1 0 ... 0 1 0 0 1 > 0 1 ... 0 0 1 0 1 > : > 0 0 ... 1 0 0 1 0 > > I really appreciate you taking the time to help! > Thanks! > --Dhruv > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scipy at samueljohn.de Tue Jan 10 11:29:04 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 10 Jan 2012 17:29:04 +0100 Subject: [Numpy-discussion] Ufuncs and flexible types, CAPI In-Reply-To: <CAE8bXEmGPrnjzVcK_TZ8QUnNeVnUEtsopz0+5ptKXWwSUSa21g@mail.gmail.com> References: <CAE8bXEmGPrnjzVcK_TZ8QUnNeVnUEtsopz0+5ptKXWwSUSa21g@mail.gmail.com> Message-ID: <18968058-9EDB-4CAA-9DF3-64DC069BD619@samueljohn.de> [sorry for duplicate - I used the wrong mail address] I am afraid, I didn't quite get the question. What is the scenario? What is the benefit that would weight out the performance hit of checking whether there is a callback or not. This has to be evaluated quite a lot. Oh well ... and 1.3.0 is pretty old :-) cheers, Samuel On 31.12.2011, at 07:48, Val Kalatsky wrote: > > Hi folks, > > First post, may not follow the standards, please bear with me. > > Need to define a ufunc that takes care of various type. > Fixed - no problem, userdef - no problem, flexible - problem. > It appears that the standard ufunc loop does not provide means to > deliver the size of variable size items. > Questions and suggestions: > > 1) Please no laughing: I have to code for NumPy 1.3.0. > Perhaps this issue has been resolved, then the discussion becomes moot. > If so please direct me to the right link. > > 2) A reasonable approach here would be to use callbacks and to give the user (read programmer) > a chance to intervene at least twice: OnInit and OnFail (OnFinish may not be unreasonable as well). > > OnInit: before starting the type resolution the user is given a chance to do something (e.g. check for > that pesky type and take control then return a flag indicating a stop) before the resolution starts > OnFail: the resolution took place and did not succeed, the user is given a chance to fix it. > In most of the case these callbacks are NULLs. > > I could patch numpy with a generic method that does it, but it's a shame not to use the good ufunc machine. > > Thanks for tips and suggestions. > > Val Kalatsky > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From kalatsky at gmail.com Tue Jan 10 13:26:17 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Tue, 10 Jan 2012 12:26:17 -0600 Subject: [Numpy-discussion] Ufuncs and flexible types, CAPI In-Reply-To: <18968058-9EDB-4CAA-9DF3-64DC069BD619@samueljohn.de> References: <CAE8bXEmGPrnjzVcK_TZ8QUnNeVnUEtsopz0+5ptKXWwSUSa21g@mail.gmail.com> <18968058-9EDB-4CAA-9DF3-64DC069BD619@samueljohn.de> Message-ID: <CAE8bXEnz_Q1WVVFk_7QOxNB5--zyu1TVwVGRCK74tLHKBVoYng@mail.gmail.com> Hi Samuel, Thanks for the reply. I hoped somebody will prove me wrong on ufuncs' limitation: no flexible type support. Also wanted to bring up a discussion on changing ufunc API. I think adding another parameter that delivers pointers to arrays to the loops would not lead to any undesirable consequences. 
Yep, 1.3.0 is old, but 1.7 has same loop prototype (with some minor cosmetic change): (char **args, intp *dimensions, intp *steps, void *func) -> (char **args, intp *dimensions, intp *steps, void *NPY_UNUSED(func)) it probably has not change from the conception. Thanks Val On Tue, Jan 10, 2012 at 10:29 AM, Samuel John <scipy at samueljohn.de> wrote: > [sorry for duplicate - I used the wrong mail address] > > I am afraid, I didn't quite get the question. > What is the scenario? What is the benefit that would weight out the > performance hit of checking whether there is a callback or not. This has to > be evaluated quite a lot. > > Oh well ... and 1.3.0 is pretty old :-) > > cheers, > Samuel > > On 31.12.2011, at 07:48, Val Kalatsky wrote: > > > > > Hi folks, > > > > First post, may not follow the standards, please bear with me. > > > > Need to define a ufunc that takes care of various type. > > Fixed - no problem, userdef - no problem, flexible - problem. > > It appears that the standard ufunc loop does not provide means to > > deliver the size of variable size items. > > Questions and suggestions: > > > > 1) Please no laughing: I have to code for NumPy 1.3.0. > > Perhaps this issue has been resolved, then the discussion becomes moot. > > If so please direct me to the right link. > > > > 2) A reasonable approach here would be to use callbacks and to give the > user (read programmer) > > a chance to intervene at least twice: OnInit and OnFail (OnFinish may > not be unreasonable as well). > > > > OnInit: before starting the type resolution the user is given a chance > to do something (e.g. check for > > that pesky type and take control then return a flag indicating a stop) > before the resolution starts > > OnFail: the resolution took place and did not succeed, the user is given > a chance to fix it. > > In most of the case these callbacks are NULLs. > > > > I could patch numpy with a generic method that does it, but it's a shame > not to use the good ufunc machine. > > > > Thanks for tips and suggestions. > > > > Val Kalatsky > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/9732d65c/attachment.html> From madsipsen at gmail.com Tue Jan 10 15:14:02 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 10 Jan 2012 21:14:02 +0100 Subject: [Numpy-discussion] Index update Message-ID: <4F0C9C0A.5030809@gmail.com> Hi, Suppose you have N items, say N = 10. Now a subset of these items are selected given by a list A of indices. Lets say that items A = [2,5,7] are selected. Assume now that you delete some of the items given by the indices S = [1,4,8]. This means that the list of indices A must be updated, since items have been deleted. For this particular case the updated selection list A becomes A = [1,3,5]. Is there some smart numpy way of doing this index update of the selected items in A without looping? Typically N is large. Best regards, Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. 
tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/90877fe9/attachment.html> From kalatsky at gmail.com Tue Jan 10 15:45:46 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Tue, 10 Jan 2012 14:45:46 -0600 Subject: [Numpy-discussion] Index update In-Reply-To: <4F0C9C0A.5030809@gmail.com> References: <4F0C9C0A.5030809@gmail.com> Message-ID: <CAE8bXE=9FpyopC7DydcAwdMRyr-YAyP5i-8ZzGKfqS6crfvGhQ@mail.gmail.com> A - np.digitize(A, S) Should do the trick, just make sure that S is sorted and A and S do not overlap, if they do remove those items from A using set operations. Val On Tue, Jan 10, 2012 at 2:14 PM, Mads Ipsen <madsipsen at gmail.com> wrote: > ** > Hi, > > Suppose you have N items, say N = 10. > > Now a subset of these items are selected given by a list A of indices. > Lets say that items A = [2,5,7] are selected. Assume now that you delete > some of the items given by the indices S = [1,4,8]. This means that the > list of indices A must be updated, since items have been deleted. For this > particular case the updated selection list A becomes A = [1,3,5]. > > Is there some smart numpy way of doing this index update of the selected > items in A without looping? Typically N is large. > > Best regards, > > Mads > > -- > +-----------------------------------------------------+ > | Mads Ipsen | > +----------------------+------------------------------+ > | G?seb?ksvej 7, 4. tv | | > | DK-2500 Valby | phone: +45-29716388 | > | Denmark | email: mads.ipsen at gmail.com | > +----------------------+------------------------------+ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/df087851/attachment.html> From madsipsen at gmail.com Tue Jan 10 15:53:52 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 10 Jan 2012 21:53:52 +0100 Subject: [Numpy-discussion] Index update In-Reply-To: <CAE8bXE=9FpyopC7DydcAwdMRyr-YAyP5i-8ZzGKfqS6crfvGhQ@mail.gmail.com> References: <4F0C9C0A.5030809@gmail.com> <CAE8bXE=9FpyopC7DydcAwdMRyr-YAyP5i-8ZzGKfqS6crfvGhQ@mail.gmail.com> Message-ID: <4F0CA560.7080807@gmail.com> Thanks - very cool! On 10/01/2012 21:45, Val Kalatsky wrote: > > A - np.digitize(A, S) > Should do the trick, just make sure that S is sorted and A and S do > not overlap, > if they do remove those items from A using set operations. > Val > > On Tue, Jan 10, 2012 at 2:14 PM, Mads Ipsen <madsipsen at gmail.com > <mailto:madsipsen at gmail.com>> wrote: > > Hi, > > Suppose you have N items, say N = 10. > > Now a subset of these items are selected given by a list A of > indices. Lets say that items A = [2,5,7] are selected. Assume now > that you delete some of the items given by the indices S = > [1,4,8]. This means that the list of indices A must be updated, > since items have been deleted. For this particular case the > updated selection list A becomes A = [1,3,5]. > > Is there some smart numpy way of doing this index update of the > selected items in A without looping? Typically N is large. 
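A small self-contained check of the digitize trick above, using the numbers from the question just quoted (A = [2,5,7] selected, S = [1,4,8] deleted; S sorted and disjoint from A, as Val requires; A_updated is just an illustrative name):

import numpy as np

A = np.array([2, 5, 7])   # indices of the currently selected items
S = np.array([1, 4, 8])   # indices of the deleted items, sorted, disjoint from A

# For each entry of A, np.digitize(A, S) counts how many deleted indices
# lie below it; subtracting that count shifts the selection down accordingly.
A_updated = A - np.digitize(A, S)
print(A_updated)          # -> [1 3 5]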
> > Best regards, > > Mads > > -- > +-----------------------------------------------------+ > | Mads Ipsen | > +----------------------+------------------------------+ > | G?seb?ksvej 7, 4. tv | | > | DK-2500 Valby | phone:+45-29716388 <tel:%2B45-29716388> | > | Denmark | email:mads.ipsen at gmail.com <mailto:mads.ipsen at gmail.com> | > +----------------------+------------------------------+ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120110/ff6500e6/attachment.html> From mikehulluk at googlemail.com Wed Jan 11 04:41:22 2012 From: mikehulluk at googlemail.com (Michael Hull) Date: Wed, 11 Jan 2012 09:41:22 +0000 Subject: [Numpy-discussion] Numpy 'groupby' Message-ID: <CABzAe0YuOPkAO9mXfV-Q91EAF3PtwUX77Xk+8nNuC-U1zow6ZA@mail.gmail.com> Hi Everyone, First off, thanks for all your hard work on numpy, its a really great help! I was wondering if there was a standard 'groupby' in numpy, that similar to that in itertools. I know its not hard to write with np.diff, but I have found myself writing it on more than a couple of occasions, and wondered if there was a 'standarised' version I was missing out on?? Thanks, Mike From ndbecker2 at gmail.com Wed Jan 11 07:05:33 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 11 Jan 2012 07:05:33 -0500 Subject: [Numpy-discussion] Numpy 'groupby' References: <CABzAe0YuOPkAO9mXfV-Q91EAF3PtwUX77Xk+8nNuC-U1zow6ZA@mail.gmail.com> Message-ID: <jejtud$abh$1@dough.gmane.org> Michael Hull wrote: > Hi Everyone, > First off, thanks for all your hard work on numpy, its a really great help! > I was wondering if there was a standard 'groupby' in numpy, that > similar to that in itertools. > I know its not hard to write with np.diff, but I have found myself > writing it on more than a couple of occasions, and wondered if > there was a 'standarised' version I was missing out on?? > Thanks, > > > Mike I've played with groupby in pandas. From hmgaudecker at gmail.com Wed Jan 11 10:12:02 2012 From: hmgaudecker at gmail.com (Hans-Martin v. Gaudecker) Date: Wed, 11 Jan 2012 16:12:02 +0100 Subject: [Numpy-discussion] Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 In-Reply-To: <mailman.6468.1324246451.1086.numpy-discussion@scipy.org> References: <mailman.6468.1324246451.1086.numpy-discussion@scipy.org> Message-ID: <472327CE-59F4-4DB5-B80C-D7EC2FFBBAF3@gmail.com> I recently upgraded to Lion and just faced the same problem with both Python 2.7.2 and Python 3.2.2 installed via the python.org installers. My hunch is that the errors are related to the fact that Apple dropped gcc-4.2 from XCode 4.2. I got gcc-4.2 via [1] then, still the same error -- who knows what else got lost in that upgrade... 
Previous successful builds with gcc-4.2 might have been with XCode 4.1 (or 4.2 installed on top of it). In the end I decided to re-install both Python versions via homebrew, nicely described here [2] and everything seems to work fine using LLVM. Test outputs for NumPy master under 2.7.2 and 3.2.2 are below in case they are of interest. Best, Hans-Martin [1] https://github.com/kennethreitz/osx-gcc-installer [2] http://www.thisisthegreenroom.com/2011/installing-python-numpy-scipy-matplotlib-and-ipython-on-lion/#numpy >>> numpy.test() Running unit tests for numpy NumPy version 2.0.0.dev-55472ca NumPy is installed in /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy Python version 2.7.2 (default, Jan 11 2012, 15:34:30) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] nose version 1.1.2 ........................./usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/tests/test_datetime.py:1317: UserWarning: pytz not found, pytz compatibility tests skipped warnings.warn("pytz not found, pytz compatibility tests skipped") ......................................................................................................................................................F..................................................................................................................................................................................................................................................................S.........................................................................................................................................................../usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/numeric.py:2020: RuntimeWarning: invalid value encountered in absolute return all(less_equal(absolute(x-y), atol + rtol * absolute(y))) 
..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K....................................................................................................K...SK.S.......S........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................S............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. 
====================================================================== FAIL: test_einsum_sums_clongdouble (test_einsum.TestEinSum) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/tests/test_einsum.py", line 479, in test_einsum_sums_clongdouble self.check_einsum_sums(np.clongdouble); File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/core/tests/test_einsum.py", line 231, in check_einsum_sums np.sum(a, axis=0).astype(dtype)) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 256, in assert_equal return assert_array_equal(actual, desired, err_msg, verbose) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 753, in assert_array_equal verbose=verbose, header='Arrays are not equal') File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 677, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 100.0%) x: array([[ 48.0+0.0j, 50.0+0.0j, 52.0+0.0j, 54.0+0.0j, 56.0+0.0j, 58.0+0.0j, 60.0+0.0j, 62.0+0.0j, 64.0+0.0j, 66.0+0.0j, 68.0+0.0j, 70.0+0.0j, 72.0+0.0j, 74.0+0.0j, 76.0+0.0j,... y: array([[ 0.0+0.0j, 1.0+0.0j, 2.0+0.0j, 3.0+0.0j, 4.0+0.0j, 5.0+0.0j, 6.0+0.0j, 7.0+0.0j, 8.0+0.0j, 9.0+0.0j, 10.0+0.0j, 11.0+0.0j, 12.0+0.0j, 13.0+0.0j, 14.0+0.0j, 15.0+0.0j],... ====================================================================== FAIL: test_prod (test_defmatrix.TestProperties) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/matrixlib/tests/test_defmatrix.py", line 78, in test_prod assert_equal(x.prod(0), matrix([[4,10,18]])) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 256, in assert_equal return assert_array_equal(actual, desired, err_msg, verbose) File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 753, in assert_array_equal verbose=verbose, header='Arrays are not equal') File "/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 677, in assert_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 100.0%) x: matrix([[4611686018427387904, 4611686018427387904, 4449894403]]) y: matrix([[ 4, 10, 18]]) ---------------------------------------------------------------------- Ran 3553 tests in 51.254s FAILED (KNOWNFAIL=3, SKIP=5, failures=2) <nose.result.TextTestResult run=3553 errors=0 failures=2> >>> import numpy >>> numpy.test() Running unit tests for numpy NumPy version 2.0.0.dev-55472ca NumPy is installed in /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy Python version 3.2.2 (default, Jan 11 2012, 15:30:18) [GCC 4.2.1 (Based on Apple Inc. 
build 5658) (LLVM build 2335.15.00)] nose version 1.1.2 ........................./usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy/core/tests/test_datetime.py:1317: UserWarning: pytz not found, pytz compatibility tests skipped warnings.warn("pytz not found, pytz compatibility tests skipped") ...................................................................................................................................................................................................................................................................................E..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K...........................................................................................................................................................................................................K....................................................................................................K...SK.S.......S............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy/lib/format.py:575: ResourceWarning: unclosed file <_io.BufferedReader name='/var/folders/vx/v51x110x36zcd9lppn78v0qw0000gn/T/tmpenv3j3'> mode=mode, offset=offset) ...........................................................................................................................................................................................................................S............................................................................................................................................................................................................................................................................................................/usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy/ma/core.py:4778: RuntimeWarning: invalid value encountered in power np.power(out, 0.5, out=out, casting='unsafe') 
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... ====================================================================== ERROR: test_multiarray.TestFromBuffer.test_empty('', array([], dtype=float64), {}) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/nose-1.1.2-py3.2.egg/nose/case.py", line 198, in runTest self.test(*self.arg) File "/usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy/core/tests/test_multiarray.py", line 1446, in tst_basic assert_array_equal(np.frombuffer(buffer,**kwargs),expected) AttributeError: 'str' object has no attribute '__buffer__' ---------------------------------------------------------------------- Ran 3552 tests in 44.427s FAILED (KNOWNFAIL=4, SKIP=4, errors=1) <nose.result.TextTestResult run=3552 errors=1 failures=0> On 18 Dec 2011, at 23:14, numpy-discussion-request at scipy.org wrote: > Send NumPy-Discussion mailing list submissions to > numpy-discussion at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at scipy.org > > You can reach the person managing the list at > numpy-discussion-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 > (McNicol, Adam) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 18 Dec 2011 22:13:48 -0000 > From: "McNicol, Adam" <amcnicol at longroad.ac.uk> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: <numpy-discussion at scipy.org> > Message-ID: <2128d5dc6c318f2d07c027b5c6c4c0ef5b01bf31 at localhost> > Content-Type: text/plain; charset="iso-8859-1" > > Hi, > > Definitely have the sdk installed. In the Developer/SDKs directory I have one for 10.6 and another for 10.7 - no idea where a second 10.6 would be coming from =( > > > Adam. 
> > > -----Original Message----- > From: numpy-discussion-request at scipy.org [mailto:numpy-discussion-request at scipy.org] > Sent: Sun 12/18/2011 9:52 PM > To: numpy-discussion at scipy.org > Subject: NumPy-Discussion Digest, Vol 63, Issue 55 > > Send NumPy-Discussion mailing list submissions to > numpy-discussion at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at scipy.org > > You can reach the person managing the list at > numpy-discussion-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 > (McNicol, Adam) > 2. Re: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 > (Ralf Gommers) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 18 Dec 2011 18:48:47 -0000 > From: "McNicol, Adam" <amcnicol at longroad.ac.uk> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: <numpy-discussion at scipy.org> > Message-ID: <fd29dc5c77c7ee4f0f0622b3459e91a5d77badc3 at localhost> > Content-Type: text/plain; charset="iso-8859-1" > > Hi Ralf, > > Thanks for the response. I tried reinstalling Xcode 4.2.1 and the GCC/Fortran installer from http://r.research.att.com/tools/ (gcc-42-5666.3-darwin11.pkg) before installing the distribute package that you suggested. > > I then reran the numpy installer being sure to enter the three export lines as suggested on the numpy installation guide for Lion. > > Still no success. I guess I'll just have to wait for more official support for my configuration. > > I have included the output from terminal just in case it is useful as there were a few lines in red that suggest something isn't quite right with something. I have placed ** before the lines that appear in red. > > I appreciate the suggestions, > > Thanks again, > > > Adam. 
> > running build > running config_cc > unifing config_cc, config, build_clib, build_ext, build commands --compiler options > running config_fc > unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options > running build_src > build_src > building py_modules sources > creating build > creating build/src.macosx-10.6-intel-3.2 > creating build/src.macosx-10.6-intel-3.2/numpy > creating build/src.macosx-10.6-intel-3.2/numpy/distutils > building library "npymath" sources > customize NAGFCompiler > **Could not locate executable f95 > customize AbsoftFCompiler > **Could not locate executable f90 > **Could not locate executable f77 > customize IBMFCompiler > **Could not locate executable xlf90 > **Could not locate executable xlf > customize IntelFCompiler > **Could not locate executable fort > **Could not locate executable ifc > customize GnuFCompiler > **Could not locate executable g77 > customize Gnu95FCompiler > **Could not locate executable gfortran > customize G95FCompiler > **Could not locate executable g95 > customize PGroupFCompiler > **Could not locate executable pgf90 > **Could not locate executable pgf77 > **don't know how to compile Fortran code on platform 'posix' > C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' > gcc-4.2: _configtest.c > gcc-4.2 _configtest.o -o _configtest > success! > removing: _configtest.c _configtest.o _configtest > customize NAGFCompiler > customize AbsoftFCompiler > customize IBMFCompiler > customize IntelFCompiler > customize GnuFCompiler > customize Gnu95FCompiler > customize G95FCompiler > customize PGroupFCompiler > **don't know how to compile Fortran code on platform 'posix' > customize NAGFCompiler > customize AbsoftFCompiler > customize IBMFCompiler > customize IntelFCompiler > customize GnuFCompiler > customize Gnu95FCompiler > customize G95FCompiler > customize PGroupFCompiler > **don't know how to compile Fortran code on platform 'posix' > C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' > gcc-4.2: _configtest.c > _configtest.c:1: warning: conflicting types for built-in function ?exp? > _configtest.c:1: warning: conflicting types for built-in function ?exp? > gcc-4.2 _configtest.o -o _configtest > success! 
> removing: _configtest.c _configtest.o _configtest > creating build/src.macosx-10.6-intel-3.2/numpy/core > creating build/src.macosx-10.6-intel-3.2/numpy/core/src > creating build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath > conv_template:> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/npy_math.c > conv_template:> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/ieee754.c > conv_template:> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/npy_math_complex.c > building extension "numpy.core._sort" sources > Generating build/src.macosx-10.6-intel-3.2/numpy/core/include/numpy/config.h > customize NAGFCompiler > customize AbsoftFCompiler > customize IBMFCompiler > customize IntelFCompiler > customize GnuFCompiler > customize Gnu95FCompiler > customize G95FCompiler > customize PGroupFCompiler > **don't know how to compile Fortran code on platform 'posix' > customize NAGFCompiler > customize AbsoftFCompiler > customize IBMFCompiler > customize IntelFCompiler > customize GnuFCompiler > customize Gnu95FCompiler > customize G95FCompiler > customize PGroupFCompiler > **don't know how to compile Fortran code on platform 'posix' > C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' > gcc-4.2: _configtest.c > In file included from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, > from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, > from _configtest.c:1: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory > In file included from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, > from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, > from _configtest.c:1: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory > lipo: can't figure out the architecture type of: /var/folders/_c/z033hf1s1cgfcxtxfnpg0lsm0000gn/T//ccKv548x.out > In file included from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, > from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, > from _configtest.c:1: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory > In file included from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, > from /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, > from _configtest.c:1: > /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory > lipo: can't figure out the architecture type of: /var/folders/_c/z033hf1s1cgfcxtxfnpg0lsm0000gn/T//ccKv548x.out > failure. 
> removing: _configtest.c _configtest.o > Running from numpy source directory.Traceback (most recent call last): > File "setup.py", line 196, in <module> > setup_package() > File "setup.py", line 189, in setup_package > configuration=configuration ) > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/core.py", line 186, in setup > return old_setup(**new_attr) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/core.py", line 148, in setup > dist.run_commands() > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", line 917, in run_commands > self.run_command(cmd) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", line 936, in run_command > cmd_obj.run() > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build.py", line 37, in run > old_build.run(self) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/command/build.py", line 126, in run > self.run_command(cmd_name) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/cmd.py", line 313, in run_command > self.distribution.run_command(command) > File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", line 936, in run_command > cmd_obj.run() > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", line 152, in run > self.build_sources() > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", line 169, in build_sources > self.build_extension_sources(ext) > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", line 328, in build_extension_sources > sources = self.generate_sources(sources, ext) > File "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", line 385, in generate_sources > source = func(extension, build_dir) > File "numpy/core/setup.py", line 410, in generate_config_h > moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir) > File "numpy/core/setup.py", line 41, in check_types > out = check_types(*a, **kw) > File "numpy/core/setup.py", line 271, in check_types > "Cannot compile 'Python.h'. Perhaps you need to "\ > SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel. > > > > Message: 3 > Date: Sun, 18 Dec 2011 09:49:00 +0100 > From: Ralf Gommers <ralf.gommers at googlemail.com> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: Discussion of Numerical Python <numpy-discussion at scipy.org> > Message-ID: > <CABL7CQh-+t+6p5z_S_JVycRCB-FWK0Q3wiAWjazgmDp8AJkKqw at mail.gmail.com> > Content-Type: text/plain; charset="iso-8859-1" > > On Sat, Dec 17, 2011 at 12:59 PM, McNicol, Adam <amcnicol at longroad.ac.uk>wrote: > >> ** >> >> Hi There, >> >> Thanks for the responses. >> >> At this point I would settle from just being able to install matplotlib. >> Even if some of the functionality isn't present currently that is fine. >> >> I'm afraid my knowledge of Python falls down about here as well. I >> installed Python 3.2.2 via the installer from Python.org so I have no idea >> whether Python.h is present or where indeed I would find it or how I would >> add it to the search path. >> >> Do I have to install from source or something like that? >> > > No, your Python install should be fine if you just got the dmg installer > from python.org. 
I recommend you install the OS X SDKs and distribute ( > http://pypi.python.org/pypi/distribute), as I said before, and try again to > compile numpy. > > Unfortunately you have chosen a difficult combination of OS and Python > version, so we don't have binary installers you can use (yet). > > Ralf > > >> Thanks again, >> >> >> Adam. >> >> >> -----Original Message----- >> From: McNicol, Adam >> Sent: Fri 12/16/2011 11:07 PM >> To: numpy-discussion at scipy.org >> Subject: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 >> >> Hi There, >> >> I am very new to numpy and have really only started investigating it as >> one of my students needs some functionality from matplotlib. I have managed >> to install everything under Windows for work in class but I use a Mac at >> home and have been struggling all night to get it to build and install. >> >> I should mention that I am using Python 3.2.2 both in school and at home >> and it isn't an option to use Python 2.7 as all of the rest of my class is >> taught in Python 3. I also have the most recent version of Xcode installed. >> >> I have installed the correct build of gcc-4.2 with Fortran (gcc-4.2 (Apple >> build 5666.3) with GNU Fortran 4.2.4 for Mac OS X 10.7 (Lion)) from >> http://r.research.att.com/tools/ >> >> I then followed the install instructions but the build fails with the >> following message: >> >> File "numpy/core/setup.py", line 271, in check_types >> "Cannot compile 'Python.h'. Perhaps you need to "\ >> SystemError: Cannot compile 'Python.h'. Perhaps you need to install >> python-dev|python-devel. >> >> I have got no idea what to do with this error message. Any help would be >> much appreciated. >> >> Kind Regards, >> >> >> Adam. >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/0747fb12/attachment-0001.html > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 63, Issue 54 > ************************************************ > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: not available > Type: application/ms-tnef > Size: 7394 bytes > Desc: not available > Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/02269314/attachment-0001.bin > > ------------------------------ > > Message: 2 > Date: Sun, 18 Dec 2011 22:53:35 +0100 > From: Ralf Gommers <ralf.gommers at googlemail.com> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: Discussion of Numerical Python <numpy-discussion at scipy.org> > Message-ID: > <CABL7CQgYJWK2ejek0fbQbs2tjT60kHPQjtPuPBTAX4-doT7YtA at mail.gmail.com> > Content-Type: text/plain; charset="windows-1252" > > On Sun, Dec 18, 2011 at 7:48 PM, McNicol, Adam <amcnicol at longroad.ac.uk>wrote: > >> Hi Ralf, >> >> Thanks for the response. I tried reinstalling Xcode 4.2.1 and the >> GCC/Fortran installer from http://r.research.att.com/tools/(gcc-42-5666.3-darwin11.pkg) before installing the distribute package that >> you suggested. 
>> >> I then reran the numpy installer being sure to enter the three export >> lines as suggested on the numpy installation guide for Lion. >> >> Still no success. I guess I'll just have to wait for more official support >> for my configuration. >> >> I have included the output from terminal just in case it is useful as >> there were a few lines in red that suggest something isn't quite right with >> something. I have placed ** before the lines that appear in red. >> >> Your compile flags have "-isysroot /Developer/SDKs/MacOSX10.6.sdk" in it > twice. Can you confirm you have installed this SDK? If so, I think the > problem is that it appears twice. Not sure what's causing it though. > > Ralf > > > I appreciate the suggestions, >> >> Thanks again, >> >> >> Adam. >> >> running build >> running config_cc >> unifing config_cc, config, build_clib, build_ext, build commands >> --compiler options >> running config_fc >> unifing config_fc, config, build_clib, build_ext, build commands >> --fcompiler options >> running build_src >> build_src >> building py_modules sources >> creating build >> creating build/src.macosx-10.6-intel-3.2 >> creating build/src.macosx-10.6-intel-3.2/numpy >> creating build/src.macosx-10.6-intel-3.2/numpy/distutils >> building library "npymath" sources >> customize NAGFCompiler >> **Could not locate executable f95 >> customize AbsoftFCompiler >> **Could not locate executable f90 >> **Could not locate executable f77 >> customize IBMFCompiler >> **Could not locate executable xlf90 >> **Could not locate executable xlf >> customize IntelFCompiler >> **Could not locate executable fort >> **Could not locate executable ifc >> customize GnuFCompiler >> **Could not locate executable g77 >> customize Gnu95FCompiler >> **Could not locate executable gfortran >> customize G95FCompiler >> **Could not locate executable g95 >> customize PGroupFCompiler >> **Could not locate executable pgf90 >> **Could not locate executable pgf77 >> **don't know how to compile Fortran code on platform 'posix' >> C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g >> -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 >> -isysroot /Developer/SDKs/MacOSX10.6.sdk >> >> compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core >> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath >> -Inumpy/core/include >> -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' >> gcc-4.2: _configtest.c >> gcc-4.2 _configtest.o -o _configtest >> success! 
>> removing: _configtest.c _configtest.o _configtest >> customize NAGFCompiler >> customize AbsoftFCompiler >> customize IBMFCompiler >> customize IntelFCompiler >> customize GnuFCompiler >> customize Gnu95FCompiler >> customize G95FCompiler >> customize PGroupFCompiler >> **don't know how to compile Fortran code on platform 'posix' >> customize NAGFCompiler >> customize AbsoftFCompiler >> customize IBMFCompiler >> customize IntelFCompiler >> customize GnuFCompiler >> customize Gnu95FCompiler >> customize G95FCompiler >> customize PGroupFCompiler >> **don't know how to compile Fortran code on platform 'posix' >> C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g >> -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 >> -isysroot /Developer/SDKs/MacOSX10.6.sdk >> >> compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core >> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath >> -Inumpy/core/include >> -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' >> gcc-4.2: _configtest.c >> _configtest.c:1: warning: conflicting types for built-in function ?exp? >> _configtest.c:1: warning: conflicting types for built-in function ?exp? >> gcc-4.2 _configtest.o -o _configtest >> success! >> removing: _configtest.c _configtest.o _configtest >> creating build/src.macosx-10.6-intel-3.2/numpy/core >> creating build/src.macosx-10.6-intel-3.2/numpy/core/src >> creating build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath >> conv_template:> >> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/npy_math.c >> conv_template:> >> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/ieee754.c >> conv_template:> >> build/src.macosx-10.6-intel-3.2/numpy/core/src/npymath/npy_math_complex.c >> building extension "numpy.core._sort" sources >> Generating >> build/src.macosx-10.6-intel-3.2/numpy/core/include/numpy/config.h >> customize NAGFCompiler >> customize AbsoftFCompiler >> customize IBMFCompiler >> customize IntelFCompiler >> customize GnuFCompiler >> customize Gnu95FCompiler >> customize G95FCompiler >> customize PGroupFCompiler >> **don't know how to compile Fortran code on platform 'posix' >> customize NAGFCompiler >> customize AbsoftFCompiler >> customize IBMFCompiler >> customize IntelFCompiler >> customize GnuFCompiler >> customize Gnu95FCompiler >> customize G95FCompiler >> customize PGroupFCompiler >> **don't know how to compile Fortran code on platform 'posix' >> C compiler: gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g >> -O3 -isysroot /Developer/SDKs/MacOSX10.6.sdk -arch i386 -arch x86_64 >> -isysroot /Developer/SDKs/MacOSX10.6.sdk >> >> compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core >> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath >> -Inumpy/core/include >> -I/Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m -c' >> gcc-4.2: _configtest.c >> In file included from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, >> from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, >> from _configtest.c:1: >> /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: >> No such file or directory >> In file included from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, >> from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, >> from 
_configtest.c:1: >> /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: >> No such file or directory >> lipo: can't figure out the architecture type of: >> /var/folders/_c/z033hf1s1cgfcxtxfnpg0lsm0000gn/T//ccKv548x.out >> In file included from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, >> from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, >> from _configtest.c:1: >> /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: >> No such file or directory >> In file included from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/bytearrayobject.h:9, >> from >> /Library/Frameworks/Python.framework/Versions/3.2/include/python3.2m/Python.h:73, >> from _configtest.c:1: >> /Developer/SDKs/MacOSX10.6.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: >> No such file or directory >> lipo: can't figure out the architecture type of: >> /var/folders/_c/z033hf1s1cgfcxtxfnpg0lsm0000gn/T//ccKv548x.out >> failure. >> removing: _configtest.c _configtest.o >> Running from numpy source directory.Traceback (most recent call last): >> File "setup.py", line 196, in <module> >> setup_package() >> File "setup.py", line 189, in setup_package >> configuration=configuration ) >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/core.py", >> line 186, in setup >> return old_setup(**new_attr) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/core.py", >> line 148, in setup >> dist.run_commands() >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", >> line 917, in run_commands >> self.run_command(cmd) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", >> line 936, in run_command >> cmd_obj.run() >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build.py", >> line 37, in run >> old_build.run(self) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/command/build.py", >> line 126, in run >> self.run_command(cmd_name) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/cmd.py", >> line 313, in run_command >> self.distribution.run_command(command) >> File >> "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/distutils/dist.py", >> line 936, in run_command >> cmd_obj.run() >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", >> line 152, in run >> self.build_sources() >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", >> line 169, in build_sources >> self.build_extension_sources(ext) >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", >> line 328, in build_extension_sources >> sources = self.generate_sources(sources, ext) >> File >> "/Users/adammcnicol/Desktop/numpy-1.6.1/build/py3k/numpy/distutils/command/build_src.py", >> line 385, in generate_sources >> source = func(extension, build_dir) >> File "numpy/core/setup.py", line 410, in generate_config_h >> moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir) >> File "numpy/core/setup.py", line 41, in check_types >> out = check_types(*a, **kw) >> File "numpy/core/setup.py", line 271, in check_types >> "Cannot compile 'Python.h'. Perhaps you need to "\ >> SystemError: Cannot compile 'Python.h'. 
Perhaps you need to install >> python-dev|python-devel. >> >> >> >> Message: 3 >> Date: Sun, 18 Dec 2011 09:49:00 +0100 >> From: Ralf Gommers <ralf.gommers at googlemail.com> >> Subject: Re: [Numpy-discussion] Problem installing NumPy with Python >> 3.2.2/MacOS X 10.7.2 >> To: Discussion of Numerical Python <numpy-discussion at scipy.org> >> Message-ID: >> <CABL7CQh-+t+6p5z_S_JVycRCB-FWK0Q3wiAWjazgmDp8AJkKqw at mail.gmail.com >>> >> Content-Type: text/plain; charset="iso-8859-1" >> >> On Sat, Dec 17, 2011 at 12:59 PM, McNicol, Adam <amcnicol at longroad.ac.uk >>> wrote: >> >>> ** >>> >>> Hi There, >>> >>> Thanks for the responses. >>> >>> At this point I would settle from just being able to install matplotlib. >>> Even if some of the functionality isn't present currently that is fine. >>> >>> I'm afraid my knowledge of Python falls down about here as well. I >>> installed Python 3.2.2 via the installer from Python.org so I have no >> idea >>> whether Python.h is present or where indeed I would find it or how I >> would >>> add it to the search path. >>> >>> Do I have to install from source or something like that? >>> >> >> No, your Python install should be fine if you just got the dmg installer >> from python.org. I recommend you install the OS X SDKs and distribute ( >> http://pypi.python.org/pypi/distribute), as I said before, and try again >> to >> compile numpy. >> >> Unfortunately you have chosen a difficult combination of OS and Python >> version, so we don't have binary installers you can use (yet). >> >> Ralf >> >> >>> Thanks again, >>> >>> >>> Adam. >>> >>> >>> -----Original Message----- >>> From: McNicol, Adam >>> Sent: Fri 12/16/2011 11:07 PM >>> To: numpy-discussion at scipy.org >>> Subject: Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 >>> >>> Hi There, >>> >>> I am very new to numpy and have really only started investigating it as >>> one of my students needs some functionality from matplotlib. I have >> managed >>> to install everything under Windows for work in class but I use a Mac at >>> home and have been struggling all night to get it to build and install. >>> >>> I should mention that I am using Python 3.2.2 both in school and at home >>> and it isn't an option to use Python 2.7 as all of the rest of my class >> is >>> taught in Python 3. I also have the most recent version of Xcode >> installed. >>> >>> I have installed the correct build of gcc-4.2 with Fortran (gcc-4.2 >> (Apple >>> build 5666.3) with GNU Fortran 4.2.4 for Mac OS X 10.7 (Lion)) from >>> http://r.research.att.com/tools/ >>> >>> I then followed the install instructions but the build fails with the >>> following message: >>> >>> File "numpy/core/setup.py", line 271, in check_types >>> "Cannot compile 'Python.h'. Perhaps you need to "\ >>> SystemError: Cannot compile 'Python.h'. Perhaps you need to install >>> python-dev|python-devel. >>> >>> I have got no idea what to do with this error message. Any help would be >>> much appreciated. >>> >>> Kind Regards, >>> >>> >>> Adam. >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL: >> http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/0747fb12/attachment-0001.html >> >> ------------------------------ >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> End of NumPy-Discussion Digest, Vol 63, Issue 54 >> ************************************************ >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/35cbac93/attachment.html > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 63, Issue 55 > ************************************************ > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: not available > Type: application/ms-tnef > Size: 13714 bytes > Desc: not available > Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20111218/4ae2f625/attachment.bin > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 63, Issue 56 > ************************************************ From wesmckinn at gmail.com Wed Jan 11 20:27:46 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 11 Jan 2012 20:27:46 -0500 Subject: [Numpy-discussion] Numpy 'groupby' In-Reply-To: <jejtud$abh$1@dough.gmane.org> References: <CABzAe0YuOPkAO9mXfV-Q91EAF3PtwUX77Xk+8nNuC-U1zow6ZA@mail.gmail.com> <jejtud$abh$1@dough.gmane.org> Message-ID: <CAJPUwMBncdJJPWUpijictfthqa_D-c2vu2o3EVt5xgz4f=An5g@mail.gmail.com> On Wed, Jan 11, 2012 at 7:05 AM, Neal Becker <ndbecker2 at gmail.com> wrote: > Michael Hull wrote: > >> Hi Everyone, >> First off, thanks for all your hard work on numpy, its a really great help! >> I was wondering if there was a standard 'groupby' in numpy, that >> similar to that in itertools. >> I know its not hard to write with np.diff, but I have found myself >> writing it on more than a couple of occasions, and wondered if >> ?there was a 'standarised' version I was missing out on?? >> Thanks, >> >> >> Mike > > I've played with groupby in pandas. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I agree (unsurprisingly) that pandas is your best bet: http://pandas.sourceforge.net/groupby.html I've had it on my TODO list to extend the pandas groupby engine (which has grown fairly sophisticated) to work with generic ndarrays and record arrays: https://github.com/wesm/pandas/issues/123 It shouldn't actually be that hard for most simple cases. I could imagine the results of a groupby being somewhat difficult to interpret without axis labeling/indexing, though. 
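For the simple cases, a groupby-style reduction over a plain ndarray can already be pieced together from np.unique and np.bincount. A rough sketch (the two-column layout is just for illustration):

import numpy as np

# first column: group key, second column: value to aggregate
data = np.array([[0, 1.0],
                 [1, 2.0],
                 [0, 3.0],
                 [1, 4.0]])

keys, inverse = np.unique(data[:, 0], return_inverse=True)
sums = np.bincount(inverse, weights=data[:, 1])
counts = np.bincount(inverse)
means = sums / counts        # one group mean per entry of `keys`

With pandas the same thing is essentially DataFrame(data, columns=['key', 'val']).groupby('key').mean().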
cheers, Wes From ivan.oseledets at gmail.com Thu Jan 12 09:21:32 2012 From: ivan.oseledets at gmail.com (Ivan Oseledets) Date: Thu, 12 Jan 2012 18:21:32 +0400 Subject: [Numpy-discussion] Question on F/C-ordering in numpy svd Message-ID: <CANSLWcS4tgUZRTf=TCzdn=-A6kNUnZztHbCyF8s24FpgPx0jiA@mail.gmail.com> Dear all! I quite new to numpy and python. I am a matlab user, my work is mainly on multidimensional arrays, and I have a question on the svd function from numpy.linalg It seems that u,s,v=svd(a,full_matrices=False) returns u and v in the F-contiguous format. That is not in a good agreement with other numpy stuff, where C-ordering is default. For example, matrix multiplication, dot() ignores ordering and returns result always in C-ordering. (which is documented), but the svd feature is not documented. With best wishes, Ivan From langton2 at llnl.gov Thu Jan 12 20:13:41 2012 From: langton2 at llnl.gov (Asher Langton) Date: Thu, 12 Jan 2012 17:13:41 -0800 Subject: [Numpy-discussion] Improving Python+MPI import performance Message-ID: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> Hi all, (I originally posted this to the BayPIGgies list, where Fernando Perez suggested I send it to the NumPy list as well. My apologies if you're receiving this email twice.) I work on a Python/C++ scientific code that runs as a number of independent Python processes communicating via MPI. Unfortunately, as some of you may have experienced, module importing does not scale well in Python/MPI applications. For 32k processes on BlueGene/P, importing 100 trivial C-extension modules takes 5.5 hours, compared to 35 minutes for all other interpreter loading and initialization. We developed a simple pure-Python module (based on knee.py, a hierarchical import example) that cuts the import time from 5.5 hours to 6 minutes. The code is available here: https://github.com/langton/MPI_Import Usage, implementation details, and limitations are described in a docstring at the beginning of the file (just after the mandatory legalese). I've talked with a few people who've faced the same problem and heard about a variety of approaches, which range from putting all necessary files in one directory to hacking the interpreter itself so it distributes the module-loading over MPI. Last summer, I had a student intern try a few of these approaches. It turned out that the problem wasn't so much the simultaneous module loads, but rather the huge number of failed open() calls (ENOENT) as the interpreter tries to find the module files. In the MPI_Import module, we have rank 0 perform the module lookups and then broadcast the locations to the rest of the processes. For our real-world scientific applications written in Python and C++, this has meant that we can start a problem and actually make computational progress before the batch allocation ends. If you try out the code, I'd appreciate any feedback you have: performance results, bugfixes/feature-additions, or alternate approaches to solving this problem. Thanks! -Asher From pearu.peterson at gmail.com Fri Jan 13 03:05:55 2012 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Fri, 13 Jan 2012 10:05:55 +0200 Subject: [Numpy-discussion] Question on F/C-ordering in numpy svd In-Reply-To: <CANSLWcS4tgUZRTf=TCzdn=-A6kNUnZztHbCyF8s24FpgPx0jiA@mail.gmail.com> References: <CANSLWcS4tgUZRTf=TCzdn=-A6kNUnZztHbCyF8s24FpgPx0jiA@mail.gmail.com> Message-ID: <4F0FE5E3.2050502@cens.ioc.ee> On 01/12/2012 04:21 PM, Ivan Oseledets wrote: > Dear all! 
> > I quite new to numpy and python. > I am a matlab user, my work is mainly > on multidimensional arrays, and I have a question on the svd function > from numpy.linalg > > It seems that > > u,s,v=svd(a,full_matrices=False) > > returns u and v in the F-contiguous format. The reason for this is that the underlying computational routine is in Fortran (when using system lapack library, for instance) that requires and returns F-contiguous arrays and the current behaviour guarantees the most memory efficient computation of svd. > That is not in a good agreement with other numpy stuff, where > C-ordering is default. > For example, matrix multiplication, dot() ignores ordering and returns > result always in C-ordering. > (which is documented), but the svd feature is not documented. In generic numpy operation, the particular ordering of arrays should not matter as the underlying code should know how to compute array operation results from different input orderings efficiently. This behaviour of svd should be documented. However, one should check that when using the svd from numpy lapack_lite (which is f2c code and could use also C-ordering, in principle), F-contiguous arrays are actually returned. Regards, Pearu From mmueller at python-academy.de Fri Jan 13 05:32:45 2012 From: mmueller at python-academy.de (=?ISO-8859-15?Q?Mike_M=FCller?=) Date: Fri, 13 Jan 2012 11:32:45 +0100 Subject: [Numpy-discussion] Python for Scientists - courses in Germany and US Message-ID: <4F10084D.1030708@python-academy.de> Learn NumPy and Much More ========================= Scientists like Python. If you would like to learn more about important libraries for scientific applications, you might be interested in these courses. The course in Germany covers: - Overview of libraries - NumPy - Data storage with text files, Excel, netCDF and HDF5 - matplotlib - Object oriented programming for scientists - Problem solving session The course in the USA covers all this plus: - Extending Python in other languages - Version control - Unit testing More details below. If you have any questions about the courses, please contact me. Mike Python for Scientists and Engineers (Germany) --------------------------------------------- A three-day course covering all the basic tools scientists and engineers need. This course requires basic Python knowledge. Date: 19.01.-21.01.2012 Location: Leipzig, Germany Trainer: Mike M?ller Course Language: English Link: http://www.python-academy.com/courses/python_course_scientists.html Python for Scientists and Engineers (USA) ----------------------------------------- This is an extend version of our well-received course for scientists and engineers. Five days of intensive training will give you a solid basis for using Python for scientific an technical problems. The course is hosted by David Beazley (http://www.dabeaz.com). 
Date: 27.02.-02.03.2012 Location: Chicago, IL, USA Trainer: Mike M?ller Course Language: English Link: http://www.dabeaz.com/chicago/science.html From ischnell at enthought.com Fri Jan 13 12:57:56 2012 From: ischnell at enthought.com (Ilan Schnell) Date: Fri, 13 Jan 2012 11:57:56 -0600 Subject: [Numpy-discussion] Python for Scientists - courses in Germany and US In-Reply-To: <4F10084D.1030708@python-academy.de> References: <4F10084D.1030708@python-academy.de> Message-ID: <CAAUn5qK1z-4m-oUrBzCUVq6a+7Ntrw7dXUws8G5N3AuorHdXGA@mail.gmail.com> By the way, Enthought is also offering Python training and we have just updated out training calendar for this year: http://www.enthought.com/training/enthought_training_calendar.php We are offering about 20 open Python classes in the US and Europe this year. - Ilan On Fri, Jan 13, 2012 at 4:32 AM, Mike M?ller <mmueller at python-academy.de> wrote: > Learn NumPy and Much More > ========================= > > Scientists like Python. If you would like to learn more about > important libraries for scientific applications, you might be > interested in these courses. > > The course in Germany covers: > > - Overview of libraries > - NumPy > - Data storage with text files, Excel, netCDF and HDF5 > - matplotlib > - Object oriented programming for scientists > - Problem solving session > > The course in the USA covers all this plus: > > - Extending Python in other languages > - Version control > - Unit testing > > > More details below. > > If you have any questions about the courses, please contact me. > > Mike > > > Python for Scientists and Engineers (Germany) > --------------------------------------------- > > A three-day course covering all the basic tools scientists and engineers need. > This course requires basic Python knowledge. > > Date: 19.01.-21.01.2012 > Location: Leipzig, Germany > Trainer: Mike M?ller > Course Language: English > Link: http://www.python-academy.com/courses/python_course_scientists.html > > > Python for Scientists and Engineers (USA) > ----------------------------------------- > > This is an extend version of our well-received course for > scientists and engineers. Five days of intensive training > will give you a solid basis for using Python for scientific > an technical problems. > > The course is hosted by David Beazley (http://www.dabeaz.com). > > Date: 27.02.-02.03.2012 > Location: Chicago, IL, USA > Trainer: Mike M?ller > Course Language: English > Link: http://www.dabeaz.com/chicago/science.html > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Fri Jan 13 14:41:19 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Jan 2012 20:41:19 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> Message-ID: <4F1088DF.6090201@molden.no> Den 13.01.2012 02:13, skrev Asher Langton: > intern try a few of these approaches. It turned out that the problem > wasn't so much the simultaneous module loads, but rather the huge > number of failed open() calls (ENOENT) as the interpreter tries to > find the module files. It sounds like there is a scalability problem with imp.find_module. I'd report this on python-dev or python-ideas. 
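For anyone who hasn't looked at the code yet: the pattern Asher describes (rank 0 does the filesystem lookup, everyone else just receives the answer) can be sketched in a few lines of mpi4py. This is only an illustration of the idea, not the actual MPI_Import implementation:

from mpi4py import MPI
import imp

comm = MPI.COMM_WORLD

def find_module_path(name):
    # only rank 0 touches the filesystem; top-level module names only
    path = None
    if comm.Get_rank() == 0:
        f, path, desc = imp.find_module(name)
        if f is not None:
            f.close()
    return comm.bcast(path, root=0)

That is one set of open()/stat() calls per module instead of one per process, which is where the ENOENT storm comes from.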
Sturla From robert.kern at gmail.com Fri Jan 13 14:53:44 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Jan 2012 19:53:44 +0000 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F1088DF.6090201@molden.no> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> <4F1088DF.6090201@molden.no> Message-ID: <CAF6FJit110_vLUwThKR5+3EPP3yV_kvQOv4QW4NEAbxz4BWf-w@mail.gmail.com> On Fri, Jan 13, 2012 at 19:41, Sturla Molden <sturla at molden.no> wrote: > Den 13.01.2012 02:13, skrev Asher Langton: >> intern try a few of these approaches. It turned out that the problem >> wasn't so much the simultaneous module loads, but rather the huge >> number of failed open() calls (ENOENT) as the interpreter tries to >> find the module files. > > It sounds like there is a scalability problem with imp.find_module. I'd > report > this on python-dev or python-ideas. It's well-known. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From d.s.seljebotn at astro.uio.no Fri Jan 13 15:19:11 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Jan 2012 21:19:11 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> Message-ID: <4F1091BF.4020801@astro.uio.no> On 01/13/2012 02:13 AM, Asher Langton wrote: > Hi all, > > (I originally posted this to the BayPIGgies list, where Fernando Perez > suggested I send it to the NumPy list as well. My apologies if you're > receiving this email twice.) > > I work on a Python/C++ scientific code that runs as a number of > independent Python processes communicating via MPI. Unfortunately, as > some of you may have experienced, module importing does not scale well > in Python/MPI applications. For 32k processes on BlueGene/P, importing > 100 trivial C-extension modules takes 5.5 hours, compared to 35 > minutes for all other interpreter loading and initialization. We > developed a simple pure-Python module (based on knee.py, a > hierarchical import example) that cuts the import time from 5.5 hours > to 6 minutes. > > The code is available here: > > https://github.com/langton/MPI_Import > > Usage, implementation details, and limitations are described in a > docstring at the beginning of the file (just after the mandatory > legalese). > > I've talked with a few people who've faced the same problem and heard > about a variety of approaches, which range from putting all necessary > files in one directory to hacking the interpreter itself so it > distributes the module-loading over MPI. Last summer, I had a student > intern try a few of these approaches. It turned out that the problem > wasn't so much the simultaneous module loads, but rather the huge > number of failed open() calls (ENOENT) as the interpreter tries to > find the module files. In the MPI_Import module, we have rank 0 > perform the module lookups and then broadcast the locations to the > rest of the processes. For our real-world scientific applications > written in Python and C++, this has meant that we can start a problem > and actually make computational progress before the batch allocation > ends. This is great news! I've forwarded to the mpi4py mailing list which despairs over this regularly. 
Another idea: Given your diagnostics, wouldn't dumping the output of "find" of every path in sys.path to a single text file work well? Then each node download that file once and consult it when looking for modules, instead of network file metadata. (In fact I think "texhash" does the same for LaTeX?) The disadvantage is that one would need to run "update-python-paths" every time a package is installed to update the text file. But I'm not sure if that that disadvantage is larger than remembering to avoid diverging import paths between nodes; hopefully one could put a reminder to run update-python-paths in the ImportError string. > If you try out the code, I'd appreciate any feedback you have: > performance results, bugfixes/feature-additions, or alternate > approaches to solving this problem. Thanks! I didn't try it myself, but forwarding this from the mpi4py mailing list: """ I'm testing it now and actually running into some funny errors with unittest on Python 2.7 causing infinite recursion. If anyone is able to get this going, and could report successes back to the group, that would be very helpful. """ Dag Sverre From d.s.seljebotn at astro.uio.no Fri Jan 13 15:21:30 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Jan 2012 21:21:30 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F1091BF.4020801@astro.uio.no> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> <4F1091BF.4020801@astro.uio.no> Message-ID: <4F10924A.4080600@astro.uio.no> On 01/13/2012 09:19 PM, Dag Sverre Seljebotn wrote: > On 01/13/2012 02:13 AM, Asher Langton wrote: >> Hi all, >> >> (I originally posted this to the BayPIGgies list, where Fernando Perez >> suggested I send it to the NumPy list as well. My apologies if you're >> receiving this email twice.) >> >> I work on a Python/C++ scientific code that runs as a number of >> independent Python processes communicating via MPI. Unfortunately, as >> some of you may have experienced, module importing does not scale well >> in Python/MPI applications. For 32k processes on BlueGene/P, importing >> 100 trivial C-extension modules takes 5.5 hours, compared to 35 >> minutes for all other interpreter loading and initialization. We >> developed a simple pure-Python module (based on knee.py, a >> hierarchical import example) that cuts the import time from 5.5 hours >> to 6 minutes. >> >> The code is available here: >> >> https://github.com/langton/MPI_Import >> >> Usage, implementation details, and limitations are described in a >> docstring at the beginning of the file (just after the mandatory >> legalese). >> >> I've talked with a few people who've faced the same problem and heard >> about a variety of approaches, which range from putting all necessary >> files in one directory to hacking the interpreter itself so it >> distributes the module-loading over MPI. Last summer, I had a student >> intern try a few of these approaches. It turned out that the problem >> wasn't so much the simultaneous module loads, but rather the huge >> number of failed open() calls (ENOENT) as the interpreter tries to >> find the module files. In the MPI_Import module, we have rank 0 >> perform the module lookups and then broadcast the locations to the >> rest of the processes. For our real-world scientific applications >> written in Python and C++, this has meant that we can start a problem >> and actually make computational progress before the batch allocation >> ends. > > This is great news! 
I've forwarded to the mpi4py mailing list which > despairs over this regularly. > > Another idea: Given your diagnostics, wouldn't dumping the output of > "find" of every path in sys.path to a single text file work well? Then > each node download that file once and consult it when looking for > modules, instead of network file metadata. > > (In fact I think "texhash" does the same for LaTeX?) > > The disadvantage is that one would need to run "update-python-paths" > every time a package is installed to update the text file. But I'm not > sure if that that disadvantage is larger than remembering to avoid > diverging import paths between nodes; hopefully one could put a reminder > to run update-python-paths in the ImportError string. I meant "diverging code paths during imports between nodes".. Dag > > >> If you try out the code, I'd appreciate any feedback you have: >> performance results, bugfixes/feature-additions, or alternate >> approaches to solving this problem. Thanks! > > I didn't try it myself, but forwarding this from the mpi4py mailing list: > > """ > I'm testing it now and actually > running into some funny errors with unittest on Python 2.7 causing > infinite recursion. If anyone is able to get this going, and could > report successes back to the group, that would be very helpful. > """ > > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Fri Jan 13 15:38:50 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Jan 2012 21:38:50 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10924A.4080600@astro.uio.no> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> <4F1091BF.4020801@astro.uio.no> <4F10924A.4080600@astro.uio.no> Message-ID: <4F10965A.4040605@molden.no> Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: > Another idea: Given your diagnostics, wouldn't dumping the output of > "find" of every path in sys.path to a single text file work well? It probably would, and would also be less prone to synchronization problems than using an MPI broadcast. Another possibility would be to use a bsddb (or sqlite?) file as a persistent dict for caching the output of imp.find_module. Sturla From langton2 at llnl.gov Fri Jan 13 16:20:23 2012 From: langton2 at llnl.gov (Langton, Asher) Date: Fri, 13 Jan 2012 13:20:23 -0800 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10965A.4040605@molden.no> Message-ID: <CB35D7D9.A659%langton2@llnl.gov> On 1/13/12 12:38 PM, Sturla Molden wrote: >Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: >> Another idea: Given your diagnostics, wouldn't dumping the output of >> "find" of every path in sys.path to a single text file work well? > >It probably would, and would also be less prone to synchronization >problems than using an MPI broadcast. Another possibility would be to >use a bsddb (or sqlite?) file as a persistent dict for caching the >output of imp.find_module. We tested something along those lines. Tim Kadich, a summer student at LLNL, wrote a module that went through the path and built up a dict of module->location mappings for a subset of module types. My recollection is that it worked well, and as you note, it didn't have the synchronization issues that MPI_Import has. 
We didn't fully implement it, since to handle complicated packages correctly, it looked like we'd either have to re-implement a lot of the internal Python import code or modify the interpreter itself. I don't think that MPI_Import is ultimately the "right" solution, but it shows how easily we can reap significant gains. Two better approaches that come to mind are: 1) Fixing this bottleneck at the interpreter level (pre-computing and caching the locations) 2) More generally, dealing with this as well as other library-loading issues at the system level, perhaps by putting a small disk near a node or small collection of nodes, along with a command to push (broadcast) some portions of the filesystem to these (more-)local disks. Basically, the idea would be to let the user specify those directories or objects that will be accessed by most of the processes and treated as read-only so that those objects can be cached near the node. -Asher From robert.kern at gmail.com Fri Jan 13 16:24:11 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Jan 2012 21:24:11 +0000 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CB35D7D9.A659%langton2@llnl.gov> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> Message-ID: <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> On Fri, Jan 13, 2012 at 21:20, Langton, Asher <langton2 at llnl.gov> wrote: > 2) More generally, dealing with this as well as other library-loading > issues at the system level, perhaps by putting a small disk near a node or > small collection of nodes, along with a command to push (broadcast) some > portions of the filesystem to these (more-)local disks. Basically, the > idea would be to let the user specify those directories or objects that > will be accessed by most of the processes and treated as read-only so that > those objects can be cached near the node. Do these systems have a ramdisk capability? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From sturla at molden.no Fri Jan 13 16:42:25 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 13 Jan 2012 22:42:25 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> Message-ID: <4F10A541.60305@molden.no> Den 13.01.2012 22:24, skrev Robert Kern: > Do these systems have a ramdisk capability? I assume you have seen this as well :) http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf Sturla From travis at continuum.io Fri Jan 13 16:48:51 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 13 Jan 2012 15:48:51 -0600 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10965A.4040605@molden.no> References: <CAB079HKO-+LXKDYvq=zHF+oTVUqiLzNq4-G0SMN4V34oNYhDSw@mail.gmail.com> <4F1091BF.4020801@astro.uio.no> <4F10924A.4080600@astro.uio.no> <4F10965A.4040605@molden.no> Message-ID: <88C67BE4-5FE5-45FC-AD3C-540F269B6D50@continuum.io> It is a straightforward thing to implement a "registry mechanism" for Python that by-passes imp.find_module (i.e. using sys.meta_path). 
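In outline, such a hook is just an object on sys.meta_path with find_module/load_module methods plus a dict mapping module names to file locations. A sketch for plain .py modules (the registry contents below are made up; a real importer would also have to handle packages and extension modules):

import imp
import sys

class RegistryImporter(object):
    """Resolve imports from a precomputed {name: filename} map
    instead of scanning every directory on sys.path."""

    def __init__(self, registry):
        self.registry = registry

    def find_module(self, fullname, path=None):
        return self if fullname in self.registry else None

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        filename = self.registry[fullname]
        with open(filename, 'U') as f:
            return imp.load_module(fullname, f, filename,
                                   ('.py', 'U', imp.PY_SOURCE))

# hypothetical registry, e.g. read once from a file pushed to every node
sys.meta_path.insert(0, RegistryImporter({'mymodule': '/shared/site/mymodule.py'}))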
You could imagine creating the registry file for a package or distribution (much like Dag described) and push that to every node during distribution. The registry file would have the map between package_name : file_location which would avoid all the failed open calls. You would need to keep the registry updated as Dag describes, but this seems like a fairly simple approach that should help. -Travis On Jan 13, 2012, at 2:38 PM, Sturla Molden wrote: > Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: >> Another idea: Given your diagnostics, wouldn't dumping the output of >> "find" of every path in sys.path to a single text file work well? > > It probably would, and would also be less prone to synchronization > problems than using an MPI broadcast. Another possibility would be to > use a bsddb (or sqlite?) file as a persistent dict for caching the > output of imp.find_module. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From langton2 at llnl.gov Fri Jan 13 16:52:29 2012 From: langton2 at llnl.gov (Langton, Asher) Date: Fri, 13 Jan 2012 13:52:29 -0800 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> Message-ID: <CB35E3B6.A6A5%langton2@llnl.gov> On 1/13/12 1:24 PM, Robert Kern wrote: >On Fri, Jan 13, 2012 at 21:20, Langton, Asher <langton2 at llnl.gov> wrote: > >> 2) More generally, dealing with this as well as other library-loading >> issues at the system level, perhaps by putting a small disk near a node >>or >> small collection of nodes, along with a command to push (broadcast) some >> portions of the filesystem to these (more-)local disks. Basically, the >> idea would be to let the user specify those directories or objects that >> will be accessed by most of the processes and treated as read-only so >>that >> those objects can be cached near the node. > >Do these systems have a ramdisk capability? That was another thing we looked at (but didn't implement): broadcasting the modules to each node and putting them in a ramdisk. The drawback (for us) is that we're already struggling with the amount of available memory per core, and according to the vendors, the situation will only get worse on future systems. The ramdisk approach might work well when there are lots of small objects that will be accessed. On 1/13/12 1:42 PM, Sturla Molden wrote: >Den 13.01.2012 22:24, skrev Robert Kern: >>Do these systems have a ramdisk capability? > >I assume you have seen this as well :) > >http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final >.pdf I hadn't. Thanks! -Asher From jlounds at dynamiteinc.com Fri Jan 13 16:56:59 2012 From: jlounds at dynamiteinc.com (Jeremy Lounds) Date: Fri, 13 Jan 2012 16:56:59 -0500 Subject: [Numpy-discussion] [JOB] Extracting subset of dataset using latitude and longitude Message-ID: <20120113165659.25556@web003.nyc1.bluetie.com> Hello, I am looking for some help extracting a subset of data from a large dataset. The data is being read from a wgrib2 (World Meterological Organization standard gridded data) using the pygrib library. The data values, latitudes and longitudes are in separate lists (arrays?), and I would like a regional subset. The budget is not very large, but I am hoping that this is pretty simple job. I am just way too green at Python / numpy to know how to proceed, or even what to search for on Google. 
If interested, please e-mail jlounds at dynamiteinc.com Thank you! Jeremy Lounds DynamiteInc.com 1-877-762-7723, ext 711 Fax: 877-202-3014 From d.s.seljebotn at astro.uio.no Fri Jan 13 16:58:56 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Jan 2012 22:58:56 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <CB35D7D9.A659%langton2@llnl.gov> References: <CB35D7D9.A659%langton2@llnl.gov> Message-ID: <4F10A920.1050109@astro.uio.no> On 01/13/2012 10:20 PM, Langton, Asher wrote: > On 1/13/12 12:38 PM, Sturla Molden wrote: >> Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: >>> Another idea: Given your diagnostics, wouldn't dumping the output of >>> "find" of every path in sys.path to a single text file work well? >> >> It probably would, and would also be less prone to synchronization >> problems than using an MPI broadcast. Another possibility would be to >> use a bsddb (or sqlite?) file as a persistent dict for caching the >> output of imp.find_module. > > We tested something along those lines. Tim Kadich, a summer student at > LLNL, wrote a module that went through the path and built up a dict of > module->location mappings for a subset of module types. My recollection is > that it worked well, and as you note, it didn't have the synchronization > issues that MPI_Import has. We didn't fully implement it, since to handle > complicated packages correctly, it looked like we'd either have to > re-implement a lot of the internal Python import code or modify the > interpreter itself. I don't think that MPI_Import is ultimately the > "right" solution, but it shows how easily we can reap significant gains. > Two better approaches that come to mind are: It's actually not too difficult to do something like LD_PRELOAD=myhack.so python something.py and have myhack.so intercept the filesystem calls Python makes (to libc) and do whatever it wants. That's a solution that doesn't interfer with how Python does its imports at all, it simply changes how Python perceives the world around it ("emulation", though much, much lighter). It does require some low-level C code, but there are several examples on the net. I know Ondrej Certik just implemented something similar. Note, I'm just brainstorming here and recording possible (and perhaps impossible) ideas in this thread -- the solution you have found is indeed a great step forward! Dag Sverre > > 1) Fixing this bottleneck at the interpreter level (pre-computing and > caching the locations) > > 2) More generally, dealing with this as well as other library-loading > issues at the system level, perhaps by putting a small disk near a node or > small collection of nodes, along with a command to push (broadcast) some > portions of the filesystem to these (more-)local disks. Basically, the > idea would be to let the user specify those directories or objects that > will be accessed by most of the processes and treated as read-only so that > those objects can be cached near the node. 
> > -Asher > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From langton2 at llnl.gov Fri Jan 13 17:09:56 2012 From: langton2 at llnl.gov (Langton, Asher) Date: Fri, 13 Jan 2012 14:09:56 -0800 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10A920.1050109@astro.uio.no> Message-ID: <CB35E9D3.A6D1%langton2@llnl.gov> On 1/13/12 1:58 PM, Dag Sverre Seljebotn wrote: > >It's actually not too difficult to do something like > >LD_PRELOAD=myhack.so python something.py > >and have myhack.so intercept the filesystem calls Python makes (to libc) >and do whatever it wants. That's a solution that doesn't interfer with >how Python does its imports at all, it simply changes how Python >perceives the world around it ("emulation", though much, much lighter). > >It does require some low-level C code, but there are several examples on >the net. I know Ondrej Certik just implemented something similar. One of my colleagues suggested the LD_PRELOAD trick. I asked around here at LLNL, and I seem to recall hearing that the LD_PRELOAD trick didn't work on BlueGene/P, which is where the import bottleneck is the worst. That might have been incorrect though, since LD_PRELOAD is mentioned on Argonne's BG/P wiki. I'll have to look into this some more. -Asher From robert.kern at gmail.com Fri Jan 13 17:11:07 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 13 Jan 2012 22:11:07 +0000 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10A541.60305@molden.no> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> <4F10A541.60305@molden.no> Message-ID: <CAF6FJistOx6srWumnetxNw7usa5TtChZz9fEwiC=zc7GHRHgwg@mail.gmail.com> On Fri, Jan 13, 2012 at 21:42, Sturla Molden <sturla at molden.no> wrote: > Den 13.01.2012 22:24, skrev Robert Kern: >> Do these systems have a ramdisk capability? > > I assume you have seen this as well :) > > http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf I hadn't, actually! Good find! Actually, this same problem came up at the last SciPy conference from several people (Blue Genes are more common than I expected!), and the ramdisk was just my first idea. I'm glad people have evaluated it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From chaoyuejoy at gmail.com Fri Jan 13 17:31:58 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 13 Jan 2012 23:31:58 +0100 Subject: [Numpy-discussion] [JOB] Extracting subset of dataset using latitude and longitude In-Reply-To: <20120113165659.25556@web003.nyc1.bluetie.com> References: <20120113165659.25556@web003.nyc1.bluetie.com> Message-ID: <CAAN-aREKAz4wC9Y9rGZJD2J1bw0Z80gKKaKQVYtYN46XW9TMAw@mail.gmail.com> Hi, I don't know if numpy has ready tool for this. I also have this use in my study. So I write a simple code for my personal use. It might no be great. I hope others can also respond as this is very basic function in earth data analysis. 
############################################################3 import numpy as np lat=np.arange(89.75,-90,-0.5) lon=np.arange(-179.75,180,0.5) lon0,lat0=np.meshgrid(lon,lat) #crate the grid from demonstration def Get_GridValue(data,(vlat1,vlat2),(vlon1,vlon2)): index_lat=np.nonzero((lat[:]>=vlat1)&(lat[:]<=vlat2))[0] index_lon=np.nonzero((lon[:]>=vlon1)&(lon[:]<=vlon2))[0] target=data[...,index_lat[0]:index_lat[-1]+1,index_lon[0]:index_lon[-1]+1] return target Get_GridValue(lat0,(40,45),(-30,-25)) Get_GridValue(lon0,(40,45),(-30,-25)) ############################################################ Chao 2012/1/13 Jeremy Lounds <jlounds at dynamiteinc.com> > Hello, > > I am looking for some help extracting a subset of data from a large > dataset. The data is being read from a wgrib2 (World Meterological > Organization standard gridded data) using the pygrib library. > > The data values, latitudes and longitudes are in separate lists (arrays?), > and I would like a regional subset. > > The budget is not very large, but I am hoping that this is pretty simple > job. I am just way too green at Python / numpy to know how to proceed, or > even what to search for on Google. > > If interested, please e-mail jlounds at dynamiteinc.com > > Thank you! > > Jeremy Lounds > DynamiteInc.com > 1-877-762-7723, ext 711 > Fax: 877-202-3014 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120113/4a1816b4/attachment.html> From sturla at molden.no Fri Jan 13 18:28:39 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 14 Jan 2012 00:28:39 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10A541.60305@molden.no> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> <4F10A541.60305@molden.no> Message-ID: <4F10BE27.6010601@molden.no> Den 13.01.2012 22:42, skrev Sturla Molden: > Den 13.01.2012 22:24, skrev Robert Kern: >> Do these systems have a ramdisk capability? > I assume you have seen this as well :) > > http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf > This paper also repeats a common mistake about the GIL: "A future challenge is the increasing number of CPU cores per node, which is normally addressed by hybrid thread and message passing based parallelization. Whereas message passing can be used transparently by both on Python and C level, the global interpreter lock in CPython limits the thread based parallelization to the C-extensions only. We are currently investigating hybrid OpenMP/MPI implementation with the hope that limiting threading to only C-extension provides enough performance." This is NOT true. Python threads are native OS threads. They can be used for parallel computing on multi-core CPUs. The only requirement is that the Python code calls a C extension that releases the GIL. 
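A minimal illustration with plain threading.Thread, assuming the heavy call in the worker releases the GIL (np.dot typically does when NumPy is linked against an optimized BLAS; any other GIL-releasing extension call works the same way):

import threading
import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
results = [None, None]

def worker(i):
    # the heavy lifting happens inside the BLAS call, with the GIL released
    results[i] = np.dot(a, b)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()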
We can use threads in C or Python code: OpenMP and threading.Thread perform equally well, but if we use threading.Thread the GIL must be released for parallel execution. OpenMP is typically better for fine-grained parallelism in C code and threading.Thread is better for course-grained parallelism in Python code. The latter is also where mpi4py and multiprocessing can be used. Sturla From d.s.seljebotn at astro.uio.no Sat Jan 14 02:21:32 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Jan 2012 08:21:32 +0100 Subject: [Numpy-discussion] Improving Python+MPI import performance In-Reply-To: <4F10BE27.6010601@molden.no> References: <4F10965A.4040605@molden.no> <CB35D7D9.A659%langton2@llnl.gov> <CAF6FJitZ9wN7ryECGGvBjx1GDx2nuZL_zW9GFMif0wL3_DLwLg@mail.gmail.com> <4F10A541.60305@molden.no> <4F10BE27.6010601@molden.no> Message-ID: <4F112CFC.307@astro.uio.no> On 01/14/2012 12:28 AM, Sturla Molden wrote: > Den 13.01.2012 22:42, skrev Sturla Molden: >> Den 13.01.2012 22:24, skrev Robert Kern: >>> Do these systems have a ramdisk capability? >> I assume you have seen this as well :) >> >> http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf >> > > This paper also repeats a common mistake about the GIL: > > "A future challenge is the increasing number of CPU cores per node, > which is normally addressed by hybrid thread and message passing based > parallelization. Whereas message passing can be used transparently by > both on Python and C level, the global interpreter lock in CPython > limits the thread based parallelization to the C-extensions only. We are > currently investigating hybrid OpenMP/MPI implementation with the hope > that limiting threading to only C-extension provides enough performance." > > This is NOT true. > > Python threads are native OS threads. They can be used for parallel > computing on multi-core CPUs. The only requirement is that the Python > code calls a C extension that releases the GIL. We can use threads in C > or Python code: OpenMP and threading.Thread perform equally well, but if > we use threading.Thread the GIL must be released for parallel execution. > OpenMP is typically better for fine-grained parallelism in C code and > threading.Thread is better for course-grained parallelism in Python > code. The latter is also where mpi4py and multiprocessing can be used. I don't see how you contradict their statement. The only code that can run without the GIL is in C-extensions (even if it is written in, say, Cython). Dag Sverre From totonixsame at gmail.com Sat Jan 14 15:52:55 2012 From: totonixsame at gmail.com (Thiago Franco de Moraes) Date: Sat, 14 Jan 2012 18:52:55 -0200 Subject: [Numpy-discussion] Calculating density based on distance Message-ID: <4F11EB27.2020909@gmail.com> Hi all, I have the following problem: Given a array with dimension Nx3, where N is generally greater than 1.000.000, for each item in this array I have to calculate its density, Where its density is the number of items from the same array with distance less than a given r. The items are the rows from the array. I was not able to think a solution to this using one or two functions of Numpy. Then I wrote this code http://pastebin.com/iQV0bMNy . The problem it so slow. So I tried to implement it in Cython, here the result http://pastebin.com/zTywzjyM , but it is very slow yet. Is there a better and faster way of doing that? Is there something in my Cython implementation I can do to perform better? Thanks! 
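Both replies below point to scipy.spatial, and with a kd-tree the density count collapses to a few lines. A sketch, assuming a SciPy recent enough that cKDTree has query_ball_point (older releases only provide it on the slower pure-Python KDTree):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1000, 3)   # stand-in for the real Nx3 array
r = 0.05

tree = cKDTree(points)
neighbours = tree.query_ball_point(points, r)
# each entry lists the points within r, including the point itself
density = np.array([len(idx) - 1 for idx in neighbours])

This runs in O(N log N) instead of the O(N**2) pairwise loop, which is what makes it feasible for N above a million.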
From ben.root at ou.edu Sat Jan 14 16:07:02 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 14 Jan 2012 15:07:02 -0600 Subject: [Numpy-discussion] Calculating density based on distance In-Reply-To: <4F11EB27.2020909@gmail.com> References: <4F11EB27.2020909@gmail.com> Message-ID: <CANNq6FkH8R7NBbs4aeczTsf38JA3XzFLD4_EHZucXMiUHYU+Mw@mail.gmail.com> On Saturday, January 14, 2012, Thiago Franco de Moraes < totonixsame at gmail.com> wrote: > Hi all, > > I have the following problem: > > Given a array with dimension Nx3, where N is generally greater than > 1.000.000, for each item in this array I have to calculate its density, > Where its density is the number of items from the same array with > distance less than a given r. The items are the rows from the array. > > I was not able to think a solution to this using one or two functions of > Numpy. Then I wrote this code http://pastebin.com/iQV0bMNy . The problem > it so slow. So I tried to implement it in Cython, here the result > http://pastebin.com/zTywzjyM , but it is very slow yet. > > Is there a better and faster way of doing that? Is there something in my > Cython implementation I can do to perform better? > > Thanks! Have you looked at scipy.spatial.KDTree? It can efficiently load up a data structure that lets you easily determine the spatial relationship between datapoints. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/bc7f2737/attachment.html> From sturla at molden.no Sat Jan 14 16:21:48 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 14 Jan 2012 22:21:48 +0100 Subject: [Numpy-discussion] Calculating density based on distance In-Reply-To: <4F11EB27.2020909@gmail.com> References: <4F11EB27.2020909@gmail.com> Message-ID: <4F11F1EC.5020704@molden.no> Den 14.01.2012 21:52, skrev Thiago Franco de Moraes: > Is there a better and faster way of doing that? Is there something in my > Cython implementation I can do to perform better? > > You need to use a kd-tree to make the computation run in O(n log n) time instead of O(n**2). scipy.spatial.cKDTree is very fast. Sturla From charlesr.harris at gmail.com Sat Jan 14 17:12:15 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Jan 2012 15:12:15 -0700 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. Message-ID: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> This sort of makes sense, but is it the 'correct' behavior? In [20]: zeros(2, 'S') Out[20]: array(['', ''], dtype='|S1') It might be more consistent to return '0' instead, as in In [3]: zeros(2, int).astype('S') Out[3]: array(['0', '0'], dtype='|S24') Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/35805be6/attachment.html> From ben.root at ou.edu Sat Jan 14 17:16:28 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 14 Jan 2012 16:16:28 -0600 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> Message-ID: <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris <charlesr.harris at gmail.com > wrote: > This sort of makes sense, but is it the 'correct' behavior? 
> > In [20]: zeros(2, 'S') > Out[20]: > array(['', ''], > dtype='|S1') > > It might be more consistent to return '0' instead, as in > > In [3]: zeros(2, int).astype('S') > Out[3]: > array(['0', '0'], > dtype='|S24') > > Chuck > > Whatever it should be, numpy is currently inconsistent: >>> np.empty(2, 'S') array(['0', '\xd4'], dtype='|S1') >>> np.zeros(2, 'S') array(['', ''], dtype='|S1') >>> np.ones(2, 'S') array(['1', '1'], dtype='|S1') I would expect '0''s for the call to zeros() and empty strings for the call to empty(). Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/c9906a01/attachment.html> From ben.root at ou.edu Sat Jan 14 17:25:05 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 14 Jan 2012 16:25:05 -0600 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> Message-ID: <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> On Sat, Jan 14, 2012 at 4:16 PM, Benjamin Root <ben.root at ou.edu> wrote: > On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> This sort of makes sense, but is it the 'correct' behavior? >> >> In [20]: zeros(2, 'S') >> Out[20]: >> array(['', ''], >> dtype='|S1') >> >> It might be more consistent to return '0' instead, as in >> >> In [3]: zeros(2, int).astype('S') >> Out[3]: >> array(['0', '0'], >> dtype='|S24') >> >> Chuck >> >> > Whatever it should be, numpy is currently inconsistent: > > >>> np.empty(2, 'S') > array(['0', '\xd4'], > dtype='|S1') > >>> np.zeros(2, 'S') > > array(['', ''], > dtype='|S1') > >>> np.ones(2, 'S') > array(['1', '1'], > dtype='|S1') > > I would expect '0''s for the call to zeros() and empty strings for the > call to empty(). > > Ben Root > > On the other hand, it is fairly standard to assume that the values in the array returned by empty() to be random, uninitialized junk. So, maybe empty()'s current behavior is ok, but certainly zeros()'s and ones()'s behaviors need to be looked at. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/4cd78e7a/attachment.html> From charlesr.harris at gmail.com Sat Jan 14 17:31:07 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Jan 2012 15:31:07 -0700 Subject: [Numpy-discussion] Fix for ticket #1973 Message-ID: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> I've put up a pull request for a fix to ticket #1973. Currently the fix simply propagates the maskna flag when the *.astype method is called. A more complicated option would be to add a maskna keyword to specify whether the output is masked or not or propagates the type of the source, but that seems overly complex to me. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/28bfe98c/attachment.html> From nathan.faggian at gmail.com Sat Jan 14 18:53:52 2012 From: nathan.faggian at gmail.com (Nathan Faggian) Date: Sun, 15 Jan 2012 10:53:52 +1100 Subject: [Numpy-discussion] Negative indexing. 
Message-ID: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> Hi, I am finding it less than useful to have the negative index wrapping on nd-arrays. Here is a short example: import numpy as np a = np.zeros((3, 3)) a[:,2] = 1000 print a[0,-1] print a[0,-1] print a[-1,-1] In all cases 1000 is printed out. What I am after is a way to say "please don't wrap around" and have negative indices behave in a way I choose. I know this is a standard thing - but is there a way to override that behaviour that doesn't involve cython or rolling my own resampler? Kind Regards, Nathan. From josef.pktd at gmail.com Sat Jan 14 19:21:55 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 14 Jan 2012 19:21:55 -0500 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> Message-ID: <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> On Sat, Jan 14, 2012 at 5:25 PM, Benjamin Root <ben.root at ou.edu> wrote: > On Sat, Jan 14, 2012 at 4:16 PM, Benjamin Root <ben.root at ou.edu> wrote: >> >> On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris >> <charlesr.harris at gmail.com> wrote: >>> >>> This sort of makes sense, but is it the 'correct' behavior? >>> >>> In [20]: zeros(2, 'S') >>> Out[20]: >>> array(['', ''], >>> ????? dtype='|S1') >>> >>> It might be more consistent to return '0' instead, as in >>> >>> In [3]: zeros(2, int).astype('S') >>> Out[3]: >>> array(['0', '0'], >>> ????? dtype='|S24') I would be surprised if zeros is not an empty string, since an empty string is the "zero" for string addition. multiplication for strings doesn't exist, so ones can be anything even literally '1' >>> a = np.zeros(5,'S4') >>> a[:] = 'b' >>> reduce(lambda x,y: x+y, a) 'bbbbb' >>> a = np.zeros(1,'S100') >>> for i in range(5): a[:] = a.item() + 'a' ... >>> a array(['aaaaa'], dtype='|S100') just as a logical argument, I have no idea what's practical since last time I tried to use numpy strings, I didn't find string addition and went back to double and triple list comprehension. Josef >>> >>> Chuck >>> >> >> Whatever it should be, numpy is currently inconsistent: >> >> >>> np.empty(2, 'S') >> array(['0', '\xd4'], >> ????? dtype='|S1') >> >>> np.zeros(2, 'S') >> >> array(['', ''], >> ????? dtype='|S1') >> >>> np.ones(2, 'S') >> array(['1', '1'], >> ????? dtype='|S1') >> >> I would expect '0''s for the call to zeros() and empty strings for the >> call to empty(). >> >> Ben Root >> > > On the other hand, it is fairly standard to assume that the values in the > array returned by empty() to be random, uninitialized junk.? So, maybe > empty()'s current behavior is ok, but certainly zeros()'s and ones()'s > behaviors need to be looked at. 
> > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sat Jan 14 21:02:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Jan 2012 19:02:49 -0700 Subject: [Numpy-discussion] GSOC In-Reply-To: <CABL7CQhV5kd7JFrYswK=neDqpdd4qWEDm+FqbnZdZgXf7HMzZw@mail.gmail.com> References: <CAB6mnxLYF65T+hwYJSm6-PYOeMitUM6qZcM2WjgjvTsa_yL9ZA@mail.gmail.com> <CABL7CQhV5kd7JFrYswK=neDqpdd4qWEDm+FqbnZdZgXf7HMzZw@mail.gmail.com> Message-ID: <CAB6mnx+qf4F0awbG67D-7JKio_r4mhuL9nfLWq5i8ujueW=YPA@mail.gmail.com> On Thu, Dec 29, 2011 at 2:36 PM, Ralf Gommers <ralf.gommers at googlemail.com>wrote: > > > On Thu, Dec 29, 2011 at 9:50 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> I thought I'd raise this topic just to get some ideas out there. At the >> moment I see two areas that I'd like to see addressed. >> >> >> 1. Documentation editor. This would involve looking at the generated >> documentation and it's organization/coverage as well such things as style >> and maybe reviewing stuff on the documentation site. This would be more >> technical writing than coding. >> 2. Test coverage. There are a lot of areas of numpy that are not well >> tested as well as some tests that are still doc tests and should probably >> be updated. This is a substantial amount of work and would require some >> familiarity with numpy as well as a willingness to ping developers for >> clarification of some topics. >> >> Thoughts? >> > First thought: very useful, but probably not GSOC topics by themselves. > > For a very good student, I'd think topics like implementing NA bit masks > or improved user-defined dtypes would be interesting. In SciPy there's also > a lot to do, and that's probably a better project for students who prefer > to work in Python. > > Besides NA bit masks, the new iterator isn't used in a lot of places it could be. Maybe replacing all uses of the old iterator? I'll admit, that smacks more of maintenance than developing new code and might be a hard sell. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/87af1221/attachment.html> From totonixsame at gmail.com Sat Jan 14 21:23:47 2012 From: totonixsame at gmail.com (Thiago Franco Moraes) Date: Sun, 15 Jan 2012 00:23:47 -0200 Subject: [Numpy-discussion] Calculating density based on distance In-Reply-To: <CANNq6FkH8R7NBbs4aeczTsf38JA3XzFLD4_EHZucXMiUHYU+Mw@mail.gmail.com> References: <4F11EB27.2020909@gmail.com> <CANNq6FkH8R7NBbs4aeczTsf38JA3XzFLD4_EHZucXMiUHYU+Mw@mail.gmail.com> Message-ID: <CAMmoLX8VqnaUaD-DqyO4mKXAWywsJDOt-x28-yWddUtrpt7ZnA@mail.gmail.com> No dia S?bado, 14 de Janeiro de 2012, Benjamin Rootben.root at ou.edu escreveu: > > > On Saturday, January 14, 2012, Thiago Franco de Moraes < totonixsame at gmail.com> wrote: >> Hi all, >> >> I have the following problem: >> >> Given a array with dimension Nx3, where N is generally greater than >> 1.000.000, for each item in this array I have to calculate its density, >> Where its density is the number of items from the same array with >> distance less than a given r. The items are the rows from the array. >> >> I was not able to think a solution to this using one or two functions of >> Numpy. Then I wrote this code http://pastebin.com/iQV0bMNy . The problem >> it so slow. 
So I tried to implement it in Cython, here the result >> http://pastebin.com/zTywzjyM , but it is very slow yet. >> >> Is there a better and faster way of doing that? Is there something in my >> Cython implementation I can do to perform better? >> >> Thanks! > > Have you looked at scipy.spatial.KDTree? It can efficiently load up a data structure that lets you easily determine the spatial relationship between datapoints. > > Ben Root Thanks, Ben, I'm going to do that. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120115/f89691f3/attachment.html> From charlesr.harris at gmail.com Sat Jan 14 23:01:09 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Jan 2012 21:01:09 -0700 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> Message-ID: <CAB6mnxKDM3OTbsCfR6OszA7BncZB8K9v4oiv6t6L6On=zdgzuQ@mail.gmail.com> On Sat, Jan 14, 2012 at 5:21 PM, <josef.pktd at gmail.com> wrote: > On Sat, Jan 14, 2012 at 5:25 PM, Benjamin Root <ben.root at ou.edu> wrote: > > On Sat, Jan 14, 2012 at 4:16 PM, Benjamin Root <ben.root at ou.edu> wrote: > >> > >> On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris > >> <charlesr.harris at gmail.com> wrote: > >>> > >>> This sort of makes sense, but is it the 'correct' behavior? > >>> > >>> In [20]: zeros(2, 'S') > >>> Out[20]: > >>> array(['', ''], > >>> dtype='|S1') > >>> > >>> It might be more consistent to return '0' instead, as in > >>> > >>> In [3]: zeros(2, int).astype('S') > >>> Out[3]: > >>> array(['0', '0'], > >>> dtype='|S24') > > > > I would be surprised if zeros is not an empty string, since an empty > string is the "zero" for string addition. > multiplication for strings doesn't exist, so ones can be anything even > literally '1' > > >>> a = np.zeros(5,'S4') > >>> a[:] = 'b' > >>> reduce(lambda x,y: x+y, a) > 'bbbbb' > > > >>> a = np.zeros(1,'S100') > >>> for i in range(5): a[:] = a.item() + 'a' > ... > >>> a > array(['aaaaa'], > dtype='|S100') > > > just as a logical argument, I have no idea what's practical since last > time I tried to use numpy strings, I didn't find string addition and > went back to double and triple list comprehension. > > I don't think it was quite so cleverly reasoned out ;) The functions works as expected for object arrays, but that is the only exception. For all other types the allocated space is simply filled with zero bytes. Too bad this isn't done in python like ones, it would be easier to fix. <snip> Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120114/b8784270/attachment.html> From paul.anton.letnes at gmail.com Sun Jan 15 02:39:50 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 15 Jan 2012 08:39:50 +0100 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. 
In-Reply-To: <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CANNq6FksPF9Mnm1+mWGfx8bgPP5ND=YRggni_rwCuVAusDxRdQ@mail.gmail.com> <CANNq6F=CBAnjU1zy_a9Tc=7pZ4Q3uAasiHyHqrRL+LFRAiQLXA@mail.gmail.com> <CAMMTP+DXjuXeaTAwq8DyqDW_1Tk1+eFwR-TtWgOb6Zh9kLj2Gg@mail.gmail.com> Message-ID: <20CF3C33-B239-4AE0-BE93-AC02E637956E@gmail.com> On 15. jan. 2012, at 01:21, josef.pktd at gmail.com wrote: > On Sat, Jan 14, 2012 at 5:25 PM, Benjamin Root <ben.root at ou.edu> wrote: >> On Sat, Jan 14, 2012 at 4:16 PM, Benjamin Root <ben.root at ou.edu> wrote: >>> >>> On Sat, Jan 14, 2012 at 4:12 PM, Charles R Harris >>> <charlesr.harris at gmail.com> wrote: >>>> >>>> This sort of makes sense, but is it the 'correct' behavior? >>>> >>>> In [20]: zeros(2, 'S') >>>> Out[20]: >>>> array(['', ''], >>>> dtype='|S1') >>>> >>>> It might be more consistent to return '0' instead, as in >>>> >>>> In [3]: zeros(2, int).astype('S') >>>> Out[3]: >>>> array(['0', '0'], >>>> dtype='|S24') > > > > I would be surprised if zeros is not an empty string, since an empty > string is the "zero" for string addition. > multiplication for strings doesn't exist, so ones can be anything even > literally '1' My python disagrees. In [1]: 2 * 'spam ham ' Out[1]: 'spam ham spam ham ' Not sure what the element-wise numpy array equivalent would be, though. Paul From njs at pobox.com Sun Jan 15 03:15:41 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 15 Jan 2012 00:15:41 -0800 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> Message-ID: <CAPJVwBmrhpOAFX9BGNOarfX+BZHJsH8BRRKpqd-xo5mzs3Qckw@mail.gmail.com> On Sat, Jan 14, 2012 at 2:12 PM, Charles R Harris <charlesr.harris at gmail.com> wrote: > This sort of makes sense, but is it the 'correct' behavior? > > In [20]: zeros(2, 'S') > Out[20]: > array(['', ''], > ????? dtype='|S1') I think of numpy strings as raw fixed-length byte arrays (since, well, that's what they are), so I would expect np.zeros to return all-NUL strings, like it does. (Not just 'empty' strings, which just means the first byte is NUL -- I expect all-NUL.) Maybe I've spent too much time working with C data structures, but that's my $0.02 :-) -- Nathaniel From daniele at grinta.net Sun Jan 15 07:30:33 2012 From: daniele at grinta.net (Daniele Nicolodi) Date: Sun, 15 Jan 2012 13:30:33 +0100 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> Message-ID: <4F12C6E9.6070804@grinta.net> On 15/01/12 00:53, Nathan Faggian wrote: > Hi, > > I am finding it less than useful to have the negative index wrapping > on nd-arrays. Here is a short example: > > import numpy as np > a = np.zeros((3, 3)) > a[:,2] = 1000 > print a[0,-1] > print a[0,-1] > print a[-1,-1] > > In all cases 1000 is printed out. What else would you expect? > What I am after is a way to say "please don't wrap around" and have > negative indices behave in a way I choose. I know this is a standard > thing - but is there a way to override that behaviour that doesn't > involve cython or rolling my own resampler? What other behavior would you choose? I don't see any other that would make sense and that would be consistent with positive indexing. 
Cheers, -- Daniele From cournape at gmail.com Sun Jan 15 07:54:59 2012 From: cournape at gmail.com (David Cournapeau) Date: Sun, 15 Jan 2012 12:54:59 +0000 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> Message-ID: <CAGY4rcV-33B5WJQGRcUm41k-1jPYEXorKt+oQzU_hJqWuVFP0Q@mail.gmail.com> On Sat, Jan 14, 2012 at 11:53 PM, Nathan Faggian <nathan.faggian at gmail.com> wrote: > Hi, > > I am finding it less than useful to have the negative index wrapping on nd-arrays. Here is a short example: > > import numpy as np > a = np.zeros((3, 3)) > a[:,2] = 1000 > print a[0,-1] > print a[0,-1] > print a[-1,-1] > > In all cases 1000 is printed out. > > What I am after is a way to say "please don't wrap around" and have negative indices behave in a way I choose. ?I know this is a standard thing - but is there a way to override that behaviour that doesn't involve cython or rolling my own resampler? Although it could be possible with lots of work, it would most likely be a bad idea. You will need to wrap something around your model/data/etc... Could you explain a bit more what you have in mind ? David From josef.pktd at gmail.com Sun Jan 15 08:00:57 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 15 Jan 2012 08:00:57 -0500 Subject: [Numpy-discussion] np.zeros(2, 'S') returns empty strings. In-Reply-To: <CAPJVwBmrhpOAFX9BGNOarfX+BZHJsH8BRRKpqd-xo5mzs3Qckw@mail.gmail.com> References: <CAB6mnxJZ-G+v=wtmo6ZboZ2KMVSd_Wj4dFED-EH_+sew1e4Zeg@mail.gmail.com> <CAPJVwBmrhpOAFX9BGNOarfX+BZHJsH8BRRKpqd-xo5mzs3Qckw@mail.gmail.com> Message-ID: <CAMMTP+COp7xE-ffwyYw7xjcwjAjomjqVZqqbtk8u=+v3hABjVQ@mail.gmail.com> On Sun, Jan 15, 2012 at 3:15 AM, Nathaniel Smith <njs at pobox.com> wrote: > On Sat, Jan 14, 2012 at 2:12 PM, Charles R Harris > <charlesr.harris at gmail.com> wrote: >> This sort of makes sense, but is it the 'correct' behavior? >> >> In [20]: zeros(2, 'S') >> Out[20]: >> array(['', ''], >> ????? dtype='|S1') > > I think of numpy strings as raw fixed-length byte arrays (since, well, > that's what they are), so I would expect np.zeros to return all-NUL > strings, like it does. (Not just 'empty' strings, which just means the > first byte is NUL -- I expect all-NUL.) Maybe I've spent too much time > working with C data structures, but that's my $0.02 :-) Since I'm not coding in C: can a fixed-length empty string, '', be represented as only first byte is NUL? The following with the current behavior looks all reasonable to me >>> np.zeros(2).view('S4') array(['', '', '', ''], dtype='|S4') >>> np.zeros(4, 'S4').view(float) array([ 0., 0.]) >>> np.zeros(4, 'S4').view(int) array([0, 0, 0, 0]) >>> np.zeros(4, 'S4').view('S16') array([''], dtype='|S16') np.zeros(2, float).view('S4') array(['', '', '', ''], dtype='|S4') instead of astype >>> np.zeros(2, float).astype('S4') array(['0.0', '0.0'], dtype='|S4') my 2c (with trying to understand what's the question) Josef > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From apo at pdauf.de Sun Jan 15 10:45:48 2012 From: apo at pdauf.de (apo at pdauf.de) Date: Sun, 15 Jan 2012 16:45:48 +0100 (CET) Subject: [Numpy-discussion] Counting the Colors of RGB-Image Message-ID: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de> An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120115/0310a1ac/attachment.html> -------------- next part -------------- Counting the Colors of RGB-Image, nameit im0 with im0.shape = 2500,3500,3 with this code: tab0 = zeros( (256,256,256) , dtype=int) tt = im0.view() tt.shape = -1,3 for r,g,b in tt: tab0[r,g,b] += 1 Question: Is there a faster way in numpy to get this result? MfG elodw From tsyu80 at gmail.com Sun Jan 15 11:03:29 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Sun, 15 Jan 2012 11:03:29 -0500 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de> References: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de> Message-ID: <CAEym_Hrh9Zt0q5p5QcqcvE4_utifKT0rG5mQh9uYdKk5Pj+xpA@mail.gmail.com> On Sun, Jan 15, 2012 at 10:45 AM, <apo at pdauf.de> wrote: > > Counting the Colors of RGB-Image, > nameit im0 with im0.shape = 2500,3500,3 > with this code: > > tab0 = zeros( (256,256,256) , dtype=int) > tt = im0.view() > tt.shape = -1,3 > for r,g,b in tt: > tab0[r,g,b] += 1 > > Question: > > Is there a faster way in numpy to get this result? > > > MfG elodw > Assuming that your image is made up of integer values (which I guess they'd have to be if you're indexing into `tab0`), then you could write: >>> rgb_unique = set(tuple(rgb) for rgb in tt) I'm not sure if it's any faster than your loop, but I would assume it is. -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120115/7c6aa5b9/attachment.html> From nadavh at visionsense.com Sun Jan 15 12:40:44 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Sun, 15 Jan 2012 09:40:44 -0800 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <CAEym_Hrh9Zt0q5p5QcqcvE4_utifKT0rG5mQh9uYdKk5Pj+xpA@mail.gmail.com> References: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de>, <CAEym_Hrh9Zt0q5p5QcqcvE4_utifKT0rG5mQh9uYdKk5Pj+xpA@mail.gmail.com> Message-ID: <26FC23E7C398A64083C980D16001012D261E514763@VA3DIAXVS361.RED001.local> im_flat = im0[...,0]*65536 + im[...,1]*256 +im[...,2] colours = np.unique(im_flat) Nadav ________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Tony Yu [tsyu80 at gmail.com] Sent: 15 January 2012 18:03 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Counting the Colors of RGB-Image On Sun, Jan 15, 2012 at 10:45 AM, <apo at pdauf.de<mailto:apo at pdauf.de>> wrote: Counting the Colors of RGB-Image, nameit im0 with im0.shape = 2500,3500,3 with this code: tab0 = zeros( (256,256,256) , dtype=int) tt = im0.view() tt.shape = -1,3 for r,g,b in tt: tab0[r,g,b] += 1 Question: Is there a faster way in numpy to get this result? MfG elodw Assuming that your image is made up of integer values (which I guess they'd have to be if you're indexing into `tab0`), then you could write: >>> rgb_unique = set(tuple(rgb) for rgb in tt) I'm not sure if it's any faster than your loop, but I would assume it is. -Tony -------------- next part -------------- An HTML attachment was scrubbed... 
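Putting the packing trick and a counting pass together gives the same (256, 256, 256) table as the original loop in one vectorized step. This is a sketch rather than code from the thread; the shift constants and the np.bincount call are the added parts, and im0 is assumed to be an unsigned 8-bit RGB image:

import numpy as np

def colour_counts(im0):
    # Pack each (r, g, b) triple into a single integer in [0, 256**3)
    tt = im0.reshape(-1, 3).astype(np.int64)
    packed = (tt[:, 0] << 16) | (tt[:, 1] << 8) | tt[:, 2]
    # One counting pass, then reshape back into the 256x256x256 table
    return np.bincount(packed, minlength=256 ** 3).reshape(256, 256, 256)

im0 = np.random.randint(0, 256, (500, 500, 3)).astype(np.uint8)
tab0 = colour_counts(im0)
print(tab0.sum() == 500 * 500)   # every pixel is counted exactly once

If only the distinct colours are wanted rather than their counts, np.unique(packed) plays the same role as the set() version above, without the Python-level loop.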
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120115/65eefcac/attachment.html> From numpy-discussion at maubp.freeserve.co.uk Sun Jan 15 14:10:34 2012 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Sun, 15 Jan 2012 19:10:34 +0000 Subject: [Numpy-discussion] Loading a Quicktime moive (*.mov) as series of arrays Message-ID: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> Hello all, Is there a recommended (and ideally cross platform) way to load the frames of a QuickTime movie (*.mov file) in Python as NumPy arrays? I'd be happy with an iterator based approach, but random access to the frames would be a nice bonus. My aim is to try some image analysis in Python, if there is any sound in the files I don't care about it. I had a look at OpenCV which has Python bindings, http://opencv.willowgarage.com/documentation/python/index.html however I had no joy compiling this on Mac OS X with QuickTime support. Is this the best bet? Thanks, Peter From robert.kern at gmail.com Sun Jan 15 14:12:11 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 15 Jan 2012 19:12:11 +0000 Subject: [Numpy-discussion] Loading a Quicktime moive (*.mov) as series of arrays In-Reply-To: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> References: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> Message-ID: <CAF6FJiuocxCrV7Hogq2-mHMB_X+T=FWi1hy_wmNfLmycrbpJQQ@mail.gmail.com> On Sun, Jan 15, 2012 at 19:10, Peter <numpy-discussion at maubp.freeserve.co.uk> wrote: > Hello all, > > Is there a recommended (and ideally cross platform) > way to load the frames of a QuickTime movie (*.mov > file) in Python as NumPy arrays? I'd be happy with > an iterator based approach, but random access to > the frames would be a nice bonus. > > My aim is to try some image analysis in Python, if > there is any sound in the files I don't care about it. > > I had a look at OpenCV which has Python bindings, > http://opencv.willowgarage.com/documentation/python/index.html > however I had no joy compiling this on Mac OS X > with QuickTime support. Is this the best bet? I've had luck with pyffmpeg, though I haven't tried QuickTime .mov files: http://code.google.com/p/pyffmpeg/ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From bsouthey at gmail.com Mon Jan 16 10:37:57 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 16 Jan 2012 09:37:57 -0600 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> Message-ID: <4F144455.9020904@gmail.com> On 01/14/2012 04:31 PM, Charles R Harris wrote: > I've put up a pull request for a fix to ticket #1973. Currently the > fix simply propagates the maskna flag when the *.astype method is > called. A more complicated option would be to add a maskna keyword to > specify whether the output is masked or not or propagates the type of > the source, but that seems overly complex to me. > > Thoughts? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks for the correction and as well as the fix. 
While it worked for integer and floats (not complex ones), I got an error when using complex dtypes. This error that is also present in array creation of complex dtypes. Is this known or a new bug? If it is new, then we need to identify what functionality should handle np.NA but are not working. Bruce $ python Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.__version__ # pull request version '2.0.0.dev-88f9276' >>> np.array([1,2], dtype=np.complex) array([ 1.+0.j, 2.+0.j]) >>> np.array([1,2, np.NA], dtype=np.complex) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line 1445, in array_repr ', ', "array(") File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 459, in array2string separator, prefix, formatter=formatter) File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 263, in _array2string suppress_small), File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 724, in __init__ self.real_format = FloatFormat(x.real, precision, suppress_small) ValueError: Cannot construct a view of data together with the NPY_ARRAY_MASKNA flag, the NA mask must be added later >>> ca=np.array([1,2], dtype=np.complex, maskna=True) >>> ca[1]=np.NA >>> ca Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line 1445, in array_repr ', ', "array(") File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 459, in array2string separator, prefix, formatter=formatter) File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 263, in _array2string suppress_small), File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line 724, in __init__ self.real_format = FloatFormat(x.real, precision, suppress_small) ValueError: Cannot construct a view of data together with the NPY_ARRAY_MASKNA flag, the NA mask must be added later >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/b40cb3fb/attachment.html> From charlesr.harris at gmail.com Mon Jan 16 10:52:10 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2012 08:52:10 -0700 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <4F144455.9020904@gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> <4F144455.9020904@gmail.com> Message-ID: <CAB6mnxKmzuexvvMph6u9KS_12d56JGY+74Wf6U+Xk8nhqFUjzw@mail.gmail.com> On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey <bsouthey at gmail.com> wrote: > ** > On 01/14/2012 04:31 PM, Charles R Harris wrote: > > I've put up a pull request for a fix to ticket #1973. Currently the fix > simply propagates the maskna flag when the *.astype method is called. A > more complicated option would be to add a maskna keyword to specify whether > the output is masked or not or propagates the type of the source, but that > seems overly complex to me. > > Thoughts? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > Thanks for the correction and as well as the fix. 
While it worked for > integer and floats (not complex ones), I got an error when using complex > dtypes. This error that is also present in array creation of complex > dtypes. Is this known or a new bug? > > If it is new, then we need to identify what functionality should handle > np.NA but are not working. > > Bruce > > $ python > Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) > [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np > >>> np.__version__ # pull request version > '2.0.0.dev-88f9276' > >>> np.array([1,2], dtype=np.complex) > array([ 1.+0.j, 2.+0.j]) > >>> np.array([1,2, np.NA], dtype=np.complex) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line > 1445, in array_repr > ', ', "array(") > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 459, in array2string > separator, prefix, formatter=formatter) > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 263, in _array2string > suppress_small), > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 724, in __init__ > self.real_format = FloatFormat(x.real, precision, suppress_small) > ValueError: Cannot construct a view of data together with the > NPY_ARRAY_MASKNA flag, the NA mask must be added later > >>> ca=np.array([1,2], dtype=np.complex, maskna=True) > >>> ca[1]=np.NA > >>> ca > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line > 1445, in array_repr > ', ', "array(") > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 459, in array2string > separator, prefix, formatter=formatter) > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 263, in _array2string > suppress_small), > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 724, in __init__ > self.real_format = FloatFormat(x.real, precision, suppress_small) > ValueError: Cannot construct a view of data together with the > NPY_ARRAY_MASKNA flag, the NA mask must be added later > >>> > > Looks like a different bug involving the *.real and *.imag views. I'll take a look. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/d4609e11/attachment.html> From charlesr.harris at gmail.com Mon Jan 16 11:14:22 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2012 09:14:22 -0700 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <CAB6mnxKmzuexvvMph6u9KS_12d56JGY+74Wf6U+Xk8nhqFUjzw@mail.gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> <4F144455.9020904@gmail.com> <CAB6mnxKmzuexvvMph6u9KS_12d56JGY+74Wf6U+Xk8nhqFUjzw@mail.gmail.com> Message-ID: <CAB6mnxKgHGQQnmLXNxDj158cwURw=41WWcE2yV5MfwsK5vV_XA@mail.gmail.com> On Mon, Jan 16, 2012 at 8:52 AM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey <bsouthey at gmail.com> wrote: > >> ** >> On 01/14/2012 04:31 PM, Charles R Harris wrote: >> >> I've put up a pull request for a fix to ticket #1973. Currently the fix >> simply propagates the maskna flag when the *.astype method is called. 
A >> more complicated option would be to add a maskna keyword to specify whether >> the output is masked or not or propagates the type of the source, but that >> seems overly complex to me. >> >> Thoughts? >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> Thanks for the correction and as well as the fix. While it worked for >> integer and floats (not complex ones), I got an error when using complex >> dtypes. This error that is also present in array creation of complex >> dtypes. Is this known or a new bug? >> >> If it is new, then we need to identify what functionality should handle >> np.NA but are not working. >> >> Bruce >> >> $ python >> Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) >> [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> import numpy as np >> >>> np.__version__ # pull request version >> '2.0.0.dev-88f9276' >> >>> np.array([1,2], dtype=np.complex) >> array([ 1.+0.j, 2.+0.j]) >> >>> np.array([1,2, np.NA], dtype=np.complex) >> Traceback (most recent call last): >> File "<stdin>", line 1, in <module> >> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line >> 1445, in array_repr >> ', ', "array(") >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 459, in array2string >> separator, prefix, formatter=formatter) >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 263, in _array2string >> suppress_small), >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 724, in __init__ >> self.real_format = FloatFormat(x.real, precision, suppress_small) >> ValueError: Cannot construct a view of data together with the >> NPY_ARRAY_MASKNA flag, the NA mask must be added later >> >>> ca=np.array([1,2], dtype=np.complex, maskna=True) >> >>> ca[1]=np.NA >> >>> ca >> Traceback (most recent call last): >> File "<stdin>", line 1, in <module> >> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line >> 1445, in array_repr >> ', ', "array(") >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 459, in array2string >> separator, prefix, formatter=formatter) >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 263, in _array2string >> suppress_small), >> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >> line 724, in __init__ >> self.real_format = FloatFormat(x.real, precision, suppress_small) >> ValueError: Cannot construct a view of data together with the >> NPY_ARRAY_MASKNA flag, the NA mask must be added later >> >>> >> >> > Looks like a different bug involving the *.real and *.imag views. I'll > take a look. > > Looks like views of masked arrays have other problems: In [13]: a = ones(3, int16, maskna=1) In [14]: a.view(int8) Out[14]: array([1, 0, 1, NA, 1, NA], dtype=int8) I'm not sure what the policy should be here. One could construct a new mask adapted to the view, raise an error when the types don't align (I think the real/imag parts should be considered aligned), or just let the view unmask the array. The last seems dangerous. Hmm... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
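The NA-mask/maskna machinery discussed in this thread was experimental and was removed from NumPy before the 1.7 release. As a point of comparison only, numpy.ma already behaves the way the ticket asks for: astype carries the mask over. A small sketch using numpy.ma, not the branch under discussion:

import numpy as np

a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
b = a.astype(np.float32)
print(b)        # [1.0 -- 3.0], the mask is carried over by astype
print(b.dtype)  # float32
print(b.mask)   # [False  True False]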
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/c7d4dd5c/attachment.html> From charlesr.harris at gmail.com Mon Jan 16 13:20:21 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2012 11:20:21 -0700 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <4F144455.9020904@gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> <4F144455.9020904@gmail.com> Message-ID: <CAB6mnxJ4ukkr+B8aiN3j0KPduvUY2unB1zTcah0Ks+917JU1EQ@mail.gmail.com> On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey <bsouthey at gmail.com> wrote: > ** > On 01/14/2012 04:31 PM, Charles R Harris wrote: > > I've put up a pull request for a fix to ticket #1973. Currently the fix > simply propagates the maskna flag when the *.astype method is called. A > more complicated option would be to add a maskna keyword to specify whether > the output is masked or not or propagates the type of the source, but that > seems overly complex to me. > > Thoughts? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > Thanks for the correction and as well as the fix. While it worked for > integer and floats (not complex ones), I got an error when using complex > dtypes. This error that is also present in array creation of complex > dtypes. Is this known or a new bug? > > If it is new, then we need to identify what functionality should handle > np.NA but are not working. > > Bruce > > $ python > Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) > [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np > >>> np.__version__ # pull request version > '2.0.0.dev-88f9276' > >>> np.array([1,2], dtype=np.complex) > array([ 1.+0.j, 2.+0.j]) > >>> np.array([1,2, np.NA], dtype=np.complex) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line > 1445, in array_repr > ', ', "array(") > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 459, in array2string > separator, prefix, formatter=formatter) > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 263, in _array2string > suppress_small), > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 724, in __init__ > self.real_format = FloatFormat(x.real, precision, suppress_small) > ValueError: Cannot construct a view of data together with the > NPY_ARRAY_MASKNA flag, the NA mask must be added later > >>> ca=np.array([1,2], dtype=np.complex, maskna=True) > >>> ca[1]=np.NA > >>> ca > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line > 1445, in array_repr > ', ', "array(") > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 459, in array2string > separator, prefix, formatter=formatter) > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 263, in _array2string > suppress_small), > File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", line > 724, in __init__ > self.real_format = FloatFormat(x.real, precision, suppress_small) > ValueError: Cannot construct a view of data together with the > NPY_ARRAY_MASKNA flag, the NA mask must be added later > >>> > > The location of this problem is easy to 
find, but the fix isn't completely trivial as it seems there is no easy way to copy masks between arrays. There may be, but I haven't found it. Also there is the unfortunate fact that real and imag are array methods and work for non-complex arrays In [6]: a = ones(3, 'S') In [7]: a.real Out[7]: array(['1', '1', '1'], dtype='|S1') In [8]: a.imag Out[8]: array(['', '', ''], dtype='|S1') which makes a simple view impractical. Not that views seem to work. Another complication of the NA stuff is that there two types of NA, a potential multivalued NA, and a simple boolean NA. I think we need to pick between the two as supporting both makes a mess. Because of the common complaint about memory usage, I vote for simple boolean which offers the option of bit arrays for the masks. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/50be8526/attachment.html> From tmp50 at ukr.net Mon Jan 16 15:07:54 2012 From: tmp50 at ukr.net (Dmitrey) Date: Mon, 16 Jan 2012 22:07:54 +0200 Subject: [Numpy-discussion] [ANN] global constrained solver with discrete variables Message-ID: <17671.1326744474.16853089562548174848@ffe6.ukr.net> hi all, I've done support of discrete variables for interalg - free (license: BSD) solver with specifiable accuracy, you can take a look at an example here It is written in Python + NumPy, and I hope it's speed will be essentially increased when PyPy (Python with dynamic compilation) support for NumPy will be done (some parts of code are not vectorized and still use CPython cycles). Also, NumPy funcs like vstack or append produce only copy of data, and it also slows the solver very much (for mature problems). Maybe some bugs still present somewhere - interalg code already became very long, but since it already works, you could be interested in trying to use it right now. Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/7afbf75e/attachment.html> From ben.root at ou.edu Mon Jan 16 16:05:17 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 16 Jan 2012 15:05:17 -0600 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <CAGY4rcV-33B5WJQGRcUm41k-1jPYEXorKt+oQzU_hJqWuVFP0Q@mail.gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> <CAGY4rcV-33B5WJQGRcUm41k-1jPYEXorKt+oQzU_hJqWuVFP0Q@mail.gmail.com> Message-ID: <CANNq6Fk-Z_ydvyaSkeATooBbZ=YsNvDMw80O6QdabHQ9YKHk3A@mail.gmail.com> On Sun, Jan 15, 2012 at 6:54 AM, David Cournapeau <cournape at gmail.com>wrote: > On Sat, Jan 14, 2012 at 11:53 PM, Nathan Faggian > <nathan.faggian at gmail.com> wrote: > > Hi, > > > > I am finding it less than useful to have the negative index wrapping on > nd-arrays. Here is a short example: > > > > import numpy as np > > a = np.zeros((3, 3)) > > a[:,2] = 1000 > > print a[0,-1] > > print a[0,-1] > > print a[-1,-1] > > > > In all cases 1000 is printed out. > > > > What I am after is a way to say "please don't wrap around" and have > negative indices behave in a way I choose. I know this is a standard thing > - but is there a way to override that behaviour that doesn't involve cython > or rolling my own resampler? > > Although it could be possible with lots of work, it would most likely > be a bad idea. You will need to wrap something around your > model/data/etc... Could you explain a bit more what you have in mind ? 
> > David > Another approach that might be useful, depending on the needs, is to use `np.ravel_multi_index()`, in which ndim coords can be passed in and flatten coords are returned. It has options of 'raise', 'wrap' and 'clip' for handling out-of-bounds indices. It wouldn't be built directly into the arrays, but if that isn't needed, this might work. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/e9cce650/attachment.html> From charlesr.harris at gmail.com Mon Jan 16 16:24:08 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2012 14:24:08 -0700 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> Message-ID: <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> On Sat, Jan 14, 2012 at 4:53 PM, Nathan Faggian <nathan.faggian at gmail.com>wrote: > Hi, > > I am finding it less than useful to have the negative index wrapping on > nd-arrays. Here is a short example: > > import numpy as np > a = np.zeros((3, 3)) > a[:,2] = 1000 > print a[0,-1] > print a[0,-1] > print a[-1,-1] > > In all cases 1000 is printed out. > > Looks right to me, the whole last column is 1000. What exactly do you want to do and what is the problem? <snip> Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/0783c008/attachment.html> From ben.root at ou.edu Mon Jan 16 16:30:27 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 16 Jan 2012 15:30:27 -0600 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> Message-ID: <CANNq6FmsaGMyuP+zOqrZC1xUs0sZ6Jb4FK2KyjCowJFoY+JqNg@mail.gmail.com> On Mon, Jan 16, 2012 at 3:24 PM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Sat, Jan 14, 2012 at 4:53 PM, Nathan Faggian <nathan.faggian at gmail.com>wrote: > >> Hi, >> >> I am finding it less than useful to have the negative index wrapping on >> nd-arrays. Here is a short example: >> >> import numpy as np >> a = np.zeros((3, 3)) >> a[:,2] = 1000 >> print a[0,-1] >> print a[0,-1] >> print a[-1,-1] >> >> In all cases 1000 is printed out. >> >> > Looks right to me, the whole last column is 1000. What exactly do you want > to do and what is the problem? > > <snip> > > Chuck > > I would imagine that it is some sort of image processing use-case, where sometimes you want the data to reflect at the boundaries, or be constant, or have some other value used for access outside the domain. So, for reflect, I would guess that he would have wanted 0.0 for the first two and 1000 for the last one. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/ec9d39f2/attachment.html> From ben.root at ou.edu Mon Jan 16 16:42:39 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 16 Jan 2012 15:42:39 -0600 Subject: [Numpy-discussion] Negative indexing. 
In-Reply-To: <CANNq6FmsaGMyuP+zOqrZC1xUs0sZ6Jb4FK2KyjCowJFoY+JqNg@mail.gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> <CANNq6FmsaGMyuP+zOqrZC1xUs0sZ6Jb4FK2KyjCowJFoY+JqNg@mail.gmail.com> Message-ID: <CANNq6FnmNpShMRJj7KMgwAUPRi5YF9EU3UHtz-tLDOSQhqBU=Q@mail.gmail.com> On Mon, Jan 16, 2012 at 3:30 PM, Benjamin Root <ben.root at ou.edu> wrote: > > > On Mon, Jan 16, 2012 at 3:24 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 14, 2012 at 4:53 PM, Nathan Faggian <nathan.faggian at gmail.com >> > wrote: >> >>> Hi, >>> >>> I am finding it less than useful to have the negative index wrapping on >>> nd-arrays. Here is a short example: >>> >>> import numpy as np >>> a = np.zeros((3, 3)) >>> a[:,2] = 1000 >>> print a[0,-1] >>> print a[0,-1] >>> print a[-1,-1] >>> >>> In all cases 1000 is printed out. >>> >>> >> Looks right to me, the whole last column is 1000. What exactly do you >> want to do and what is the problem? >> >> <snip> >> >> Chuck >> >> > I would imagine that it is some sort of image processing use-case, where > sometimes you want the data to reflect at the boundaries, or be constant, > or have some other value used for access outside the domain. So, for > reflect, I would guess that he would have wanted 0.0 for the first two and > 1000 for the last one. > > Ben Root > > Errr, I mean 0.0 for the last one. I can't think today. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120116/1d523065/attachment.html> From nathan.faggian at gmail.com Mon Jan 16 19:23:24 2012 From: nathan.faggian at gmail.com (Nathan Faggian) Date: Tue, 17 Jan 2012 11:23:24 +1100 Subject: [Numpy-discussion] Negative indexing. In-Reply-To: <CANNq6FnmNpShMRJj7KMgwAUPRi5YF9EU3UHtz-tLDOSQhqBU=Q@mail.gmail.com> References: <84306A1C-898B-410F-ABFA-5A15DCCDD27F@gmail.com> <CAB6mnxLG7RqUzn96twAcXs1d-ZspXpUuAB_qfwyiwZGpfHQozQ@mail.gmail.com> <CANNq6FmsaGMyuP+zOqrZC1xUs0sZ6Jb4FK2KyjCowJFoY+JqNg@mail.gmail.com> <CANNq6FnmNpShMRJj7KMgwAUPRi5YF9EU3UHtz-tLDOSQhqBU=Q@mail.gmail.com> Message-ID: <CAN1J6jUzwBAxafSodrGLwgohRDXhfasGVY0O-mOyLZkueJjv6A@mail.gmail.com> Hi, I am sorry for the late reply. Benjamin has hit the nail on the head. I guess I am seeing numpy "fancy indexing" as equivalent to integer based coordinate sampling and trying to compare numpy's fancy indexing to something like map_coordinates in scipy. I have never used np.ravel_multi_index() and will have a look at this now. -N On 17 January 2012 08:42, Benjamin Root <ben.root at ou.edu> wrote: > On Mon, Jan 16, 2012 at 3:30 PM, Benjamin Root <ben.root at ou.edu> wrote: >> >> >> >> On Mon, Jan 16, 2012 at 3:24 PM, Charles R Harris >> <charlesr.harris at gmail.com> wrote: >>> >>> >>> >>> On Sat, Jan 14, 2012 at 4:53 PM, Nathan Faggian >>> <nathan.faggian at gmail.com> wrote: >>>> >>>> Hi, >>>> >>>> I am finding it less than useful to have the negative index wrapping on >>>> nd-arrays. Here is a short example: >>>> >>>> import numpy as np >>>> a = np.zeros((3, 3)) >>>> a[:,2] = 1000 >>>> print a[0,-1] >>>> print a[0,-1] >>>> print a[-1,-1] >>>> >>>> In all cases 1000 is printed out. >>>> >>> >>> Looks right to me, the whole last column is 1000. What exactly do you >>> want to do and what is the problem? 
>>> >>> <snip> >>> >>> Chuck >>> >> >> I would imagine that it is some sort of image processing use-case, where >> sometimes you want the data to reflect at the boundaries, or be constant, or >> have some other value used for access outside the domain.? So, for reflect, >> I would guess that he would have wanted 0.0 for the first two and 1000 for >> the last one. >> >> Ben Root >> > > Errr, I mean 0.0 for the last one.? I can't think today. > > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From lists at informa.tiker.net Tue Jan 17 00:11:10 2012 From: lists at informa.tiker.net (Andreas Kloeckner) Date: Tue, 17 Jan 2012 00:11:10 -0500 Subject: [Numpy-discussion] dtype comparison, hash In-Reply-To: <CAF6FJiuPGdg06LO-7B5v5gb741-t-OtN3H2Rn7h_HzcHUfkOMg@mail.gmail.com> References: <878vlyu7uq.fsf@ding.tiker.net> <CAF6FJit4zMEibGi6NACKUNCOZh5viucqB=Pf5+S1CvHDMLqdvA@mail.gmail.com> <87wr9drios.fsf@ding.tiker.net> <CAF6FJiuPGdg06LO-7B5v5gb741-t-OtN3H2Rn7h_HzcHUfkOMg@mail.gmail.com> Message-ID: <87ty3u7w29.fsf@ding.tiker.net> Hi Robert, On Fri, 30 Dec 2011 20:05:14 +0000, Robert Kern <robert.kern at gmail.com> wrote: > On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner > <lists at informa.tiker.net> wrote: > > Hi Robert, > > > > On Tue, 27 Dec 2011 10:17:41 +0000, Robert Kern <robert.kern at gmail.com> wrote: > >> On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner > >> <lists at informa.tiker.net> wrote: > >> > Hi all, > >> > > >> > Two questions: > >> > > >> > - Are dtypes supposed to be comparable (i.e. implement '==', '!=')? > >> > >> Yes. > >> > >> > - Are dtypes supposed to be hashable? > >> > >> Yes, with caveats. Strictly speaking, we violate the condition that > >> objects that equal each other should hash equal since we define == to > >> be rather free. Namely, > >> > >> ? np.dtype(x) == x > >> > >> for all objects x that can be converted to a dtype. > >> > >> ? np.dtype(float) == np.dtype('float') > >> ? np.dtype(float) == float > >> ? np.dtype(float) == 'float' > >> > >> Since hash(float) != hash('float') we cannot implement > >> np.dtype.__hash__() to follow the stricture that objects that compare > >> equal should hash equal. > >> > >> However, if you restrict the domain of objects to just dtypes (i.e. > >> only consider dicts that use only actual dtype objects as keys instead > >> of arbitrary mixtures of objects), then the stricture is obeyed. This > >> is a useful domain that is used internally in numpy. > >> > >> Is this the problem that you found? > > > > Thanks for the reply. > > > > It doesn't seem like this is our issue--instead, we're encountering two > > different dtype objects that claim to be float64, compare as equal, but > > don't hash to the same value. > > > > I've asked the user who encountered the user to investigate, and I'll > > be back with more detail in a bit. > > I think we've run into this before and tried to fix it. Try to find > the version of numpy the user has and a minimal example, if you can. This is what Thomas found: http://projects.scipy.org/numpy/ticket/2017 Hope this helps, Andreas -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120117/a4c2e34f/attachment.sig> From robert.kern at gmail.com Tue Jan 17 09:28:21 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Jan 2012 14:28:21 +0000 Subject: [Numpy-discussion] dtype comparison, hash In-Reply-To: <87ty3u7w29.fsf@ding.tiker.net> References: <878vlyu7uq.fsf@ding.tiker.net> <CAF6FJit4zMEibGi6NACKUNCOZh5viucqB=Pf5+S1CvHDMLqdvA@mail.gmail.com> <87wr9drios.fsf@ding.tiker.net> <CAF6FJiuPGdg06LO-7B5v5gb741-t-OtN3H2Rn7h_HzcHUfkOMg@mail.gmail.com> <87ty3u7w29.fsf@ding.tiker.net> Message-ID: <CAF6FJiu_Y=tKyNuggU6RXQK8tU6-GPJJkrOpz_DDs1Fxkw8U=g@mail.gmail.com> On Tue, Jan 17, 2012 at 05:11, Andreas Kloeckner <lists at informa.tiker.net> wrote: > Hi Robert, > > On Fri, 30 Dec 2011 20:05:14 +0000, Robert Kern <robert.kern at gmail.com> wrote: >> On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner >> <lists at informa.tiker.net> wrote: >> > Hi Robert, >> > >> > On Tue, 27 Dec 2011 10:17:41 +0000, Robert Kern <robert.kern at gmail.com> wrote: >> >> On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner >> >> <lists at informa.tiker.net> wrote: >> >> > Hi all, >> >> > >> >> > Two questions: >> >> > >> >> > - Are dtypes supposed to be comparable (i.e. implement '==', '!=')? >> >> >> >> Yes. >> >> >> >> > - Are dtypes supposed to be hashable? >> >> >> >> Yes, with caveats. Strictly speaking, we violate the condition that >> >> objects that equal each other should hash equal since we define == to >> >> be rather free. Namely, >> >> >> >> ? np.dtype(x) == x >> >> >> >> for all objects x that can be converted to a dtype. >> >> >> >> ? np.dtype(float) == np.dtype('float') >> >> ? np.dtype(float) == float >> >> ? np.dtype(float) == 'float' >> >> >> >> Since hash(float) != hash('float') we cannot implement >> >> np.dtype.__hash__() to follow the stricture that objects that compare >> >> equal should hash equal. >> >> >> >> However, if you restrict the domain of objects to just dtypes (i.e. >> >> only consider dicts that use only actual dtype objects as keys instead >> >> of arbitrary mixtures of objects), then the stricture is obeyed. This >> >> is a useful domain that is used internally in numpy. >> >> >> >> Is this the problem that you found? >> > >> > Thanks for the reply. >> > >> > It doesn't seem like this is our issue--instead, we're encountering two >> > different dtype objects that claim to be float64, compare as equal, but >> > don't hash to the same value. >> > >> > I've asked the user who encountered the user to investigate, and I'll >> > be back with more detail in a bit. >> >> I think we've run into this before and tried to fix it. Try to find >> the version of numpy the user has and a minimal example, if you can. > > This is what Thomas found: > > http://projects.scipy.org/numpy/ticket/2017 It looks like the .flags attribute is different between np.uintp and np.uint32. The .flags attribute forms part of the hashed information about the dtype (or PyArray_Descr at the C-level). [~] |15> np.dtype(np.uintp).flags 1536 [~] |16> np.dtype(np.uint32).flags 2048 The same goes for np.intp and np.int32 in numpy 1.6.1 on OS X, so unlike the comment in the ticket, they do have different hashes for me. However, diving through the source a bit, I'm not entirely sure I trust the values being given at the Python level. It appears that the flag member of the PyArray_Descr struct is declared as a char. 
However, it is exposed as a T_INT member in the PyMemberDef table by direct addressing. Basically, a Python descriptor gets added to the np.dtype type that will look up sizeof(long) bytes from the starting position of the flags member in the struct. This includes 3 bytes of the following type_num member. Obviously, 2048 does not fit into a char. Nonetheless, the type_num is also part of the hash, so either the flags member or the type_num member is different between the two. Two bugs for the price of one! -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From adam at lambdafoundry.com Tue Jan 17 10:57:29 2012 From: adam at lambdafoundry.com (Adam Klein) Date: Tue, 17 Jan 2012 10:57:29 -0500 Subject: [Numpy-discussion] segfault on searchsorted (1.6.2.dev-396dbb9) Message-ID: <CANxW0rbJAKczXEiS_-7k5FBmHFgk4NhtPzVG4SoFyRMC_2xpyw@mail.gmail.com> Hello, I get a segfault here: In [1]: x = np.array([1,2,3], dtype='M') In [2]: x.searchsorted(2, side='left') But it's fine here: In [1]: x = np.array([1,2,3], dtype='M') In [2]: x.view('i8').searchsorted(2, side='left') Out[2]: 1 This segfaults again: x.view('i8').searchsorted(np.datetime64(2), side='left') GDB gets me this far: Program received signal SIGSEGV, Segmentation fault. PyArray_SearchSorted (op1=0x1b8dd70, op2=0x17dfac0, side=NPY_SEARCHLEFT) at numpy/core/src/multiarray/item_selection.c:1463 1463 Py_INCREF(dtype); -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120117/90b3ac63/attachment.html> From charlesr.harris at gmail.com Tue Jan 17 11:53:37 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 17 Jan 2012 09:53:37 -0700 Subject: [Numpy-discussion] segfault on searchsorted (1.6.2.dev-396dbb9) In-Reply-To: <CANxW0rbJAKczXEiS_-7k5FBmHFgk4NhtPzVG4SoFyRMC_2xpyw@mail.gmail.com> References: <CANxW0rbJAKczXEiS_-7k5FBmHFgk4NhtPzVG4SoFyRMC_2xpyw@mail.gmail.com> Message-ID: <CAB6mnxL-FC7PQkD5hz9T5mSDP16aeUAE-TVrkvV5xn89OMjQUQ@mail.gmail.com> On Tue, Jan 17, 2012 at 8:57 AM, Adam Klein <adam at lambdafoundry.com> wrote: > Hello, > > I get a segfault here: > > In [1]: x = np.array([1,2,3], dtype='M') > In [2]: x.searchsorted(2, side='left') > > But it's fine here: > > In [1]: x = np.array([1,2,3], dtype='M') > In [2]: x.view('i8').searchsorted(2, side='left') > Out[2]: 1 > > This segfaults again: > > x.view('i8').searchsorted(np.datetime64(2), side='left') > > GDB gets me this far: > > Program received signal SIGSEGV, Segmentation fault. > PyArray_SearchSorted (op1=0x1b8dd70, op2=0x17dfac0, side=NPY_SEARCHLEFT) > at numpy/core/src/multiarray/item_selection.c:1463 > 1463 Py_INCREF(dtype); > > > Confirmed in current development. Note that things have changed and the initial array creation will fail (no unit). The searchsorted will work if searching for a datetime: In [10]: x = np.array([1,2,3], 'datetime64[D]') In [11]: x.searchsorted(datetime64(2,'D')) Out[11]: 1 So the failure is one of raising an appropriate error message. Please open a ticket. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120117/6ab37a02/attachment.html>

From chris.barker at noaa.gov  Tue Jan 17 15:20:37 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 17 Jan 2012 12:20:37 -0800
Subject: [Numpy-discussion] Counting the Colors of RGB-Image
In-Reply-To: <26FC23E7C398A64083C980D16001012D261E514763@VA3DIAXVS361.RED001.local>
References: <1118265273.1201150.1326642348079.JavaMail.tomcat55@mrmseu1.kundenserver.de>
	<CAEym_Hrh9Zt0q5p5QcqcvE4_utifKT0rG5mQh9uYdKk5Pj+xpA@mail.gmail.com>
	<26FC23E7C398A64083C980D16001012D261E514763@VA3DIAXVS361.RED001.local>
Message-ID: <CALGmxEKPOJB8gmg3Z4B_U9o8z0+dVfGh89E28ETfzU2De7R8sw@mail.gmail.com>

Here's a thought: too bad numpy doesn't have a 24 bit integer, but you
could tack a 0 on, making your image 32 bit, then use histogram to count
the colors. Something like (untested):

# create the 32 bit image
im32 = np.zeros((w, h), dtype=np.uint32)
view = im32.view(dtype=np.uint8).reshape((w, h, 4))
view[:, :, :3] = im

# histogram it:
# this is the trick -- setting your bins right; remember that histogram is
# designed for floats, so your bin boundaries should fall between the
# integer values you want, e.g.
bins = np.arange(2**24 + 1) - 0.5
colors = np.histogram(im32, bins=bins)

NOTE: the image processing scikit may well have something already --
histogramming an image is a common process.

-Chris

On Sun, Jan 15, 2012 at 9:40 AM, Nadav Horesh <nadavh at visionsense.com> wrote:
> im_flat = im0[...,0]*65536 + im0[...,1]*256 + im0[...,2]
> colours = np.unique(im_flat)
>
>    Nadav
>
> ________________________________
> From: numpy-discussion-bounces at scipy.org
> [numpy-discussion-bounces at scipy.org] On Behalf Of Tony Yu [tsyu80 at gmail.com]
> Sent: 15 January 2012 18:03
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Counting the Colors of RGB-Image
>
>
> On Sun, Jan 15, 2012 at 10:45 AM,  <apo at pdauf.de> wrote:
>>
>> Counting the Colors of RGB-Image,
>> nameit im0 with im0.shape = 2500,3500,3
>> with this code:
>>
>> tab0 = zeros( (256,256,256) , dtype=int)
>> tt = im0.view()
>> tt.shape = -1,3
>> for r,g,b in tt:
>>     tab0[r,g,b] += 1
>>
>> Question:
>>
>> Is there a faster way in numpy to get this result?
>>
>> MfG elodw
>
>
> Assuming that your image is made up of integer values (which I guess they'd
> have to be if you're indexing into `tab0`), then you could write:
>
>>>> rgb_unique = set(tuple(rgb) for rgb in tt)
>
> I'm not sure if it's any faster than your loop, but I would assume it is.
>
> -Tony
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From sturla at molden.no  Wed Jan 18 00:26:10 2012
From: sturla at molden.no (Sturla Molden)
Date: Wed, 18 Jan 2012 06:26:10 +0100
Subject: [Numpy-discussion] Strange numpy behaviour (bug?)
Message-ID: <4F1657F2.2020803@molden.no>

While "playing" with a point-in-polygon test, I have discovered a
failure mode that I cannot make sense of.

The algorithm is vectorized for NumPy from a C and Python implementation
I found on the net (see links below). It is written to process a large
dataset in chunks.
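A minimal sketch of that chunked-evaluation pattern, for readers following along (the helper name and sizes below are made up for illustration; Sturla's actual code, including his __chunk helper, appears further down):

import numpy as np

def chunk_bounds(n, size):
    # consecutive half-open (start, stop) spans covering n items,
    # including a short tail chunk when size does not divide n evenly
    edges = list(range(0, n, size)) + [n]
    return zip(edges[:-1], edges[1:])

result = np.zeros(100000, dtype=bool)
for i, j in chunk_bounds(result.shape[0], 8192):
    result[i:j] = True   # stand-in for the per-chunk vectorized test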
I'm rather happy with it, it can test 100,000 x,y points against a
non-convex pentagon in just 50 ms.

Anyway, here is something very strange (or at least I think so):

If I use a small chunk size, it sometimes fails. I know I shouldn't
blame it on NumPy, because it is in all likelihood my mistake. But it
does not make any sense, as the parameter should not affect the
computation.

Observed behavior:

1. Processing the whole dataset in one big chunk always works.

2. Processing the dataset in big chunks (e.g. 8192 points) always works.

3. Processing the dataset in small chunks (e.g. 32 points) sometimes fails.

4. Processing the dataset element-wise always works.

5. The scalar version behaves like the numpy version: fine for large
chunks, sometimes it fails for small. That is, when list comprehensions
are used for chunks. Big list comprehensions always work, small ones
might fail.

It looks like the numerical robustness of the algorithm depends on a
parameter that has nothing to do with the algorithm at all. For example
in (5), we might think that calling a function from a nested loop makes
it fail, depending on the length of the inner loop. But calling it from
a single loop works just fine.

???

So I wonder:

Could there be a bug in numpy that only shows up when taking a huge
number of short slices?

I don't know... But try it if you care.

In the function "inpolygon", change the call that says __chunk(n,8192)
to e.g. __chunk(n,32) to see it fail (or at least it does on my
computer, running Enthought 7.2-1 on Win64).

Regards,
Sturla Molden



def __inpolygon_scalar(x,y,poly):

    # Source code taken from:
    # http://paulbourke.net/geometry/insidepoly
    # http://www.ariel.com.au/a/python-point-int-poly.html

    n = len(poly)
    inside = False
    p1x,p1y = poly[0]
    xinters = 0
    for i in range(n+1):
        p2x,p2y = poly[i % n]
        if y > min(p1y,p2y):
            if y <= max(p1y,p2y):
                if x <= max(p1x,p2x):
                    if p1y != p2y:
                        xinters = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xinters:
                        inside = not inside
        p1x,p1y = p2x,p2y
    return inside


# the rest is (C) Sturla Molden, 2012
# University of Oslo

def __inpolygon_numpy(x,y,poly):
    """ numpy vectorized version """
    n = len(poly)
    inside = np.zeros(x.shape[0], dtype=bool)
    xinters = np.zeros(x.shape[0], dtype=float)
    p1x,p1y = poly[0]
    for i in range(n+1):
        p2x,p2y = poly[i % n]
        mask = (y > min(p1y,p2y)) & (y <= max(p1y,p2y)) & (x <= max(p1x,p2x))
        if p1y != p2y:
            xinters[mask] = (y[mask]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
        if p1x == p2x:
            inside[mask] = ~inside[mask]
        else:
            mask2 = x[mask] <= xinters[mask]
            idx, = np.where(mask)
            idx2, = np.where(mask2)
            idx = idx[idx2]
            inside[idx] = ~inside[idx]
        p1x,p1y = p2x,p2y
    return inside

def __chunk(n,size):
    x = range(0,n,size)
    if (n%size):
        x.append(n)
    return zip(x[:-1],x[1:])

def inpolygon(x, y, poly):
    """
    point-in-polygon test
    x and y are numpy arrays
    polygon is a list of (x,y) vertex tuples
    """
    if np.isscalar(x) and np.isscalar(y):
        return __inpolygon_scalar(x, y, poly)
    else:
        x = np.asarray(x)
        y = np.asarray(y)
        n = x.shape[0]
        z = np.zeros(n, dtype=bool)
        for i,j in __chunk(n,8192): # COMPARE WITH __chunk(n,32) ???
if j-i > 1: z[i:j] = __inpolygon_numpy(x[i:j], y[i:j], poly) else: z[i] = __inpolygon_scalar(x[i], y[i], poly) return z if __name__ == "__main__": import matplotlib import matplotlib.pyplot as plt from time import clock n = 100000 polygon = [(0.,.1), (1.,.1), (.5,1.), (0.,.75), (.5,.5), (0.,.1)] xp = [x for x,y in polygon] yp = [y for x,y in polygon] x = np.random.rand(n) y = np.random.rand(n) t0 = clock() inside = inpolygon(x,y,polygon) t1 = clock() print 'elapsed time %.3g ms' % ((t0-t1)*1E3,) plt.figure() plt.plot(x[~inside],y[~inside],'ob', xp, yp, '-g') plt.axis([0,1,0,1]) plt.show() From sturla at molden.no Wed Jan 18 00:57:38 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 18 Jan 2012 06:57:38 +0100 Subject: [Numpy-discussion] Strange numpy behaviour (bug?) In-Reply-To: <4F1657F2.2020803@molden.no> References: <4F1657F2.2020803@molden.no> Message-ID: <4F165F52.6070300@molden.no> Never mind this, it was my own mistake as I expected :-) def __chunk(n,size): x = range(0,n,size) x.append(n) return zip(x[:-1],x[1:]) makes it a lot better :) Sturla Den 18.01.2012 06:26, skrev Sturla Molden: > While "playing" with a point-in-polygon test, I have discovered some a > failure mode that I cannot make sence of. > > The algorithm is vectorized for NumPy from a C and Python implementation > I found on the net (see links below). It is written to process a large > dataset in chunks. I'm rather happy with it, it can test 100,000 x,y > points against a non-convex pentagon in just 50 ms. > > Anyway, here is something very strange (or at least I think so): > > If I use a small chunk size, it sometimes fails. I know I shouldn't > blame it on NumPy, beacuse it is by all likelood my mistake. But it does > not make any sence, as the parameter should not affect the computation. > > Observed behavior: > > 1. Processing the whole dataset in one big chunk always works. > > 2. Processing the dataset in big chunks (e.g. 8192 points) always works. > > 3. Processing the dataset in small chunks (e.g. 32 points) sometimes fail. > > 4. Processing the dataset element-wise always work. > > 5. The scalar version behaves like the numpy version: fine for large > chunks, sometimes it fails for small. That is, when list comprehensions > is used for chunks. Big list comprehensions always work, small ones > might fail. > > It looks like the numerical robstness of the alorithm depends on a > parameter that has nothing to do with the algorithm at all. For example > in (5), we might think that calling a function from a nested loop makes > it fail, depending on the length of the inner loop. But calling it from > a single loop works just fine. > > ??? > > So I wonder: > > Could there be a bug in numpy that only shows up only when taking a huge > number of short slices? > > I don't know... But try it if you care. > > In the function "inpolygon", change the call that says __chunk(n,8192) > to e.g. __chunk(n,32) to see it fail (or at least it does on my > computer, running Enthought 7.2-1 on Win64). 
> > > Regards, > Sturla Molden > > > > > > def __inpolygon_scalar(x,y,poly): > > # Source code taken from: > # http://paulbourke.net/geometry/insidepoly > # http://www.ariel.com.au/a/python-point-int-poly.html > > n = len(poly) > inside = False > p1x,p1y = poly[0] > xinters = 0 > for i in range(n+1): > p2x,p2y = poly[i % n] > if y> min(p1y,p2y): > if y<= max(p1y,p2y): > if x<= max(p1x,p2x): > if p1y != p2y: > xinters = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x > if p1x == p2x or x<= xinters: > inside = not inside > p1x,p1y = p2x,p2y > return inside > > > # the rest is (C) Sturla Molden, 2012 > # University of Oslo > > def __inpolygon_numpy(x,y,poly): > """ numpy vectorized version """ > n = len(poly) > inside = np.zeros(x.shape[0], dtype=bool) > xinters = np.zeros(x.shape[0], dtype=float) > p1x,p1y = poly[0] > for i in range(n+1): > p2x,p2y = poly[i % n] > mask = (y> min(p1y,p2y))& (y<= max(p1y,p2y))& (x<= > max(p1x,p2x)) > if p1y != p2y: > xinters[mask] = (y[mask]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x > if p1x == p2x: > inside[mask] = ~inside[mask] > else: > mask2 = x[mask]<= xinters[mask] > idx, = np.where(mask) > idx2, = np.where(mask2) > idx = idx[idx2] > inside[idx] = ~inside[idx] > p1x,p1y = p2x,p2y > return inside > > def __chunk(n,size): > x = range(0,n,size) > if (n%size): > x.append(n) > return zip(x[:-1],x[1:]) > > def inpolygon(x, y, poly): > """ > point-in-polygon test > x and y are numpy arrays > polygon is a list of (x,y) vertex tuples > """ > if np.isscalar(x) and np.isscalar(y): > return __inpolygon_scalar(x, y, poly) > else: > x = np.asarray(x) > y = np.asarray(y) > n = x.shape[0] > z = np.zeros(n, dtype=bool) > for i,j in __chunk(n,8192): # COMPARE WITH __chunk(n,32) ??? > if j-i> 1: > z[i:j] = __inpolygon_numpy(x[i:j], y[i:j], poly) > else: > z[i] = __inpolygon_scalar(x[i], y[i], poly) > return z > > > > if __name__ == "__main__": > > import matplotlib > import matplotlib.pyplot as plt > from time import clock > > n = 100000 > polygon = [(0.,.1), (1.,.1), (.5,1.), (0.,.75), (.5,.5), (0.,.1)] > xp = [x for x,y in polygon] > yp = [y for x,y in polygon] > x = np.random.rand(n) > y = np.random.rand(n) > t0 = clock() > inside = inpolygon(x,y,polygon) > t1 = clock() > print 'elapsed time %.3g ms' % ((t0-t1)*1E3,) > plt.figure() > plt.plot(x[~inside],y[~inside],'ob', xp, yp, '-g') > plt.axis([0,1,0,1]) > plt.show() > > > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Wed Jan 18 04:22:57 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 18 Jan 2012 01:22:57 -0800 Subject: [Numpy-discussion] Download page still points to SVN Message-ID: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> Hi folks, I was just pointing a colleague to the 'official download page' for numpy so he could find how to grab current sources: http://new.scipy.org/download.html but I was quite surprised to find that it still points to SVN for both numpy and scipy. It would probably not be a bad idea to update those and point them to github... Cheers, f From apo at pdauf.de Wed Jan 18 04:26:25 2012 From: apo at pdauf.de (apo at pdauf.de) Date: Wed, 18 Jan 2012 10:26:25 +0100 (CET) Subject: [Numpy-discussion] Counting the Colors of RGB-Image Message-ID: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/8fbd4065/attachment.html> -------------- next part -------------- Sorry, that i use this way to send an answer to Tony Yu , Nadav Horesh , Chris Barker. When iam direct answering on Your e-mail i get an error 5. I think i did a mistake. Your ideas are very helpfull and the code is very fast. Thank You elodw From scott.sinclair.za at gmail.com Wed Jan 18 05:18:49 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 18 Jan 2012 12:18:49 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> Message-ID: <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> On 18 January 2012 11:22, Fernando Perez <fperez.net at gmail.com> wrote: > I was just pointing a colleague to the 'official download page' for > numpy so he could find how to grab current sources: > > http://new.scipy.org/download.html > > but I was quite surprised to find that it still points to SVN for both > numpy and scipy. ?It would probably not be a bad idea to update those > and point them to github... It's rather confusing having two websites. The "official" page at http://www.scipy.org/Download points to github. There hasn't been much maintenance effort for new.scipy.org, and there was some recent discussion about taking it offline. I'm not sure if a firm conclusion was reached. Cheers, Scott From numpy-discussion at maubp.freeserve.co.uk Wed Jan 18 05:19:21 2012 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Jan 2012 10:19:21 +0000 Subject: [Numpy-discussion] Loading a Quicktime moive (*.mov) as series of arrays In-Reply-To: <CAKVJ-_5vmFrxmjRezmozwimfFS4OGKVxyt2AVaTTufRO-zOjgw@mail.gmail.com> References: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> <CAF6FJiuocxCrV7Hogq2-mHMB_X+T=FWi1hy_wmNfLmycrbpJQQ@mail.gmail.com> <CAKVJ-_5vmFrxmjRezmozwimfFS4OGKVxyt2AVaTTufRO-zOjgw@mail.gmail.com> Message-ID: <CAKVJ-_7puzauUqSHSUNSD1tFZz0ySZ6ePrD1RmOGDCict2G4kQ@mail.gmail.com> Sending this again (sorry Robert, this will be the second time for you) since I sent from a non-subscribed email address the first time. On Sun, Jan 15, 2012 at 7:12 PM, Robert Kern wrote: > On Sun, Jan 15, 2012 at 19:10, Peter wrote: >> Hello all, >> >> Is there a recommended (and ideally cross platform) >> way to load the frames of a QuickTime movie (*.mov >> file) in Python as NumPy arrays? ... > > I've had luck with pyffmpeg, though I haven't tried > QuickTime .mov files: > > ?http://code.google.com/p/pyffmpeg/ Thanks for the suggestion. Sadly right now pyffmpeg won't install on Mac OS X, at least not with the version of Cython I have installed: http://code.google.com/p/pyffmpeg/issues/detail?id=44 There doesn't seem to have been any activity on the official repository for some time either. 
Peter From robert.kern at gmail.com Wed Jan 18 05:36:39 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 Jan 2012 10:36:39 +0000 Subject: [Numpy-discussion] Loading a Quicktime moive (*.mov) as series of arrays In-Reply-To: <CAKVJ-_7puzauUqSHSUNSD1tFZz0ySZ6ePrD1RmOGDCict2G4kQ@mail.gmail.com> References: <CAKVJ-_65Z-EYFxsigxyREZRvaM9-SOQAOy868KL+EE60By8fRg@mail.gmail.com> <CAF6FJiuocxCrV7Hogq2-mHMB_X+T=FWi1hy_wmNfLmycrbpJQQ@mail.gmail.com> <CAKVJ-_5vmFrxmjRezmozwimfFS4OGKVxyt2AVaTTufRO-zOjgw@mail.gmail.com> <CAKVJ-_7puzauUqSHSUNSD1tFZz0ySZ6ePrD1RmOGDCict2G4kQ@mail.gmail.com> Message-ID: <CAF6FJiuAM1F_4vrNOdr9DHNOABvnLiZyuBAeO-j6Bi1kCwxJxw@mail.gmail.com> On Wed, Jan 18, 2012 at 10:19, Peter <numpy-discussion at maubp.freeserve.co.uk> wrote: > Sending this again (sorry Robert, this will be the second time > for you) since I sent from a non-subscribed email address the > first time. > > On Sun, Jan 15, 2012 at 7:12 PM, Robert Kern wrote: >> On Sun, Jan 15, 2012 at 19:10, Peter wrote: >>> Hello all, >>> >>> Is there a recommended (and ideally cross platform) >>> way to load the frames of a QuickTime movie (*.mov >>> file) in Python as NumPy arrays? ... >> >> I've had luck with pyffmpeg, though I haven't tried >> QuickTime .mov files: >> >> ?http://code.google.com/p/pyffmpeg/ > > Thanks for the suggestion. > > Sadly right now pyffmpeg won't install on Mac OS X, > at least not with the version of Cython I have installed: > http://code.google.com/p/pyffmpeg/issues/detail?id=44 > > There doesn't seem to have been any activity on the > official repository for some time either. Oh, right, I had to fix those, too. I've attached the patches that I used. I used MacPorts to install the ffmpeg libraries, so I modified the paths in the setup.py appropriately. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco -------------- next part -------------- A non-text attachment was scrubbed... Name: setup-fix.diff Type: application/octet-stream Size: 1103 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/ec5b0211/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: cinit-fix.diff Type: application/octet-stream Size: 2427 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/ec5b0211/attachment-0001.obj> From malcolm.reynolds at gmail.com Wed Jan 18 09:59:26 2012 From: malcolm.reynolds at gmail.com (Malcolm Reynolds) Date: Wed, 18 Jan 2012 14:59:26 +0000 Subject: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB) Message-ID: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> Hi, I've built a system which allocates numpy arrays and processes them in C++ code (this is because I'm building a native code module using boost.python and it makes sense to use numpy data storage to then deal with outputs in python, without having to do any copying). Everything seems fine except when I parallelise the main loop, (openmp and TBB give the same results) in which case I see a whole bunch of messages saying "reference count error detected: an attempt was made to deallocate 12 (d)" sometimes during the running of the program, sometimes all at the end (presumably when all the destructors in my program run). 
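The "12 (d)" in that message is the type number and single-character code of numpy's builtin float64 descriptor -- the dtype whose reference count, as Robert explains further down in the thread, should never reach zero because numpy itself holds references to it. A quick way to check, assuming a stock numpy build:

>>> import numpy as np
>>> d = np.dtype('d')
>>> d.num, d.char, d.name
(12, 'd', 'float64')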
To clarify, the loop I am now running parallel takes read-only parameters (enforced by the C++ compiler using 'const') and as far as I can tell there are no race conditions with multiple threads writing to the same numpy arrays at once or anything obvious like that. I recompiled numpy (I'm using 1.6.1 from the official git repository) to print out some extra information with the reference count message, namely a pointer to the thing which is being erroneously deallocated. Surprisingly, it is always the same address for any run of the program, considering this is a message printed out hundreds of times. I've looked into this a little with GDB and as far as I can see the object which the message pertains to is an "array descriptor", or at least that's what I conclude from backtraces similar to the following: Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501 1501 fprintf(stderr, "*** Reference count error detected: \n" \ (gdb) bt #0 arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501 #1 0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271 #2 0x0000000103e592d7 in boost::detail::sp_counted_impl_p<garf::multivariate_normal<double> const>::dispose (this=<value temporarily unavailable, due to optimizations>) at refcount.hpp:36 #3 .... my code Obviously I can turn off the parallelism to make this problem go away, but since my underlying algorithm is trivially parallelisable I was counting on being able to achieve linear speedup across cores.. Currently I can, and as far as I know there are no actual incorrect results being produced by the program. However, in my field (Machine Learning) it's difficult enough to know whether the numbers calculated are sensible even without the presence of these kind of warnings, so I'd like to get a handle on at least why this is happening so I'd know know whether I can safely ignore it. My guess at what might be happening is that the multiple threads are dealing with some object concurrently and the updates to the reference count are not processed atomically, meaning that there are too many DECREFs which happen later on. I had presumed that allocated different numpy matrices in different threads, and then all reading from central numpy matrices would work fine, but apparently there is something I missed, pertaining to descriptors.. Can anyone offer any guidance, or at least tell me this is safe to ignore? I can reproduce the problem reliably, so if you need me to do some digging with GDB at the point the error takes place I can do that. Many thanks, Malcolm From robert.kern at gmail.com Wed Jan 18 10:15:31 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 Jan 2012 15:15:31 +0000 Subject: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB) In-Reply-To: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> References: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> Message-ID: <CAF6FJis8Avwb6+uuPPxHm--oaJM3ayyjYYdtud9+idLy4ZhyjA@mail.gmail.com> On Wed, Jan 18, 2012 at 14:59, Malcolm Reynolds <malcolm.reynolds at gmail.com> wrote: > Hi, > > I've built a system which allocates numpy arrays and processes them in > C++ code (this is because I'm building a native code module using > boost.python and it makes sense to use numpy data storage to then deal > with outputs in python, without having to do any copying). 
Everything > seems fine except when I parallelise the main loop, (openmp and TBB > give the same results) in which case I see a whole bunch of messages > saying > > "reference count error detected: an attempt was made to deallocate 12 (d)" > > sometimes during the running of the program, sometimes all at the end > (presumably when all the destructors in my program run). > > To clarify, the loop I am now running parallel takes read-only > parameters (enforced by the C++ compiler using 'const') and as far as > I can tell there are no race conditions with multiple threads writing > to the same numpy arrays at once or anything obvious like that. > > I recompiled numpy (I'm using 1.6.1 from the official git repository) > to print out some extra information with the reference count message, > namely a pointer to the thing which is being erroneously deallocated. > Surprisingly, it is always the same address for any run of the > program, considering this is a message printed out hundreds of times. > > I've looked into this a little with GDB and as far as I can see the > object which the message pertains to is an "array descriptor", or at > least that's what I conclude from backtraces similar to the following: > > Breakpoint 1, arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501 > 1501 ? ? ? ? ? ?fprintf(stderr, "*** Reference count error detected: \n" \ > (gdb) bt > #0 ?arraydescr_dealloc (self=0x1029083a0) at descriptor.c:1501 > #1 ?0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271 > #2 ?0x0000000103e592d7 in > boost::detail::sp_counted_impl_p<garf::multivariate_normal<double> > const>::dispose (this=<value temporarily unavailable, due to > optimizations>) at refcount.hpp:36 > #3 .... my code I suspect there is some problem with the reference counting that you are doing at the C++ level that is causing you to do too many Py_DECREFs to the numpy objects, and this is being identified by the arraydescr_dealloc() routine. (By the way, arraydescrs are the C-level implementation of dtype objects.) Reading the comments just before descriptor.c:1501 points out that this warning is being printed because something is trying to deallocate the builtin np.dtype('d') == np.dtype('float64') dtype. This should never happen. The refcount for these objects should always be > 0 because numpy itself holds references to them. I suspect that you are obtaining the numpy object (1 Py_INCREF) before you split into multiple threads but releasing them in each thread (multiple Py_DECREFs). This is probably being hidden from you by the boost.python interface and/or the boost::detail::sp_counted_impl_p<> smart(ish) pointer. Check the backtrace where your code starts to verify if this looks to be the case. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From malcolm.reynolds at gmail.com Wed Jan 18 11:14:32 2012 From: malcolm.reynolds at gmail.com (Malcolm Reynolds) Date: Wed, 18 Jan 2012 16:14:32 +0000 Subject: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB) In-Reply-To: <CAF6FJis8Avwb6+uuPPxHm--oaJM3ayyjYYdtud9+idLy4ZhyjA@mail.gmail.com> References: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> <CAF6FJis8Avwb6+uuPPxHm--oaJM3ayyjYYdtud9+idLy4ZhyjA@mail.gmail.com> Message-ID: <CAO1Gn58gcaodhVfVma94m6GsVmVxCN6j6Lr2XujkUCcJYsNB_A@mail.gmail.com> > > I suspect that you are obtaining the numpy object (1 Py_INCREF) before > you split into multiple threads but releasing them in each thread > (multiple Py_DECREFs). This is probably being hidden from you by the > boost.python interface and/or the boost::detail::sp_counted_impl_p<> > smart(ish) pointer. Check the backtrace where your code starts to > verify if this looks to be the case. Thankyou for your quick reply. This makes a lot of sense, I'm just having trouble seeing where this could be happening as everything I pass into each parallel computation strand is pass down as either pointer-to-consts or reference-to-const - the only things that need to be modified (for example random number generator objects) are created uniquely inside each iteration of the for loop so it can't be that. This information about which object has the reference count problem helps though, I will keep digging. I'm vaguely planning on trying to track every incref and decref so I can pin down which object has an unbalanced amount - to do this I want to know the address of the array, rather than the associated datatype descriptor - I assume I want to pay attention to the (self=0x117e0e850) in this line, and that is the address of the array I am mishandling? #1 0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271 Malcolm > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ? -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Wed Jan 18 11:54:53 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 Jan 2012 16:54:53 +0000 Subject: [Numpy-discussion] "Reference count error detected" bug appears with multithreading (OpenMP & TBB) In-Reply-To: <CAO1Gn58gcaodhVfVma94m6GsVmVxCN6j6Lr2XujkUCcJYsNB_A@mail.gmail.com> References: <CAO1Gn5-vViYf-uydUYEtEHBVAXhHmkrDNwW6350JUmSk0tbm2w@mail.gmail.com> <CAF6FJis8Avwb6+uuPPxHm--oaJM3ayyjYYdtud9+idLy4ZhyjA@mail.gmail.com> <CAO1Gn58gcaodhVfVma94m6GsVmVxCN6j6Lr2XujkUCcJYsNB_A@mail.gmail.com> Message-ID: <CAF6FJiuCE_CUBmmehKgM+RDYWr_MfPdg7+k2jmWqYJmceX6hmg@mail.gmail.com> On Wed, Jan 18, 2012 at 16:14, Malcolm Reynolds <malcolm.reynolds at gmail.com> wrote: >> >> I suspect that you are obtaining the numpy object (1 Py_INCREF) before >> you split into multiple threads but releasing them in each thread >> (multiple Py_DECREFs). This is probably being hidden from you by the >> boost.python interface and/or the boost::detail::sp_counted_impl_p<> >> smart(ish) pointer. Check the backtrace where your code starts to >> verify if this looks to be the case. > > Thankyou for your quick reply. 
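One crude way to watch for that kind of imbalance from the Python side, before digging into the C++ layer, is to compare an array's reference count before and after the threaded section. This is only a sketch with a dummy read-only worker, not the boost.python code under discussion:

import sys
import threading
import numpy as np

a = np.zeros((1000, 1000))

def worker(arr):
    arr.sum()   # read-only use; must leave the reference count balance intact

before = sys.getrefcount(a)
threads = [threading.Thread(target=worker, args=(a,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
after = sys.getrefcount(a)
# 'before' and 'after' should be equal; a smaller value afterwards means
# something released a reference it never acquired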
This makes a lot of sense, I'm just > having trouble seeing where this could be happening as everything I > pass into each parallel computation strand is pass down as either > pointer-to-consts or reference-to-const - the only things that need to > be modified (for example random number generator objects) are created > uniquely inside each iteration of the for loop so it can't be that. My C++-fu is fairly weak, so I'm never really sure what the smart pointers are doing when. If there are tracing features that you can turn on, try that. Is this deallocation of the smart pointer to the "garf::multivariate_normal<double> const" being done inside the loop or outside back in the main thread? Where did it get created? > This information about which object has the reference count problem > helps though, I will keep digging. I'm vaguely planning on trying to > track every incref and decref so I can pin down which object has an > unbalanced amount - to do this I want to know the address of the > array, rather than the associated datatype descriptor - I assume I > want to pay attention to the (self=0x117e0e850) in this line, and that > is the address of the array I am mishandling? > > #1 ?0x0000000102897fc4 in array_dealloc (self=0x117e0e850) at arrayobject.c:271 Yes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From olivier.grisel at ensta.org Wed Jan 18 14:54:12 2012 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 18 Jan 2012 20:54:12 +0100 Subject: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012 Message-ID: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> Hi all, Just a quick email to advertise this year's PyCon tutorials as they are very focused on HPC & data analytics. In particular the numpy / scipy ecosystem is well covered, see: https://us.pycon.org/2012/schedule/tutorials/ Here is a selection of tutorials with an abstracts that mention numpy or a related project (scipy, ipython, matplotlib...): - Bayesian statistics made (as) simple (as possible) - Allen Downey https://us.pycon.org/2012/schedule/presentation/10/ - IPython in-depth: high-productivity interactive and parallel python - Fernando P?rez , Brian E. Granger , Min Ragan-Kelley https://us.pycon.org/2012/schedule/presentation/121/ - Faster Python Programs through Optimization - Mike M?ller https://us.pycon.org/2012/schedule/presentation/245/ - Graph Analysis from the Ground Up - Van Lindberg https://us.pycon.org/2012/schedule/presentation/228/ - Data analysis in Python with pandas - Wes McKinney https://us.pycon.org/2012/schedule/presentation/427/ - Social Network Analysis with Python - Maksim Tsvetovat https://us.pycon.org/2012/schedule/presentation/15/ - High Performance Python I - Ian Ozsvald https://us.pycon.org/2012/schedule/presentation/174/ - Plotting with matplotlib - Mike M?ller https://us.pycon.org/2012/schedule/presentation/238/ - Introduction to Interactive Predictive Analytics in Python with scikit-learn - Olivier Grisel https://us.pycon.org/2012/schedule/presentation/195/ - High Performance Python II - Travis Oliphant https://us.pycon.org/2012/schedule/presentation/343/ Also the main conference has also very interesting talks: https://us.pycon.org/2012/schedule/ The early birds rate for the PyCOn ends on Jan 25. 
See you in PyCon in March, -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From fperez.net at gmail.com Wed Jan 18 17:44:55 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 18 Jan 2012 14:44:55 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> Message-ID: <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> On Wed, Jan 18, 2012 at 2:18 AM, Scott Sinclair <scott.sinclair.za at gmail.com> wrote: > It's rather confusing having two websites. The "official" page at > http://www.scipy.org/Download points to github. The problem is that this page, which looks pretty official to just about anyone: http://numpy.scipy.org/ takes you to the one at new.scipy... So as far as traps for the unwary go, this one was pretty cleverly laid out ;) Best, f From chaoyuejoy at gmail.com Wed Jan 18 17:51:50 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Wed, 18 Jan 2012 23:51:50 +0100 Subject: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012 In-Reply-To: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> References: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> Message-ID: <CAAN-aRGLV8EPFWpKXC25whbuOLj5cbDK-R1RCgtPeFOMAVWLvw@mail.gmail.com> Does anybody know if there is similar chance for training in Paris? (or other places of France)/ the price is nice, just because it's in US.... thanks, Chao 2012/1/18 Olivier Grisel <olivier.grisel at ensta.org> > Hi all, > > Just a quick email to advertise this year's PyCon tutorials as they > are very focused on HPC & data analytics. In particular the numpy / > scipy ecosystem is well covered, see: > > https://us.pycon.org/2012/schedule/tutorials/ > > Here is a selection of tutorials with an abstracts that mention numpy > or a related project (scipy, ipython, matplotlib...): > > - Bayesian statistics made (as) simple (as possible) - Allen Downey > https://us.pycon.org/2012/schedule/presentation/10/ > > - IPython in-depth: high-productivity interactive and parallel python > - Fernando P?rez , Brian E. Granger , Min Ragan-Kelley > https://us.pycon.org/2012/schedule/presentation/121/ > > - Faster Python Programs through Optimization - Mike M?ller > https://us.pycon.org/2012/schedule/presentation/245/ > > - Graph Analysis from the Ground Up - Van Lindberg > https://us.pycon.org/2012/schedule/presentation/228/ > > - Data analysis in Python with pandas - Wes McKinney > https://us.pycon.org/2012/schedule/presentation/427/ > > - Social Network Analysis with Python - Maksim Tsvetovat > https://us.pycon.org/2012/schedule/presentation/15/ > > - High Performance Python I - Ian Ozsvald > https://us.pycon.org/2012/schedule/presentation/174/ > > - Plotting with matplotlib - Mike M?ller > https://us.pycon.org/2012/schedule/presentation/238/ > > - Introduction to Interactive Predictive Analytics in Python with > scikit-learn - Olivier Grisel > https://us.pycon.org/2012/schedule/presentation/195/ > > - High Performance Python II - Travis Oliphant > https://us.pycon.org/2012/schedule/presentation/343/ > > Also the main conference has also very interesting talks: > > https://us.pycon.org/2012/schedule/ > > The early birds rate for the PyCOn ends on Jan 25. 
> > See you in PyCon in March, > > -- > Olivier > http://twitter.com/ogrisel - http://github.com/ogrisel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/7d8bed27/attachment.html> From scott.sinclair.za at gmail.com Thu Jan 19 01:19:24 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 19 Jan 2012 08:19:24 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> Message-ID: <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> On 19 January 2012 00:44, Fernando Perez <fperez.net at gmail.com> wrote: > On Wed, Jan 18, 2012 at 2:18 AM, Scott Sinclair > <scott.sinclair.za at gmail.com> wrote: >> It's rather confusing having two websites. The "official" page at >> http://www.scipy.org/Download points to github. > > The problem is that this page, which looks pretty official to just about anyone: > > http://numpy.scipy.org/ > > takes you to the one at new.scipy... ?So as far as traps for the > unwary go, this one was pretty cleverly laid out ;) It certainly is. I think (as usual), the problem is that fixing the situation lies on the shoulders of people who are already heavily overburdened.. There is a pull request updating the offending page at https://github.com/scipy/scipy.org-new/pull/1 if any overburdened types feel like merging, building and uploading the revised html. Cheers, Scott From fperez.net at gmail.com Thu Jan 19 01:39:29 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 18 Jan 2012 22:39:29 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> Message-ID: <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> On Wed, Jan 18, 2012 at 10:19 PM, Scott Sinclair <scott.sinclair.za at gmail.com> wrote: > I think (as usual), the problem is that fixing the situation lies on > the shoulders of people who are already heavily overburdened.. I certainly understand that problem, as I'm eternally behind on a million things regarding ipython. But the only solution to these problems is delegation, not asking the already overburdened few to work even harder than they already do. I wonder if we could distribute the process of managing the websites a little more for numpy/scipy, so this didn't bottleneck as much. 
Furthermore, managing those is the kind of task that can be accomplished by someone who may not feel comfortable touching the numpy C core, and yet it's a *great* way to help the project out. In ipython, we've moved to github-pages hosting for everything, which means that now having a web team is as easy as clicking on the github interface a couple of times, and that's one more task we can get help on from others. In fairness, right now the ipython-web team is the same people as the core, but at least things are in place to accept new hands helping should they become available, without any conflict with core development. Just a thought. Cheers, f From staticfloat at gmail.com Thu Jan 19 01:50:04 2012 From: staticfloat at gmail.com (Elliot Saba) Date: Wed, 18 Jan 2012 22:50:04 -0800 Subject: [Numpy-discussion] Cross-covariance function Message-ID: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> Greetings, I recently needed to calculate the cross-covariance of two random vectors, (e.g. I have two matricies, X and Y, the columns of which are observations of one variable, and I wish to generate a matrix pairing each value of X and Y) and so I wrote a small utility function to do so, and I'd like to try and get it integrated into numpy core, if it is deemed useful. I have never submitted a patch to numpy before, so I'm not sure as to the protocol; do I ask someone on this list to review the code? Are there conventions I should be aware of? Etc... Thank you all, -E -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120118/b9c4abf7/attachment.html> From d.s.seljebotn at astro.uio.no Thu Jan 19 04:11:27 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Jan 2012 10:11:27 +0100 Subject: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012 In-Reply-To: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> References: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> Message-ID: <4F17DE3F.7070104@astro.uio.no> On 01/18/2012 08:54 PM, Olivier Grisel wrote: > Hi all, > > Just a quick email to advertise this year's PyCon tutorials as they > are very focused on HPC& data analytics. In particular the numpy / > scipy ecosystem is well covered, see: > > https://us.pycon.org/2012/schedule/tutorials/ > > Here is a selection of tutorials with an abstracts that mention numpy > or a related project (scipy, ipython, matplotlib...): > > - Bayesian statistics made (as) simple (as possible) - Allen Downey > https://us.pycon.org/2012/schedule/presentation/10/ > > - IPython in-depth: high-productivity interactive and parallel python > - Fernando P?rez , Brian E. 
Granger , Min Ragan-Kelley > https://us.pycon.org/2012/schedule/presentation/121/ > > - Faster Python Programs through Optimization - Mike M?ller > https://us.pycon.org/2012/schedule/presentation/245/ > > - Graph Analysis from the Ground Up - Van Lindberg > https://us.pycon.org/2012/schedule/presentation/228/ > > - Data analysis in Python with pandas - Wes McKinney > https://us.pycon.org/2012/schedule/presentation/427/ > > - Social Network Analysis with Python - Maksim Tsvetovat > https://us.pycon.org/2012/schedule/presentation/15/ > > - High Performance Python I - Ian Ozsvald > https://us.pycon.org/2012/schedule/presentation/174/ > > - Plotting with matplotlib - Mike M?ller > https://us.pycon.org/2012/schedule/presentation/238/ > > - Introduction to Interactive Predictive Analytics in Python with > scikit-learn - Olivier Grisel > https://us.pycon.org/2012/schedule/presentation/195/ > > - High Performance Python II - Travis Oliphant > https://us.pycon.org/2012/schedule/presentation/343/ > Also two of the Cython devs (me and Mark Florisson) will attend with a poster on Cython. Dag Sverre From olivier.grisel at ensta.org Thu Jan 19 04:22:44 2012 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 19 Jan 2012 10:22:44 +0100 Subject: [Numpy-discussion] NumPy / SciPy related tutorials at PyCon 2012 In-Reply-To: <CAAN-aRGLV8EPFWpKXC25whbuOLj5cbDK-R1RCgtPeFOMAVWLvw@mail.gmail.com> References: <CAFvE7K53yq4qXPrPVGKcZRGO3Su7UgAfLKM_9TPLmb9t+Zd2MA@mail.gmail.com> <CAAN-aRGLV8EPFWpKXC25whbuOLj5cbDK-R1RCgtPeFOMAVWLvw@mail.gmail.com> Message-ID: <CAFvE7K5fexJ1MMfyL7pzTrsOOEKf6S6U-6WX=tif+qL0Pw6TtA@mail.gmail.com> 2012/1/18 Chao YUE <chaoyuejoy at gmail.com>: > Does anybody know if there is similar chance for training in Paris? (or > other places of France)/ > the price is nice, just because it's in US.... The next EuroScipy will take place in Brussels. Just 1h25m train ride from Paris. http://www.euroscipy.org/conference/euroscipy2012 -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From markbak at gmail.com Thu Jan 19 04:37:00 2012 From: markbak at gmail.com (Mark Bakker) Date: Thu, 19 Jan 2012 10:37:00 +0100 Subject: [Numpy-discussion] swaxes(0, 1) 10% faster than transpose on 2D matrix? Message-ID: <CAEX=yaZ9AEvP_71secpnBVLSdewANtm41J5hAiE5CypsYMYfQA@mail.gmail.com> Hello List, I noticed that swapaxes(0,1) is consistently (on my system) 10% faster than transpose on a 2D matrix. Any reason why? Any reason why the swapaxes algorithm is not used in transpose? Just wondering. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120119/00cfbeba/attachment.html> From travis at continuum.io Thu Jan 19 12:21:31 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 19 Jan 2012 11:21:31 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> Message-ID: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> I think the problem here is one of delegation and information. I'm not even sure how the web-pages get updated at this point. 
Does anyone on this list know? I think it would be a great idea to move to github pages for the NumPy project at least. -Travis On Jan 19, 2012, at 12:39 AM, Fernando Perez wrote: > On Wed, Jan 18, 2012 at 10:19 PM, Scott Sinclair > <scott.sinclair.za at gmail.com> wrote: >> I think (as usual), the problem is that fixing the situation lies on >> the shoulders of people who are already heavily overburdened.. > > I certainly understand that problem, as I'm eternally behind on a > million things regarding ipython. > > But the only solution to these problems is delegation, not asking the > already overburdened few to work even harder than they already do. I > wonder if we could distribute the process of managing the websites a > little more for numpy/scipy, so this didn't bottleneck as much. > > Furthermore, managing those is the kind of task that can be > accomplished by someone who may not feel comfortable touching the > numpy C core, and yet it's a *great* way to help the project out. > > In ipython, we've moved to github-pages hosting for everything, which > means that now having a web team is as easy as clicking on the github > interface a couple of times, and that's one more task we can get help > on from others. In fairness, right now the ipython-web team is the > same people as the core, but at least things are in place to accept > new hands helping should they become available, without any conflict > with core development. > > Just a thought. > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Thu Jan 19 12:57:40 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 19 Jan 2012 18:57:40 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> Message-ID: <jf9lik$lu4$1@dough.gmane.org> 19.01.2012 18:21, Travis Oliphant kirjoitti: > I think the problem here is one of delegation and information. > > I'm not even sure how the web-pages get updated at this point. > Does anyone on this list know? I think it would be a great idea > to move to github pages for the NumPy project at least. The main scipy.org web page is the wiki. I'm not sure who apart from Enthought's IT staff has access to the machine running it. The pages at numpy.scipy.org and new.scipy.org are hosted on new.scipy.org as static files -- they're just generated by sphinx and uploaded manually. In addition to that, the machine also runs the Trac, the doc editor, and the conference.scipy.org and docs.scipy.org websites. A couple of people (including at least me and Jarrod + Enthought IT staff) have access to that machine. Moving the stuff at numpy.scipy.org to Github pages would make sense, as those are only static files. IMO, the stuff at new.scipy.org should be taken down --- the idea was to revise the scipy.org front page during Scipy '09 conference, and make it rely less on the wiki, but the work was not finished. 
I think I don't have the necessary unix permissions to put the site down or edit it, though. Pauli From kwgoodman at gmail.com Thu Jan 19 13:53:17 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 19 Jan 2012 10:53:17 -0800 Subject: [Numpy-discussion] swaxes(0, 1) 10% faster than transpose on 2D matrix? In-Reply-To: <CAEX=yaZ9AEvP_71secpnBVLSdewANtm41J5hAiE5CypsYMYfQA@mail.gmail.com> References: <CAEX=yaZ9AEvP_71secpnBVLSdewANtm41J5hAiE5CypsYMYfQA@mail.gmail.com> Message-ID: <CAB6Y534P757tFKgcjTw4=RU0wCJYuLy2EryenEDYZpmgp_d_Vg@mail.gmail.com> On Thu, Jan 19, 2012 at 1:37 AM, Mark Bakker <markbak at gmail.com> wrote: > I noticed that swapaxes(0,1) is consistently (on my system) 10% faster than > transpose on a 2D matrix. Transpose is faster for me. And a.T is faster than a.transpose() perhaps because a.transpose() checks that the inputs make sense? My guess is that they all do the same thing. It's just a matter of which function has the least overhead. I[10] a = np.random.rand(1000,1000) I[11] timeit a.T 10000000 loops, best of 3: 153 ns per loop I[12] timeit a.transpose() 10000000 loops, best of 3: 171 ns per loop I[13] timeit a.swapaxes(0,1) 1000000 loops, best of 3: 227 ns per loop I[14] timeit np.transpose(a) 1000000 loops, best of 3: 308 ns per loop From fperez.net at gmail.com Thu Jan 19 14:48:12 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 19 Jan 2012 11:48:12 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> Message-ID: <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> On Thu, Jan 19, 2012 at 9:21 AM, Travis Oliphant <travis at continuum.io> wrote: > I'm not even sure how the web-pages get updated at this point. ? Does anyone on this list know? ? ?I think it would be a great idea to move to github pages for the NumPy project at least. We've moved to the following setup with ipython, which works very well for us so far: 1. ipython.org: Main website with only static content, manged as a repo in github (https://github.com/ipython/ipython-website) and updated with a gh-pages build (https://github.com/ipython/ipython.github.com). 2. wiki.ipython.org: a mediawiki instance we run on a server I personally pay for. 3. archive.ipython.org: static hosting of content such as downloads of release candidates, same server as #2. We also keep main releases here as an alternative, but I think most people get the releases from pypi these days. With this setup, the only thing that requires actual ssh access is #3, and I simply uploaded the keys of a few developers to that server. But having to upload content there is fairly rare, and the large majority of content that needs update lives in #1 and #2, both of which have access control mechanisms that make job delegation extremely easy. At this point, our only real bottleneck is that I'm still the sole release manager so far. But now that we're hitting a more regular release pace I plan to change that soon, and start rotating this job too, so it doesn't depend on my time. 
We used to release so infrequently that this wasn't really an issue, and the 0.11 release was so big that I wouldn't foist it on anyone else (it took ~2 weeks just to do the release work), but moving forward this job should also be easy to delegate and we'll do so soon. I'm happy to share any other details that may help smooth out the workflow for numpy and scipy. I certainly think that the current setup with a very outdated wiki as the main site and a new-but-semi-invalid rst one needs fixing; it's kind of a shame to have the crown jewels of the scientific python ecosystem with such a poor web presence. But fortunately the problem isn't too hard to fix these days (the github machinery really plays a key part in helping here). Cheers, f From ognen at enthought.com Thu Jan 19 15:14:05 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Thu, 19 Jan 2012 14:14:05 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> Message-ID: <CAA6U3WC--PC9BsCbhRR6WQeGTeg9eyJjk32D8d_ejMZS=xCgZQ@mail.gmail.com> On Thu, Jan 19, 2012 at 1:48 PM, Fernando Perez <fperez.net at gmail.com> wrote: > On Thu, Jan 19, 2012 at 9:21 AM, Travis Oliphant <travis at continuum.io> wrote: >> I'm not even sure how the web-pages get updated at this point. ? Does anyone on this list know? ? ?I think it would be a great idea to move to github pages for the NumPy project at least. > > We've moved to the following setup with ipython, which works very well > for us so far: > > 1. ipython.org: Main website with only static content, manged as a > repo in github (https://github.com/ipython/ipython-website) and > updated with a gh-pages build > (https://github.com/ipython/ipython.github.com). > > 2. wiki.ipython.org: a mediawiki instance we run on a server I > personally pay for. > > 3. archive.ipython.org: static hosting of content such as downloads of > release candidates, same server as #2. ?We also keep main releases > here as an alternative, but I think most people get the releases from > pypi these days. > > With this setup, the only thing that requires actual ssh access is #3, > and I simply uploaded the keys of a few developers to that server. > But having to upload content there is fairly rare, and the large > majority of content that needs update lives in #1 and #2, both of > which have access control mechanisms that make job delegation > extremely easy. > > At this point, our only real bottleneck is that I'm still the sole > release manager so far. ?But now that we're hitting a more regular > release pace I plan to change that soon, and start rotating this job > too, so it doesn't depend on my time. ?We used to release so > infrequently that this wasn't really an issue, and the 0.11 release > was so big that I wouldn't foist it on anyone else (it took ~2 weeks > just to do the release work), but moving forward this job should also > be easy to delegate and we'll do so soon. 
> > I'm happy to share any other details that may help smooth out the > workflow for numpy and scipy. ?I certainly think that the current > setup with a very outdated wiki as the main site and a > new-but-semi-invalid rst one needs fixing; it's kind of a shame to > have the crown jewels of the scientific python ecosystem with such a > poor web presence. ?But fortunately the problem isn't too hard to fix > these days (the github machinery really plays a key part in helping > here). ipython.org used to live on scipy.org machine - as far as I can tell the only thing still on the scipy.org machine related to ipython are the dev and user mailing lists (via mailman) hosted at projects.scipy.org. Ognen From ognen at enthought.com Thu Jan 19 15:18:03 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Thu, 19 Jan 2012 14:18:03 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <jf9lik$lu4$1@dough.gmane.org> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <jf9lik$lu4$1@dough.gmane.org> Message-ID: <CAA6U3WDejzJzNzAyws7PxGCk=ngbXcxLRekOXc8C1yeC=-0NyA@mail.gmail.com> On Thu, Jan 19, 2012 at 11:57 AM, Pauli Virtanen <pav at iki.fi> wrote: > 19.01.2012 18:21, Travis Oliphant kirjoitti: >> I think the problem here is one of delegation and information. >> >> I'm not even sure how the web-pages get updated at this point. >> Does anyone on this list know? I think it would be a great idea >> to move to github pages for the NumPy project at least. > > The main scipy.org web page is the wiki. I'm not sure who apart from > Enthought's IT staff has access to the machine running it. This machine is slated to move to Amazon EC2 no later than end of March. I am doing it myself. The problem I ran into is one of accumulated crust (for lack of better expression). There are a zillion apache .conf files and virtual www sites hosted off that box, just deciding what it still live and what is not is a big task (I don't want to shut something off by accident). I would personally be in favour of moving as much as we can to github or whatever other place you may think of. The current scipy.org machine bogs down randomly and apache needs a kick almost daily. Whenever I log into the box to restart it - the load is in the 17-20 range. The scipy.org machine is actually an OpenVZ container living on an underpowered (imho) linux box. Hence, I decided to get a large Amazon instance with plenty of memory. At the same time, this is the perfect opportunity for cleanup. If someone is willing to assist me, I have no problems getting more involved into moving things and reorganizing them. 
Ognen From fperez.net at gmail.com Thu Jan 19 15:37:34 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 19 Jan 2012 12:37:34 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAA6U3WC--PC9BsCbhRR6WQeGTeg9eyJjk32D8d_ejMZS=xCgZQ@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> <CAA6U3WC--PC9BsCbhRR6WQeGTeg9eyJjk32D8d_ejMZS=xCgZQ@mail.gmail.com> Message-ID: <CAHAreOq=LOiENAWY83GqAKC-q72LQWtyQf2JjaFR0JqEqdAGOg@mail.gmail.com> On Thu, Jan 19, 2012 at 12:14 PM, Ognen Duzlevski <ognen at enthought.com> wrote: > ipython.org used to live on scipy.org machine - as far as I can tell > the only thing still on the scipy.org machine related to ipython are > the dev and user mailing lists (via mailman) hosted at > projects.scipy.org. Yup, we've now moved everything but the mailing lists (as you point out next, the load on that box was so awful all the time that trying to use it for anything was nothing but pain). Technically, when we were on scipy our domain was ipython.scipy.org, the ipython.org domain has been from the start hosted outside of the Enthought infrastructure; but that's just a nitpick :) Thanks for tackling the problem of cleaning up all that accumulated cruft, and for the always responsive support you gave us in the past. Cheers, f From ruby185 at gmail.com Thu Jan 19 23:28:08 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Thu, 19 Jan 2012 23:28:08 -0500 Subject: [Numpy-discussion] getting position index from array Message-ID: <CAA=a5iNw+C4nNp6uFTvtuWz_RA_cBZzAVqd=wvprY8drcfR6+w@mail.gmail.com> hi, all I am a newbie on numpy ... I am trying to figure out, given an array, how to get back position value based on some conditions. Say, array([1, 0, 0, 0 1], and I want to get a list of indices where it is none-zero, [ 0 , 4 ] The closest thing I can find from the doc is select(), but I can't figure out how to use it properly. Thanks for your help. Ruby From ben.root at ou.edu Thu Jan 19 23:33:02 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 19 Jan 2012 22:33:02 -0600 Subject: [Numpy-discussion] getting position index from array In-Reply-To: <CAA=a5iNw+C4nNp6uFTvtuWz_RA_cBZzAVqd=wvprY8drcfR6+w@mail.gmail.com> References: <CAA=a5iNw+C4nNp6uFTvtuWz_RA_cBZzAVqd=wvprY8drcfR6+w@mail.gmail.com> Message-ID: <CANNq6FnS5OGtmyVVcCVo8x+6o5HuVCSijTCUa5Sb+3Qk7SAUQA@mail.gmail.com> On Thursday, January 19, 2012, Ruby Stevenson <ruby185 at gmail.com> wrote: > hi, all > > I am a newbie on numpy ... I am trying to figure out, given an array, > how to get back position value based on some conditions. > Say, array([1, 0, 0, 0 1], and I want to get a list of indices where > it is none-zero, [ 0 , 4 ] > > The closest thing I can find from the doc is select(), but I can't > figure out how to use it properly. > > Thanks for your help. > > Ruby > np.nonzero() Note that you typically use it with a Boolean array result like "a >= 4". Also note that it returns a tuple of index lists, on for each dimension. This can the be feed back into the array to get the values as a flat array. 
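For instance, here is a minimal sketch of what that looks like in
practice (the array and its values are made up purely for illustration):

>>> import numpy as np
>>> a = np.array([1, 0, 0, 0, 1])
>>> np.nonzero(a)              # tuple with one index array per dimension
(array([0, 4]),)
>>> idx = np.nonzero(a >= 1)   # same idea with a Boolean condition
>>> a[idx]                     # feed the indices back in to get the values
array([1, 1])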
Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120119/5fbd06f0/attachment.html> From scott.sinclair.za at gmail.com Fri Jan 20 02:49:13 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 20 Jan 2012 09:49:13 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> Message-ID: <CA+nsYDtT1SS2UYCgmaxHzTZBmcQfV-02tyBX=QHDcL1+QaLrSA@mail.gmail.com> On 19 January 2012 21:48, Fernando Perez <fperez.net at gmail.com> wrote: > We've moved to the following setup with ipython, which works very well > for us so far: > > 1. ipython.org: Main website with only static content, manged as a > repo in github (https://github.com/ipython/ipython-website) and > updated with a gh-pages build > (https://github.com/ipython/ipython.github.com). I like this idea, and to get the ball rolling I've stripped out the www directory of the scipy.org-new repo into it's own repository using git filter-branch (posted here: https://github.com/scottza/scipy_website) and created https://github.com/scottza/scottza.github.com. This puts a copy of the new scipy website at http://scottza.github.com as a proof of concept. Since there seems to be some agreement on rehosting numpy's website on github, I'd be happy to do as much of the legwork as I can in getting the numpy.scipy.org content hosted at numpy.github.com. I don't have permission to create new repos for the Numpy organization, so someone would have to create an empty https://github.com/numpy/numpy.github.com and give me push permission on that repo. It would be great to see scipy go the same way and make updating the site easier. I know that David Warde-Farley, Pauli and others put in a lot of work scraping content off the wiki to produce the new website, it would be fantastic to see the fruits of that effort. Issues with scipy "Trac, the doc editor, and the conference.scipy.org and docs.scipy.org" as mentioned by Pauli. There is also the cookbook on the wiki to consider (perhaps http://scipy-central.org/ could play a role there). Cheers, Scott From valentin.haenel at epfl.ch Fri Jan 20 05:25:36 2012 From: valentin.haenel at epfl.ch (=?iso-8859-1?Q?H=E4nel?= Nikolaus Valentin) Date: Fri, 20 Jan 2012 11:25:36 +0100 Subject: [Numpy-discussion] (no subject) Message-ID: <20120120102536.GB18683@kudu.in-berlin.de> Hi, I would like to make a sanity test to check that calling the same function with different parameters actually gives different results. I am currently using:: try: npt.assert_almost_equal(numpy_result, result) except AssertionError: assert True else: assert False But maybe you have a better way? I couldn't find a 'assert_not_equal' and the above just feels stupid. thanks for your advice. 
V- -- Valentin H?nel Scientific Software Developer Blue Brain Project http://bluebrain.epfl.ch/ From shish at keba.be Fri Jan 20 06:53:04 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 20 Jan 2012 06:53:04 -0500 Subject: [Numpy-discussion] (no subject) In-Reply-To: <20120120102536.GB18683@kudu.in-berlin.de> References: <20120120102536.GB18683@kudu.in-berlin.de> Message-ID: <CAFXk4bp2nnZRh9v7kMdaSz=-gEfN61qbUpd+e5W4a=qb2yuWaQ@mail.gmail.com> Not sure if there's a better way, but you can do it with assert not numpy.allclose(numpy_result, result) -=- Olivier 2012/1/20 H?nel Nikolaus Valentin <valentin.haenel at epfl.ch> > Hi, > > I would like to make a sanity test to check that calling the same > function with different parameters actually gives different results. > > I am currently using:: > > try: > npt.assert_almost_equal(numpy_result, result) > except AssertionError: > assert True > else: > assert False > > But maybe you have a better way? I couldn't find a 'assert_not_equal' > and the above just feels stupid. > > thanks for your advice. > > V- > > -- > Valentin H?nel > Scientific Software Developer > Blue Brain Project http://bluebrain.epfl.ch/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120120/e1592eeb/attachment.html> From david.verelst at gmail.com Fri Jan 20 06:53:32 2012 From: david.verelst at gmail.com (David Verelst) Date: Fri, 20 Jan 2012 12:53:32 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <CA+nsYDtT1SS2UYCgmaxHzTZBmcQfV-02tyBX=QHDcL1+QaLrSA@mail.gmail.com> References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> <CA+nsYDtT1SS2UYCgmaxHzTZBmcQfV-02tyBX=QHDcL1+QaLrSA@mail.gmail.com> Message-ID: <4F1955BC.4000207@gmail.com> I would like to assist on the website. Although I have not made any code contributions to Numpy/SciPy (yet), I do follow the mailing lists and try to keep up to date on the scientific python scene. However, I need to hold my breath until the end of my wind tunnel test campaign mid February. And I do like the sound of the gihub workflow as currently done by the ipython team. Regards, David On 20/01/12 08:49, Scott Sinclair wrote: > On 19 January 2012 21:48, Fernando Perez<fperez.net at gmail.com> wrote: >> We've moved to the following setup with ipython, which works very well >> for us so far: >> >> 1. ipython.org: Main website with only static content, manged as a >> repo in github (https://github.com/ipython/ipython-website) and >> updated with a gh-pages build >> (https://github.com/ipython/ipython.github.com). > I like this idea, and to get the ball rolling I've stripped out the > www directory of the scipy.org-new repo into it's own repository using > git filter-branch (posted here: > https://github.com/scottza/scipy_website) and created > https://github.com/scottza/scottza.github.com. 
This puts a copy of the > new scipy website at http://scottza.github.com as a proof of concept. > > Since there seems to be some agreement on rehosting numpy's website on > github, I'd be happy to do as much of the legwork as I can in getting > the numpy.scipy.org content hosted at numpy.github.com. I don't have > permission to create new repos for the Numpy organization, so someone > would have to create an empty > https://github.com/numpy/numpy.github.com and give me push permission > on that repo. > > It would be great to see scipy go the same way and make updating the > site easier. I know that David Warde-Farley, Pauli and others put in a > lot of work scraping content off the wiki to produce the new website, > it would be fantastic to see the fruits of that effort. > > Issues with scipy "Trac, the doc editor, and the conference.scipy.org > and docs.scipy.org" as mentioned by Pauli. There is also the cookbook > on the wiki to consider (perhaps http://scipy-central.org/ could play > a role there). > > Cheers, > Scott > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From valentin.haenel at epfl.ch Fri Jan 20 07:27:36 2012 From: valentin.haenel at epfl.ch (=?iso-8859-1?Q?H=E4nel?= Nikolaus Valentin) Date: Fri, 20 Jan 2012 13:27:36 +0100 Subject: [Numpy-discussion] (no subject) In-Reply-To: <CAFXk4bp2nnZRh9v7kMdaSz=-gEfN61qbUpd+e5W4a=qb2yuWaQ@mail.gmail.com> References: <20120120102536.GB18683@kudu.in-berlin.de> <CAFXk4bp2nnZRh9v7kMdaSz=-gEfN61qbUpd+e5W4a=qb2yuWaQ@mail.gmail.com> Message-ID: <20120120122736.GD18683@kudu.in-berlin.de> * Olivier Delalleau <shish at keba.be> [120120]: > Not sure if there's a better way, but you can do it with > > assert not numpy.allclose(numpy_result, result) Okay, thats already better than what I have. thanks V- From pierre.haessig at crans.org Fri Jan 20 07:39:00 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Fri, 20 Jan 2012 13:39:00 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> References: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> Message-ID: <4F196064.5090508@crans.org> Hi Eliot, Le 19/01/2012 07:50, Elliot Saba a ?crit : > I recently needed to calculate the cross-covariance of two random > vectors, (e.g. I have two matricies, X and Y, the columns of which are > observations of one variable, and I wish to generate a matrix pairing > each value of X and Y) I don't see how does your function relates to numpy.cov [1]. Is it an "extended case" function or is there a difference in the underlying math ? Best, Pierre [1] numpy.cov docstring : http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html From ruby185 at gmail.com Fri Jan 20 09:21:10 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Fri, 20 Jan 2012 09:21:10 -0500 Subject: [Numpy-discussion] getting position index from array In-Reply-To: <CANNq6FnS5OGtmyVVcCVo8x+6o5HuVCSijTCUa5Sb+3Qk7SAUQA@mail.gmail.com> References: <CAA=a5iNw+C4nNp6uFTvtuWz_RA_cBZzAVqd=wvprY8drcfR6+w@mail.gmail.com> <CANNq6FnS5OGtmyVVcCVo8x+6o5HuVCSijTCUa5Sb+3Qk7SAUQA@mail.gmail.com> Message-ID: <CAA=a5iNWKBfgjLF9FYLkJU5KunxMr0B0__XSynCqs9zQReg=fw@mail.gmail.com> Exactly what I need - thank you very much. 
Ruby On Thu, Jan 19, 2012 at 11:33 PM, Benjamin Root <ben.root at ou.edu> wrote: > > > On Thursday, January 19, 2012, Ruby Stevenson <ruby185 at gmail.com> wrote: >> hi, all >> >> I am a newbie on numpy ... I am trying to figure out, given an array, >> how to get back position value based on some conditions. >> Say, array([1, 0, 0, 0 1], and I want to get a list of indices where >> it is none-zero, [ 0 , 4 ] >> >> The closest thing I can find from the doc is select(), but I can't >> figure out how to use it properly. >> >> Thanks for your help. >> >> Ruby >> > > np.nonzero() > > Note that you typically use it with a Boolean array result like "a >= 4". > ?Also note that it returns a tuple of index lists, on for each dimension. > ?This can the be feed back into the array to get the values as a flat array. > > Ben Root > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ruby185 at gmail.com Fri Jan 20 09:41:30 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Fri, 20 Jan 2012 09:41:30 -0500 Subject: [Numpy-discussion] condense array along one dimension Message-ID: <CAA=a5iPGU5+jxRbsgbrfjn83OZQ_eRJWzPGjp+3P+cSS7A9GbA@mail.gmail.com> hi, all Say I have a three dimension array, X, Y, Z, how can I condense into two dimensions: for example, compute 2-D array with (X, Z) and summarize along Y dimensions ... is it possible? thanks Ruby From shish at keba.be Fri Jan 20 09:50:30 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 20 Jan 2012 09:50:30 -0500 Subject: [Numpy-discussion] condense array along one dimension In-Reply-To: <CAA=a5iPGU5+jxRbsgbrfjn83OZQ_eRJWzPGjp+3P+cSS7A9GbA@mail.gmail.com> References: <CAA=a5iPGU5+jxRbsgbrfjn83OZQ_eRJWzPGjp+3P+cSS7A9GbA@mail.gmail.com> Message-ID: <CAFXk4bp7X1H9Qe3-jb5AMCL_ufuK5ASx0emczpLfFYV5f6r7ZQ@mail.gmail.com> What do you mean by "summarize"? If for instance you want to sum along Y, just do my_array.sum(axis=1) -=- Olivier 2012/1/20 Ruby Stevenson <ruby185 at gmail.com> > hi, all > > Say I have a three dimension array, X, Y, Z, how can I condense into > two dimensions: for example, compute 2-D array with (X, Z) and > summarize along Y dimensions ... is it possible? > > thanks > > Ruby > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120120/c12b180c/attachment.html> From sturla at molden.no Fri Jan 20 10:30:42 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 20 Jan 2012 16:30:42 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F196064.5090508@crans.org> References: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> <4F196064.5090508@crans.org> Message-ID: <4F1988A2.5000701@molden.no> Den 20.01.2012 13:39, skrev Pierre Haessig: > I don't see how does your function relates to numpy.cov [1]. Is it an > "extended case" function or is there a difference in the underlying math ? > If X is rank n x p, then np.cov(X, rowvar=False) is equal to S after cX = X - X.mean(axis=0)[np.newaxis,:] S = np.dot(cX.T, cX)/(n-1.) 
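A quick, made-up check of that equivalence (random data, purely for
illustration):

>>> import numpy as np
>>> n, p = 50, 3
>>> X = np.random.rand(n, p)
>>> cX = X - X.mean(axis=0)[np.newaxis, :]
>>> S = np.dot(cX.T, cX) / (n - 1.)
>>> np.allclose(S, np.cov(X, rowvar=False))
True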
If we also have Y of rank n x p, then np.cov(X, y=Y, rowvar=False) is equal to S after

XY = np.hstack((X, Y))
cXY = XY - XY.mean(axis=0)[np.newaxis,:]
S = np.dot(cXY.T, cXY)/(n-1.)

Thus we can see that the total covariance is composed of four parts
(with cY = Y - Y.mean(axis=0)[np.newaxis,:], i.e. Y centered the same way as X):

S[:p,:p] = np.dot(cX.T, cX)/(n-1.)  # upper left
S[:p,p:] = np.dot(cX.T, cY)/(n-1.)  # upper right
S[p:,:p] = np.dot(cY.T, cX)/(n-1.)  # lower left
S[p:,p:] = np.dot(cY.T, cY)/(n-1.)  # lower right

Often we just want the upper-right p x p quadrant. Thus we can save
75 % of the cpu time by not computing the rest.

Sturla

From pierre.haessig at crans.org Fri Jan 20 13:04:26 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Fri, 20 Jan 2012 19:04:26 +0100
Subject: [Numpy-discussion] Cross-covariance function
In-Reply-To: <4F1988A2.5000701@molden.no>
References: <CAGGi21ayjdWU2eCya_XZHeQfRE8PcUarXTomsky7OOj6Hm5GrQ@mail.gmail.com> <4F196064.5090508@crans.org> <4F1988A2.5000701@molden.no>
Message-ID: <4F19ACAA.2060707@crans.org>

Le 20/01/2012 16:30, Sturla Molden a écrit :
> Often we just want the upper-right p x p quadrant.
Thanks for the explanation. If I understood it correctly, you're
interested in the *cross*-covariance block of the matrix (and now I
understand better Elliot's message).

Actually, I thought that was the behavior of the np.cov function! But
you're right, it's not! [source code] The second 'y' argument just gets
concatenated with the first one 'm'.

I would go further and ask why it is so. People around may have use cases
in mind, because I have not. Otherwise, I feel that the default behavior
of cov when called with two arguments should be what Sturla and Elliot
just described.

Best,
Pierre

(that is something like this :

def cov(X, Y=None):
    if Y is None:
        Y = X
    else:
        assert Y.shape == X.shape  # or something like that
    # [...jumping to the end of the existing code...]
    if not rowvar:
        return (dot(X.T, Y.conj()) / fact).squeeze()
    else:
        return (dot(X, Y.T.conj()) / fact).squeeze()
)

[source code]
https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py

From fperez.net at gmail.com Fri Jan 20 16:26:33 2012
From: fperez.net at gmail.com (Fernando Perez)
Date: Fri, 20 Jan 2012 13:26:33 -0800
Subject: [Numpy-discussion] Download page still points to SVN
In-Reply-To: <4F1955BC.4000207@gmail.com>
References: <CAHAreOp=JP=U4rFDuKxF=wx7jG+=LcsribTrc+WFH2jJcsRALQ@mail.gmail.com> <CA+nsYDtfR624POXumFLaMu3UwJj+e6u765=YrZ19v+DEKbu9sA@mail.gmail.com> <CAHAreOpoof16tVMHTZ=bM+uy56egY6uC=BtwDMGHC893Tg9yrQ@mail.gmail.com> <CA+nsYDvShtjTHQttT3vfLUqtro0XQTmXa1A-_XHQqE-zQvb9Ng@mail.gmail.com> <CAHAreOroAu66=ZPaaoYADZXTHqk+enmZGz4sE-PB_4eDUiD2CQ@mail.gmail.com> <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <CAHAreOppuOrgEdPTAzTmiBMPPXFnA-xQUK85P=6pn+HDX6e-Yw@mail.gmail.com> <CA+nsYDtT1SS2UYCgmaxHzTZBmcQfV-02tyBX=QHDcL1+QaLrSA@mail.gmail.com> <4F1955BC.4000207@gmail.com>
Message-ID: <CAHAreOo5BaB8VSAhsOH_4Tn2PqL=QZs7jrvh7XRvKjc3Ri=s-w@mail.gmail.com>

On Fri, Jan 20, 2012 at 3:53 AM, David Verelst <david.verelst at gmail.com> wrote:
> I would like to assist on the website. Although I have not made any code
> contributions to Numpy/SciPy (yet), I do follow the mailing lists and
> try to keep up to date on the scientific python scene. However, I need
> to hold my breath until the end of my wind tunnel test campaign mid
> February.

Fantastic, thanks. I think the ideal setup would be to create a web
team in the numpy org. so that this team can have permissions over the
website repos (source and build).
I don't belong to the org so I can't do it myself. > And I do like the sound of the gihub workflow as currently done by the > ipython team. Don't hesitate to ask us if you have any questions. In particular, it's important *not* to use gh-pages like they originally suggest, but instead like we do it in ipython: the build should be a separate repo altogether, not just a branch in the official source repo. Ours has the makefile targets and scripts already for that, let me know if any of it doesn't make sense. Cheers, f From lamblinp at iro.umontreal.ca Fri Jan 20 16:55:43 2012 From: lamblinp at iro.umontreal.ca (Pascal Lamblin) Date: Fri, 20 Jan 2012 22:55:43 +0100 Subject: [Numpy-discussion] Upgrade to 1.6.x: frompyfunc() ufunc casting issue In-Reply-To: <CANZ39W3ZS1LNVRxanjr34p+P9O65O5vaHxSNLmOsaVgew8N=Lw@mail.gmail.com> References: <mailman.687.1327095181.1085.numpy-discussion@scipy.org> Message-ID: <20120120215543.GC10327@bob.blip.be> Hi everyone, A long time ago, Aditya Sethi <ady.sethi at gmail... wrote: > I am facing an issue upgrading numpy from 1.5.1 to 1.6.1. > In numPy 1.6, the casting behaviour for ufunc has changed and has become > stricter. > > Can someone advise how to implement the below simple example which worked in > 1.5.1 but fails in 1.6.1? > > >>> import numpy as np > >>> def add(a,b): > ... return (a+b) > >>> uadd = np.frompyfunc(add,2,1) > >>> uadd > <ufunc 'add (vectorized)'> > >>> uadd.accumulate([1,2,3]) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > ValueError: could not find a matching type for add (vectorized).accumulate, > requested type has type code 'l' Here's the workaround I found to that problem: >>> uadd.accumulate([1,2,3], dtype='object') array([1, 3, 6], dtype=object) It seems like "accumulate" infers that 'l' is the required output dtype, but does not have the appropriate implementation: >>> uadd.types ['OO->O'] Forcing the output dtype to be 'object' (the only supported dtype) seems to do the trick. Hope this helps, -- Pascal From torgil.svensson at gmail.com Sat Jan 21 08:49:24 2012 From: torgil.svensson at gmail.com (Torgil Svensson) Date: Sat, 21 Jan 2012 14:49:24 +0100 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> References: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> Message-ID: <CA+RwOBWf6hfGtRX2ps+Gx0pj2fDozKMQTOUz+k4GPCXdosjOQw@mail.gmail.com> unique has an option to get indexes out which you can use in combination with sort to get the actual counts out. tab0 = zeros( 256*256*256 , dtype=int) col=ravel(((im0[...,0].astype('u4')*256+im0[...,1])*256)+im0[...,2]) col,idx=unique(sort(col),True) idx=hstack([idx,[2500*2500]]) tab0[col]=idx[1:]-idx[:-1] tab0.shape=(256,256,256) As Chris pointed out, if each pixel were 4 bytes you could probably just use im0.view('>u4') for histogram values. //Torgil On Wed, Jan 18, 2012 at 10:26 AM, <apo at pdauf.de> wrote: > > Sorry, > > that i use this way to send an answer to Tony Yu , Nadav Horesh , Chris Barker. > When iam direct answering on Your e-mail i get an error 5. > I think i did a mistake. > > Your ideas are very helpfull and the code is very fast. 
> > Thank You > > elodw > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sat Jan 21 12:06:09 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 21 Jan 2012 10:06:09 -0700 Subject: [Numpy-discussion] views and mask NA In-Reply-To: <CAB6mnxL=sFZOT5Wih09UinGHz2RHB3vn=MQy8L5cdEx9A6kmgA@mail.gmail.com> References: <CAB6mnxL=sFZOT5Wih09UinGHz2RHB3vn=MQy8L5cdEx9A6kmgA@mail.gmail.com> Message-ID: <CAB6mnxL5Vv8DS4DPqDOd-_oRMeOqx2rNA1kdF6BE61sCt7AULA@mail.gmail.com> Hi All, I'd like some feedback on how mask NA should interact with views. The immediate problem is how to deal with the real and imaginary parts of complex numbers. If the original has a masked value, it should show up as masked in the real and imaginary parts. But what should happen on assignment to one of the masked views? This should probably clear the NA in the real/imag part, but not in the complex original. However, that does allow touching things under the mask, so to speak. Things get more complicated if the complex original is viewed as reals. In this case the mask needs to be "doubled" up, and there is again the possibility of touching things beneath the mask in the original. Viewing the original as bytes leads to even greater duplication. My thought is that touching the underlying data needs to be allowed in these cases, but the original mask can only be cleared by assignment to the original. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120121/438da4a5/attachment.html> From staticfloat at gmail.com Sat Jan 21 15:47:36 2012 From: staticfloat at gmail.com (Elliot Saba) Date: Sat, 21 Jan 2012 12:47:36 -0800 Subject: [Numpy-discussion] Cross-covariance function Message-ID: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> Thank you Sturla, that's exactly what I want. I'm sorry that I was not able to reply for so long, but Pierre's code is similar to what I have already implemented, and I am in support of changing the functionality of cov(). I am unaware of any arguments for a covariance function that works in this way, except for the fact that the MATLAB cov() function behaves in the same way. [1] MATLAB, however, has an xcov() function, which is similar to what we have been discussing. [2] Unless you all wish to retain compatibility with MATLAB, I feel that the behaviour of cov() suggested by Pierre is the most straightforward method, and that if users wish to calculate the covariance of X concatenated with Y, then they may simply concatenate the matrices explicitly before passing into cov(), as this way the default method does not use 75% more CPU time. Again, if there is an argument for this functionality, I would love to learn of it! -E [1] http://www.mathworks.com/help//techdoc/ref/cov.html [2] http://www.mathworks.com/help/toolbox/signal/ref/xcov.html -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120121/040fa1c9/attachment.html> From jsalvati at u.washington.edu Sat Jan 21 18:26:30 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Sat, 21 Jan 2012 15:26:30 -0800 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> Message-ID: <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> I ran into this a while ago and was confused why cov did not behave the way pierre suggested. On Jan 21, 2012 12:48 PM, "Elliot Saba" <staticfloat at gmail.com> wrote: > Thank you Sturla, that's exactly what I want. > > I'm sorry that I was not able to reply for so long, but Pierre's code is > similar to what I have already implemented, and I am in support of changing > the functionality of cov(). I am unaware of any arguments for a covariance > function that works in this way, except for the fact that the MATLAB cov() > function behaves in the same way. [1] > > MATLAB, however, has an xcov() function, which is similar to what we have > been discussing. [2] > > Unless you all wish to retain compatibility with MATLAB, I feel that the > behaviour of cov() suggested by Pierre is the most straightforward method, > and that if users wish to calculate the covariance of X concatenated with > Y, then they may simply concatenate the matrices explicitly before passing > into cov(), as this way the default method does not use 75% more CPU time. > > Again, if there is an argument for this functionality, I would love to > learn of it! > -E > > [1] http://www.mathworks.com/help//techdoc/ref/cov.html > [2] http://www.mathworks.com/help/toolbox/signal/ref/xcov.html > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120121/8c93072e/attachment.html> From josef.pktd at gmail.com Sat Jan 21 19:40:34 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Jan 2012 19:40:34 -0500 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> Message-ID: <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> On Sat, Jan 21, 2012 at 6:26 PM, John Salvatier <jsalvati at u.washington.edu> wrote: > I ran into this a while ago and was confused why cov did not behave the way > pierre suggested. same here, When I rewrote scipy.stats.spearmanr, I matched the numpy behavior for two arrays, while R only returns the cross-correlation part. Josef > > On Jan 21, 2012 12:48 PM, "Elliot Saba" <staticfloat at gmail.com> wrote: >> >> Thank you Sturla, that's exactly what I want. >> >> I'm sorry that I was not able to reply for so long, but Pierre's code is >> similar to what I have already implemented, and I am in support of changing >> the functionality of cov(). ?I am unaware of any arguments for a covariance >> function that works in this way, except for the fact that the MATLAB cov() >> function behaves in the same way. 
[1] >> >> MATLAB, however, has an xcov() function, which is similar to what we have >> been discussing. [2] >> >> Unless you all wish to retain compatibility with MATLAB, I feel that the >> behaviour of cov() suggested by Pierre is the most straightforward method, >> and that if users wish to calculate the covariance of X concatenated with Y, >> then they may simply concatenate the matrices explicitly before passing into >> cov(), as this way the default method does not use 75% more CPU time. >> >> Again, if there is an argument for this functionality, I would love to >> learn of it! >> -E >> >> [1]?http://www.mathworks.com/help//techdoc/ref/cov.html >> [2]?http://www.mathworks.com/help/toolbox/signal/ref/xcov.html >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ondrej.certik at gmail.com Sat Jan 21 22:55:10 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sat, 21 Jan 2012 19:55:10 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> Message-ID: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Hi, I read the Mandelbrot code using NumPy at this page: http://mentat.za.net/numpy/intro/intro.html but when I run it, it gives me integer overflows. As such, I have fixed the code, so that it doesn't overflow here: https://gist.github.com/1655320 and I have also written an equivalent Fortran program. You can compare both source codes to see that that it is pretty much one-to-one translation. The main idea in the above gist is to take an algorithm written in NumPy, and translate it directly to Fortran, without any special optimizations. So the above is my first try in Fortran. You can plot the result using this simple script (you can also just click on this gist to see the image there): https://gist.github.com/1655377 Here are my timings: Python Fortran Speedup Calculation 12.749 00.784 16.3x Saving 01.904 01.456 1.3x Total 14.653 02.240 6.5x I save the matrices to disk in an ascii format, so it's quite slow in both cases. The pure computation is however 16x faster in Fortran (in gfortran, I didn't even try Intel Fortran, that will probably be even faster). As such, I wonder how the NumPy version could be sped up? I have compiled NumPy with Lapack+Blas from source. Would anyone be willing to run the NumPy version? Just copy+paste should do it. If you want to run the Fortran version, the above gist uses some of my other modules that I use in my other programs, my goal was to see how much more complicated the Fortran code gets, compared to NumPy. As such, I put here https://gist.github.com/1655350 a single file with all the dependencies, just compile it like this: gfortran -fPIC -O3 -march=native -ffast-math -funroll-loops mandelbrot.f90 and run: $ ./a.out Iteration 1 Iteration 2 ... Iteration 100 Saving... Times: Calculation: 0.74804599999999999 Saving: 1.3640850000000002 Total: 2.1121310000000002 Let me know if you figure out something. 
I think the "mask" thing is quite slow, but the problem is that it needs to be there, to catch overflows (and it is there in Fortran as well, see the "where" statement, which does the same thing). Maybe there is some other way to write the same thing in NumPy? Ondrej From pengyu.ut at gmail.com Sat Jan 21 23:25:22 2012 From: pengyu.ut at gmail.com (Peng Yu) Date: Sat, 21 Jan 2012 22:25:22 -0600 Subject: [Numpy-discussion] Easy module installation with less human intervention. Message-ID: <CABrM6w=rehitc+w5aLzPqf8w10U3s16-tKDjC7SaX4=B10eOZQ@mail.gmail.com> Hi, Perl has something like ppm so that I can just use one command to download and install perl modules. But I don't find such thing in python. As shown on http://docs.python.org/install/index.html, it seems that I have to download the packages first unzip it then install it. I'm wondering if there is a better way to install python packages that require less human intervention. Thanks! NAME ppm - Perl Package Manager, version 4.14 SYNOPSIS Invoke the graphical user interface: ppm ppm gui Install, upgrade and remove packages: ppm install [--area <area>] [--force] <pkg> ppm install [--area <area>] [--force] <module> ppm install [--area <area>] <url> ppm install [--area <area>] <file>.ppmx ppm install [--area <area>] <file>.ppd ppm install [--area <area>] <num> ppm upgrade [--install] ppm upgrade <pkg> ppm upgrade <module> ppm remove [--area <area>] [--force] <pkg> -- Regards, Peng From shish at keba.be Sat Jan 21 23:34:57 2012 From: shish at keba.be (Olivier Delalleau) Date: Sat, 21 Jan 2012 23:34:57 -0500 Subject: [Numpy-discussion] Easy module installation with less human intervention. In-Reply-To: <CABrM6w=rehitc+w5aLzPqf8w10U3s16-tKDjC7SaX4=B10eOZQ@mail.gmail.com> References: <CABrM6w=rehitc+w5aLzPqf8w10U3s16-tKDjC7SaX4=B10eOZQ@mail.gmail.com> Message-ID: <CAFXk4bq=Qg1mZEMLtmrwiZRF+hd9NtS7dumJ4Q_ozcA8YEmoUw@mail.gmail.com> You can try easy_install or pip. -=- Olivier 2012/1/21 Peng Yu <pengyu.ut at gmail.com> > Hi, > > > Perl has something like ppm so that I can just use one command to > download and install perl modules. But I don't find such thing in > python. As shown on http://docs.python.org/install/index.html, it > seems that I have to download the packages first unzip it then install > it. I'm wondering if there is a better way to install python packages > that require less human intervention. Thanks! > > > NAME > ppm - Perl Package Manager, version 4.14 > > SYNOPSIS > Invoke the graphical user interface: > > ppm > ppm gui > > Install, upgrade and remove packages: > > ppm install [--area <area>] [--force] <pkg> > ppm install [--area <area>] [--force] <module> > ppm install [--area <area>] <url> > ppm install [--area <area>] <file>.ppmx > ppm install [--area <area>] <file>.ppd > ppm install [--area <area>] <num> > ppm upgrade [--install] > ppm upgrade <pkg> > ppm upgrade <module> > ppm remove [--area <area>] [--force] <pkg> > > > > -- > Regards, > Peng > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120121/b61dcf55/attachment.html> From nadavh at visionsense.com Sun Jan 22 05:28:53 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Sun, 22 Jan 2012 02:28:53 -0800 Subject: [Numpy-discussion] Strange error raised by scipy.special.erf Message-ID: <26FC23E7C398A64083C980D16001012D261F0D9376@VA3DIAXVS361.RED001.local> With N.seterr(all='raise'): >>> from scipy import special >>> import scipy >>> special.erf(26.6) 1.0 >>> scipy.__version__ '0.11.0.dev-81dc505' >>> import numpy as N >>> N.seterr(all='raise') {'over': 'warn', 'divide': 'warn', 'invalid': 'warn', 'under': 'ignore'} >>> special.erf(26.5) 1.0 >>> special.erf(26.6) Traceback (most recent call last): File "<pyshell#7>", line 1, in <module> special.erf(26.6) FloatingPointError: underflow encountered in erf >>> special.erf(26.7) 1.0 What is so special in 26.6? I have this error also with previous versions of scipy Nadav. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120122/444ea5ab/attachment.html> From seb.haase at gmail.com Sun Jan 22 06:13:32 2012 From: seb.haase at gmail.com (Sebastian Haase) Date: Sun, 22 Jan 2012 12:13:32 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Message-ID: <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> How does the?algorithm and timing compare to this one: http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f The author of original version is Dan Goodman # FAST FRACTALS WITH PYTHON AND NUMPY -Sebastian Haase 2012/1/22 Ond?ej ?ert?k <ondrej.certik at gmail.com> > > Hi, > > I read the Mandelbrot code using NumPy at this page: > > http://mentat.za.net/numpy/intro/intro.html > > but when I run it, it gives me integer overflows. As such, I have > fixed the code, so that it doesn't overflow here: > > https://gist.github.com/1655320 > > and I have also written an equivalent Fortran program. > > You can compare both source codes to see > that that it is pretty much one-to-one translation. > The main idea in the above gist is to take an > algorithm written in NumPy, and translate > it directly to Fortran, without any special > optimizations. So the above is my first try > in Fortran. You can plot the result > using this simple script (you > can also just click on this gist to > see the image there): > > https://gist.github.com/1655377 > > Here are my timings: > > ? ? ? ? ? ? ? Python ?Fortran Speedup > Calculation ? ? 12.749 ?00.784 ?16.3x > Saving ?01.904 ?01.456 ?1.3x > Total ? ? ? ? ?14.653 ? 02.240 ?6.5x > > I save the matrices to disk in an ascii format, > so it's quite slow in both cases. The pure computation > is however 16x faster in Fortran (in gfortran, > I didn't even try Intel Fortran, that will probably be > even faster). > > As such, I wonder how the NumPy version could be sped up? > I have compiled NumPy with Lapack+Blas from source. > > Would anyone be willing to run the NumPy version? Just copy+paste > should do it. 
> > If you want to run the Fortran version, the above gist uses > some of my other modules that I use in my other programs, my goal > was to see how much more complicated the Fortran code gets, > compared to NumPy. As such, I put here > > https://gist.github.com/1655350 > > a single file > with all the dependencies, just compile it like this: > > gfortran -fPIC -O3 -march=native -ffast-math -funroll-loops mandelbrot.f90 > > and run: > > $ ./a.out > Iteration 1 > Iteration 2 > ... > Iteration 100 > ?Saving... > ?Times: > ?Calculation: ?0.74804599999999999 > ?Saving: ? 1.3640850000000002 > ?Total: ? 2.1121310000000002 > > > Let me know if you figure out something. I think the "mask" thing is > quite slow, but the problem is that it needs to be there, to catch > overflows (and it is there in Fortran as well, see the > "where" statement, which does the same thing). Maybe there is some > other way to write the same thing in NumPy? > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Sun Jan 22 12:29:20 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 22 Jan 2012 18:29:20 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Message-ID: <4F1C4770.2040603@astro.uio.no> On 01/22/2012 04:55 AM, Ond?ej ?ert?k wrote: > Hi, > > I read the Mandelbrot code using NumPy at this page: > > http://mentat.za.net/numpy/intro/intro.html > > but when I run it, it gives me integer overflows. As such, I have > fixed the code, so that it doesn't overflow here: > > https://gist.github.com/1655320 > > and I have also written an equivalent Fortran program. > > You can compare both source codes to see > that that it is pretty much one-to-one translation. > The main idea in the above gist is to take an > algorithm written in NumPy, and translate > it directly to Fortran, without any special > optimizations. So the above is my first try > in Fortran. You can plot the result > using this simple script (you > can also just click on this gist to > see the image there): > > https://gist.github.com/1655377 > > Here are my timings: > > Python Fortran Speedup > Calculation 12.749 00.784 16.3x > Saving 01.904 01.456 1.3x > Total 14.653 02.240 6.5x > > I save the matrices to disk in an ascii format, > so it's quite slow in both cases. The pure computation > is however 16x faster in Fortran (in gfortran, > I didn't even try Intel Fortran, that will probably be > even faster). > > As such, I wonder how the NumPy version could be sped up? > I have compiled NumPy with Lapack+Blas from source. This is a pretty well known weakness with NumPy. In the Python code at least, each of c and z are about 15 MB, and the mask about 1 MB. So that doesn't fit in CPU cache, and so each and every statement you do in the loop transfer that data in and out of CPU cache the memory bus. There's no quick fix -- you can try to reduce the working set so that it fits in CPU cache, but then the Python overhead often comes into play. Solutions include numexpr and Theano -- and as often as not, Cython and Fortran. 
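To make that concrete, a rough numexpr sketch of the escape-time loop
could look like the following (this is not code from the thread; it assumes
numexpr's complex128 support and its real()/imag() functions, and the
function and variable names are invented for illustration):

import numpy as np
import numexpr as ne

def mandelbrot_counts(c, maxit=100):
    # c: complex128 grid of candidate points; returns escape-time counts
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=np.int32)
    for _ in range(maxit):
        # one blocked pass over the whole grid for the update ...
        z = ne.evaluate("z*z + c")
        # ... and one for the divergence test; points that have escaped
        # overflow to inf/nan and simply fail the test from then on
        bounded = ne.evaluate("real(z)**2 + imag(z)**2 <= 4.0")
        counts += bounded
    return counts

Each evaluate() call processes the arrays in cache-sized blocks, so the
intermediates of a compound expression never have to stream through main
memory as full 15 MB temporaries -- which is exactly the working-set
problem described above.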
It's a good example, thanks!, Dag Sverre From ondrej.certik at gmail.com Sun Jan 22 14:01:41 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sun, 22 Jan 2012 11:01:41 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> Message-ID: <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase <seb.haase at gmail.com> wrote: > How does the?algorithm and timing compare to this one: > > http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f > > The author of original version is ?Dan Goodman > # FAST FRACTALS WITH PYTHON AND NUMPY Thanks Sebastian. This one is much faster ---- 2.7s on my laptop with the same dimensions/iterations. It uses a better datastructures -- it only keeps track of points that still need to be iterated --- very clever. If I have time, I'll try to provide an equivalent Fortran version too, for comparison. Ondrej From chris.barker at noaa.gov Sun Jan 22 23:31:30 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Sun, 22 Jan 2012 20:31:30 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> Message-ID: <CALGmxEJtKUEanrcipSYeH=he8Gc7BTU64pjNJZfQWdrueUA8yg@mail.gmail.com> 2012/1/22 Ond?ej ?ert?k <ondrej.certik at gmail.com>: > If I have time, I'll try to provide an equivalent Fortran version too, > for comparison. > > Ondrej here is a Cython example: http://wiki.cython.org/examples/mandelbrot I haven't looked to see if it's the same algorithm, but it may be instructive, none the less. -Chris -- -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? 
main reception Chris.Barker at noaa.gov From jrocher at enthought.com Sun Jan 22 23:35:03 2012 From: jrocher at enthought.com (Jonathan Rocher) Date: Sun, 22 Jan 2012 22:35:03 -0600 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> Message-ID: <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> Hi all, I was reading this while learning about Pytables in more details and the origin of its efficiency. This sounds like a problem where out of core computation using pytables would shine since the dataset doesn't fit into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course C/Cythonizing the problem would be another good way... HTH, Jonathan 2012/1/22 Ond?ej ?ert?k <ondrej.certik at gmail.com> > On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase <seb.haase at gmail.com> > wrote: > > How does the algorithm and timing compare to this one: > > > > > http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f > > > > The author of original version is Dan Goodman > > # FAST FRACTALS WITH PYTHON AND NUMPY > > Thanks Sebastian. This one is much faster ---- 2.7s on my laptop with > the same dimensions/iterations. > > It uses a better datastructures -- it only keeps track of points that > still need to be iterated --- very clever. > If I have time, I'll try to provide an equivalent Fortran version too, > for comparison. > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120122/7eb4e4f6/attachment.html> From dg.gmane at thesamovar.net Sun Jan 22 23:39:35 2012 From: dg.gmane at thesamovar.net (Dan Goodman) Date: Mon, 23 Jan 2012 05:39:35 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> Message-ID: <jfioa6$tuo$1@dough.gmane.org> On 22/01/2012 20:01, Ond?ej ?ert?k wrote: > On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase<seb.haase at gmail.com> wrote: >> How does the algorithm and timing compare to this one: >> >> http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f >> >> The author of original version is Dan Goodman >> # FAST FRACTALS WITH PYTHON AND NUMPY > > Thanks Sebastian. This one is much faster ---- 2.7s on my laptop with > the same dimensions/iterations. > > It uses a better datastructures -- it only keeps track of points that > still need to be iterated --- very clever. > If I have time, I'll try to provide an equivalent Fortran version too, > for comparison. I spent a little while trying to optimise my algorithm using only numpy and couldn't get it running much faster than that. Given the relatively low number of iterations it's probably not a problem of Python overheads, so I guess it is indeed memory access that is the problem. One way to get round this using numexpr would be something like this. Write f(z)=z^2+c and then f(n+1,z)=f(n,f(z)). Now try out instead of computing z->f(z) each iteration, write down the formula for z->f(n,z) for a few different n and use that in numexpr, e.g. z->f(2,z) or z->(z^2+c)^2+c. This amounts to doing several iterations per step, but it means that you'll be spending more time doing floating point ops and less time waiting for memory operations so it might get closer to fortran/C speeds. Actually, my curiosity was piqued so I tried it out. On my laptop I get that using the idea above gives a maximum speed increase for n=8, and after that you start to get overflow errors so it runs slower. At n=8 it runs about 4.5x faster than the original version. So if you got the same speedup it would be running in about 0.6s compared to your fortran 0.7s. However it's not a fair comparison as numexpr is using multiple cores (but only about 60% peak on my dual core laptop), but still nice to see what can be achieved with numexpr. :) Code attached. Dan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: fastermandel.py URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120123/e1d93a12/attachment.ksh> From d.s.seljebotn at astro.uio.no Mon Jan 23 04:04:36 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 23 Jan 2012 10:04:36 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> Message-ID: <4F1D22A4.8010009@astro.uio.no> On 01/23/2012 05:35 AM, Jonathan Rocher wrote: > Hi all, > > I was reading this while learning about Pytables in more details and the > origin of its efficiency. This sounds like a problem where out of core > computation using pytables would shine since the dataset doesn't fit > into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course > C/Cythonizing the problem would be another good way... Well, since the data certainly fits in RAM, one would use numexpr directly (which is what pytables also uses). Dag Sverre > > HTH, > Jonathan > > 2012/1/22 Ond?ej ?ert?k <ondrej.certik at gmail.com > <mailto:ondrej.certik at gmail.com>> > > On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase > <seb.haase at gmail.com <mailto:seb.haase at gmail.com>> wrote: > > How does the algorithm and timing compare to this one: > > > > > http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f > <http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f> > > > > The author of original version is Dan Goodman > > # FAST FRACTALS WITH PYTHON AND NUMPY > > Thanks Sebastian. This one is much faster ---- 2.7s on my laptop with > the same dimensions/iterations. > > It uses a better datastructures -- it only keeps track of points that > still need to be iterated --- very clever. > If I have time, I'll try to provide an equivalent Fortran version too, > for comparison. > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > Jonathan Rocher, PhD > Scientific software developer > Enthought, Inc. > jrocher at enthought.com <mailto:jrocher at enthought.com> > 1-512-536-1057 > http://www.enthought.com <http://www.enthought.com/> > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From wardefar at iro.umontreal.ca Mon Jan 23 05:23:28 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 23 Jan 2012 05:23:28 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
Message-ID: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on Linux (Fedora Core 14) 64-bit: > a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') > b = numpy.random.randint(5000000,size=(4993210,)) > c = a[b] It seems c is not getting filled in full, namely: > In [14]: c[1000000:].sum() > Out[14]: 0 I haven't been able to reproduce this quite yet, I'll try to find a machine with sufficient memory tomorrow. But does anyone have any insight in the mean time? It smells like some kind of integer overflow bug. Thanks, David From sturla at molden.no Mon Jan 23 06:23:01 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 23 Jan 2012 12:23:01 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D22A4.8010009@astro.uio.no> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> Message-ID: <4F1D4315.5080602@molden.no> Den 23.01.2012 10:04, skrev Dag Sverre Seljebotn: > On 01/23/2012 05:35 AM, Jonathan Rocher wrote: >> Hi all, >> >> I was reading this while learning about Pytables in more details and the >> origin of its efficiency. This sounds like a problem where out of core >> computation using pytables would shine since the dataset doesn't fit >> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course >> C/Cythonizing the problem would be another good way... > Well, since the data certainly fits in RAM, one would use numexpr > directly (which is what pytables also uses). > > Personally I feel this debate is asking the wrong question. It is not uncommon for NumPy code to be 16x slower than C or Fortran. But that is not really interesting. This is what I think matters: - Is the NumPy code FAST ENOUGH? If not, then go ahead and optimize. If it's fast enough, then just leave it. In this case, it seems Python takes ~13 seconds compared to ~1 second for Fortran. Sure, those extra 12 seconds could be annoying. But how much coding time should we spend to avoid them? 15 minutes? An hour? Two hours? Taking the time spent optimizing into account, then perhaps Python is 'faster' anyway? It is common to ask what is fastest for the computer. But we should really be asking what is fastest for our selves. For example: I have a computation that will take a day in Fortran or a month in Python (estimated). And I am going to run this code several times (20 or so, I think). In this case, yes, coding the bottlenecks in Fortran matters to me. But 13 seconds versus 1 second? I find that hardly interesting. 
Sturla From seb.haase at gmail.com Mon Jan 23 07:09:58 2012 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 23 Jan 2012 13:09:58 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D4315.5080602@molden.no> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> Message-ID: <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> On Mon, Jan 23, 2012 at 12:23 PM, Sturla Molden <sturla at molden.no> wrote: > Den 23.01.2012 10:04, skrev Dag Sverre Seljebotn: >> On 01/23/2012 05:35 AM, Jonathan Rocher wrote: >>> Hi all, >>> >>> I was reading this while learning about Pytables in more details and the >>> origin of its efficiency. This sounds like a problem where out of core >>> computation using pytables would shine since the dataset doesn't fit >>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course >>> C/Cythonizing the problem would be another good way... >> Well, since the data certainly fits in RAM, one would use numexpr >> directly (which is what pytables also uses). >> >> > > Personally I feel this debate is asking the wrong question. > > It is not uncommon for NumPy code to be 16x slower than C or Fortran. > But that is not really interesting. > > This is what I think matters: > > - Is the NumPy code FAST ENOUGH? ?If not, then go ahead and optimize. If > it's fast enough, then just leave it. > > In this case, it seems Python takes ~13 seconds compared to ~1 second > for Fortran. Sure, those extra 12 seconds could be annoying. But how > much coding time should we spend to avoid them? 15 minutes? An hour? Two > hours? > > Taking the time spent optimizing into account, then perhaps Python is > 'faster' anyway? It is common to ask what is fastest for the computer. > But we should really be asking what is fastest for our selves. > > For example: I have a computation that will take a day in Fortran or a > month in Python (estimated). And I am going to run this code several > times (20 or so, I think). In this case, yes, coding the bottlenecks in > Fortran matters to me. But 13 seconds versus 1 second? I find that > hardly interesting. > > Sturla I would think that interactive zooming would be quite nice ("illuminating") .... and for that 13 secs would not be tolerable.... Well... it's not at the top of my priority list ... 
;-) -Sebastian Haase From d.s.seljebotn at astro.uio.no Mon Jan 23 07:40:42 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 23 Jan 2012 13:40:42 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D4315.5080602@molden.no> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> Message-ID: <4F1D554A.4040902@astro.uio.no> On 01/23/2012 12:23 PM, Sturla Molden wrote: > Den 23.01.2012 10:04, skrev Dag Sverre Seljebotn: >> On 01/23/2012 05:35 AM, Jonathan Rocher wrote: >>> Hi all, >>> >>> I was reading this while learning about Pytables in more details and the >>> origin of its efficiency. This sounds like a problem where out of core >>> computation using pytables would shine since the dataset doesn't fit >>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course >>> C/Cythonizing the problem would be another good way... >> Well, since the data certainly fits in RAM, one would use numexpr >> directly (which is what pytables also uses). >> >> > > Personally I feel this debate is asking the wrong question. > > It is not uncommon for NumPy code to be 16x slower than C or Fortran. > But that is not really interesting. > > This is what I think matters: > > - Is the NumPy code FAST ENOUGH? If not, then go ahead and optimize. If > it's fast enough, then just leave it. > > In this case, it seems Python takes ~13 seconds compared to ~1 second > for Fortran. Sure, those extra 12 seconds could be annoying. But how > much coding time should we spend to avoid them? 15 minutes? An hour? Two > hours? > > Taking the time spent optimizing into account, then perhaps Python is > 'faster' anyway? It is common to ask what is fastest for the computer. > But we should really be asking what is fastest for our selves. > > For example: I have a computation that will take a day in Fortran or a > month in Python (estimated). And I am going to run this code several > times (20 or so, I think). In this case, yes, coding the bottlenecks in > Fortran matters to me. But 13 seconds versus 1 second? I find that > hardly interesting. You, me, Ondrej, and many more are happy to learn 4 languages and use them where they are most appropriate. But most scientists only want to learn and use one tool. And most scientists have both problems where performance doesn't matter, and problems where it does. So as long as examples like this exists, many people will prefer Fortran for *all* their tasks. (Of course, that's why I got involved in Cython...) 
Dag Sverre From sturla at molden.no Mon Jan 23 07:51:42 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 23 Jan 2012 13:51:42 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> Message-ID: <4F1D57DE.3030507@molden.no> Den 23.01.2012 13:09, skrev Sebastian Haase: > > I would think that interactive zooming would be quite nice > ("illuminating") .... and for that 13 secs would not be tolerable.... > Well... it's not at the top of my priority list ... ;-) > Sure, that comes under the 'fast enough' issue. But even Fortran might be too slow here? For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader (which would be a text string in Python): madelbrot_fragment_shader = """ uniform sampler1D tex; uniform vec2 center; uniform float scale; uniform int iter; void main() { vec2 z, c; c.x = 1.3333 * (gl_TexCoord[0].x - 0.5) * scale - center.x; c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y; int i; z = c; for(i=0; i<iter; i++) { float x = (z.x * z.x - z.y * z.y) + c.x; float y = (z.y * z.x + z.x * z.y) + c.y; if((x * x + y * y)> 4.0) break; z.x = x; z.y = y; } gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0); } """ The rest is just boiler-plate OpenGL... Sources: http://nuclear.mutantstargoat.com/articles/sdr_fract/ http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml Sturla From cimrman3 at ntc.zcu.cz Mon Jan 23 08:02:54 2012 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Mon, 23 Jan 2012 14:02:54 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D57DE.3030507@molden.no> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> <4F1D57DE.3030507@molden.no> Message-ID: <4F1D5A7E.4080604@ntc.zcu.cz> On 01/23/12 13:51, Sturla Molden wrote: > Den 23.01.2012 13:09, skrev Sebastian Haase: >> >> I would think that interactive zooming would be quite nice >> ("illuminating") .... and for that 13 secs would not be tolerable.... >> Well... it's not at the top of my priority list ... ;-) >> > > Sure, that comes under the 'fast enough' issue. But even Fortran might > be too slow here? 
> > For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader > (which would be a text string in Python): > > madelbrot_fragment_shader = """ > > uniform sampler1D tex; > uniform vec2 center; > uniform float scale; > uniform int iter; > void main() { > vec2 z, c; > c.x = 1.3333 * (gl_TexCoord[0].x - 0.5) * scale - center.x; > c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y; > int i; > z = c; > for(i=0; i<iter; i++) { > float x = (z.x * z.x - z.y * z.y) + c.x; > float y = (z.y * z.x + z.x * z.y) + c.y; > if((x * x + y * y)> 4.0) break; > z.x = x; > z.y = y; > } > gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0); > } > > """ > > The rest is just boiler-plate OpenGL... > > Sources: > > http://nuclear.mutantstargoat.com/articles/sdr_fract/ > > http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos had interactive, extremely fast a fluid fractal zooming more than 10 (or 15?) years ago (-> on a laughable hardware by today's standards). r. [1] http://wmi.math.u-szeged.hu/xaos/doku.php From scipy at samueljohn.de Mon Jan 23 11:35:29 2012 From: scipy at samueljohn.de (Samuel John) Date: Mon, 23 Jan 2012 17:35:29 +0100 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <4F1D5A7E.4080604@ntc.zcu.cz> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAN06oV_asm3czu2D7NEC+TYPABte0QpkzetS6d-+bf0j_K6ZRA@mail.gmail.com> <CADDwiVAmeNCbptSJmX6GxddZEoyHSfCmZVqftow7VvQsbcS5Ew@mail.gmail.com> <CAOzk5Qe3kppSOczrLDVjCzoCsWcDneA2ecnfU3B2RR8WsNttyw@mail.gmail.com> <4F1D22A4.8010009@astro.uio.no> <4F1D4315.5080602@molden.no> <CAN06oV-ArZaZBZ+fHoGmO6Jxj69Pngku-X02hy05Huvm_TPH-w@mail.gmail.com> <4F1D57DE.3030507@molden.no> <4F1D5A7E.4080604@ntc.zcu.cz> Message-ID: <20E62AD8-0A7B-4D32-868C-97C62DE9F9AB@samueljohn.de> I'd like to add http://git.tiker.net/pyopencl.git/blob/HEAD:/examples/demo_mandelbrot.py to the discussion, since I use pyopencl (http://mathema.tician.de/software/pyopencl) with great success in my daily scientific computing. Install with pip. PyOpenCL does understand numpy arrays. You write a kernel (small c-program) directly into a python triple quoted strings and get a pythonic way to program GPU and core i5 and i7 CPUs with python Exception if something goes wrong. Whenever I hit a speed bottleneck that I cannot solve with pure numpy, I code a little part of the computation for GPU. The compilation is done just in time when you run the python code. Especially for the mandelbrot this may be a _huge_ gain in speed since its embarrassingly parallel. Samuel On 23.01.2012, at 14:02, Robert Cimrman wrote: > On 01/23/12 13:51, Sturla Molden wrote: >> Den 23.01.2012 13:09, skrev Sebastian Haase: >>> >>> I would think that interactive zooming would be quite nice >>> ("illuminating") .... and for that 13 secs would not be tolerable.... >>> Well... it's not at the top of my priority list ... ;-) >>> >> >> Sure, that comes under the 'fast enough' issue. But even Fortran might >> be too slow here? 
>> >> For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader >> (which would be a text string in Python): >> >> madelbrot_fragment_shader = """ >> >> uniform sampler1D tex; >> uniform vec2 center; >> uniform float scale; >> uniform int iter; >> void main() { >> vec2 z, c; >> c.x = 1.3333 * (gl_TexCoord[0].x - 0.5) * scale - center.x; >> c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y; >> int i; >> z = c; >> for(i=0; i<iter; i++) { >> float x = (z.x * z.x - z.y * z.y) + c.x; >> float y = (z.y * z.x + z.x * z.y) + c.y; >> if((x * x + y * y)> 4.0) break; >> z.x = x; >> z.y = y; >> } >> gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0); >> } >> >> """ >> >> The rest is just boiler-plate OpenGL... >> >> Sources: >> >> http://nuclear.mutantstargoat.com/articles/sdr_fract/ >> >> http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml > > Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos > had interactive, extremely fast a fluid fractal zooming more than 10 (or 15?) > years ago (-> on a laughable hardware by today's standards). > > r. > > [1] http://wmi.math.u-szeged.hu/xaos/doku.php > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Mon Jan 23 12:17:41 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 23 Jan 2012 09:17:41 -0800 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> References: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> Message-ID: <CALGmxEKT5tS276kwnvUTN23CYG07Zrg2rH5-0kZJuPZmCwr=tQ@mail.gmail.com> On Wed, Jan 18, 2012 at 1:26 AM, <apo at pdauf.de> wrote: > Your ideas are very helpfull and the code is very fast. I'm curios -- a number of ideas were floated here -- what did you end up using? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From wardefar at iro.umontreal.ca Mon Jan 23 13:55:52 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 23 Jan 2012 13:55:52 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> Message-ID: <20120123185552.GA27535@ravage> I've reproduced this (rather serious) bug myself and confirmed that it exists in master, and as far back as 1.4.1. I'd really appreciate if someone could reproduce and confirm on another machine, as so far all my testing has been on our single high-memory machine. Thanks, David On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote: > A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on Linux (Fedora Core 14) 64-bit: > > > a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') > > b = numpy.random.randint(5000000,size=(4993210,)) > > c = a[b] > > It seems c is not getting filled in full, namely: > > > In [14]: c[1000000:].sum() > > Out[14]: 0 > > I haven't been able to reproduce this quite yet, I'll try to find a machine with sufficient memory tomorrow. But does anyone have any insight in the mean time? 
It smells like some kind of integer overflow bug. > > Thanks, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From teoliphant at gmail.com Mon Jan 23 14:33:42 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Mon, 23 Jan 2012 13:33:42 -0600 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120123185552.GA27535@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> Message-ID: <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> Can you determine where the problem is, precisely. In other words, can you verify that c is not getting filled in correctly? You are no doubt going to get overflow in the summation as you have a uint8 parameter. But, having that overflow be exactly '0' would be surprising. Can you verify that a and b are getting created correctly? Also, 'c' should be a 2-d array, can you verify that? Can you take the sum along the -1 axis and the 0 axis separately: print a.shape print b.shape print c.shape c[1000000:].sum(axis=0) d = c[1000000:].sum(axis=-1) print d[:100] print d[-100:] On Jan 23, 2012, at 12:55 PM, David Warde-Farley wrote: > I've reproduced this (rather serious) bug myself and confirmed that it exists > in master, and as far back as 1.4.1. > > I'd really appreciate if someone could reproduce and confirm on another > machine, as so far all my testing has been on our single high-memory machine. > > Thanks, > David > > On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote: >> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on Linux (Fedora Core 14) 64-bit: >> >>> a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') >>> b = numpy.random.randint(5000000,size=(4993210,)) >>> c = a[b] >> >> It seems c is not getting filled in full, namely: >> >>> In [14]: c[1000000:].sum() >>> Out[14]: 0 >> >> I haven't been able to reproduce this quite yet, I'll try to find a machine with sufficient memory tomorrow. But does anyone have any insight in the mean time? It smells like some kind of integer overflow bug. >> >> Thanks, >> >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robince at gmail.com Mon Jan 23 14:38:44 2012 From: robince at gmail.com (Robin) Date: Mon, 23 Jan 2012 20:38:44 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120123185552.GA27535@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> Message-ID: <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley <wardefar at iro.umontreal.ca> wrote: > I've reproduced this (rather serious) bug myself and confirmed that it exists > in master, and as far back as 1.4.1. > > I'd really appreciate if someone could reproduce and confirm on another > machine, as so far all my testing has been on our single high-memory machine. I see the same behaviour on a Winodows machine with numpy 1.6.1. 
But I don't think it is an indexing problem - rather something with the random number creation. a itself is already zeros for high indexes. ?? In [8]: b[1000000:1000010] Out[8]: array([3429029, 1251819, 4292918, 2249483, 757620, 3977130, 3455449, 2005054, 2565207, 3114930]) In [9]: a[b[1000000:1000010]] Out[9]: array([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], dtype=uint8) In [41]: a[581350:,0].sum() Out[41]: 0� Cheers Robin > > Thanks, > David > > On Mon, Jan 23, 2012 at 05:23:28AM -0500, David Warde-Farley wrote: >> A colleague has run into this weird behaviour with NumPy 1.6.1, EPD 7.1-2, on Linux (Fedora Core 14) 64-bit: >> >> > a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') >> > b = numpy.random.randint(5000000,size=(4993210,)) >> > c = a[b] >> >> It seems c is not getting filled in full, namely: >> >> > In [14]: c[1000000:].sum() >> > Out[14]: 0 >> >> I haven't been able to reproduce this quite yet, I'll try to find a machine with sufficient memory tomorrow. But does anyone have any insight in the mean time? It smells like some kind of integer overflow bug. >> >> Thanks, >> >> David >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From apo at pdauf.de Mon Jan 23 14:40:21 2012 From: apo at pdauf.de (elodw) Date: Mon, 23 Jan 2012 20:40:21 +0100 Subject: [Numpy-discussion] Counting the Colors of RGB-Image In-Reply-To: <CALGmxEKT5tS276kwnvUTN23CYG07Zrg2rH5-0kZJuPZmCwr=tQ@mail.gmail.com> References: <2041858962.2150341.1326878784947.JavaMail.tomcat55@mrmseu0.kundenserver.de> <CALGmxEKT5tS276kwnvUTN23CYG07Zrg2rH5-0kZJuPZmCwr=tQ@mail.gmail.com> Message-ID: <4F1DB7A5.3050208@pdauf.de> Am 23.01.2012 18:17, schrieb Chris Barker: > On Wed, Jan 18, 2012 at 1:26 AM,<apo at pdauf.de> wrote: >> Your ideas are very helpfull and the code is very fast. > I'm curios -- a number of ideas were floated here -- what did you end up using? > > -Chris > > I'am sorry but when i see the code of Torgil Svenson, I think, "the game is over". I use the follow. code: t0=clock() tt = n_im2.view() tt.shape = -1,3 ifl = tt[...,0].astype(np.int)*256*256 + tt[...,1].astype(np.int)*256 + tt[...,2].astype(np.int) colors, inv = np.unique(ifl,return_inverse=True) zus = np.array([colors[-1]+1]) colplus = np.hstack((colors,zus)) ccnt = np.histogram(ifl,colplus)[0] t1=clock() print (t1-t0) t0=t1 > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From aronne.merrelli at gmail.com Mon Jan 23 15:04:22 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Mon, 23 Jan 2012 14:04:22 -0600 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> Message-ID: <CAHNdQ4LO_9B4v-PtEWisUrKGQv0spwMT8C=Z=WJB+=mmcP4phQ@mail.gmail.com> On Mon, Jan 23, 2012 at 1:33 PM, Travis Oliphant <teoliphant at gmail.com>wrote: > Can you determine where the problem is, precisely. In other words, can > you verify that c is not getting filled in correctly? > > You are no doubt going to get overflow in the summation as you have a > uint8 parameter. But, having that overflow be exactly '0' would be > surprising. > > Can you verify that a and b are getting created correctly? Also, 'c' > should be a 2-d array, can you verify that? Can you take the sum along the > -1 axis and the 0 axis separately: > > print a.shape > print b.shape > print c.shape > > c[1000000:].sum(axis=0) > d = c[1000000:].sum(axis=-1) > print d[:100] > print d[-100:] > I am getting the same results as David. It looks like c just "stopped filling in" partway through the array. I don't think there is any overflow issue, since the result of sum() is up-promoted to uint64 when I do that. Travis, here are the outputs at my end - I cut out many zeros for brevity: In [7]: print a.shape (5000000, 972) In [8]: print b.shape (4993210,) In [9]: print c.shape (4993210, 972) In [10]: c[1000000:].sum(axis=0) Out[10]: array([0, 0, 0, .... , 0]) In [11]: d = c[1000000:].sum(axis=-1) In [12]: print d[:100] [0 0 0 ... 0 0] In [13]: print d[-100:] [0 0 0 ... 0 0 0] I looked at sparse subsamples with matplotlib - specifically, imshow(a[::1000, :]) - and the a array looks correct (random values everywhere), but c is zero past a certain row number. In fact, it looks like it becomes zero at row 575419 - I think for all rows in c beyond row 574519, the values will be zero. For lower row numbers, I think they are correctly filled (at least, by the sparse view in matplotlib). In [15]: a[b[574519], 350:360] Out[15]: array([143, 155, 11, 30, 212, 149, 110, 164, 165, 120], dtype=uint8) In [16]: c[574519, 350:360] Out[16]: array([143, 155, 11, 30, 212, 149, 0, 0, 0, 0], dtype=uint8) I'm using EPD 7.1, numpy 1.6.1, Linux installation (I don't know the kernel details) HTH, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120123/ad3e9033/attachment.html> From emayssat at gmail.com Mon Jan 23 15:15:56 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Mon, 23 Jan 2012 12:15:56 -0800 Subject: [Numpy-discussion] Saving and loading a structured array from a TEXT file Message-ID: <CACB6ZmA62eg-E8UBhcEGA5rwrgu5P_XS3tbJh9htm+TdpmhyVQ@mail.gmail.com> Is there a way to save a structured array in a text file? My problem is not so much in the saving procedure, but rather in the 'reloading' procedure. 
See below In [3]: import numpy as np In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')]) In [5]: r.tofile('toto.txt',sep='\n') bash-4.2$ cat toto.txt ('1', 1, 1.0) ('1', 1, 1.0) ('1', 1, 1.0) In [7]: r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /home/cls1fs/clseng/10/<ipython-input-7-b07ba265ede7> in <module>() ----> 1 r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) ValueError: Unable to read character files of that array type -- Emmanuel From wardefar at iro.umontreal.ca Mon Jan 23 15:21:32 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 23 Jan 2012 15:21:32 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <46A721C3-59F2-4317-A622-80A8FD4CB43F@continuum.io> Message-ID: <20120123202131.GC28091@ravage> Hi Travis, Thanks for your reply. On Mon, Jan 23, 2012 at 01:33:42PM -0600, Travis Oliphant wrote: > Can you determine where the problem is, precisely. In other words, can you verify that c is not getting filled in correctly? > > You are no doubt going to get overflow in the summation as you have a uint8 parameter. But, having that overflow be exactly '0' would be surprising. I've already looked at this actually. The last 4400000 or so rows of c are all zero, however 'a' seems to be filled in fine: >>> import numpy >>> a = numpy.array(numpy.random.randint(256,size=(5000000,972)), >>> dtype=numpy.uint8) >>> b = numpy.random.randint(5000000,size=(4993210,)) >>> c = a[b] >>> print c [[186 215 204 ..., 170 98 198] [ 56 98 112 ..., 32 233 1] [ 44 133 171 ..., 163 35 51] ..., [ 0 0 0 ..., 0 0 0] [ 0 0 0 ..., 0 0 0] [ 0 0 0 ..., 0 0 0]] >>> print a [[ 30 182 56 ..., 133 162 173] [112 100 69 ..., 3 147 80] [124 70 232 ..., 114 177 11] ..., [ 22 42 31 ..., 141 196 134] [ 74 47 167 ..., 38 193 9] [162 228 190 ..., 150 18 1]] So it seems to have nothing to do with the sum, but rather the advanced indexing operation. The zeros seem to start in the middle of row 574519, in particular at element 356. This is reproducible with different random vectors of indices, it seems. So 558432824th element things go awry. I can't say it makes any sense to me why this would be the magic number. David From wardefar at iro.umontreal.ca Mon Jan 23 15:33:53 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Mon, 23 Jan 2012 15:33:53 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> Message-ID: <20120123203353.GD28091@ravage> On Mon, Jan 23, 2012 at 08:38:44PM +0100, Robin wrote: > On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley > <wardefar at iro.umontreal.ca> wrote: > > I've reproduced this (rather serious) bug myself and confirmed that it exists > > in master, and as far back as 1.4.1. > > > > I'd really appreciate if someone could reproduce and confirm on another > > machine, as so far all my testing has been on our single high-memory machine. > > I see the same behaviour on a Winodows machine with numpy 1.6.1. 
But I > don't think it is an indexing problem - rather something with the > random number creation. a itself is already zeros for high indexes. > ?? > In [8]: b[1000000:1000010] > Out[8]: > array([3429029, 1251819, 4292918, 2249483, 757620, 3977130, 3455449, > 2005054, 2565207, 3114930]) > > In [9]: a[b[1000000:1000010]] > Out[9]: > array([[0, 0, 0, ..., 0, 0, 0], > [0, 0, 0, ..., 0, 0, 0], > [0, 0, 0, ..., 0, 0, 0], > ..., > [0, 0, 0, ..., 0, 0, 0], > [0, 0, 0, ..., 0, 0, 0], > [0, 0, 0, ..., 0, 0, 0]], dtype=uint8) > > In [41]: a[581350:,0].sum() > Out[41]: 0 Hmm, this seems like a separate bug to mine. In mine, 'a' is indeed being filled in -- the problem arises with c alone. So, another Windows-specific bug to add to the pile, perhaps? :( David From derek at astro.physik.uni-goettingen.de Mon Jan 23 16:07:11 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Mon, 23 Jan 2012 22:07:11 +0100 Subject: [Numpy-discussion] Saving and loading a structured array from a TEXT file In-Reply-To: <CACB6ZmA62eg-E8UBhcEGA5rwrgu5P_XS3tbJh9htm+TdpmhyVQ@mail.gmail.com> References: <CACB6ZmA62eg-E8UBhcEGA5rwrgu5P_XS3tbJh9htm+TdpmhyVQ@mail.gmail.com> Message-ID: <619AF526-C5B8-4FEE-91D9-58B0842B67AC@astro.physik.uni-goettingen.de> On 23 Jan 2012, at 21:15, Emmanuel Mayssat wrote: > Is there a way to save a structured array in a text file? > My problem is not so much in the saving procedure, but rather in the > 'reloading' procedure. > See below > > > In [3]: import numpy as np > > In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')]) > > In [5]: r.tofile('toto.txt',sep='\n') > > bash-4.2$ cat toto.txt > ('1', 1, 1.0) > ('1', 1, 1.0) > ('1', 1, 1.0) > > In [7]: r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > /home/cls1fs/clseng/10/<ipython-input-7-b07ba265ede7> in <module>() > ----> 1 r2 = np.fromfile('toto.txt',sep='\n',dtype=r.dtype) > > ValueError: Unable to read character files of that array type I think most of the np.fromfile functionality works for binary input; for reading text input np.loadtxt and np.genfromtxt are the (currently) recommended functions. It is bit tricky to read the format generated by tofile() in the above example, but the following should work: cnv = {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')} r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype) Generally loadtxt works more smoothly together with savetxt, but the latter unfortunately does not offer an easy way to save structured arrays (note to self and others currently working on npyio: definitely room for improvement!). HTH, Derek From cgohlke at uci.edu Mon Jan 23 16:08:03 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Mon, 23 Jan 2012 13:08:03 -0800 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <20120123203353.GD28091@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> Message-ID: <4F1DCC33.6090101@uci.edu> On 1/23/2012 12:33 PM, David Warde-Farley wrote: > On Mon, Jan 23, 2012 at 08:38:44PM +0100, Robin wrote: >> On Mon, Jan 23, 2012 at 7:55 PM, David Warde-Farley >> <wardefar at iro.umontreal.ca> wrote: >>> I've reproduced this (rather serious) bug myself and confirmed that it exists >>> in master, and as far back as 1.4.1. >>> >>> I'd really appreciate if someone could reproduce and confirm on another >>> machine, as so far all my testing has been on our single high-memory machine. >> >> I see the same behaviour on a Winodows machine with numpy 1.6.1. But I >> don't think it is an indexing problem - rather something with the >> random number creation. a itself is already zeros for high indexes. >> ?? >> In [8]: b[1000000:1000010] >> Out[8]: >> array([3429029, 1251819, 4292918, 2249483, 757620, 3977130, 3455449, >> 2005054, 2565207, 3114930]) >> >> In [9]: a[b[1000000:1000010]] >> Out[9]: >> array([[0, 0, 0, ..., 0, 0, 0], >> [0, 0, 0, ..., 0, 0, 0], >> [0, 0, 0, ..., 0, 0, 0], >> ..., >> [0, 0, 0, ..., 0, 0, 0], >> [0, 0, 0, ..., 0, 0, 0], >> [0, 0, 0, ..., 0, 0, 0]], dtype=uint8) >> >> In [41]: a[581350:,0].sum() >> Out[41]: 0 > > Hmm, this seems like a separate bug to mine. In mine, 'a' is indeed being > filled in -- the problem arises with c alone. > > So, another Windows-specific bug to add to the pile, perhaps? :( > > David Maybe this explains the win-amd64 behavior: There are a couple of places in mtrand where array indices and sizes are C long instead of npy_intp, for example in the randint function: <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> Christoph From derek at astro.physik.uni-goettingen.de Mon Jan 23 16:28:47 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Mon, 23 Jan 2012 22:28:47 +0100 Subject: [Numpy-discussion] Saving and loading a structured array from a TEXT file In-Reply-To: <619AF526-C5B8-4FEE-91D9-58B0842B67AC@astro.physik.uni-goettingen.de> References: <CACB6ZmA62eg-E8UBhcEGA5rwrgu5P_XS3tbJh9htm+TdpmhyVQ@mail.gmail.com> <619AF526-C5B8-4FEE-91D9-58B0842B67AC@astro.physik.uni-goettingen.de> Message-ID: <D3CA6FF9-D9C1-4014-955A-2F0A32EA9960@astro.physik.uni-goettingen.de> On 23 Jan 2012, at 22:07, Derek Homeier wrote: >> In [4]: r = np.ones(3,dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')]) >> >> In [5]: r.tofile('toto.txt',sep='\n') >> >> bash-4.2$ cat toto.txt >> ('1', 1, 1.0) >> ('1', 1, 1.0) >> ('1', 1, 1.0) >> > > cnv = {0: lambda s: s.lstrip('('), -1: lambda s: s.rstrip(')')} > r2 = np.loadtxt('toto.txt', delimiter=',', converters=cnv, dtype=r.dtype) > > Generally loadtxt works more smoothly together with savetxt, but the latter unfortunately > does not offer an easy way to save structured arrays (note to self and others currently > working on npyio: definitely room for improvement!). For the record, in that example np.savetxt('toto.txt', r, fmt='%s,%d,%f') would work as well, saving you the custom converter for loadtxt - it could just become tedious to work out the format for more complex structures, so an option to construct this automatically from r.dtype could certainly be a nice enhancement. Just wondering, is there something like the inverse operator to np.format_parser, i.e. 
mapping each dtype to a default print format specifier? Cheers, Derek From emayssat at gmail.com Mon Jan 23 18:26:08 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Mon, 23 Jan 2012 15:26:08 -0800 Subject: [Numpy-discussion] 'Advanced' save and restore operation Message-ID: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> After having saved data, I need to know/remember the data dtype to restore it correctly. Is there a way to save the dtype with the data? (I guess the header parameter of savedata could help, but they are only available in v2.0+ ) I would like to save several related structured array and a dictionary of parameters into a TEXT file. Is there an easy way to do that? (maybe xml file, or maybe archive zip file of other files, or ..... ) Any recommendation is helpful. Regards, -- Emmanuel From deshpande.jaidev at gmail.com Mon Jan 23 18:42:02 2012 From: deshpande.jaidev at gmail.com (Jaidev Deshpande) Date: Tue, 24 Jan 2012 05:12:02 +0530 Subject: [Numpy-discussion] Working with MATLAB Message-ID: <CAB=suE=F7GangCGiZ84oNd4m=4Oq9=SLkVB-RGveQszM4G7Thg@mail.gmail.com> Dear List, I frequently work with MATLAB and it is necessary for me many a times to adapt MATLAB codes for NumPy arrays. While for most practical purposes it works fine, I think there might be a lot of 'under the hood' things that I might be missing when I make the translations from MATLAB to Python. Are there any 'best practices' for working on this transition? Thanks From deshpande.jaidev at gmail.com Mon Jan 23 18:52:42 2012 From: deshpande.jaidev at gmail.com (Jaidev Deshpande) Date: Tue, 24 Jan 2012 05:22:42 +0530 Subject: [Numpy-discussion] Working with MATLAB In-Reply-To: <CAB=suE=F7GangCGiZ84oNd4m=4Oq9=SLkVB-RGveQszM4G7Thg@mail.gmail.com> References: <CAB=suE=F7GangCGiZ84oNd4m=4Oq9=SLkVB-RGveQszM4G7Thg@mail.gmail.com> Message-ID: <CAB=suEk6X_AkezMhkE6c3V+CBAWkwaRxsUViCwBR-Q8xj_xwUg@mail.gmail.com> Please ignore my question. I found what I needed on the scipy website. I asked the question in haste. I'm sorry. Thanks From shish at keba.be Mon Jan 23 19:45:09 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 23 Jan 2012 19:45:09 -0500 Subject: [Numpy-discussion] 'Advanced' save and restore operation In-Reply-To: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> References: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> Message-ID: <CAFXk4brnKe0xigc0rqx01bmdQMkEyO-rSYUP1rJudT0Zy1kskw@mail.gmail.com> Note sure if there's a better way, but you can do it with some custom load and save functions: >>> with open('f.txt', 'w') as f: ... f.write(str(x.dtype) + '\n') ... numpy.savetxt(f, x) >>> with open('f.txt') as f: ... dtype = f.readline().strip() ... y = numpy.loadtxt(f).astype(dtype) I'm not sure how that'd work with structured arrays though. For the dict of parameters you'd have to write your own load/save piece of code too if you need a clean text file. -=- Olivier 2012/1/23 Emmanuel Mayssat <emayssat at gmail.com> > After having saved data, I need to know/remember the data dtype to > restore it correctly. > Is there a way to save the dtype with the data? > (I guess the header parameter of savedata could help, but they are > only available in v2.0+ ) > > I would like to save several related structured array and a dictionary > of parameters into a TEXT file. > Is there an easy way to do that? > (maybe xml file, or maybe archive zip file of other files, or ..... ) > > Any recommendation is helpful. 
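One sketch of how the header idea above might carry over to a structured array plus a parameter dictionary (an illustration only; the file name, the 'params' dict and the per-field format string are made up, and it assumes every field can be round-tripped through a plain '%s'/'%d'/'%f' text representation):

import ast
import numpy as np

r = np.ones(3, dtype=[('name', '|S5'), ('foo', '<i8'), ('bar', '<f8')])
params = {'run': 1, 'comment': 'example'}      # made-up dictionary of parameters

with open('data.txt', 'w') as f:
    f.write(repr(r.dtype.descr) + '\n')        # dtype header, literal_eval-able
    f.write(repr(params) + '\n')               # parameter dict as a second header line
    np.savetxt(f, r, fmt='%s,%d,%f')           # one format specifier per field

with open('data.txt') as f:
    dtype = np.dtype(ast.literal_eval(f.readline()))
    params2 = ast.literal_eval(f.readline())
    r2 = np.loadtxt(f, dtype=dtype, delimiter=',')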
> > Regards, > -- > Emmanuel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120123/113c608a/attachment.html> From derek at astro.physik.uni-goettingen.de Mon Jan 23 20:00:16 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 24 Jan 2012 02:00:16 +0100 Subject: [Numpy-discussion] 'Advanced' save and restore operation In-Reply-To: <CAFXk4brnKe0xigc0rqx01bmdQMkEyO-rSYUP1rJudT0Zy1kskw@mail.gmail.com> References: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> <CAFXk4brnKe0xigc0rqx01bmdQMkEyO-rSYUP1rJudT0Zy1kskw@mail.gmail.com> Message-ID: <7E29BD99-5064-4009-93BC-64AC87956779@astro.physik.uni-goettingen.de> On 24 Jan 2012, at 01:45, Olivier Delalleau wrote: > Note sure if there's a better way, but you can do it with some custom load and save functions: > > >>> with open('f.txt', 'w') as f: > ... f.write(str(x.dtype) + '\n') > ... numpy.savetxt(f, x) > > >>> with open('f.txt') as f: > ... dtype = f.readline().strip() > ... y = numpy.loadtxt(f).astype(dtype) > > I'm not sure how that'd work with structured arrays though. For the dict of parameters you'd have to write your own load/save piece of code too if you need a clean text file. > > -=- Olivier > > 2012/1/23 Emmanuel Mayssat <emayssat at gmail.com> > After having saved data, I need to know/remember the data dtype to > restore it correctly. > Is there a way to save the dtype with the data? > (I guess the header parameter of savedata could help, but they are > only available in v2.0+ ) > > I would like to save several related structured array and a dictionary > of parameters into a TEXT file. > Is there an easy way to do that? > (maybe xml file, or maybe archive zip file of other files, or ..... ) > > Any recommendation is helpful. asciitable might be of some help, but to implement all of your required functionality, you'd probably still have to implement your own Reader class: http://cxc.cfa.harvard.edu/contrib/asciitable/ Cheers, Derek From sturla at molden.no Mon Jan 23 23:35:20 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 05:35:20 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1DCC33.6090101@uci.edu> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> Message-ID: <4F1E3508.1070405@molden.no> Den 23.01.2012 22:08, skrev Christoph Gohlke: > Maybe this explains the win-amd64 behavior: There are a couple of places > in mtrand where array indices and sizes are C long instead of npy_intp, > for example in the randint function: > > <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> > > AFAIK, on AMD64 a C long is 64 bit on Linux (gcc) and 32 bit on Windows (gcc and MSVC). Sturla From sturla at molden.no Tue Jan 24 00:00:05 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 06:00:05 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <4F1DCC33.6090101@uci.edu> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> Message-ID: <4F1E3AD5.9000507@molden.no> Den 23.01.2012 22:08, skrev Christoph Gohlke: > > Maybe this explains the win-amd64 behavior: There are a couple of places > in mtrand where array indices and sizes are C long instead of npy_intp, > for example in the randint function: > > <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> > > Both i and length could overflow here. It should overflow on allocation of more than 2 GB. There is also a lot of C longs in the internal state (line 55-105), as well as the other functions. Producing 2 GB of random ints twice fails: >>> import numpy as np >>> np.random.randint(5000000,size=(2*1024**3,)) array([0, 0, 0, ..., 0, 0, 0]) >>> np.random.randint(5000000,size=(2*1024**3,)) Traceback (most recent call last): File "<pyshell#3>", line 1, in <module> np.random.randint(5000000,size=(2*1024**3,)) File "mtrand.pyx", line 881, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:6040) MemoryError >>> Sturla From sturla at molden.no Tue Jan 24 00:32:14 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 06:32:14 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E3AD5.9000507@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> Message-ID: <4F1E425E.20203@molden.no> Den 24.01.2012 06:00, skrev Sturla Molden: > Both i and length could overflow here. It should overflow on > allocation of more than 2 GB. There is also a lot of C longs in the > internal state (line 55-105), as well as the other functions. The use of C long affects all the C and Pyrex source code in mtrand module, not just mtrand.pyx. All of it is fubar on Win64. From the C standard, a C long is only quarranteed to be "at least 32 bits wide". Thus a C long can only be expected to index up to 2**31 - 1, and it is not a Windows specific problem. So it seems there are hundreds of places in the mtrand module where integers can overflow on 64-bit Python. Also the crappy old Pyrex code should be updated to some more recent Cython. Sturla From sturla at molden.no Tue Jan 24 03:21:00 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 09:21:00 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E425E.20203@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> Message-ID: <4F1E69EC.4020408@molden.no> On 24.01.2012 06:32, Sturla Molden wrote: > The use of C long affects all the C and Pyrex source code in mtrand > module, not just mtrand.pyx. All of it is fubar on Win64. randomkit.c handles C long correctly, I think. There are different codes for 32 and 64 bit C long, and buffer sizes are size_t. 
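A quick way to see the width difference under discussion from Python (a sketch; the printed values depend on platform and compiler):

import numpy as np

# np.int_ corresponds to a C long, np.intp to npy_intp (pointer-sized).
print(np.dtype(np.int_).itemsize * 8)    # 32 on win-amd64, 64 on linux-amd64
print(np.dtype(np.intp).itemsize * 8)    # 64 on any 64-bit build
# the 5000000 x 972 uint8 array from the indexing thread has more elements
# than a 32-bit C long can count:
print(5000000 * 972 > 2**31 - 1)         # True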
Sturla From sturla at molden.no Tue Jan 24 03:37:26 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 09:37:26 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E69EC.4020408@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> Message-ID: <4F1E6DC6.6040904@molden.no> On 24.01.2012 09:21, Sturla Molden wrote: > randomkit.c handles C long correctly, I think. There are different codes > for 32 and 64 bit C long, and buffer sizes are size_t. distributions.c take C longs as parameters e.g. for the binomial distribution. mtrand.pyx correctly handles this, but it can give an unexpected overflow error on 64-bit Windows: In [1]: np.random.binomial(2**31, .5) --------------------------------------------------------------------------- OverflowError Traceback (most recent call last) C:\Windows\system32\<ipython-input-1-000aa0626c42> in <module>() ----> 1 np.random.binomial(2**31, .5) C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)() OverflowError: Python int too large to convert to C long On systems where C longs are 64 bit, this is likely not to produce an error. This begs the question if also randomkit.c and districutions.c should be changed to use npy_intp for consistency across all platforms. (I assume we are not supporting 16 bit NumPy, in which case we will need C long there...) Sturla From sturla at molden.no Tue Jan 24 03:47:01 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 09:47:01 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E425E.20203@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> Message-ID: <4F1E7005.1090801@molden.no> On 24.01.2012 06:32, Sturla Molden wrote: > Den 24.01.2012 06:00, skrev Sturla Molden: >> Both i and length could overflow here. It should overflow on >> allocation of more than 2 GB. There is also a lot of C longs in the >> internal state (line 55-105), as well as the other functions. > > The use of C long affects all the C and Pyrex source code in mtrand > module, not just mtrand.pyx. All of it is fubar on Win64. The coding is also inconsistent, compare for example: https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180 https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201 Sturla From robert.kern at gmail.com Tue Jan 24 04:15:01 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 24 Jan 2012 09:15:01 +0000 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <4F1E6DC6.6040904@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> Message-ID: <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> On Tue, Jan 24, 2012 at 08:37, Sturla Molden <sturla at molden.no> wrote: > On 24.01.2012 09:21, Sturla Molden wrote: > >> randomkit.c handles C long correctly, I think. There are different codes >> for 32 and 64 bit C long, and buffer sizes are size_t. > > distributions.c take C longs as parameters e.g. for the binomial > distribution. mtrand.pyx correctly handles this, but it can give an > unexpected overflow error on 64-bit Windows: > > > In [1]: np.random.binomial(2**31, .5) > --------------------------------------------------------------------------- > OverflowError ? ? ? ? ? ? ? ? ? ? ? ? ? ? Traceback (most recent call last) > C:\Windows\system32\<ipython-input-1-000aa0626c42> in <module>() > ----> 1 np.random.binomial(2**31, .5) > > C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in > mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)() > > OverflowError: Python int too large to convert to C long > > > On systems where C longs are 64 bit, this is likely not to produce an > error. > > This begs the question if also randomkit.c and districutions.c should be > changed to use npy_intp for consistency across all platforms. There are two different uses of long that you need to distinguish. One is for sizes, and one is for parameters and values. The sizes should definitely be upgraded to npy_intp. The latter shouldn't; these should remain as the default integer type of Python and numpy, a C long. The reason longs are used for sizes is that I wrote mtrand for Numeric and Python 2.4 before numpy was even announced (and I don't think we had npy_intp at the time I merged it into numpy, but I could be wrong). Using longs for sizes was the order of the day. I don't think I had even touched a 64-bit machine that wasn't a DEC Alpha at the time, so I knew very little about the issues. So yes, please, fix whatever you can. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From robert.kern at gmail.com Tue Jan 24 04:16:48 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 24 Jan 2012 09:16:48 +0000 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E7005.1090801@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E7005.1090801@molden.no> Message-ID: <CAF6FJivbJjMTm4ong2N_5J3nD7g9ec71SJeLOqVO86iXov696w@mail.gmail.com> On Tue, Jan 24, 2012 at 08:47, Sturla Molden <sturla at molden.no> wrote: > The coding is also inconsistent, compare for example: > > https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L180 > > https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L201 I'm sorry, what are you demonstrating there? 
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From sturla at molden.no Tue Jan 24 04:19:29 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 10:19:29 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CAF6FJivbJjMTm4ong2N_5J3nD7g9ec71SJeLOqVO86iXov696w@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E7005.1090801@molden.no> <CAF6FJivbJjMTm4ong2N_5J3nD7g9ec71SJeLOqVO86iXov696w@mail.gmail.com> Message-ID: <4F1E77A1.3090504@molden.no> On 24.01.2012 10:16, Robert Kern wrote: > I'm sorry, what are you demonstrating there? Both npy_intp and C long are used for sizes and indexing. Sturla From robert.kern at gmail.com Tue Jan 24 04:23:22 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 24 Jan 2012 09:23:22 +0000 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E77A1.3090504@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E7005.1090801@molden.no> <CAF6FJivbJjMTm4ong2N_5J3nD7g9ec71SJeLOqVO86iXov696w@mail.gmail.com> <4F1E77A1.3090504@molden.no> Message-ID: <CAF6FJiu7wwvz_XDZYTVanDm0n=5c485s=8ZajGoSKG_TvVFzBg@mail.gmail.com> On Tue, Jan 24, 2012 at 09:19, Sturla Molden <sturla at molden.no> wrote: > On 24.01.2012 10:16, Robert Kern wrote: > >> I'm sorry, what are you demonstrating there? > > Both npy_intp and C long are used for sizes and indexing. Ah, yes. I think Travis added the multiiter code to cont1_array(), which does broadcasting, so he used npy_intp as is proper (and necessary to pass into the multiiter API). The other functions don't do broadcasting, so he didn't touch them. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From sturla at molden.no Tue Jan 24 05:01:53 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 24 Jan 2012 11:01:53 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> Message-ID: <4F1E8191.3010300@molden.no> On 24.01.2012 10:15, Robert Kern wrote: > There are two different uses of long that you need to distinguish. One > is for sizes, and one is for parameters and values. The sizes should > definitely be upgraded to npy_intp. The latter shouldn't; these should > remain as the default integer type of Python and numpy, a C long. 
Ok, that makes sence. > The reason longs are used for sizes is that I wrote mtrand for Numeric > and Python 2.4 before numpy was even announced (and I don't think we > had npy_intp at the time I merged it into numpy, but I could be > wrong). Using longs for sizes was the order of the day. I don't think > I had even touched a 64-bit machine that wasn't a DEC Alpha at the > time, so I knew very little about the issues. On amd64 the "native" datatypes are actually a 64 bit pointer with a 32 bit offset (contrary to what we see in Python and NumPy C sources), which is one reason why C longs are still 32 bits in MSVC. Thus an array size (size_t) should be 64 bits, but array indices (C long) should be 32 bits. But nobody likes to code like that (e.g. we would need an extra 64 bit pointer as cursor if the buffer size overflows a C long), and I don't think using a non-native 64-bit offset incur a lot of extra overhead for the CPU. :-) Sturla From pierre.haessig at crans.org Tue Jan 24 09:01:44 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Tue, 24 Jan 2012 15:01:44 +0100 Subject: [Numpy-discussion] Strange error raised by scipy.special.erf In-Reply-To: <26FC23E7C398A64083C980D16001012D261F0D9376@VA3DIAXVS361.RED001.local> References: <26FC23E7C398A64083C980D16001012D261F0D9376@VA3DIAXVS361.RED001.local> Message-ID: <4F1EB9C8.9070309@crans.org> Le 22/01/2012 11:28, Nadav Horesh a ?crit : > >>> special.erf(26.5) > 1.0 > >>> special.erf(26.6) > Traceback (most recent call last): > File "<pyshell#7>", line 1, in <module> > special.erf(26.6) > FloatingPointError: underflow encountered in erf > >>> special.erf(26.7) > 1.0 > I can confirm this same behaviour with numpy 1.5.1/scipy 0.9.0 Indeed 26.5 and 26.7 works, while 26.6 raises the underflow... weird enough ! -- Pierre From gammelmark at gmail.com Tue Jan 24 09:32:30 2012 From: gammelmark at gmail.com (=?ISO-8859-1?Q?S=F8ren_Gammelmark?=) Date: Tue, 24 Jan 2012 15:32:30 +0100 Subject: [Numpy-discussion] einsum evaluation order Message-ID: <CAJO1x6qDxz5qe-=h-tDNwyk3Ywz-2i-_w7z5UnVfKHFHLWrYCQ@mail.gmail.com> Dear all, I was just looking into numpy.einsum and encountered an issue which might be worth pointing out in the documentation. Let us say you wish to evaluate something like this (repeated indices a summed) D[alpha, alphaprime] = A[alpha, beta, sigma] * B[alphaprime, betaprime, sigma] * C[beta, betaprime] with einsum as einsum('abs,cds,bd->ac', A, B, C) then it is not exactly clear which order einsum evaluates the contractions (or if it does it in one go). This can be very important since you can do it in several ways, one of which has the least computational complexity. The most efficient way of doing it is to contract e.g. A and C and then contract that with B (or exchange A and B). A quick test on my labtop says 2.6 s with einsum and 0.13 s for two tensordots with A and B being D x D x 2 and C is D x D for D = 96. This scaling seems to explode for higher dimensions, whereas it is much better with the two independent contractions (i believe it should be O(D^3)).For D = 512 I could do it in 5 s with two contractions, whereas I stopped waiting after 60 s for einsum (i guess einsum probably is O(D^4) in this case). I had in fact thought of making a function similar to einsum for a while, but after I noticed it dropped it. I think, however, that there might still be room for a tool for evaluating more complicated expressions efficiently. 
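A minimal sketch of the two-step contraction described above (illustrative only; the array names and the D x D x 2 test sizes are assumed rather than taken from the thread):

    import numpy as np

    D = 96
    A = np.random.rand(D, D, 2)
    B = np.random.rand(D, D, 2)
    C = np.random.rand(D, D)

    # single einsum call, evaluated naively: roughly O(D**4) work
    naive = np.einsum('abs,cds,bd->ac', A, B, C)

    # pairwise contractions: roughly O(D**3) work
    T = np.tensordot(A, C, axes=([1], [0]))           # T[a,s,d] = sum_b A[a,b,s] * C[b,d]
    fast = np.tensordot(T, B, axes=([1, 2], [2, 1]))  # result[a,c] = sum_{s,d} T[a,s,d] * B[c,d,s]

    print(np.allclose(naive, fast))   # True

The intermediate T is built in O(D**3) operations and the second tensordot is O(D**3) as well, which matches the scaling difference reported above.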
I think the best way would be for the user to enter an expression like the one above which is then evaluated in the optimal order. I know how to do this (theoretically) if all the repeated indices only occur twice (like the expression above), but for the more general expression supported by einsum I om not sure how to do it (haven't thought about it). Here I am thinking about stuff like x[i] = a[i] * b[i] and their more general counterparts (at first glance this seems to be a simpler problem than full contractions). Do you think there is a need/interest for this kind of thing? In that case I would like the write it / help write it. Much of it, I think, can be reduced to decomposing the expression into existing numpy operations (e.g. tensordot). How to incorporate issues of storage layout etc, however, I have no idea. In any case I think it might be nice to write explicitly how the expression in einsum is evaluated in the docs. S?ren Gammelmark PhD-student Department of Physics and Astronomy Aarhus University -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/ea1cf5a0/attachment.html> From Kathleen.M.Tacina at nasa.gov Tue Jan 24 10:29:16 2012 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Tue, 24 Jan 2012 15:29:16 +0000 Subject: [Numpy-discussion] Unexpected behavior with np.min_scalar_type Message-ID: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> I was experimenting with np.min_scalar_type to make sure it worked as expected, and found some unexpected results for integers between 2**63 and 2**64-1. I would have expected np.min_scalar_type(2**64-1) to return uint64. Instead, I get object. Further experimenting showed that the largest integer for which np.min_scalar_type will return uint64 is 2**63-1. Is this expected behavior? On python 2.7.2 on a 64-bit linux machine: >>> import numpy as np >>> np.version.full_version '2.0.0.dev-55472ca' >>> np.min_scalar_type(2**8-1) dtype('uint8') >>> np.min_scalar_type(2**16-1) dtype('uint16') >>> np.min_scalar_type(2**32-1) dtype('uint32') >>> np.min_scalar_type(2**64-1) dtype('O') >>> np.min_scalar_type(2**63-1) dtype('uint64') >>> np.min_scalar_type(2**63) dtype('O') I get the same results on a Windows XP machine running python 2.7.2 and numpy 1.6.1. Kathy -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/68c4af44/attachment.html> From nadavh at visionsense.com Tue Jan 24 10:37:09 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 24 Jan 2012 07:37:09 -0800 Subject: [Numpy-discussion] Strange error raised by scipy.special.erf In-Reply-To: <4F1EB9C8.9070309@crans.org> References: <26FC23E7C398A64083C980D16001012D261F0D9376@VA3DIAXVS361.RED001.local>, <4F1EB9C8.9070309@crans.org> Message-ID: <26FC23E7C398A64083C980D16001012D261F0D937A@VA3DIAXVS361.RED001.local> I filed a ticket (#1590). Thank you for the verification. Nadav. 
________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Pierre Haessig [pierre.haessig at crans.org] Sent: 24 January 2012 16:01 To: numpy-discussion at scipy.org Subject: Re: [Numpy-discussion] Strange error raised by scipy.special.erf Le 22/01/2012 11:28, Nadav Horesh a ?crit : > >>> special.erf(26.5) > 1.0 > >>> special.erf(26.6) > Traceback (most recent call last): > File "<pyshell#7>", line 1, in <module> > special.erf(26.6) > FloatingPointError: underflow encountered in erf > >>> special.erf(26.7) > 1.0 > I can confirm this same behaviour with numpy 1.5.1/scipy 0.9.0 Indeed 26.5 and 26.7 works, while 26.6 raises the underflow... weird enough ! -- Pierre _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From wardefar at iro.umontreal.ca Tue Jan 24 11:19:21 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 11:19:21 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> Message-ID: <20120124161921.GA31456@ravage> On Tue, Jan 24, 2012 at 09:15:01AM +0000, Robert Kern wrote: > On Tue, Jan 24, 2012 at 08:37, Sturla Molden <sturla at molden.no> wrote: > > On 24.01.2012 09:21, Sturla Molden wrote: > > > >> randomkit.c handles C long correctly, I think. There are different codes > >> for 32 and 64 bit C long, and buffer sizes are size_t. > > > > distributions.c take C longs as parameters e.g. for the binomial > > distribution. mtrand.pyx correctly handles this, but it can give an > > unexpected overflow error on 64-bit Windows: > > > > > > In [1]: np.random.binomial(2**31, .5) > > --------------------------------------------------------------------------- > > OverflowError ? ? ? ? ? ? ? ? ? ? ? ? ? ? Traceback (most recent call last) > > C:\Windows\system32\<ipython-input-1-000aa0626c42> in <module>() > > ----> 1 np.random.binomial(2**31, .5) > > > > C:\Python27\lib\site-packages\numpy\random\mtrand.pyd in > > mtrand.RandomState.binomial (numpy\random\mtrand\mtrand.c:13770)() > > > > OverflowError: Python int too large to convert to C long > > > > > > On systems where C longs are 64 bit, this is likely not to produce an > > error. > > > > This begs the question if also randomkit.c and districutions.c should be > > changed to use npy_intp for consistency across all platforms. > > There are two different uses of long that you need to distinguish. One > is for sizes, and one is for parameters and values. The sizes should > definitely be upgraded to npy_intp. The latter shouldn't; these should > remain as the default integer type of Python and numpy, a C long. Hmm. Seeing as the width of a C long is inconsistent, does this imply that the random number generator will produce different results on different platforms? Or do the state dynamics prevent it from ever growing in magnitude to the point where that's an issue? 
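For reference, the platform difference behind this question is easy to inspect from Python (a small sketch, not from the thread; the commented values are what one would expect on a typical 64-bit Linux build versus 64-bit Windows):

    import ctypes
    import numpy as np

    # width of a C long: 8 bytes on typical 64-bit Linux/OS X, 4 on 64-bit Windows
    print(ctypes.sizeof(ctypes.c_long))

    # width of npy_intp: 8 bytes on any 64-bit build
    print(np.dtype(np.intp).itemsize)

    # a parameter that fits a 64-bit C long but not a 32-bit one, which is why
    # np.random.binomial(2**31, .5) only overflows where C long is 32 bits
    print(2**31 > np.iinfo(np.dtype('l')).max)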
David From wardefar at iro.umontreal.ca Tue Jan 24 12:24:24 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 12:24:24 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F1E3AD5.9000507@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> Message-ID: <20120124172424.GB31456@ravage> On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote: > Den 23.01.2012 22:08, skrev Christoph Gohlke: > > > > Maybe this explains the win-amd64 behavior: There are a couple of places > > in mtrand where array indices and sizes are C long instead of npy_intp, > > for example in the randint function: > > > > <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> > > > > > > Both i and length could overflow here. It should overflow on allocation > of more than 2 GB. > > There is also a lot of C longs in the internal state (line 55-105), as > well as the other functions. > > Producing 2 GB of random ints twice fails: Sturla, since you seem to have access to Win64 machines, do you suppose you could try this code: >>> a = numpy.ones((1, 972)) >>> b = numpy.zeros((4993210,), dtype=int) >>> c = a[b] and verify that there's a whole lot of 0s in the matrix, specifically, >>> c[574519:].sum() 356.0 >>> c[574520:].sum() 0.0 is the case on Linux 64-bit; is it the case on Windows 64? Thanks a lot, David From robince at gmail.com Tue Jan 24 12:37:12 2012 From: robince at gmail.com (Robin) Date: Tue, 24 Jan 2012 18:37:12 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120124172424.GB31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> Message-ID: <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> On Tue, Jan 24, 2012 at 6:24 PM, David Warde-Farley <wardefar at iro.umontreal.ca> wrote: > On Tue, Jan 24, 2012 at 06:00:05AM +0100, Sturla Molden wrote: >> Den 23.01.2012 22:08, skrev Christoph Gohlke: >> > >> > Maybe this explains the win-amd64 behavior: There are a couple of places >> > in mtrand where array indices and sizes are C long instead of npy_intp, >> > for example in the randint function: >> > >> > <https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L863> >> > >> > >> >> Both i and length could overflow here. It should overflow on allocation >> of more than 2 GB. >> >> There is also a lot of C longs in the internal state (line 55-105), as >> well as the other functions. >> >> Producing 2 GB of random ints twice fails: > > Sturla, since you seem to have access to Win64 machines, do you suppose you > could try this code: > >>>> a = numpy.ones((1, 972)) >>>> b = numpy.zeros((4993210,), dtype=int) >>>> c = a[b] > > and verify that there's a whole lot of 0s in the matrix, specifically, > >>>> c[574519:].sum() > 356.0 >>>> c[574520:].sum() > 0.0 > > is the case on Linux 64-bit; is it the case on Windows 64? Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. 
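A quick sanity check that is consistent with the 32-bit counter overflow identified later in the thread (plain arithmetic on the shapes quoted above):

    # elements in c = a[b]: b selects 4993210 rows of 972 values each
    n_elements = 4993210 * 972
    print(n_elements)              # 4853400120
    print(n_elements > 2**31 - 1)  # True, exceeds a signed 32-bit counter
    print(n_elements > 2**32 - 1)  # True, exceeds an unsigned one as well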
Cheers Robin From wardefar at iro.umontreal.ca Tue Jan 24 13:02:44 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 13:02:44 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> Message-ID: <20120124180244.GD31456@ravage> On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: > Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. Alright, so that rules out platform specific effects. I'll try and hunt the bug down when I have some time, if someone more familiar with the indexing code doesn't beat me to it. David From kmichael.aye at gmail.com Tue Jan 24 13:33:30 2012 From: kmichael.aye at gmail.com (K.-Michael Aye) Date: Tue, 24 Jan 2012 19:33:30 +0100 Subject: [Numpy-discussion] bug in numpy.mean() ? Message-ID: <jfmthq$jhh$1@dough.gmane.org> I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be: 2-D data file here: http://dl.dropbox.com/u/139035/data.npy Then: In [3]: data.mean() Out[3]: 3067.0243839999998 In [4]: data.max() Out[4]: 3052.4343 In [5]: data.shape Out[5]: (1000, 1000) In [6]: data.min() Out[6]: 3040.498 In [7]: data.dtype Out[7]: dtype('float32') A mean value calculated per loop over the data gives me 3045.747251076416 I first thought I still misunderstand how data.mean() works, per axis and so on, but did the same with a flattenend version with the same results. Am I really soo tired that I can't see what I am doing wrong here? For completion, the data was read by a osgeo.gdal dataset method called ReadAsArray() My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD. Best regards, Michael From bsouthey at gmail.com Tue Jan 24 13:50:31 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 24 Jan 2012 12:50:31 -0600 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfmthq$jhh$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> Message-ID: <4F1EFD77.2010004@gmail.com> On 01/24/2012 12:33 PM, K.-Michael Aye wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. 
> > Best regards, > Michael > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion You have a million 32-bit floating point numbers that are in the thousands. Thus you are exceeding the 32-bitfloat precision and, if you can, you need to increase precision of the accumulator in np.mean() or change the input dtype: >>> a.mean(dtype=np.float32) # default and lacks precision 3067.0243839999998 >>> a.mean(dtype=np.float64) 3045.747251076416 >>> a.mean(dtype=np.float128) 3045.7472510764160156 >>> b=a.astype(np.float128) >>> b.mean() 3045.7472510764160156 Otherwise you are left to using some alternative approach to calculate the mean. Bruce From Kathleen.M.Tacina at nasa.gov Tue Jan 24 13:53:27 2012 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Tue, 24 Jan 2012 18:53:27 +0000 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfmthq$jhh$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> Message-ID: <1327431207.6882.111.camel@MOSES.grc.nasa.gov> I have confirmed this on a 64-bit linux machine running python 2.7.2 with the development version of numpy. It seems to be related to using float32 instead of float64. If the array is first converted to a 64-bit float (via astype), mean gives an answer that agrees with your looped-calculation value: 3045.7472500000002. With the original 32-bit array, averaging successively on one axis and then on the other gives answers that agree with the 64-bit float answer to the second decimal place. In [125]: d = np.load('data.npy') In [126]: d.mean() Out[126]: 3067.0243839999998 In [127]: d64 = d.astype('float64') In [128]: d64.mean() Out[128]: 3045.747251076416 In [129]: d.mean(axis=0).mean() Out[129]: 3045.7487500000002 In [130]: d.mean(axis=1).mean() Out[130]: 3045.7444999999998 In [131]: np.version.full_version Out[131]: '2.0.0.dev-55472ca' -- On Tue, 2012-01-24 at 12:33 -0600, K.-MichaelA wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. > > Best regards, > Michael > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- -------------------------------------------------- Kathleen M. Tacina NASA Glenn Research Center MS 5-10 21000 Brookpark Road Cleveland, OH 44135 Telephone: (216) 433-6660 Fax: (216) 433-5802 -------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/9c41e8fd/attachment.html> From zachary.pincus at yale.edu Tue Jan 24 13:58:57 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 24 Jan 2012 13:58:57 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfmthq$jhh$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> Message-ID: <919864B3-F3D0-4C18-8669-BB4DCB16172F@yale.edu> On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. I get the same result: In [1]: import numpy In [2]: data = numpy.load('data.npy') In [3]: data.mean() Out[3]: 3067.0243839999998 In [4]: data.max() Out[4]: 3052.4343 In [5]: data.min() Out[5]: 3040.498 In [6]: numpy.version.version Out[6]: '2.0.0.dev-433b02a' This on OS X 10.7.2 with Python 2.7.1, on an intel Core i7. Running python as a 32 vs. 64-bit process doesn't make a difference. The data matrix doesn't look too strange when I view it as an image -- all pretty smooth variation around the (min, max) range. But maybe it's still somehow floating-point pathological? This is fun too: In [12]: data.mean() Out[12]: 3067.0243839999998 In [13]: (data/3000).mean()*3000 Out[13]: 3020.8074375000001 In [15]: (data/2).mean()*2 Out[15]: 3067.0243839999998 In [16]: (data/200).mean()*200 Out[16]: 3013.6754000000001 Zach From kalatsky at gmail.com Tue Jan 24 14:01:40 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Tue, 24 Jan 2012 13:01:40 -0600 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfmthq$jhh$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> Message-ID: <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> Just what Bruce said. 
You can run the following to confirm: np.mean(data - data.mean()) If for some reason you do not want to convert to float64 you can add the result of the previous line to the "bad" mean: bad_mean = data.mean() good_mean = bad_mean + np.mean(data - bad_mean) Val On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye <kmichael.aye at gmail.com>wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. > > Best regards, > Michael > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/32c5136b/attachment.html> From zachary.pincus at yale.edu Tue Jan 24 14:05:50 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 24 Jan 2012 14:05:50 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <4F1EFD77.2010004@gmail.com> References: <jfmthq$jhh$1@dough.gmane.org> <4F1EFD77.2010004@gmail.com> Message-ID: <57F1B495-BF3C-4C9B-9767-04A4F2AFF1DD@yale.edu> > You have a million 32-bit floating point numbers that are in the > thousands. Thus you are exceeding the 32-bitfloat precision and, if you > can, you need to increase precision of the accumulator in np.mean() or > change the input dtype: >>>> a.mean(dtype=np.float32) # default and lacks precision > 3067.0243839999998 >>>> a.mean(dtype=np.float64) > 3045.747251076416 >>>> a.mean(dtype=np.float128) > 3045.7472510764160156 >>>> b=a.astype(np.float128) >>>> b.mean() > 3045.7472510764160156 > > Otherwise you are left to using some alternative approach to calculate > the mean. > > Bruce Interesting -- I knew that float64 accumulators were used with integer arrays, and I had just assumed that 64-bit or higher accumulators would be used with floating-point arrays too, instead of the array's dtype. This is actually quite a bit of a gotcha for floating-point imaging-type tasks -- good to know! Zach From kmichael.aye at gmail.com Tue Jan 24 14:48:31 2012 From: kmichael.aye at gmail.com (K.-Michael Aye) Date: Tue, 24 Jan 2012 20:48:31 +0100 Subject: [Numpy-discussion] bug in numpy.mean() ? References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> Message-ID: <jfn1ug$kve$1@dough.gmane.org> Thank you Bruce and all, I knew I was doing something wrong (should have read the mean method doc more closely). Am of course glad that's so easy understandable. 
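The axis-wise means quoted earlier in the thread stay close to the true value because each partial float32 sum remains small; the same idea can be written out explicitly (a sketch, not from the thread; the function name and block size are made up):

    import numpy as np

    def blocked_mean(a, block=1024):
        # mean of a float32 array built from many short float32 block sums;
        # each block sum stays small enough to keep nearly full precision,
        # and the block sums are combined in Python floats (doubles)
        flat = a.ravel()
        total = sum(float(flat[i:i + block].sum())
                    for i in range(0, flat.size, block))
        return total / flat.size

For data like the file above this agrees with a.mean(dtype=np.float64) to well below the differences being discussed, while only ever forming small float32 sums.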
But: If the error can get so big, wouldn't it be a better idea for the accumulator to always be of type 'float64' and then convert later to the type of the original array? As one can see in this case, the result would be much closer to the true value. Michael On 2012-01-24 19:01:40 +0000, Val Kalatsky said: > > Just what Bruce said.? > > You can run the following to confirm: > np.mean(data - data.mean()) > > If for some reason you do not want to convert to float64 you can add > the result of the previous line to the "bad" mean: > bad_mean =?data.mean() > good_mean =?bad_mean +?np.mean(data - bad_mean) > > Val > > On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye > <kmichael.aye at gmail.com> wrote: > I know I know, that's pretty outrageous to even suggest, but please > bear with me, I am stumped as you may be: > > 2-D data file here: > http://dl.dropbox.com/u/139035/data.npy > > Then: > In [3]: data.mean() > Out[3]: 3067.0243839999998 > > In [4]: data.max() > Out[4]: 3052.4343 > > In [5]: data.shape > Out[5]: (1000, 1000) > > In [6]: data.min() > Out[6]: 3040.498 > > In [7]: data.dtype > Out[7]: dtype('float32') > > > A mean value calculated per loop over the data gives me 3045.747251076416 > I first thought I still misunderstand how data.mean() works, per axis > and so on, but did the same with a flattenend version with the same > results. > > Am I really soo tired that I can't see what I am doing wrong here? > For completion, the data was read by a osgeo.gdal dataset method called > ReadAsArray() > My numpy.__version__ gives me 1.6.1 and my whole setup is based on > Enthought's EPD. > > Best regards, > Michael > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/98be8f9e/attachment.html> From mmueller at python-academy.de Tue Jan 24 15:49:43 2012 From: mmueller at python-academy.de (=?ISO-8859-15?Q?Mike_M=FCller?=) Date: Tue, 24 Jan 2012 21:49:43 +0100 Subject: [Numpy-discussion] Course "Python for Scientists and Engineers" in Chicago Message-ID: <4F1F1967.3030209@python-academy.de> Course "Python for Scientists and Engineers" in Chicago ======================================================= There will be a comprehensive Python course for scientists and engineers in Chicago end of February / beginning of March 2012. It consists of a 3-day intro and a 2-day advanced section. Both sections can be taken separately or combined. More details below and here: http://www.dabeaz.com/chicago/science.html Please let friends or colleagues who might be interested in such a course know about it. 3-Day Intro Section ------------------- - Overview of Scientific and Technical Libraries for Python. - Numerical Calculations with NumPy - Storage and Processing of Large Amounts of Data - Graphical Presentation of Scientific Data with matplotlib - Object Oriented Programming for Scientific and Technical Projects - Open Time for Problem Solving 2-Day Advanced Section ---------------------- - Extending Python with Other Languages - Unit Testing - Version Control with Mercurial The Details ----------- The course is hosted by David Beazley (http://www.dabeaz.com). 
Date: Feb 27 - Mar 2, 2012 Location: Chicago, IL, USA Trainer: Mike M?ller Course Language: English Link: http://www.dabeaz.com/chicago/science.html From wardefar at iro.umontreal.ca Tue Jan 24 17:30:32 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 17:30:32 -0500 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120124180244.GD31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> <20120124180244.GD31456@ravage> Message-ID: <20120124223032.GG31456@ravage> On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote: > On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: > > > Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. > > Alright, so that rules out platform specific effects. > > I'll try and hunt the bug down when I have some time, if someone more > familiar with the indexing code doesn't beat me to it. I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap is using an int for a counter variable where it should be using an npy_intp. I've filed a pull request at https://github.com/numpy/numpy/pull/188 with a regression test. David From scipy at samueljohn.de Tue Jan 24 17:32:50 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 24 Jan 2012 23:32:50 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> Message-ID: <A75E31A0-FD6A-4DD9-89B4-6CEA57FF0468@samueljohn.de> On 23.01.2012, at 11:23, David Warde-Farley wrote: >> a = numpy.array(numpy.random.randint(256,size=(5000000,972)),dtype='uint8') >> b = numpy.random.randint(5000000,size=(4993210,)) >> c = a[b] >> In [14]: c[1000000:].sum() >> Out[14]: 0 Same here. Python 2.7.2, 64bit, Mac OS X (Lion), 8GB RAM, numpy.__version__ = 2.0.0.dev-55472ca [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)] Numpy built without llvm. From scipy at samueljohn.de Tue Jan 24 17:36:48 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 24 Jan 2012 23:36:48 +0100 Subject: [Numpy-discussion] Unexpected behavior with np.min_scalar_type In-Reply-To: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> References: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> Message-ID: <AD0C30C5-03F6-48D8-9E67-3EA70DB24628@samueljohn.de> I get the same results as you, Kathy. *surprised* (On OS X (Lion), 64 bit, numpy 2.0.0.dev-55472ca, Python 2.7.2. On 24.01.2012, at 16:29, Kathleen M Tacina wrote: > I was experimenting with np.min_scalar_type to make sure it worked as expected, and found some unexpected results for integers between 2**63 and 2**64-1. I would have expected np.min_scalar_type(2**64-1) to return uint64. Instead, I get object. Further experimenting showed that the largest integer for which np.min_scalar_type will return uint64 is 2**63-1. Is this expected behavior? 
> > On python 2.7.2 on a 64-bit linux machine: > >>> import numpy as np > >>> np.version.full_version > '2.0.0.dev-55472ca' > >>> np.min_scalar_type(2**8-1) > dtype('uint8') > >>> np.min_scalar_type(2**16-1) > dtype('uint16') > >>> np.min_scalar_type(2**32-1) > dtype('uint32') > >>> np.min_scalar_type(2**64-1) > dtype('O') > >>> np.min_scalar_type(2**63-1) > dtype('uint64') > >>> np.min_scalar_type(2**63) > dtype('O') > > I get the same results on a Windows XP machine running python 2.7.2 and numpy 1.6.1. > > Kathy > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scipy at samueljohn.de Tue Jan 24 17:44:10 2012 From: scipy at samueljohn.de (Samuel John) Date: Tue, 24 Jan 2012 23:44:10 +0100 Subject: [Numpy-discussion] 'Advanced' save and restore operation In-Reply-To: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> References: <CACB6ZmDZP=o22AJ1XdmFTSV2+5hbRdj1az-2c5G=mj0-_Dktbw@mail.gmail.com> Message-ID: <24F05A4F-D070-47D5-8F00-B5F9314B239A@samueljohn.de> I know you wrote that you want "TEXT" files, but never-the-less, I'd like to point to http://code.google.com/p/h5py/ . There are viewers for hdf5 and it is stable and widely used. Samuel On 24.01.2012, at 00:26, Emmanuel Mayssat wrote: > After having saved data, I need to know/remember the data dtype to > restore it correctly. > Is there a way to save the dtype with the data? > (I guess the header parameter of savedata could help, but they are > only available in v2.0+ ) > > I would like to save several related structured array and a dictionary > of parameters into a TEXT file. > Is there an easy way to do that? > (maybe xml file, or maybe archive zip file of other files, or ..... ) > > Any recommendation is helpful. > > Regards, > -- > Emmanuel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scipy at samueljohn.de Tue Jan 24 18:06:08 2012 From: scipy at samueljohn.de (Samuel John) Date: Wed, 25 Jan 2012 00:06:08 +0100 Subject: [Numpy-discussion] installing matplotlib in MacOs 10.6.8. In-Reply-To: <CAMW75YuPwz1BP+qPt++Ovu9R9y0WQvo1E25Qig0BV_eDcsGFeg@mail.gmail.com> References: <CAMW75YuPwz1BP+qPt++Ovu9R9y0WQvo1E25Qig0BV_eDcsGFeg@mail.gmail.com> Message-ID: <5C5F4BC6-E4F7-433F-8A45-BE6D44080DB9@samueljohn.de> Sorry for the late answer. But at least for the record: If you are using eclipse, I assume you have also installed the eclipse plugin [pydev](http://pydev.org/). Is use it myself, it's good. Then you have to go to the preferences->pydev->PythonInterpreter and select the python version you want to use by searching for the "Python" executable. I am not familiar with the pre-built versions of matplotlib. Perhaps they miss the 64bit intel versions? Perhaps you can find a lib (.so file) in matplotlib and use the "file" command to see the architectures, it was built for. You should be able to install matplotlib also with `pip install matplotlib`. (if you have pip) Samuel On 26.12.2011, at 06:40, Alex Ter-Sarkissov wrote: > hi everyone, I run python 2.7.2. in Eclipse (recently upgraded from 2.6). I have a problem with installing matplotlib (I found the version for python 2.7. MacOs 10.3, no later versions). If I run python in terminal using arch -i386 python, and then > > from matplotlib.pylab import * > > and similar stuff, everything works fine. 
If I run python in eclipse or just without arch -i386, I can import matplotlib as > > from matplotlib import * > > but actually nothing gets imported. If I do it in the same way as above, I get the message > > no matching architecture in universal wrapper > > which means there's conflict of versions or something like that. I tried reinstalling the interpreter and adding matplotlib to forced built-ins, but nothing helped. For some reason I didn't have this problem with numpy and tkinter. > > Any suggestions are appreciated. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From e.antero.tammi at gmail.com Tue Jan 24 18:12:06 2012 From: e.antero.tammi at gmail.com (eat) Date: Wed, 25 Jan 2012 01:12:06 +0200 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <jfn1ug$kve$1@dough.gmane.org> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> Message-ID: <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> Hi, Oddly, but numpy 1.6 seems to behave more consistent manner: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: d= np.load('data.npy') In []: d.dtype Out[]: dtype('float32') In []: d.mean() Out[]: 3045.7471999999998 In []: d.mean(dtype= np.float32) Out[]: 3045.7471999999998 In []: d.mean(dtype= np.float64) Out[]: 3045.747251076416 In []: (d- d.min()).mean()+ d.min() Out[]: 3045.7472508750002 In []: d.mean(axis= 0).mean() Out[]: 3045.7472499999999 In []: d.mean(axis= 1).mean() Out[]: 3045.7472499999999 Or does the results of calculations depend more on the platform? My 2 cents, eat -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/912a52b8/attachment.html> From Kathleen.M.Tacina at nasa.gov Tue Jan 24 18:21:35 2012 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Tue, 24 Jan 2012 23:21:35 +0000 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail .com> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> Message-ID: <1327447295.6882.139.camel@MOSES.grc.nasa.gov> I found something similar, with a very simple example. 
On 64-bit linux, python 2.7.2, numpy development version: In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) In [23]: a.mean() Out[23]: 4034.16357421875 In [24]: np.version.full_version Out[24]: '2.0.0.dev-55472ca' But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: >>>a = np.ones((1024,1024),dtype=np.float32) >>>a.mean() 4000.0 >>>np.version.full_version '1.6.1' On Tue, 2012-01-24 at 17:12 -0600, eat wrote: > Hi, > > > > Oddly, but numpy 1.6 seems to behave more consistent manner: > > > In []: sys.version > Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit > (Intel)]' > In []: np.version.version > Out[]: '1.6.0' > > > In []: d= np.load('data.npy') > In []: d.dtype > Out[]: dtype('float32') > > > In []: d.mean() > Out[]: 3045.7471999999998 > In []: d.mean(dtype= np.float32) > Out[]: 3045.7471999999998 > In []: d.mean(dtype= np.float64) > Out[]: 3045.747251076416 > In []: (d- d.min()).mean()+ d.min() > Out[]: 3045.7472508750002 > In []: d.mean(axis= 0).mean() > Out[]: 3045.7472499999999 > In []: d.mean(axis= 1).mean() > Out[]: 3045.7472499999999 > > > Or does the results of calculations depend more on the platform? > > > > > My 2 cents, > eat -- -------------------------------------------------- Kathleen M. Tacina NASA Glenn Research Center MS 5-10 21000 Brookpark Road Cleveland, OH 44135 Telephone: (216) 433-6660 Fax: (216) 433-5802 -------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/5cc422b3/attachment.html> From wardefar at iro.umontreal.ca Tue Jan 24 18:22:26 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 24 Jan 2012 18:22:26 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> Message-ID: <20120124232226.GH31456@ravage> On Wed, Jan 25, 2012 at 01:12:06AM +0200, eat wrote: > Or does the results of calculations depend more on the platform? Floating point operations often do, sadly (not saying that this is the case here, but you'd need to try both versions on the same machine [or at least architecture/bit-width]/same platform to be certain). David From mwwiebe at gmail.com Tue Jan 24 18:49:05 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 24 Jan 2012 15:49:05 -0800 Subject: [Numpy-discussion] einsum evaluation order In-Reply-To: <CAJO1x6qDxz5qe-=h-tDNwyk3Ywz-2i-_w7z5UnVfKHFHLWrYCQ@mail.gmail.com> References: <CAJO1x6qDxz5qe-=h-tDNwyk3Ywz-2i-_w7z5UnVfKHFHLWrYCQ@mail.gmail.com> Message-ID: <CAMRnEmqMKkCBg5vXVeEApOSGRNz8Dv9dW49YEeftYd_8GF-UCA@mail.gmail.com> On Tue, Jan 24, 2012 at 6:32 AM, S?ren Gammelmark <gammelmark at gmail.com>wrote: > Dear all, > > I was just looking into numpy.einsum and encountered an issue which might > be worth pointing out in the documentation. > > Let us say you wish to evaluate something like this (repeated indices a > summed) > > D[alpha, alphaprime] = A[alpha, beta, sigma] * B[alphaprime, betaprime, > sigma] * C[beta, betaprime] > > with einsum as > > einsum('abs,cds,bd->ac', A, B, C) > > then it is not exactly clear which order einsum evaluates the contractions > (or if it does it in one go). 
This can be very important since you can do > it in several ways, one of which has the least computational complexity. > The most efficient way of doing it is to contract e.g. A and C and then > contract that with B (or exchange A and B). A quick test on my labtop says > 2.6 s with einsum and 0.13 s for two tensordots with A and B being D x D x > 2 and C is D x D for D = 96. This scaling seems to explode for higher > dimensions, whereas it is much better with the two independent contractions > (i believe it should be O(D^3)).For D = 512 I could do it in 5 s with two > contractions, whereas I stopped waiting after 60 s for einsum (i guess > einsum probably is O(D^4) in this case). > You are correct, einsum presently uses the most naive evaluation. > I had in fact thought of making a function similar to einsum for a while, > but after I noticed it dropped it. I think, however, that there might still > be room for a tool for evaluating more complicated expressions efficiently. > I think the best way would be for the user to enter an expression like the > one above which is then evaluated in the optimal order. I know how to do > this (theoretically) if all the repeated indices only occur twice (like the > expression above), but for the more general expression supported by einsum > I om not sure how to do it (haven't thought about it). Here I am thinking > about stuff like x[i] = a[i] * b[i] and their more general counterparts (at > first glance this seems to be a simpler problem than full contractions). Do > you think there is a need/interest for this kind of thing? In that case I > would like the write it / help write it. Much of it, I think, can be > reduced to decomposing the expression into existing numpy operations (e.g. > tensordot). How to incorporate issues of storage layout etc, however, I > have no idea. > I think a good approach would be to modify einsum so it decomposes the expression into multiple products. It may even just be a simple dynamic programming problem, but I haven't given it much thought. In any case I think it might be nice to write explicitly how the expression > in einsum is evaluated in the docs. > That's a good idea, yes. Thanks, Mark > > S?ren Gammelmark > PhD-student > Department of Physics and Astronomy > Aarhus University > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/6b83d15a/attachment.html> From e.antero.tammi at gmail.com Tue Jan 24 19:21:10 2012 From: e.antero.tammi at gmail.com (eat) Date: Wed, 25 Jan 2012 02:21:10 +0200 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <1327447295.6882.139.camel@MOSES.grc.nasa.gov> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> <1327447295.6882.139.camel@MOSES.grc.nasa.gov> Message-ID: <CAKa=AYQT6o5No5BC-jqaQcvtWXrpJBAp2fzEDTLF5VWYpVMsvQ@mail.gmail.com> Hi On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina < Kathleen.M.Tacina at nasa.gov> wrote: > ** > I found something similar, with a very simple example. 
> > On 64-bit linux, python 2.7.2, numpy development version: > > In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) > > In [23]: a.mean() > Out[23]: 4034.16357421875 > > In [24]: np.version.full_version > Out[24]: '2.0.0.dev-55472ca' > > > But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: > >>>a = np.ones((1024,1024),dtype=np.float32) > >>>a.mean() > 4000.0 > >>>np.version.full_version > '1.6.1' > This indeed looks very nasty, regardless of whether it is a version or platform related problem. -eat > > > > On Tue, 2012-01-24 at 17:12 -0600, eat wrote: > > Hi, > > > > Oddly, but numpy 1.6 seems to behave more consistent manner: > > > > In []: sys.version > > Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit > (Intel)]' > > In []: np.version.version > > Out[]: '1.6.0' > > > > In []: d= np.load('data.npy') > > In []: d.dtype > > Out[]: dtype('float32') > > > > In []: d.mean() > > Out[]: 3045.7471999999998 > > In []: d.mean(dtype= np.float32) > > Out[]: 3045.7471999999998 > > In []: d.mean(dtype= np.float64) > > Out[]: 3045.747251076416 > > In []: (d- d.min()).mean()+ d.min() > > Out[]: 3045.7472508750002 > > In []: d.mean(axis= 0).mean() > > Out[]: 3045.7472499999999 > > In []: d.mean(axis= 1).mean() > > Out[]: 3045.7472499999999 > > > > Or does the results of calculations depend more on the platform? > > > > > > My 2 cents, > > eat > > -- > -------------------------------------------------- > Kathleen M. Tacina > NASA Glenn Research Center > MS 5-10 > 21000 Brookpark Road > Cleveland, OH 44135 > Telephone: (216) 433-6660 > Fax: (216) 433-5802 > -------------------------------------------------- > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/4104b4db/attachment.html> From questions.anon at gmail.com Tue Jan 24 19:22:52 2012 From: questions.anon at gmail.com (questions anon) Date: Wed, 25 Jan 2012 11:22:52 +1100 Subject: [Numpy-discussion] numpy.percentile multiple arrays Message-ID: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> I need some help understanding how to loop through many arrays to calculate the 95th percentile. I can easily do this by using numpy.concatenate to make one big array and then finding the 95th percentile using numpy.percentile but this causes a memory error when I want to run this on 100's of netcdf files (see code below). Any alternative methods will be greatly appreciated. all_TSFC=[] for (path, dirs, files) in os.walk(MainFolder): for dir in dirs: print dir path=path+'/' for ncfile in files: if ncfile[-3:]=='.nc': print "dealing with ncfiles:", ncfile ncfile=os.path.join(path,ncfile) ncfile=Dataset(ncfile, 'r+', 'NETCDF4') TSFC=ncfile.variables['T_SFC'][:] ncfile.close() all_TSFC.append(TSFC) big_array=N.ma.concatenate(all_TSFC) Percentile95th=N.percentile(big_array, 95, axis=0) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/f0d2e515/attachment.html> From mwwiebe at gmail.com Tue Jan 24 19:33:44 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 24 Jan 2012 16:33:44 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Message-ID: <CAMRnEmoWRVCYpMqR9JeFYWS0uM9t4HwJbfHS5ZH3PGMMUu5RBw@mail.gmail.com> 2012/1/21 Ond?ej ?ert?k <ondrej.certik at gmail.com> > <snip> > > Let me know if you figure out something. I think the "mask" thing is > quite slow, but the problem is that it needs to be there, to catch > overflows (and it is there in Fortran as well, see the > "where" statement, which does the same thing). Maybe there is some > other way to write the same thing in NumPy? > In the current master, you can replace z[mask] *= z[mask] z[mask] += c[mask] with np.multiply(z, z, out=z, where=mask) np.add(z, c, out=z, where=mask) The performance of this alternate syntax is still not great, but it is significantly faster than what it replaces. For a particular choice of mask, I get In [40]: timeit z[mask] *= z[mask] 10 loops, best of 3: 29.1 ms per loop In [41]: timeit np.multiply(z, z, out=z, where=mask) 100 loops, best of 3: 4.2 ms per loop -Mark > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/0a9719a7/attachment.html> From marc.shivers at gmail.com Tue Jan 24 19:55:48 2012 From: marc.shivers at gmail.com (Marc Shivers) Date: Tue, 24 Jan 2012 19:55:48 -0500 Subject: [Numpy-discussion] numpy.percentile multiple arrays In-Reply-To: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> References: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> Message-ID: <CAGFio5ZC3WYK1c4O=OZ30PenfoPNQXxE-4veSQLsR0Y5kkgXeg@mail.gmail.com> This is probably not the best way to do it, but I think it would work: Your could take two passes through your data, first calculating and storing the median for each file and the number of elements in each file. From those data, you can get a lower bound on the 95th percentile of the combined dataset. For example, if all the files are the same size, and you've got 100 of them, then the 95th percentile of the full dataset would be at least as large as the 90th percentile of the individual file median values. Once you've got that cut-off value, go back through your files and just pull out the values larger than your cut-off value. Then you'd just need to figure out what percentile in this subset would correspond to the 95th percentile in the full dataset. HTH, Marc On Tue, Jan 24, 2012 at 7:22 PM, questions anon <questions.anon at gmail.com>wrote: > I need some help understanding how to loop through many arrays to > calculate the 95th percentile. > I can easily do this by using numpy.concatenate to make one big array and > then finding the 95th percentile using numpy.percentile but this causes a > memory error when I want to run this on 100's of netcdf files (see code > below). 
> Any alternative methods will be greatly appreciated. > > > all_TSFC=[] > for (path, dirs, files) in os.walk(MainFolder): > for dir in dirs: > print dir > path=path+'/' > for ncfile in files: > if ncfile[-3:]=='.nc': > print "dealing with ncfiles:", ncfile > ncfile=os.path.join(path,ncfile) > ncfile=Dataset(ncfile, 'r+', 'NETCDF4') > TSFC=ncfile.variables['T_SFC'][:] > ncfile.close() > all_TSFC.append(TSFC) > > big_array=N.ma.concatenate(all_TSFC) > Percentile95th=N.percentile(big_array, 95, axis=0) > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/bbb161f8/attachment.html> From mwwiebe at gmail.com Tue Jan 24 19:56:34 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 24 Jan 2012 16:56:34 -0800 Subject: [Numpy-discussion] Unexpected behavior with np.min_scalar_type In-Reply-To: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> References: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> Message-ID: <CAMRnEmo=4OYisJKtqTnjnqDYV6KN6VDAEwFAFhD5QgfMwZWM2A@mail.gmail.com> On Tue, Jan 24, 2012 at 7:29 AM, Kathleen M Tacina < Kathleen.M.Tacina at nasa.gov> wrote: > ** > I was experimenting with np.min_scalar_type to make sure it worked as > expected, and found some unexpected results for integers between 2**63 and > 2**64-1. I would have expected np.min_scalar_type(2**64-1) to return > uint64. Instead, I get object. Further experimenting showed that the > largest integer for which np.min_scalar_type will return uint64 is > 2**63-1. Is this expected behavior? > This is a bug in how numpy detects the dtype of python objects. https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/common.c#L18 You can see there it's only checking for a signed long long, not accounting for the unsigned case. I created a ticket for you here: http://projects.scipy.org/numpy/ticket/2028 -Mark > > On python 2.7.2 on a 64-bit linux machine: > >>> import numpy as np > >>> np.version.full_version > '2.0.0.dev-55472ca' > >>> np.min_scalar_type(2**8-1) > dtype('uint8') > >>> np.min_scalar_type(2**16-1) > dtype('uint16') > >>> np.min_scalar_type(2**32-1) > dtype('uint32') > >>> np.min_scalar_type(2**64-1) > dtype('O') > >>> np.min_scalar_type(2**63-1) > dtype('uint64') > >>> np.min_scalar_type(2**63) > dtype('O') > > I get the same results on a Windows XP machine running python 2.7.2 and > numpy 1.6.1. > > Kathy > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
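A sketch of the two-pass idea Marc describes, written for a list of in-memory arrays standing in for the per-file data (the helper name is made up, per-file 90th percentiles are used instead of medians to get a tighter cut-off, and heavy ties at the cut-off value are assumed away; in the real case each array would be read from its netcdf file inside the two loops, one file at a time):

    import numpy as np

    def percentile95_two_pass(arrays):
        # pass 1: total count and a cut-off that cannot exceed the global
        # 95th percentile (the minimum of the per-array 90th percentiles)
        n_total = sum(a.size for a in arrays)
        cutoff = min(np.percentile(a, 90) for a in arrays)
        # pass 2: keep only the tail above the cut-off, which is far smaller
        tail = np.concatenate([a[a > cutoff] for a in arrays])
        # map the global 95th-percentile rank onto the retained tail
        k = n_total - tail.size   # values discarded below the cut-off
        q = 100.0 * (0.95 * (n_total - 1) - k) / (tail.size - 1)
        return np.percentile(tail, q)

Absent ties at the cut-off, this reproduces np.percentile(np.concatenate(arrays), 95) while only ever concatenating the top part of the data.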
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/c4ea701a/attachment.html> From mwwiebe at gmail.com Tue Jan 24 19:59:22 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 24 Jan 2012 16:59:22 -0800 Subject: [Numpy-discussion] Fix for ticket #1973 In-Reply-To: <CAB6mnxKgHGQQnmLXNxDj158cwURw=41WWcE2yV5MfwsK5vV_XA@mail.gmail.com> References: <CAB6mnxJuEsE0iR+-EH95zjDPTL9qL_YJCyKTjHHLaKSAAWyhuA@mail.gmail.com> <4F144455.9020904@gmail.com> <CAB6mnxKmzuexvvMph6u9KS_12d56JGY+74Wf6U+Xk8nhqFUjzw@mail.gmail.com> <CAB6mnxKgHGQQnmLXNxDj158cwURw=41WWcE2yV5MfwsK5vV_XA@mail.gmail.com> Message-ID: <CAMRnEmqif+dS-EEJbL+tEQNH1oa_=XzEE9QPaG5fRjsTEC2uQw@mail.gmail.com> On Mon, Jan 16, 2012 at 8:14 AM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Mon, Jan 16, 2012 at 8:52 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Mon, Jan 16, 2012 at 8:37 AM, Bruce Southey <bsouthey at gmail.com>wrote: >> >>> ** >>> On 01/14/2012 04:31 PM, Charles R Harris wrote: >>> >>> I've put up a pull request for a fix to ticket #1973. Currently the fix >>> simply propagates the maskna flag when the *.astype method is called. A >>> more complicated option would be to add a maskna keyword to specify whether >>> the output is masked or not or propagates the type of the source, but that >>> seems overly complex to me. >>> >>> Thoughts? >>> >>> Chuck >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> Thanks for the correction and as well as the fix. While it worked for >>> integer and floats (not complex ones), I got an error when using complex >>> dtypes. This error that is also present in array creation of complex >>> dtypes. Is this known or a new bug? >>> >>> If it is new, then we need to identify what functionality should handle >>> np.NA but are not working. >>> >>> Bruce >>> >>> $ python >>> Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) >>> [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. 
>>> >>> import numpy as np >>> >>> np.__version__ # pull request version >>> '2.0.0.dev-88f9276' >>> >>> np.array([1,2], dtype=np.complex) >>> array([ 1.+0.j, 2.+0.j]) >>> >>> np.array([1,2, np.NA], dtype=np.complex) >>> Traceback (most recent call last): >>> File "<stdin>", line 1, in <module> >>> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line >>> 1445, in array_repr >>> ', ', "array(") >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 459, in array2string >>> separator, prefix, formatter=formatter) >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 263, in _array2string >>> suppress_small), >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 724, in __init__ >>> self.real_format = FloatFormat(x.real, precision, suppress_small) >>> ValueError: Cannot construct a view of data together with the >>> NPY_ARRAY_MASKNA flag, the NA mask must be added later >>> >>> ca=np.array([1,2], dtype=np.complex, maskna=True) >>> >>> ca[1]=np.NA >>> >>> ca >>> Traceback (most recent call last): >>> File "<stdin>", line 1, in <module> >>> File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line >>> 1445, in array_repr >>> ', ', "array(") >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 459, in array2string >>> separator, prefix, formatter=formatter) >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 263, in _array2string >>> suppress_small), >>> File "/usr/lib64/python2.7/site-packages/numpy/core/arrayprint.py", >>> line 724, in __init__ >>> self.real_format = FloatFormat(x.real, precision, suppress_small) >>> ValueError: Cannot construct a view of data together with the >>> NPY_ARRAY_MASKNA flag, the NA mask must be added later >>> >>> >>> >>> >> Looks like a different bug involving the *.real and *.imag views. I'll >> take a look. >> >> > Looks like views of masked arrays have other problems: > > In [13]: a = ones(3, int16, maskna=1) > > In [14]: a.view(int8) > Out[14]: array([1, 0, 1, NA, 1, NA], dtype=int8) > > This looks like a serious bug to me, to avoid memory corruption issues it should raise an exception. -Mark > > I'm not sure what the policy should be here. One could construct a new > mask adapted to the view, raise an error when the types don't align (I > think the real/imag parts should be considered aligned), or just let the > view unmask the array. The last seems dangerous. Hmm... > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/dbb841ef/attachment.html> From brett.olsen at gmail.com Tue Jan 24 21:26:28 2012 From: brett.olsen at gmail.com (Brett Olsen) Date: Tue, 24 Jan 2012 20:26:28 -0600 Subject: [Numpy-discussion] numpy.percentile multiple arrays In-Reply-To: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> References: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> Message-ID: <CAFq1z2XTFf8uUuW2J_FnPNzm8P4t3yveM61VAvgsuRGBDozWCQ@mail.gmail.com> On Tue, Jan 24, 2012 at 6:22 PM, questions anon <questions.anon at gmail.com> wrote: > I need some help understanding how to loop through many arrays to calculate > the 95th percentile. 
> I can easily do this by using numpy.concatenate to make one big array and > then finding the 95th percentile using numpy.percentile but this causes a > memory error when I want to run this on 100's of netcdf files (see code > below). > Any alternative methods will be greatly appreciated. > > > all_TSFC=[] > for (path, dirs, files) in os.walk(MainFolder): >     for dir in dirs: >         print dir >     path=path+'/' >     for ncfile in files: >         if ncfile[-3:]=='.nc': >             print "dealing with ncfiles:", ncfile >             ncfile=os.path.join(path,ncfile) >             ncfile=Dataset(ncfile, 'r+', 'NETCDF4') >             TSFC=ncfile.variables['T_SFC'][:] >             ncfile.close() >             all_TSFC.append(TSFC) > > big_array=N.ma.concatenate(all_TSFC) > Percentile95th=N.percentile(big_array, 95, axis=0) If the range of your data is known and limited (i.e., you have a comparatively small number of possible values, but a number of repeats of each value) then you could do this by keeping a running cumulative distribution function as you go through each of your files. For each file, calculate a cumulative distribution function --- at each possible value, record the fraction of that population strictly less than that value --- and then it's straightforward to combine the cumulative distribution functions from two separate files: cumdist_both = (cumdist1 * N1 + cumdist2 * N2) / (N1 + N2) Then once you've gone through all the files, look for the value where your cumulative distribution function is equal to 0.95. If your data isn't structured with repeated values, though, this won't work, because your cumulative distribution function will become too big to hold into memory. In that case, what I would probably do would be an iterative approach: make an approximation to the exact function by removing some fraction of the possible values, which will provide a limited range for the exact percentile you want, and then walk through the files again calculating the function more exactly within the limited range, repeating until you have the value to the desired precision. ~Brett From questions.anon at gmail.com Tue Jan 24 22:49:46 2012 From: questions.anon at gmail.com (questions anon) Date: Wed, 25 Jan 2012 14:49:46 +1100 Subject: [Numpy-discussion] numpy.percentile multiple arrays In-Reply-To: <CAFq1z2XTFf8uUuW2J_FnPNzm8P4t3yveM61VAvgsuRGBDozWCQ@mail.gmail.com> References: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> <CAFq1z2XTFf8uUuW2J_FnPNzm8P4t3yveM61VAvgsuRGBDozWCQ@mail.gmail.com> Message-ID: <CAN_=ogtfu=K2QxPSiooX9ekOPkX-2SweUTM_R72Aw6Q2kvk7-Q@mail.gmail.com> thanks for your responses, because of the size of the dataset I will still end up with the memory error if I calculate the median for each file, additionally the files are not all the same size. I believe this memory problem will still arise with the cumulative distribution calculation and not sure I understand how to write the second suggestion about the iterative approach but will have a go. Thanks again On Wed, Jan 25, 2012 at 1:26 PM, Brett Olsen <brett.olsen at gmail.com> wrote: > On Tue, Jan 24, 2012 at 6:22 PM, questions anon > <questions.anon at gmail.com> wrote: > > I need some help understanding how to loop through many arrays to > calculate > > the 95th percentile.
> > I can easily do this by using numpy.concatenate to make one big array and > > then finding the 95th percentile using numpy.percentile but this causes a > > memory error when I want to run this on 100's of netcdf files (see code > > below). > > Any alternative methods will be greatly appreciated. > > > > > > all_TSFC=[] > > for (path, dirs, files) in os.walk(MainFolder): > > for dir in dirs: > > print dir > > path=path+'/' > > for ncfile in files: > > if ncfile[-3:]=='.nc': > > print "dealing with ncfiles:", ncfile > > ncfile=os.path.join(path,ncfile) > > ncfile=Dataset(ncfile, 'r+', 'NETCDF4') > > TSFC=ncfile.variables['T_SFC'][:] > > ncfile.close() > > all_TSFC.append(TSFC) > > > > big_array=N.ma.concatenate(all_TSFC) > > Percentile95th=N.percentile(big_array, 95, axis=0) > > If the range of your data is known and limited (i.e., you have a > comparatively small number of possible values, but a number of repeats > of each value) then you could do this by keeping a running cumulative > distribution function as you go through each of your files. For each > file, calculate a cumulative distribution function --- at each > possible value, record the fraction of that population strictly less > than that value --- and then it's straightforward to combine the > cumulative distribution functions from two separate files: > cumdist_both = (cumdist1 * N1 + cumdist2 * N2) / (N1 + N2) > > Then once you've gone through all the files, look for the value where > your cumulative distribution function is equal to 0.95. If your data > isn't structured with repeated values, though, this won't work, > because your cumulative distribution function will become too big to > hold into memory. In that case, what I would probably do would be an > iterative approach: make an approximation to the exact function by > removing some fraction of the possible values, which will provide a > limited range for the exact percentile you want, and then walk through > the files again calculating the function more exactly within the > limited range, repeating until you have the value to the desired > precision. > > ~Brett > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/761fa871/attachment.html> From shish at keba.be Tue Jan 24 23:00:10 2012 From: shish at keba.be (Olivier Delalleau) Date: Tue, 24 Jan 2012 23:00:10 -0500 Subject: [Numpy-discussion] numpy.percentile multiple arrays In-Reply-To: <CAN_=ogtfu=K2QxPSiooX9ekOPkX-2SweUTM_R72Aw6Q2kvk7-Q@mail.gmail.com> References: <CAN_=ogveEp2TVbH7Ci4rpC2us07bSWGjRb1drOZ6tmis+58vbQ@mail.gmail.com> <CAFq1z2XTFf8uUuW2J_FnPNzm8P4t3yveM61VAvgsuRGBDozWCQ@mail.gmail.com> <CAN_=ogtfu=K2QxPSiooX9ekOPkX-2SweUTM_R72Aw6Q2kvk7-Q@mail.gmail.com> Message-ID: <CAFXk4bpDUUwHk+d95wDBDQpiKscaTGKrezPqW=aGG7dOKS_3kQ@mail.gmail.com> Note that if you are ok with an approximate solution, and you can assume your data is somewhat shuffled, a simple online algorithm that uses no memory consists in: - choosing a small step size delta - initializing your percentile p to a more or less random value (a meaningful guess is better though) - iterate through your samples, updating p after each sample by p += 19 * delta if sample > p, and p -= delta otherwise The idea is that the 95th percentile is such that 5% of the data is higher, and 95% (19 times more) is lower, so if p is equal to this value, on average it should remain constant through the online update. You may do multiple passes if you are not confident in your initial value, possibly reducing delta over time to improve accuracy. -=- Olivier 2012/1/24 questions anon <questions.anon at gmail.com> > thanks for your responses, > because of the size of the dataset I will still end up with the memory > error if I calculate the median for each file, additionally the files are > not all the same size. I believe this memory problem will still arise with > the cumulative distribution calculation and not sure I understand how to > write the second suggestion about the iterative approach but will have a go. > Thanks again > > > On Wed, Jan 25, 2012 at 1:26 PM, Brett Olsen <brett.olsen at gmail.com>wrote: > >> On Tue, Jan 24, 2012 at 6:22 PM, questions anon >> <questions.anon at gmail.com> wrote: >> > I need some help understanding how to loop through many arrays to >> calculate >> > the 95th percentile. >> > I can easily do this by using numpy.concatenate to make one big array >> and >> > then finding the 95th percentile using numpy.percentile but this causes >> a >> > memory error when I want to run this on 100's of netcdf files (see code >> > below). >> > Any alternative methods will be greatly appreciated. >> > >> > >> > all_TSFC=[] >> > for (path, dirs, files) in os.walk(MainFolder): >> > for dir in dirs: >> > print dir >> > path=path+'/' >> > for ncfile in files: >> > if ncfile[-3:]=='.nc': >> > print "dealing with ncfiles:", ncfile >> > ncfile=os.path.join(path,ncfile) >> > ncfile=Dataset(ncfile, 'r+', 'NETCDF4') >> > TSFC=ncfile.variables['T_SFC'][:] >> > ncfile.close() >> > all_TSFC.append(TSFC) >> > >> > big_array=N.ma.concatenate(all_TSFC) >> > Percentile95th=N.percentile(big_array, 95, axis=0) >> >> If the range of your data is known and limited (i.e., you have a >> comparatively small number of possible values, but a number of repeats >> of each value) then you could do this by keeping a running cumulative >> distribution function as you go through each of your files. 
For each >> file, calculate a cumulative distribution function --- at each >> possible value, record the fraction of that population strictly less >> than that value --- and then it's straightforward to combine the >> cumulative distribution functions from two separate files: >> cumdist_both = (cumdist1 * N1 + cumdist2 * N2) / (N1 + N2) >> >> Then once you've gone through all the files, look for the value where >> your cumulative distribution function is equal to 0.95. If your data >> isn't structured with repeated values, though, this won't work, >> because your cumulative distribution function will become too big to >> hold into memory. In that case, what I would probably do would be an >> iterative approach: make an approximation to the exact function by >> removing some fraction of the possible values, which will provide a >> limited range for the exact percentile you want, and then walk through >> the files again calculating the function more exactly within the >> limited range, repeating until you have the value to the desired >> precision. >> >> ~Brett >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/f1723fc7/attachment.html> From josef.pktd at gmail.com Tue Jan 24 23:40:02 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 24 Jan 2012 23:40:02 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <CAKa=AYQT6o5No5BC-jqaQcvtWXrpJBAp2fzEDTLF5VWYpVMsvQ@mail.gmail.com> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> <1327447295.6882.139.camel@MOSES.grc.nasa.gov> <CAKa=AYQT6o5No5BC-jqaQcvtWXrpJBAp2fzEDTLF5VWYpVMsvQ@mail.gmail.com> Message-ID: <CAMMTP+DB=y5TJrigfZEkv97m96EQbJr+s-V0k9NtqEJEQWoJjw@mail.gmail.com> On Tue, Jan 24, 2012 at 7:21 PM, eat <e.antero.tammi at gmail.com> wrote: > Hi > > On Wed, Jan 25, 2012 at 1:21 AM, Kathleen M Tacina < > Kathleen.M.Tacina at nasa.gov> wrote: > >> ** >> I found something similar, with a very simple example. >> >> On 64-bit linux, python 2.7.2, numpy development version: >> >> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) >> >> In [23]: a.mean() >> Out[23]: 4034.16357421875 >> >> In [24]: np.version.full_version >> Out[24]: '2.0.0.dev-55472ca' >> >> >> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: >> >>>a = np.ones((1024,1024),dtype=np.float32) >> >>>a.mean() >> 4000.0 >> >>>np.version.full_version >> '1.6.1' >> > This indeed looks very nasty, regardless of whether it is a version or > platform related problem. 
> Looks like platform specific, same result as -eat Windows 7, Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32 >>> a = np.ones((1024,1024),dtype=np.float32) >>> a.mean() 1.0 >>> (4000*a).dtype dtype('float32') >>> (4000*a).mean() 4000.0 >>> b = np.load("data.npy") >>> b.mean() 3045.7471999999998 >>> b.shape (1000, 1000) >>> b.mean(0).mean(0) 3045.7472499999999 >>> _.dtype dtype('float64') >>> b.dtype dtype('float32') >>> b.mean(dtype=np.float32) 3045.7471999999998 Josef > > -eat > >> >> >> >> On Tue, 2012-01-24 at 17:12 -0600, eat wrote: >> >> Hi, >> >> >> >> Oddly, but numpy 1.6 seems to behave more consistent manner: >> >> >> >> In []: sys.version >> >> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >> (Intel)]' >> >> In []: np.version.version >> >> Out[]: '1.6.0' >> >> >> >> In []: d= np.load('data.npy') >> >> In []: d.dtype >> >> Out[]: dtype('float32') >> >> >> >> In []: d.mean() >> >> Out[]: 3045.7471999999998 >> >> In []: d.mean(dtype= np.float32) >> >> Out[]: 3045.7471999999998 >> >> In []: d.mean(dtype= np.float64) >> >> Out[]: 3045.747251076416 >> >> In []: (d- d.min()).mean()+ d.min() >> >> Out[]: 3045.7472508750002 >> >> In []: d.mean(axis= 0).mean() >> >> Out[]: 3045.7472499999999 >> >> In []: d.mean(axis= 1).mean() >> >> Out[]: 3045.7472499999999 >> >> >> >> Or does the results of calculations depend more on the platform? >> >> >> >> >> >> My 2 cents, >> >> eat >> >> -- >> -------------------------------------------------- >> Kathleen M. Tacina >> NASA Glenn Research Center >> MS 5-10 >> 21000 Brookpark Road >> Cleveland, OH 44135 >> Telephone: (216) 433-6660 >> Fax: (216) 433-5802 >> -------------------------------------------------- >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/8df4f297/attachment.html> From charlesr.harris at gmail.com Wed Jan 25 00:03:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Jan 2012 22:03:49 -0700 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <1327447295.6882.139.camel@MOSES.grc.nasa.gov> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> <1327447295.6882.139.camel@MOSES.grc.nasa.gov> Message-ID: <CAB6mnx+PQJwotuqFZVb0YXcv28SBuwojPwy0nWX-ELbWeAJdfQ@mail.gmail.com> On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina < Kathleen.M.Tacina at nasa.gov> wrote: > ** > I found something similar, with a very simple example. > > On 64-bit linux, python 2.7.2, numpy development version: > > In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) > > In [23]: a.mean() > Out[23]: 4034.16357421875 > > In [24]: np.version.full_version > Out[24]: '2.0.0.dev-55472ca' > > > But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: > >>>a = np.ones((1024,1024),dtype=np.float32) > >>>a.mean() > 4000.0 > >>>np.version.full_version > '1.6.1' > > > Yes, the results are platform/compiler dependent. 
The 32 bit platforms tend to use extended precision accumulators and the x87 instruction set. The 64 bit platforms tend to use sse2+. Different precisions, even though you might think they are the same. <snip> Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120124/c3c92e65/attachment.html> From josef.pktd at gmail.com Wed Jan 25 00:16:55 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Jan 2012 00:16:55 -0500 Subject: [Numpy-discussion] bug in numpy.mean() ? In-Reply-To: <CAB6mnx+PQJwotuqFZVb0YXcv28SBuwojPwy0nWX-ELbWeAJdfQ@mail.gmail.com> References: <jfmthq$jhh$1@dough.gmane.org> <CAE8bXE=SoZh6tw_=jZxfejbyGLpvTx9g1cnKPhdqhAzz135M2Q@mail.gmail.com> <jfn1ug$kve$1@dough.gmane.org> <CAKa=AYQ-=+KSHT-dfD_7_BzF1jee9h2mE=OpagcWdrKg-DfmxQ@mail.gmail.com> <1327447295.6882.139.camel@MOSES.grc.nasa.gov> <CAB6mnx+PQJwotuqFZVb0YXcv28SBuwojPwy0nWX-ELbWeAJdfQ@mail.gmail.com> Message-ID: <CAMMTP+CRw7cqTzLxhvFTQNk_fST2Sj6mCTq_TbdNvswnQ8Gj9Q@mail.gmail.com> On Wed, Jan 25, 2012 at 12:03 AM, Charles R Harris <charlesr.harris at gmail.com> wrote: > > > On Tue, Jan 24, 2012 at 4:21 PM, Kathleen M Tacina > <Kathleen.M.Tacina at nasa.gov> wrote: >> >> I found something similar, with a very simple example. >> >> On 64-bit linux, python 2.7.2, numpy development version: >> >> In [22]: a = 4000*np.ones((1024,1024),dtype=np.float32) >> >> In [23]: a.mean() >> Out[23]: 4034.16357421875 >> >> In [24]: np.version.full_version >> Out[24]: '2.0.0.dev-55472ca' >> >> >> But, a Windows XP machine running python 2.7.2 with numpy 1.6.1 gives: >> >>>a = np.ones((1024,1024),dtype=np.float32) >> >>>a.mean() >> 4000.0 >> >>>np.version.full_version >> '1.6.1' >> >> > > Yes, the results are platform/compiler dependent. The 32 bit platforms tend > to use extended precision accumulators and the x87 instruction set. The 64 > bit platforms tend to use sse2+. Different precisions, even though you might > think they are the same. just to confirm, same computer as before but the python 3.2 version is 64 bit, now I get the "Linux" result Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32 >>> import numpy as np >>> np.__version__ '1.5.1' >>> a = 4000*np.ones((1024,1024),dtype=np.float32) >>> a.mean() 4034.16357421875 >>> a.mean(0).mean(0) 4000.0 >>> a.mean(dtype=np.float64) 4000.0 Josef > > <snip> > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla at molden.no Wed Jan 25 04:10:22 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 25 Jan 2012 10:10:22 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120124223032.GG31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> <20120124180244.GD31456@ravage> <20120124223032.GG31456@ravage> Message-ID: <4F1FC6FE.3040906@molden.no> On 24.01.2012 23:30, David Warde-Farley wrote: > I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap > is using an int for a counter variable where it should be using an npy_intp. 
> > I've filed a pull request at https://github.com/numpy/numpy/pull/188 with a > regression test. That is great :) Now we just need to fix mtrand.pyx and all this will be gone. Sturla From charlesr.harris at gmail.com Wed Jan 25 07:40:23 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 25 Jan 2012 05:40:23 -0700 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <20120124223032.GG31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <20120124172424.GB31456@ravage> <CALsWBNMBseZ+OVBvrTXazuef=TKOKddcchEKUt3zG2ooyc54Aw@mail.gmail.com> <20120124180244.GD31456@ravage> <20120124223032.GG31456@ravage> Message-ID: <CAB6mnxJA_zW8KOcLv0U-L=Z-qDFx-oTba3DhTXnstFJW8Fasnw@mail.gmail.com> On Tue, Jan 24, 2012 at 3:30 PM, David Warde-Farley < wardefar at iro.umontreal.ca> wrote: > On Tue, Jan 24, 2012 at 01:02:44PM -0500, David Warde-Farley wrote: > > On Tue, Jan 24, 2012 at 06:37:12PM +0100, Robin wrote: > > > > > Yes - I get exactly the same numbers in 64 bit windows with 1.6.1. > > > > Alright, so that rules out platform specific effects. > > > > I'll try and hunt the bug down when I have some time, if someone more > > familiar with the indexing code doesn't beat me to it. > > I've figured it out. In numpy/core/src/multiarray/mapping.c, PyArray_GetMap > is using an int for a counter variable where it should be using an > npy_intp. > > I've filed a pull request at https://github.com/numpy/numpy/pull/188 with > a > regression test. > > I think this bug, or one like it, was reported a couple of years ago. But I don't recall if there was ever a ticket opened. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/a7e6d505/attachment.html> From edcjones at comcast.net Wed Jan 25 10:12:23 2012 From: edcjones at comcast.net (Edward C. Jones) Date: Wed, 25 Jan 2012 10:12:23 -0500 Subject: [Numpy-discussion] Permuting sparse arrays Message-ID: <4F201BD7.7000301@comcast.net> I have a vector of bits where there are many more zeros than one. I store the array as a sorted list of the indexes where the bit is one. If the bit array is (0, 1, 0, 0, 0, 1, 1), it is stored as (1, 5, 6). If the bit array, b, has length n, and p is a random permutation of arange(n), then I can permute the bit array using fancy indexing: b[p]. Is there some neat trick I can use to permute an array while leaving it in the list-of-indexes form? Currently I am doing it with a Python loop but I am looking for a faster way. From robert.kern at gmail.com Wed Jan 25 10:23:51 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 25 Jan 2012 15:23:51 +0000 Subject: [Numpy-discussion] Permuting sparse arrays In-Reply-To: <4F201BD7.7000301@comcast.net> References: <4F201BD7.7000301@comcast.net> Message-ID: <CAF6FJiu7bxd-5-5eY5P4qBFPhyFjHAO99+CA5P-dBRLvjw4k9w@mail.gmail.com> On Wed, Jan 25, 2012 at 15:12, Edward C. Jones <edcjones at comcast.net> wrote: > I have a vector of bits where there are many more zeros than one. ?I > store the array as a sorted list of the indexes where the bit is one. > If the bit array is (0, 1, 0, 0, 0, 1, 1), it is stored as (1, 5, 6). 
> If the bit array, b, has length n, and p is a random permutation of > arange(n), then I can permute the bit array using fancy indexing: b[p]. > Is there some neat trick I can use to permute an array while leaving it > in the list-of-indexes form? ?Currently I am doing it with a Python loop > but I am looking for a faster way. Use argsort() to get the "inverse" of the permutation. Then fancy-index the inverse with the list-of-indexes array. [~/scratch] |28> b array([0, 1, 0, 0, 0, 1, 1]) [~/scratch] |29> loi array([1, 5, 6]) [~/scratch] |30> p = np.random.permutation(len(b)) [~/scratch] |31> ps = p.argsort() [~/scratch] |41> p array([2, 3, 5, 4, 6, 1, 0]) [~/scratch] |42> ps array([6, 5, 0, 1, 3, 2, 4]) [~/scratch] |43> ps[loi] array([5, 2, 4]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From mmueller at python-academy.de Wed Jan 25 10:47:07 2012 From: mmueller at python-academy.de (=?UTF-8?B?TWlrZSBNw7xsbGVy?=) Date: Wed, 25 Jan 2012 16:47:07 +0100 Subject: [Numpy-discussion] Matplotlib and optimization tutorials at PyCon US Message-ID: <4F2023FB.7000108@python-academy.de> Hi, I will be giving a matplotlib and a optimization tutorial at PyCon in March. The first tutorial is a compact introduction to matplotlib. The optimization tutorial gives an overview over this topic. BTW, the early bird deadline is today. Mike Plotting with matplotlib ------------------------ Instructor: Mike M?ller Type:Tutorial Audience level:Novice Category:Useful libraries March 8th 9 a.m. ? 12:20 p.m. https://us.pycon.org/2012/schedule/presentation/238/ When it comes to plotting with Python many people think about matplotlib. It is widely used and provides a simple interface for creating a wide variety of plots from very simple diagrams to sophisticated animations. This tutorial is a hands-on introduction that teaches the basics of matplotlib. Students will learn how to create publication-ready plots with just a few lines of Python. Faster Python Programs through Optimization ------------------------------------------- Instructor: Mike M?ller Type:Tutorial Audience level:Experienced Category:Best Practices/Patterns March 7th 9 a.m. ? 12:20 p.m. https://us.pycon.org/2012/schedule/presentation/245/ This tutorial provides an overview of techniques to improve the performance of Python programs. The focus is on concepts such as profiling, difference of data structures and algorithms as well as a selection of tools and libraries that help to speed up Python. From emayssat at gmail.com Wed Jan 25 13:10:27 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Wed, 25 Jan 2012 10:10:27 -0800 Subject: [Numpy-discussion] array metadata Message-ID: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> Is there a way to store metadata for an array? For example, date the samples were collected, name of the operator, etc. Regards, -- Emmanuel From kalatsky at gmail.com Wed Jan 25 13:47:28 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Wed, 25 Jan 2012 12:47:28 -0600 Subject: [Numpy-discussion] array metadata In-Reply-To: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> References: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> Message-ID: <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> I believe there are no provisions made for that in ndarray. But you can subclass ndarray. 
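For illustration, a minimal sketch of that subclassing approach (the class name and metadata keys below are invented, not an established API); it follows the pattern from the numpy basics.subclassing documentation, using __array_finalize__ so the metadata dict also survives views and slices:

import numpy as np

class MetaArray(np.ndarray):
    # hypothetical ndarray subclass that carries a metadata dict
    def __new__(cls, input_array, metadata=None):
        obj = np.asarray(input_array).view(cls)
        obj.metadata = {} if metadata is None else dict(metadata)
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        # propagate the dict to views and slices of the array
        self.metadata = getattr(obj, 'metadata', {})

a = MetaArray(np.arange(5.0), metadata={'operator': 'someone', 'date': '2012-01-25'})
print(a[1:3].metadata)    # a view keeps the same metadata dict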
Val On Wed, Jan 25, 2012 at 12:10 PM, Emmanuel Mayssat <emayssat at gmail.com>wrote: > Is there a way to store metadata for an array? > For example, date the samples were collected, name of the operator, etc. > > Regards, > -- > Emmanuel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/b0d61ae1/attachment.html> From chris.barker at noaa.gov Wed Jan 25 14:27:10 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 25 Jan 2012 11:27:10 -0800 Subject: [Numpy-discussion] view of a structured array? Message-ID: <CALGmxE+zSWiae=3UC3rA3v-KJ-wnTayPXdv9A3KmovjW21udxg@mail.gmail.com> HI folks, Is there a way to get a view of a subset of a structured array? I know that an arbitrary subset will not fit into the numpy "strides"offsets" model, but some will, and it would be nice to have a view: For example, here we have a stuctured array: In [56]: a Out[56]: array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) if I pull out one "field" a get a view: In [57]: b = a['f1'] In [58]: b[0] = 1000 In [59]: a Out[59]: array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) However, if I pull out more than one field, I get a copy: In [60]: b = a[['f1','f2']] In [61]: b Out[61]: array([(1000.0, 3.0), (8.0, 9.0), (123.4, 7.0), (10.0, 11.0), (14.0, 15.0)], dtype=[('f1', '<f8'), ('f2', '<f8')]) In [62]: b[1] = (2000,3000) In [63]: b Out[63]: array([(1000.0, 3.0), (2000.0, 3000.0), (123.4, 7.0), (10.0, 11.0), (14.0, 15.0)], dtype=[('f1', '<f8'), ('f2', '<f8')]) In [64]: a Out[64]: array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) However, in this case, the two fields are contiguous, and thus I'm pretty sure one could build a numpy array that was a view. Is there any way to do so? Ideally without manipulating the strides by hand, but I may want to do that if it's the only way. -Chris -- -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Wed Jan 25 14:33:34 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 25 Jan 2012 14:33:34 -0500 Subject: [Numpy-discussion] view of a structured array? In-Reply-To: <CALGmxE+zSWiae=3UC3rA3v-KJ-wnTayPXdv9A3KmovjW21udxg@mail.gmail.com> References: <CALGmxE+zSWiae=3UC3rA3v-KJ-wnTayPXdv9A3KmovjW21udxg@mail.gmail.com> Message-ID: <CAMMTP+DLo_P8Kc98vQCPsm+dsjceRS0D1a5R6Lpy1U+T0yX4UA@mail.gmail.com> On Wed, Jan 25, 2012 at 2:27 PM, Chris Barker <chris.barker at noaa.gov> wrote: > HI folks, > > Is there a way to get a view of a subset of a structured array? 
I know > that an arbitrary subset will not fit into the numpy "strides"offsets" > model, but some will, and it would be nice to have a view: > > For example, here we have a stuctured array: > > In [56]: a > Out[56]: > array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), > ? ? ? (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], > ? ? ?dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) > > > if I pull out one "field" a get a view: > > In [57]: b = a['f1'] > > In [58]: b[0] = 1000 > > In [59]: a > Out[59]: > array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), > ? ? ? (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], > ? ? ?dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) > > However, if I pull out more than one field, I get a copy: > > In [60]: b = a[['f1','f2']] > > In [61]: b > Out[61]: > array([(1000.0, 3.0), (8.0, 9.0), (123.4, 7.0), (10.0, 11.0), (14.0, 15.0)], > ? ? ?dtype=[('f1', '<f8'), ('f2', '<f8')]) > > In [62]: b[1] = (2000,3000) > > In [63]: b > Out[63]: > array([(1000.0, 3.0), (2000.0, 3000.0), (123.4, 7.0), (10.0, 11.0), > ? ? ? (14.0, 15.0)], > ? ? ?dtype=[('f1', '<f8'), ('f2', '<f8')]) > > In [64]: a > Out[64]: > array([(1, 1000.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 123.4, 7.0, 8), > ? ? ? (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], > ? ? ?dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) > > > However, in this case, the two fields are contiguous, and thus I'm > pretty sure one could build a numpy array that was a view. Is there > any way to do so? Ideally without manipulating the strides by hand, > but I may want to do that if it's the only way. > > -Chris > that's what I would try: >>> b = a.view(dtype=[('i', '<i4'), ('fl',[('f1', '<f8'), ('f2', '<f8')]), ('i2', '<i4')]) >>> b['fl'] array([(2.0, 3.0), (8.0, 9.0), (123.40000000000001, 7.0), (10.0, 11.0), (14.0, 15.0)], dtype=[('f1', '<f8'), ('f2', '<f8')]) >>> b['fl'][2]= (200, 500) >>> a array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 200.0, 500.0, 8), (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) Josef > > > > -- > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice > 7600 Sand Point Way NE ??(206) 526-6329?? fax > Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From Kathleen.M.Tacina at nasa.gov Wed Jan 25 15:30:39 2012 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Wed, 25 Jan 2012 20:30:39 +0000 Subject: [Numpy-discussion] Unexpected behavior with np.min_scalar_type In-Reply-To: <CAMRnEmo=4OYisJKtqTnjnqDYV6KN6VDAEwFAFhD5QgfMwZWM2A@mail.gmail .com> References: <1327418956.6882.67.camel@MOSES.grc.nasa.gov> <CAMRnEmo=4OYisJKtqTnjnqDYV6KN6VDAEwFAFhD5QgfMwZWM2A@mail.gmail.com> Message-ID: <1327523439.6882.249.camel@MOSES.grc.nasa.gov> Thanks! It was interesting to see why that happened. Kathy On Tue, 2012-01-24 at 18:56 -0600, Mark Wiebe wrote: > On Tue, Jan 24, 2012 at 7:29 AM, Kathleen M Tacina > <Kathleen.M.Tacina at nasa.gov> wrote: > > I was experimenting with np.min_scalar_type to make sure it > worked as expected, and found some unexpected results for > integers between 2**63 and 2**64-1. I would have expected > np.min_scalar_type(2**64-1) to return uint64. Instead, I get > object. 
Further experimenting showed that the largest integer > for which np.min_scalar_type will return uint64 is 2**63-1. > Is this expected behavior? > > > > This is a bug in how numpy detects the dtype of python objects. > > > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/common.c#L18 > > > You can see there it's only checking for a signed long long, not > accounting for the unsigned case. I created a ticket for you here: > > > http://projects.scipy.org/numpy/ticket/2028 > > > -Mark > > > On python 2.7.2 on a 64-bit linux machine: > >>> import numpy as np > >>> np.version.full_version > '2.0.0.dev-55472ca' > >>> np.min_scalar_type(2**8-1) > dtype('uint8') > >>> np.min_scalar_type(2**16-1) > dtype('uint16') > >>> np.min_scalar_type(2**32-1) > dtype('uint32') > >>> np.min_scalar_type(2**64-1) > dtype('O') > >>> np.min_scalar_type(2**63-1) > dtype('uint64') > >>> np.min_scalar_type(2**63) > dtype('O') > > I get the same results on a Windows XP machine running python > 2.7.2 and numpy 1.6.1. > > Kathy > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120125/961cc37a/attachment.html> From chris.barker at noaa.gov Wed Jan 25 18:19:33 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 25 Jan 2012 15:19:33 -0800 Subject: [Numpy-discussion] view of a structured array? In-Reply-To: <CAMMTP+DLo_P8Kc98vQCPsm+dsjceRS0D1a5R6Lpy1U+T0yX4UA@mail.gmail.com> References: <CALGmxE+zSWiae=3UC3rA3v-KJ-wnTayPXdv9A3KmovjW21udxg@mail.gmail.com> <CAMMTP+DLo_P8Kc98vQCPsm+dsjceRS0D1a5R6Lpy1U+T0yX4UA@mail.gmail.com> Message-ID: <CALGmxELqtthmQLNDFrKu+kXMmuPFCn8HvtmPN3cKt8EsNaYB1w@mail.gmail.com> On Wed, Jan 25, 2012 at 11:33 AM, <josef.pktd at gmail.com> wrote: > that's what I would try: > >>>> b = a.view(dtype=[('i', '<i4'), ('fl',[('f1', '<f8'), ('f2', '<f8')]), ('i2', '<i4')]) ah yes, I forgot about nesting dtypes -- very nice, thanks! -Chris >>>> b['fl'] > array([(2.0, 3.0), (8.0, 9.0), (123.40000000000001, 7.0), (10.0, 11.0), > ? ? ? (14.0, 15.0)], > ? ? ?dtype=[('f1', '<f8'), ('f2', '<f8')]) >>>> b['fl'][2]= (200, 500) >>>> a > array([(1, 2.0, 3.0, 4), (7, 8.0, 9.0, 10), (5, 200.0, 500.0, 8), > ? ? ? (9, 10.0, 11.0, 12), (13, 14.0, 15.0, 16)], > ? ? ?dtype=[('i', '<i4'), ('f1', '<f8'), ('f2', '<f8'), ('i2', '<i4')]) > > Josef >> >> >> >> -- >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice >> 7600 Sand Point Way NE ??(206) 526-6329?? fax >> Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception >> >> Chris.Barker at noaa.gov >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? 
main reception Chris.Barker at noaa.gov From ondrej.certik at gmail.com Wed Jan 25 21:56:31 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Wed, 25 Jan 2012 18:56:31 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CAMRnEmoWRVCYpMqR9JeFYWS0uM9t4HwJbfHS5ZH3PGMMUu5RBw@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> <CAMRnEmoWRVCYpMqR9JeFYWS0uM9t4HwJbfHS5ZH3PGMMUu5RBw@mail.gmail.com> Message-ID: <CADDwiVA71zL1WMjX-89_3vaWye9dXTVzDKr2ix+2DoKaM1c6bg@mail.gmail.com> On Tue, Jan 24, 2012 at 4:33 PM, Mark Wiebe <mwwiebe at gmail.com> wrote: > 2012/1/21 Ond?ej ?ert?k <ondrej.certik at gmail.com> >> >> <snip> >> >> >> Let me know if you figure out something. I think the "mask" thing is >> quite slow, but the problem is that it needs to be there, to catch >> overflows (and it is there in Fortran as well, see the >> "where" statement, which does the same thing). Maybe there is some >> other way to write the same thing in NumPy? > > > In the current master, you can replace > > ? ? z[mask] *= z[mask] > ? ? z[mask] += c[mask] > with > ? ? np.multiply(z, z, out=z, where=mask) > ? ? np.add(z, c, out=z, where=mask) I am getting: Traceback (most recent call last): File "b.py", line 19, in <module> np.multiply(z, z, out=z, where=mask) TypeError: 'where' is an invalid keyword to ufunc 'multiply' I assume it is a new feature in numpy? > > The performance of this alternate syntax is still not great, but it is > significantly faster than what it replaces. For a particular choice of mask, > I get > > In [40]: timeit z[mask] *= z[mask] > > 10 loops, best of 3: 29.1 ms per loop > > In [41]: timeit np.multiply(z, z, out=z, where=mask) > > 100 loops, best of 3: 4.2 ms per loop That looks like 7x faster to me. If it works for you, can you run the mandelbrot example with and without your patch? That way we'll know the actual speedup. Ondrej From sturla at molden.no Thu Jan 26 04:19:25 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 26 Jan 2012 10:19:25 +0100 Subject: [Numpy-discussion] OT: MS C++ AMP library Message-ID: <4F211A9D.3060108@molden.no> When we have nice libraries like OpenCL, OpenGL and OpenMP, I am so glad we have Microsoft to screw it up. Congratulations to Redmond: Another C++ API I cannot read, and a scientific compute library I hopefully never have to use. http://msdn.microsoft.com/en-us/library/hh265136(v=vs.110).aspx The annoying part is, with this crap there will never be a standard OpenCL DLL in Windows. Sturla Molden From matthieu.brucher at gmail.com Thu Jan 26 04:24:58 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 26 Jan 2012 10:24:58 +0100 Subject: [Numpy-discussion] OT: MS C++ AMP library In-Reply-To: <4F211A9D.3060108@molden.no> References: <4F211A9D.3060108@molden.no> Message-ID: <CAHCaCk+6N2o0FHw4V7ghFFgxWZmqjT9R9rGzXUWJ3sNSWSLgjw@mail.gmail.com> Hi Sturla, It has been several months now since AMP is there, I wouldn't care about it. You also forgot about OpenAcc, the accelerator sister of OpenMP, Intel's PBB (with TBB, IPP, ArBB that will soon make a step in Numpy's world), OmpSS, and so many others. I wouldn't blame MS for this, IMHO Intel does a far better job at the moment, and we are only starting consolidation now that everyone has shown its cards. 
Cheers, Matthieu 2012/1/26 Sturla Molden <sturla at molden.no> > > When we have nice libraries like OpenCL, OpenGL and OpenMP, I am so glad > we have Microsoft to screw it up. > > Congratulations to Redmond: Another C++ API I cannot read, and a > scientific compute library I hopefully never have to use. > > http://msdn.microsoft.com/en-us/library/hh265136(v=vs.110).aspx > > The annoying part is, with this crap there will never be a standard > OpenCL DLL in Windows. > > Sturla Molden > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120126/54554c73/attachment.html> From paul.anton.letnes at gmail.com Thu Jan 26 07:30:44 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Thu, 26 Jan 2012 13:30:44 +0100 Subject: [Numpy-discussion] array metadata In-Reply-To: <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> References: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> Message-ID: <CAFh5KYbtP4gTG6GcMLQcUxo1_uZx3oGG+pp1btNmp4Pm4bo1kg@mail.gmail.com> If by "store" you mean "store on disk", I recommend h5py datasets and attributes. Reportedly pytables is also good but I don't have any first hand experience there. Both python modules use the hdf5 library, written in C/C++/Fortran. Paul On Wed, Jan 25, 2012 at 7:47 PM, Val Kalatsky <kalatsky at gmail.com> wrote: > > I believe there are no provisions made for that in ndarray. > But you can subclass ndarray. > Val > > > On Wed, Jan 25, 2012 at 12:10 PM, Emmanuel Mayssat <emayssat at gmail.com> > wrote: >> >> Is there a way to store metadata for an array? >> For example, date the samples were collected, name of the operator, etc. >> >> Regards, >> -- >> Emmanuel >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From scipy at samueljohn.de Thu Jan 26 08:04:49 2012 From: scipy at samueljohn.de (Samuel John) Date: Thu, 26 Jan 2012 14:04:49 +0100 Subject: [Numpy-discussion] OT: MS C++ AMP library In-Reply-To: <4F211A9D.3060108@molden.no> References: <4F211A9D.3060108@molden.no> Message-ID: <D680AB77-58AB-4A28-A5F8-6192BD1F563D@samueljohn.de> Yes, I agree 100%. On 26.01.2012, at 10:19, Sturla Molden wrote: > When we have nice libraries like OpenCL, OpenGL and OpenMP, I am so glad > we have Microsoft to screw it up. > > Congratulations to Redmond: Another C++ API I cannot read, and a > scientific compute library I hopefully never have to use. > > http://msdn.microsoft.com/en-us/library/hh265136(v=vs.110).aspx > > The annoying part is, with this crap there will never be a standard > OpenCL DLL in Windows. 
> > Sturla Molden From pierre.haessig at crans.org Thu Jan 26 08:19:18 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 26 Jan 2012 14:19:18 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> Message-ID: <4F2152D6.10303@crans.org> Le 22/01/2012 01:40, josef.pktd at gmail.com a ?crit : > same here, > When I rewrote scipy.stats.spearmanr, I matched the numpy behavior for > two arrays, while R only returns the cross-correlation part. Since I've seen no negative feedback, I jumped to the next step by creating a Trac account and posting a new ticket : http://projects.scipy.org/numpy/ticket/2031 If people feel ok with this proposal, I can try to expand the proposed implementation skeleton to something more serious. But maybe Elliot has already something ready to pull-request on GitHub ? Pierre From derek at astro.physik.uni-goettingen.de Thu Jan 26 08:49:58 2012 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Thu, 26 Jan 2012 14:49:58 +0100 Subject: [Numpy-discussion] array metadata In-Reply-To: <CAFh5KYbtP4gTG6GcMLQcUxo1_uZx3oGG+pp1btNmp4Pm4bo1kg@mail.gmail.com> References: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> <CAFh5KYbtP4gTG6GcMLQcUxo1_uZx3oGG+pp1btNmp4Pm4bo1kg@mail.gmail.com> Message-ID: <21E941F9-86A1-4468-9645-3F48B54BC43B@astro.physik.uni-goettingen.de> On 26 Jan 2012, at 13:30, Paul Anton Letnes wrote: > If by "store" you mean "store on disk", I recommend h5py datasets and > attributes. Reportedly pytables is also good but I don't have any > first hand experience there. Both python modules use the hdf5 library, > written in C/C++/Fortran. > > Paul > > On Wed, Jan 25, 2012 at 7:47 PM, Val Kalatsky <kalatsky at gmail.com> wrote: >> >> I believe there are no provisions made for that in ndarray. >> But you can subclass ndarray. > You could probably use structured arrays with string and datetype fields for the metadata and multidimensional fields (i.e. effectively subarrays within the structured array) for the actual data. For file storage, they could probably be directly saved as .npy, if interoperability is not a concern. Otherwise I'd also highly recommend hdf5; with both h5py and pytables allowing quite transparent conversion of structured arrays to datasets in the HDF5, but you also have the option to store other objects, like dictionary elements, within the same data structure. Pytables is generally regarded as having a more database-oriented approach, while h5py appears more straightforward to use from a numerics background (at least in my experience). 
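As a rough illustration of the h5py route (the file name, dataset name and attribute keys here are invented): the samples go into a dataset and the metadata ride along as HDF5 attributes next to it, e.g.

import numpy as np
import h5py

data = np.random.rand(90, 720)

with h5py.File('samples.h5', 'w') as f:
    dset = f.create_dataset('T_SFC', data=data)
    dset.attrs['date'] = '2012-01-25'      # metadata stored with the dataset
    dset.attrs['operator'] = 'someone'

with h5py.File('samples.h5', 'r') as f:
    print(f['T_SFC'].attrs['operator'])

A structured array can be passed to create_dataset in the same way, since h5py maps compound dtypes to HDF5 compound types.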
Cheers, Derek From bsouthey at gmail.com Thu Jan 26 09:57:06 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 26 Jan 2012 08:57:06 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F2152D6.10303@crans.org> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> Message-ID: <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> On Thu, Jan 26, 2012 at 7:19 AM, Pierre Haessig <pierre.haessig at crans.org> wrote: > Le 22/01/2012 01:40, josef.pktd at gmail.com a ?crit : >> same here, >> When I rewrote scipy.stats.spearmanr, I matched the numpy behavior for >> two arrays, while R only returns the cross-correlation part. > Since I've seen no negative feedback, I jumped to the next step by > creating a Trac account and posting a new ticket : > > http://projects.scipy.org/numpy/ticket/2031 > > If people feel ok with this proposal, I can try to expand the proposed > implementation skeleton to something more serious. But maybe Elliot has > already something ready to pull-request on GitHub ? > > Pierre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Really I do not understand what you want to do especially when the ticket contains some very basic errors. Can you please provide a couple of real examples with expected output that clearly show what you want? >From a statistical viewpoint, np.cov is correct because it outputs the variance/covariance matrix. Also I believe that changing the np.cov function will cause major havoc because numpy and people's code depend on the current behavior. Bruce From pav at iki.fi Thu Jan 26 10:50:17 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 26 Jan 2012 16:50:17 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> Message-ID: <jfrsnq$6i8$1@dough.gmane.org> 26.01.2012 15:57, Bruce Southey kirjoitti: [clip] > Also I believe that changing the np.cov > function will cause major havoc because numpy and people's code depend > on the current behavior. Changing the behavior of `cov` is IMHO not really possible at this point --- the current behavior is not a bug, but a documented feature that has been around probably already since Numeric. However, adding a new function could be possible. 
-- Pauli Virtanen From pierre.haessig at crans.org Thu Jan 26 11:07:15 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 26 Jan 2012 17:07:15 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> Message-ID: <4F217A33.6020806@crans.org> Le 26/01/2012 15:57, Bruce Southey a écrit : > Can you please provide a > couple of real examples with expected output that clearly show what > you want? > Hi Bruce, Thanks for your ticket feedback ! It's precisely because I see a big potential impact of the proposed change that I sent first a ML message, second a ticket, before jumping to a pull-request like one of Sergio Leone's cowboys (sorry, I watched "for a few dollars more" last weekend...) Now, I realize that in the ticket writing I made the wrong trade-off between conciseness and accuracy, which led to some of the errors you raised. Let me use your example to try to share what I have in mind. > >> X = array([-2.1, -1. , 4.3]) > >> Y = array([ 3. , 1.1 , 0.12]) Indeed, with today's cov behavior we have a 2x2 array: > >> cov(X,Y) array([[ 11.71 , -4.286 ], [ -4.286 , 2.14413333]]) Now, when I used the word 'concatenation', I wasn't precise enough because I meant assembling X and Y in the sense of 2 vectors of observations from 2 random variables X and Y. This is achieved by concatenate(X,Y) *when properly playing with dimensions* (which I didn't mention) : > >> XY = np.concatenate((X[None, :], Y[None, :])) array([[-2.1 , -1. , 4.3 ], [ 3. , 1.1 , 0.12]]) In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". > >> np.cov(XY) array([[ 11.71 , -4.286 ], [ -4.286 , 2.14413333]]) (And indeed, the actual cov Python code does use concatenate() ) Now let me come back to my assertion about this behavior's *usefulness*. You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 simple scalar blocks). * diagonal blocks are just cov(X) and cov(Y) (which in this case come to var(X) and var(Y) when setting ddof to 1) * off-diagonal blocks are symmetric and are actually the covariance estimate of the X, Y observations (from http://en.wikipedia.org/wiki/Covariance) that is : > >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) -4.2860000000000005 The new proposed behaviour for cov is that cov(X,Y) would return : array(-4.2860000000000005) instead of the 2x2 matrix. * This would be in line with the cov(X,Y) mathematical definition, as well as with R behavior. * This would save memory and computing resources. (and therefore help save the planet ;-) ) However, I do understand that the impact for this change may be big. This indeed requires careful reviewing.
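To make the alternative concrete, here is one possible sketch of a separate cross-covariance helper (the name xcov and its signature are only illustrative, not an agreed interface): it returns just the off-diagonal block that np.cov(m, y) currently embeds in the full joint matrix.

import numpy as np

def xcov(m, y, ddof=1):
    # illustrative only: rows are variables, columns are observations
    m = np.atleast_2d(m)
    y = np.atleast_2d(y)
    mc = m - m.mean(axis=1)[:, None]
    yc = y - y.mean(axis=1)[:, None]
    return np.dot(mc, yc.T) / (m.shape[1] - ddof)

X = np.array([-2.1, -1. , 4.3])
Y = np.array([ 3. , 1.1 , 0.12])
print(xcov(X, Y))    # [[-4.286]], the off-diagonal block of np.cov(X, Y)

For the 1-d inputs above this gives a 1x1 array rather than a scalar; whether such a function should squeeze its result is exactly the kind of API detail that would need review.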
Pierre

From pierre.haessig at crans.org  Thu Jan 26 11:25:38 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Thu, 26 Jan 2012 17:25:38 +0100
Subject: [Numpy-discussion] Cross-covariance function
In-Reply-To: <jfrsnq$6i8$1@dough.gmane.org>
References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com>
	<CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com>
	<CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com>
	<4F2152D6.10303@crans.org>
	<CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com>
	<jfrsnq$6i8$1@dough.gmane.org>
Message-ID: <4F217E82.2050002@crans.org>

On 26/01/2012 16:50, Pauli Virtanen wrote:
> the current behavior is not a bug,
I completely agree that numpy.cov(m, y) does what it says! I (and
apparently some other people) am only questioning why there is such a
behavior.

Indeed, the second variable `y` is presented as "An additional set of
variables and observations". This raises two different questions for me:
 * What is the use case for such an additional set of variables that
   could just be concatenated to the first set `m`?
 * Or, if this sort of integrated concatenation really is useful, why
   add just one "additional set" and not several "additional sets", like
   >>> cov(m, y1, y2, y3, ...) ?

But I would understand that numpy's responsibility to provide a stable
computing API would prevent any change in cov's behavior. You have the
long-term experience to judge that. (I certainly don't ;-))

However, in case this change is not possible, I would see this solution:
 * add an xcov function that does what Elliot and Sturla and I described
 * possibly deprecate the `y` 2nd argument of cov, because I feel it
   brings more definitional complication than real programming benefit
(but I still find that changing cov would lead to a leaner numpy API,
which was my motivation for reacting to Elliot's first message)

Pierre

From sturla at molden.no  Thu Jan 26 12:26:32 2012
From: sturla at molden.no (Sturla Molden)
Date: Thu, 26 Jan 2012 18:26:32 +0100
Subject: [Numpy-discussion] Cross-covariance function
In-Reply-To: <4F217E82.2050002@crans.org>
References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com>
	<CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com>
	<CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com>
	<4F2152D6.10303@crans.org>
	<CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com>
	<jfrsnq$6i8$1@dough.gmane.org>
	<4F217E82.2050002@crans.org>
Message-ID: <4F218CC8.9070602@molden.no>

On 26.01.2012 17:25, Pierre Haessig wrote:
> However, in case this change is not possible, I would see this
> solution:
> * add an xcov function that does what Elliot and Sturla and I
> described

The current np.cov implementation returns the cross-covariance the way
it is commonly used in statistics. If MATLAB does not, then that is
MATLAB's problem, I think.
http://www.stat.washington.edu/research/reports/2001/tr391.pdf Sturla From emayssat at gmail.com Thu Jan 26 12:37:23 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Thu, 26 Jan 2012 09:37:23 -0800 Subject: [Numpy-discussion] array metadata In-Reply-To: <21E941F9-86A1-4468-9645-3F48B54BC43B@astro.physik.uni-goettingen.de> References: <CACB6ZmBwFDtRKb4cYPGBG7VismuSBqe3omZufn90R+isrjDWkg@mail.gmail.com> <CAE8bXEmdAk8FrYGYRjvkxBm-fv7SZ1x19YWP3YxV=r1J2MJ5pA@mail.gmail.com> <CAFh5KYbtP4gTG6GcMLQcUxo1_uZx3oGG+pp1btNmp4Pm4bo1kg@mail.gmail.com> <21E941F9-86A1-4468-9645-3F48B54BC43B@astro.physik.uni-goettingen.de> Message-ID: <CACB6ZmBjt=iTCvMFnarZ=2pdezrv1UWwBDXWdE-0pk_V+joNbg@mail.gmail.com> subclassing is what I was looking for. Indeed the code is almost available at http://docs.scipy.org/doc/numpy/user/basics.subclassing.html#simple-example-adding-an-extra-attribute-to-ndarray I just created a dictionary variable which I called 'metadata' I had to overload the __repr__ method to print my parameters in the python shell. As far as saving the data on the disk.... let me start a new thread ;-) -- Emmanuel On Thu, Jan 26, 2012 at 5:49 AM, Derek Homeier <derek at astro.physik.uni-goettingen.de> wrote: > On 26 Jan 2012, at 13:30, Paul Anton Letnes wrote: > >> If by "store" you mean "store on disk", I recommend h5py datasets andhttp://docs.scipy.org/doc/numpy/user/basics.subclassing.html >> attributes. Reportedly pytables is also good but I don't have any >> first hand experience there. Both python modules use the hdf5 library, >> written in C/C++/Fortran. >> >> Paul >> >> On Wed, Jan 25, 2012 at 7:47 PM, Val Kalatsky <kalatsky at gmail.com> wrote: >>> >>> I believe there are no provisions made for that in ndarray. >>> But you can subclass ndarray. >> > You could probably use structured arrays with string and datetype fields for the > metadata and multidimensional fields (i.e. effectively subarrays within the > structured array) for the actual data. For file storage, they could probably be directly > saved as .npy, if interoperability is not a concern. Otherwise I'd also highly recommend > hdf5; with both h5py and pytables allowing quite transparent conversion of structured > arrays to datasets in the HDF5, but you also have the option to store other objects, > like dictionary elements, within the same data structure. > Pytables is generally regarded as having a more database-oriented approach, > while h5py appears more straightforward to use from a numerics background > (at least in my experience). > > Cheers, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Thu Jan 26 12:39:07 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 26 Jan 2012 18:39:07 +0100 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? 
In-Reply-To: <20120124161921.GA31456@ravage> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> <20120124161921.GA31456@ravage> Message-ID: <4F218FBB.5080301@molden.no> Den 24.01.2012 17:19, skrev David Warde-Farley: > > Hmm. Seeing as the width of a C long is inconsistent, does this imply that > the random number generator will produce different results on different > platforms? If it does, it is a C programming mistake. C code should never depend on the exact size of a long, only it's minimum size. ISO C defines other datatypes if an exact integer size is needed (include stdint.h), but ANSI C used for NumPy does not. Sturla From scipy at samueljohn.de Thu Jan 26 13:01:22 2012 From: scipy at samueljohn.de (Samuel John) Date: Thu, 26 Jan 2012 19:01:22 +0100 Subject: [Numpy-discussion] Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 In-Reply-To: <472327CE-59F4-4DB5-B80C-D7EC2FFBBAF3@gmail.com> References: <mailman.6468.1324246451.1086.numpy-discussion@scipy.org> <472327CE-59F4-4DB5-B80C-D7EC2FFBBAF3@gmail.com> Message-ID: <AD984E1A-D359-4BA0-BDD4-81826454EFC2@samueljohn.de> Hi Hans-Martin! You could try my instructions recently posted to this list http://thread.gmane.org/gmane.comp.python.scientific.devel/15956/ Basically, using llvm-gcc scipy segfaults when scipy.test() (on my system at least). Therefore, I created the homebrew install formula. They work for whatever "which python" you have. But I have tested this for 2.7.2 on MacOS X 10.7.2. Samuel On 11.01.2012, at 16:12, Hans-Martin v. Gaudecker wrote: > I recently upgraded to Lion and just faced the same problem with both Python 2.7.2 and Python 3.2.2 installed via the python.org installers. My hunch is that the errors are related to the fact that Apple dropped gcc-4.2 from XCode 4.2. I got gcc-4.2 via [1] then, still the same error -- who knows what else got lost in that upgrade... Previous successful builds with gcc-4.2 might have been with XCode 4.1 (or 4.2 installed on top of it). > > In the end I decided to re-install both Python versions via homebrew, nicely described here [2] and everything seems to work fine using LLVM. Test outputs for NumPy master under 2.7.2 and 3.2.2 are below in case they are of interest. > > Best, > Hans-Martin > > [1] https://github.com/kennethreitz/osx-gcc-installer > [2] http://www.thisisthegreenroom.com/2011/installing-python-numpy-scipy-matplotlib-and-ipython-on-lion/#numpy The instructions at [2] lead to a segfault in scipy.test() for me, because it used llvm-gcc (which is the default on Lion). 
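In case it helps with diagnosing this kind of problem, here is a small
check one can run (just a diagnostic sketch, not part of the build
recipe) to see which compiler the running interpreter reports and how
numpy was configured:

import platform
print(platform.python_compiler())
# e.g. 'GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)'
# Note: this is the compiler that built Python itself; extension modules
# such as scipy may have been built with a different one.

import numpy as np
np.__config__.show()    # BLAS/LAPACK configuration numpy was built against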
From josef.pktd at gmail.com Thu Jan 26 13:19:11 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Jan 2012 13:19:11 -0500 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F218CC8.9070602@molden.no> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <jfrsnq$6i8$1@dough.gmane.org> <4F217E82.2050002@crans.org> <4F218CC8.9070602@molden.no> Message-ID: <CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com> On Thu, Jan 26, 2012 at 12:26 PM, Sturla Molden <sturla at molden.no> wrote: > Den 26.01.2012 17:25, skrev Pierre Haessig: >> However, in the case this change is not possible, I would see this >> solution : >> * add and xcov function that does what Elliot and Sturla and I >> described, because > > The current np.cov implementation returns the cross-covariance the way > it is commonly used in statistics. If MATLAB does not, then that is > MATLAB's problem I think. The discussion had this reversed, numpy matches the behavior of MATLAB, while R (statistics) only returns the cross covariance part as proposed. If there is a new xcov, then I think there should also be a xcorrcoef. This case needs a different implementation than corrcoef, since the xcov doesn't contain the variances and they need to be calculated separately. Josef > > http://www.stat.washington.edu/research/reports/2001/tr391.pdf > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bsouthey at gmail.com Thu Jan 26 13:25:55 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 26 Jan 2012 12:25:55 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F217A33.6020806@crans.org> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> Message-ID: <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig <pierre.haessig at crans.org> wrote: > Le 26/01/2012 15:57, Bruce Southey a ?crit : >> Can you please provide a >> couple of real examples with expected output that clearly show what >> you want? >> > Hi Bruce, > > Thanks for your ticket feedback ! It's precisely because I see a big > potential impact of the proposed change that I send first a ML message, > second a ticket before jumping to a pull-request like a Sergio Leone's > cowboy (sorry, I watched "for a few dollars more" last weekend...) > > Now, I realize that in the ticket writing I made the wrong trade-off > between conciseness and accuracy which led to some of the errors you > raised. Let me try to use your example to try to share what I have in mind. > >> >> X = array([-2.1, -1. , ?4.3]) >> >> Y = array([ 3. ?, ?1.1 , ?0.12]) > > Indeed, with today's cov behavior we have a 2x2 array: >> >> cov(X,Y) > array([[ 11.71 ? ? ?, ?-4.286 ? ? ], > ? ? ? ?[ -4.286 ? ? , ? 
2.14413333]]) > > Now, when I used the word 'concatenation', I wasn't precise enough > because I meant assembling X and Y in the sense of 2 vectors of > observations from 2 random variables X and Y. > This is achieved by concatenate(X,Y) *when properly playing with > dimensions* (which I didn't mentioned) : >> >> XY = np.concatenate((X[None, :], Y[None, :])) > array([[-2.1 , -1. ?, ?4.3 ], > ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) In this context, I find stacking, np.vstack((X,Y)), more appropriate than concatenate. > > In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >> >> np.cov(XY) > array([[ 11.71 ? ? ?, ?-4.286 ? ? ], > ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) > Sure the resulting array is the same but whole process is totally different. > (And indeed, the actual cov Python code does use concatenate() ) Yes, but the user does not see that. Whereas you are forcing the user to do the stacking in the correct dimensions. > > > Now let me come back to my assertion about this behavior *usefulness*. > You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 > simple scalars blocks). No there are not '4' blocks just rows and columns. > ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes > to var(X) and var(Y) when setting ddof to 1) Sure but variances are still covariances. > ?* off diagonal blocks are symetric and are actually the covariance > estimate of X, Y observations (from > http://en.wikipedia.org/wiki/Covariance) Sure > > that is : >> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) > -4.2860000000000005 > > The new proposed behaviour for cov is that cov(X,Y) would return : > array(-4.2860000000000005) ?instead of the 2*2 matrix. But how you interpret an 2D array where the rows are greater than 2? >>> Z=Y+X >>> np.cov(np.vstack((X,Y,Z))) array([[ 11.71 , -4.286 , 7.424 ], [ -4.286 , 2.14413333, -2.14186667], [ 7.424 , -2.14186667, 5.28213333]]) > > ?* This would be in line with the cov(X,Y) mathematical definition, as > well as with R behavior. I don't care what R does because I am using Python and Python is infinitely better than R is! But I think that is only in the 1D case. > ?* This would save memory and computing resources. (and therefore help > save the planet ;-) ) Nothing that you have provided shows that it will. > > However, I do understand that the impact for this change may be big. > This indeed requires careful reviewing. 
> > Pierre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Bruce From josef.pktd at gmail.com Thu Jan 26 13:45:46 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Jan 2012 13:45:46 -0500 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> Message-ID: <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey <bsouthey at gmail.com> wrote: > On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig > <pierre.haessig at crans.org> wrote: >> Le 26/01/2012 15:57, Bruce Southey a ?crit : >>> Can you please provide a >>> couple of real examples with expected output that clearly show what >>> you want? >>> >> Hi Bruce, >> >> Thanks for your ticket feedback ! It's precisely because I see a big >> potential impact of the proposed change that I send first a ML message, >> second a ticket before jumping to a pull-request like a Sergio Leone's >> cowboy (sorry, I watched "for a few dollars more" last weekend...) >> >> Now, I realize that in the ticket writing I made the wrong trade-off >> between conciseness and accuracy which led to some of the errors you >> raised. Let me try to use your example to try to share what I have in mind. >> >>> >> X = array([-2.1, -1. , ?4.3]) >>> >> Y = array([ 3. ?, ?1.1 , ?0.12]) >> >> Indeed, with today's cov behavior we have a 2x2 array: >>> >> cov(X,Y) >> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >> >> Now, when I used the word 'concatenation', I wasn't precise enough >> because I meant assembling X and Y in the sense of 2 vectors of >> observations from 2 random variables X and Y. >> This is achieved by concatenate(X,Y) *when properly playing with >> dimensions* (which I didn't mentioned) : >>> >> XY = np.concatenate((X[None, :], Y[None, :])) >> array([[-2.1 , -1. ?, ?4.3 ], >> ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) > > In this context, I find stacking, ?np.vstack((X,Y)), more appropriate > than concatenate. > >> >> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >>> >> np.cov(XY) >> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >> > Sure the resulting array is the same but whole process is totally different. > > >> (And indeed, the actual cov Python code does use concatenate() ) > Yes, but the user does not see that. Whereas you are forcing the user > to do the stacking in the correct dimensions. > > >> >> >> Now let me come back to my assertion about this behavior *usefulness*. >> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 >> simple scalars blocks). > No there are not '4' blocks just rows and columns. Sturla showed the 4 blocks in his first message. > >> ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes >> to var(X) and var(Y) when setting ddof to 1) > Sure but variances are still covariances. 
> >> ?* off diagonal blocks are symetric and are actually the covariance >> estimate of X, Y observations (from >> http://en.wikipedia.org/wiki/Covariance) > Sure >> >> that is : >>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) >> -4.2860000000000005 >> >> The new proposed behaviour for cov is that cov(X,Y) would return : >> array(-4.2860000000000005) ?instead of the 2*2 matrix. > > But how you interpret an 2D array where the rows are greater than 2? >>>> Z=Y+X >>>> np.cov(np.vstack((X,Y,Z))) > array([[ 11.71 ? ? ?, ?-4.286 ? ? , ? 7.424 ? ? ], > ? ? ? [ -4.286 ? ? , ? 2.14413333, ?-2.14186667], > ? ? ? [ ?7.424 ? ? , ?-2.14186667, ? 5.28213333]]) > > >> >> ?* This would be in line with the cov(X,Y) mathematical definition, as >> well as with R behavior. > I don't care what R does because I am using Python and Python is > infinitely better than R is! > > But I think that is only in the 1D case. I just checked R to make sure I remember correctly > xx = matrix((1:20)^2, nrow=4) > xx [,1] [,2] [,3] [,4] [,5] [1,] 1 25 81 169 289 [2,] 4 36 100 196 324 [3,] 9 49 121 225 361 [4,] 16 64 144 256 400 > cov(xx, 2*xx[,1:2]) [,1] [,2] [1,] 86.0000 219.3333 [2,] 219.3333 566.0000 [3,] 352.6667 912.6667 [4,] 486.0000 1259.3333 [5,] 619.3333 1606.0000 > cov(xx) [,1] [,2] [,3] [,4] [,5] [1,] 43.0000 109.6667 176.3333 243.0000 309.6667 [2,] 109.6667 283.0000 456.3333 629.6667 803.0000 [3,] 176.3333 456.3333 736.3333 1016.3333 1296.3333 [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000 > >> ?* This would save memory and computing resources. (and therefore help >> save the planet ;-) ) > Nothing that you have provided shows that it will. I don't know about saving the planet, but if X and Y have the same number of columns, we save 3 quarters of the calculations, as Sturla also explained in his first message. Josef > >> >> However, I do understand that the impact for this change may be big. >> This indeed requires careful reviewing. >> >> Pierre >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Bruce > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Thu Jan 26 15:35:58 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 26 Jan 2012 20:35:58 +0000 Subject: [Numpy-discussion] advanced indexing bug with huge arrays? In-Reply-To: <4F218FBB.5080301@molden.no> References: <19DFC795-1837-46DB-A199-CDE4A011EE38@iro.umontreal.ca> <20120123185552.GA27535@ravage> <CALsWBNO-1-eAXNyYgkwoKOH-KQPBwYuVGUNDVqtZELTQLkw=sQ@mail.gmail.com> <20120123203353.GD28091@ravage> <4F1DCC33.6090101@uci.edu> <4F1E3AD5.9000507@molden.no> <4F1E425E.20203@molden.no> <4F1E69EC.4020408@molden.no> <4F1E6DC6.6040904@molden.no> <CAF6FJitzTKXqjMpQN2QjVLqk+mEizAuWTWgsP36q6BRy=G3XCQ@mail.gmail.com> <20120124161921.GA31456@ravage> <4F218FBB.5080301@molden.no> Message-ID: <CAF6FJivkKg=aTO5ELPLFbvX5kzd4Efr1BS5sMe4YdGyfnsQ3Eg@mail.gmail.com> On Thu, Jan 26, 2012 at 17:39, Sturla Molden <sturla at molden.no> wrote: > Den 24.01.2012 17:19, skrev David Warde-Farley: >> >> Hmm. Seeing as the width of a C long is inconsistent, does this imply that >> the random number generator will produce different results on different >> platforms? > > If it does, it is a C programming mistake. 
C code should never depend on > the exact size of a long, only it's minimum size. ?ISO C defines other > datatypes if an exact integer size is needed (include stdint.h), but > ANSI C used for NumPy does not. I think you're subtly misunderstanding his question. He's not asking if the code is written such that it semantically requires a long to have one specific size or another (and indeed, it is not). However, it is true that the code may behave differently for the same inputs on different machines with different long sizes. Namely, some part of the computation may overflow on 32-bit longs while giving an accurate answer with 64-bit longs. They just have different domains of accuracy over their inputs. It is not necessarily a mistake to take advantage of the extra room when it is available. That is the reason that Python ints are C longs and why numpy follows suit. But unfortunately, it is true that at least some of the distributions do have different behavior when using 64-bit longs than when using 32-bit longs. Here is an example of drawing from a binomial distribution with a large N on a 32-bit process and comparing it with results from a 64-bit process: [~]$ $PYBIN/python Enthought Python Distribution -- www.enthought.com Version: 7.1-2 (32-bit) Python 2.7.2 |EPD 7.1-2 (32-bit)| (default, Jul 27 2011, 13:29:32) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "packages", "demo" or "enthought" for more information. >>> import numpy as np >>> prng = np.random.RandomState(1234567890) >>> x32 = prng.binomial(500000000, 0.5, size=10000) >>> x64 = np.load('x64.npy') >>> np.save('x32.npy', x32) >>> bad = (x64 != x32) >>> bad.sum() 9449 >>> bad[-9449:] array([ True, True, True, ..., True, True, True], dtype=bool) >>> bad[-9449:].sum() 9449 binomial() is using a rejection algorithm for this set of inputs. For each draw, it is going to generate random numbers until they meet a certain condition. Both the 32-bit process and the 64-bit process draw the same exact numbers until the 552nd draw. Then, I suspect, there is an integer overflow in the 32-bit process causing the rejection algorithm to terminate either earlier or later than it otherwise should. Since the two processes have consumed different amounts of random numbers, the underlying uniform PRNG is no longer in the same state, so all of the numbers thereafter will be different. It's not clear to me how problematic this is. I haven't seen any difference when using reasonable input values (N=500000000 is a ridiculous number to be using with a binomial distribution). If I'm right that there is an overflow when using the 32-bit longs, then the results should not be trusted anyways, so there is no point in comparing them to the 64-bit results. It's just that the domain of validity with a 32-bit long is a bit smaller than when using a 64-bit long. The deviation of x32[551] from the mean is larger than the maximum deviation from the 64-bit results, so it is reasonably likely that the draw is just bogus. >>> np.max(abs(x64 - 250000000)) 44519 >>> x32[551] - 250000000 47368 Often, the acceptance criterion is something of the form (X < something) while expecting X to be positive. An integer overflow would introduce a negative value somewhere in the computation and could easily "pass" this acceptance criterion when it really shouldn't have if the intermediate computations had been done without overflow. 
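As a toy illustration only -- this is not the code from distributions.c,
just the same kind of intermediate arithmetic done at two integer widths
-- an acceptance test of that form can flip when the product wraps around
in 32 bits:

import numpy as np

n32 = np.array([500000000], dtype=np.int32)
k32 = np.array([50000], dtype=np.int32)

prod32 = (n32 * k32)[0]                   # wraps around to -1004630016
prod64 = (n32.astype(np.int64) * k32)[0]  # exact value, 25000000000000

bound = 10**13
print(prod32 < bound)   # True  -- the check "passes" spuriously
print(prod64 < bound)   # False -- with 64-bit arithmetic it does not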
If anyone wants to debug this more thoroughly, this bit of code will get the PRNG into exactly the right state to see the difference on the next binomial() draw: >>> import numpy as np >>> prng = np.random.RandomState(1234567890) >>> blah = prng.binomial(500000000, 0.5, size=551) If you run python under gdb, you can then set a breakpoint in rk_binomial_btpe() in distributions.c to step through the next call to prng.binomial(). Sometimes you can fix these issues in a rejection algorithm by checking for overflow and rejecting those cases. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From bsouthey at gmail.com Thu Jan 26 15:58:45 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 26 Jan 2012 14:58:45 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> Message-ID: <CAAea2pYFc7nmMkM0MZvWXh53Z9TFCHf3aMbQ0kM6H4haJbL9yg@mail.gmail.com> On Thu, Jan 26, 2012 at 12:45 PM, <josef.pktd at gmail.com> wrote: > On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey <bsouthey at gmail.com> wrote: >> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig >> <pierre.haessig at crans.org> wrote: >>> Le 26/01/2012 15:57, Bruce Southey a ?crit : >>>> Can you please provide a >>>> couple of real examples with expected output that clearly show what >>>> you want? >>>> >>> Hi Bruce, >>> >>> Thanks for your ticket feedback ! It's precisely because I see a big >>> potential impact of the proposed change that I send first a ML message, >>> second a ticket before jumping to a pull-request like a Sergio Leone's >>> cowboy (sorry, I watched "for a few dollars more" last weekend...) >>> >>> Now, I realize that in the ticket writing I made the wrong trade-off >>> between conciseness and accuracy which led to some of the errors you >>> raised. Let me try to use your example to try to share what I have in mind. >>> >>>> >> X = array([-2.1, -1. , ?4.3]) >>>> >> Y = array([ 3. ?, ?1.1 , ?0.12]) >>> >>> Indeed, with today's cov behavior we have a 2x2 array: >>>> >> cov(X,Y) >>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>> >>> Now, when I used the word 'concatenation', I wasn't precise enough >>> because I meant assembling X and Y in the sense of 2 vectors of >>> observations from 2 random variables X and Y. >>> This is achieved by concatenate(X,Y) *when properly playing with >>> dimensions* (which I didn't mentioned) : >>>> >> XY = np.concatenate((X[None, :], Y[None, :])) >>> array([[-2.1 , -1. ?, ?4.3 ], >>> ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) >> >> In this context, I find stacking, ?np.vstack((X,Y)), more appropriate >> than concatenate. >> >>> >>> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >>>> >> np.cov(XY) >>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>> ? ? ? ?[ -4.286 ? ? , ? 
2.14413333]]) >>> >> Sure the resulting array is the same but whole process is totally different. >> >> >>> (And indeed, the actual cov Python code does use concatenate() ) >> Yes, but the user does not see that. Whereas you are forcing the user >> to do the stacking in the correct dimensions. >> >> >>> >>> >>> Now let me come back to my assertion about this behavior *usefulness*. >>> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 >>> simple scalars blocks). >> No there are not '4' blocks just rows and columns. > > Sturla showed the 4 blocks in his first message. > Well, I could not follow that because the code is wrong. X = np.array([-2.1, -1. , 4.3]) >>> cX = X - X.mean(axis=0)[np.newaxis,:] Traceback (most recent call last): File "<pyshell#6>", line 1, in <module> cX = X - X.mean(axis=0)[np.newaxis,:] IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index whoops! Anyhow, variance-covariance matrix is symmetric but numpy or scipy lacks lapac's symmetrix matrix (http://www.netlib.org/lapack/explore-html/de/d9e/group___s_y.html) >> >>> ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes >>> to var(X) and var(Y) when setting ddof to 1) >> Sure but variances are still covariances. >> >>> ?* off diagonal blocks are symetric and are actually the covariance >>> estimate of X, Y observations (from >>> http://en.wikipedia.org/wiki/Covariance) >> Sure >>> >>> that is : >>>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) >>> -4.2860000000000005 >>> >>> The new proposed behaviour for cov is that cov(X,Y) would return : >>> array(-4.2860000000000005) ?instead of the 2*2 matrix. >> >> But how you interpret an 2D array where the rows are greater than 2? >>>>> Z=Y+X >>>>> np.cov(np.vstack((X,Y,Z))) >> array([[ 11.71 ? ? ?, ?-4.286 ? ? , ? 7.424 ? ? ], >> ? ? ? [ -4.286 ? ? , ? 2.14413333, ?-2.14186667], >> ? ? ? [ ?7.424 ? ? , ?-2.14186667, ? 5.28213333]]) >> >> >>> >>> ?* This would be in line with the cov(X,Y) mathematical definition, as >>> well as with R behavior. >> I don't care what R does because I am using Python and Python is >> infinitely better than R is! >> >> But I think that is only in the 1D case. > > I just checked R to make sure I remember correctly > >> xx = matrix((1:20)^2, nrow=4) >> xx > ? ? [,1] [,2] [,3] [,4] [,5] > [1,] ? ?1 ? 25 ? 81 ?169 ?289 > [2,] ? ?4 ? 36 ?100 ?196 ?324 > [3,] ? ?9 ? 49 ?121 ?225 ?361 > [4,] ? 16 ? 64 ?144 ?256 ?400 >> cov(xx, 2*xx[,1:2]) > ? ? ? ? [,1] ? ? ?[,2] > [1,] ?86.0000 ?219.3333 > [2,] 219.3333 ?566.0000 > [3,] 352.6667 ?912.6667 > [4,] 486.0000 1259.3333 > [5,] 619.3333 1606.0000 >> cov(xx) > ? ? ? ? [,1] ? ? [,2] ? ? ?[,3] ? ? ?[,4] ? ? ?[,5] > [1,] ?43.0000 109.6667 ?176.3333 ?243.0000 ?309.6667 > [2,] 109.6667 283.0000 ?456.3333 ?629.6667 ?803.0000 > [3,] 176.3333 456.3333 ?736.3333 1016.3333 1296.3333 > [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 > [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000 > > >> >>> ?* This would save memory and computing resources. (and therefore help >>> save the planet ;-) ) >> Nothing that you have provided shows that it will. > > I don't know about saving the planet, but if X and Y have the same > number of columns, we save 3 quarters of the calculations, as Sturla > also explained in his first message. 
> Can not figure those savings: For a 2 by 2 output has 3 covariances (so 3/4 =0.75 is 'needed' not 25%) a 3 by 3 output has 6 covariances a 5 by 5 output 15 covariances If you want to save memory and calculation then use symmetric storage and associated methods. Bruce > Josef > >> >>> >>> However, I do understand that the impact for this change may be big. >>> This indeed requires careful reviewing. >>> >>> Pierre >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> Bruce >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From hmgaudecker at gmail.com Thu Jan 26 16:12:58 2012 From: hmgaudecker at gmail.com (Hans-Martin v. Gaudecker) Date: Thu, 26 Jan 2012 22:12:58 +0100 Subject: [Numpy-discussion] Problem installing NumPy with Python 3.2.2/MacOS X 10.7.2 (Samuel John) In-Reply-To: <mailman.7386.1327611481.1086.numpy-discussion@scipy.org> References: <mailman.7386.1327611481.1086.numpy-discussion@scipy.org> Message-ID: <96DDFBB9-8332-4DE3-A2C2-AF93EE914346@gmail.com> Hi Samuel, I realised that a couple of days ago as well? Same on Python 2.7.2 (full output from both below FWIW). I usually only need a minimal subset of SciPy, so still hoping it's only in places I don't need it. Else I shall be happy to come back to your formulas, thanks for making them! Best, Hans-Martin python Python 2.7.2 (default, Jan 11 2012, 16:23:50) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import scipy sci>>> scipy.test(verbose=10) Running unit tests for scipy NumPy version 2.0.0.dev-55472ca NumPy is installed in /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/numpy SciPy version 0.11.0.dev-600e81f SciPy is installed in /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy Python version 2.7.2 (default, Jan 11 2012, 16:23:50) [GCC 4.2.1 (Based on Apple Inc. 
build 5658) (LLVM build 2335.15.00)] nose version 1.1.2 nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/fftpack/convolve.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/integrate/vode.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/interpolate/dfitpack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/interpolate/interpnd.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/io/matlab/mio5_utils.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/io/matlab/mio_utils.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/io/matlab/streams.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/blas/cblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/blas/fblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/lapack/atlas_version.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/lapack/calc_lwork.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/lapack/clapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/lib/lapack/flapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/atlas_version.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/calc_lwork.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/cblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/clapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/fblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/linalg/flapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/optimize/minpack2.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/optimize/moduleTNC.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/signal/sigtools.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/signal/spectral.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/signal/spline.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/spatial/ckdtree.so is executable; skipped nose.selector: INFO: 
/usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/spatial/qhull.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/special/lambertw.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/special/orthogonal_eval.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/special/specfun.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/stats/futil.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/stats/mvn.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/stats/statlib.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/stats/vonmises_cython.so is executable; skipped Tests cophenet(Z) on tdist data set. ... ok Tests cophenet(Z, Y) on tdist data set. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. Correspondance should be false. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. Correspondance should be false. ... ok Tests correspond(Z, y) with empty linkage and condensed distance matrix. ... ok Tests num_obs_linkage with observation matrices of multiple sizes. ... ok Tests fcluster(Z, criterion='maxclust', t=2) on a random 3-cluster data set. ... ok Tests fcluster(Z, criterion='maxclust', t=3) on a random 3-cluster data set. ... ok Tests fcluster(Z, criterion='maxclust', t=4) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=2) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=3) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=4) on a random 3-cluster data set. ... ok Tests from_mlab_linkage on empty linkage array. ... ok Tests from_mlab_linkage on linkage array with multiple rows. ... ok Tests from_mlab_linkage on linkage array with single row. ... ok Tests inconsistency matrix calculation (depth=1) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=2) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=3) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=4) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=1, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=2, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=3, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=4, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=1) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=2) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=3) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=4) on a single linkage. ... ok Tests is_isomorphic on test case #1 (one flat cluster, different labellings) ... ok Tests is_isomorphic on test case #2 (two flat clusters, different labelings) ... ok Tests is_isomorphic on test case #3 (no flat clusters) ... 
ok Tests is_isomorphic on test case #4A (3 flat clusters, different labelings, isomorphic) ... ok Tests is_isomorphic on test case #4B (3 flat clusters, different labelings, nonisomorphic) ... ok Tests is_isomorphic on test case #4C (3 flat clusters, different labelings, isomorphic) ... ok Tests is_isomorphic on test case #5A (1000 observations, 2 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5B (1000 observations, 3 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5C (1000 observations, 5 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5A (1000 observations, 2 random clusters, random permutation of the labeling, slightly nonisomorphic.) Run 3 times. ... ok Tests is_isomorphic on test case #5B (1000 observations, 3 random clusters, random permutation of the labeling, slightly nonisomorphic.) Run 3 times. ... ok Tests is_isomorphic on test case #5C (1000 observations, 5 random clusters, random permutation of the labeling, slightly non-isomorphic.) Run 3 times. ... ok Tests is_monotonic(Z) on 1x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on 2x4 linkage. Expecting False. ... ok Tests is_monotonic(Z) on 2x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 1). Expecting False. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 2). Expecting False. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 3). Expecting False ... ok Tests is_monotonic(Z) on 3x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on an empty linkage. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on Iris data set. Expecting True. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on tdist data set. Expecting True. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on tdist data set. Perturbing. Expecting False. ... ok Tests is_valid_im(R) on im over 2 observations. ... ok Tests is_valid_im(R) on im over 3 observations. ... ok Tests is_valid_im(R) with 3 columns. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3). ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link counts. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link height means. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link height standard deviations. ... ok Tests is_valid_im(R) with 5 columns. ... ok Tests is_valid_im(R) with empty inconsistency matrix. ... ok Tests is_valid_im(R) with integer type. ... ok Tests is_valid_linkage(Z) on linkage over 2 observations. ... ok Tests is_valid_linkage(Z) on linkage over 3 observations. ... ok Tests is_valid_linkage(Z) with 3 columns. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3). ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative counts. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative distances. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative indices (left). ... 
ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative indices (right). ... ok Tests is_valid_linkage(Z) with 5 columns. ... ok Tests is_valid_linkage(Z) with empty linkage. ... ok Tests is_valid_linkage(Z) with integer type. ... ok Tests leaders using a flat clustering generated by single linkage. ... ok Tests leaves_list(Z) on a 1x4 linkage. ... ok Tests leaves_list(Z) on a 2x4 linkage. ... ok Tests leaves_list(Z) on the Iris data set using average linkage. ... ok Tests leaves_list(Z) on the Iris data set using centroid linkage. ... ok Tests leaves_list(Z) on the Iris data set using complete linkage. ... ok Tests leaves_list(Z) on the Iris data set using median linkage. ... ok Tests leaves_list(Z) on the Iris data set using single linkage. ... ok Tests leaves_list(Z) on the Iris data set using ward linkage. ... ok Tests linkage(Y, 'average') on the tdist data set. ... ok Tests linkage(Y, 'centroid') on the Q data set. ... ok Tests linkage(Y, 'complete') on the Q data set. ... ok Tests linkage(Y, 'complete') on the tdist data set. ... ok Tests linkage(Y) where Y is a 0x4 linkage matrix. Exception expected. ... ok Tests linkage(Y, 'single') on the Q data set. ... ok Tests linkage(Y, 'single') on the tdist data set. ... ok Tests linkage(Y, 'weighted') on the Q data set. ... ok Tests linkage(Y, 'weighted') on the tdist data set. ... ok Tests maxdists(Z) on the Q data set using centroid linkage. ... ok Tests maxdists(Z) on the Q data set using complete linkage. ... ok Tests maxdists(Z) on the Q data set using median linkage. ... ok Tests maxdists(Z) on the Q data set using single linkage. ... ok Tests maxdists(Z) on the Q data set using Ward linkage. ... ok Tests maxdists(Z) on empty linkage. Expecting exception. ... ok Tests maxdists(Z) on linkage with one cluster. ... ok Tests maxinconsts(Z, R) on the Q data set using centroid linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using complete linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using median linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using single linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using Ward linkage. ... ok Tests maxinconsts(Z, R) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxinconsts(Z, R) on empty linkage. Expecting exception. ... ok Tests maxinconsts(Z, R) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 0) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 0) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 0) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 0) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 1) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 1) on linkage and inconsistency matrices with different numbers of clusters. 
Expecting exception. ... ok Tests maxRstat(Z, R, 1) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 1) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 2) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 2) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 2) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 2) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 3) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 3) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 3) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 3) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 3.3). Expecting exception. ... ok Tests maxRstat(Z, R, -1). Expecting exception. ... ok Tests maxRstat(Z, R, 4). Expecting exception. ... ok Tests num_obs_linkage(Z) on linkage over 2 observations. ... ok Tests num_obs_linkage(Z) on linkage over 3 observations. ... ok Tests num_obs_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3). ... ok Tests num_obs_linkage(Z) with empty linkage. ... ok Tests to_mlab_linkage on linkage array with multiple rows. ... ok Tests to_mlab_linkage on empty linkage array. ... ok Tests to_mlab_linkage on linkage array with single row. ... ok test_hierarchy.load_testing_files ... ok Ticket #505. ... ok Testing that kmeans2 init methods work. ... ok Testing simple call to kmeans2 with rank 1 data. ... ok Testing simple call to kmeans2 with rank 1 data. ... ok Testing simple call to kmeans2 and its results. ... ok Regression test for #546: fail when k arg is 0. ... ok This will cause kmean to have a cluster with no points. ... ok test_kmeans_simple (test_vq.TestKMean) ... ok test_large_features (test_vq.TestKMean) ... ok test_py_vq (test_vq.TestVq) ... ok test_py_vq2 (test_vq.TestVq) ... ok test_vq (test_vq.TestVq) ... ok Test special rank 1 vq algo, python implementation. ... ok nose.selector: INFO: /usr/local/Cellar/python/2.7.2/lib/python2.7/site-packages/scipy/cluster/tests/vq_test.py is executable; skipped test_codata.test_find ... ok test_codata.test_basic_table_parse ... ok test_codata.test_basic_lookup ... ok test_codata.test_find_all ... ok test_codata.test_find_single ... ok test_codata.test_2002_vs_2006 ... ok Check that updating stored values with exact ones worked. ... ok test_constants.test_fahrenheit_to_celcius ... ok test_constants.test_celcius_to_kelvin ... ok test_constants.test_kelvin_to_celcius ... ok test_constants.test_fahrenheit_to_kelvin ... ok test_constants.test_kelvin_to_fahrenheit ... ok test_constants.test_celcius_to_fahrenheit ... ok test_constants.test_lambda_to_nu ... ok test_constants.test_nu_to_lambda ... ok test_definition (test_basic.TestDoubleFFT) ... ok test_djbfft (test_basic.TestDoubleFFT) ... 
ok test_n_argument_real (test_basic.TestDoubleFFT) ... ok test_definition (test_basic.TestDoubleIFFT) ... FAIL test_definition_real (test_basic.TestDoubleIFFT) ... ok test_djbfft (test_basic.TestDoubleIFFT) ... FAIL test_random_complex (test_basic.TestDoubleIFFT) ... python(30168) malloc: *** error for object 0x104cdce88: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug Abort trap: 6 Python 3.2.2 (default, Jan 11 2012, 16:48:20) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import scipy >>> scipy.test(verbose=10) Running unit tests for scipy NumPy version 2.0.0.dev-55472ca NumPy is installed in /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/numpy SciPy version 0.11.0.dev-600e81f SciPy is installed in /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy Python version 3.2.2 (default, Jan 11 2012, 16:48:20) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] nose version 1.1.2 nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$'] nose.config: INFO: Excluding tests matching ['f2py_ext', 'f2py_f90_ext', 'gen_ext', 'pyrex_ext', 'swig_ext'] nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/fftpack/convolve.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/integrate/vode.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/interpolate/dfitpack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/interpolate/interpnd.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/io/matlab/mio5_utils.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/io/matlab/mio_utils.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/io/matlab/streams.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/blas/cblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/blas/fblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/lapack/atlas_version.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/lapack/calc_lwork.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/lapack/clapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/lib/lapack/flapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/atlas_version.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/calc_lwork.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/cblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/clapack.so is executable; skipped 
nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/fblas.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/linalg/flapack.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/optimize/minpack2.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/optimize/moduleTNC.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/signal/sigtools.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/signal/spectral.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/signal/spline.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/spatial/ckdtree.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/spatial/qhull.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/special/lambertw.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/special/orthogonal_eval.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/special/specfun.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/stats/futil.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/stats/mvn.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/stats/statlib.so is executable; skipped nose.selector: INFO: /usr/local/Cellar/python3/3.2.2/lib/python3.2/site-packages/scipy/stats/vonmises_cython.so is executable; skipped Tests cophenet(Z) on tdist data set. ... ok Tests cophenet(Z, Y) on tdist data set. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. Correspondance should be false. ... ok Tests correspond(Z, y) on linkage and CDMs over observation sets of different sizes. Correspondance should be false. ... ok Tests correspond(Z, y) with empty linkage and condensed distance matrix. ... ok Tests num_obs_linkage with observation matrices of multiple sizes. ... ok Tests fcluster(Z, criterion='maxclust', t=2) on a random 3-cluster data set. ... ok Tests fcluster(Z, criterion='maxclust', t=3) on a random 3-cluster data set. ... ok Tests fcluster(Z, criterion='maxclust', t=4) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=2) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=3) on a random 3-cluster data set. ... ok Tests fclusterdata(X, criterion='maxclust', t=4) on a random 3-cluster data set. ... ok Tests from_mlab_linkage on empty linkage array. ... ok Tests from_mlab_linkage on linkage array with multiple rows. ... ok Tests from_mlab_linkage on linkage array with single row. ... ok Tests inconsistency matrix calculation (depth=1) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=2) on a complete linkage. ... 
ok Tests inconsistency matrix calculation (depth=3) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=4) on a complete linkage. ... ok Tests inconsistency matrix calculation (depth=1, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=2, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=3, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=4, dataset=Q) with single linkage. ... ok Tests inconsistency matrix calculation (depth=1) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=2) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=3) on a single linkage. ... ok Tests inconsistency matrix calculation (depth=4) on a single linkage. ... ok Tests is_isomorphic on test case #1 (one flat cluster, different labellings) ... ok Tests is_isomorphic on test case #2 (two flat clusters, different labelings) ... ok Tests is_isomorphic on test case #3 (no flat clusters) ... ok Tests is_isomorphic on test case #4A (3 flat clusters, different labelings, isomorphic) ... ok Tests is_isomorphic on test case #4B (3 flat clusters, different labelings, nonisomorphic) ... ok Tests is_isomorphic on test case #4C (3 flat clusters, different labelings, isomorphic) ... ok Tests is_isomorphic on test case #5A (1000 observations, 2 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5B (1000 observations, 3 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5C (1000 observations, 5 random clusters, random permutation of the labeling). Run 3 times. ... ok Tests is_isomorphic on test case #5A (1000 observations, 2 random clusters, random permutation of the labeling, slightly nonisomorphic.) Run 3 times. ... ok Tests is_isomorphic on test case #5B (1000 observations, 3 random clusters, random permutation of the labeling, slightly nonisomorphic.) Run 3 times. ... ok Tests is_isomorphic on test case #5C (1000 observations, 5 random clusters, random permutation of the labeling, slightly non-isomorphic.) Run 3 times. ... ok Tests is_monotonic(Z) on 1x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on 2x4 linkage. Expecting False. ... ok Tests is_monotonic(Z) on 2x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 1). Expecting False. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 2). Expecting False. ... ok Tests is_monotonic(Z) on 3x4 linkage (case 3). Expecting False ... ok Tests is_monotonic(Z) on 3x4 linkage. Expecting True. ... ok Tests is_monotonic(Z) on an empty linkage. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on Iris data set. Expecting True. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on tdist data set. Expecting True. ... ok Tests is_monotonic(Z) on clustering generated by single linkage on tdist data set. Perturbing. Expecting False. ... ok Tests is_valid_im(R) on im over 2 observations. ... ok Tests is_valid_im(R) on im over 3 observations. ... ok Tests is_valid_im(R) with 3 columns. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3). ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link counts. ... ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link height means. ... 
ok Tests is_valid_im(R) on im on observation sets between sizes 4 and 15 (step size 3) with negative link height standard deviations. ... ok Tests is_valid_im(R) with 5 columns. ... ok Tests is_valid_im(R) with empty inconsistency matrix. ... ok Tests is_valid_im(R) with integer type. ... ok Tests is_valid_linkage(Z) on linkage over 2 observations. ... ok Tests is_valid_linkage(Z) on linkage over 3 observations. ... ok Tests is_valid_linkage(Z) with 3 columns. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3). ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative counts. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative distances. ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative indices (left). ... ok Tests is_valid_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3) with negative indices (right). ... ok Tests is_valid_linkage(Z) with 5 columns. ... ok Tests is_valid_linkage(Z) with empty linkage. ... ok Tests is_valid_linkage(Z) with integer type. ... ok Tests leaders using a flat clustering generated by single linkage. ... ok Tests leaves_list(Z) on a 1x4 linkage. ... ok Tests leaves_list(Z) on a 2x4 linkage. ... ok Tests leaves_list(Z) on the Iris data set using average linkage. ... ok Tests leaves_list(Z) on the Iris data set using centroid linkage. ... ok Tests leaves_list(Z) on the Iris data set using complete linkage. ... ok Tests leaves_list(Z) on the Iris data set using median linkage. ... ok Tests leaves_list(Z) on the Iris data set using single linkage. ... ok Tests leaves_list(Z) on the Iris data set using ward linkage. ... ok Tests linkage(Y, 'average') on the tdist data set. ... ok Tests linkage(Y, 'centroid') on the Q data set. ... ok Tests linkage(Y, 'complete') on the Q data set. ... ok Tests linkage(Y, 'complete') on the tdist data set. ... ok Tests linkage(Y) where Y is a 0x4 linkage matrix. Exception expected. ... ok Tests linkage(Y, 'single') on the Q data set. ... ok Tests linkage(Y, 'single') on the tdist data set. ... ok Tests linkage(Y, 'weighted') on the Q data set. ... ok Tests linkage(Y, 'weighted') on the tdist data set. ... ok Tests maxdists(Z) on the Q data set using centroid linkage. ... ok Tests maxdists(Z) on the Q data set using complete linkage. ... ok Tests maxdists(Z) on the Q data set using median linkage. ... ok Tests maxdists(Z) on the Q data set using single linkage. ... ok Tests maxdists(Z) on the Q data set using Ward linkage. ... ok Tests maxdists(Z) on empty linkage. Expecting exception. ... ok Tests maxdists(Z) on linkage with one cluster. ... ok Tests maxinconsts(Z, R) on the Q data set using centroid linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using complete linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using median linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using single linkage. ... ok Tests maxinconsts(Z, R) on the Q data set using Ward linkage. ... ok Tests maxinconsts(Z, R) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxinconsts(Z, R) on empty linkage. Expecting exception. ... ok Tests maxinconsts(Z, R) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 0) on the Q data set using centroid linkage. ... 
ok Tests maxRstat(Z, R, 0) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 0) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 0) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 0) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 0) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 1) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 1) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 1) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 1) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 1) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 2) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 2) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 2) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 2) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 2) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 3) on the Q data set using centroid linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using complete linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using median linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using single linkage. ... ok Tests maxRstat(Z, R, 3) on the Q data set using Ward linkage. ... ok Tests maxRstat(Z, R, 3) on linkage and inconsistency matrices with different numbers of clusters. Expecting exception. ... ok Tests maxRstat(Z, R, 3) on empty linkage. Expecting exception. ... ok Tests maxRstat(Z, R, 3) on linkage with one cluster. ... ok Tests maxRstat(Z, R, 3.3). Expecting exception. ... ok Tests maxRstat(Z, R, -1). Expecting exception. ... ok Tests maxRstat(Z, R, 4). Expecting exception. ... ok Tests num_obs_linkage(Z) on linkage over 2 observations. ... ok Tests num_obs_linkage(Z) on linkage over 3 observations. ... ok Tests num_obs_linkage(Z) on linkage on observation sets between sizes 4 and 15 (step size 3). ... ok Tests num_obs_linkage(Z) with empty linkage. ... ok Tests to_mlab_linkage on linkage array with multiple rows. ... ok Tests to_mlab_linkage on empty linkage array. ... ok Tests to_mlab_linkage on linkage array with single row. ... ok test_hierarchy.load_testing_files ... ok Ticket #505. ... ok Testing that kmeans2 init methods work. ... ok Testing simple call to kmeans2 with rank 1 data. ... ok Testing simple call to kmeans2 with rank 1 data. ... ok Testing simple call to kmeans2 and its results. ... ok Regression test for #546: fail when k arg is 0. ... ok This will cause kmean to have a cluster with no points. ... ok test_kmeans_simple (test_vq.TestKMean) ... ok test_large_features (test_vq.TestKMean) ... ok test_py_vq (test_vq.TestVq) ... 
ok test_py_vq2 (test_vq.TestVq) ... ok test_vq (test_vq.TestVq) ... ok Test special rank 1 vq algo, python implementation. ... ok test_codata.test_find ... ok test_codata.test_basic_table_parse ... ok test_codata.test_basic_lookup ... ok test_codata.test_find_all ... ok test_codata.test_find_single ... ok test_codata.test_2002_vs_2006 ... ok Check that updating stored values with exact ones worked. ... ok test_constants.test_fahrenheit_to_celcius ... ok test_constants.test_celcius_to_kelvin ... ok test_constants.test_kelvin_to_celcius ... ok test_constants.test_fahrenheit_to_kelvin ... ok test_constants.test_kelvin_to_fahrenheit ... ok test_constants.test_celcius_to_fahrenheit ... ok test_constants.test_lambda_to_nu ... ok test_constants.test_nu_to_lambda ... ok test_definition (test_basic.TestDoubleFFT) ... ok test_djbfft (test_basic.TestDoubleFFT) ... ok test_n_argument_real (test_basic.TestDoubleFFT) ... ok test_definition (test_basic.TestDoubleIFFT) ... python3(30179) malloc: *** error for object 0x1050ae058: incorrect checksum for freed object - object was probably modified after being freed. *** set a breakpoint in malloc_error_break to debug Abort trap: 6 > Date: Thu, 26 Jan 2012 19:01:22 +0100 > From: Samuel John <scipy at samueljohn.de> > Subject: Re: [Numpy-discussion] Problem installing NumPy with Python > 3.2.2/MacOS X 10.7.2 > To: Discussion of Numerical Python <numpy-discussion at scipy.org> > Message-ID: <AD984E1A-D359-4BA0-BDD4-81826454EFC2 at samueljohn.de> > Content-Type: text/plain; charset=us-ascii > > Hi Hans-Martin! > > You could try my instructions recently posted to this list http://thread.gmane.org/gmane.comp.python.scientific.devel/15956/ > Basically, using llvm-gcc scipy segfaults when scipy.test() (on my system at least). > > Therefore, I created the homebrew install formula. > They work for whatever "which python" you have. But I have tested this for 2.7.2 on MacOS X 10.7.2. > > Samuel > > > On 11.01.2012, at 16:12, Hans-Martin v. Gaudecker wrote: > >> I recently upgraded to Lion and just faced the same problem with both Python 2.7.2 and Python 3.2.2 installed via the python.org installers. My hunch is that the errors are related to the fact that Apple dropped gcc-4.2 from XCode 4.2. I got gcc-4.2 via [1] then, still the same error -- who knows what else got lost in that upgrade... Previous successful builds with gcc-4.2 might have been with XCode 4.1 (or 4.2 installed on top of it). >> >> In the end I decided to re-install both Python versions via homebrew, nicely described here [2] and everything seems to work fine using LLVM. Test outputs for NumPy master under 2.7.2 and 3.2.2 are below in case they are of interest. >> >> Best, >> Hans-Martin >> >> [1] https://github.com/kennethreitz/osx-gcc-installer >> [2] http://www.thisisthegreenroom.com/2011/installing-python-numpy-scipy-matplotlib-and-ipython-on-lion/#numpy > > The instructions at [2] lead to a segfault in scipy.test() for me, because it used llvm-gcc (which is the default on Lion). 
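For anyone trying to narrow down the same crash, one generic approach (a sketch, not taken from any of the posts above) is to print the build configuration and re-run only the fftpack tests, since that is where the abort shows up in the logs:

    import numpy
    import scipy.fftpack

    numpy.__config__.show()          # shows which BLAS/LAPACK the build picked up
    scipy.fftpack.test(verbose=10)   # run just the fftpack tests instead of the full suite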
From josef.pktd at gmail.com Thu Jan 26 19:43:14 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 26 Jan 2012 19:43:14 -0500 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAAea2pYFc7nmMkM0MZvWXh53Z9TFCHf3aMbQ0kM6H4haJbL9yg@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> <CAAea2pYFc7nmMkM0MZvWXh53Z9TFCHf3aMbQ0kM6H4haJbL9yg@mail.gmail.com> Message-ID: <CAMMTP+CnUiQiAOwOyMkeQMB4rRo478iWvnUazS+Cp1RiZDrFuQ@mail.gmail.com> On Thu, Jan 26, 2012 at 3:58 PM, Bruce Southey <bsouthey at gmail.com> wrote: > On Thu, Jan 26, 2012 at 12:45 PM, ?<josef.pktd at gmail.com> wrote: >> On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey <bsouthey at gmail.com> wrote: >>> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig >>> <pierre.haessig at crans.org> wrote: >>>> Le 26/01/2012 15:57, Bruce Southey a ?crit : >>>>> Can you please provide a >>>>> couple of real examples with expected output that clearly show what >>>>> you want? >>>>> >>>> Hi Bruce, >>>> >>>> Thanks for your ticket feedback ! It's precisely because I see a big >>>> potential impact of the proposed change that I send first a ML message, >>>> second a ticket before jumping to a pull-request like a Sergio Leone's >>>> cowboy (sorry, I watched "for a few dollars more" last weekend...) >>>> >>>> Now, I realize that in the ticket writing I made the wrong trade-off >>>> between conciseness and accuracy which led to some of the errors you >>>> raised. Let me try to use your example to try to share what I have in mind. >>>> >>>>> >> X = array([-2.1, -1. , ?4.3]) >>>>> >> Y = array([ 3. ?, ?1.1 , ?0.12]) >>>> >>>> Indeed, with today's cov behavior we have a 2x2 array: >>>>> >> cov(X,Y) >>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>>> >>>> Now, when I used the word 'concatenation', I wasn't precise enough >>>> because I meant assembling X and Y in the sense of 2 vectors of >>>> observations from 2 random variables X and Y. >>>> This is achieved by concatenate(X,Y) *when properly playing with >>>> dimensions* (which I didn't mentioned) : >>>>> >> XY = np.concatenate((X[None, :], Y[None, :])) >>>> array([[-2.1 , -1. ?, ?4.3 ], >>>> ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) >>> >>> In this context, I find stacking, ?np.vstack((X,Y)), more appropriate >>> than concatenate. >>> >>>> >>>> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >>>>> >> np.cov(XY) >>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>>> >>> Sure the resulting array is the same but whole process is totally different. >>> >>> >>>> (And indeed, the actual cov Python code does use concatenate() ) >>> Yes, but the user does not see that. Whereas you are forcing the user >>> to do the stacking in the correct dimensions. >>> >>> >>>> >>>> >>>> Now let me come back to my assertion about this behavior *usefulness*. >>>> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 >>>> simple scalars blocks). >>> No there are not '4' blocks just rows and columns. >> >> Sturla showed the 4 blocks in his first message. 
>> > Well, I could not follow that because the code is wrong. > X = np.array([-2.1, -1. , ?4.3]) >>>> cX = X - X.mean(axis=0)[np.newaxis,:] > > Traceback (most recent call last): > ?File "<pyshell#6>", line 1, in <module> > ? ?cX = X - X.mean(axis=0)[np.newaxis,:] > IndexError: 0-d arrays can only use a single () or a list of newaxes > (and a single ...) as an index > ?whoops! > > Anyhow, variance-covariance matrix is symmetric but numpy or scipy > lacks ?lapac's symmetrix matrix > (http://www.netlib.org/lapack/explore-html/de/d9e/group___s_y.html) > >>> >>>> ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes >>>> to var(X) and var(Y) when setting ddof to 1) >>> Sure but variances are still covariances. >>> >>>> ?* off diagonal blocks are symetric and are actually the covariance >>>> estimate of X, Y observations (from >>>> http://en.wikipedia.org/wiki/Covariance) >>> Sure >>>> >>>> that is : >>>>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) >>>> -4.2860000000000005 >>>> >>>> The new proposed behaviour for cov is that cov(X,Y) would return : >>>> array(-4.2860000000000005) ?instead of the 2*2 matrix. >>> >>> But how you interpret an 2D array where the rows are greater than 2? >>>>>> Z=Y+X >>>>>> np.cov(np.vstack((X,Y,Z))) >>> array([[ 11.71 ? ? ?, ?-4.286 ? ? , ? 7.424 ? ? ], >>> ? ? ? [ -4.286 ? ? , ? 2.14413333, ?-2.14186667], >>> ? ? ? [ ?7.424 ? ? , ?-2.14186667, ? 5.28213333]]) >>> >>> >>>> >>>> ?* This would be in line with the cov(X,Y) mathematical definition, as >>>> well as with R behavior. >>> I don't care what R does because I am using Python and Python is >>> infinitely better than R is! >>> >>> But I think that is only in the 1D case. >> >> I just checked R to make sure I remember correctly >> >>> xx = matrix((1:20)^2, nrow=4) >>> xx >> ? ? [,1] [,2] [,3] [,4] [,5] >> [1,] ? ?1 ? 25 ? 81 ?169 ?289 >> [2,] ? ?4 ? 36 ?100 ?196 ?324 >> [3,] ? ?9 ? 49 ?121 ?225 ?361 >> [4,] ? 16 ? 64 ?144 ?256 ?400 >>> cov(xx, 2*xx[,1:2]) >> ? ? ? ? [,1] ? ? ?[,2] >> [1,] ?86.0000 ?219.3333 >> [2,] 219.3333 ?566.0000 >> [3,] 352.6667 ?912.6667 >> [4,] 486.0000 1259.3333 >> [5,] 619.3333 1606.0000 >>> cov(xx) >> ? ? ? ? [,1] ? ? [,2] ? ? ?[,3] ? ? ?[,4] ? ? ?[,5] >> [1,] ?43.0000 109.6667 ?176.3333 ?243.0000 ?309.6667 >> [2,] 109.6667 283.0000 ?456.3333 ?629.6667 ?803.0000 >> [3,] 176.3333 456.3333 ?736.3333 1016.3333 1296.3333 >> [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 >> [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000 >> >> >>> >>>> ?* This would save memory and computing resources. (and therefore help >>>> save the planet ;-) ) >>> Nothing that you have provided shows that it will. >> >> I don't know about saving the planet, but if X and Y have the same >> number of columns, we save 3 quarters of the calculations, as Sturla >> also explained in his first message. >> > Can not figure those savings: > For a 2 by 2 output has 3 covariances (so 3/4 =0.75 is 'needed' not 25%) > a 3 by 3 ?output has 6 covariances > a 5 by 5 output 15 covariances what numpy calculates are 4, 9 and 25 covariances, we might care only about 1, 2 and 4 of them. > > If you want to save memory and calculation then use symmetric storage > and associated methods. actually for covariance matrix we stilll need to subtract means, so we won't save 75%, but we save 75% in the cross-product. 
suppose X and Y are (nobs, k_x) and (nobs, k_y) (means already subtracted) (and ignoring that numpy "likes" rows instead of columns) the partitioned dot product [X,Y]'[X,Y] is [[ X'X, X'Y], [Y'X, Y'Y]] X'Y is (n_x, n_y) total shape is (n_x + n_y, n_x + n_y) If we are only interested in X'Y, we don't need the other three submatrices. If n_x = 99 and n_y is 1, we save .... ? (we get a (99,1) instead of a (100, 100) matrix) and X'Y , np.dot(X, Y), doesn't have any duplicated symmetry, so exploiting symmetry is a different issue. Josef > > Bruce > >> Josef >> >>> >>>> >>>> However, I do understand that the impact for this change may be big. >>>> This indeed requires careful reviewing. >>>> >>>> Pierre >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> Bruce >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bsouthey at gmail.com Thu Jan 26 21:45:49 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 26 Jan 2012 20:45:49 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CAMMTP+CnUiQiAOwOyMkeQMB4rRo478iWvnUazS+Cp1RiZDrFuQ@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <4F217A33.6020806@crans.org> <CAAea2pY-_mV=R2fGP6DNFs8h75Dh8F3XM=BeX_8En90ifa_BRQ@mail.gmail.com> <CAMMTP+A6UKxBdnpp5+FPpOu=Um8GRk=2TVEW9Pns9wDrQ9QRHg@mail.gmail.com> <CAAea2pYFc7nmMkM0MZvWXh53Z9TFCHf3aMbQ0kM6H4haJbL9yg@mail.gmail.com> <CAMMTP+CnUiQiAOwOyMkeQMB4rRo478iWvnUazS+Cp1RiZDrFuQ@mail.gmail.com> Message-ID: <CAAea2paQ3hGgtaroKiZ9MX_P=EEJAz_eoSozZ2TFR-0BnonKdQ@mail.gmail.com> On Thu, Jan 26, 2012 at 6:43 PM, <josef.pktd at gmail.com> wrote: > On Thu, Jan 26, 2012 at 3:58 PM, Bruce Southey <bsouthey at gmail.com> wrote: >> On Thu, Jan 26, 2012 at 12:45 PM, ?<josef.pktd at gmail.com> wrote: >>> On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey <bsouthey at gmail.com> wrote: >>>> On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig >>>> <pierre.haessig at crans.org> wrote: >>>>> Le 26/01/2012 15:57, Bruce Southey a ?crit : >>>>>> Can you please provide a >>>>>> couple of real examples with expected output that clearly show what >>>>>> you want? >>>>>> >>>>> Hi Bruce, >>>>> >>>>> Thanks for your ticket feedback ! It's precisely because I see a big >>>>> potential impact of the proposed change that I send first a ML message, >>>>> second a ticket before jumping to a pull-request like a Sergio Leone's >>>>> cowboy (sorry, I watched "for a few dollars more" last weekend...) >>>>> >>>>> Now, I realize that in the ticket writing I made the wrong trade-off >>>>> between conciseness and accuracy which led to some of the errors you >>>>> raised. Let me try to use your example to try to share what I have in mind. 
>>>>> >>>>>> >> X = array([-2.1, -1. , ?4.3]) >>>>>> >> Y = array([ 3. ?, ?1.1 , ?0.12]) >>>>> >>>>> Indeed, with today's cov behavior we have a 2x2 array: >>>>>> >> cov(X,Y) >>>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>>>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>>>> >>>>> Now, when I used the word 'concatenation', I wasn't precise enough >>>>> because I meant assembling X and Y in the sense of 2 vectors of >>>>> observations from 2 random variables X and Y. >>>>> This is achieved by concatenate(X,Y) *when properly playing with >>>>> dimensions* (which I didn't mentioned) : >>>>>> >> XY = np.concatenate((X[None, :], Y[None, :])) >>>>> array([[-2.1 , -1. ?, ?4.3 ], >>>>> ? ? ? ?[ 3. ?, ?1.1 , ?0.12]]) >>>> >>>> In this context, I find stacking, ?np.vstack((X,Y)), more appropriate >>>> than concatenate. >>>> >>>>> >>>>> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >>>>>> >> np.cov(XY) >>>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? ], >>>>> ? ? ? ?[ -4.286 ? ? , ? 2.14413333]]) >>>>> >>>> Sure the resulting array is the same but whole process is totally different. >>>> >>>> >>>>> (And indeed, the actual cov Python code does use concatenate() ) >>>> Yes, but the user does not see that. Whereas you are forcing the user >>>> to do the stacking in the correct dimensions. >>>> >>>> >>>>> >>>>> >>>>> Now let me come back to my assertion about this behavior *usefulness*. >>>>> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 >>>>> simple scalars blocks). >>>> No there are not '4' blocks just rows and columns. >>> >>> Sturla showed the 4 blocks in his first message. >>> >> Well, I could not follow that because the code is wrong. >> X = np.array([-2.1, -1. , ?4.3]) >>>>> cX = X - X.mean(axis=0)[np.newaxis,:] >> >> Traceback (most recent call last): >> ?File "<pyshell#6>", line 1, in <module> >> ? ?cX = X - X.mean(axis=0)[np.newaxis,:] >> IndexError: 0-d arrays can only use a single () or a list of newaxes >> (and a single ...) as an index >> ?whoops! >> >> Anyhow, variance-covariance matrix is symmetric but numpy or scipy >> lacks ?lapac's symmetrix matrix >> (http://www.netlib.org/lapack/explore-html/de/d9e/group___s_y.html) >> >>>> >>>>> ?* diagonal blocks are just cov(X) and cov(Y) (which in this case comes >>>>> to var(X) and var(Y) when setting ddof to 1) >>>> Sure but variances are still covariances. >>>> >>>>> ?* off diagonal blocks are symetric and are actually the covariance >>>>> estimate of X, Y observations (from >>>>> http://en.wikipedia.org/wiki/Covariance) >>>> Sure >>>>> >>>>> that is : >>>>>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) >>>>> -4.2860000000000005 >>>>> >>>>> The new proposed behaviour for cov is that cov(X,Y) would return : >>>>> array(-4.2860000000000005) ?instead of the 2*2 matrix. >>>> >>>> But how you interpret an 2D array where the rows are greater than 2? >>>>>>> Z=Y+X >>>>>>> np.cov(np.vstack((X,Y,Z))) >>>> array([[ 11.71 ? ? ?, ?-4.286 ? ? , ? 7.424 ? ? ], >>>> ? ? ? [ -4.286 ? ? , ? 2.14413333, ?-2.14186667], >>>> ? ? ? [ ?7.424 ? ? , ?-2.14186667, ? 5.28213333]]) >>>> >>>> >>>>> >>>>> ?* This would be in line with the cov(X,Y) mathematical definition, as >>>>> well as with R behavior. >>>> I don't care what R does because I am using Python and Python is >>>> infinitely better than R is! >>>> >>>> But I think that is only in the 1D case. >>> >>> I just checked R to make sure I remember correctly >>> >>>> xx = matrix((1:20)^2, nrow=4) >>>> xx >>> ? ? [,1] [,2] [,3] [,4] [,5] >>> [1,] ? ?1 ? 25 ? 
81 ?169 ?289 >>> [2,] ? ?4 ? 36 ?100 ?196 ?324 >>> [3,] ? ?9 ? 49 ?121 ?225 ?361 >>> [4,] ? 16 ? 64 ?144 ?256 ?400 >>>> cov(xx, 2*xx[,1:2]) >>> ? ? ? ? [,1] ? ? ?[,2] >>> [1,] ?86.0000 ?219.3333 >>> [2,] 219.3333 ?566.0000 >>> [3,] 352.6667 ?912.6667 >>> [4,] 486.0000 1259.3333 >>> [5,] 619.3333 1606.0000 >>>> cov(xx) >>> ? ? ? ? [,1] ? ? [,2] ? ? ?[,3] ? ? ?[,4] ? ? ?[,5] >>> [1,] ?43.0000 109.6667 ?176.3333 ?243.0000 ?309.6667 >>> [2,] 109.6667 283.0000 ?456.3333 ?629.6667 ?803.0000 >>> [3,] 176.3333 456.3333 ?736.3333 1016.3333 1296.3333 >>> [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 >>> [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000 >>> >>> >>>> >>>>> ?* This would save memory and computing resources. (and therefore help >>>>> save the planet ;-) ) >>>> Nothing that you have provided shows that it will. >>> >>> I don't know about saving the planet, but if X and Y have the same >>> number of columns, we save 3 quarters of the calculations, as Sturla >>> also explained in his first message. >>> >> Can not figure those savings: >> For a 2 by 2 output has 3 covariances (so 3/4 =0.75 is 'needed' not 25%) >> a 3 by 3 ?output has 6 covariances >> a 5 by 5 output 15 covariances > > what numpy calculates are 4, 9 and 25 covariances, we might care only > about 1, 2 and 4 of them. > >> >> If you want to save memory and calculation then use symmetric storage >> and associated methods. > > actually for covariance matrix we stilll need to subtract means, so we > won't save 75%, but we save 75% in the cross-product. > > suppose X and Y are (nobs, k_x) and (nobs, k_y) ? (means already subtracted) > (and ignoring that numpy "likes" rows instead of columns) > > the partitioned dot product ?[X,Y]'[X,Y] is > > [[ X'X, X'Y], > ?[Y'X, Y'Y]] > > X'Y is (n_x, n_y) > total shape is (n_x + n_y, n_x + n_y) > > If we are only interested in X'Y, we don't need the other three submatrices. > > If n_x = 99 and n_y is 1, we save .... ? > (we get a (99,1) instead of a (100, 100) matrix) > > and X'Y , np.dot(X, Y), doesn't have any duplicated symmetry, so > exploiting symmetry is a different issue. > > Josef > >> >> Bruce >> >>> Josef >>> >>>> >>>>> >>>>> However, I do understand that the impact for this change may be big. >>>>> This indeed requires careful reviewing. >>>>> >>>>> Pierre >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> Bruce >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks for someone to clearly state what they want. But still lacks evidence that it will save the world - when nobs is large, n_x and n_y are meaningless and thus (99,1) vs (100, 100) is also meaningless. Further dealing separately with the two arrays also bring additional overhead - small not zero. 
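To make the blocks being discussed concrete, here is a small illustration (a sketch only, using np.cov's default rowvar=True convention; the shapes and names are made up):

    import numpy as np

    x = np.random.randn(3, 10)   # 3 "X" variables, 10 observations (rows are variables)
    y = np.random.randn(2, 10)   # 2 "Y" variables, same 10 observations

    full = np.cov(x, y)          # (3+2) x (3+2) matrix: all four blocks are computed
    cross = full[:3, 3:]         # the (3, 2) cross-covariance block

    # the cross block alone, without forming the full matrix
    xc = x - x.mean(axis=1)[:, np.newaxis]
    yc = y - y.mean(axis=1)[:, np.newaxis]
    cross_direct = np.dot(xc, yc.T) / (10 - 1)

    print(np.allclose(cross, cross_direct))   # True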
Bruce From wcmclen at gmail.com Fri Jan 27 00:09:09 2012 From: wcmclen at gmail.com (William McLendon) Date: Thu, 26 Jan 2012 22:09:09 -0700 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) Message-ID: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> Hi, I am trying to install NumPy (using numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has 32-bit Python 2.7 installed on it using the latest installer (python-2.7.2.msi). Python is installed into the default location, C:\Python27, and as far as I can tell the registry knows about it -- or at least the windows uninstaller in the control panel does... The installation fails because the NumPy installer cannot find the Python installation. I am then prompted with a screen that should allow me to type in the location of my python installation, but the text-boxes where I should type this do not allow input so I'm kind of stuck. I did look into trying to build from source, but I don't have a C compiler on this system so setup.py died a horrible death. I'd prefer to avoid having to install Visual C++ Express on this system. Does anyone have any suggestions that might be helpful? Thanks! -William -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120126/e22023cf/attachment.html> From kalatsky at gmail.com Fri Jan 27 00:29:32 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Thu, 26 Jan 2012 23:29:32 -0600 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) In-Reply-To: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> References: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> Message-ID: <CAE8bXEmvriDkh43ct11r===iZ-LHxLDJKr5jZ90WC6-nWfTdDg@mail.gmail.com> To avoid all the hassle I suggest getting EPD: http://enthought.com/products/epd.php You'd get way more than just NumPy, which may or may not be what you need. I have installed various NumPy's on linux only and from source only which did require compilation (gcc), so I am not a good help for your setup. On the hand, I've done multiple EPD installations on various platforms and never had problems. Val On Thu, Jan 26, 2012 at 11:09 PM, William McLendon <wcmclen at gmail.com>wrote: > Hi, > > I am trying to install NumPy (using > numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has > 32-bit Python 2.7 installed on it using the latest installer > (python-2.7.2.msi). Python is installed into the default location, > C:\Python27, and as far as I can tell the registry knows about it -- or at > least the windows uninstaller in the control panel does... > > The installation fails because the NumPy installer cannot find the Python > installation. I am then prompted with a screen that should allow me to > type in the location of my python installation, but the text-boxes where I > should type this do not allow input so I'm kind of stuck. > > I did look into trying to build from source, but I don't have a C compiler > on this system so setup.py died a horrible death. I'd prefer to avoid > having to install Visual C++ Express on this system. > > Does anyone have any suggestions that might be helpful? > > Thanks! 
> -William
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120126/e3598028/attachment.html>

From pierre.haessig at crans.org Fri Jan 27 05:09:53 2012
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Fri, 27 Jan 2012 11:09:53 +0100
Subject: [Numpy-discussion] Cross-covariance function
In-Reply-To: <CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com>
References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com>
	<CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com>
	<CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com>
	<4F2152D6.10303@crans.org>
	<CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com>
	<jfrsnq$6i8$1@dough.gmane.org> <4F217E82.2050002@crans.org>
	<4F218CC8.9070602@molden.no>
	<CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com>
Message-ID: <4F2277F1.4050407@crans.org>

Le 26/01/2012 19:19, josef.pktd at gmail.com a écrit :
> The discussion had this reversed, numpy matches the behavior of
> MATLAB, while R (statistics) only returns the cross covariance part as
> proposed.
>
I would also say that there was an attempt to match MATLAB behavior.
However, there is a big difference with numpy.cov because of the default
value `rowvar` being True. Most software packages and textbooks I know
consider that, in a 2D context, matrix rows are observations while columns
are the variables.

Any idea why the "transposed" convention was selected in np.cov ?
(This question, I'm raising for informative purposes only... ;-) )

I also compared with octave to see how it works :
-- Function File: cov (X, Y)
     Compute covariance.

     If each row of X and Y is an observation and each column is a
     variable, the (I, J)-th entry of `cov (X, Y)' is the covariance
     between the I-th variable in X and the J-th variable in Y. If
     called with one argument, compute `cov (X, X)'.

(http://www.gnu.org/software/octave/doc/interpreter/Correlation-and-Regression-Analysis.html)
I like the clear tone of this description. But strangely enough, this is a
bit different from Matlab.
(http://webcache.googleusercontent.com/search?q=cache:L3kF8BHcB4EJ:octave.1599824.n4.nabble.com/cov-m-function-behaves-different-from-Matlab-td1634956.html+&cd=1&hl=fr&ct=clnk&client=iceweasel-a)

> If there is a new xcov, then I think there should also be a xcorrcoef.
> This case needs a different implementation than corrcoef, since the
> xcov doesn't contain the variances and they need to be calculated
> separately.
Adding xcorrcoeff as well would make sense. Using np.var with the `axis`
and `ddof` arguments set to appropriate values should then bring the
variances needed for the normalization.

In the end, if adding xcov is the path of least resistance, this may be
the way to go. What do people think ?
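For concreteness, a rough sketch of what such an xcov could look like (the name, the default ddof and the rows-are-variables convention are placeholders here, not an agreed API):

    import numpy as np

    def xcov(x, y, ddof=1):
        """Cross-covariance block between the variables in x and in y (sketch only).

        Rows are variables and columns are observations, as in np.cov's default.
        Returns the (n_xvars, n_yvars) block instead of the full matrix that
        np.cov(x, y) builds.
        """
        x = np.atleast_2d(np.asarray(x, dtype=float))
        y = np.atleast_2d(np.asarray(y, dtype=float))
        n = x.shape[1]
        xc = x - x.mean(axis=1)[:, np.newaxis]
        yc = y - y.mean(axis=1)[:, np.newaxis]
        return np.dot(xc, yc.T) / (n - ddof)

With the example used earlier in the thread:

    >>> X = np.array([-2.1, -1. ,  4.3])
    >>> Y = np.array([ 3. ,  1.1 ,  0.12])
    >>> xcov(X, Y)
    array([[-4.286]])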
Pierre From shish at keba.be Fri Jan 27 06:55:24 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 27 Jan 2012 06:55:24 -0500 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) In-Reply-To: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> References: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> Message-ID: <CAFXk4brwS9_fOGf+0exdK2cB4gfQougOdJGvCAOxniJdRASuuQ@mail.gmail.com> It seems weird that it wouldn't work, as this is a pretty standard setup. Here's a few ideas of things to check: - Double-check it's really 32 bit Python (checking sys.maxint) - Is there another Python installation that may cause some conflicts? - Did you download the numpy superpack from the official website? - Reboot Unlikely to be helpful, but I can't think of something else right now :/ -=- Olivier 2012/1/27 William McLendon <wcmclen at gmail.com> > Hi, > > I am trying to install NumPy (using > numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has > 32-bit Python 2.7 installed on it using the latest installer > (python-2.7.2.msi). Python is installed into the default location, > C:\Python27, and as far as I can tell the registry knows about it -- or at > least the windows uninstaller in the control panel does... > > The installation fails because the NumPy installer cannot find the Python > installation. I am then prompted with a screen that should allow me to > type in the location of my python installation, but the text-boxes where I > should type this do not allow input so I'm kind of stuck. > > I did look into trying to build from source, but I don't have a C compiler > on this system so setup.py died a horrible death. I'd prefer to avoid > having to install Visual C++ Express on this system. > > Does anyone have any suggestions that might be helpful? > > Thanks! > -William > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/35b6a90f/attachment.html> From wcmclen at gmail.com Fri Jan 27 08:26:33 2012 From: wcmclen at gmail.com (William McLendon) Date: Fri, 27 Jan 2012 06:26:33 -0700 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) In-Reply-To: <CAFXk4brwS9_fOGf+0exdK2cB4gfQougOdJGvCAOxniJdRASuuQ@mail.gmail.com> References: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> <CAFXk4brwS9_fOGf+0exdK2cB4gfQougOdJGvCAOxniJdRASuuQ@mail.gmail.com> Message-ID: <CAA3_nJW09Rk63Cr66qdRaZGWjcJhPGsOtOAgMREws2vL6N7G1A@mail.gmail.com> Yup, it's 32-bit python: Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. >>> I've only got one python instance installed here :D Here's where I got the numpy installer, http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/, as far as I can tell this should be the right place. Python has been installed on this system for a while and it's been rebooted numerous times, I can't imagine that it wouldn't be there. Matplotlib's installer had no trouble finding Python. Thanks! -William On Fri, Jan 27, 2012 at 4:55 AM, Olivier Delalleau <shish at keba.be> wrote: > It seems weird that it wouldn't work, as this is a pretty standard setup. 
> Here's a few ideas of things to check: > - Double-check it's really 32 bit Python (checking sys.maxint) > - Is there another Python installation that may cause some conflicts? > - Did you download the numpy superpack from the official website? > - Reboot > > Unlikely to be helpful, but I can't think of something else right now :/ > > -=- Olivier > > 2012/1/27 William McLendon <wcmclen at gmail.com> > >> Hi, >> >> I am trying to install NumPy (using >> numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has >> 32-bit Python 2.7 installed on it using the latest installer >> (python-2.7.2.msi). Python is installed into the default location, >> C:\Python27, and as far as I can tell the registry knows about it -- or at >> least the windows uninstaller in the control panel does... >> >> The installation fails because the NumPy installer cannot find the Python >> installation. I am then prompted with a screen that should allow me to >> type in the location of my python installation, but the text-boxes where I >> should type this do not allow input so I'm kind of stuck. >> >> I did look into trying to build from source, but I don't have a C >> compiler on this system so setup.py died a horrible death. I'd prefer to >> avoid having to install Visual C++ Express on this system. >> >> Does anyone have any suggestions that might be helpful? >> >> Thanks! >> -William >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/c800c411/attachment.html> From chaoyuejoy at gmail.com Fri Jan 27 08:52:55 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 27 Jan 2012 14:52:55 +0100 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? Message-ID: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> Dear all, suppose I have a ndarray a: In [66]: a Out[66]: array([0, 1, 2, 3, 4]) how can use it as 5X1 array without doing a=a.reshape(5,1)? thanks Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/a118b884/attachment.html> From d.s.seljebotn at astro.uio.no Fri Jan 27 08:56:38 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 27 Jan 2012 14:56:38 +0100 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? In-Reply-To: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> References: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> Message-ID: <4F22AD16.3090800@astro.uio.no> On 01/27/2012 02:52 PM, Chao YUE wrote: > Dear all, > > suppose I have a ndarray a: > > In [66]: a > Out[66]: array([0, 1, 2, 3, 4]) > > how can use it as 5X1 array without doing a=a.reshape(5,1)? 
a[:, np.newaxis] a[:, None] np.newaxis is None Dag From paul.anton.letnes at gmail.com Fri Jan 27 09:28:56 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Fri, 27 Jan 2012 15:28:56 +0100 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? In-Reply-To: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> References: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> Message-ID: <702C9277-475E-4949-9A3C-EE3B54A411C0@gmail.com> On 27. jan. 2012, at 14:52, Chao YUE wrote: > Dear all, > > suppose I have a ndarray a: > > In [66]: a > Out[66]: array([0, 1, 2, 3, 4]) > > how can use it as 5X1 array without doing a=a.reshape(5,1)? Several ways, this is one, although not much simpler. In [6]: a Out[6]: array([0, 1, 2, 3, 4]) In [7]: a.shape = 5, 1 In [8]: a Out[8]: array([[0], [1], [2], [3], [4]]) Paul From tsyu80 at gmail.com Fri Jan 27 09:36:46 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Fri, 27 Jan 2012 09:36:46 -0500 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? In-Reply-To: <702C9277-475E-4949-9A3C-EE3B54A411C0@gmail.com> References: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> <702C9277-475E-4949-9A3C-EE3B54A411C0@gmail.com> Message-ID: <CAEym_HqJvtz7b3eEMDSA5CZQaudt68m4QhqM=+51JNR1_69RFw@mail.gmail.com> On Fri, Jan 27, 2012 at 9:28 AM, Paul Anton Letnes < paul.anton.letnes at gmail.com> wrote: > > On 27. jan. 2012, at 14:52, Chao YUE wrote: > > > Dear all, > > > > suppose I have a ndarray a: > > > > In [66]: a > > Out[66]: array([0, 1, 2, 3, 4]) > > > > how can use it as 5X1 array without doing a=a.reshape(5,1)? > > Several ways, this is one, although not much simpler. > In [6]: a > Out[6]: array([0, 1, 2, 3, 4]) > > In [7]: a.shape = 5, 1 > > In [8]: a > Out[8]: > array([[0], > [1], > [2], > [3], > [4]]) > > Paul > > I'm assuming your issue with that call to reshape is that you need to know the dimensions beforehand. An alternative is to call: >>> a.reshape(-1, 1) The "-1" allows numpy to "infer" the length based on the given sizes. Another alternative is: >>> a[:, np.newaxis] -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/f49ea0d0/attachment.html> From ben.root at ou.edu Fri Jan 27 10:00:02 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 27 Jan 2012 09:00:02 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F2277F1.4050407@crans.org> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <jfrsnq$6i8$1@dough.gmane.org> <4F217E82.2050002@crans.org> <4F218CC8.9070602@molden.no> <CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com> <4F2277F1.4050407@crans.org> Message-ID: <CANNq6Fmdx7OJvv2e4Nz=FOde+1hr7pyfans058Ye+hXjiX5kEA@mail.gmail.com> On Friday, January 27, 2012, Pierre Haessig <pierre.haessig at crans.org> wrote: > Le 26/01/2012 19:19, josef.pktd at gmail.com a ?crit : >> The discussion had this reversed, numpy matches the behavior of >> MATLAB, while R (statistics) only returns the cross covariance part as >> proposed. >> > I would also say that there was an attempt to match MATLAB behavior. 
> However, there is big difference with numpy.cov because of the default > value `rowvar` being True. Most softwares and textbooks I know consider > that, in a 2D context, matrix rows are obvervations while columns are > the variables. > > Any idea why the "transposed" convention was selected in np.cov ? > (This question, I'm raising for informative purpose only... ;-) ) > > I also compared with octave to see how it works : > -- Function File: cov (X, Y) > Compute covariance. > > If each row of X and Y is an observation and each column is a > variable, the (I, J)-th entry of `cov (X, Y)' is the covariance > between the I-th variable in X and the J-th variable in Y. If > called with one argument, compute `cov (X, X)'. > > ( http://www.gnu.org/software/octave/doc/interpreter/Correlation-and-Regression-Analysis.html ) > I like the clear tone of this description. But strangely enough, this a > bit different from Matlab. > ( http://webcache.googleusercontent.com/search?q=cache:L3kF8BHcB4EJ:octave.1599824.n4.nabble.com/cov-m-function-behaves-different-from-Matlab-td1634956.html+&cd=1&hl=fr&ct=clnk&client=iceweasel-a ) > >> If there is a new xcov, then I think there should also be a xcorrcoef. >> This case needs a different implementation than corrcoef, since the >> xcov doesn't contain the variances and they need to be calculated >> separately. > Adding xcorrcoeff as well would make sense. The use of the np.var when > setting the `axis` and `??ddof` arguments to appropriate values should the > bring variances needed for the normalization. > > In the end, if adding xcov is the path of least resistance, this may be > the way to go. What do people think ? > > Pierre > My vote is for xcov() and xcorrcoeff(). It won't break compatibility, and the name of the function makes it clear what it does. It would also make sense to add "seealso" references to each other in the docstrings. The documentation for xcov() should also make it clear the differences between cov() and xcov() with examples and show how to get equivalent results using just cov() for those with older versions of numpy. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/04c1e9a3/attachment.html> From chaoyuejoy at gmail.com Fri Jan 27 10:45:49 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Fri, 27 Jan 2012 16:45:49 +0100 Subject: [Numpy-discussion] how to cite 1Xn array as nX1 array? In-Reply-To: <CAEym_HqJvtz7b3eEMDSA5CZQaudt68m4QhqM=+51JNR1_69RFw@mail.gmail.com> References: <CAAN-aREJKhU9AevzOgqqF9WptL+ZW02yuB_CPQU-aOLSupT-xA@mail.gmail.com> <702C9277-475E-4949-9A3C-EE3B54A411C0@gmail.com> <CAEym_HqJvtz7b3eEMDSA5CZQaudt68m4QhqM=+51JNR1_69RFw@mail.gmail.com> Message-ID: <CAAN-aRFnj7byMJ2Jpy31NON0ViT8bCyRq+zAb2w87D0GYZUN=g@mail.gmail.com> Thanks all. chao 2012/1/27 Tony Yu <tsyu80 at gmail.com> > > > On Fri, Jan 27, 2012 at 9:28 AM, Paul Anton Letnes < > paul.anton.letnes at gmail.com> wrote: > >> >> On 27. jan. 2012, at 14:52, Chao YUE wrote: >> >> > Dear all, >> > >> > suppose I have a ndarray a: >> > >> > In [66]: a >> > Out[66]: array([0, 1, 2, 3, 4]) >> > >> > how can use it as 5X1 array without doing a=a.reshape(5,1)? >> >> Several ways, this is one, although not much simpler. 
>> In [6]: a >> Out[6]: array([0, 1, 2, 3, 4]) >> >> In [7]: a.shape = 5, 1 >> >> In [8]: a >> Out[8]: >> array([[0], >> [1], >> [2], >> [3], >> [4]]) >> >> Paul >> >> > I'm assuming your issue with that call to reshape is that you need to know > the dimensions beforehand. An alternative is to call: > > >>> a.reshape(-1, 1) > > The "-1" allows numpy to "infer" the length based on the given sizes. > > Another alternative is: > > >>> a[:, np.newaxis] > > -Tony > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/47996868/attachment.html> From bsouthey at gmail.com Fri Jan 27 11:28:53 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 27 Jan 2012 10:28:53 -0600 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <CANNq6Fmdx7OJvv2e4Nz=FOde+1hr7pyfans058Ye+hXjiX5kEA@mail.gmail.com> References: <CAGGi21akfzcHm+vJeFwyzVwvu7tvZspC1bggRAxZY4ieGxSh+Q@mail.gmail.com> <CAJtbx90k3tOEbrjA+oX_g1G=RTd2GuP_-jd1ovpMUnk2ALfN6Q@mail.gmail.com> <CAMMTP+Ah4CBbrpV=bx+QgRYx9UEcKVZDg+WGdn77zFFf6XUTQQ@mail.gmail.com> <4F2152D6.10303@crans.org> <CAAea2pZkpzX4wgvAy_ZgQyJJCxryFZGTnOK9+dc0FJMYLKc+SA@mail.gmail.com> <jfrsnq$6i8$1@dough.gmane.org> <4F217E82.2050002@crans.org> <4F218CC8.9070602@molden.no> <CAMMTP+B4LUXUHJovDudOjVL0MtHCwS=24Ohk3E3g-XBLwtf4tA@mail.gmail.com> <4F2277F1.4050407@crans.org> <CANNq6Fmdx7OJvv2e4Nz=FOde+1hr7pyfans058Ye+hXjiX5kEA@mail.gmail.com> Message-ID: <4F22D0C5.2050807@gmail.com> On 01/27/2012 09:00 AM, Benjamin Root wrote: > > > On Friday, January 27, 2012, Pierre Haessig <pierre.haessig at crans.org > <mailto:pierre.haessig at crans.org>> wrote: > > Le 26/01/2012 19:19, josef.pktd at gmail.com > <mailto:josef.pktd at gmail.com> a ?crit : > >> The discussion had this reversed, numpy matches the behavior of > >> MATLAB, while R (statistics) only returns the cross covariance part as > >> proposed. > >> > > I would also say that there was an attempt to match MATLAB behavior. > > However, there is big difference with numpy.cov because of the default > > value `rowvar` being True. Most softwares and textbooks I know consider > > that, in a 2D context, matrix rows are obvervations while columns are > > the variables. > > > > Any idea why the "transposed" convention was selected in np.cov ? > > (This question, I'm raising for informative purpose only... ;-) ) > > > > I also compared with octave to see how it works : > > -- Function File: cov (X, Y) > > Compute covariance. > > > > If each row of X and Y is an observation and each column is a > > variable, the (I, J)-th entry of `cov (X, Y)' is the covariance > > between the I-th variable in X and the J-th variable in Y. If > > called with one argument, compute `cov (X, X)'. > > > > > (http://www.gnu.org/software/octave/doc/interpreter/Correlation-and-Regression-Analysis.html) > > I like the clear tone of this description. But strangely enough, this a > > bit different from Matlab. 
> > > (http://webcache.googleusercontent.com/search?q=cache:L3kF8BHcB4EJ:octave.1599824.n4.nabble.com/cov-m-function-behaves-different-from-Matlab-td1634956.html+&cd=1&hl=fr&ct=clnk&client=iceweasel-a > <http://webcache.googleusercontent.com/search?q=cache:L3kF8BHcB4EJ:octave.1599824.n4.nabble.com/cov-m-function-behaves-different-from-Matlab-td1634956.html+&cd=1&hl=fr&ct=clnk&client=iceweasel-a>) > > > >> If there is a new xcov, then I think there should also be a xcorrcoef. > >> This case needs a different implementation than corrcoef, since the > >> xcov doesn't contain the variances and they need to be calculated > >> separately. > > Adding xcorrcoeff as well would make sense. The use of the np.var when > > setting the `axis` and `??ddof` arguments to appropriate values > should the > > bring variances needed for the normalization. > > > > In the end, if adding xcov is the path of least resistance, this may be > > the way to go. What do people think ? > > > > Pierre > > > > My vote is for xcov() and xcorrcoeff(). It won't break compatibility, > and the name of the function makes it clear what it does. It would > also make sense to add "seealso" references to each other in the > docstrings. The documentation for xcov() should also make it clear > the differences between cov() and xcov() with examples and show how to > get equivalent results using just cov() for those with older versions > of numpy. > > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -1 because these are too close to cross-correlation as used by signal processing. The output is still a covariance so do we really need yet another set of very similar functions to maintain? Or can we get away with a new keyword? If speed really matters to you guys then surely moving np.cov into C would have more impact on 'saving the world' than this proposal. That also ignores algorithm used ( http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance). Actually np.cov also is deficient in that it does not have the dtype argument so it is prone to numerical precision errors (especially getting the mean of the array). Probably should be a ticket... Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/a3bad208/attachment.html> From shish at keba.be Fri Jan 27 12:28:53 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 27 Jan 2012 12:28:53 -0500 Subject: [Numpy-discussion] need advice on installing NumPy onto a Windows 7 with Python2.7 (32-bit) In-Reply-To: <CAA3_nJW09Rk63Cr66qdRaZGWjcJhPGsOtOAgMREws2vL6N7G1A@mail.gmail.com> References: <CAA3_nJV8=N8EW6VCjOLARRs5uDjvcs-WyFPqbjeKsHACkJsWqg@mail.gmail.com> <CAFXk4brwS9_fOGf+0exdK2cB4gfQougOdJGvCAOxniJdRASuuQ@mail.gmail.com> <CAA3_nJW09Rk63Cr66qdRaZGWjcJhPGsOtOAgMREws2vL6N7G1A@mail.gmail.com> Message-ID: <CAFXk4bq1M2dn13Zx2+7Lk4ErNFxDcUJyB6VAzZykxWWkDCBJwA@mail.gmail.com> Sorry then, I'm afraid I'm out of (simple ideas). Out of curiosity, I tried to install Python 2.7.2 and numpy 1.6.1 on a Windows 7 computer and it worked just fine, so it must be something with your specific setup... -=- Olivier 2012/1/27 William McLendon <wcmclen at gmail.com> > Yup, it's 32-bit python: > Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] > on win32 > Type "copyright", "credits" or "license()" for more information. 
> >>> > > I've only got one python instance installed here :D > > Here's where I got the numpy installer, > http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/, as far as I can > tell this should be the right place. > > Python has been installed on this system for a while and it's been > rebooted numerous times, I can't imagine that it wouldn't be there. > Matplotlib's installer had no trouble finding Python. > > Thanks! > -William > > > > > > On Fri, Jan 27, 2012 at 4:55 AM, Olivier Delalleau <shish at keba.be> wrote: > >> It seems weird that it wouldn't work, as this is a pretty standard setup. >> Here's a few ideas of things to check: >> - Double-check it's really 32 bit Python (checking sys.maxint) >> - Is there another Python installation that may cause some conflicts? >> - Did you download the numpy superpack from the official website? >> - Reboot >> >> Unlikely to be helpful, but I can't think of something else right now :/ >> >> -=- Olivier >> >> 2012/1/27 William McLendon <wcmclen at gmail.com> >> >>> Hi, >>> >>> I am trying to install NumPy (using >>> numpy-1.6.1-win32-superpack-python2.7) on a Windows 7 machine that has >>> 32-bit Python 2.7 installed on it using the latest installer >>> (python-2.7.2.msi). Python is installed into the default location, >>> C:\Python27, and as far as I can tell the registry knows about it -- or at >>> least the windows uninstaller in the control panel does... >>> >>> The installation fails because the NumPy installer cannot find the >>> Python installation. I am then prompted with a screen that should allow me >>> to type in the location of my python installation, but the text-boxes where >>> I should type this do not allow input so I'm kind of stuck. >>> >>> I did look into trying to build from source, but I don't have a C >>> compiler on this system so setup.py died a horrible death. I'd prefer to >>> avoid having to install Visual C++ Express on this system. >>> >>> Does anyone have any suggestions that might be helpful? >>> >>> Thanks! >>> -William >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/4e9ddf0e/attachment.html> From b.telenczuk at biologie.hu-berlin.de Fri Jan 27 15:46:06 2012 From: b.telenczuk at biologie.hu-berlin.de (Bartosz Telenczuk) Date: Fri, 27 Jan 2012 21:46:06 +0100 Subject: [Numpy-discussion] preferred way of testing empty arrays Message-ID: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> I have been using numpy for several years and I am very impressed with its flexibility. However, there is one problem that has always bothered me. Quite often I need to test consistently whether a variable is any of the following: an empty list, an empty array or None. Since both arrays and lists are ordered sequences I usually allow for both, and convert if necessary. 
However, when the (optional) argument is an empty list/array or None, I skip its processing and do nothing. Now, how should I test for 'emptiness'? PEP8 recommends: For sequences, (strings, lists, tuples), use the fact that empty sequences are false. >> seq = [] >> if not seq: ... print 'Hello' It works for empty numpy arrays: >> a = np.array(seq) >> if not a: ... print 'Hello" Hello but if 'a' is non-empty it raises an exception: >> a = np.array([1,2]) >> if not a: ... print 'Hello" ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() One solution is to test lengths: >> if len(seq) > 0: .... ... >> if len(a) > 0: ... ... but for None it fails again: >> opt = None >> if len(opt): ... TypeError: object of type 'NoneType' has no len() even worse we can not test for None, because it will fail if someone accidentally wraps None in an array: >> a = np.array(opt) >> if opt is not None: ... print 'hello' hello Although this behaviour is expected, it may be very confusing and it easily leads to errors. Even worse it adds unnecessary complexity in the code, because arrays, lists and None have to be handled differently. I hoped the I managed to explain the problem well. Is there a recommended way to test for empty arrays? Cheers, Bartosz From ben.root at ou.edu Fri Jan 27 15:57:52 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 27 Jan 2012 14:57:52 -0600 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> Message-ID: <CANNq6FnmUzN0SNf11YAKLDNJE20e_Ajjf-6BFrMBc9uwmeoR2g@mail.gmail.com> On Fri, Jan 27, 2012 at 2:46 PM, Bartosz Telenczuk < b.telenczuk at biologie.hu-berlin.de> wrote: > I have been using numpy for several years and I am very impressed with its > flexibility. However, there is one problem that has always bothered me. > > Quite often I need to test consistently whether a variable is any of the > following: an empty list, an empty array or None. Since both arrays and > lists are ordered sequences I usually allow for both, and convert if > necessary. However, when the (optional) argument is an empty list/array or > None, I skip its processing and do nothing. > > Now, how should I test for 'emptiness'? > > PEP8 recommends: > > For sequences, (strings, lists, tuples), use the fact that empty sequences > are false. > > >> seq = [] > >> if not seq: > ... print 'Hello' > > It works for empty numpy arrays: > > >> a = np.array(seq) > >> if not a: > ... print 'Hello" > Hello > > but if 'a' is non-empty it raises an exception: > > >> a = np.array([1,2]) > >> if not a: > ... print 'Hello" > ValueError: The truth value of an array with more than one element is > ambiguous. Use a.any() or a.all() > > One solution is to test lengths: > > >> if len(seq) > 0: > .... ... > >> if len(a) > 0: > ... ... > > but for None it fails again: > > >> opt = None > >> if len(opt): > ... > TypeError: object of type 'NoneType' has no len() > > even worse we can not test for None, because it will fail if someone > accidentally wraps None in an array: > > >> a = np.array(opt) > >> if opt is not None: > ... print 'hello' > hello > > Although this behaviour is expected, it may be very confusing and it > easily leads to errors. Even worse it adds unnecessary complexity in the > code, because arrays, lists and None have to be handled differently. > > I hoped the I managed to explain the problem well. 
Is there a recommended > way to test for empty arrays? > > Cheers, > > Bartosz > > Don't know if it is recommended, but this is used frequently within matplotlib: if np.prod(a.shape) == 0 : print "Is Empty!" Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/c15099ef/attachment.html> From robert.kern at gmail.com Fri Jan 27 15:59:03 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Jan 2012 20:59:03 +0000 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> Message-ID: <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> On Fri, Jan 27, 2012 at 20:46, Bartosz Telenczuk <b.telenczuk at biologie.hu-berlin.de> wrote: > I have been using numpy for several years and I am very impressed with its flexibility. However, there is one problem that has always bothered me. > > Quite often I need to test consistently whether a variable is any of the following: an empty list, an empty array or None. Since both arrays and lists are ordered sequences I usually allow for both, and convert if necessary. However, when the (optional) argument is an empty list/array or None, ?I skip its processing and do nothing. > > Now, how should I test for 'emptiness'? > > PEP8 recommends: > > For sequences, (strings, lists, tuples), use the fact that empty sequences are false. > >>> seq = [] >>> if not seq: > ... ? ?print 'Hello' > > It works for empty numpy arrays: > >>> a = np.array(seq) >>> if not a: > ... ? ? print 'Hello" > Hello > > but if 'a' is non-empty it raises an exception: > >>> a = np.array([1,2]) >>> if not a: > ... ? ? print 'Hello" > ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() > > One solution is to test lengths: > >>> if len(seq) > 0: > .... ? ?... >>> if len(a) > 0: > ... ? ? ... > > but for None it fails again: > >>> opt = None >>> if len(opt): > ... > TypeError: object of type 'NoneType' has no len() > > even worse we can not test for None, because it will fail if someone accidentally wraps None in an array: > >>> a = np.array(opt) >>> if opt is not None: > ... ? ? ?print 'hello' > hello > > Although this behaviour is expected, it may be very confusing and it easily leads to errors. Even worse it adds unnecessary complexity in the code, because arrays, lists and None have to be handled differently. > > I hoped the I managed to explain the problem well. Is there a recommended way to test for empty arrays? [~] |5> x = np.zeros([0]) [~] |6> x array([], dtype=float64) [~] |7> x.size == 0 True Note that checking for len(x) will fail for some empty arrays: [~] |8> x = np.zeros([10, 0]) [~] |9> x.size == 0 True [~] |10> len(x) 10 There is no way to test all of the cases (empty sequence, empty array, None) in the same way. Usually, it's a bad idea to conflate the three. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From emayssat at gmail.com Fri Jan 27 16:17:36 2012 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Fri, 27 Jan 2012 13:17:36 -0800 Subject: [Numpy-discussion] bug in array instanciation? 
Message-ID: <CACB6ZmB-b6uSKdDeFojmS4Y7hEodyV=CVhGcr2ttApcv+Hpqvw@mail.gmail.com> In [20]: dt_knobs = [('pvName',(str,40)),('start','float'),('stop','float'),('mode',(str,10))] In [21]: r_knobs = np.recarray([],dtype=dt_knobs) In [22]: r_knobs Out[22]: rec.array(('\xa0\x8c\xc9\x02\x00\x00\x00\x00(\xc8v\x02\x00\x00\x00\x00\x00\xd3\x86\x02\x00\x00\x00\x00\x10\xdeJ\x02\x00\x00\x00\x00\x906\xb9\x02', 1.63e-322, 1.351330465085e-312, '\x90\xc6\xa3\x02\x00\x00\x00\x00P'), dtype=[('pvName', '|S40'), ('start', '<f8'), ('stop', '<f8'), ('mode', '|S10')]) why is the array not empty? -- E From howard at renci.org Fri Jan 27 16:18:06 2012 From: howard at renci.org (Howard) Date: Fri, 27 Jan 2012 16:18:06 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question Message-ID: <4F23148E.80902@renci.org> Hi all I am a fairly recent convert to python and I have got a question that's got me stumped. I hope this is the right mailing list: here goes :) I am reading some time series data out of a netcdf file a single timestep at a time. If the data is NaN, I want to reset it to the minimum of the dataset over all timesteps (which I already know). The data is in a variable of type numpy.ma.core.MaskedArray called modelData. If I do this: for i in range(len(modelData)): if math.isnan(modelData[i]): modelData[i] = dataMin I get the effect I want, If I do this: modelData[np.isnan(modelData)] = dataMin it doesn't seem to be working. Of course I could just do the first one, but len(modelData) is about 3.5 million, and it's taking about 20 seconds to run. This is happening inside of a rendering loop, so I'd like it to be as fast as possible, and I thought the second one might be faster, and maybe it is, but it doesn't seem to be working! :) Any ideas would be much appreciated. Thanks Howard -- Howard Lander <mailto:howard at renci.org> Senior Research Software Developer Renaissance Computing Institute (RENCI) <http://www.renci.org> The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/2576f2bc/attachment.html> From robert.kern at gmail.com Fri Jan 27 16:22:31 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Jan 2012 21:22:31 +0000 Subject: [Numpy-discussion] bug in array instanciation? In-Reply-To: <CACB6ZmB-b6uSKdDeFojmS4Y7hEodyV=CVhGcr2ttApcv+Hpqvw@mail.gmail.com> References: <CACB6ZmB-b6uSKdDeFojmS4Y7hEodyV=CVhGcr2ttApcv+Hpqvw@mail.gmail.com> Message-ID: <CAF6FJiu_m7HQwgh8EFZTM8gcFRiQmT+Dm7Ca+1dyqwWKjCbYLg@mail.gmail.com> On Fri, Jan 27, 2012 at 21:17, Emmanuel Mayssat <emayssat at gmail.com> wrote: > In [20]: dt_knobs = > [('pvName',(str,40)),('start','float'),('stop','float'),('mode',(str,10))] > > In [21]: r_knobs = np.recarray([],dtype=dt_knobs) > > In [22]: r_knobs > Out[22]: > rec.array(('\xa0\x8c\xc9\x02\x00\x00\x00\x00(\xc8v\x02\x00\x00\x00\x00\x00\xd3\x86\x02\x00\x00\x00\x00\x10\xdeJ\x02\x00\x00\x00\x00\x906\xb9\x02', > 1.63e-322, 1.351330465085e-312, '\x90\xc6\xa3\x02\x00\x00\x00\x00P'), > ? ? ?dtype=[('pvName', '|S40'), ('start', '<f8'), ('stop', '<f8'), > ('mode', '|S10')]) > > why is the array not empty? The shape [] creates a rank-0 array, which is essentially a scalar. 
[~] |1> x = np.array(10) [~] |2> x array(10) [~] |3> x.shape () If you want an empty array, you need at least one dimension of size 0: [~] |7> r_knobs = np.recarray([0], dtype=dt_knobs) [~] |8> r_knobs rec.array([], dtype=[('pvName', '|S40'), ('start', '<f8'), ('stop', '<f8'), ('mode', '|S10')]) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From b.telenczuk at biologie.hu-berlin.de Fri Jan 27 16:24:52 2012 From: b.telenczuk at biologie.hu-berlin.de (Bartosz Telenczuk) Date: Fri, 27 Jan 2012 22:24:52 +0100 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> Message-ID: <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> Thank you for your tips. I was not aware of the possible problems with len. > There is no way to test all of the cases (empty sequence, empty array, > None) in the same way. Usually, it's a bad idea to conflate the three. I agree that this should be avoided. However, there are cases in which it is not possible or hard. My case is that I get some extra data to add to my plots from a database. The dataset may be undefined (which means None), empty array or empty list. In all cases the data should not be plotted. If I want to test for all the cases, my program becomes quite complex. In fact, Python provides False values for most empty objects, but NumPy seems to ignore this. It might be a good idea to have a helper function which handles all objects consistently. Yours, Bartosz From robert.kern at gmail.com Fri Jan 27 16:29:39 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Jan 2012 21:29:39 +0000 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> Message-ID: <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> On Fri, Jan 27, 2012 at 21:24, Bartosz Telenczuk <b.telenczuk at biologie.hu-berlin.de> wrote: > Thank you for your tips. I was not aware of the possible problems with len. > >> There is no way to test all of the cases (empty sequence, empty array, >> None) in the same way. Usually, it's a bad idea to conflate the three. > > I agree that this should be avoided. However, there are cases in which it is not possible or hard. My case is that I get some extra data to add to my plots from a database. The dataset may be undefined (which means None), empty array or empty list. In all cases the data should not be plotted. If I want to test for all the cases, my program becomes quite complex. Well, if you really need to do this in more than one place, define a utility function and call it a day. def should_not_plot(x): if x is None: return True elif isinstance(x, np.ndarray): return x.size == 0 else: return bool(x) > In fact, Python provides False values for most empty objects, but NumPy seems to ignore this. It might be a good idea to have a helper function which handles all objects consistently. 
np.asarray(x).size == 0 None should rarely be treated the same as an empty list or a 0-size array, so that should be left to application-specific code. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From b.telenczuk at biologie.hu-berlin.de Fri Jan 27 16:37:32 2012 From: b.telenczuk at biologie.hu-berlin.de (Bartosz Telenczuk) Date: Fri, 27 Jan 2012 22:37:32 +0100 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> Message-ID: <FDDAFAA6-B8DF-40A1-AB9B-8DB4E183E212@biologie.hu-berlin.de> This will be indeed very helpful. Thanks. > Well, if you really need to do this in more than one place, define a > utility function and call it a day. > > def should_not_plot(x): > if x is None: > return True > elif isinstance(x, np.ndarray): > return x.size == 0 > else: > return bool(x) Bartosz From shish at keba.be Fri Jan 27 16:42:59 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 27 Jan 2012 16:42:59 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F23148E.80902@renci.org> References: <4F23148E.80902@renci.org> Message-ID: <CAFXk4bpk0H==Sph4RKyS-8SfYhkYFtvMo3GSbVbu3eyDaGrY6w@mail.gmail.com> What are the types and shapes of modelData and dataMin? (it works for me with modelData a (3, 4) numpy array and dataMin a Python float, with numpy 1.6.1) -=- Olivier 2012/1/27 Howard <howard at renci.org> > Hi all > > I am a fairly recent convert to python and I have got a question that's > got me stumped. I hope this is the right mailing list: here goes :) > > I am reading some time series data out of a netcdf file a single timestep > at a time. If the data is NaN, I want to reset it to the minimum of the > dataset over all timesteps (which I already know). The data is in a > variable of type numpy.ma.core.MaskedArray called modelData. > > If I do this: > > for i in range(len(modelData)): > if math.isnan(modelData[i]): > modelData[i] = dataMin > > I get the effect I want, If I do this: > > modelData[np.isnan(modelData)] = dataMin > > it doesn't seem to be working. Of course I could just do the first one, > but len(modelData) is about 3.5 million, and it's taking about 20 seconds > to run. This is happening inside of a rendering loop, so I'd like it to be > as fast as possible, and I thought the second one might be faster, and > maybe it is, but it doesn't seem to be working! :) > > Any ideas would be much appreciated. > > Thanks > Howard > > -- > Howard Lander <howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/0cde807b/attachment.html> From howard at renci.org Fri Jan 27 16:54:13 2012 From: howard at renci.org (Howard) Date: Fri, 27 Jan 2012 16:54:13 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <CAFXk4bpk0H==Sph4RKyS-8SfYhkYFtvMo3GSbVbu3eyDaGrY6w@mail.gmail.com> References: <4F23148E.80902@renci.org> <CAFXk4bpk0H==Sph4RKyS-8SfYhkYFtvMo3GSbVbu3eyDaGrY6w@mail.gmail.com> Message-ID: <4F231D05.7090705@renci.org> Hi Olivier I added this to the code: print "modelData:", type(modelData), modelData.shape, modelData.size print "dataMin:", type(dataMin) and got modelData: <class 'numpy.ma.core.MaskedArray'> (1767734,) 1767734 dataMin: <type 'float'> What's funny is I tried the example from http://docs.scipy.org/doc/numpy-1.6.0/numpy-user.pdf and it works fine for me. Maybe 1.7 million is over some threshhold? Thanks Howard >>> myarr = np.ma.core.MaskedArray([1., 0., np.nan, 3.]) >>> myarr[np.isnan(myarr)] = 30 >>> myarr masked_array(data = [ 1. 0. 30. 3.], mask = False, fill_value = 1e+20) On 1/27/12 4:42 PM, Olivier Delalleau wrote: > What are the types and shapes of modelData and dataMin? (it works for > me with modelData a (3, 4) numpy array and dataMin a Python float, > with numpy 1.6.1) > > -=- Olivier > > 2012/1/27 Howard <howard at renci.org <mailto:howard at renci.org>> > > Hi all > > I am a fairly recent convert to python and I have got a question > that's got me stumped. I hope this is the right mailing list: > here goes :) > > I am reading some time series data out of a netcdf file a single > timestep at a time. If the data is NaN, I want to reset it to the > minimum of the dataset over all timesteps (which I already know). > The data is in a variable of type numpy.ma.core.MaskedArray called > modelData. > > If I do this: > > for i in range(len(modelData)): > if math.isnan(modelData[i]): > modelData[i] = dataMin > > I get the effect I want, If I do this: > > modelData[np.isnan(modelData)] = dataMin > > it doesn't seem to be working. Of course I could just do the > first one, but len(modelData) is about 3.5 million, and it's > taking about 20 seconds to run. This is happening inside of a > rendering loop, so I'd like it to be as fast as possible, and I > thought the second one might be faster, and maybe it is, but it > doesn't seem to be working! :) > > Any ideas would be much appreciated. > > Thanks > Howard > > -- > Howard Lander <mailto:howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Howard Lander <mailto:howard at renci.org> Senior Research Software Developer Renaissance Computing Institute (RENCI) <http://www.renci.org> The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/35c576ec/attachment.html> From howard at renci.org Fri Jan 27 16:58:05 2012 From: howard at renci.org (Howard) Date: Fri, 27 Jan 2012 16:58:05 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F231D05.7090705@renci.org> References: <4F23148E.80902@renci.org> <CAFXk4bpk0H==Sph4RKyS-8SfYhkYFtvMo3GSbVbu3eyDaGrY6w@mail.gmail.com> <4F231D05.7090705@renci.org> Message-ID: <4F231DED.60001@renci.org> Oh, one other thing I should mention: I did the install of numpy yesterday and I also have 1.6.1 Howard On 1/27/12 4:54 PM, Howard wrote: > Hi Olivier > > I added this to the code: > > print "modelData:", type(modelData), modelData.shape, modelData.size > print "dataMin:", type(dataMin) > > and got > > modelData: <class 'numpy.ma.core.MaskedArray'> (1767734,) 1767734 > dataMin: <type 'float'> > > What's funny is I tried the example from > > http://docs.scipy.org/doc/numpy-1.6.0/numpy-user.pdf > > and it works fine for me. Maybe 1.7 million is over some threshhold? > > Thanks > Howard > > >>> myarr = np.ma.core.MaskedArray([1., 0., np.nan, 3.]) > >>> myarr[np.isnan(myarr)] = 30 > >>> myarr > masked_array(data = [ 1. 0. 30. 3.], > mask = False, > fill_value = 1e+20) > > > On 1/27/12 4:42 PM, Olivier Delalleau wrote: >> What are the types and shapes of modelData and dataMin? (it works for >> me with modelData a (3, 4) numpy array and dataMin a Python float, >> with numpy 1.6.1) >> >> -=- Olivier >> >> 2012/1/27 Howard <howard at renci.org <mailto:howard at renci.org>> >> >> Hi all >> >> I am a fairly recent convert to python and I have got a question >> that's got me stumped. I hope this is the right mailing list: >> here goes :) >> >> I am reading some time series data out of a netcdf file a single >> timestep at a time. If the data is NaN, I want to reset it to >> the minimum of the dataset over all timesteps (which I already >> know). The data is in a variable of type >> numpy.ma.core.MaskedArray called modelData. >> >> If I do this: >> >> for i in range(len(modelData)): >> if math.isnan(modelData[i]): >> modelData[i] = dataMin >> >> I get the effect I want, If I do this: >> >> modelData[np.isnan(modelData)] = dataMin >> >> it doesn't seem to be working. Of course I could just do the >> first one, but len(modelData) is about 3.5 million, and it's >> taking about 20 seconds to run. This is happening inside of a >> rendering loop, so I'd like it to be as fast as possible, and I >> thought the second one might be faster, and maybe it is, but it >> doesn't seem to be working! :) >> >> Any ideas would be much appreciated. 
>> >> Thanks >> Howard >> >> -- >> Howard Lander <mailto:howard at renci.org> >> Senior Research Software Developer >> Renaissance Computing Institute (RENCI) <http://www.renci.org> >> The University of North Carolina at Chapel Hill >> Duke University >> North Carolina State University >> 100 Europa Drive >> Suite 540 >> Chapel Hill, NC 27517 >> 919-445-9651 >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Howard Lander <mailto:howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 -- Howard Lander <mailto:howard at renci.org> Senior Research Software Developer Renaissance Computing Institute (RENCI) <http://www.renci.org> The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/de08b3ed/attachment.html> From efiring at hawaii.edu Fri Jan 27 17:21:04 2012 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 27 Jan 2012 12:21:04 -1000 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F23148E.80902@renci.org> References: <4F23148E.80902@renci.org> Message-ID: <4F232350.4090702@hawaii.edu> On 01/27/2012 11:18 AM, Howard wrote: > Hi all > > I am a fairly recent convert to python and I have got a question that's > got me stumped. I hope this is the right mailing list: here goes :) > > I am reading some time series data out of a netcdf file a single > timestep at a time. If the data is NaN, I want to reset it to the > minimum of the dataset over all timesteps (which I already know). The > data is in a variable of type numpy.ma.core.MaskedArray called modelData. > > If I do this: > > for i in range(len(modelData)): > if math.isnan(modelData[i]): > modelData[i] = dataMin > > I get the effect I want, If I do this: > > modelData[np.isnan(modelData)] = dataMin > > it doesn't seem to be working. Of course I could just do the first one, > but len(modelData) is about 3.5 million, and it's taking about 20 > seconds to run. This is happening inside of a rendering loop, so I'd > like it to be as fast as possible, and I thought the second one might be > faster, and maybe it is, but it doesn't seem to be working! :) It would help if you would say explicitly what you mean by "doesn't seem to be working", ideally by providing a minimal complete example illustrating the problem. Does modelData have masked values that you want to keep separate from your NaN values? If not, you can do this: y = np.ma.masked_invalid(modelData).filled(dataMin) Then y will be an ordinary ndarray. If this is not satisfactory because you need to keep separate some initially masked values, then you may need to save the initial mask and use it to turn y back into a masked array. You may be running into trouble with your initial approach because using np.isnan on a masked array is giving a masked array, and I think trying to index with a masked array is not advised. 
In [2]: np.isnan(np.ma.array([1.0, np.nan, 2.0], mask=[False, False, True])) Out[2]: masked_array(data = [False True --], mask = [False False True], fill_value = True) Eric > > Any ideas would be much appreciated. > > Thanks > Howard > > -- > Howard Lander <mailto:howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From howard at renci.org Fri Jan 27 17:37:35 2012 From: howard at renci.org (Howard) Date: Fri, 27 Jan 2012 17:37:35 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F232350.4090702@hawaii.edu> References: <4F23148E.80902@renci.org> <4F232350.4090702@hawaii.edu> Message-ID: <4F23272F.30601@renci.org> On 1/27/12 5:21 PM, Eric Firing wrote: > On 01/27/2012 11:18 AM, Howard wrote: >> Hi all >> >> I am a fairly recent convert to python and I have got a question that's >> got me stumped. I hope this is the right mailing list: here goes :) >> >> I am reading some time series data out of a netcdf file a single >> timestep at a time. If the data is NaN, I want to reset it to the >> minimum of the dataset over all timesteps (which I already know). The >> data is in a variable of type numpy.ma.core.MaskedArray called modelData. >> >> If I do this: >> >> for i in range(len(modelData)): >> if math.isnan(modelData[i]): >> modelData[i] = dataMin >> >> I get the effect I want, If I do this: >> >> modelData[np.isnan(modelData)] = dataMin >> >> it doesn't seem to be working. Of course I could just do the first one, >> but len(modelData) is about 3.5 million, and it's taking about 20 >> seconds to run. This is happening inside of a rendering loop, so I'd >> like it to be as fast as possible, and I thought the second one might be >> faster, and maybe it is, but it doesn't seem to be working! :) > It would help if you would say explicitly what you mean by "doesn't seem > to be working", ideally by providing a minimal complete example > illustrating the problem. Hi Eric Thanks for the reply. Yes, I can be a little more specific about the issue. I am reading data from a storm surge model out of a NetCDF file so I can render it with tricontourf. The model data has both a triangulation and a set of lat, lon points that are invariant for the entire model run, as well as data for each time step. As the model runs, triangles in the coastal plain wet and dry: the dry values are indicated by NaN values in the data and should not be rendered. Those I mask off previous to this code. I have found, in using tricontourf, that in the mapping from data values to color values, the range of the data seems to include even the data from the masked triangles. This causes the data to be either monochromatic or bi-chromatic (the high and low colors in the map). However, once the triangles are masked, if I set the corresponding data values to the known dataMin (or in fact, any value in the valid data range) the render proceeds correctly. So in the case of the first piece of code, I get reasonable images: using the second I do not. > > Does modelData have masked values that you want to keep separate from > your NaN values? If not, you can do this: No I don't think so. 
> > y = np.ma.masked_invalid(modelData).filled(dataMin) > > Then y will be an ordinary ndarray. If this is not satisfactory because > you need to keep separate some initially masked values, then you may > need to save the initial mask and use it to turn y back into a masked array. > > You may be running into trouble with your initial approach because using > np.isnan on a masked array is giving a masked array, and I think trying > to index with a masked array is not advised. This could certainly be be the issue. I will look into this Monday. Thanks very much for taking the time to reply. Howard > > In [2]: np.isnan(np.ma.array([1.0, np.nan, 2.0], mask=[False, False, True])) > Out[2]: > masked_array(data = [False True --], > mask = [False False True], > fill_value = True) > > Eric > >> Any ideas would be much appreciated. >> >> Thanks >> Howard >> >> -- >> Howard Lander<mailto:howard at renci.org> >> Senior Research Software Developer >> Renaissance Computing Institute (RENCI)<http://www.renci.org> >> The University of North Carolina at Chapel Hill >> Duke University >> North Carolina State University >> 100 Europa Drive >> Suite 540 >> Chapel Hill, NC 27517 >> 919-445-9651 >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Howard Lander <mailto:howard at renci.org> Senior Research Software Developer Renaissance Computing Institute (RENCI) <http://www.renci.org> The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/d89bdb08/attachment.html> From ben.root at ou.edu Fri Jan 27 17:46:04 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 27 Jan 2012 16:46:04 -0600 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F23272F.30601@renci.org> References: <4F23148E.80902@renci.org> <4F232350.4090702@hawaii.edu> <4F23272F.30601@renci.org> Message-ID: <CANNq6FnZJi7rA7yHAP2eFQ5e5HdQ-ximBTbwBebz4yf+sUuw6A@mail.gmail.com> On Fri, Jan 27, 2012 at 4:37 PM, Howard <howard at renci.org> wrote: > I have found, in using tricontourf, that in the mapping from data values > to color values, the range of the data seems to include even the data from > the masked triangles. This causes the data to be either monochromatic or > bi-chromatic (the high and low colors in the map). However, once the > triangles are masked, if I set the corresponding data values to the known > dataMin (or in fact, any value in the valid data range) the render proceeds > correctly. So in the case of the first piece of code, I get reasonable > images: using the second I do not. > > > This sounds like a bug in tricontourf. It should not be doing that. If you could report it to the matplotlib-devel list with an example demonstrating your problem, I can see to it that it gets resolved. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120127/91b39ec1/attachment.html> From stefan at sun.ac.za Fri Jan 27 23:28:57 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 27 Jan 2012 20:28:57 -0800 Subject: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran In-Reply-To: <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> References: <CADDwiVDV_oXV8KVK+eHEP9VHbWETp3AoTz7RTMBHzJ=pGp6M4g@mail.gmail.com> <CADDwiVC2b_cM2svOinC9ejipTiWKNyWqenPz1GMFr-dwhmT2nA@mail.gmail.com> <CADDwiVAciT=n=MZK5hjkD+qKAxdFxpD5PN_m3+sTFsDziVEDWA@mail.gmail.com> Message-ID: <CABDkGQk-Us4Fe0kxR95VZPQMDaVxip2hL0cA-RDYG8B2H-zHtg@mail.gmail.com> Hey, Ond?ej 2012/1/21 Ond?ej ?ert?k <ondrej.certik at gmail.com>: > I read the Mandelbrot code using NumPy at this page: > > http://mentat.za.net/numpy/intro/intro.html I wrote this as a tutorial for beginners, so the emphasis is on simplicity. Do you have any suggestions on how to improve the code without obfuscating the tutorial? St?fan From shish at keba.be Sat Jan 28 00:28:42 2012 From: shish at keba.be (Olivier Delalleau) Date: Sat, 28 Jan 2012 00:28:42 -0500 Subject: [Numpy-discussion] NetCDF4/numpy question In-Reply-To: <4F23272F.30601@renci.org> References: <4F23148E.80902@renci.org> <4F232350.4090702@hawaii.edu> <4F23272F.30601@renci.org> Message-ID: <CAFXk4bpSoNFNuctJwV6CRFcGzKTowS=TNGwoY0c048KWBuq2Gg@mail.gmail.com> Eric's probably right and it's indexing with a masked array that's causing you trouble. Since you seem to say your NaN values correspond to your mask, you should be able to simply do: modelData[modeData.mask] = dataMin Note that in further processing it may then make more sense to remove the mask, since your array is now full with valid data: modelData = modelData.data -=- Olivier Le 27 janvier 2012 17:37, Howard <howard at renci.org> a ?crit : > On 1/27/12 5:21 PM, Eric Firing wrote: > > On 01/27/2012 11:18 AM, Howard wrote: > > Hi all > > I am a fairly recent convert to python and I have got a question that's > got me stumped. I hope this is the right mailing list: here goes :) > > I am reading some time series data out of a netcdf file a single > timestep at a time. If the data is NaN, I want to reset it to the > minimum of the dataset over all timesteps (which I already know). The > data is in a variable of type numpy.ma.core.MaskedArray called modelData. > > If I do this: > > for i in range(len(modelData)): > if math.isnan(modelData[i]): > modelData[i] = dataMin > > I get the effect I want, If I do this: > > modelData[np.isnan(modelData)] = dataMin > > it doesn't seem to be working. Of course I could just do the first one, > but len(modelData) is about 3.5 million, and it's taking about 20 > seconds to run. This is happening inside of a rendering loop, so I'd > like it to be as fast as possible, and I thought the second one might be > faster, and maybe it is, but it doesn't seem to be working! :) > > It would help if you would say explicitly what you mean by "doesn't seem > to be working", ideally by providing a minimal complete example > illustrating the problem. > > Hi Eric > > Thanks for the reply. Yes, I can be a little more specific about the > issue. I am reading data from a storm surge model out of a NetCDF file so > I can render it with tricontourf. The model data has both a triangulation > and a set of lat, lon points that are invariant for the entire model run, > as well as data for each time step. 
As the model runs, triangles in the > coastal plain wet and dry: the dry values are indicated by NaN values in > the data and should not be rendered. Those I mask off previous to this > code. I have found, in using tricontourf, that in the mapping from data > values to color values, the range of the data seems to include even the > data from the masked triangles. This causes the data to be either > monochromatic or bi-chromatic (the high and low colors in the map). > However, once the triangles are masked, if I set the corresponding data > values to the known dataMin (or in fact, any value in the valid data range) > the render proceeds correctly. So in the case of the first piece of code, > I get reasonable images: using the second I do not. > > > Does modelData have masked values that you want to keep separate from > your NaN values? If not, you can do this: > > > No I don't think so. > > y = np.ma.masked_invalid(modelData).filled(dataMin) > > Then y will be an ordinary ndarray. If this is not satisfactory because > you need to keep separate some initially masked values, then you may > need to save the initial mask and use it to turn y back into a masked array. > > You may be running into trouble with your initial approach because using > np.isnan on a masked array is giving a masked array, and I think trying > to index with a masked array is not advised. > > This could certainly be be the issue. I will look into this Monday. > > Thanks very much for taking the time to reply. > Howard > > > In [2]: np.isnan(np.ma.array([1.0, np.nan, 2.0], mask=[False, False, True])) > Out[2]: > masked_array(data = [False True --], > mask = [False False True], > fill_value = True) > > Eric > > > Any ideas would be much appreciated. > > Thanks > Howard > > -- > Howard Lander <mailto:howard at renci.org> <howard at renci.org> > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > Howard Lander <howard at renci.org> > > Senior Research Software Developer > Renaissance Computing Institute (RENCI) <http://www.renci.org> > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120128/7101e4c5/attachment.html> From e.antero.tammi at gmail.com Sat Jan 28 13:15:37 2012 From: e.antero.tammi at gmail.com (eat) Date: Sat, 28 Jan 2012 20:15:37 +0200 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? 
Message-ID: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> Hi, Short demonstration of the issue: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: from numpy.polynomial import Polynomial as Poly In []: def p_tst(c): ..: p= Poly(c) ..: r= p.roots() ..: return sort(abs(p(r))) ..: Now I would expect a result more like: In []: p_tst(randn(123))[-3:] Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) be the case, but actually most result seems to be more like: In []: p_tst(randn(123))[-3:] Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) In []: p_tst(randn(123))[-3:] Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) In []: p_tst(randn(123))[-3:] Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) In []: p_tst(randn(123))[-3:] Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) In []: p_tst(randn(123))[-3:] Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) So, does this phenomena imply that - I'm testing with too high order polynomials (if so, does there exists a definite upper limit of polynomial order I'll not face this issue) or - it's just the 'nature' of computations with float values (if so, probably I should be able to tackle this regardless of the polynomial order) or - it's a nasty bug in class Polynomial Regards, eat -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120128/c059c837/attachment.html> From charlesr.harris at gmail.com Sat Jan 28 16:14:17 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 28 Jan 2012 14:14:17 -0700 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? 
In-Reply-To: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> Message-ID: <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: > Hi, > > Short demonstration of the issue: > In []: sys.version > Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' > In []: np.version.version > Out[]: '1.6.0' > > In []: from numpy.polynomial import Polynomial as Poly > In []: def p_tst(c): > ..: p= Poly(c) > ..: r= p.roots() > ..: return sort(abs(p(r))) > ..: > > Now I would expect a result more like: > In []: p_tst(randn(123))[-3:] > Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) > > be the case, but actually most result seems to be more like: > In []: p_tst(randn(123))[-3:] > Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) > In []: p_tst(randn(123))[-3:] > Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) > In []: p_tst(randn(123))[-3:] > Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) > In []: p_tst(randn(123))[-3:] > Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) > In []: p_tst(randn(123))[-3:] > Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) > > So, does this phenomena imply that > - I'm testing with too high order polynomials (if so, does there exists a > definite upper limit of polynomial order I'll not face this issue) > or > - it's just the 'nature' of computations with float values (if so, > probably I should be able to tackle this regardless of the polynomial order) > or > - it's a nasty bug in class Polynomial > > It's a defect. You will get all the roots and the number will equal the degree. I haven't decided what the best way to deal with this is, but my thoughts have trended towards specifying an interval with the default being the domain. If you have other thoughts I'd be glad for the feedback. For the problem at hand, note first that you are specifying the coefficients, not the roots as was the case with poly1d. Second, as a rule of thumb, plain old polynomials will generally only be good for degree < 22 due to being numerically ill conditioned. If you are really looking to use high degrees, Chebyshev or Legendre will work better, although you will probably need to explicitly specify the domain. If you want to specify the polynomial using roots, do Poly.fromroots(...). Third, for the high degrees you are probably screwed anyway for degree 123, since the accuracy of the root finding will be limited, especially for roots that can cluster, and any root that falls even a little bit outside the interval [-1,1] (the default domain) is going to evaluate to a big number simply because the polynomial is going to h*ll at a rate you wouldn't believe ;) For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, things look good for degree 50, get a bit loose at degree 75 but can be fixed up with one iteration of Newton, and blow up at degree 100. I think that's pretty good, actually, doing better would require a lot more work. There are some zero finding algorithms out there that might do better if someone wants to give it a shot. 
In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) In [21]: sort(abs(p(p.roots()))) Out[21]: array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, 2.85411753e-21, 2.85411753e-21]) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120128/d47a62d2/attachment.html> From warren.weckesser at enthought.com Mon Jan 30 02:17:27 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 30 Jan 2012 01:17:27 -0600 Subject: [Numpy-discussion] ufunc delegation to object method Message-ID: <CAM-+wY8M7m7y=+nXXhELkqQ9=wsrNzROFAUL8a1K2SNTao0mwA@mail.gmail.com> In the following code, numpy.sin() calls the object's sin() function: In [2]: class Foo(object): ...: def sin(self): ...: return "spam" ...: In [3]: f = Foo() In [4]: np.sin(f) Out[4]: 'spam' Is this, in fact, guaranteed behavior for a ufunc? It does not appear to be documented. This question came up in the discussion of SciPy pull request 138 ( https://github.com/scipy/scipy/pull/138), where the idea is to add numpy unary ufunc support to SciPy's sparse arrays. (Sorry if this email shows up twice. I sent it the first time while the Enthought servers were down, and eventually got an email back saying it had not been sent.) Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/86109698/attachment.html> From e.antero.tammi at gmail.com Sun Jan 29 12:03:55 2012 From: e.antero.tammi at gmail.com (eat) Date: Sun, 29 Jan 2012 19:03:55 +0200 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? 
In-Reply-To: <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> Message-ID: <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: > >> Hi, >> >> Short demonstration of the issue: >> In []: sys.version >> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >> (Intel)]' >> In []: np.version.version >> Out[]: '1.6.0' >> >> In []: from numpy.polynomial import Polynomial as Poly >> In []: def p_tst(c): >> ..: p= Poly(c) >> ..: r= p.roots() >> ..: return sort(abs(p(r))) >> ..: >> >> Now I would expect a result more like: >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) >> >> be the case, but actually most result seems to be more like: >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) >> In []: p_tst(randn(123))[-3:] >> Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) >> >> So, does this phenomena imply that >> - I'm testing with too high order polynomials (if so, does there exists a >> definite upper limit of polynomial order I'll not face this issue) >> or >> - it's just the 'nature' of computations with float values (if so, >> probably I should be able to tackle this regardless of the polynomial order) >> or >> - it's a nasty bug in class Polynomial >> >> > It's a defect. You will get all the roots and the number will equal the > degree. I haven't decided what the best way to deal with this is, but my > thoughts have trended towards specifying an interval with the default being > the domain. If you have other thoughts I'd be glad for the feedback. > > For the problem at hand, note first that you are specifying the > coefficients, not the roots as was the case with poly1d. Second, as a rule > of thumb, plain old polynomials will generally only be good for degree < 22 > due to being numerically ill conditioned. If you are really looking to use > high degrees, Chebyshev or Legendre will work better, although you will > probably need to explicitly specify the domain. If you want to specify the > polynomial using roots, do Poly.fromroots(...). Third, for the high degrees > you are probably screwed anyway for degree 123, since the accuracy of the > root finding will be limited, especially for roots that can cluster, and > any root that falls even a little bit outside the interval [-1,1] (the > default domain) is going to evaluate to a big number simply because the > polynomial is going to h*ll at a rate you wouldn't believe ;) > > For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, things > look good for degree 50, get a bit loose at degree 75 but can be fixed up > with one iteration of Newton, and blow up at degree 100. I think that's > pretty good, actually, doing better would require a lot more work. There > are some zero finding algorithms out there that might do better if someone > wants to give it a shot. 
> > In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) > > In [21]: sort(abs(p(p.roots()))) > Out[21]: > array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, > 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, > 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, > 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, > 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, > 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, > 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, > 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, > 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, > 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, > 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, > 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, > 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, > 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, > 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, > 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, > 2.85411753e-21, 2.85411753e-21]) > Thanks, for a very informative feedback. I'll study those orthogonal polynomials more detail. Regards, - eat > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120129/3e982447/attachment.html> From charlesr.harris at gmail.com Mon Jan 30 08:55:18 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 30 Jan 2012 06:55:18 -0700 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? In-Reply-To: <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> Message-ID: <CAB6mnxLRXvUdL04Vjvxzo0MrhT7BJkqV4JQFF9iHMu-zbzLrkQ@mail.gmail.com> On Sun, Jan 29, 2012 at 10:03 AM, eat <e.antero.tammi at gmail.com> wrote: > On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: >> >>> Hi, >>> >>> Short demonstration of the issue: >>> In []: sys.version >>> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >>> (Intel)]' >>> In []: np.version.version >>> Out[]: '1.6.0' >>> >>> In []: from numpy.polynomial import Polynomial as Poly >>> In []: def p_tst(c): >>> ..: p= Poly(c) >>> ..: r= p.roots() >>> ..: return sort(abs(p(r))) >>> ..: >>> >>> Now I would expect a result more like: >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) >>> >>> be the case, but actually most result seems to be more like: >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) >>> In []: p_tst(randn(123))[-3:] >>> Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) >>> >>> So, does this phenomena imply that >>> - I'm testing with too high order polynomials (if so, does there exists >>> a definite upper limit of 
polynomial order I'll not face this issue) >>> or >>> - it's just the 'nature' of computations with float values (if so, >>> probably I should be able to tackle this regardless of the polynomial order) >>> or >>> - it's a nasty bug in class Polynomial >>> >>> >> It's a defect. You will get all the roots and the number will equal the >> degree. I haven't decided what the best way to deal with this is, but my >> thoughts have trended towards specifying an interval with the default being >> the domain. If you have other thoughts I'd be glad for the feedback. >> >> For the problem at hand, note first that you are specifying the >> coefficients, not the roots as was the case with poly1d. Second, as a rule >> of thumb, plain old polynomials will generally only be good for degree < 22 >> due to being numerically ill conditioned. If you are really looking to use >> high degrees, Chebyshev or Legendre will work better, although you will >> probably need to explicitly specify the domain. If you want to specify the >> polynomial using roots, do Poly.fromroots(...). Third, for the high degrees >> you are probably screwed anyway for degree 123, since the accuracy of the >> root finding will be limited, especially for roots that can cluster, and >> any root that falls even a little bit outside the interval [-1,1] (the >> default domain) is going to evaluate to a big number simply because the >> polynomial is going to h*ll at a rate you wouldn't believe ;) >> >> For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, >> things look good for degree 50, get a bit loose at degree 75 but can be >> fixed up with one iteration of Newton, and blow up at degree 100. I think >> that's pretty good, actually, doing better would require a lot more work. >> There are some zero finding algorithms out there that might do better if >> someone wants to give it a shot. >> >> In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) >> >> In [21]: sort(abs(p(p.roots()))) >> Out[21]: >> array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, >> 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, >> 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, >> 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, >> 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, >> 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, >> 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, >> 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, >> 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, >> 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, >> 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, >> 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, >> 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, >> 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, >> 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, >> 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, >> 2.85411753e-21, 2.85411753e-21]) >> > Thanks, > > for a very informative feedback. I'll study those orthogonal polynomials > more detail. > > That said, I'm thinking it might be possible to get a more accurate polynomial representation from the zeros by going through a barycentric form rather than simply multiplying the factors together as is done now. Hmm... For evenly spaced roots the polynomial grows in amplitude rapidly at the ends which leads to numerical problems because a small error in the zeros turns into a large error in value because of the steepness of the curve at the zeroes. I've attached a semilogy plot of the absolute values of the polynomial with 30 equally spaced zeroes from -1 to 1. 
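For anyone who wants to poke at this locally, here is a rough sketch of that kind of check (just an illustration, not the script behind the attached plot; the number of roots is made up):

import numpy as np
from numpy.polynomial import Polynomial, Chebyshev

roots = np.linspace(-1, 1, 30)              # evenly spaced roots in the default domain
for cls in (Polynomial, Chebyshev):
    p = cls.fromroots(roots)                # build the series directly from its roots
    resid = np.sort(np.abs(p(p.roots())))   # residuals at the recovered roots
    print("%s: %s" % (cls.__name__, resid[-3:]))

Pushing the number of roots up toward the degrees discussed above makes the difference between the power basis and the Chebyshev basis obvious.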
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/51018011/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: polyplot.png Type: image/png Size: 42262 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/51018011/attachment.png> From rainexpected at theo.to Mon Jan 30 10:25:26 2012 From: rainexpected at theo.to (Ted To) Date: Mon, 30 Jan 2012 10:25:26 -0500 Subject: [Numpy-discussion] Addressing arrays Message-ID: <4F26B666.4090108@theo.to> Hi, Is there some straightforward way to access an array by values across a subset of its dimensions? For example, if I have a three dimensional array a=(x,y,z), can I look at the values of z given particular values for x and y? Thanks, Ted From chaoyuejoy at gmail.com Mon Jan 30 10:27:06 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Mon, 30 Jan 2012 16:27:06 +0100 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26B666.4090108@theo.to> References: <4F26B666.4090108@theo.to> Message-ID: <CAAN-aRHHHW2iEJ+sJ5CGWNM6fL1_K4LkCzACU3PqP2ENXH-HyQ@mail.gmail.com> I am afraid you have to write index inquire function by yourself. I did like this. chao 2012/1/30 Ted To <rainexpected at theo.to> > Hi, > > Is there some straightforward way to access an array by values across a > subset of its dimensions? For example, if I have a three dimensional > array a=(x,y,z), can I look at the values of z given particular values > for x and y? > > Thanks, > Ted > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/a3c81e4b/attachment.html> From malcolm.reynolds at gmail.com Mon Jan 30 10:26:59 2012 From: malcolm.reynolds at gmail.com (Malcolm Reynolds) Date: Mon, 30 Jan 2012 15:26:59 +0000 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26B666.4090108@theo.to> References: <4F26B666.4090108@theo.to> Message-ID: <CAO1Gn59xe-znR_3x8-4G1hAJmSDD61pQjKcQaB8_xXHyp+f-XQ@mail.gmail.com> On Mon, Jan 30, 2012 at 3:25 PM, Ted To <rainexpected at theo.to> wrote: > Is there some straightforward way to access an array by values across a > subset of its dimensions? ?For example, if I have a three dimensional > array a=(x,y,z), can I look at the values of z given particular values > for x and y? a[x, y, :] should get you what you want I believe.. 
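For example, with a small made-up array:

import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # toy array indexed as (x, y, z)
print(a[1, 2, :])                           # all z values at x=1, y=2 -> [20 21 22 23]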
Malcolm From zachary.pincus at yale.edu Mon Jan 30 10:28:38 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 30 Jan 2012 10:28:38 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26B666.4090108@theo.to> References: <4F26B666.4090108@theo.to> Message-ID: <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> a[x,y,:] Read the slicing part of the tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial (section 1.6) And the documentation: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html On Jan 30, 2012, at 10:25 AM, Ted To wrote: > Hi, > > Is there some straightforward way to access an array by values across a > subset of its dimensions? For example, if I have a three dimensional > array a=(x,y,z), can I look at the values of z given particular values > for x and y? > > Thanks, > Ted > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chaoyuejoy at gmail.com Mon Jan 30 10:33:05 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Mon, 30 Jan 2012 16:33:05 +0100 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> Message-ID: <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> he is not asking for slicing. he is asking for how to index array by element value but not element index. 2012/1/30 Zachary Pincus <zachary.pincus at yale.edu> > a[x,y,:] > > Read the slicing part of the tutorial: > http://www.scipy.org/Tentative_NumPy_Tutorial > (section 1.6) > > And the documentation: > http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html > > > > On Jan 30, 2012, at 10:25 AM, Ted To wrote: > > > Hi, > > > > Is there some straightforward way to access an array by values across a > > subset of its dimensions? For example, if I have a three dimensional > > array a=(x,y,z), can I look at the values of z given particular values > > for x and y? > > > > Thanks, > > Ted > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/cbd0da1e/attachment.html> From zachary.pincus at yale.edu Mon Jan 30 10:50:01 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 30 Jan 2012 10:50:01 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> Message-ID: <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> Ted, can you clarify what you're asking for? 
Maybe give a trivial example of an array and the desired output? I'm pretty sure this is a slicing question though: > If I have a three dimensional array a=(x,y,z), can I look at the values of z given particular values for x and y? Given that element values are scalars in this case, and indices are (x,y,z) triples, it seems likely that looking for "values of z" given an (x,y) pair is an slicing-by-index question, no? For indexing-by-value, "fancy indexing" with boolean masks is usually the way to go... again, Ted (or Chao), if you can describe your indexing needs in a bit more detail, it's often easy to find a compact slicing and/or fancy-indexing strategy that works well and reasonably efficiently. Zach On Jan 30, 2012, at 10:33 AM, Chao YUE wrote: > he is not asking for slicing. he is asking for how to index array by element value but not element index. > > 2012/1/30 Zachary Pincus <zachary.pincus at yale.edu> > a[x,y,:] > > Read the slicing part of the tutorial: > http://www.scipy.org/Tentative_NumPy_Tutorial > (section 1.6) > > And the documentation: > http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html > > > > On Jan 30, 2012, at 10:25 AM, Ted To wrote: > > > Hi, > > > > Is there some straightforward way to access an array by values across a > > subset of its dimensions? For example, if I have a three dimensional > > array a=(x,y,z), can I look at the values of z given particular values > > for x and y? > > > > Thanks, > > Ted > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > ************************************************************************************ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From rainexpected at theo.to Mon Jan 30 11:57:12 2012 From: rainexpected at theo.to (Ted To) Date: Mon, 30 Jan 2012 11:57:12 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> Message-ID: <4F26CBE8.6090703@theo.to> Sure thing. To keep it simple suppose I have just a two dimensional array (time,output): [(1,2),(2,3),(3,4)] I would like to look at all values of output for which, for example time==2. My actual application has a six dimensional array and I'd like to look at the contents using one or more of the first three dimensions. Many thanks, Ted On 01/30/2012 10:50 AM, Zachary Pincus wrote: > Ted, can you clarify what you're asking for? Maybe give a trivial example of an array and the desired output? > > I'm pretty sure this is a slicing question though: >> If I have a three dimensional array a=(x,y,z), can I look at the values of z given particular values for x and y? 
> Given that element values are scalars in this case, and indices are (x,y,z) triples, it seems likely that looking for "values of z" given an (x,y) pair is an slicing-by-index question, no? > > For indexing-by-value, "fancy indexing" with boolean masks is usually the way to go... again, Ted (or Chao), if you can describe your indexing needs in a bit more detail, it's often easy to find a compact slicing and/or fancy-indexing strategy that works well and reasonably efficiently. > > Zach > > > > On Jan 30, 2012, at 10:33 AM, Chao YUE wrote: > >> he is not asking for slicing. he is asking for how to index array by element value but not element index. >> >> 2012/1/30 Zachary Pincus <zachary.pincus at yale.edu> >> a[x,y,:] >> >> Read the slicing part of the tutorial: >> http://www.scipy.org/Tentative_NumPy_Tutorial >> (section 1.6) >> >> And the documentation: >> http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html >> >> >> >> On Jan 30, 2012, at 10:25 AM, Ted To wrote: >> >>> Hi, >>> >>> Is there some straightforward way to access an array by values across a >>> subset of its dimensions? For example, if I have a three dimensional >>> array a=(x,y,z), can I look at the values of z given particular values >>> for x and y? >>> >>> Thanks, >>> Ted >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> -- >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> ************************************************************************************ >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From brett.olsen at gmail.com Mon Jan 30 12:13:33 2012 From: brett.olsen at gmail.com (Brett Olsen) Date: Mon, 30 Jan 2012 11:13:33 -0600 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26CBE8.6090703@theo.to> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> Message-ID: <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> On Mon, Jan 30, 2012 at 10:57 AM, Ted To <rainexpected at theo.to> wrote: > Sure thing. ?To keep it simple suppose I have just a two dimensional > array (time,output): > [(1,2),(2,3),(3,4)] > I would like to look at all values of output for which, for example time==2. > > My actual application has a six dimensional array and I'd like to look > at the contents using one or more of the first three dimensions. 
> > Many thanks, > Ted Couldn't you just do something like this with boolean indexing: In [1]: import numpy as np In [2]: a = np.array([(1,2),(2,3),(3,4)]) In [3]: a Out[3]: array([[1, 2], [2, 3], [3, 4]]) In [4]: mask = a[:,0] == 2 In [5]: mask Out[5]: array([False, True, False], dtype=bool) In [6]: a[mask,1] Out[6]: array([3]) ~Brett From rainexpected at theo.to Mon Jan 30 12:31:55 2012 From: rainexpected at theo.to (Ted To) Date: Mon, 30 Jan 2012 12:31:55 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> Message-ID: <4F26D40B.4040303@theo.to> On 01/30/2012 12:13 PM, Brett Olsen wrote: > On Mon, Jan 30, 2012 at 10:57 AM, Ted To <rainexpected at theo.to> wrote: >> Sure thing. To keep it simple suppose I have just a two dimensional >> array (time,output): >> [(1,2),(2,3),(3,4)] >> I would like to look at all values of output for which, for example time==2. >> >> My actual application has a six dimensional array and I'd like to look >> at the contents using one or more of the first three dimensions. >> >> Many thanks, >> Ted > > Couldn't you just do something like this with boolean indexing: > > In [1]: import numpy as np > > In [2]: a = np.array([(1,2),(2,3),(3,4)]) > > In [3]: a > Out[3]: > array([[1, 2], > [2, 3], > [3, 4]]) > > In [4]: mask = a[:,0] == 2 > > In [5]: mask > Out[5]: array([False, True, False], dtype=bool) > > In [6]: a[mask,1] > Out[6]: array([3]) > > ~Brett Thanks! That works great if I only want to search over one index but I can't quite figure out what to do with more than a single index. So suppose I have a labeled, multidimensional array with labels 'month', 'year' and 'quantity'. a[['month','year']] gives me an array of indices but "a[['month','year']]==(1,1960)" produces "False". I'm sure I simply don't know the proper syntax and I apologize for that -- I'm kind of new to numpy. Ted From zachary.pincus at yale.edu Mon Jan 30 13:29:38 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 30 Jan 2012 13:29:38 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26D40B.4040303@theo.to> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> <4F26D40B.4040303@theo.to> Message-ID: <E42CB68E-F4D5-4E34-817C-46A89053CA8B@yale.edu> > Thanks! That works great if I only want to search over one index but I > can't quite figure out what to do with more than a single index. So > suppose I have a labeled, multidimensional array with labels 'month', > 'year' and 'quantity'. a[['month','year']] gives me an array of indices > but "a[['month','year']]==(1,1960)" produces "False". I'm sure I simply > don't know the proper syntax and I apologize for that -- I'm kind of new > to numpy. I think that your best bet is to form the boolean masks independently and then logical-and them together: mask = (a['month'] == 1) & (a['year'] == 1960) jan_60 = a[mask] Someone might have more insight here. 
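As a self-contained toy version (the field names and numbers are just made up to match the description above):

import numpy as np

a = np.array([(1, 1960, 10.0), (2, 1960, 12.5), (1, 1961, 9.0)],
             dtype=[('month', int), ('year', int), ('quantity', float)])

mask = (a['month'] == 1) & (a['year'] == 1960)
print(a[mask]['quantity'])   # -> [ 10.]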
Though I should note that if you have large data and are doing lots of "queries" like this, a more database-ish approach might be better. Something like sqlite's python bindings, or PyTables. Alternately, if your data are all time-series based things, PANDAS might be worth looking at. But the above approach should be just fine for non-huge datasets... Zach From brett.olsen at gmail.com Mon Jan 30 13:30:58 2012 From: brett.olsen at gmail.com (Brett Olsen) Date: Mon, 30 Jan 2012 12:30:58 -0600 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <4F26D40B.4040303@theo.to> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> <4F26D40B.4040303@theo.to> Message-ID: <CAFq1z2VfqZL-x1vFxxpEEH25sYaxzr4bHa01KQLSUJ-B912W=A@mail.gmail.com> On Mon, Jan 30, 2012 at 11:31 AM, Ted To <rainexpected at theo.to> wrote: > On 01/30/2012 12:13 PM, Brett Olsen wrote: >> On Mon, Jan 30, 2012 at 10:57 AM, Ted To <rainexpected at theo.to> wrote: >>> Sure thing. ?To keep it simple suppose I have just a two dimensional >>> array (time,output): >>> [(1,2),(2,3),(3,4)] >>> I would like to look at all values of output for which, for example time==2. >>> >>> My actual application has a six dimensional array and I'd like to look >>> at the contents using one or more of the first three dimensions. >>> >>> Many thanks, >>> Ted >> >> Couldn't you just do something like this with boolean indexing: >> >> In [1]: import numpy as np >> >> In [2]: a = np.array([(1,2),(2,3),(3,4)]) >> >> In [3]: a >> Out[3]: >> array([[1, 2], >> ? ? ? ?[2, 3], >> ? ? ? ?[3, 4]]) >> >> In [4]: mask = a[:,0] == 2 >> >> In [5]: mask >> Out[5]: array([False, ?True, False], dtype=bool) >> >> In [6]: a[mask,1] >> Out[6]: array([3]) >> >> ~Brett > > Thanks! ?That works great if I only want to search over one index but I > can't quite figure out what to do with more than a single index. ?So > suppose I have a labeled, multidimensional array with labels 'month', > 'year' and 'quantity'. ?a[['month','year']] gives me an array of indices > but "a[['month','year']]==(1,1960)" produces "False". ?I'm sure I simply > don't know the proper syntax and I apologize for that -- I'm kind of new > to numpy. > > Ted You'd want to update your mask appropriately to get everything you want to select, one criteria at a time e.g.: mask = a[:,0] == 1 mask &= a[:,1] == 1960 Alternatively: mask = (a[:,0] == 1) & (a[:,1] == 1960) but be careful with the parens, & and | are normally high-priority bitwise operators and if you leave the parens out, it will try to bitwise-and 1 and a[:,1] and throw an error. 
If you've got a ton of parameters, you can combine these more aesthetically with: mask = (a[:,[0,1]] == [1, 1960]).all(axis=1) ~Brett From rainexpected at theo.to Mon Jan 30 13:39:13 2012 From: rainexpected at theo.to (Ted To) Date: Mon, 30 Jan 2012 13:39:13 -0500 Subject: [Numpy-discussion] Addressing arrays In-Reply-To: <CAFq1z2VfqZL-x1vFxxpEEH25sYaxzr4bHa01KQLSUJ-B912W=A@mail.gmail.com> References: <4F26B666.4090108@theo.to> <66712633-1AC0-4349-8AEA-269B47FA8C4E@yale.edu> <CAAN-aRF=7cgJPQsOyrEdoqdDemeE-Yg2qXmf7a7gt2Mn+H6KQw@mail.gmail.com> <19CEBDA4-ABDB-403F-8694-A081C34E3CC8@yale.edu> <4F26CBE8.6090703@theo.to> <CAFq1z2VkqrxS7QoG6gKTs5Qe6HrG66ycu-EYe9Xbm7ZMoJgZqg@mail.gmail.com> <4F26D40B.4040303@theo.to> <CAFq1z2VfqZL-x1vFxxpEEH25sYaxzr4bHa01KQLSUJ-B912W=A@mail.gmail.com> Message-ID: <4F26E3D1.2000700@theo.to> > You'd want to update your mask appropriately to get everything you > want to select, one criteria at a time e.g.: > mask = a[:,0] == 1 > mask &= a[:,1] == 1960 > > Alternatively: > mask = (a[:,0] == 1) & (a[:,1] == 1960) > but be careful with the parens, & and | are normally high-priority > bitwise operators and if you leave the parens out, it will try to > bitwise-and 1 and a[:,1] and throw an error. > > If you've got a ton of parameters, you can combine these more > aesthetically with: > mask = (a[:,[0,1]] == [1, 1960]).all(axis=1) > > ~Brett Zach and Brett, Many thanks -- that is exactly what I need. Cheers, Ted From ruby185 at gmail.com Mon Jan 30 14:21:03 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Mon, 30 Jan 2012 14:21:03 -0500 Subject: [Numpy-discussion] histogram help Message-ID: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com> hi, all I am trying to figure out how to do histogram with numpy I have a three-dimension array A[x,y,z], another array (bins) has been allocated along Z dimension, z' how can I get the histogram of H[ x, y, z' ]? thanks for your help. Ruby From ruby185 at gmail.com Mon Jan 30 14:21:43 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Mon, 30 Jan 2012 14:21:43 -0500 Subject: [Numpy-discussion] condense array along one dimension In-Reply-To: <CAFXk4bp7X1H9Qe3-jb5AMCL_ufuK5ASx0emczpLfFYV5f6r7ZQ@mail.gmail.com> References: <CAA=a5iPGU5+jxRbsgbrfjn83OZQ_eRJWzPGjp+3P+cSS7A9GbA@mail.gmail.com> <CAFXk4bp7X1H9Qe3-jb5AMCL_ufuK5ASx0emczpLfFYV5f6r7ZQ@mail.gmail.com> Message-ID: <CAA=a5iPCWp-JxBk49KazeXrar6XR-R2kFvtaGk9+ekCsHm5Tmw@mail.gmail.com> I think this is exactly what I need. Thanks for your help, Olivier. Ruby On Fri, Jan 20, 2012 at 9:50 AM, Olivier Delalleau <shish at keba.be> wrote: > What do you mean by "summarize"? > If for instance you want to sum along Y, just do > ? my_array.sum(axis=1) > > -=- Olivier > > 2012/1/20 Ruby Stevenson <ruby185 at gmail.com> >> >> hi, all >> >> Say I have a three dimension array, X, Y, Z, ?how can I condense into >> two dimensions: for example, compute 2-D array with (X, Z) and >> summarize along Y dimensions ... is it possible? 
>> >> thanks >> >> Ruby >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ruby185 at gmail.com Mon Jan 30 14:27:15 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Mon, 30 Jan 2012 14:27:15 -0500 Subject: [Numpy-discussion] histogram help In-Reply-To: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com> References: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com> Message-ID: <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> Sorry, I realize I didn't describe the problem completely clear or correct. the (x,y) in this case is just many co-ordinates, and each coordinate has a list of values (Z value) associated with it. The bins are allocated for the Z. I hope this clarify things a little. Thanks again. Ruby On Mon, Jan 30, 2012 at 2:21 PM, Ruby Stevenson <ruby185 at gmail.com> wrote: > hi, all > > I am trying to figure out how to do histogram with numpy > > I have a three-dimension array A[x,y,z], ?another array (bins) has > been allocated along Z dimension, z' > > how can I get the histogram of H[ x, y, z' ]? > > thanks for your help. > > Ruby From charlesr.harris at gmail.com Mon Jan 30 15:15:47 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 30 Jan 2012 13:15:47 -0700 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? In-Reply-To: <CAB6mnxLRXvUdL04Vjvxzo0MrhT7BJkqV4JQFF9iHMu-zbzLrkQ@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> <CAB6mnxLRXvUdL04Vjvxzo0MrhT7BJkqV4JQFF9iHMu-zbzLrkQ@mail.gmail.com> Message-ID: <CAB6mnx+vo4Fj2GK6kpB2j22=m+vo=rz4wdPW=hReHvKLjJ4aPA@mail.gmail.com> On Mon, Jan 30, 2012 at 6:55 AM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Sun, Jan 29, 2012 at 10:03 AM, eat <e.antero.tammi at gmail.com> wrote: > >> On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: >>> >>>> Hi, >>>> >>>> Short demonstration of the issue: >>>> In []: sys.version >>>> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >>>> (Intel)]' >>>> In []: np.version.version >>>> Out[]: '1.6.0' >>>> >>>> In []: from numpy.polynomial import Polynomial as Poly >>>> In []: def p_tst(c): >>>> ..: p= Poly(c) >>>> ..: r= p.roots() >>>> ..: return sort(abs(p(r))) >>>> ..: >>>> >>>> Now I would expect a result more like: >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) >>>> >>>> be the case, but actually most result seems to be more like: >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) >>>> In []: p_tst(randn(123))[-3:] >>>> Out[]: 
array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) >>>> >>>> So, does this phenomena imply that >>>> - I'm testing with too high order polynomials (if so, does there exists >>>> a definite upper limit of polynomial order I'll not face this issue) >>>> or >>>> - it's just the 'nature' of computations with float values (if so, >>>> probably I should be able to tackle this regardless of the polynomial order) >>>> or >>>> - it's a nasty bug in class Polynomial >>>> >>>> >>> It's a defect. You will get all the roots and the number will equal the >>> degree. I haven't decided what the best way to deal with this is, but my >>> thoughts have trended towards specifying an interval with the default being >>> the domain. If you have other thoughts I'd be glad for the feedback. >>> >>> For the problem at hand, note first that you are specifying the >>> coefficients, not the roots as was the case with poly1d. Second, as a rule >>> of thumb, plain old polynomials will generally only be good for degree < 22 >>> due to being numerically ill conditioned. If you are really looking to use >>> high degrees, Chebyshev or Legendre will work better, although you will >>> probably need to explicitly specify the domain. If you want to specify the >>> polynomial using roots, do Poly.fromroots(...). Third, for the high degrees >>> you are probably screwed anyway for degree 123, since the accuracy of the >>> root finding will be limited, especially for roots that can cluster, and >>> any root that falls even a little bit outside the interval [-1,1] (the >>> default domain) is going to evaluate to a big number simply because the >>> polynomial is going to h*ll at a rate you wouldn't believe ;) >>> >>> For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, >>> things look good for degree 50, get a bit loose at degree 75 but can be >>> fixed up with one iteration of Newton, and blow up at degree 100. I think >>> that's pretty good, actually, doing better would require a lot more work. >>> There are some zero finding algorithms out there that might do better if >>> someone wants to give it a shot. >>> >>> In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) >>> >>> In [21]: sort(abs(p(p.roots()))) >>> Out[21]: >>> array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, >>> 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, >>> 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, >>> 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, >>> 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, >>> 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, >>> 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, >>> 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, >>> 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, >>> 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, >>> 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, >>> 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, >>> 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, >>> 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, >>> 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, >>> 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, >>> 2.85411753e-21, 2.85411753e-21]) >>> >> Thanks, >> >> for a very informative feedback. I'll study those orthogonal polynomials >> more detail. >> >> > That said, I'm thinking it might be possible to get a more accurate > polynomial representation from the zeros by going through a barycentric > form rather than simply multiplying the factors together as is done now. > Hmm... 
> > For evenly spaced roots the polynomial grows in amplitude rapidly at the > ends which leads to numerical problems because a small error in the zeros > turns into a large error in value because of the steepness of the curve at > the zeroes. I've attached a semilogy plot of the absolute values of the > polynomial with 30 equally spaced zeroes from -1 to 1. > > I've attached a plot of the Chebyshev coefficients for the monic polynomial with 50 zeros evenly spaced from -1, 1. The odd coefficients should be zero, so their value tells you what the error in the coefficient determination was (I used Gauss-Chebyshev integration). The value of the resulting Chebyshev series cannot be evaluated with sufficient accuracy in double precision due to the dynamic range of the coefficients and I expect that simple inability of double precision to correctly represent the values extends to the root finding. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/7a88b60e/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: chebcoef-deg50.png Type: image/png Size: 41467 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/7a88b60e/attachment.png> From scipy at samueljohn.de Mon Jan 30 17:06:22 2012 From: scipy at samueljohn.de (Samuel John) Date: Mon, 30 Jan 2012 23:06:22 +0100 Subject: [Numpy-discussion] histogram help In-Reply-To: <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> References: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com> <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> Message-ID: <F1916220-7CB1-4DE8-A22F-BE3709E56EC7@samueljohn.de> Hi Ruby, I still do not fully understand your question but what I do in such cases is to construct a very simple array and test the functions. The help of numpy.histogram2d or numpy.histogramdd (for more than two dims) might help here. So I guess, basically you want to ignore the x,y positions and just look at the combined distribution of the Z values? In this case, you would just need the numpy.histogram (the 1d version). Note that the histogram returns the numbers and the bin-borders. bests Samuel On 30.01.2012, at 20:27, Ruby Stevenson wrote: > Sorry, I realize I didn't describe the problem completely clear or correct. > > the (x,y) in this case is just many co-ordinates, and each coordinate > has a list of values (Z value) associated with it. The bins are > allocated for the Z. > > I hope this clarify things a little. Thanks again. > > Ruby > > > > > On Mon, Jan 30, 2012 at 2:21 PM, Ruby Stevenson <ruby185 at gmail.com> wrote: >> hi, all >> >> I am trying to figure out how to do histogram with numpy >> >> I have a three-dimension array A[x,y,z], another array (bins) has >> been allocated along Z dimension, z' >> >> how can I get the histogram of H[ x, y, z' ]? >> >> thanks for your help. >> >> Ruby > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Mon Jan 30 17:20:57 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 30 Jan 2012 15:20:57 -0700 Subject: [Numpy-discussion] Unrealistic expectations of class Polynomial or a bug? 
In-Reply-To: <CAB6mnx+vo4Fj2GK6kpB2j22=m+vo=rz4wdPW=hReHvKLjJ4aPA@mail.gmail.com> References: <CAKa=AYQfWLVYS1LFRcbAX+ELnpK4pmze3g7B5xvR=MN_v1-e6w@mail.gmail.com> <CAB6mnx+jKy05P5BCM+4_tvYa8zQJQ=HmcwdG_GyEo_7_-pEFMw@mail.gmail.com> <CAKa=AYSEp+d4L3ds8ih8BSAe3MX6t30MhzX47BQn+z5snn++xg@mail.gmail.com> <CAB6mnxLRXvUdL04Vjvxzo0MrhT7BJkqV4JQFF9iHMu-zbzLrkQ@mail.gmail.com> <CAB6mnx+vo4Fj2GK6kpB2j22=m+vo=rz4wdPW=hReHvKLjJ4aPA@mail.gmail.com> Message-ID: <CAB6mnxJ5rkJcUfT8bH5g6-yDnYN_KUc9U_EnEQFEUk_-9kh5Wg@mail.gmail.com> On Mon, Jan 30, 2012 at 1:15 PM, Charles R Harris <charlesr.harris at gmail.com > wrote: > > > On Mon, Jan 30, 2012 at 6:55 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sun, Jan 29, 2012 at 10:03 AM, eat <e.antero.tammi at gmail.com> wrote: >> >>> On Sat, Jan 28, 2012 at 11:14 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> >>>> On Sat, Jan 28, 2012 at 11:15 AM, eat <e.antero.tammi at gmail.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> Short demonstration of the issue: >>>>> In []: sys.version >>>>> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit >>>>> (Intel)]' >>>>> In []: np.version.version >>>>> Out[]: '1.6.0' >>>>> >>>>> In []: from numpy.polynomial import Polynomial as Poly >>>>> In []: def p_tst(c): >>>>> ..: p= Poly(c) >>>>> ..: r= p.roots() >>>>> ..: return sort(abs(p(r))) >>>>> ..: >>>>> >>>>> Now I would expect a result more like: >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 3.41987203e-07, 2.82123675e-03, 2.82123675e-03]) >>>>> >>>>> be the case, but actually most result seems to be more like: >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 9.09325898e+13, 9.09325898e+13, 1.29387029e+72]) >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 8.60862087e-11, 8.60862087e-11, 6.58784520e+32]) >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 2.00545673e-09, 3.25537709e+32, 3.25537709e+32]) >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 3.22753481e-04, 1.87056454e+00, 1.87056454e+00]) >>>>> In []: p_tst(randn(123))[-3:] >>>>> Out[]: array([ 2.98556327e+08, 2.98556327e+08, 8.23588003e+12]) >>>>> >>>>> So, does this phenomena imply that >>>>> - I'm testing with too high order polynomials (if so, does there >>>>> exists a definite upper limit of polynomial order I'll not face this issue) >>>>> or >>>>> - it's just the 'nature' of computations with float values (if so, >>>>> probably I should be able to tackle this regardless of the polynomial order) >>>>> or >>>>> - it's a nasty bug in class Polynomial >>>>> >>>>> >>>> It's a defect. You will get all the roots and the number will equal the >>>> degree. I haven't decided what the best way to deal with this is, but my >>>> thoughts have trended towards specifying an interval with the default being >>>> the domain. If you have other thoughts I'd be glad for the feedback. >>>> >>>> For the problem at hand, note first that you are specifying the >>>> coefficients, not the roots as was the case with poly1d. Second, as a rule >>>> of thumb, plain old polynomials will generally only be good for degree < 22 >>>> due to being numerically ill conditioned. If you are really looking to use >>>> high degrees, Chebyshev or Legendre will work better, although you will >>>> probably need to explicitly specify the domain. If you want to specify the >>>> polynomial using roots, do Poly.fromroots(...). 
Third, for the high degrees >>>> you are probably screwed anyway for degree 123, since the accuracy of the >>>> root finding will be limited, especially for roots that can cluster, and >>>> any root that falls even a little bit outside the interval [-1,1] (the >>>> default domain) is going to evaluate to a big number simply because the >>>> polynomial is going to h*ll at a rate you wouldn't believe ;) >>>> >>>> For evenly spaced roots in [-1, 1] and using Chebyshev polynomials, >>>> things look good for degree 50, get a bit loose at degree 75 but can be >>>> fixed up with one iteration of Newton, and blow up at degree 100. I think >>>> that's pretty good, actually, doing better would require a lot more work. >>>> There are some zero finding algorithms out there that might do better if >>>> someone wants to give it a shot. >>>> >>>> In [20]: p = Cheb.fromroots(linspace(-1, 1, 50)) >>>> >>>> In [21]: sort(abs(p(p.roots()))) >>>> Out[21]: >>>> array([ 6.20385459e-25, 1.65436123e-24, 2.06795153e-24, >>>> 5.79026429e-24, 5.89366186e-24, 6.44916482e-24, >>>> 6.44916482e-24, 6.77254127e-24, 6.97933642e-24, >>>> 7.25459208e-24, 1.00295649e-23, 1.37391414e-23, >>>> 1.37391414e-23, 1.63368171e-23, 2.39882378e-23, >>>> 3.30872245e-23, 4.38405725e-23, 4.49502653e-23, >>>> 4.49502653e-23, 5.58346913e-23, 8.35452419e-23, >>>> 9.38407760e-23, 9.38407760e-23, 1.03703218e-22, >>>> 1.03703218e-22, 1.23249911e-22, 1.75197880e-22, >>>> 1.75197880e-22, 3.07711188e-22, 3.09821786e-22, >>>> 3.09821786e-22, 4.56625520e-22, 4.56625520e-22, >>>> 4.69638303e-22, 4.69638303e-22, 5.96448724e-22, >>>> 5.96448724e-22, 1.24076485e-21, 1.24076485e-21, >>>> 1.59972624e-21, 1.59972624e-21, 1.62930347e-21, >>>> 1.62930347e-21, 1.73773328e-21, 1.73773328e-21, >>>> 1.87935435e-21, 2.30287083e-21, 2.48815928e-21, >>>> 2.85411753e-21, 2.85411753e-21]) >>>> >>> Thanks, >>> >>> for a very informative feedback. I'll study those orthogonal polynomials >>> more detail. >>> >>> >> That said, I'm thinking it might be possible to get a more accurate >> polynomial representation from the zeros by going through a barycentric >> form rather than simply multiplying the factors together as is done now. >> Hmm... >> >> For evenly spaced roots the polynomial grows in amplitude rapidly at the >> ends which leads to numerical problems because a small error in the zeros >> turns into a large error in value because of the steepness of the curve at >> the zeroes. I've attached a semilogy plot of the absolute values of the >> polynomial with 30 equally spaced zeroes from -1 to 1. >> >> > > I've attached a plot of the Chebyshev coefficients for the monic > polynomial with 50 zeros evenly spaced from -1, 1. The odd coefficients > should be zero, so their value tells you what the error in the coefficient > determination was (I used Gauss-Chebyshev integration). The value of the > resulting Chebyshev series cannot be evaluated with sufficient accuracy in > double precision due to the dynamic range of the coefficients and I expect > that simple inability of double precision to correctly represent the values > extends to the root finding. > > Oops, that was erroneous. The proximate cause of the problem seems to be poor precision in obtaining the coefficients from the roots. That can be improved. I've attached a few more plots ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/af411cce/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: polycoef.png Type: image/png Size: 36547 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/af411cce/attachment.png> -------------- next part -------------- A non-text attachment was scrubbed... Name: polyval.png Type: image/png Size: 57555 bytes Desc: not available URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120130/af411cce/attachment-0001.png> From chris.barker at noaa.gov Mon Jan 30 19:35:05 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 30 Jan 2012 16:35:05 -0800 Subject: [Numpy-discussion] preferred way of testing empty arrays In-Reply-To: <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> References: <09D750AD-D095-4F86-86D7-757CD47942FF@biologie.hu-berlin.de> <CAF6FJitekms_pzNmirZp--W4N3oh0gS8F7ZM2Tu_nKYuuJkRVA@mail.gmail.com> <3D87FCA1-3B8A-41E1-BF6A-CC427B9F37D5@biologie.hu-berlin.de> <CAF6FJitw6u0oHQMg=X+v0GeY7zr0nkmedpqT8Texy4_CmvvcwQ@mail.gmail.com> Message-ID: <CALGmxELAhzRU7PUkxbdsxyF+akbAN_qvxc6Gx+jOB29WF83W8Q@mail.gmail.com> On Fri, Jan 27, 2012 at 1:29 PM, Robert Kern <robert.kern at gmail.com> wrote:
> Well, if you really need to do this in more than one place, define a
> utility function and call it a day.
>
> def should_not_plot(x):
>     if x is None:
>         return True
>     elif isinstance(x, np.ndarray):
>         return x.size == 0
>     else:
>         return bool(x)

I tend to do things like:

def convert_to_plotable(x):
    if x is None:
        return None
    else:
        x = np.asarray(x)
        if x.size == 0:
            return None
        return x

It does mean you need to check for None later anyway, but I like to convert to an array early in the process -- then you know you have either an array or None at that point. NOTE: you could also raise and handle an exception instead. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From opossumnano at gmail.com Tue Jan 31 07:42:21 2012 From: opossumnano at gmail.com (Tiziano Zito) Date: Tue, 31 Jan 2012 13:42:21 +0100 Subject: [Numpy-discussion] [ANN] Summer School "Advanced Scientific Programming in Python" in Kiel, Germany Message-ID: <20120131124221.GD12374@multivac.zonafranca>

Advanced Scientific Programming in Python
=========================================

a Summer School by the G-Node and the Institute of Experimental and Applied Physics, Christian-Albrechts-Universität zu Kiel

Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only a few scientists actually use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques, incorporating theoretical lectures and practical exercises tailored to the needs of a programming scientist. New skills will be tested in a real programming project: we will team up to develop an entertaining scientific computer game. We use the Python programming language for the entire course.
Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. We show how clean language design, ease of extensibility, and the great wealth of open source libraries for scientific computing and data visualization are driving Python to become a standard tool for the programming scientist.

This school is targeted at Master or PhD students and Post-docs from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or Mathematica is absolutely required. Basic knowledge of Python is assumed. Participants without any prior experience with Python should work through the proposed introductory materials before the course.

Date and Location
=================
September 2-7, 2012. Kiel, Germany.

Preliminary Program
===================

Day 0 (Sun Sept 2) - Best Programming Practices
- Best Practices, Development Methodologies and the Zen of Python
- Version control with git
- Object-oriented programming & design patterns

Day 1 (Mon Sept 3) - Software Carpentry
- Test-driven development, unit testing & quality assurance
- Debugging, profiling and benchmarking techniques
- Best practices in data visualization
- Programming in teams

Day 2 (Tue Sept 4) - Scientific Tools for Python
- Advanced NumPy
- The Quest for Speed (intro): Interfacing to C with Cython
- Advanced Python I: idioms, useful built-in data structures, generators

Day 3 (Wed Sept 5) - The Quest for Speed
- Writing parallel applications in Python
- Programming project

Day 4 (Thu Sept 6) - Efficient Memory Management
- When parallelization does not help: the starving CPUs problem
- Advanced Python II: decorators and context managers
- Programming project

Day 5 (Fri Sept 7) - Practical Software Development
- Programming project
- The Pelita Tournament

Every evening we will have the tutors' consultation hour: Tutors will answer your questions and give suggestions for your own projects.

Applications
============
You can apply on-line at http://python.g-node.org

Applications must be submitted before 23:59 UTC, May 1, 2012. Notifications of acceptance will be sent by June 1, 2012. No fee is charged but participants should take care of travel, living, and accommodation expenses. Candidates will be selected on the basis of their profile. Places are limited: acceptance rate last time was around 20%.

Prerequisites: You are supposed to know the basics of Python to participate in the lectures. You are encouraged to go through the introductory material available on the website.

Faculty
=======
- Francesc Alted, Continuum Analytics Inc., USA
- Pietro Berkes, Enthought Inc., UK
- Valentin Haenel, Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland
- Zbigniew Jędrzejewski-Szmek, Faculty of Physics, University of Warsaw, Poland
- Eilif Muller, Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland
- Emanuele Olivetti, NeuroInformatics Laboratory, Fondazione Bruno Kessler and University of Trento, Italy
- Rike-Benjamin Schuppner, Technologit GbR, Germany
- Bartosz Teleńczuk, Unité de Neurosciences Information et Complexité, Centre National de la Recherche Scientifique, France
- Stéfan van der Walt, Helen Wills Neuroscience Institute, University of California Berkeley, USA
- Bastian Venthur, Berlin Institute of Technology and Bernstein Focus Neurotechnology, Germany
- Niko Wilbert, TNG Technology Consulting GmbH, Germany
- Tiziano Zito, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany

Organized by Christian T. Steigies and Christian Drews of the Institute of Experimental and Applied Physics, Christian-Albrechts-Universität zu Kiel, and by Zbigniew Jędrzejewski-Szmek and Tiziano Zito for the German Neuroinformatics Node of the INCF.

Website: http://python.g-node.org
Contact: python-info at g-node.org

From ndbecker2 at gmail.com Tue Jan 31 08:26:34 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 31 Jan 2012 08:26:34 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) Message-ID: <jg8q6a$j3e$2@dough.gmane.org> I was just bitten by this unexpected behavior:

In [24]: all ([i> 0 for i in xrange (10)])
Out[24]: False

In [25]: all (i> 0 for i in xrange (10))
Out[25]: True

Turns out:
In [31]: all is numpy.all
Out[31]: True

So numpy.all doesn't seem to do what I would expect when given a generator. Bug?

From nadavh at visionsense.com Tue Jan 31 04:42:55 2012 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 31 Jan 2012 01:42:55 -0800 Subject: [Numpy-discussion] histogram help In-Reply-To: <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> References: <CAA=a5iNkR=rJnYh+S1ivOn5E9Zu=fN-sJc6n6kbrhNhK+YR8DQ@mail.gmail.com>, <CAA=a5iPXoOwq_TRXTxatb0BSYfTje5naFv=wDwe2Z1UsE7AySA@mail.gmail.com> Message-ID: <26FC23E7C398A64083C980D16001012D261F0D938C@VA3DIAXVS361.RED001.local> Do you want a histogram of z for each (x,y)? Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Ruby Stevenson [ruby185 at gmail.com] Sent: 30 January 2012 21:27 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] histogram help Sorry, I realize I didn't describe the problem completely clear or correct. the (x,y) in this case is just many co-ordinates, and each coordinate has a list of values (Z value) associated with it. The bins are allocated for the Z. I hope this clarify things a little. Thanks again. Ruby On Mon, Jan 30, 2012 at 2:21 PM, Ruby Stevenson <ruby185 at gmail.com> wrote: > hi, all > > I am trying to figure out how to do histogram with numpy > > I have a three-dimension array A[x,y,z], another array (bins) has > been allocated along Z dimension, z' > > how can I get the histogram of H[ x, y, z' ]? > > thanks for your help. > > Ruby _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From madsipsen at gmail.com Tue Jan 31 03:29:23 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 31 Jan 2012 09:29:23 +0100 Subject: [Numpy-discussion] Unexpected reorganization of internal data Message-ID: <4F27A663.4000403@gmail.com> Hi, I am confused. Here's the reason: The following structure is a representation of N points in 3D space:

U = numpy.array([[x1,y1,z1], [x2,y2,z2],...,[xn,yn,zn]])

So the array U has shape (N,3). This order makes sense to me since U[i] will give you the i'th point in the set. Now, I want to pass this array to a C++ function that does some stuff with the points.
Here's how I do that:

void Foo::doStuff(int n, PyObject * numpy_data)
{
    // Get pointer to data
    double * const positions = (double *) PyArray_DATA(numpy_data);

    // Print positions
    for (int i=0; i<n; ++i)
    {
        float x = static_cast<float>(positions[3*i+0]);
        float y = static_cast<float>(positions[3*i+1]);
        float z = static_cast<float>(positions[3*i+2]);

        printf("Pos[%d] = %f %f %f\n", x, y, z);
    }
}

When I call this routine, using a swig wrapped Python interface to the C++ class, everything prints out nicely. Now, I want to apply a rotation to all the positions. So I set up some rotation matrix R like this:

R = numpy.array([[r11,r12,r13], [r21,r22,r23], [r31,r32,r33]])

To apply the matrix to the data in one crunch, I do

V = numpy.dot(R, U.transpose()).transpose()

Now when I call my C++ function from the Python side, all the data in V is printed, but it has been transposed. So apparently the internal data structure handled by numpy has been reorganized, even though I called transpose() twice, which I would expect to cancel each other out. However, if I do:

V = numpy.array(U.transpose()).transpose()

and call the C++ routine, everything is perfectly fine, i.e. the data structure is as expected. What went wrong?

Best regards, Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | Gåsebæksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/c50bf518/attachment.html> From robert.kern at gmail.com Tue Jan 31 09:07:31 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 14:07:31 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <jg8q6a$j3e$2@dough.gmane.org> References: <jg8q6a$j3e$2@dough.gmane.org> Message-ID: <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> On Tue, Jan 31, 2012 at 13:26, Neal Becker <ndbecker2 at gmail.com> wrote:
> I was just bitten by this unexpected behavior:
>
> In [24]: all ([i> 0 for i in xrange (10)])
> Out[24]: False
>
> In [25]: all (i> 0 for i in xrange (10))
> Out[25]: True
>
> Turns out:
> In [31]: all is numpy.all
> Out[31]: True
>
> So numpy.all doesn't seem to do what I would expect when given a generator.
> Bug?

Expected behavior. numpy.all(), like nearly all numpy functions, converts the input to an array using numpy.asarray(). numpy.asarray() knows nothing special about generators and other iterables that are not sequences, so it thinks it's a single scalar object. This scalar object happens to have a __nonzero__() method that returns True like most Python objects that don't override this. In order to use generic iterators that are not sequences, you need to explicitly use numpy.fromiter() to convert them to ndarrays. asarray() and array() can't do it in general because they need to autodiscover the shape and dtype all at the same time. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco From d.s.seljebotn at astro.uio.no Tue Jan 31 09:14:24 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 31 Jan 2012 15:14:24 +0100 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> Message-ID: <4F27F740.9060202@astro.uio.no> On 01/31/2012 03:07 PM, Robert Kern wrote: > On Tue, Jan 31, 2012 at 13:26, Neal Becker<ndbecker2 at gmail.com> wrote: >> I was just bitten by this unexpected behavior: >> >> In [24]: all ([i> 0 for i in xrange (10)]) >> Out[24]: False >> >> In [25]: all (i> 0 for i in xrange (10)) >> Out[25]: True >> >> Turns out: >> In [31]: all is numpy.all >> Out[31]: True >> >> So numpy.all doesn't seem to do what I would expect when given a generator. >> Bug? > > Expected behavior. numpy.all(), like nearly all numpy functions, > converts the input to an array using numpy.asarray(). numpy.asarray() > knows nothing special about generators and other iterables that are > not sequences, so it thinks it's a single scalar object. This scalar > object happens to have a __nonzero__() method that returns True like > most Python objects that don't override this. > > In order to use generic iterators that are not sequences, you need to > explicitly use numpy.fromiter() to convert them to ndarrays. asarray() > and array() can't do it in general because they need to autodiscover > the shape and dtype all at the same time. Perhaps np.asarray could specifically check for a generator argument and raise an exception? I imagine that would save people some time when running into this... If you really want In [7]: x = np.asarray(None) In [8]: x[()] = (i for i in range(10)) In [9]: x Out[9]: array(<generator object <genexpr> at 0x4553fa0>, dtype=object) ...then one can type it out? Dag From malcolm.reynolds at gmail.com Tue Jan 31 09:14:14 2012 From: malcolm.reynolds at gmail.com (Malcolm Reynolds) Date: Tue, 31 Jan 2012 14:14:14 +0000 Subject: [Numpy-discussion] Unexpected reorganization of internal data In-Reply-To: <4F27A663.4000403@gmail.com> References: <4F27A663.4000403@gmail.com> Message-ID: <CAO1Gn5-4D9APhwMXu4WQ8W843OYudKvTTuT1jPbMPtwPfkw8kQ@mail.gmail.com> Not exactly an answer to your question, but I can highly recommend using Boost.python, PyUblas and Ublas for your C++ vectors and matrices. It gives you a really good interface on the C++ side to numpy arrays and matrices, which can be passed in both directions over the language threshold with no copying. If I had to guess I'd say sometimes when transposing numpy simply sets a flag internally to avoid copying the data, but in some cases (such as perhaps when multiplication needs to take place) the data has to be placed in a new object. Accessing the data via raw pointers in C++ may not be checking for the 'transpose' flag and therefore you see an unexpected result. Disclaimer: this is just a guess, someone more familiar with Numpy internals will no doubt be able to correct me. 
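If that is what's going on, a cheap way to guard against it from the Python side (again just a sketch in the same guessing spirit -- the U, R and foo.doStuff names below are only stand-ins for whatever you actually have) is to ask numpy for a C-contiguous version of the array before the buffer crosses into C++:

import numpy as np

U = np.random.rand(5, 3)                  # N points, shape (N, 3)
R = np.eye(3)                             # some rotation matrix
V = np.dot(R, U.transpose()).transpose()
print(V.flags['C_CONTIGUOUS'])            # False: V is a transposed view of the dot() result
V = np.ascontiguousarray(V)               # makes a C-ordered copy only when one is needed
print(V.flags['C_CONTIGUOUS'])            # True: buffer is now laid out x1,y1,z1,x2,y2,z2,...
# foo.doStuff(len(V), V)                  # hypothetical call into the swig-wrapped extension

ascontiguousarray() should be a no-op when the data is already in C order, so it ought to cost nothing in the common case.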
Malcolm From ndbecker2 at gmail.com Tue Jan 31 09:33:55 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 31 Jan 2012 09:33:55 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) References: <jg8q6a$j3e$2@dough.gmane.org> <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> <4F27F740.9060202@astro.uio.no> Message-ID: <jg8u4j$j21$1@dough.gmane.org> Dag Sverre Seljebotn wrote: > On 01/31/2012 03:07 PM, Robert Kern wrote: >> On Tue, Jan 31, 2012 at 13:26, Neal Becker<ndbecker2 at gmail.com> wrote: >>> I was just bitten by this unexpected behavior: >>> >>> In [24]: all ([i> 0 for i in xrange (10)]) >>> Out[24]: False >>> >>> In [25]: all (i> 0 for i in xrange (10)) >>> Out[25]: True >>> >>> Turns out: >>> In [31]: all is numpy.all >>> Out[31]: True >>> >>> So numpy.all doesn't seem to do what I would expect when given a generator. >>> Bug? >> >> Expected behavior. numpy.all(), like nearly all numpy functions, >> converts the input to an array using numpy.asarray(). numpy.asarray() >> knows nothing special about generators and other iterables that are >> not sequences, so it thinks it's a single scalar object. This scalar >> object happens to have a __nonzero__() method that returns True like >> most Python objects that don't override this. >> >> In order to use generic iterators that are not sequences, you need to >> explicitly use numpy.fromiter() to convert them to ndarrays. asarray() >> and array() can't do it in general because they need to autodiscover >> the shape and dtype all at the same time. > > Perhaps np.asarray could specifically check for a generator argument and > raise an exception? I imagine that would save people some time when > running into this... > > If you really want > > In [7]: x = np.asarray(None) > > In [8]: x[()] = (i for i in range(10)) > > In [9]: x > Out[9]: array(<generator object <genexpr> at 0x4553fa0>, dtype=object) > > ...then one can type it out? > > Dag The reason it surprised me, is that python 'all' doesn't behave as numpy 'all' in this respect - and using ipython, I didn't even notice that 'all' was numpy.all rather than standard python all. All in all, rather unfortunate :) From matthew.brett at gmail.com Tue Jan 31 09:40:59 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 31 Jan 2012 14:40:59 +0000 Subject: [Numpy-discussion] Unexpected reorganization of internal data In-Reply-To: <4F27A663.4000403@gmail.com> References: <4F27A663.4000403@gmail.com> Message-ID: <CAH6Pt5oH1WEvPkfHSNJn7QBM+SwmhCndOoi14UnRDFefOn+Lvw@mail.gmail.com> Hi, On Tue, Jan 31, 2012 at 8:29 AM, Mads Ipsen <madsipsen at gmail.com> wrote: > Hi, > > I am confused. Here's the reason: > > The following structure is a representation of N points in 3D space: > > U = numpy.array([[x1,y1,z1], [x1,y1,z1],...,[xn,yn,zn]]) > > So the array U has shape (N,3). This order makes sense to me since U[i] will > give you the i'th point in the set. Now, I want to pass this array to a C++ > function that does some stuff with the points. Here's how I do that > > void Foo::doStuff(int n, PyObject * numpy_data) > { > ??? // Get pointer to data > ??? double * const positions = (double *) PyArray_DATA(numpy_data); > > ??? // Print positions > ??? for (int i=0; i<n; ++i) > ??? { > ??? float x = static_cast<float>(positions[3*i+0]) > ??? float y = static_cast<float>(positions[3*i+1]) > ??? float z = static_cast<float>(positions[3*i+2]) > > ??? printf("Pos[%d] = %f %f %f\n", x, y, z); > ??? 
} > } > > When I call this routine, using a swig wrapped Python interface to the C++ > class, everything prints out nice. > > Now, I want to apply a rotation to all the positions. So I set up some > rotation matrix R like this: > > R = numpy.array([[r11,r12,r13], > ???????????????? [r21,r22,r23], > ???????????????? [r31,r32,r33]]) > > To apply the matrix to the data in one crunch, I do > > V = numpy.dot(R, U.transpose()).transpose() > > Now when I call my C++ function from the Python side, all the data in V is > printed, but it has been transposed. So apparently the internal data > structure handled by numpy has been reorganized, even though I called > transpose() twice, which I would expect to cancel out each other. > > However, if I do: > > V = numpy.array(U.transpose()).transpose() > > and call the C++ routine, everything is perfectly fine, ie. the data > structure is as expected. > > What went wrong? The numpy array reserves the right to organize its data internally. For example, a numpy array can be in Fortran order in memory, or C order in memory, and many more complicated schemes. You might want to have a look at: http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray If you depend on a particular order for your array memory, you might want to look at: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ascontiguousarray.html Best, Matthew From alan.isaac at gmail.com Tue Jan 31 10:03:55 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 31 Jan 2012 10:03:55 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <jg8q6a$j3e$2@dough.gmane.org> References: <jg8q6a$j3e$2@dough.gmane.org> Message-ID: <4F2802DB.3060705@gmail.com> On 1/31/2012 8:26 AM, Neal Becker wrote: > I was just bitten by this unexpected behavior: > > In [24]: all ([i> 0 for i in xrange (10)]) > Out[24]: False > > In [25]: all (i> 0 for i in xrange (10)) > Out[25]: True > > Turns out: > In [31]: all is numpy.all > Out[31]: True >>> np.array([i> 0 for i in xrange (10)]) array([False, True, True, True, True, True, True, True, True, True], dtype=bool) >>> np.array(i> 0 for i in xrange (10)) array(<generator object <genexpr> at 0x0267A210>, dtype=object) >>> import this Cheers, Alan From ben.root at ou.edu Tue Jan 31 10:13:54 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 31 Jan 2012 09:13:54 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <4F2802DB.3060705@gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> Message-ID: <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> On Tuesday, January 31, 2012, Alan G Isaac <alan.isaac at gmail.com> wrote: > On 1/31/2012 8:26 AM, Neal Becker wrote: >> I was just bitten by this unexpected behavior: >> >> In [24]: all ([i> 0 for i in xrange (10)]) >> Out[24]: False >> >> In [25]: all (i> 0 for i in xrange (10)) >> Out[25]: True >> >> Turns out: >> In [31]: all is numpy.all >> Out[31]: True > > >>>> np.array([i> 0 for i in xrange (10)]) > array([False, True, True, True, True, True, True, True, True, True], dtype=bool) >>>> np.array(i> 0 for i in xrange (10)) > array(<generator object <genexpr> at 0x0267A210>, dtype=object) >>>> import this > > > Cheers, > Alan > Is np.all() using np.array() or np.asanyarray()? If the latter, I would expect it to return a numpy array from a generator. If the former, why isn't it using asanyarray()? 
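For what it's worth, the only way I know of to get the "list comprehension" answer out of a generator right now is the fromiter() route Robert mentioned, where you spell out the dtype yourself (quick sketch, typed from memory):

>>> import numpy as np
>>> np.fromiter((i > 0 for i in xrange(10)), dtype=bool)
array([False, True, True, True, True, True, True, True, True, True], dtype=bool)
>>> np.all(np.fromiter((i > 0 for i in xrange(10)), dtype=bool))
False

That works, but it isn't something a casual all() user is likely to reach for.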
Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/b5df357a/attachment.html> From robert.kern at gmail.com Tue Jan 31 10:18:26 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 15:18:26 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> Message-ID: <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: > Is np.all() using np.array() or np.asanyarray()? ?If the latter, I would > expect it to return a numpy array from a generator. Why would you expect that? [~/scratch] |37> np.asanyarray(i>5 for i in range(10)) array(<generator object <genexpr> at 0xdc24a08>, dtype=object) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From d.s.seljebotn at astro.uio.no Tue Jan 31 10:19:29 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 31 Jan 2012 16:19:29 +0100 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> Message-ID: <4F280681.3020700@astro.uio.no> On 01/31/2012 04:13 PM, Benjamin Root wrote: > > > On Tuesday, January 31, 2012, Alan G Isaac <alan.isaac at gmail.com > <mailto:alan.isaac at gmail.com>> wrote: > > On 1/31/2012 8:26 AM, Neal Becker wrote: > >> I was just bitten by this unexpected behavior: > >> > >> In [24]: all ([i> 0 for i in xrange (10)]) > >> Out[24]: False > >> > >> In [25]: all (i> 0 for i in xrange (10)) > >> Out[25]: True > >> > >> Turns out: > >> In [31]: all is numpy.all > >> Out[31]: True > > > > > >>>> np.array([i> 0 for i in xrange (10)]) > > array([False, True, True, True, True, True, True, True, True, > True], dtype=bool) > >>>> np.array(i> 0 for i in xrange (10)) > > array(<generator object <genexpr> at 0x0267A210>, dtype=object) > >>>> import this > > > > > > Cheers, > > Alan > > > > Is np.all() using np.array() or np.asanyarray()? If the latter, I would > expect it to return a numpy array from a generator. If the former, why > isn't it using asanyarray()? 
Your expectation is probably wrong: In [12]: np.asanyarray(i for i in range(10)) Out[12]: array(<generator object <genexpr> at 0x455d9b0>, dtype=object) Dag Sverre From ben.root at ou.edu Tue Jan 31 10:35:38 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 31 Jan 2012 09:35:38 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> Message-ID: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> wrote: > On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: > > > Is np.all() using np.array() or np.asanyarray()? If the latter, I would > > expect it to return a numpy array from a generator. > > Why would you expect that? > > [~/scratch] > |37> np.asanyarray(i>5 for i in range(10)) > array(<generator object <genexpr> at 0xdc24a08>, dtype=object) > > -- > Robert Kern > What possible use-case could there be for a numpy array of generators? Furthermore, from the documentation: numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, ownmaskna=False) Convert the input to an ndarray, but pass ndarray subclasses through. Parameters ---------- a : array_like *Input data, in any form that can be converted to an array*. This includes scalars, lists, lists of tuples, tuples, tuples of tuples, tuples of lists, and ndarrays. Emphasis mine. A generator is an input that could be converted into an array. (Setting aside the issue of non-terminating generators such as those from cycle()). Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/ec1d9d85/attachment.html> From d.s.seljebotn at astro.uio.no Tue Jan 31 10:46:01 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 31 Jan 2012 16:46:01 +0100 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> Message-ID: <4F280CB9.9070509@astro.uio.no> On 01/31/2012 04:35 PM, Benjamin Root wrote: > > > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com > <mailto:robert.kern at gmail.com>> wrote: > > On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu > <mailto:ben.root at ou.edu>> wrote: > > > Is np.all() using np.array() or np.asanyarray()? If the latter, > I would > > expect it to return a numpy array from a generator. > > Why would you expect that? > > [~/scratch] > |37> np.asanyarray(i>5 for i in range(10)) > array(<generator object <genexpr> at 0xdc24a08>, dtype=object) > > -- > Robert Kern > > > What possible use-case could there be for a numpy array of generators? > Furthermore, from the documentation: > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > ownmaskna=False) > Convert the input to an ndarray, but pass ndarray subclasses through. 
> > Parameters > ---------- > a : array_like > *Input data, in any form that can be converted to an array*. This > includes scalars, lists, lists of tuples, tuples, tuples of > tuples, > tuples of lists, and ndarrays. > > Emphasis mine. A generator is an input that could be converted into an > array. (Setting aside the issue of non-terminating generators such as > those from cycle()). Splitting semantic hairs doesn't help here -- it *does* return an array, it just happens to be a completely useless 0-dimensional one. The question is, is the current confusing and less than useful? (I vot for "yes"). list and tuple are special-cased, why not generators (at least to raise an exception) Going OT, look at this gem: ???? In [3]: a Out[3]: array([1, 2, 3], dtype=object) In [4]: a.shape Out[4]: () ??? In [9]: b Out[9]: array([1, 2, 3], dtype=object) In [10]: b.shape Out[10]: (3,) Figuring out the "???" is left as an exercise to the reader :-) Dag Sverre From alan.isaac at gmail.com Tue Jan 31 10:48:05 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 31 Jan 2012 10:48:05 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> Message-ID: <4F280D35.6020409@gmail.com> On 1/31/2012 10:35 AM, Benjamin Root wrote: > A generator is an input that could be converted into an array. def mygen(): i = 0 while True: yield i i += 1 Alan Isaac From robert.kern at gmail.com Tue Jan 31 10:50:15 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 15:50:15 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> Message-ID: <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: > > > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> wrote: >> >> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: >> >> > Is np.all() using np.array() or np.asanyarray()? ?If the latter, I would >> > expect it to return a numpy array from a generator. >> >> Why would you expect that? >> >> [~/scratch] >> |37> np.asanyarray(i>5 for i in range(10)) >> array(<generator object <genexpr> at 0xdc24a08>, dtype=object) >> >> -- >> Robert Kern > > > What possible use-case could there be for a numpy array of generators? Not many. This isn't an intentional feature, just a logical consequence of all of the other intentional features being applied consistently. > Furthermore, from the documentation: > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > ownmaskna=False) > ???? Convert the input to an ndarray, but pass ndarray subclasses through. > > ???? Parameters > ???? ---------- > ???? a : array_like > ???????? Input data, in any form that can be converted to an array.? This > ???????? 
includes scalars, lists, lists of tuples, tuples, tuples of tuples, > ???????? tuples of lists, and ndarrays. > > Emphasis mine.? A generator is an input that could be converted into an > array.? (Setting aside the issue of non-terminating generators such as those > from cycle()). I'm sorry, but this is not true. In general, it's too hard to do all of the magic autodetermination that asarray() and array() do when faced with an indeterminate-length iterable. We tried. That's why we have fromiter(). By restricting the domain to an iterable yielding scalars and requiring that the user specify the desired dtype, fromiter() can figure out the rest. Like it or not, "array_like" is practically defined by the behavior of np.asarray(), not vice-versa. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From shish at keba.be Tue Jan 31 11:05:34 2012 From: shish at keba.be (Olivier Delalleau) Date: Tue, 31 Jan 2012 11:05:34 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> Message-ID: <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> Le 31 janvier 2012 10:50, Robert Kern <robert.kern at gmail.com> a ?crit : > On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: > > > > > > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> > wrote: > >> > >> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: > >> > >> > Is np.all() using np.array() or np.asanyarray()? If the latter, I > would > >> > expect it to return a numpy array from a generator. > >> > >> Why would you expect that? > >> > >> [~/scratch] > >> |37> np.asanyarray(i>5 for i in range(10)) > >> array(<generator object <genexpr> at 0xdc24a08>, dtype=object) > >> > >> -- > >> Robert Kern > > > > > > What possible use-case could there be for a numpy array of generators? > > Not many. This isn't an intentional feature, just a logical > consequence of all of the other intentional features being applied > consistently. > > > Furthermore, from the documentation: > > > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > > ownmaskna=False) > > Convert the input to an ndarray, but pass ndarray subclasses > through. > > > > Parameters > > ---------- > > a : array_like > > Input data, in any form that can be converted to an array. This > > includes scalars, lists, lists of tuples, tuples, tuples of > tuples, > > tuples of lists, and ndarrays. > > > > Emphasis mine. A generator is an input that could be converted into an > > array. (Setting aside the issue of non-terminating generators such as > those > > from cycle()). > > I'm sorry, but this is not true. In general, it's too hard to do all > of the magic autodetermination that asarray() and array() do when > faced with an indeterminate-length iterable. We tried. That's why we > have fromiter(). 
By restricting the domain to an iterable yielding > scalars and requiring that the user specify the desired dtype, > fromiter() can figure out the rest. > > Like it or not, "array_like" is practically defined by the behavior of > np.asarray(), not vice-versa. In that case I agree with whoever said ealier it would be best to detect this case and throw an exception, as it'll probably save some headaches. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/30bf81ab/attachment.html> From robert.kern at gmail.com Tue Jan 31 11:11:19 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 16:11:19 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> Message-ID: <CAF6FJis3WrvvdWX8Z9VypGzSLFd+o8rvVbMkGLKwzEt-qK2=Pg@mail.gmail.com> On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: > Furthermore, from the documentation: > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > ownmaskna=False) > ???? Convert the input to an ndarray, but pass ndarray subclasses through. > > ???? Parameters > ???? ---------- > ???? a : array_like > ???????? Input data, in any form that can be converted to an array.? This > ???????? includes scalars, lists, lists of tuples, tuples, tuples of tuples, > ???????? tuples of lists, and ndarrays. I should also add that this verbiage is also in np.asarray(). The only additional feature of np.asanyarray() is that is does not convert ndarray subclasses like matrix to ndarray objects. np.asanyarray() does not accept more types of objects than np.asarray(). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From ben.root at ou.edu Tue Jan 31 11:46:56 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 31 Jan 2012 10:46:56 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> Message-ID: <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> On Tue, Jan 31, 2012 at 10:05 AM, Olivier Delalleau <shish at keba.be> wrote: > Le 31 janvier 2012 10:50, Robert Kern <robert.kern at gmail.com> a ?crit : > >> On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: >> > >> > >> > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> >> wrote: >> >> >> >> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: >> >> >> >> > Is np.all() using np.array() or np.asanyarray()? 
If the latter, I >> would >> >> > expect it to return a numpy array from a generator. >> >> >> >> Why would you expect that? >> >> >> >> [~/scratch] >> >> |37> np.asanyarray(i>5 for i in range(10)) >> >> array(<generator object <genexpr> at 0xdc24a08>, dtype=object) >> >> >> >> -- >> >> Robert Kern >> > >> > >> > What possible use-case could there be for a numpy array of generators? >> >> Not many. This isn't an intentional feature, just a logical >> consequence of all of the other intentional features being applied >> consistently. >> >> > Furthermore, from the documentation: >> > >> > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, >> > ownmaskna=False) >> > Convert the input to an ndarray, but pass ndarray subclasses >> through. >> > >> > Parameters >> > ---------- >> > a : array_like >> > Input data, in any form that can be converted to an array. >> This >> > includes scalars, lists, lists of tuples, tuples, tuples of >> tuples, >> > tuples of lists, and ndarrays. >> > >> > Emphasis mine. A generator is an input that could be converted into an >> > array. (Setting aside the issue of non-terminating generators such as >> those >> > from cycle()). >> >> I'm sorry, but this is not true. In general, it's too hard to do all >> of the magic autodetermination that asarray() and array() do when >> faced with an indeterminate-length iterable. We tried. That's why we >> have fromiter(). By restricting the domain to an iterable yielding >> scalars and requiring that the user specify the desired dtype, >> fromiter() can figure out the rest. >> >> Like it or not, "array_like" is practically defined by the behavior of >> np.asarray(), not vice-versa. > > > In that case I agree with whoever said ealier it would be best to detect > this case and throw an exception, as it'll probably save some headaches. > > -=- Olivier > > I'll agree with this statement. This bug has popped up a few times in the mpl bug tracker due to the pylab mode. While I would prefer if it were possible to evaluate the generator into an array, silently returning True incorrectly for all() and any() is probably far worse. That said, is it still impossible to make np.all() and np.any() special to have similar behavior to the built-in all() and any()? Maybe it could catch the above exception and then return the result from python's built-ins? Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/9aa56e31/attachment.html> From chris.barker at noaa.gov Tue Jan 31 12:07:20 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 31 Jan 2012 09:07:20 -0800 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <jg8u4j$j21$1@dough.gmane.org> References: <jg8q6a$j3e$2@dough.gmane.org> <CAF6FJitNfL6_K5HscgBgokv8Aod5biRR6NG-FaTS5xwoPH5_tQ@mail.gmail.com> <4F27F740.9060202@astro.uio.no> <jg8u4j$j21$1@dough.gmane.org> Message-ID: <CALGmxEJiZXmLSJ7jYwfca6susvGF2kOt4ESxF0Nn4Ek_2RVWZw@mail.gmail.com> On Tue, Jan 31, 2012 at 6:33 AM, Neal Becker <ndbecker2 at gmail.com> wrote: > The reason it surprised me, is that python 'all' doesn't behave as numpy 'all' > in this respect - and using ipython, I didn't even notice that 'all' was > numpy.all rather than standard python all. "namespaces are one honking great idea" -- sorry, I couldn't help myself.... -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? 
voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From chris.barker at noaa.gov Tue Jan 31 12:23:58 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 31 Jan 2012 09:23:58 -0800 Subject: [Numpy-discussion] Unexpected reorganization of internal data In-Reply-To: <CAO1Gn5-4D9APhwMXu4WQ8W843OYudKvTTuT1jPbMPtwPfkw8kQ@mail.gmail.com> References: <4F27A663.4000403@gmail.com> <CAO1Gn5-4D9APhwMXu4WQ8W843OYudKvTTuT1jPbMPtwPfkw8kQ@mail.gmail.com> Message-ID: <CALGmxE+47X1rmmrm9wqmEiVGHOb3DwbuZ1oRmWY=g1MKXmbQrg@mail.gmail.com> On Tue, Jan 31, 2012 at 6:14 AM, Malcolm Reynolds <malcolm.reynolds at gmail.com> wrote: > Not exactly an answer to your question, but I can highly recommend > using Boost.python, PyUblas and Ublas for your C++ vectors and > matrices. It gives you a really good interface on the C++ side to > numpy arrays and matrices, which can be passed in both directions over > the language threshold with no copying. or use Cython... > If I had to guess I'd say sometimes when transposing numpy simply sets > a flag internally to avoid copying the data, but in some cases (such > as perhaps when multiplication needs to take place) the data has to be > placed in a new object. good guess: > V = numpy.dot(R, U.transpose()).transpose() >>> a array([[1, 2], [3, 4], [5, 6]]) >>> a.flags C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> b = a.transpose() >>> b.flags C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False so the transpose() simple re-arranges the strides to Fortran order, rather than changing anything in memory. np.dot() produces a new array, so it is C-contiguous, then you transpose it, so you get a fortran-ordered array. > Now when I call my C++ function from the Python side, all the data in V is printed, but it has been transposed. as mentioned, if you are working with arrays in C++ (or fortran, orC, or...) and need to count on the ordering of the data, you need to check it in your extension code. There are utilities for this. > However, if I do: > V = numpy.array(U.transpose()).transpose() right: In [7]: a.flags Out[7]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [8]: a.transpose().flags Out[8]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [9]: np.array( a.transpose() ).flags Out[9]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False so the np.array call doesn't re-arrange the order if it doesn't need to. If you want to force it, you can specify the order: In [10]: np.array( a.transpose(), order='C' ).flags Out[10]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False (note: this does surprise me a bit, as it is making a copy, but there you go -- if order matters, specify it) In general, numpy does a lot of things for the sake of efficiency -- avoiding copies when it can, for instance -- this give efficiency and flexibility, but you do need to be careful, particularly when interfacing with the binary data directly. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? 
main reception Chris.Barker at noaa.gov From travis at continuum.io Tue Jan 31 17:17:28 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 31 Jan 2012 16:17:28 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> Message-ID: <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> I also agree that an exception should be raised at the very least. It might also be possible to make the NumPy any, all, and sum functions behave like the builtins when given a generator. It seems worth exploring at least. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jan 31, 2012, at 10:46 AM, Benjamin Root <ben.root at ou.edu> wrote: > > On Tue, Jan 31, 2012 at 10:05 AM, Olivier Delalleau <shish at keba.be> wrote: > Le 31 janvier 2012 10:50, Robert Kern <robert.kern at gmail.com> a ?crit : > On Tue, Jan 31, 2012 at 15:35, Benjamin Root <ben.root at ou.edu> wrote: > > > > > > On Tue, Jan 31, 2012 at 9:18 AM, Robert Kern <robert.kern at gmail.com> wrote: > >> > >> On Tue, Jan 31, 2012 at 15:13, Benjamin Root <ben.root at ou.edu> wrote: > >> > >> > Is np.all() using np.array() or np.asanyarray()? If the latter, I would > >> > expect it to return a numpy array from a generator. > >> > >> Why would you expect that? > >> > >> [~/scratch] > >> |37> np.asanyarray(i>5 for i in range(10)) > >> array(<generator object <genexpr> at 0xdc24a08>, dtype=object) > >> > >> -- > >> Robert Kern > > > > > > What possible use-case could there be for a numpy array of generators? > > Not many. This isn't an intentional feature, just a logical > consequence of all of the other intentional features being applied > consistently. > > > Furthermore, from the documentation: > > > > numpy.asanyarray = asanyarray(a, dtype=None, order=None, maskna=None, > > ownmaskna=False) > > Convert the input to an ndarray, but pass ndarray subclasses through. > > > > Parameters > > ---------- > > a : array_like > > Input data, in any form that can be converted to an array. This > > includes scalars, lists, lists of tuples, tuples, tuples of tuples, > > tuples of lists, and ndarrays. > > > > Emphasis mine. A generator is an input that could be converted into an > > array. (Setting aside the issue of non-terminating generators such as those > > from cycle()). > > I'm sorry, but this is not true. In general, it's too hard to do all > of the magic autodetermination that asarray() and array() do when > faced with an indeterminate-length iterable. We tried. That's why we > have fromiter(). By restricting the domain to an iterable yielding > scalars and requiring that the user specify the desired dtype, > fromiter() can figure out the rest. > > Like it or not, "array_like" is practically defined by the behavior of > np.asarray(), not vice-versa. > > In that case I agree with whoever said ealier it would be best to detect this case and throw an exception, as it'll probably save some headaches. > > -=- Olivier > > > I'll agree with this statement. 
This bug has popped up a few times in the mpl bug tracker due to the pylab mode. While I would prefer if it were possible to evaluate the generator into an array, silently returning True incorrectly for all() and any() is probably far worse. > > That said, is it still impossible to make np.all() and np.any() special to have similar behavior to the built-in all() and any()? Maybe it could catch the above exception and then return the result from python's built-ins? > > Cheers! > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/92da353e/attachment.html> From robert.kern at gmail.com Tue Jan 31 17:22:17 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 31 Jan 2012 22:22:17 +0000 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> Message-ID: <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> On Tue, Jan 31, 2012 at 22:17, Travis Oliphant <travis at continuum.io> wrote: > I also agree that an exception should be raised at the very least. > > It might also be possible to make the NumPy any, all, and sum functions > behave like the builtins when given a generator. ?It seems worth exploring > at least. I would rather we deprecate the all() and any() functions in favor of the alltrue() and sometrue() aliases that date back to Numeric. Renaming them to match the builtin names was a mistake. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From warren.weckesser at enthought.com Tue Jan 31 17:25:33 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 31 Jan 2012 16:25:33 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> Message-ID: <CAM-+wY8a7fO0dqVEo4cJVUMQpk3hDRqH7tCeumMk-C1v5ocCVA@mail.gmail.com> On Tue, Jan 31, 2012 at 4:22 PM, Robert Kern <robert.kern at gmail.com> wrote: > On Tue, Jan 31, 2012 at 22:17, Travis Oliphant <travis at continuum.io> > wrote: > > I also agree that an exception should be raised at the very least. > > > > It might also be possible to make the NumPy any, all, and sum functions > > behave like the builtins when given a generator. It seems worth > exploring > > at least. > > I would rather we deprecate the all() and any() functions in favor of > the alltrue() and sometrue() aliases that date back to Numeric. > +1 (Maybe 'anytrue' for consistency? (And a royal blue bike shed?)) Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120131/f4af3d62/attachment.html> From travis at continuum.io Tue Jan 31 17:35:18 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 31 Jan 2012 16:35:18 -0600 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> Message-ID: <07157234-E610-4A5F-8B18-AB6940129E96@continuum.io> Actually i believe the NumPy 'any' and 'all' names pre-date the Python usage which first appeared in Python 2.5 I agree with Chris that namespaces are a great idea. I don't agree with deprecating 'any' and 'all' It also seems useful to revisit under what conditions 'array' could correctly interpret a generator expression, but in the context of streaming or deferred arrays. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jan 31, 2012, at 4:22 PM, Robert Kern <robert.kern at gmail.com> wrote: > On Tue, Jan 31, 2012 at 22:17, Travis Oliphant <travis at continuum.io> wrote: >> I also agree that an exception should be raised at the very least. 
>> >> It might also be possible to make the NumPy any, all, and sum functions >> behave like the builtins when given a generator. It seems worth exploring >> at least. > > I would rather we deprecate the all() and any() functions in favor of > the alltrue() and sometrue() aliases that date back to Numeric. > Renaming them to match the builtin names was a mistake. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Tue Jan 31 20:45:52 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 31 Jan 2012 20:45:52 -0500 Subject: [Numpy-discussion] numpy all unexpected result (generator) In-Reply-To: <07157234-E610-4A5F-8B18-AB6940129E96@continuum.io> References: <jg8q6a$j3e$2@dough.gmane.org> <4F2802DB.3060705@gmail.com> <CANNq6F=27vv7LDbMawo8Eo+2yDdL7GvZSsPD7DWvwdoLbv0uqg@mail.gmail.com> <CAF6FJiuycVszKv5fRuRogF3x5j4JwNVy57A8xQUXa=E-QodScA@mail.gmail.com> <CANNq6F=Q3-2aDEe-5NA3TYcMgKdfVZ=HJgDS7-r8NZYhHszzNA@mail.gmail.com> <CAF6FJivWeWarYYiE=8ZBK5kr4CVRZuqbLeUvCuNk+_C75Ph_VA@mail.gmail.com> <CAFXk4bqTQSeKjVSS33p4GeWXucaDDfDznNhY3GVM_LKz67qe-Q@mail.gmail.com> <CANNq6Fmcn3HzuD-4wP=mxeGKsWkWJk3rvZjjePDftMeOTcfZow@mail.gmail.com> <95D8E7D7-C5D2-4A7A-97A4-CD826D53EFDE@continuum.io> <CAF6FJiuKtq6FzoMBLxk4_e4M4ViB4s2BO43rQ4fyGeeiyToKqQ@mail.gmail.com> <07157234-E610-4A5F-8B18-AB6940129E96@continuum.io> Message-ID: <CAMMTP+CPOmM7WoaJuELCOMaMctJdR7ezA-=x6KQzhwQn9-JnxA@mail.gmail.com> On Tue, Jan 31, 2012 at 5:35 PM, Travis Oliphant <travis at continuum.io> wrote: > Actually i believe the NumPy 'any' and 'all' names pre-date the Python usage which first appeared in Python 2.5 > > I agree with Chris that namespaces are a great idea. ?I don't agree with deprecating 'any' and 'all' I completely agree here. I also like to keep np.all, np.any, np.max, ... >>> np.max((i> 0 for i in xrange (10))) <generator object <genexpr> at 0x046493F0> >>> max((i> 0 for i in xrange (10))) True I used an old-style matplotlib example as recipe yesterday, and the first thing I did is getting rid of the missing name spaces, and I had to think twice what amax and amin are. aall, aany ??? ;) Josef > > It also seems useful to revisit under what conditions 'array' could correctly interpret a generator expression, but in the context of streaming or deferred arrays. > > Travis > > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On Jan 31, 2012, at 4:22 PM, Robert Kern <robert.kern at gmail.com> wrote: > >> On Tue, Jan 31, 2012 at 22:17, Travis Oliphant <travis at continuum.io> wrote: >>> I also agree that an exception should be raised at the very least. >>> >>> It might also be possible to make the NumPy any, all, and sum functions >>> behave like the builtins when given a generator. ?It seems worth exploring >>> at least. >> >> I would rather we deprecate the all() and any() functions in favor of >> the alltrue() and sometrue() aliases that date back to Numeric. >> Renaming them to match the builtin names was a mistake. >> >> -- >> Robert Kern >> >> "I have come to believe that the whole world is an enigma, a harmless >> enigma that is made terrible by our own mad attempt to interpret it as >> though it had an underlying truth." >> ? 
-- Umberto Eco >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
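P.S. Until something changes, the defensive pattern I use in scripts (just a sketch, nothing official) is to listify anything that might be a generator before it gets near the numpy reductions:

>>> import numpy as np
>>> np.all(list(i > 0 for i in xrange(10)))
False
>>> np.max(list(i > 0 for i in xrange(10)))
True

list() costs an extra copy, but at least the answers match the builtins.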