From hoogendoorn.eelco at gmail.com Mon Sep 1 03:49:50 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Mon, 1 Sep 2014 09:49:50 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

Sure, I'd like to hash things out, but I would also like some preliminary feedback as to whether this is going in a direction anyone else sees the point of, whether it conflicts with other plans, and indeed whether we can agree that numpy is the right place for it; a point which I would very much like to defend. If there is some obvious no-go that I'm missing, I can do without the drudgery of writing proper documentation ;).

As for whether this belongs in numpy: yes, I would say so. There is the extension of functionality to functions already in numpy, which is a no-brainer (it need not cost anything performance-wise, and I've needed unique graph edges many, many times), and there is the grouping functionality, which is the main novelty.

However, note that the grouping functionality itself is a very small addition, just a few hundred lines of pure python, given that the indexing logic has been factored out of the classic arraysetops. At least from a developer's perspective, it very much feels like a logical extension of the same 'thing'.

But also from a conceptual numpy perspective, grouping is really more an 'elementary manipulation of an ndarray' than a 'special purpose algorithm'. It is useful for literally all kinds of programming; hence there is similar functionality in the python standard library (itertools.groupby); so why not have an efficient vectorized equivalent in numpy? It belongs there more than the linalg module, arguably. Also, from a community perspective, a significant fraction of all stackoverflow numpy questions are (unknowingly) exactly about 'how to do grouping in numpy'.

On Mon, Sep 1, 2014 at 4:36 AM, Charles R Harris wrote:
> On Sun, Aug 31, 2014 at 1:48 PM, Eelco Hoogendoorn <hoogendoorn.eelco at gmail.com> wrote:
>> I've organized all code I had relating to this subject in a github repository. That should facilitate shooting around ideas. I've also added more documentation and structure to make it easier to see what is going on.
>>
>> Hopefully we can converge on a common vision, and then improve the documentation and testing to make it worthy of inclusion in the numpy master.
>>
>> Note that there is also a complete rewrite of the classic numpy.arraysetops, such that they are also generalized to more complex input, such as finding unique graph edges, and so on.
>>
>> You mentioned getting the numpy core developers involved; are they not subscribed to this mailing list? I wouldn't be surprised; you'd hope there is a channel of discussion concerning development with higher signal to noise....
>
> There are only about 2.5 of us at the moment. Those for whom this is an itch that needs scratching should hash things out and make a PR. The main question for me is if it belongs in numpy, scipy, or somewhere else.
>
> Chuck
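A minimal sketch of the kind of vectorized grouping being argued for here, using only primitives already in numpy (the names are illustrative, not the API from the repository above):

```
import numpy as np

keys   = np.array(['a', 'b', 'a', 'c', 'b', 'a'])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# np.unique labels each element with its group number; np.bincount
# then performs the per-group reduction as whole-array arithmetic
unique_keys, inverse = np.unique(keys, return_inverse=True)
group_sums   = np.bincount(inverse, weights=values)
group_counts = np.bincount(inverse)
group_means  = group_sums / group_counts

print(dict(zip(unique_keys, group_means)))
# {'a': 3.33..., 'b': 3.5, 'c': 4.0}
```

This expresses, in one vectorized pass, the same key/value association that itertools.groupby builds one element at a time.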
From emel_hasdal at hotmail.com Mon Sep 1 04:33:57 2014
From: emel_hasdal at hotmail.com (Emel Hasdal)
Date: Mon, 1 Sep 2014 01:33:57 -0700
Subject: [Numpy-discussion] How to install numpy on a box without hardware FPU
In-Reply-To: References: Message-ID:

Hello,

Is it possible to configure/install numpy on a box without a hardware FPU? When I try to install it using pip, I get a bunch of compile errors since floating-point exceptions (FE_DIVBYZERO etc.) are undefined on this platform.

How do I get numpy installed and working on such a platform?

Thanks, Emel

From charlesr.harris at gmail.com Mon Sep 1 08:05:24 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 1 Sep 2014 06:05:24 -0600
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

On Mon, Sep 1, 2014 at 1:49 AM, Eelco Hoogendoorn <hoogendoorn.eelco at gmail.com> wrote:
> Sure, I'd like to hash things out, but I would also like some preliminary feedback as to whether this is going in a direction anyone else sees the point of [...]

What I'm trying to say is that numpy is a community project. We don't have a central planning committee; the only difference between "developers" and everyone else is activity and commit rights. Which is to say, if you develop and push this topic it is likely to go in. There certainly seems to be interest in this functionality. The reason that I brought up scipy is that there are some graph algorithms there that went in a couple of years ago.

Note that the convention on the list is bottom posting.

Chuck
From njs at pobox.com Mon Sep 1 08:18:15 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 1 Sep 2014 13:18:15 +0100
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

On Mon, Sep 1, 2014 at 8:49 AM, Eelco Hoogendoorn wrote:
> Sure, I'd like to hash things out, but I would also like some preliminary feedback as to whether this is going in a direction anyone else sees the point of [...] At least from a developer's perspective, it very much feels like a logical extension of the same 'thing'.

My 2 cents:

I definitely agree that this is very useful fundamental functionality, and it would be great if numpy had a solution for it out of the box.

My main concern is that this is a fairly complicated set of functionality and there are a lot of small decisions to be made in setting up the API for it. IME it's very hard to just read through an API like this and reason out the best way to do it by pure logic; usually it needs to get banged on for a bit in real uses before it becomes clear what the right set of trade-offs is. And numpy itself is not a great environment for these kinds of iterations.

So, IMO the main challenge is: how do we get the functionality into a state where we can convince ourselves that it'll be supportable in numpy indefinitely, and not need to be replaced in a year or two? Some things that might help with this convincing:
- releasing it as a small standalone package on pypi and getting some real users to bang on it
- any real code written against the APIs
- feedback from the pandas community since they've spent a lot of time working on these issues
- ...?

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
From hoogendoorn.eelco at gmail.com Mon Sep 1 09:58:57 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Mon, 1 Sep 2014 15:58:57 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

On Mon, Sep 1, 2014 at 2:05 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:
> What I'm trying to say is that numpy is a community project. [...] Which is to say, if you develop and push this topic it is likely to go in. [...] Note that the convention on the list is bottom posting.

I understand that numpy is a community project, so that the decision isn't up to any one particular person; but some early-stage feedback from those active in the community would be welcome. I am generally confident that this addition makes sense, but I have not contributed to numpy before, and you don't know what you don't know, and all... Given that there are multiple suggestions for changing arraysetops, some coordination would be useful, I think.

Note that I use graph edges merely as an example; the proposed functionality is much more general than graphing algorithms specifically. The radial reduction example I included on github is particularly illustrative of the general utility of grouping functionality, I think. Operations like radial reductions are rather common, and a custom implementation is quite lengthy, very bug prone, and potentially very slow.

Thanks for the heads up on posting convention; I've always let gmail do my thinking for me, which works well enough for me, but I can see how not following this convention is annoying to others.
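The radial reduction mentioned above is a compact example of a grouping reduction; a rough sketch using np.bincount as the grouping primitive (an editorial illustration, not the implementation from the repository):

```
import numpy as np

def radial_mean(image, center):
    # integer radius of every pixel relative to the center
    y, x = np.indices(image.shape)
    r = np.hypot(y - center[0], x - center[1]).astype(int)
    # group pixel values by radius: per-bin sums divided by per-bin counts
    sums = np.bincount(r.ravel(), weights=image.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)  # guard against empty radius bins

image = np.random.rand(64, 64)
profile = radial_mean(image, (32, 32))  # mean intensity per integer radius
```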
From kgabor79 at gmail.com Mon Sep 1 11:23:20 2014
From: kgabor79 at gmail.com (Gabor Kovacs)
Date: Mon, 1 Sep 2014 16:23:20 +0100
Subject: [Numpy-discussion] ENH IncrementalWriter for .npy files
Message-ID:

Dear All,

I would like to add a class for writing one (possibly big) .npy file, saving multiple (same dtype, compatible shape) arrays. My use case was saving slowly accumulating data regularly, over a long time, into one file.

Please find a first implementation under https://github.com/numpy/numpy/pull/4987 . It currently supports only writing a new file, and only in C order in the file. Opening an existing file for append and reading back parts from a very big .npy file would be straightforward next steps for a full-featured class. The .npy file format is only affected by leaving some extra space for re-writing the header later with a possibly bigger "shape" field, respecting the 16-byte alignment.

Example:
```
A = np.array([[0,1,2,3,4,5,6,7],[8,9,10,11,12,13,14,15]])
with np.IncrementalWriter("testfile.npy", hdrupdate=True, flush=True) as W:
    W.save(A)
    W.save(A)
```

Feel free to comment on this idea.

Cheers,
Gabor

From jtaylor.debian at googlemail.com Mon Sep 1 17:45:02 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Mon, 01 Sep 2014 23:45:02 +0200
Subject: [Numpy-discussion] How to install numpy on a box without hardware FPU
In-Reply-To: References: Message-ID: <5404E8DE.10000@googlemail.com>

On 01.09.2014 10:33, Emel Hasdal wrote:
> Is it possible to configure/install numpy on a box without a hardware FPU? When I try to install it using pip, I get a bunch of compile errors since floating-point exceptions (FE_DIVBYZERO etc.) are undefined on this platform.
>
> How do I get numpy installed and working on such a platform?

If it's just that, you can try replacing all the fenv stuff with stubs doing nothing. You only lose some runtime warnings about special cases.

Why do you want to run numpy on such a system? Numpy is not really intended to run on such devices. But it is possible: the debian armel port is, as far as I know, softfloat, and numpy seems to be running fine there, though it probably also emulates fenv.
From emel_hasdal at hotmail.com Tue Sep 2 03:29:15 2014
From: emel_hasdal at hotmail.com (Emel Hasdal)
Date: Tue, 2 Sep 2014 00:29:15 -0700
Subject: [Numpy-discussion] How to install numpy on a box without hardware FPU
In-Reply-To: <5404E8DE.10000@googlemail.com> References: <5404E8DE.10000@googlemail.com> Message-ID:

I am trying to run a python application which performs statistical calculations using Pandas, which seems to depend on Numpy. Hence I have to install Numpy to get the app working.

Do you mean I can change numpy/core/src/npymath/ieee754.c.src such that the functions referencing exceptions (npy_get_floatstatus, npy_clear_floatstatus, npy_set_floatstatus_divbyzero, npy_set_floatstatus_overflow, npy_set_floatstatus_underflow, npy_set_floatstatus_invalid) do nothing?

Could there be any implications of this on the numpy functionality?

Thanks, Emel

> Date: Mon, 1 Sep 2014 23:45:02 +0200
> From: jtaylor.debian at googlemail.com
> If it's just that, you can try replacing all the fenv stuff with stubs doing nothing. You only lose some runtime warnings about special cases. [...]

From Jerome.Kieffer at esrf.fr Tue Sep 2 03:32:29 2014
From: Jerome.Kieffer at esrf.fr (Jerome Kieffer)
Date: Tue, 2 Sep 2014 09:32:29 +0200
Subject: [Numpy-discussion] ENH IncrementalWriter for .npy files
In-Reply-To: References: Message-ID: <20140902093229.da5a20b7827711cab08e287e@esrf.fr>

Hi,

This feature is very similar to what is available in hdf5 and exposed under h5py using chunks and max_size ...

Cheers,

--
Jérôme Kieffer
tel +33 476 882 445
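For reference, the h5py pattern being pointed at looks roughly like this (the actual keyword there is `maxshape`; a chunked dataset left unbounded along its first axis can be grown block by block):

```
import h5py
import numpy as np

A = np.array([[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15]])

with h5py.File("testfile.h5", "w") as f:
    dset = f.create_dataset("data", shape=(0, 8), maxshape=(None, 8),
                            chunks=(2, 8), dtype=A.dtype)
    for block in (A, A):
        # extend the dataset, then write the new block into the tail
        dset.resize(dset.shape[0] + block.shape[0], axis=0)
        dset[-block.shape[0]:] = block
```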
From cournape at gmail.com Tue Sep 2 04:00:37 2014
From: cournape at gmail.com (David Cournapeau)
Date: Tue, 2 Sep 2014 09:00:37 +0100
Subject: [Numpy-discussion] How to install numpy on a box without hardware FPU
In-Reply-To: References: <5404E8DE.10000@googlemail.com> Message-ID:

On Tue, Sep 2, 2014 at 8:29 AM, Emel Hasdal <emel_hasdal at hotmail.com> wrote:
> Do you mean I can change numpy/core/src/npymath/ieee754.c.src such that the functions referencing exceptions [...] do nothing? Could there be any implications of this on the numpy functionality?

AFAIK, few people have ever tried to run numpy on a CPU without an FPU. The generic answer is that we do not know, as this is not a supported platform, so you are on your own. I would suggest you just try adding stubs to see how far you can go, and come back to this group with the result of your investigation.

David

From emel_hasdal at hotmail.com Tue Sep 2 04:38:49 2014
From: emel_hasdal at hotmail.com (Emel Hasdal)
Date: Tue, 2 Sep 2014 01:38:49 -0700
Subject: [Numpy-discussion] How to install numpy on a box without hardware FPU
In-Reply-To: References: <5404E8DE.10000@googlemail.com> Message-ID:

Sure. I would be happy to try it and report back. Just to make sure I understand it: I will try defining the following functions for my platform such that they do nothing: npy_get_floatstatus, npy_clear_floatstatus, npy_set_floatstatus_divbyzero, npy_set_floatstatus_overflow, npy_set_floatstatus_underflow, npy_set_floatstatus_invalid

Thanks, Emel

From charlesr.harris at gmail.com Tue Sep 2 20:40:09 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 2 Sep 2014 18:40:09 -0600
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

On Mon, Sep 1, 2014 at 7:58 AM, Eelco Hoogendoorn <hoogendoorn.eelco at gmail.com> wrote:
> I understand that numpy is a community project, so that the decision isn't up to any one particular person; but some early-stage feedback from those active in the community would be welcome. [...]

What do you think about the suggestion of timsort? One would need to concatenate the arrays before sorting, but it should be fairly efficient.

Chuck
From jaime.frio at gmail.com Tue Sep 2 22:07:52 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Tue, 2 Sep 2014 19:07:52 -0700
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

On Tue, Sep 2, 2014 at 5:40 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:
> What do you think about the suggestion of timsort? One would need to concatenate the arrays before sorting, but it should be fairly efficient.

Timsort is very cool, and it would definitely be fun to implement in numpy. It is also a lot more work than merging two sorted arrays! I think +1 if someone else does it, but although I would love to be part of it, I am not sure I will be able to find time to look into it seriously in the next couple of months.

From a setops point of view, merging two sorted arrays makes it very straightforward to compute, together with (or instead of) the result of the merge, index arrays that let you calculate things like `in1d` faster. Although perhaps an `argtimsort` could provide the same functionality, not sure. I will probably wrap up what I have, put a lace on it, and submit it as a PR. Even if it is not destined to be merged, it may serve as a warning to others.

I have also been thinking lately that one of the problems with all these unique-related computations may be a case of having a hammer and seeing everything as nails. Perhaps the algorithm that needs to be ported from Python is not the sorting one, but the hash table...

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
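Merging two already-sorted arrays, as discussed above, can be expressed with existing primitives; a minimal sketch (an editorial illustration, not the code from the proposed PR):

```
import numpy as np

def merge_sorted(a, b):
    # positions the (sorted) elements of b will occupy in the merged output
    idx = np.searchsorted(a, b) + np.arange(len(b))
    out = np.empty(len(a) + len(b), dtype=np.result_type(a, b))
    mask = np.zeros(len(out), dtype=bool)
    mask[idx] = True
    out[mask] = b
    out[~mask] = a
    return out

print(merge_sorted(np.array([1, 3, 5]), np.array([2, 3, 6])))
# [1 2 3 3 5 6]
```

A dedicated compiled merge could do this in one linear pass; the sketch just shows the semantics, and the `idx` array is exactly the kind of index byproduct that makes set operations like `in1d` cheap.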
From cjw at ncf.ca Wed Sep 3 08:26:02 2014
From: cjw at ncf.ca (cjw)
Date: Wed, 03 Sep 2014 08:26:02 -0400
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID: <540708DA.7070604@ncf.ca>

On 02/09/2014 8:40 PM, Charles R Harris wrote:
> [...]

These are good issues, that need to be discussed and resolved. Python has the benefit of having a BDFL. Numpy has no similar arrangement. In the post-numarray period, Travis Oliphant took that role and advanced the package in many ways.

It seems that Travis is no longer available. There is a need for someone, or maybe some group of three, who would be guided by the sort of thinking Charles Harris sets out above, who would make the final decision.

Colin W.

From hoogendoorn.eelco at gmail.com Wed Sep 3 09:41:51 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Wed, 3 Sep 2014 15:41:51 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

On Wed, Sep 3, 2014 at 4:07 AM, Jaime Fernández del Río <jaime.frio at gmail.com> wrote:
> I have also been thinking lately that one of the problems with all these unique-related computations may be a case of having a hammer and seeing everything as nails. Perhaps the algorithm that needs to be ported from Python is not the sorting one, but the hash table...

Not sure about the hashing. Indeed one can also build an index of a set by means of a hash table, but it's questionable if this leads to improved performance over performing an argsort. Hashing may have better asymptotic time complexity in theory, but many datasets used in practice are very easy to sort (O(N)-ish), and the time constant of hashing is higher. But more importantly, using a hash table guarantees poor cache behavior for many operations using this index. By contrast, sorting may (but need not) make one random access pass to build the index, and may (but need not) perform one random access to reorder values for grouping. But insofar as the keys are better behaved than pure random, this coherence will be exploited. Also, getting the unique values/keys in sorted order is a nice side benefit for many applications.
From jaime.frio at gmail.com Wed Sep 3 12:33:02 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Wed, 3 Sep 2014 09:33:02 -0700
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

On Wed, Sep 3, 2014 at 6:41 AM, Eelco Hoogendoorn <hoogendoorn.eelco at gmail.com> wrote:
> Not sure about the hashing. Indeed one can also build an index of a set by means of a hash table, but it's questionable if this leads to improved performance over performing an argsort. [...]

If you want to give it a try, this branch of my numpy fork has hash table based implementations of unique (with no extra indices) and in1d:

https://github.com/jaimefrio/numpy/tree/hash-unique

A use case where the hash table is clearly better:

In [1]: import numpy as np
In [2]: from numpy.lib._compiled_base import _unique, _in1d
In [3]: a = np.random.randint(10, size=(10000,))
In [4]: %timeit np.unique(a)
1000 loops, best of 3: 258 us per loop
In [5]: %timeit _unique(a)
10000 loops, best of 3: 143 us per loop
In [6]: %timeit np.sort(_unique(a))
10000 loops, best of 3: 149 us per loop

It typically performs between 1.5x and 4x faster than sorting. I haven't profiled it properly to know, but there may be quite a bit of performance to dig out: have type-specific comparison functions, optimize the starting hash table size based on the size of the array to avoid reinsertions...

If getting the elements sorted is a necessity, and the array contains very few or no repeated items, then the hash table approach may even perform worse:

In [8]: a = np.random.randint(10000, size=(5000,))
In [9]: %timeit np.unique(a)
1000 loops, best of 3: 277 us per loop
In [10]: %timeit np.sort(_unique(a))
1000 loops, best of 3: 320 us per loop

But the hash table still wins in extracting the unique items only:

In [11]: %timeit _unique(a)
10000 loops, best of 3: 187 us per loop

Where the hash table shines is in more elaborate situations. If you keep the first index where it was found, and the number of repeats, in the hash table, you can get return_index and return_counts almost for free, which means you are performing an extra 3x faster than with sorting. return_inverse requires a little more trickery, so I won't attempt to quantify the improvement. But I wouldn't be surprised if, after fine tuning it, there is close to an order of magnitude overall improvement.

The speed-up for in1d is also nice:

In [16]: a = np.random.randint(1000, size=(1000,))
In [17]: b = np.random.randint(1000, size=(500,))
In [18]: %timeit np.in1d(a, b)
1000 loops, best of 3: 178 us per loop
In [19]: %timeit _in1d(a, b)
10000 loops, best of 3: 30.1 us per loop

Of course, there is no point in
From jaime.frio at gmail.com Wed Sep 3 12:46:16 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Wed, 3 Sep 2014 09:46:16 -0700
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

On Wed, Sep 3, 2014 at 9:33 AM, Jaime Fernández del Río <jaime.frio at gmail.com> wrote:
> [...]
> Of course, there is no point in

Ooops!!! Hit the send button too quick. Not to extend myself too long: if we are going to rethink all of this, we should approach it with an open mind. Still, and this post is not helping with that either, I am afraid that we are discussing implementation details, but are missing a broader vision of what we want to accomplish and why. That vision of what numpy's grouping functionality, if any, should be, and how it complements or conflicts with what pandas is providing, should precede anything else. I know I haven't, but has anyone looked at how pandas implements grouping? Their documentation on the subject is well worth a read:

http://pandas.pydata.org/pandas-docs/stable/groupby.html

Does numpy need to replicate this? What/why/how can we add to that?

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
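For reference, the pandas idiom linked above looks roughly like this (a small illustrative session; the keys and values are invented):

```
import numpy as np
import pandas as pd

keys   = np.array(['a', 'b', 'a', 'b', 'a'])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# pandas' take on the grouping reduction discussed in this thread
print(pd.Series(values).groupby(keys).mean())
# a    3.0
# b    3.0
```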
From jaime.frio at gmail.com Wed Sep 3 12:53:12 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Wed, 3 Sep 2014 09:53:12 -0700
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: <540708DA.7070604@ncf.ca> References: <540708DA.7070604@ncf.ca> Message-ID:

On Wed, Sep 3, 2014 at 5:26 AM, cjw wrote:
> These are good issues, that need to be discussed and resolved. Python has the benefit of having a BDFL. Numpy has no similar arrangement. [...] There is a need for someone, or maybe some group of three, who would be guided by the sort of thinking Charles Harris sets out above, who would make the final decision.

We should crown Charles philosopher king, and let him wisely rule over us with the aid of his aristocracy of devs with merge rights. We would make Plato proud! :-)

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
From charlesr.harris at gmail.com Wed Sep 3 13:25:45 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 3 Sep 2014 11:25:45 -0600
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: <540708DA.7070604@ncf.ca> Message-ID:

On Wed, Sep 3, 2014 at 10:53 AM, Jaime Fernández del Río <jaime.frio at gmail.com> wrote:
> We should crown Charles philosopher king, and let him wisely rule over us with the aid of his aristocracy of devs with merge rights. We would make Plato proud! :-)

Charles I needs an heir in case of execution by the Parliamentary forces.

Chuck

From alan.isaac at gmail.com Wed Sep 3 17:19:56 2014
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Wed, 03 Sep 2014 17:19:56 -0400
Subject: [Numpy-discussion] odd (?) behavior: negative integer scalar in exponent
Message-ID: <540785FC.2090903@gmail.com>

What should be the value of `2**np.int_(-32)`? It is apparently currently computed as `1. / (2**np.int_(32))`, so the computation overflows (when a C long is 32 bits). I would have hoped for it to be computed as `1./(2.**np.int_(32))`.

Cheers,
Alan Isaac

From charlesr.harris at gmail.com Wed Sep 3 17:47:27 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 3 Sep 2014 15:47:27 -0600
Subject: [Numpy-discussion] Give Jaime Fernandez commit rights.
Message-ID:

Hi All,

I'd like to give Jaime commit rights. Having at least three active developers with commit rights is the goal, and Jaime has been pretty consistent with code submissions and discussion participation. Thoughts?

Chuck

From robert.kern at gmail.com Wed Sep 3 17:48:45 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 3 Sep 2014 22:48:45 +0100
Subject: [Numpy-discussion] Give Jaime Fernandez commit rights.
In-Reply-To: References: Message-ID:

On Wed, Sep 3, 2014 at 10:47 PM, Charles R Harris wrote:
> I'd like to give Jaime commit rights. Having at least three active developers with commit rights is the goal and Jaime has been pretty consistent with code submissions and discussion participation.

+1

--
Robert Kern

From ralf.gommers at gmail.com Wed Sep 3 18:42:47 2014
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 4 Sep 2014 00:42:47 +0200
Subject: [Numpy-discussion] Give Jaime Fernandez commit rights.
In-Reply-To: References: Message-ID:

On Wed, Sep 3, 2014 at 11:48 PM, Robert Kern <robert.kern at gmail.com> wrote:
> +1

+1 excellent idea

Ralf

From charlesr.harris at gmail.com Wed Sep 3 19:25:03 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 3 Sep 2014 17:25:03 -0600
Subject: [Numpy-discussion] odd (?) behavior: negative integer scalar in exponent
In-Reply-To: <540785FC.2090903@gmail.com> References: <540785FC.2090903@gmail.com> Message-ID:

On Wed, Sep 3, 2014 at 3:19 PM, Alan G Isaac <alan.isaac at gmail.com> wrote:
> What should be the value of `2**np.int_(-32)`? It is apparently currently computed as `1. / (2**np.int_(32))`, so the computation overflows (when a C long is 32 bits). I would have hoped for it to be computed as `1./(2.**np.int_(32))`.

Looks like a bug to me.

Chuck.
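A small illustration of the report and the explicit floating-point workaround (a sketch; the exact outcome of the integer version depends on platform and numpy version, and later numpy releases reject integers raised to negative integer powers outright):

```
import numpy as np

exp = np.int_(-32)

# 2 ** exp goes through integer power: with a 32-bit C long the
# intermediate 2**32 wraps around, poisoning the reciprocal.
# Forcing the computation into floating point avoids the overflow:
print(2.0 ** exp)                  # 2.3283064365386963e-10
print(1.0 / np.float64(2) ** 32)   # same value, i.e. 2**-32
```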
From hoogendoorn.eelco at gmail.com Wed Sep 3 19:48:10 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Thu, 4 Sep 2014 01:48:10 +0200
Subject: [Numpy-discussion] Give Jaime Fernandez commit rights.
In-Reply-To: References: Message-ID:

+1; though I am relatively new to the scene, Jaime's contributions have always stood out to me as thoughtful.

On Thu, Sep 4, 2014 at 12:42 AM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
> +1 excellent idea

From charlesr.harris at gmail.com Wed Sep 3 20:47:41 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 3 Sep 2014 18:47:41 -0600
Subject: [Numpy-discussion] Give Jaime Fernandez commit rights.
In-Reply-To: References: Message-ID:

On Wed, Sep 3, 2014 at 5:48 PM, Eelco Hoogendoorn <hoogendoorn.eelco at gmail.com> wrote:
> +1; though I am relatively new to the scene, Jaime's contributions have always stood out to me as thoughtful.

I think the ayes will have it.

Chuck

From jaime.frio at gmail.com Wed Sep 3 21:34:13 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Wed, 3 Sep 2014 18:34:13 -0700
Subject: [Numpy-discussion] Give Jaime Fernandez commit rights.
In-Reply-To: References: Message-ID:

On Wed, Sep 3, 2014 at 5:47 PM, Charles R Harris wrote:
> I think the ayes will have it.

As I told Chuck (because I now get to call Charles Chuck, right? :-)), I am not sure I am fully qualified for the job: looking at the names on that list is a humbling experience. But even if I am the idiot brother, it feels good to be part of the family.

Numpy has provided me with countless hours of learning and enjoyment, and I really look forward to giving back, even if only a fraction of that.

Thanks a lot for the trust!

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
From sebastian at sipsolutions.net Thu Sep 4 03:46:50 2014
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 04 Sep 2014 09:46:50 +0200
Subject: [Numpy-discussion] Give Jaime Fernandez commit rights.
In-Reply-To: References: Message-ID: <1409816810.9142.2.camel@sebastian-t440>

On Mi, 2014-09-03 at 18:47 -0600, Charles R Harris wrote:
> [...]
> I think the ayes will have it.

Aye.
From hoogendoorn.eelco at gmail.com Thu Sep 4 04:31:01 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Thu, 4 Sep 2014 10:31:01 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References: Message-ID:

On Wed, Sep 3, 2014 at 6:46 PM, Jaime Fernández del Río <jaime.frio at gmail.com> wrote:
> [...] I am afraid that we are discussing implementation details, but are missing a broader vision of what we want to accomplish and why. [...] has anyone looked at how pandas implements grouping? [...] Does numpy need to replicate this? What/why/how can we add to that?

I would certainly not be opposed to having a hashing based indexing mechanism; I think it would make sense design-wise to have a HashIndex class with the same interface as the rest, and use that subclass in those arraysetops where it makes sense. The 'how to' of indexing and its applications are largely orthogonal, I think (with some tiny performance compromises which are worth the abstraction imo). For datasets which are not purely random, have many unique items, and which do not fit into cache, I would expect sorting to come out on top, but indeed it depends on the dataset.

Yeah, the question how pandas does grouping, and whether we can do better, is a relevant one.

From what I understand, pandas relies on cython extensions to get vectorized grouping functionality. This is no longer necessary since the introduction of ufuncs in numpy. I don't know how the implementations compare in terms of performance, but I doubt the difference is huge. I personally use grouping a lot in my code, and I don't like having to use pandas for it.
Most importantly, I don't want to go around creating a dataframe for a single one-line hit-and-run association between keys and values. The permanent association of different types of data and their metadata which pandas offers is, I think, the key difference from numpy, which is all about manipulating just plain ndarrays. Arguably, grouping itself is a pretty elementary manipulation of ndarrays, and adding calls to DataFrame or Series in between a statement that could simply be group_by(keys).mean(values) feels wrong to me. As does including pandas as a dependency just to use this small piece of functionality. Grouping is a more general functionality than any particular method of organizing your data.

In terms of features, adding transformations and filtering might be nice too; I hadn't thought about it, but that is because, unlike the currently implemented features, the need has never arisen for me. I'm only a small sample size, and I don't see any fundamental objection to adding such functionality though. It certainly raises the question as to where to draw the line with pandas; but my rule of thumb is that if you can think of it as an elementary operation on ndarrays, then it probably belongs in numpy.
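A sketch of what a group_by(keys).mean(values) interface can look like on top of a sorting-based index (illustrative only; for the actual interface see the repository linked earlier in the thread):

```
import numpy as np

class group_by:
    """Hypothetical sketch of the proposed interface."""
    def __init__(self, keys):
        self.order = np.argsort(keys, kind='mergesort')   # stable sort
        sorted_keys = keys[self.order]
        # start offset of each run of equal keys
        self.starts = np.flatnonzero(np.r_[True, sorted_keys[1:] != sorted_keys[:-1]])
        self.unique = sorted_keys[self.starts]
        self.counts = np.diff(np.r_[self.starts, len(keys)])

    def sum(self, values):
        return self.unique, np.add.reduceat(values[self.order], self.starts)

    def mean(self, values):
        unique, sums = self.sum(values)
        return unique, sums / self.counts

keys = np.array([0, 1, 0, 2, 1])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(group_by(keys).mean(values))
# (array([0, 1, 2]), array([2. , 3.5, 4. ]))
```

One sort builds the index; every reduction thereafter is a contiguous, cache-friendly reduceat pass, which is the design trade-off argued for above.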
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ndbecker2 at gmail.com  Thu Sep  4 07:32:51 2014
From: ndbecker2 at gmail.com (Neal Becker)
Date: Thu, 04 Sep 2014 07:32:51 -0400
Subject: [Numpy-discussion] SFMT (faster mersenne twister)
Message-ID:

http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html

-- 
Those who don't understand recursion are doomed to repeat it

From ben.root at ou.edu  Thu Sep  4 09:16:47 2014
From: ben.root at ou.edu (Benjamin Root)
Date: Thu, 4 Sep 2014 09:16:47 -0400
Subject: [Numpy-discussion] Give Jaime Fernandez commit rights.
In-Reply-To:
References:
Message-ID:

Jaime,

I had the same feeling when John Hunter gave me commit rights to
matplotlib. I later asked him about it, and he said that he gives commit
rights to those who annoy the mailing list the most. So, it might not be
*that* humbling... :-P

Cheers!
Ben Root

On Wed, Sep 3, 2014 at 9:34 PM, Jaime Fernández del Río <
jaime.frio at gmail.com> wrote:

> On Wed, Sep 3, 2014 at 5:47 PM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>> I think the ayes will have it.
>>
> As I told Chuck (because I now get to call Charles Chuck, right? :-)), I
> am not sure I am fully qualified for the job: looking at the names on
> that list is a humbling experience. But even if I am the idiot brother,
> it feels good to be part of the family.
>
> Numpy has provided me with countless hours of learning and enjoyment, and
> I really look forward to giving back, even if only a fraction of that.
>
> Thanks a lot for the trust!
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
> de dominación mundial.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com  Thu Sep  4 10:43:01 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 4 Sep 2014 08:43:01 -0600
Subject: [Numpy-discussion] Give Jaime Fernandez commit rights.
In-Reply-To:
References:
Message-ID:

On Wed, Sep 3, 2014 at 7:34 PM, Jaime Fernández del Río <
jaime.frio at gmail.com> wrote:

> <snip>
>
> Thanks a lot for the trust!
>
One thing you might want to check is if you have a strong password.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hoogendoorn.eelco at gmail.com  Thu Sep  4 13:39:14 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Thu, 4 Sep 2014 19:39:14 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To:
References:
Message-ID:

On Thu, Sep 4, 2014 at 10:31 AM, Eelco Hoogendoorn <
hoogendoorn.eelco at gmail.com> wrote:

> <snip>
>
Oh I forgot to add: with an indexing mechanism based on sorting, unique
values and counts also come 'for free', not counting the O(N) cost of
actually creating those arrays. The only time an operation relying on an
index incurs another nontrivial amount of overhead is in case its 'rank'
or 'inverse' property is used, which invokes another argsort. But for the
vast majority of grouping or set operations, these properties are never
used.
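To spell out the 'for free' part, a minimal sketch (with made-up variable
names) of how the uniques and counts drop out of the sorted keys in one
vectorized comparison:

import numpy as np

keys = np.array([3, 1, 3, 2, 1, 3])
sorter = np.argsort(keys, kind='mergesort')          # the only expensive step
sk = keys[sorter]                                    # sorted keys
flag = np.concatenate(([True], sk[1:] != sk[:-1]))   # True at each group start
unique = sk[flag]                                    # array([1, 2, 3])
counts = np.diff(np.flatnonzero(np.concatenate((flag, [True]))))  # array([2, 1, 3])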
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jaime.frio at gmail.com  Thu Sep  4 13:55:18 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Thu, 4 Sep 2014 10:55:18 -0700
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To:
References:
Message-ID:

On Thu, Sep 4, 2014 at 10:39 AM, Eelco Hoogendoorn <
hoogendoorn.eelco at gmail.com> wrote:

> <snip> The only time an operation relying on an index incurs another
> nontrivial amount of overhead is in case its 'rank' or 'inverse' property
> is used, which invokes another argsort. But for the vast majority of
> grouping or set operations, these properties are never used.
>

That extra argsort is now gone from master:

https://github.com/numpy/numpy/pull/5012

Even with this improvement, returning any index typically makes
`np.unique` run at least 2x slower:

In [1]: import numpy as np
In [2]: a = np.random.randint(100, size=(1000,))
In [3]: %timeit np.unique(a)
10000 loops, best of 3: 37.3 us per loop
In [4]: %timeit np.unique(a, return_inverse=True)
10000 loops, best of 3: 62.1 us per loop
In [5]: %timeit np.unique(a, return_index=True)
10000 loops, best of 3: 72.8 us per loop
In [6]: %timeit np.unique(a, return_counts=True)
10000 loops, best of 3: 56.4 us per loop
In [7]: %timeit np.unique(a, return_index=True, return_inverse=True, return_counts=True)
10000 loops, best of 3: 112 us per loop

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hoogendoorn.eelco at gmail.com  Thu Sep  4 14:14:22 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Thu, 4 Sep 2014 20:14:22 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To:
References:
Message-ID:

I should clarify: I am speaking about my implementation; I haven't looked
at the numpy implementation for a while, so I'm not sure what it is up to.
Note that by 'almost free', we are still talking about three passes over
the whole array plus temp allocations, but I am assuming a use-case where
the various sorts involved are the dominant cost, which I imagine they
are, for out-of-cache sorts. Perhaps this isn't too realistic an
assumption about the average use case though, I don't know. Though I
suppose it's a reasonable guideline to assume that either the dataset is
big, or performance isn't that big a concern in the first place.

On Thu, Sep 4, 2014 at 7:55 PM, Jaime Fernández del Río <
jaime.frio at gmail.com> wrote:

> That extra argsort is now gone from master:
>
> https://github.com/numpy/numpy/pull/5012
>
> <snip>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hoogendoorn.eelco at gmail.com  Thu Sep  4 14:29:07 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Thu, 4 Sep 2014 20:29:07 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To:
References:
Message-ID:

On Thu, Sep 4, 2014 at 8:14 PM, Eelco Hoogendoorn <
hoogendoorn.eelco at gmail.com> wrote:

> I should clarify: I am speaking about my implementation; I haven't looked
> at the numpy implementation for a while, so I'm not sure what it is up to.
> <snip>
>
Yeah, I looked at the numpy implementation, and it seems these speed
differences are simply the result of the extra O(N) costs involved, so my
implementation would have the same characteristics. If another array copy
or two has meaningful impact on performance, then you are done as far as
optimization within numpy is concerned, I'd say. You could fuse those
loops on the C level, but as you said, I think it's better to think about
these kinds of optimizations once we have a more complete picture of the
functionality we want.

Good call on removing the extra argsort; that hadn't occurred to me.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jeffreback at gmail.com  Thu Sep  4 14:36:58 2014
From: jeffreback at gmail.com (Jeff Reback)
Date: Thu, 4 Sep 2014 14:36:58 -0400
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To:
References:
Message-ID:

FYI pandas DOES use a very performant hash table impl for unique (and
value_counts). Sorted state IS maintained by the underlying Index
implementation.
https://github.com/pydata/pandas/blob/master/pandas/hashtable.pyx

In [8]: a = np.random.randint(10, size=(10000,))

In [9]: %timeit np.unique(a)
1000 loops, best of 3: 284 µs per loop

In [10]: %timeit Series(a).unique()
10000 loops, best of 3: 161 µs per loop

In [11]: s = Series(a)

# without the creation overhead
In [12]: %timeit s.unique()
10000 loops, best of 3: 75.3 µs per loop

On Thu, Sep 4, 2014 at 2:29 PM, Eelco Hoogendoorn <
hoogendoorn.eelco at gmail.com> wrote:

> <snip>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hoogendoorn.eelco at gmail.com  Thu Sep  4 15:19:14 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Thu, 4 Sep 2014 21:19:14 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To:
References:
Message-ID:

Naturally, you'd want to avoid redoing the indexing where you can, which
is another good reason to factor out the indexing mechanisms into separate
classes. A factor two performance difference does not get me too excited;
again, I think it would be the other way around for an out-of-cache
dataset being grouped. But this by itself is of course another argument
for factoring out the indexing behind a uniform interface, so we can play
around with those implementation details later, and specialize the
indexing to serve different scenarios. Also, it really helps with code
maintainability; most arraysetops are almost trivial to implement once you
have abstracted away the indexing machinery.
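As an illustration of the kind of uniform interface I mean (a sketch only;
SortingIndex, HashIndex and their attributes are placeholder names, not
the actual classes in my repository), any object exposing the same
attributes can back the arraysetops, whether it is built on a sort or on a
hash table:

import numpy as np

class SortingIndex(object):
    # index backed by a stable argsort; derived arrays are computed once
    def __init__(self, keys):
        self.sorter = np.argsort(keys, kind='mergesort')
        sk = keys[self.sorter]
        flag = np.concatenate(([True], sk[1:] != sk[:-1]))
        self.start = np.flatnonzero(flag)      # first index of each group
        self.unique = sk[self.start]           # sorted by construction
        self.counts = np.diff(np.concatenate((self.start, [sk.size])))

class HashIndex(object):
    # same attributes, backed by a hash table (a plain dict, for the sketch);
    # note the uniques come out unsorted here
    def __init__(self, keys):
        table = {}
        for k in keys.tolist():
            table[k] = table.get(k, 0) + 1
        self.unique = np.array(list(table.keys()))
        self.counts = np.array(list(table.values()))

def unique_with_counts(keys, index=SortingIndex):
    # an arraysetop reduces to a one-liner over whichever index we pick
    idx = index(keys)
    return idx.unique, idx.counts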
But more importantly, using a hash-table guarantees poor cache >>>>>>>>> behavior for many operations using this index. By contrast, sorting may >>>>>>>>> (but need not) make one random access pass to build the index, and may (but >>>>>>>>> need not) perform one random access to reorder values for grouping. But >>>>>>>>> insofar as the keys are better behaved than pure random, this coherence >>>>>>>>> will be exploited. >>>>>>>>> >>>>>>>> >>>>>>>> If you want to give it a try, these branch of my numpy fork has >>>>>>>> hash table based implementations of unique (with no extra indices) and in1d: >>>>>>>> >>>>>>>> https://github.com/jaimefrio/numpy/tree/hash-unique >>>>>>>> >>>>>>>> A use cases where the hash table is clearly better: >>>>>>>> >>>>>>>> In [1]: import numpy as np >>>>>>>> In [2]: from numpy.lib._compiled_base import _unique, _in1d >>>>>>>> >>>>>>>> In [3]: a = np.random.randint(10, size=(10000,)) >>>>>>>> In [4]: %timeit np.unique(a) >>>>>>>> 1000 loops, best of 3: 258 us per loop >>>>>>>> In [5]: %timeit _unique(a) >>>>>>>> 10000 loops, best of 3: 143 us per loop >>>>>>>> In [6]: %timeit np.sort(_unique(a)) >>>>>>>> 10000 loops, best of 3: 149 us per loop >>>>>>>> >>>>>>>> It typically performs between 1.5x and 4x faster than sorting. I >>>>>>>> haven't profiled it properly to know, but there may be quite a bit of >>>>>>>> performance to dig out: have type specific comparison functions, optimize >>>>>>>> the starting hash table size based on the size of the array to avoid >>>>>>>> reinsertions... >>>>>>>> >>>>>>>> If getting the elements sorted is a necessity, and the array >>>>>>>> contains very few or no repeated items, then the hash table approach may >>>>>>>> even perform worse,: >>>>>>>> >>>>>>>> In [8]: a = np.random.randint(10000, size=(5000,)) >>>>>>>> In [9]: %timeit np.unique(a) >>>>>>>> 1000 loops, best of 3: 277 us per loop >>>>>>>> In [10]: %timeit np.sort(_unique(a)) >>>>>>>> 1000 loops, best of 3: 320 us per loop >>>>>>>> >>>>>>>> But the hash table still wins in extracting the unique items only: >>>>>>>> >>>>>>>> In [11]: %timeit _unique(a) >>>>>>>> 10000 loops, best of 3: 187 us per loop >>>>>>>> >>>>>>>> Where the hash table shines is in more elaborate situations. If you >>>>>>>> keep the first index where it was found, and the number of repeats, in the >>>>>>>> hash table, you can get return_index and return_counts almost for free, >>>>>>>> which means you are performing an extra 3x faster than with sorting. >>>>>>>> return_inverse requires a little more trickery, so I won;t attempt to >>>>>>>> quantify the improvement. But I wouldn't be surprised if, after fine tuning >>>>>>>> it, there is close to an order of magnitude overall improvement >>>>>>>> >>>>>>>> The spped-up for in1d is also nice: >>>>>>>> >>>>>>>> In [16]: a = np.random.randint(1000, size=(1000,)) >>>>>>>> In [17]: b = np.random.randint(1000, size=(500,)) >>>>>>>> In [18]: %timeit np.in1d(a, b) >>>>>>>> 1000 loops, best of 3: 178 us per loop >>>>>>>> In [19]: %timeit _in1d(a, b) >>>>>>>> 10000 loops, best of 3: 30.1 us per loop >>>>>>>> >>>>>>>> Of course, there is no point in >>>>>>>> >>>>>>> >>>>>>> Ooops!!! Hit the send button too quick. Not to extend myself too >>>>>>> long: if we are going to rethink all of this, we should approach it with an >>>>>>> open mind. Still, and this post is not helping with that either, I am >>>>>>> afraid that we are discussing implementation details, but are missing a >>>>>>> broader vision of what we want to accomplish and why. 
That vision of what >>>>>>> numpy's grouping functionality, if any, should be, and how it complements >>>>>>> or conflicts with what pandas is providing, should precede anything else. I >>>>>>> know I haven't, but has anyone looked at how pandas implements grouping? >>>>>>> Their documentation on the subject is well worth a read: >>>>>>> >>>>>>> http://pandas.pydata.org/pandas-docs/stable/groupby.html >>>>>>> >>>>>>> Does numpy need to replicate this? What/why/how can we add to that? >>>>>>> >>>>>>> Jaime >>>>>>> >>>>>>> -- >>>>>>> (\__/) >>>>>>> ( O.o) >>>>>>> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus >>>>>>> planes de dominación mundial. >>>>>>> >>>>>>> _______________________________________________ >>>>>>> NumPy-Discussion mailing list >>>>>>> NumPy-Discussion at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>>> >>>>>>> I would certainly not be opposed to having a hashing based indexing >>>>>> mechanism; I think it would make sense design-wise to have a HashIndex >>>>>> class with the same interface as the rest, and use that subclass in those >>>>>> arraysetops where it makes sense. The 'how to' of indexing and its >>>>>> applications are largely orthogonal I think (with some tiny performance >>>>>> compromises which are worth the abstraction imo). For datasets which are >>>>>> not purely random, have many unique items, and which do not fit into cache, >>>>>> I would expect sorting to come out on top, but indeed it depends on the >>>>>> dataset. >>>>>> >>>>>> Yeah, the question of how pandas does grouping, and whether we can do >>>>>> better, is a relevant one. >>>>>> >>>>>> From what I understand, pandas relies on cython extensions to get >>>>>> vectorized grouping functionality. This is no longer necessary since the >>>>>> introduction of ufuncs in numpy. I don't know how the implementations >>>>>> compare in terms of performance, but I doubt the difference is huge. >>>>>> >>>>>> I personally use grouping a lot in my code, and I don't like >>>>>> having to use pandas for it. Most importantly, I don't want to go around >>>>>> creating a dataframe for a single one-line hit-and-run association between >>>>>> keys and values. The permanent association of different types of data and >>>>>> their metadata which pandas offers is I think the key difference from >>>>>> numpy, which is all about manipulating just plain ndarrays. Arguably, >>>>>> grouping itself is a pretty elementary manipulation of ndarrays, and adding >>>>>> calls to DataFrame or Series in the middle of a statement that could >>>>>> simply be group_by(keys).mean(values) feels wrong to me. As does including >>>>>> pandas as a dependency just to use this small piece of >>>>>> functionality. Grouping is a more general functionality than any particular >>>>>> method of organizing your data. >>>>>> >>>>>> In terms of features, adding transformations and filtering might be >>>>>> nice too; I hadn't thought about it, but that is because unlike the >>>>>> currently implemented features, the need has never arisen for me. I'm only a >>>>>> small sample size, and I don't see any fundamental objection to adding such >>>>>> functionality though. It certainly raises the question as to where to draw >>>>>> the line with pandas; but my rule of thumb is that if you can think of it >>>>>> as an elementary operation on ndarrays, then it probably belongs in numpy.
>>>>>> >> >>>>> Oh I forgot to add: with an indexing mechanism based on sorting, >>>>> unique values and counts also come 'for free', not counting the O(N) cost >>>>> of actually creating those arrays. The only time an operation relying on an >>>>> index incurs another nontrivial amount of overhead is in case its 'rank' or >>>>> 'inverse' property is used, which invokes another argsort. But for the vast >>>>> majority of grouping or set operations, these properties are never used. >>>>> >>>> >>>> That extra argsort is now gone from master: >>>> >>>> https://github.com/numpy/numpy/pull/5012 >>>> >>>> Even with this improvement, returning any index typically makes >>>> `np.unique` run at least 2x slower: >>>> >>>> In [1]: import numpy as np >>>> In [2]: a = np.random.randint(100, size=(1000,)) >>>> In [3]: %timeit np.unique(a) >>>> 10000 loops, best of 3: 37.3 us per loop >>>> In [4]: %timeit np.unique(a, return_inverse=True) >>>> 10000 loops, best of 3: 62.1 us per loop >>>> In [5]: %timeit np.unique(a, return_index=True) >>>> 10000 loops, best of 3: 72.8 us per loop >>>> In [6]: %timeit np.unique(a, return_counts=True) >>>> 10000 loops, best of 3: 56.4 us per loop >>>> In [7]: %timeit np.unique(a, return_index=True, return_inverse=True, >>>> return_counts=True) >>>> 10000 loops, best of 3: 112 us per loop >>>> >>>> -- >>>> (\__/) >>>> ( O.o) >>>> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus >>>> planes de dominación mundial. >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >> Yeah, I looked at the numpy implementation, and it seems these speed >> differences are simply the result of the extra O(N) costs involved, so my >> implementation would have the same characteristics. If another array copy >> or two has meaningful impact on performance, then you are done as far as >> optimization within numpy is concerned, I'd say. You could fuse those loops >> on the C level, but as you said I think it's better to think about these >> kinds of optimizations once we have a more complete picture of the >> functionality we want. >> >> Good call on removing the extra argsort, that hadn't occurred to me. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Sep 4 16:40:02 2014 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 4 Sep 2014 21:40:02 +0100 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: Message-ID: On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker wrote: > http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html What would you like to say about it? -- Robert Kern From joseph.martinot-lagarde at m4x.org Thu Sep 4 20:47:28 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Fri, 05 Sep 2014 02:47:28 +0200 Subject: [Numpy-discussion] 'norm' keyword for FFT functions Message-ID: I have an old PR [1] to fix #2142 [2].
The idea is to have a new keyword for all fft functions to define the normalisation of the fft: - if 'norm' is None (the default), the normalisation is the current one: fft() is not normalized and ifft is normalized by 1/n. - if norm is "ortho", the direct and inverse transforms are both normalized by 1/sqrt(n). The results are then unitary. The keyword name and value are consistent with scipy.fftpack.dct. Do you feel that it should be merged? Joseph [1] https://github.com/numpy/numpy/pull/3883 [2] https://github.com/numpy/numpy/issues/2142 From joseph.martinot-lagarde at m4x.org Thu Sep 4 20:46:51 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Fri, 05 Sep 2014 02:46:51 +0200 Subject: [Numpy-discussion] Multiple comment tokens for loadtxt Message-ID: loadtxt currently has a keyword to change the comment token. The PR #4612 [1] makes it possible to define multiple comment tokens for a file. It is motivated by #2633 [2]. What is your position on this one? Joseph [1] https://github.com/numpy/numpy/pull/4612 [2] https://github.com/numpy/numpy/issues/2633 From ndbecker2 at gmail.com Fri Sep 5 07:05:46 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 05 Sep 2014 07:05:46 -0400 Subject: [Numpy-discussion] SFMT (faster mersenne twister) References: Message-ID: Robert Kern wrote: > On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker wrote: >> http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html > > What would you like to say about it? > If it is faster (and at least as good), maybe we'd like to adopt it to replace that used for mtrand -- -- Those who don't understand recursion are doomed to repeat it From robert.kern at gmail.com Fri Sep 5 07:13:33 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 5 Sep 2014 12:13:33 +0100 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: Message-ID: On Fri, Sep 5, 2014 at 12:05 PM, Neal Becker wrote: > Robert Kern wrote: > >> On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker wrote: >>> http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html >> >> What would you like to say about it? >> > > If it is faster (and at least as good), maybe we'd like to adopt it to replace > that used for mtrand It's a variant of the standard MT rather than just an implementation of it, so we can't just drop it in. You will need to build the infrastructure to support multiple PRNGs first (or rather, build the infrastructure to reuse the non-uniform distribution code with multiple core PRNGs). -- Robert Kern From ndbecker2 at gmail.com Fri Sep 5 13:19:57 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 05 Sep 2014 13:19:57 -0400 Subject: [Numpy-discussion] SFMT (faster mersenne twister) References: Message-ID: Robert Kern wrote: > On Fri, Sep 5, 2014 at 12:05 PM, Neal Becker wrote: >> Robert Kern wrote: >> >>> On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker wrote: >>>> http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html >>> >>> What would you like to say about it? >>> >> >> If it is faster (and at least as good), maybe we'd like to adopt it to >> replace that used for mtrand > > It's a variant of the standard MT rather than just an implementation > of it, so we can't just drop it in. You will need to build the > infrastructure to support multiple PRNGs first (or rather, build the > infrastructure to reuse the non-uniform distribution code with > multiple core PRNGs).
> You mean it's not backward compatible because it won't generate exactly the same sequence of output for a given seed, and therefore we wouldn't want to make that change? I think it's somewhat debatable whether generating a different sequence of random numbers counts as breaking backward compatibility. -- -- Those who don't understand recursion are doomed to repeat it From robert.kern at gmail.com Fri Sep 5 13:28:14 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 5 Sep 2014 18:28:14 +0100 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: Message-ID: On Fri, Sep 5, 2014 at 6:19 PM, Neal Becker wrote: > Robert Kern wrote: > >> On Fri, Sep 5, 2014 at 12:05 PM, Neal Becker wrote: >>> Robert Kern wrote: >>> >>>> On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker wrote: >>>>> http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html >>>> >>>> What would you like to say about it? >>>> >>> >>> If it is faster (and at least as good), maybe we'd like to adopt it to >>> replace that used for mtrand >> >> It's a variant of the standard MT rather than just an implementation >> of it, so we can't just drop it in. You will need to build the >> infrastructure to support multiple PRNGs first (or rather, build the >> infrastructure to reuse the non-uniform distribution code with >> multiple core PRNGs). > > You mean it's not backward compatible because it won't generate exactly the same > sequence of output for a given seed, and therefore we wouldn't want to make that > change? > > I think it's somewhat debatable whether generating a different sequence of > random numbers counts as breaking backward compatibility. It's not a matter of debate over the semantics of the term "backwards compatibility". This is a policy that we have explicitly chosen for numpy.random (distinct from our backwards compatibility policy for the rest of numpy) because it was requested of us. -- Robert Kern From hodge at stsci.edu Fri Sep 5 13:26:29 2014 From: hodge at stsci.edu (Phil Hodge) Date: Fri, 5 Sep 2014 13:26:29 -0400 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: Message-ID: <5409F245.50501@stsci.edu> On 09/05/2014 01:19 PM, Neal Becker wrote: > You mean it's not backward compatible because it won't generate exactly the same > sequence of output for a given seed, and therefore we wouldn't want to make that > change? > > I think it's somewhat debatable whether generating a different sequence of > random numbers counts as breaking backward compatibility. For regression tests it's essential to be able to generate the same sequence of values from a pseudo-random number generator. Phil From jhaiduce at gmail.com Fri Sep 5 13:40:52 2014 From: jhaiduce at gmail.com (John Haiducek) Date: Fri, 5 Sep 2014 13:40:52 -0400 Subject: [Numpy-discussion] numpy.vectorize docstrings not shown by help command Message-ID: When I apply numpy.vectorize() to a function, documentation tools behave inconsistently with regard to the new, vectorized function. The function's __doc__ attribute does contain the docstring of the original function as expected, but the built-in help() command displays the documentation of the numpy.vectorize class, and sphinx-autodoc fails to display the function at all. Is there a way to get the docstring of the original function display everywhere as expected? For instance: >>> import numpy as np >>> def myfunc(x): ... "Square x" ... return x**2 ... 
>>> myfunc=np.vectorize(myfunc) >>> print myfunc.__doc__ Square x >>> help(myfunc) (displays documentation of np.vectorize) From alan.isaac at gmail.com Fri Sep 5 15:17:00 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 05 Sep 2014 15:17:00 -0400 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: Message-ID: <540A0C2C.8030806@gmail.com> On 9/5/2014 1:19 PM, Neal Becker wrote: > I think it's somewhat debatable whether generating a different sequence of > random numbers counts as breaking backward compatibility. Please: it does. Alan Isaac From sturla.molden at gmail.com Fri Sep 5 16:36:52 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 05 Sep 2014 22:36:52 +0200 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: Message-ID: On 05/09/14 19:19, Neal Becker wrote: >> It's a variant of the standard MT rather than just an implementation >> of it, so we can't just drop it in. You will need to build the >> infrastructure to support multiple PRNGs first (or rather, build the >> infrastructure to reuse the non-uniform distribution code with >> multiple core PRNGs). >> > > You mean it's not backward compatible because it won't generate exactly the same > sequence of output for a given seed, and therefore we wouldn't want to make that > change? I thought he meant that the standard MT and SFMT have global states, whereas NumPy's randomkit MT does not. Because of this, NumPy allows you to use multiple RandomState instances. With a vanilla MT or SFMT there would be just one. Sturla From robert.kern at gmail.com Fri Sep 5 18:42:09 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 5 Sep 2014 23:42:09 +0100 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: Message-ID: On Fri, Sep 5, 2014 at 9:36 PM, Sturla Molden wrote: > On 05/09/14 19:19, Neal Becker wrote: > >> It's a variant of the standard MT rather than just an implementation > >> of it, so we can't just drop it in. You will need to build the > >> infrastructure to support multiple PRNGs first (or rather, build the > >> infrastructure to reuse the non-uniform distribution code with > >> multiple core PRNGs). > > > > You mean it's not backward compatible because it won't generate > exactly the same > > sequence of output for a given seed, and therefore we wouldn't want > to make that > > change? > > I thought he meant that the standard MT and SFMT have global states, > whereas NumPy's randomkit MT does not. Because of this, NumPy allows you > to use multiple RandomState instances. With a vanilla MT or SFMT there > would be just one. No, that is not what I meant. If the SFMT can be made to output the same bitstream for the same seed, we can use it (modifying it if necessary to avoid global state if necessary), but it does not look so to me. I welcome corrections on that count (in PR form, preferably!). -- Robert Kern From sturla.molden at gmail.com Fri Sep 5 19:50:14 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 5 Sep 2014 23:50:14 +0000 (UTC) Subject: [Numpy-discussion] SFMT (faster mersenne twister) References: Message-ID: <664646241431653467.005819sturla.molden-gmail.com@news.gmane.org> Robert Kern wrote: > No, that is not what I meant. If the SFMT can be made to output the > same bitstream for the same seed, we can use it (modifying it if > necessary to avoid global state if necessary), but it does not look so > to me. I welcome corrections on that count (in PR form, preferably!). 
Saito's SFMT master thesis says SFMT has a different equidistribution, so it will not give the same bitstream. But it also mentions a SIMD version of the standard MT. From jbednar at inf.ed.ac.uk Sat Sep 6 13:37:32 2014 From: jbednar at inf.ed.ac.uk (James A. Bednar) Date: Sat, 6 Sep 2014 18:37:32 +0100 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: Message-ID: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> | Date: Fri, 05 Sep 2014 13:19:57 -0400 | From: Neal Becker | | I think it's somewhat debatable whether generating a different | sequence of random numbers counts as breaking backward | compatibility. Please don't ever, ever break the sequence of numpy's random numbers! Please! We have put a lot of effort into being able to reproduce our published work exactly, and all of that would be in vain if the sequence changes. See e.g.: http://journal.frontiersin.org/Journal/10.3389/fninf.2013.00044/full We'd be very happy to see additional number generators appear *alongside* the existing ones, though, particularly if there are faster or otherwise better ones! Jim Bednar -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From jtaylor.debian at googlemail.com Sun Sep 7 06:33:19 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sun, 07 Sep 2014 12:33:19 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release Message-ID: <540C346F.8070109@googlemail.com> Hello, We are proud to announce the 1.9.0 release of NumPy. This release includes numerous performance improvements, most significantly the indexing code has been rewritten to be a lot faster for most cases, and performance of using small arrays and scalars has almost doubled. Plenty of other functions have been improved too: nonzero, where, bincount, searchsorted, count_nonzero, floating point min/max, boolean argmin/argmax, triu/tril, and masked sorting can be expected to perform significantly better in many cases. Also NumPy 1.9.0 releases the GIL for more functions, most notably indexing now releases it and the random module's state object has a private lock instead of using the GIL. This allows leveraging pure python threads more efficiently. In order to make working with arrays containing NaN values easier, nanmedian and nanpercentile have been added which ignore these values. These functions and the regular median and percentile now also support generalized axis arguments that ufuncs already have; these allow reducing along multiple axes in one call. Please see the release notes for all the details. Please also take note of the many small compatibility changes and deprecations in the notes. https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst The source tarballs and win32 binaries can be downloaded here: https://sourceforge.net/projects/numpy/files/NumPy/1.9.0 Cheers, The NumPy Development Team -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From sebastian at sipsolutions.net Sun Sep 7 08:53:07 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 07 Sep 2014 14:53:07 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release In-Reply-To: <540C346F.8070109@googlemail.com> References: <540C346F.8070109@googlemail.com> Message-ID: <1410094387.14955.0.camel@sebastian-t440> On So, 2014-09-07 at 12:33 +0200, Julian Taylor wrote: > Hello, > > We are proud to announce the 1.9.0 release of NumPy. > Awesome, thanks for the release management! - Sebastian > This release includes numerous performance improvements, most > significantly the indexing code has been rewritten to be a lot > faster for most cases, and performance of using small arrays and scalars > has almost doubled. > Plenty of other functions have been improved too: nonzero, where, > bincount, searchsorted, count_nonzero, floating point min/max, boolean > argmin/argmax, triu/tril, and masked sorting can be expected to perform > significantly better in many cases. > > Also NumPy 1.9.0 releases the GIL for more functions, most notably > indexing now releases it and the random module's state object has a > private lock instead of using the GIL. This allows leveraging pure > python threads more efficiently. > > In order to make working with arrays containing NaN values easier, > nanmedian and nanpercentile have been added which ignore these values. > These functions and the regular median and percentile now also support > generalized axis arguments that ufuncs already have; these allow > reducing along multiple axes in one call. > > Please see the release notes for all the details. Please also take note > of the many small compatibility changes and deprecations in the notes. > https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst > > The source tarballs and win32 binaries can be downloaded here: > https://sourceforge.net/projects/numpy/files/NumPy/1.9.0 > > Cheers, > The NumPy Development Team > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Sun Sep 7 10:07:27 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 7 Sep 2014 08:07:27 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release In-Reply-To: <540C346F.8070109@googlemail.com> References: <540C346F.8070109@googlemail.com> Message-ID: On Sun, Sep 7, 2014 at 4:33 AM, Julian Taylor wrote: > Hello, > > We are proud to announce the 1.9.0 release of NumPy. > > This release includes numerous performance improvements, most > significantly the indexing code has been rewritten to be a lot > faster for most cases, and performance of using small arrays and scalars > has almost doubled. > Plenty of other functions have been improved too: nonzero, where, > bincount, searchsorted, count_nonzero, floating point min/max, boolean > argmin/argmax, triu/tril, and masked sorting can be expected to perform > significantly better in many cases. > > Also NumPy 1.9.0 releases the GIL for more functions, most notably > indexing now releases it and the random module's state object has a > private lock instead of using the GIL.
This allows leveraging pure > python threads more efficiently. > > In order to make working with arrays containing NaN values easier, > nanmedian and nanpercentile have been added which ignore these values. > These functions and the regular median and percentile now also support > generalized axis arguments that ufuncs already have; these allow > reducing along multiple axes in one call. > > Please see the release notes for all the details. Please also take note > of the many small compatibility changes and deprecations in the notes. > > https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst > > The source tarballs and win32 binaries can be downloaded here: > https://sourceforge.net/projects/numpy/files/NumPy/1.9.0 > > Cheers, > The NumPy Development Team > Great! Thanks for all the work you did getting this out. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Sun Sep 7 11:23:41 2014 From: rays at blue-cove.com (RayS) Date: Sun, 07 Sep 2014 08:23:41 -0700 Subject: [Numpy-discussion] numpy whois limit exceeded Message-ID: <201409071523.s87FNfoo024957@blue-cove.com> In looking up module info for company code policy, I noticed the page http://www.networksolutions.com/whois/results.jsp?domain=numpy.org gives "WHOIS LIMIT EXCEEDED - SEE WWW.PIR.ORG/WHOIS FOR DETAILS" So the domain has been getting a lot of attention today: http://pir.org/resources/faq/ "Public Interest Registry accredited registrars who submit queries through the web-based WHOIS search mechanism are limited to 50 queries per minute. " (Oddly, WWW.PIR.ORG/WHOIS is a bad link, and pir.org's page: http://pir.org/domains/org-domain/ is also non-functional, for this Firefox at least, so they are Bcc'd) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sun Sep 7 15:51:57 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 7 Sep 2014 19:51:57 +0000 (UTC) Subject: [Numpy-discussion] SFMT (faster mersenne twister) References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> Message-ID: <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> "James A. Bednar" wrote: > Please don't ever, ever break the sequence of numpy's random numbers! > Please! We have put a lot of effort into being able to reproduce our > published work exactly, Jup, it cannot be overstated how important this is for reproducibility of published research. Thus from a scientific standpoint it is important that random numbers are not random. Some might think that it's just important that they are as "random as possible", but reproducibility is just as essential to stochastic simulations. This is also why parallel random number generators and parallel stochastic algorithms are so hard to program, because the operating systems' scheduler can easily break the reproducibility. I think we could add new generators to NumPy though, perhaps with a keyword to control the algorithm (defaulting to the current Mersenne Twister). A particular candidate I think we should consider is the DCMT, which is exceptionally good for parallel algorithms (the DCMT code is now BSD licensed, it used to be LGPL). Because of the way randomkit is written, it is very easy to plug in different generators.
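For illustration, a rough sketch of the kind of per-stream reproducibility at stake, using nothing but the current API (the seed values here are arbitrary, and naive seeding does not give the independence guarantees the DCMT is designed for):

import numpy as np

# One independently seeded stream per worker: which thread runs first no
# longer matters, only which seed each stream was given.
streams = [np.random.RandomState(seed) for seed in (42, 43, 44)]
draws = [rng.standard_normal(5) for rng in streams]

# Re-creating the streams with the same seeds reproduces the draws exactly.
replay = [np.random.RandomState(seed).standard_normal(5)
          for seed in (42, 43, 44)]
assert all(np.array_equal(a, b) for a, b in zip(draws, replay))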
Sturla From ben.root at ou.edu Sun Sep 7 15:59:23 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 7 Sep 2014 15:59:23 -0400 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> Message-ID: In addition to issues with reproducibility, think of all of the unit tests that would break! On Sun, Sep 7, 2014 at 3:51 PM, Sturla Molden wrote: > "James A. Bednar" wrote: > > > Please don't ever, ever break the sequence of numpy's random numbers! > > Please! We have put a lot of effort into being able to reproduce our > > published work exactly, > > Jup, it cannot be understated how important this is for reproducibility of > published research. Thus from a scientific standpoint it is important that > random numbers are not random. Some might think that it's just important > that they are as "random as possible", but reproducibility is just as > essential to stochastic simulations. This is also why parallel random > number generators and parallel stochastic algorithms are so hard to > program, because the operating systems' scheduler can easily break the > reproducibility. I think we could add new generators to NumPy though, > perhaps with a keyword to control the algorithm (defaulting to the current > Mersenne Twister). A particular candidate I think we should consider is the > DCMT, which is exceptionally good for parallel algorithms (the DCMT code is > now BSD licensed, it used to be LGPL). Because of the way randomkit it > written, it is very easy to plug-in different generators. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sun Sep 7 16:23:41 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 7 Sep 2014 20:23:41 +0000 (UTC) Subject: [Numpy-discussion] SFMT (faster mersenne twister) References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> Message-ID: <666709792431814167.348075sturla.molden-gmail.com@news.gmane.org> Benjamin Root wrote: > In addition to issues with reproducibility, think of all of the unit tests > that would break! That is a reproducibility problem :) From stefan.otte at gmail.com Mon Sep 8 09:29:08 2014 From: stefan.otte at gmail.com (Stefan Otte) Date: Mon, 8 Sep 2014 15:29:08 +0200 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab Message-ID: Hey, quite often I work with block matrices. Matlab offers the convenient notation [ a b; c d ] to stack matrices. The numpy equivalent is kinda clumsy: vstack([hstack([a,b]), hstack([c,d])]) I wrote the little function `stack` that does exactly that: stack([[a, b], [c, d]]) In my case `stack` replaced `hstack` and `vstack` almost completely. If you're interested in including it in numpy I created a pull request [1]. I'm looking forward to getting some feedback! 
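For illustration, a minimal sketch of the intended semantics (stack_sketch is just an illustrative stand-in, not the code in the PR; the block shapes are assumed to be compatible):

import numpy as np

def stack_sketch(blocks):
    # Each inner list is a row of blocks joined horizontally;
    # the resulting rows are then joined vertically.
    return np.vstack([np.hstack(row) for row in blocks])

a = np.ones((2, 2)); b = np.zeros((2, 3))
c = np.zeros((1, 2)); d = np.ones((1, 3))
m = stack_sketch([[a, b], [c, d]])  # a 3x5 block matrix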
Best, Stefan [1] https://github.com/numpy/numpy/pull/5057 From josef.pktd at gmail.com Mon Sep 8 10:00:47 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 8 Sep 2014 10:00:47 -0400 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: <666709792431814167.348075sturla.molden-gmail.com@news.gmane.org> References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> <666709792431814167.348075sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sun, Sep 7, 2014 at 4:23 PM, Sturla Molden wrote: > Benjamin Root wrote: > > In addition to issues with reproducibility, think of all of the unit > tests > > that would break! > > That is a reproducibility problem :) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Related aside about reproducibility of random numbers: IMO: scipy.stats.distributions.rvs does not yet guarantee the values of the random numbers except for those that are directly produced by numpy. In contrast to numpy.random, scipy's distributions don't have unit tests for the specific values of the rvs, and the rvs code for specific distributions could still be improved in some cases, I guess. Josef -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Sep 8 10:41:57 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 8 Sep 2014 14:41:57 +0000 (UTC) Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab References: Message-ID: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Stefan Otte wrote: > stack([[a, b], [c, d]]) > > In my case `stack` replaced `hstack` and `vstack` almost completely. > > If you're interested in including it in numpy I created a pull request > [1]. I'm looking forward to getting some feedback! As far as I can see, it uses hstack and vstack. But that means a and b have to have the same number of rows, c and d must have the same number of rows, and hstack((a,b)) and hstack((c,d)) must have the same number of columns. Thus it requires a regularity like this: AAAABB AAAABB CCCDDD CCCDDD CCCDDD CCCDDD What if we just ignore this constraint, and only require the output to be rectangular? Now we have a 'tetris game': AAAABB AAAABB CCCCBB CCCCBB CCCCDD CCCCDD or AAAABB AAAABB CCCCBB CCCCBB CCCCBB CCCCBB This should be 'stackable', yes? Or perhaps we need another stacking function for this, say numpy.tetris? And while we're at it, what about higher dimensions? should there be an ndstack function too? Sturla From josef.pktd at gmail.com Mon Sep 8 12:08:12 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 8 Sep 2014 12:08:12 -0400 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Sep 8, 2014 at 10:41 AM, Sturla Molden wrote: > Stefan Otte wrote: > > > stack([[a, b], [c, d]]) > > > > In my case `stack` replaced `hstack` and `vstack` almost completely. > > > > If you're interested in including it in numpy I created a pull request > > [1]. I'm looking forward to getting some feedback! >
But that means a and b have > to have the same number of rows, c and d must have the same rumber of rows, > and hstack((a,b)) and hstack((c,d)) must have the same number of columns. > > Thus it requires a regularity like this: > > AAAABB > AAAABB > CCCDDD > CCCDDD > CCCDDD > CCCDDD > > What if we just ignore this constraint, and only require the output to be > rectangular? Now we have a 'tetris game': > > AAAABB > AAAABB > CCCCBB > CCCCBB > CCCCDD > CCCCDD > > or > > AAAABB > AAAABB > CCCCBB > CCCCBB > CCCCBB > CCCCBB > > This should be 'stackable', yes? Or perhaps we need another stacking > function for this, say numpy.tetris? > > And while we're at it, what about higher dimensions? should there be an > ndstack function too? > > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Mon Sep 8 12:10:35 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Mon, 8 Sep 2014 09:10:35 -0700 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Sep 8, 2014 at 7:41 AM, Sturla Molden wrote: > Stefan Otte wrote: > > > stack([[a, b], [c, d]]) > > > > In my case `stack` replaced `hstack` and `vstack` almost completely. > > > > If you're interested in including it in numpy I created a pull request > > [1]. I'm looking forward to getting some feedback! > > As far as I can see, it uses hstack and vstack. But that means a and b have > to have the same number of rows, c and d must have the same rumber of rows, > and hstack((a,b)) and hstack((c,d)) must have the same number of columns. > > Thus it requires a regularity like this: > > AAAABB > AAAABB > CCCDDD > CCCDDD > CCCDDD > CCCDDD > > What if we just ignore this constraint, and only require the output to be > rectangular? Now we have a 'tetris game': > > AAAABB > AAAABB > CCCCBB > CCCCBB > CCCCDD > CCCCDD > > or > > AAAABB > AAAABB > CCCCBB > CCCCBB > CCCCBB > CCCCBB > > This should be 'stackable', yes? Or perhaps we need another stacking > function for this, say numpy.tetris? > > And while we're at it, what about higher dimensions? should there be an > ndstack function too? > This is starting to look like the second time in a row Stefan tries to extend numpy with a simple convenience function, and he gets tricked into implementing some sophisticated algorithm... For his next PR I expect nothing less than an NP-complete problem. ;-) > Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Mon Sep 8 12:13:46 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 8 Sep 2014 12:13:46 -0400 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: On 8 Sep 2014 10:42, "Sturla Molden" wrote: > > Stefan Otte wrote: > > > stack([[a, b], [c, d]]) > > > > In my case `stack` replaced `hstack` and `vstack` almost completely. > > > > If you're interested in including it in numpy I created a pull request > > [1]. I'm looking forward to getting some feedback! > > As far as I can see, it uses hstack and vstack. But that means a and b have > to have the same number of rows, c and d must have the same number of rows, > and hstack((a,b)) and hstack((c,d)) must have the same number of columns. > > Thus it requires a regularity like this: > > AAAABB > AAAABB > CCCDDD > CCCDDD > CCCDDD > CCCDDD > > What if we just ignore this constraint, and only require the output to be > rectangular? Now we have a 'tetris game': > > AAAABB > AAAABB > CCCCBB > CCCCBB > CCCCDD > CCCCDD > > or > > AAAABB > AAAABB > CCCCBB > CCCCBB > CCCCBB > CCCCBB > > This should be 'stackable', yes? Or perhaps we need another stacking > function for this, say numpy.tetris? It's not at all obvious to me how to describe such "tetris" configurations, or interpret them unambiguously. Do you have a more detailed specification in mind? > And while we're at it, what about higher dimensions? should there be an > ndstack function too? Same comment here. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.b.poole at gmail.com Mon Sep 8 12:14:22 2014 From: t.b.poole at gmail.com (tpoole) Date: Mon, 8 Sep 2014 09:14:22 -0700 (MST) Subject: [Numpy-discussion] Weighted Covariance/correlation In-Reply-To: <1408110393.20638.16.camel@sebastian-t440> References: <1408110393.20638.16.camel@sebastian-t440> Message-ID: <1410192862477-38570.post@n7.nabble.com> Hi all, Any input to this? Last time it generated a fair bit of discussion, which I'll summarise here. It's currently possible to calculate a weighted average using np.average, but the corresponding functionality does not exist for (co)variance or corrcoef calculations. In this case it's less straightforward, and we need to worry about what type of information the weights contain. Repeat type weights are the easiest to explain. Here the variances of [x1, x2, x3] with weights [2, 1, 3] and [x1, x1, x2, x3, x3, x3] are identical. For Bessel correction the total number of samples is obtained by summing the weights. These weights do not have to be integer, and in this case the only important assumption is that their sum represents the total sample size. The second type of weights are importances or accuracies. Here the weights represent the relative strength of contributions from each of the associated samples. Because this is a purely relative relation, there's no concrete information about the total number of samples. This has to be obtained from the effective sample size, given by (sum(weights)^2)/sum(weights^2). I think the clearest way of providing both options is to have a boolean switch indicating if the weights represent repeat (frequency) type information. I can't immediately see a good motivation for allowing both concurrently, and think this could cause confusion.
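As a rough sketch of the difference (weighted_var is a hypothetical helper, not the API in the PR; the Bessel correction uses the sample count implied by each interpretation):

import numpy as np

def weighted_var(x, w, repeat_weights):
    # Hypothetical helper illustrating the two weight interpretations.
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    mean = np.average(x, weights=w)
    biased = np.average((x - mean)**2, weights=w)
    if repeat_weights:
        n = w.sum()                      # weights directly count samples
    else:
        n = w.sum()**2 / (w**2).sum()    # effective sample size
    return biased * n / (n - 1.0)        # Bessel-style correction

# With repeat weights, [2, 1, 3] matches the explicitly repeated data:
assert np.isclose(weighted_var([1., 2., 3.], [2, 1, 3], repeat_weights=True),
                  np.var([1., 1., 2., 3., 3., 3.], ddof=1))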
Tom On 15 Aug 2014, at 14:46, Sebastian Berg <[hidden email]> wrote: > Hi all, > > Tom Poole has opened pull request > https://github.com/numpy/numpy/pull/4960 to implement weights into > np.cov (correlation can be added), somewhat picking up the effort > started by Noel Dawe in https://github.com/numpy/numpy/pull/3864. > > The pull request would currently implement an accuracy type `weights` > keyword argument as default, but have a switch `repeat_weights` to use > repeat type weights instead (frequency type are a special case of this I > think). > > As far as I can see, the code is in a state that it can be tested. But > since it is a new feature, the names/defaults are up for discussion, so > maybe someone who might use such a feature has a preference. I know we > had a short discussion about this before, but it was a while ago. For > example another option would be to have the two weights as two keyword > arguments, instead of a boolean switch. > > Regards, > > Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > [hidden email] > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- View this message in context: http://numpy-discussion.10968.n7.nabble.com/Weighted-Covariance-correlation-tp38394p38570.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From josef.pktd at gmail.com Mon Sep 8 12:18:25 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 8 Sep 2014 12:18:25 -0400 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Sep 8, 2014 at 12:10 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Mon, Sep 8, 2014 at 7:41 AM, Sturla Molden > wrote: > >> Stefan Otte wrote: >> >> > stack([[a, b], [c, d]]) >> > >> > In my case `stack` replaced `hstack` and `vstack` almost completely. >> > >> > If you're interested in including it in numpy I created a pull request >> > [1]. I'm looking forward to getting some feedback! >> >> As far as I can see, it uses hstack and vstack. But that means a and b >> have >> to have the same number of rows, c and d must have the same rumber of >> rows, >> and hstack((a,b)) and hstack((c,d)) must have the same number of columns. >> >> Thus it requires a regularity like this: >> >> AAAABB >> AAAABB >> CCCDDD >> CCCDDD >> CCCDDD >> CCCDDD >> >> What if we just ignore this constraint, and only require the output to be >> rectangular? Now we have a 'tetris game': >> >> AAAABB >> AAAABB >> CCCCBB >> CCCCBB >> CCCCDD >> CCCCDD >> >> or >> >> AAAABB >> AAAABB >> CCCCBB >> CCCCBB >> CCCCBB >> CCCCBB >> >> This should be 'stackable', yes? Or perhaps we need another stacking >> function for this, say numpy.tetris? >> >> And while we're at it, what about higher dimensions? should there be an >> ndstack function too? >> > > This is starting to look like the second time in a row Stefan tries to > extend numpy with a simple convenience function, and he gets tricked into > implementing some sophisticated algorithm... > Maybe the third time is a gem. > > For his next PR I expect nothing less than an NP-complete problem. ;-) > How about cholesky or qr updating? I could use one right now. Josef > > >> Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Mon Sep 8 12:41:58 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Mon, 8 Sep 2014 18:41:58 +0200 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: Sturla: im not sure if the intention is always unambiguous, for such more flexible arrangements. Also, I doubt such situations arise often in practice; if the arrays arnt a grid, they are probably a nested grid, and the code would most naturally concatenate them with nested calls to a stacking function. However, some form of nd-stack function would be neat in my opinion. On Mon, Sep 8, 2014 at 6:10 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Mon, Sep 8, 2014 at 7:41 AM, Sturla Molden > wrote: > >> Stefan Otte wrote: >> >> > stack([[a, b], [c, d]]) >> > >> > In my case `stack` replaced `hstack` and `vstack` almost completely. >> > >> > If you're interested in including it in numpy I created a pull request >> > [1]. I'm looking forward to getting some feedback! >> >> As far as I can see, it uses hstack and vstack. But that means a and b >> have >> to have the same number of rows, c and d must have the same rumber of >> rows, >> and hstack((a,b)) and hstack((c,d)) must have the same number of columns. >> >> Thus it requires a regularity like this: >> >> AAAABB >> AAAABB >> CCCDDD >> CCCDDD >> CCCDDD >> CCCDDD >> >> What if we just ignore this constraint, and only require the output to be >> rectangular? Now we have a 'tetris game': >> >> AAAABB >> AAAABB >> CCCCBB >> CCCCBB >> CCCCDD >> CCCCDD >> >> or >> >> AAAABB >> AAAABB >> CCCCBB >> CCCCBB >> CCCCBB >> CCCCBB >> >> This should be 'stackable', yes? Or perhaps we need another stacking >> function for this, say numpy.tetris? >> >> And while we're at it, what about higher dimensions? should there be an >> ndstack function too? >> > > This is starting to look like the second time in a row Stefan tries to > extend numpy with a simple convenience function, and he gets tricked into > implementing some sophisticated algorithm... > > For his next PR I expect nothing less than an NP-complete problem. ;-) > > >> Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Mon Sep 8 12:55:34 2014 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 8 Sep 2014 12:55:34 -0400 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: A use case would be "image stitching" or even data tiling. I have had to implement something like this at work (so, I can't share it, unfortunately) and it even goes so far as to allow the caller to specify how much the tiles can overlap and such. 
The specification is ungodly hideous and I doubt I would be willing to share it even if I could lest I release code-thulu upon the world... I think just having this generalize stack feature would be nice start. Tetris could be built on top of that later. (Although, I do vote for at least 3 or 4 dimensional stacking, if possible). Cheers! Ben Root On Mon, Sep 8, 2014 at 12:41 PM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Sturla: im not sure if the intention is always unambiguous, for such more > flexible arrangements. > > Also, I doubt such situations arise often in practice; if the arrays arnt > a grid, they are probably a nested grid, and the code would most naturally > concatenate them with nested calls to a stacking function. > > However, some form of nd-stack function would be neat in my opinion. > > On Mon, Sep 8, 2014 at 6:10 PM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Mon, Sep 8, 2014 at 7:41 AM, Sturla Molden >> wrote: >> >>> Stefan Otte wrote: >>> >>> > stack([[a, b], [c, d]]) >>> > >>> > In my case `stack` replaced `hstack` and `vstack` almost completely. >>> > >>> > If you're interested in including it in numpy I created a pull request >>> > [1]. I'm looking forward to getting some feedback! >>> >>> As far as I can see, it uses hstack and vstack. But that means a and b >>> have >>> to have the same number of rows, c and d must have the same rumber of >>> rows, >>> and hstack((a,b)) and hstack((c,d)) must have the same number of columns. >>> >>> Thus it requires a regularity like this: >>> >>> AAAABB >>> AAAABB >>> CCCDDD >>> CCCDDD >>> CCCDDD >>> CCCDDD >>> >>> What if we just ignore this constraint, and only require the output to be >>> rectangular? Now we have a 'tetris game': >>> >>> AAAABB >>> AAAABB >>> CCCCBB >>> CCCCBB >>> CCCCDD >>> CCCCDD >>> >>> or >>> >>> AAAABB >>> AAAABB >>> CCCCBB >>> CCCCBB >>> CCCCBB >>> CCCCBB >>> >>> This should be 'stackable', yes? Or perhaps we need another stacking >>> function for this, say numpy.tetris? >>> >>> And while we're at it, what about higher dimensions? should there be an >>> ndstack function too? >>> >> >> This is starting to look like the second time in a row Stefan tries to >> extend numpy with a simple convenience function, and he gets tricked into >> implementing some sophisticated algorithm... >> >> For his next PR I expect nothing less than an NP-complete problem. ;-) >> >> >>> Jaime >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >> de dominaci?n mundial. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben.root at ou.edu Mon Sep 8 13:00:11 2014 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 8 Sep 2014 13:00:11 -0400 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: Btw, on a somewhat related note, whoever can implement ndarray to be able to use views from other ndarrays stitched together would get a fruit basket from me come the holidays and possibly naming rights for the next kid... Cheers! Ben Root On Mon, Sep 8, 2014 at 12:55 PM, Benjamin Root wrote: > A use case would be "image stitching" or even data tiling. I have had to > implement something like this at work (so, I can't share it, unfortunately) > and it even goes so far as to allow the caller to specify how much the > tiles can overlap and such. The specification is ungodly hideous and I > doubt I would be willing to share it even if I could lest I release > code-thulu upon the world... > > I think just having this generalize stack feature would be nice start. > Tetris could be built on top of that later. (Although, I do vote for at > least 3 or 4 dimensional stacking, if possible). > > Cheers! > Ben Root > > > On Mon, Sep 8, 2014 at 12:41 PM, Eelco Hoogendoorn < > hoogendoorn.eelco at gmail.com> wrote: > >> Sturla: im not sure if the intention is always unambiguous, for such more >> flexible arrangements. >> >> Also, I doubt such situations arise often in practice; if the arrays arnt >> a grid, they are probably a nested grid, and the code would most naturally >> concatenate them with nested calls to a stacking function. >> >> However, some form of nd-stack function would be neat in my opinion. >> >> On Mon, Sep 8, 2014 at 6:10 PM, Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> On Mon, Sep 8, 2014 at 7:41 AM, Sturla Molden >>> wrote: >>> >>>> Stefan Otte wrote: >>>> >>>> > stack([[a, b], [c, d]]) >>>> > >>>> > In my case `stack` replaced `hstack` and `vstack` almost completely. >>>> > >>>> > If you're interested in including it in numpy I created a pull request >>>> > [1]. I'm looking forward to getting some feedback! >>>> >>>> As far as I can see, it uses hstack and vstack. But that means a and b >>>> have >>>> to have the same number of rows, c and d must have the same rumber of >>>> rows, >>>> and hstack((a,b)) and hstack((c,d)) must have the same number of >>>> columns. >>>> >>>> Thus it requires a regularity like this: >>>> >>>> AAAABB >>>> AAAABB >>>> CCCDDD >>>> CCCDDD >>>> CCCDDD >>>> CCCDDD >>>> >>>> What if we just ignore this constraint, and only require the output to >>>> be >>>> rectangular? Now we have a 'tetris game': >>>> >>>> AAAABB >>>> AAAABB >>>> CCCCBB >>>> CCCCBB >>>> CCCCDD >>>> CCCCDD >>>> >>>> or >>>> >>>> AAAABB >>>> AAAABB >>>> CCCCBB >>>> CCCCBB >>>> CCCCBB >>>> CCCCBB >>>> >>>> This should be 'stackable', yes? Or perhaps we need another stacking >>>> function for this, say numpy.tetris? >>>> >>>> And while we're at it, what about higher dimensions? should there be an >>>> ndstack function too? >>>> >>> >>> This is starting to look like the second time in a row Stefan tries to >>> extend numpy with a simple convenience function, and he gets tricked into >>> implementing some sophisticated algorithm... >>> >>> For his next PR I expect nothing less than an NP-complete problem. ;-) >>> >>> >>>> Jaime >>> >>> -- >>> (\__/) >>> ( O.o) >>> ( > <) Este es Conejo. 
Copia a Conejo en tu firma y ayúdale en sus >>> planes de dominación mundial. >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From noel.pierre.andre at gmail.com Mon Sep 8 13:05:03 2014 From: noel.pierre.andre at gmail.com (Pierre-Andre Noel) Date: Mon, 08 Sep 2014 10:05:03 -0700 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> Message-ID: <540DE1BF.4030405@gmail.com> > I think we could add new generators to NumPy though, > perhaps with a keyword to control the algorithm (defaulting to the current > Mersenne Twister). Why not do something like the C++11 <random>? In <random>, a "generator" is the engine producing randomness, and a "distribution" decides what is the type of outputs that you want. Here is the example on http://www.cplusplus.com/reference/random/ . std::default_random_engine generator; std::uniform_int_distribution<int> distribution(1,6); int dice_roll = distribution(generator); // generates number in the range 1..6 For convenience, you can bind the generator with the distribution (still from the web page above). auto dice = std::bind(distribution, generator); int wisdom = dice()+dice()+dice(); Here is how I propose to adapt this scheme to numpy. First, there would be a global generator defaulting to the current implementation of Mersenne Twister. Calls to numpy's "RandomState", "seed", "get_state" and "set_state" would affect this global generator. All numpy functions associated with random number generation (i.e., everything listed on http://docs.scipy.org/doc/numpy/reference/routines.random.html except for "RandomState", "seed", "get_state" and "set_state") would accept the kwarg "generator", which defaults to the global generator (by default the current Mersenne Twister). Now there could be other generator objects: the new Mersenne Twister, some lightweight-but-worse generator, or some cryptographically-safe random generator. Each such generator would have "RandomState", "seed", "get_state" and "set_state" methods (except perhaps the cryptographically-safe ones). When calling a numpy function with generator=my_generator, that function uses this generator instead of the global one. Moreover, there would be a function, say select_default_random_engine(generator), which changes the global generator to a user-specified one. > This is also why parallel random > number generators and parallel stochastic algorithms are so hard to > program, because the operating systems' scheduler can easily break the > reproducibility. What I propose would also simplify this: each thread can use its own independently-seeded generator. Timing is no longer a problem: as long as which-thread-does-what is not affected by the scheduler, the execution remains deterministic. On 09/07/2014 12:51 PM, Sturla Molden wrote: > "James A. Bednar" wrote: > >> Please don't ever, ever break the sequence of numpy's random numbers! >> Please!
We have put a lot of effort into being able to reproduce our >> published work exactly, > Jup, it cannot be understated how important this is for reproducibility of > published research. Thus from a scientific standpoint it is important that > random numbers are not random. Some might think that it's just important > that they are as "random as possible", but reproducibility is just as > essential to stochastic simulations. This is also why parallel random > number generators and parallel stochastic algorithms are so hard to > program, because the operating systems' scheduler can easily break the > reproducibility. I think we could add new generators to NumPy though, > perhaps with a keyword to control the algorithm (defaulting to the current > Mersenne Twister). A particular candidate I think we should consider is the > DCMT, which is exceptionally good for parallel algorithms (the DCMT code is > now BSD licensed, it used to be LGPL). Because of the way randomkit it > written, it is very easy to plug-in different generators. > > Sturla > > From shoyer at gmail.com Mon Sep 8 13:18:45 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 8 Sep 2014 10:18:45 -0700 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Sep 8, 2014 at 10:00 AM, Benjamin Root wrote: > Btw, on a somewhat related note, whoever can implement ndarray to be able > to use views from other ndarrays stitched together would get a fruit basket > from me come the holidays and possibly naming rights for the next kid... > Ben, you should check out Biggus, which does (at least some) of what you're describing: https://github.com/SciTools/biggus Two things I would like to see before this makes it into numpy: (1) It should handle arbitrary dimensional arrays, not just 2D. (2) It needs to be more efficient. Composing vstack and hstack directly makes a whole level of unnecessary intermediate copies. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Mon Sep 8 13:37:31 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Mon, 8 Sep 2014 19:37:31 +0200 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Blockmatrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: <540de947.a50cb40a.5945.0e25@mx.google.com> Blaze aims to do something like that; to make the notion of an array and how it stores it's data far more flexible. But if it isn't a single strided ND array, it isn't numpy. This concept lies at its very heart; and for good reasons I would add. -----Original Message----- From: "Benjamin Root" Sent: ?8-?9-?2014 19:00 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] Generalize hstack/vstack --> stack; Blockmatrices like in matlab Btw, on a somewhat related note, whoever can implement ndarray to be able to use views from other ndarrays stitched together would get a fruit basket from me come the holidays and possibly naming rights for the next kid... Cheers! Ben Root On Mon, Sep 8, 2014 at 12:55 PM, Benjamin Root wrote: A use case would be "image stitching" or even data tiling. I have had to implement something like this at work (so, I can't share it, unfortunately) and it even goes so far as to allow the caller to specify how much the tiles can overlap and such. 
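(A toy sketch to make the overlapping-tiles idea concrete -- this is only an illustration assuming 1-D data and a fixed overlap, nothing like the real code mentioned above:)

    import numpy as np

    def tiles(a, size, overlap):
        # Toy version: yield overlapping views along the first axis.
        # These are views, so no data is copied; overlap must be < size.
        step = size - overlap
        for start in range(0, a.shape[0] - size + 1, step):
            yield a[start:start + size]

    data = np.arange(20)
    for t in tiles(data, size=8, overlap=3):
        print(t)   # [0..7], [5..12], [10..17]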
The specification is ungodly hideous and I doubt I would be willing to share it even if I could lest I release code-thulu upon the world... I think just having this generalize stack feature would be nice start. Tetris could be built on top of that later. (Although, I do vote for at least 3 or 4 dimensional stacking, if possible). Cheers! Ben Root On Mon, Sep 8, 2014 at 12:41 PM, Eelco Hoogendoorn wrote: Sturla: im not sure if the intention is always unambiguous, for such more flexible arrangements. Also, I doubt such situations arise often in practice; if the arrays arnt a grid, they are probably a nested grid, and the code would most naturally concatenate them with nested calls to a stacking function. However, some form of nd-stack function would be neat in my opinion. On Mon, Sep 8, 2014 at 6:10 PM, Jaime Fern?ndez del R?o wrote: On Mon, Sep 8, 2014 at 7:41 AM, Sturla Molden wrote: Stefan Otte wrote: > stack([[a, b], [c, d]]) > > In my case `stack` replaced `hstack` and `vstack` almost completely. > > If you're interested in including it in numpy I created a pull request > [1]. I'm looking forward to getting some feedback! As far as I can see, it uses hstack and vstack. But that means a and b have to have the same number of rows, c and d must have the same rumber of rows, and hstack((a,b)) and hstack((c,d)) must have the same number of columns. Thus it requires a regularity like this: AAAABB AAAABB CCCDDD CCCDDD CCCDDD CCCDDD What if we just ignore this constraint, and only require the output to be rectangular? Now we have a 'tetris game': AAAABB AAAABB CCCCBB CCCCBB CCCCDD CCCCDD or AAAABB AAAABB CCCCBB CCCCBB CCCCBB CCCCBB This should be 'stackable', yes? Or perhaps we need another stacking function for this, say numpy.tetris? And while we're at it, what about higher dimensions? should there be an ndstack function too? This is starting to look like the second time in a row Stefan tries to extend numpy with a simple convenience function, and he gets tricked into implementing some sophisticated algorithm... For his next PR I expect nothing less than an NP-complete problem. ;-) Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Mon Sep 8 14:00:27 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Mon, 08 Sep 2014 20:00:27 +0200 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: <540DE1BF.4030405@gmail.com> References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> <540DE1BF.4030405@gmail.com> Message-ID: <540DEEBB.5090803@googlemail.com> On 08.09.2014 19:05, Pierre-Andre Noel wrote: > > I think we could add new generators to NumPy though, > > perhaps with a keyword to control the algorithm (defaulting to the > current > > Mersenne Twister). > ... > > Here is how I propose to adapt this scheme to numpy. First, there would > be a global generator defaulting to the current implementation of > Mersene Twister. 
Calls to numpy's "RandomState", "seed", "get_state" and > "set_state" would affect this global generator. > > All numpy functions associated to random number generation (i.e., > everything listed on > http://docs.scipy.org/doc/numpy/reference/routines.random.html except > for "RandomState", "seed", "get_state" and "set_state") would accept the > kwarg "generator", which defaults to the global generator (by default > the current Mersene Twister). > I don't think every function would need a generator argument, for real world applications it should be sufficient to have the state object carry which generator is used and maybe a switch to change the global one. But as already mentioned by Robert, we know what we can do, what is missing is someone writting the code. From robert.kern at gmail.com Mon Sep 8 14:43:34 2014 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Sep 2014 19:43:34 +0100 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: <540DE1BF.4030405@gmail.com> References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> <540DE1BF.4030405@gmail.com> Message-ID: On Mon, Sep 8, 2014 at 6:05 PM, Pierre-Andre Noel wrote: > > I think we could add new generators to NumPy though, > > perhaps with a keyword to control the algorithm (defaulting to the > current > > Mersenne Twister). > > Why not do something like the C++11 ? In , a "generator" > is the engine producing randomness, and a "distribution" decides what is > the type of outputs that you want. Here is the example on > http://www.cplusplus.com/reference/random/ . > > std::default_random_engine generator; > std::uniform_int_distribution distribution(1,6); > int dice_roll = distribution(generator); // generates number in > the range 1..6 > > For convenience, you can bind the generator with the distribution (still > from the web page above). > > auto dice = std::bind(distribution, generator); > int wisdom = dice()+dice()+dice(); > > Here is how I propose to adapt this scheme to numpy. First, there would > be a global generator defaulting to the current implementation of > Mersene Twister. Calls to numpy's "RandomState", "seed", "get_state" and > "set_state" would affect this global generator. > > All numpy functions associated to random number generation (i.e., > everything listed on > http://docs.scipy.org/doc/numpy/reference/routines.random.html except > for "RandomState", "seed", "get_state" and "set_state") would accept the > kwarg "generator", which defaults to the global generator (by default > the current Mersene Twister). > > Now there could be other generator objects: the new Mersene Twister, > some lightweight-but-worse generator, or some cryptographically-safe > random generator. Each such generator would have "RandomState", "seed", > "get_state" and "set_state" methods (except perhaps the > criptographically-safe ones). When calling a numpy function with > generator=my_generator, that function uses this generator instead the > global one. Moreover, there would be be a function, say > select_default_random_engine(generator), which changes the global > generator to a user-specified one. I think the Python standard library's example is more instructive. We have new classes for each new core uniform generator. They will share a common superclass to share the implementation of the non-uniform distributions. 
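Roughly, as a sketch (nothing below exists in numpy today; the SFMT name is the hypothetical variant under discussion):

    import numpy as np

    # Today: one engine class, with the distributions as methods.
    rs = np.random.RandomState(12345)
    x = rs.standard_normal(5)

    # Proposed shape of things, names purely illustrative:
    #
    #   sfmt = np.random.SFMT(12345)    # new core uniform generator
    #   y = sfmt.standard_normal(5)     # same distribution methods,
    #                                   # inherited from the superclass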
numpy.random.RandomState will continue to be the current Mersenne Twister implementation, and so will the underlying global RandomState for all of the convenience functions in numpy.random. If you want the SFMT variant, you instantiate numpy.random.SFMT() and call its methods directly. -- Robert Kern From joseph.martinot-lagarde at m4x.org Mon Sep 8 16:39:49 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Mon, 08 Sep 2014 22:39:49 +0200 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: <540E1415.5040600@m4x.org> Le 08/09/2014 16:41, Sturla Molden a ?crit : > Stefan Otte wrote: > >> stack([[a, b], [c, d]]) >> >> In my case `stack` replaced `hstack` and `vstack` almost completely. >> >> If you're interested in including it in numpy I created a pull request >> [1]. I'm looking forward to getting some feedback! > > As far as I can see, it uses hstack and vstack. But that means a and b have > to have the same number of rows, c and d must have the same rumber of rows, > and hstack((a,b)) and hstack((c,d)) must have the same number of columns. > > Thus it requires a regularity like this: > > AAAABB > AAAABB > CCCDDD > CCCDDD > CCCDDD > CCCDDD > > What if we just ignore this constraint, and only require the output to be > rectangular? Now we have a 'tetris game': > > AAAABB > AAAABB > CCCCBB > CCCCBB > CCCCDD > CCCCDD > > or > > AAAABB > AAAABB > CCCCBB > CCCCBB > CCCCBB > CCCCBB stack([stack([[a], [c]]), b]) > > This should be 'stackable', yes? Or perhaps we need another stacking > function for this, say numpy.tetris? The function should be implemented for its name only ! I like it ! > > And while we're at it, what about higher dimensions? should there be an > ndstack function too? > > > Sturla > From joseph.martinot-lagarde at m4x.org Mon Sep 8 16:40:39 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Mon, 08 Sep 2014 22:40:39 +0200 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: Message-ID: <540E1447.4040009@m4x.org> Le 08/09/2014 15:29, Stefan Otte a ?crit : > Hey, > > quite often I work with block matrices. Matlab offers the convenient notation > > [ a b; c d ] > > to stack matrices. The numpy equivalent is kinda clumsy: > > vstack([hstack([a,b]), hstack([c,d])]) > > I wrote the little function `stack` that does exactly that: > > stack([[a, b], [c, d]]) > > In my case `stack` replaced `hstack` and `vstack` almost completely. > > If you're interested in including it in numpy I created a pull request > [1]. I'm looking forward to getting some feedback! > > > Best, > Stefan > > > > [1] https://github.com/numpy/numpy/pull/5057 > The outside brackets are redundant, stack([[a, b], [c, d]]) should be stack([a, b], [c, d]) From cjw at ncf.ca Mon Sep 8 18:14:24 2014 From: cjw at ncf.ca (cjw) Date: Mon, 08 Sep 2014 18:14:24 -0400 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: <540E1447.4040009@m4x.org> References: <540E1447.4040009@m4x.org> Message-ID: <540E2A40.70208@ncf.ca> On 08-Sep-14 4:40 PM, Joseph Martinot-Lagarde wrote: > Le 08/09/2014 15:29, Stefan Otte a ?crit : >> Hey, >> >> quite often I work with block matrices. 
Matlab offers the convenient notation >> >> [ a b; c d ] This would appear to be a desirable way to go. Numpy has something similar for strings. The above is neater. Colin W. >> to stack matrices. The numpy equivalent is kinda clumsy: >> >> vstack([hstack([a,b]), hstack([c,d])]) >> >> I wrote the little function `stack` that does exactly that: >> >> stack([[a, b], [c, d]]) >> >> In my case `stack` replaced `hstack` and `vstack` almost completely. >> >> If you're interested in including it in numpy I created a pull request >> [1]. I'm looking forward to getting some feedback! >> >> >> Best, >> Stefan >> >> >> >> [1] https://github.com/numpy/numpy/pull/5057 >> > The outside brackets are redundant, stack([[a, b], [c, d]]) should be > stack([a, b], [c, d]) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From stefan.otte at gmail.com Tue Sep 9 05:42:38 2014 From: stefan.otte at gmail.com (Stefan Otte) Date: Tue, 9 Sep 2014 11:42:38 +0200 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: <540E2A40.70208@ncf.ca> References: <540E1447.4040009@m4x.org> <540E2A40.70208@ncf.ca> Message-ID: Hey, @Josef, I wasn't aware of `bmat` and `np.asarray(np.bmat(....))` does basically what I want and what I'm already using. Regarding the Tetris problem: that never happened to me, but stack, as Josef pointed out, can handle that already :) I like the idea of removing the redundant square brackets: stack([[a, b], [c, d]]) --> stack([a, b], [c, d]) However, if the brackets are there there is no difference between creating a `np.array` and stacking arrays with `np.stack`. If we want to get fancy and turn this PR into something bigger (working our way up to a NP-complete problem ;)) then how about this. I sometimes have arrays that look like: AB0 0 C Where 0 is a scalar but is supposed to fill the rest of the array. Having something like 0 in there might lead to ambiguities though. What does ABC 0D0 mean? One could limit the "filler" to appear only on the left or the right: AB0 0CD But even then the shape is not completely determined. So we could require to have one row that only consists of arrays and determines the shape. Alternatively we could have a keyword parameter `shape`: stack([A, B, 0], [0, C, D], shape=(8, 8)) Colin, with `bmat` you can do what you're asking for. Directly taken from the example: >>> np.bmat('A,B; C,D') matrix([[1, 1, 2, 2], [1, 1, 2, 2], [3, 4, 7, 8], [5, 6, 9, 0]]) General question: If `bmat` already offers something like `stack` should we even bother implementing `stack`? More code leads to more bugs and maintenance work. Best, Stefan On Tue, Sep 9, 2014 at 12:14 AM, cjw wrote: > > On 08-Sep-14 4:40 PM, Joseph Martinot-Lagarde wrote: >> Le 08/09/2014 15:29, Stefan Otte a ?crit : >>> Hey, >>> >>> quite often I work with block matrices. Matlab offers the convenient notation >>> >>> [ a b; c d ] > This would appear to be a desirable way to go. > > Numpy has something similar for strings. The above is neater. > > Colin W. >>> to stack matrices. The numpy equivalent is kinda clumsy: >>> >>> vstack([hstack([a,b]), hstack([c,d])]) >>> >>> I wrote the little function `stack` that does exactly that: >>> >>> stack([[a, b], [c, d]]) >>> >>> In my case `stack` replaced `hstack` and `vstack` almost completely. >>> >>> If you're interested in including it in numpy I created a pull request >>> [1]. 
I'm looking forward to getting some feedback! >>> >>> >>> Best, >>> Stefan >>> >>> >>> >>> [1] https://github.com/numpy/numpy/pull/5057 >>> >> The outside brackets are redundant, stack([[a, b], [c, d]]) should be >> stack([a, b], [c, d]) >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Tue Sep 9 08:30:13 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 9 Sep 2014 08:30:13 -0400 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <540E1447.4040009@m4x.org> <540E2A40.70208@ncf.ca> Message-ID: On Tue, Sep 9, 2014 at 5:42 AM, Stefan Otte wrote: > Hey, > > @Josef, I wasn't aware of `bmat` and `np.asarray(np.bmat(....))` does > basically what I want and what I'm already using. > I never needed any tetris or anything similar except for the matched block version. Just to point out two more related functions scipy.sparse also has `bmat` for sparse block matrices scipy.linalg and scipy.sparse have `block_diag` to complement bmat. What I sometimes wish for is a sparse pseudo kronecker product as convenience to bmat, where the first (or the second) matrix contains (0,1) flags where the 1's specify where to put the blocks. (I'm not sure what I really mean, similar to block_diag but with a different filling pattern.) Josef > > Regarding the Tetris problem: that never happened to me, but stack, as > Josef pointed out, can handle that already :) > > I like the idea of removing the redundant square brackets: > stack([[a, b], [c, d]]) --> stack([a, b], [c, d]) > However, if the brackets are there there is no difference between > creating a `np.array` and stacking arrays with `np.stack`. > > If we want to get fancy and turn this PR into something bigger > (working our way up to a NP-complete problem ;)) then how about this. > I sometimes have arrays that look like: > AB0 > 0 C > Where 0 is a scalar but is supposed to fill the rest of the array. > Having something like 0 in there might lead to ambiguities though. What > does > ABC > 0D0 > mean? One could limit the "filler" to appear only on the left or the right: > AB0 > 0CD > But even then the shape is not completely determined. So we could > require to have one row that only consists of arrays and determines > the shape. Alternatively we could have a keyword parameter `shape`: > stack([A, B, 0], [0, C, D], shape=(8, 8)) > > Colin, with `bmat` you can do what you're asking for. Directly taken > from the example: > >>> np.bmat('A,B; C,D') > matrix([[1, 1, 2, 2], > [1, 1, 2, 2], > [3, 4, 7, 8], > [5, 6, 9, 0]]) > > > General question: If `bmat` already offers something like `stack` > should we even bother implementing `stack`? More code leads to more > bugs and maintenance work. > > > Best, > Stefan > > > > On Tue, Sep 9, 2014 at 12:14 AM, cjw wrote: > > > > On 08-Sep-14 4:40 PM, Joseph Martinot-Lagarde wrote: > >> Le 08/09/2014 15:29, Stefan Otte a ?crit : > >>> Hey, > >>> > >>> quite often I work with block matrices. Matlab offers the convenient > notation > >>> > >>> [ a b; c d ] > > This would appear to be a desirable way to go. > > > > Numpy has something similar for strings. The above is neater. > > > > Colin W. > >>> to stack matrices. 
The numpy equivalent is kinda clumsy: > >>> > >>> vstack([hstack([a,b]), hstack([c,d])]) > >>> > >>> I wrote the little function `stack` that does exactly that: > >>> > >>> stack([[a, b], [c, d]]) > >>> > >>> In my case `stack` replaced `hstack` and `vstack` almost completely. > >>> > >>> If you're interested in including it in numpy I created a pull request > >>> [1]. I'm looking forward to getting some feedback! > >>> > >>> > >>> Best, > >>> Stefan > >>> > >>> > >>> > >>> [1] https://github.com/numpy/numpy/pull/5057 > >>> > >> The outside brackets are redundant, stack([[a, b], [c, d]]) should be > >> stack([a, b], [c, d]) > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Sep 9 08:42:27 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 9 Sep 2014 08:42:27 -0400 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <540E1447.4040009@m4x.org> <540E2A40.70208@ncf.ca> Message-ID: On Tue, Sep 9, 2014 at 8:30 AM, wrote: > > > > On Tue, Sep 9, 2014 at 5:42 AM, Stefan Otte wrote: > >> Hey, >> >> @Josef, I wasn't aware of `bmat` and `np.asarray(np.bmat(....))` does >> basically what I want and what I'm already using. >> > > I never needed any tetris or anything similar except for the matched block > version. > > Just to point out two more related functions > > scipy.sparse also has `bmat` for sparse block matrices > scipy.linalg and scipy.sparse have `block_diag` to complement bmat. > > > What I sometimes wish for is a sparse pseudo kronecker product as > convenience to bmat, where the first (or the second) matrix contains (0,1) > flags where the 1's specify where to put the blocks. > (I'm not sure what I really mean, similar to block_diag but with a > different filling pattern.) > (or in analogy to kronecker, np.nonzero provides the filling pattern, but the submatrix is multiplied by the value.) (application: unbalanced repeated measures or panel data) Josef (....) > > Josef > > > >> >> Regarding the Tetris problem: that never happened to me, but stack, as >> Josef pointed out, can handle that already :) >> >> I like the idea of removing the redundant square brackets: >> stack([[a, b], [c, d]]) --> stack([a, b], [c, d]) >> However, if the brackets are there there is no difference between >> creating a `np.array` and stacking arrays with `np.stack`. >> >> If we want to get fancy and turn this PR into something bigger >> (working our way up to a NP-complete problem ;)) then how about this. >> I sometimes have arrays that look like: >> AB0 >> 0 C >> Where 0 is a scalar but is supposed to fill the rest of the array. >> Having something like 0 in there might lead to ambiguities though. What >> does >> ABC >> 0D0 >> mean? One could limit the "filler" to appear only on the left or the >> right: >> AB0 >> 0CD >> But even then the shape is not completely determined. 
So we could >> require to have one row that only consists of arrays and determines >> the shape. Alternatively we could have a keyword parameter `shape`: >> stack([A, B, 0], [0, C, D], shape=(8, 8)) >> >> Colin, with `bmat` you can do what you're asking for. Directly taken >> from the example: >> >>> np.bmat('A,B; C,D') >> matrix([[1, 1, 2, 2], >> [1, 1, 2, 2], >> [3, 4, 7, 8], >> [5, 6, 9, 0]]) >> >> >> General question: If `bmat` already offers something like `stack` >> should we even bother implementing `stack`? More code leads to more >> bugs and maintenance work. >> >> >> Best, >> Stefan >> >> >> >> On Tue, Sep 9, 2014 at 12:14 AM, cjw wrote: >> > >> > On 08-Sep-14 4:40 PM, Joseph Martinot-Lagarde wrote: >> >> Le 08/09/2014 15:29, Stefan Otte a ?crit : >> >>> Hey, >> >>> >> >>> quite often I work with block matrices. Matlab offers the convenient >> notation >> >>> >> >>> [ a b; c d ] >> > This would appear to be a desirable way to go. >> > >> > Numpy has something similar for strings. The above is neater. >> > >> > Colin W. >> >>> to stack matrices. The numpy equivalent is kinda clumsy: >> >>> >> >>> vstack([hstack([a,b]), hstack([c,d])]) >> >>> >> >>> I wrote the little function `stack` that does exactly that: >> >>> >> >>> stack([[a, b], [c, d]]) >> >>> >> >>> In my case `stack` replaced `hstack` and `vstack` almost completely. >> >>> >> >>> If you're interested in including it in numpy I created a pull request >> >>> [1]. I'm looking forward to getting some feedback! >> >>> >> >>> >> >>> Best, >> >>> Stefan >> >>> >> >>> >> >>> >> >>> [1] https://github.com/numpy/numpy/pull/5057 >> >>> >> >> The outside brackets are redundant, stack([[a, b], [c, d]]) should be >> >> stack([a, b], [c, d]) >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas_unterthiner at web.de Tue Sep 9 11:23:38 2014 From: thomas_unterthiner at web.de (Thomas Unterthiner) Date: Tue, 09 Sep 2014 17:23:38 +0200 Subject: [Numpy-discussion] numpy ignores OPT/FOPT under Python3 Message-ID: <540F1B7A.8030108@web.de> Hi! I want to use the OPT/FOPT environment viariables to set compiler flags when compiling numpy. However it seems that they get ignored under python3. Using Ubuntu 14.04 and numpy 1.9.0, I did the following: >export OPT="-march=native" >export FOPT = "-march=native" > python setup.py build # "python" executes python2.7 [...snip...] C compiler: x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -march=native -fPIC ^C obviously under python 2.7, the additional march flag works as expected. However, using python3: >export OPT="-march=native" >export FOPT = "-march=native" >python3 setup.py build [.... snip ...] C compiler: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC ^C obviously, the flags aren't used in python 3. Did I overlook something here? 
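(For reference, this is how I checked the base OPT flags distutils itself reports -- just an inspection snippet, it does not explain the mechanism:)

    >>> from distutils import sysconfig
    >>> print(sysconfig.get_config_var('OPT'))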
Do $OPT/$FOPT only work in Python 2.7 by design, is this a bug or did I miss something? Cheers Thomas From njs at pobox.com Tue Sep 9 14:08:52 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 9 Sep 2014 14:08:52 -0400 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> <540DE1BF.4030405@gmail.com> Message-ID: On 8 Sep 2014 14:43, "Robert Kern" wrote: > > On Mon, Sep 8, 2014 at 6:05 PM, Pierre-Andre Noel > wrote: > > > I think we could add new generators to NumPy though, > > > perhaps with a keyword to control the algorithm (defaulting to the > > current > > > Mersenne Twister). > > > > Why not do something like the C++11 ? In , a "generator" > > is the engine producing randomness, and a "distribution" decides what is > > the type of outputs that you want. Here is the example on > > http://www.cplusplus.com/reference/random/ . > > > > std::default_random_engine generator; > > std::uniform_int_distribution distribution(1,6); > > int dice_roll = distribution(generator); // generates number in > > the range 1..6 > > > > For convenience, you can bind the generator with the distribution (still > > from the web page above). > > > > auto dice = std::bind(distribution, generator); > > int wisdom = dice()+dice()+dice(); > > > > Here is how I propose to adapt this scheme to numpy. First, there would > > be a global generator defaulting to the current implementation of > > Mersene Twister. Calls to numpy's "RandomState", "seed", "get_state" and > > "set_state" would affect this global generator. > > > > All numpy functions associated to random number generation (i.e., > > everything listed on > > http://docs.scipy.org/doc/numpy/reference/routines.random.html except > > for "RandomState", "seed", "get_state" and "set_state") would accept the > > kwarg "generator", which defaults to the global generator (by default > > the current Mersene Twister). > > > > Now there could be other generator objects: the new Mersene Twister, > > some lightweight-but-worse generator, or some cryptographically-safe > > random generator. Each such generator would have "RandomState", "seed", > > "get_state" and "set_state" methods (except perhaps the > > criptographically-safe ones). When calling a numpy function with > > generator=my_generator, that function uses this generator instead the > > global one. Moreover, there would be be a function, say > > select_default_random_engine(generator), which changes the global > > generator to a user-specified one. > > I think the Python standard library's example is more instructive. We > have new classes for each new core uniform generator. They will share > a common superclass to share the implementation of the non-uniform > distributions. numpy.random.RandomState will continue to be the > current Mersenne Twister implementation, and so will the underlying > global RandomState for all of the convenience functions in > numpy.random. If you want the SFMT variant, you instantiate > numpy.random.SFMT() and call its methods directly. There's also another reason why generator decisions should be part of the RandomState object itself: we may want to change the distribution methods themselves over time (e.g., people have been complaining for a while that we use a suboptimal method for generating gaussian deviates), but changing these creates similar backcompat problems. 
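To make that concrete (the keyword below is hypothetical -- this is the shape of the problem, not an API proposal):

    import numpy as np

    rs = np.random.RandomState(0)
    old = rs.standard_normal(10)    # today's gaussian algorithm

    # If the default algorithm ever changed (say, to a ziggurat method),
    # reproducing 'old' would need an explicit opt-in, something like:
    #
    #   rs.standard_normal(10, method="legacy")   # hypothetical kwarg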
So we need a way to say "please give me samples using the old gaussian implementation" or whatever. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Tue Sep 9 14:30:28 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 09 Sep 2014 20:30:28 +0200 Subject: [Numpy-discussion] SFMT (faster mersenne twister) In-Reply-To: References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> <540DE1BF.4030405@gmail.com> Message-ID: On 09/09/14 20:08, Nathaniel Smith wrote: > There's also another reason why generator decisions should be part of > the RandomState object itself: we may want to change the distribution > methods themselves over time (e.g., people have been complaining for a > while that we use a suboptimal method for generating gaussian deviates), > but changing these creates similar backcompat problems. Which one should we rather use? Ziggurat? Sturla From charlesr.harris at gmail.com Tue Sep 9 15:52:19 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 9 Sep 2014 13:52:19 -0600 Subject: [Numpy-discussion] @ operator Message-ID: Hi All, I'm in the midst of implementing the '@' operator (PEP 465), and there are some behaviors that are unspecified by the PEP. 1. Should the operator accept array_like for one of the arguments? 2. Does it need to handle __numpy_ufunc__, or will __array_priority__ serve? 3. Do we want PyArray_Matmul in the numpy API? 4. Should a matmul function be supplied by the multiarray module? If 3 and 4 are wanted, should they use the __numpy_ufunc__ machinery, or will __array_priority__ serve? Note that the type number operators, __add__ and such, currently use __numpy_ufunc__ in combination with __array_priority__, this in addition to the fact that they are by default using ufuncs that do the same. I'd rather that the __*__ operators simply rely on __array_priority__. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Sep 9 16:03:23 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 9 Sep 2014 21:03:23 +0100 Subject: [Numpy-discussion] @ operator In-Reply-To: References: Message-ID: On Tue, Sep 9, 2014 at 8:52 PM, Charles R Harris wrote: > Hi All, > > I'm in the midst of implementing the '@' operator (PEP 465), and there are > some behaviors that are unspecified by the PEP. > > Should the operator accept array_like for one of the arguments? I would be mildly disappointed if it didn't. > Does it need to handle __numpy_ufunc__, or will __array_priority__ serve? Not sure (TBH, I don't remember what __numpy_ufunc__ does off-hand and don't feel bothered enough to look it up). > Do we want PyArray_Matmul in the numpy API? Probably. > Should a matmul function be supplied by the multiarray module? Yes, please. It's rules are a little different than dot()'s, so we should have a function that does it. -- Robert Kern From davidmenhur at gmail.com Wed Sep 10 03:12:39 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Wed, 10 Sep 2014 09:12:39 +0200 Subject: [Numpy-discussion] numpy ignores OPT/FOPT under Python3 In-Reply-To: <540F1B7A.8030108@web.de> References: <540F1B7A.8030108@web.de> Message-ID: On 9 September 2014 17:23, Thomas Unterthiner wrote: > > I want to use the OPT/FOPT environment viariables to set compiler flags > when compiling numpy. However it seems that they get ignored under > python3. 
Using Ubuntu 14.04 and numpy 1.9.0, I did the following: > > >export OPT="-march=native" > >export FOPT = "-march=native" > > python setup.py build # "python" executes python2.7 > [...snip...] > C compiler: x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing > -march=native -fPIC > ^C Running the same on my computer (Fedora 20, python 2.7) doesn't seem to process the flags: C compiler: -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC Double-checking: $ echo $OPT -march=native -------------- next part -------------- An HTML attachment was scrubbed... URL: From bryanv at continuum.io Wed Sep 10 09:05:08 2014 From: bryanv at continuum.io (Bryan Van de Ven) Date: Wed, 10 Sep 2014 08:05:08 -0500 Subject: [Numpy-discussion] ANN: Bokeh 0.6 release Message-ID: <88D1F25A-B655-40CF-B0A9-B7CF3B225DED@continuum.io> On behalf of the Bokeh team, I am very happy to announce the release of Bokeh version 0.6! Bokeh is a Python library for visualizing large and realtime datasets on the web. Its goal is to provide to developers (and domain experts) with capabilities to easily create novel and powerful visualizations that extract insight from local or remote (possibly large) data sets, and to easily publish those visualization to the web for others to explore and interact with. This release includes many bug fixes and improvements over our most recent 0.5.2 release: * Abstract Rendering recipes for large data sets: isocontour, heatmap * New charts in bokeh.charts: Time Series and Categorical Heatmap * Full Python 3 support for bokeh-server * Much expanded User and Dev Guides * Multiple axes and ranges capability * Plot object graph query interface * Hit-testing (hover tool support) for patch glyphs See the CHANGELOG for full details. I'd also like to announce a new Github Organization for Bokeh: https://github.com/bokeh. Currently it is home to Scala and and Julia language bindings for Bokeh, but the Bokeh project itself will be moved there before the next 0.7 release. Any implementors of new language bindings who are interested in hosting your project under this organization are encouraged to contact us. In upcoming releases, you should expect to see more new layout capabilities (colorbar axes, better grid plots and improved annotations), additional tools, even more widgets and more charts, R language bindings, Blaze integration and cloud hosting for Bokeh apps. 
Don't forget to check out the full documentation, interactive gallery, and tutorial at http://bokeh.pydata.org as well as the Bokeh IPython notebook nbviewer index (including all the tutorials) at: http://nbviewer.ipython.org/github/ContinuumIO/bokeh-notebooks/blob/master/index.ipynb If you are using Anaconda, you can install with conda: conda install bokeh Alternatively, you can install with pip: pip install bokeh BokehJS is also available by CDN for use in standalone javascript applications: http://cdn.pydata.org/bokeh-0.6.min.js http://cdn.pydata.org/bokeh-0.6.min.css Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/continuumio/bokeh Questions can be directed to the Bokeh mailing list: bokeh at continuum.io If you have interest in helping to develop Bokeh, please get involved! Thanks, Bryan Van de Ven Continuum Analytics http://continuum.io From sebastian at sipsolutions.net Wed Sep 10 11:11:49 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 10 Sep 2014 17:11:49 +0200 Subject: [Numpy-discussion] @ operator In-Reply-To: References: Message-ID: <1410361909.9047.17.camel@sebastian-t440> On Di, 2014-09-09 at 13:52 -0600, Charles R Harris wrote: > Hi All, > > > I'm in the midst of implementing the '@' operator (PEP 465), and there > are some behaviors that are unspecified by the PEP. > > 1. Should the operator accept array_like for one of the > arguments? To be in line with all the other operators, I would say yes > 1. Does it need to handle __numpy_ufunc__, or will > __array_priority__ serve? > 2. Do we want PyArray_Matmul in the numpy API? Don't care much either way, but I would say yes (why not). > 1. Should a matmul function be supplied by the multiarray module? > We could possibly put it to the linalg module. But I think we should provide such a function. > If 3 and 4 are wanted, should they use the __numpy_ufunc__ machinery, > or will __array_priority__ serve? > > Note that the type number operators, __add__ and such, currently use > __numpy_ufunc__ in combination with __array_priority__, this in > addition to the fact that they are by default using ufuncs that do the > same. I'd rather that the __*__ operators simply rely on > __array_priority__. > Hmmm, that is a difficult one. For the operators I agree, numpy_ufunc should not be necessary, since we have NotImplemented there, and the array priority does nothing except tell numpy to stop handling all array likes (i.e. other array-likes could actually say: you have an array-priority > 0 -> return NotImplemented). So yeah, why do the operators use numpy_ufunc at all? (unless due to implementation) If we have a function numpy_ufunc would probably make sense, since that circumvents the python dispatch mechanism. - Sebastian > > Thoughts? > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sturla.molden at gmail.com Wed Sep 10 11:59:28 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 10 Sep 2014 15:59:28 +0000 (UTC) Subject: [Numpy-discussion] SFMT (faster mersenne twister) References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> <540DE1BF.4030405@gmail.com> Message-ID: <917025199431961103.174047sturla.molden-gmail.com@news.gmane.org> Pierre-Andre Noel wrote: > Why not do something like the C++11 ? In , a "generator" > is the engine producing randomness, and a "distribution" decides what is > the type of outputs that you want. This is what randomkit is doing internally, which is why it is so easy to plug in a different generator. Sturla From sturla.molden at gmail.com Wed Sep 10 11:59:27 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 10 Sep 2014 15:59:27 +0000 (UTC) Subject: [Numpy-discussion] SFMT (faster mersenne twister) References: <21515.18012.951610.781484@hebb.inf.ed.ac.uk> <834530714431811490.661154sturla.molden-gmail.com@news.gmane.org> <540DE1BF.4030405@gmail.com> <540DEEBB.5090803@googlemail.com> Message-ID: <261359863431961370.090358sturla.molden-gmail.com@news.gmane.org> Julian Taylor wrote: > But as already mentioned by Robert, we know what we can do, what is > missing is someone writting the code. This is actually a part of NumPy I know in detail, so I will be able to contribute. Robert Kern's last post about objects like np.random.SFMT() working similar to RandomState should be doable and not break any backwards compatibility. Sturla From pav at iki.fi Wed Sep 10 12:52:16 2014 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 10 Sep 2014 19:52:16 +0300 Subject: [Numpy-discussion] @ operator In-Reply-To: References: Message-ID: 09.09.2014, 22:52, Charles R Harris kirjoitti: > 1. Should the operator accept array_like for one of the arguments? > 2. Does it need to handle __numpy_ufunc__, or will > __array_priority__ serve? I think the __matmul__ operator implementation should follow that of __mul__. [clip] > 3. Do we want PyArray_Matmul in the numpy API? > 4. Should a matmul function be supplied by the multiarray module? > > If 3 and 4 are wanted, should they use the __numpy_ufunc__ machinery, or > will __array_priority__ serve? dot() function deals with __numpy_ufunc__, and the matmul() function should behave similarly. It seems dot() uses __array_priority__ for selection of output return subclass, so matmul() probably needs do the same thing. > Note that the type number operators, __add__ and such, currently use > __numpy_ufunc__ in combination with __array_priority__, this in addition to > the fact that they are by default using ufuncs that do the same. I'd rather > that the __*__ operators simply rely on __array_priority__. The whole business of __array_priority__ and __numpy_ufunc__ in the binary ops is solely about when ____ should yield the execution to __r__ of the other object. The rule of operation currently is: "__rmul__ before __numpy_ufunc__" If you remove the __numpy_ufunc__ handling, it becomes: "__rmul__ before __numpy_ufunc__, except if array_priority happens to be smaller than that of the other class and your class is not an ndarray subclass". 
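As a toy illustration of that yielding rule (the class is made up; __array_priority__ itself is the real attribute):

    import numpy as np

    class Operand(object):
        # A priority above ndarray's asks ndarray's binop to return
        # NotImplemented, so Python falls back to our __rmul__.
        __array_priority__ = 1000.0
        def __rmul__(self, other):
            return "Operand.__rmul__ ran"

    print(np.arange(3) * Operand())   # -> Operand.__rmul__ ran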
The following binops also do not IIRC respect __array_priority__ in preferring right-hand operand: - in-place operations - comparisons One question here is whether it's possible to change the behavior of __array_priority__ here at all, or whether changes are possible only in the context of adding new attributes telling Numpy what to do. -- Pauli Virtanen From charlesr.harris at gmail.com Wed Sep 10 16:53:22 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Sep 2014 14:53:22 -0600 Subject: [Numpy-discussion] @ operator In-Reply-To: References: Message-ID: On Wed, Sep 10, 2014 at 10:52 AM, Pauli Virtanen wrote: > 09.09.2014, 22:52, Charles R Harris kirjoitti: > > 1. Should the operator accept array_like for one of the arguments? > > 2. Does it need to handle __numpy_ufunc__, or will > > __array_priority__ serve? > > I think the __matmul__ operator implementation should follow that of > __mul__. > > [clip] > > 3. Do we want PyArray_Matmul in the numpy API? > > 4. Should a matmul function be supplied by the multiarray module? > > > > If 3 and 4 are wanted, should they use the __numpy_ufunc__ machinery, or > > will __array_priority__ serve? > > dot() function deals with __numpy_ufunc__, and the matmul() function > should behave similarly. > > It seems dot() uses __array_priority__ for selection of output return > subclass, so matmul() probably needs do the same thing. > > > Note that the type number operators, __add__ and such, currently use > > __numpy_ufunc__ in combination with __array_priority__, this in addition > to > > the fact that they are by default using ufuncs that do the same. I'd > rather > > that the __*__ operators simply rely on __array_priority__. > > The whole business of __array_priority__ and __numpy_ufunc__ in the > binary ops is solely about when ____ should yield the execution to > __r__ of the other object. > > The rule of operation currently is: "__rmul__ before __numpy_ufunc__" > > If you remove the __numpy_ufunc__ handling, it becomes: "__rmul__ before > __numpy_ufunc__, except if array_priority happens to be smaller than > that of the other class and your class is not an ndarray subclass". > > The following binops also do not IIRC respect __array_priority__ in > preferring right-hand operand: > > - in-place operations > - comparisons > > One question here is whether it's possible to change the behavior of > __array_priority__ here at all, or whether changes are possible only in > the context of adding new attributes telling Numpy what to do. > > I was tempted to make it a generalized ufunc, which would take care of a lot of things, but there is a lot of overhead in those functions. Sounds like the easiest thing is to make it similar to dot, although having an inplace versions complicates the type selection a bit. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Sep 10 16:55:20 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 10 Sep 2014 14:55:20 -0600 Subject: [Numpy-discussion] @ operator In-Reply-To: References: Message-ID: On Wed, Sep 10, 2014 at 2:53 PM, Charles R Harris wrote: > > > On Wed, Sep 10, 2014 at 10:52 AM, Pauli Virtanen wrote: > >> 09.09.2014, 22:52, Charles R Harris kirjoitti: >> > 1. Should the operator accept array_like for one of the arguments? >> > 2. Does it need to handle __numpy_ufunc__, or will >> > __array_priority__ serve? >> >> I think the __matmul__ operator implementation should follow that of >> __mul__. 
>> >> [clip] >> > 3. Do we want PyArray_Matmul in the numpy API? >> > 4. Should a matmul function be supplied by the multiarray module? >> > >> > If 3 and 4 are wanted, should they use the __numpy_ufunc__ machinery, or >> > will __array_priority__ serve? >> >> dot() function deals with __numpy_ufunc__, and the matmul() function >> should behave similarly. >> >> It seems dot() uses __array_priority__ for selection of output return >> subclass, so matmul() probably needs do the same thing. >> >> > Note that the type number operators, __add__ and such, currently use >> > __numpy_ufunc__ in combination with __array_priority__, this in >> addition to >> > the fact that they are by default using ufuncs that do the same. I'd >> rather >> > that the __*__ operators simply rely on __array_priority__. >> >> The whole business of __array_priority__ and __numpy_ufunc__ in the >> binary ops is solely about when ____ should yield the execution to >> __r__ of the other object. >> >> The rule of operation currently is: "__rmul__ before __numpy_ufunc__" >> >> If you remove the __numpy_ufunc__ handling, it becomes: "__rmul__ before >> __numpy_ufunc__, except if array_priority happens to be smaller than >> that of the other class and your class is not an ndarray subclass". >> >> The following binops also do not IIRC respect __array_priority__ in >> preferring right-hand operand: >> >> - in-place operations >> - comparisons >> >> One question here is whether it's possible to change the behavior of >> __array_priority__ here at all, or whether changes are possible only in >> the context of adding new attributes telling Numpy what to do. >> >> > I was tempted to make it a generalized ufunc, which would take care of a > lot of things, but there is a lot of overhead in those functions. Sounds > like the easiest thing is to make it similar to dot, although having an > inplace versions complicates the type selection a bit. > Note also that the dot cblas versions are not generally blocked, so the size of the arrays is limited (and not checked). Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Wed Sep 10 22:18:24 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 11 Sep 2014 02:18:24 +0000 (UTC) Subject: [Numpy-discussion] @ operator References: Message-ID: <1927663638432094396.879571sturla.molden-gmail.com@news.gmane.org> Charles R Harris wrote: > Note also that the dot cblas versions are not generally blocked, so the > size of the arrays is limited (and not checked). But it is possible to create a blocked dot function with the current cblas, even though they use C int for array dimensions. It would just further increase the complexity of dot (as if it's not bad enough already...) Sturla From njs at pobox.com Thu Sep 11 00:08:49 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 11 Sep 2014 00:08:49 -0400 Subject: [Numpy-discussion] @ operator In-Reply-To: References: Message-ID: On Wed, Sep 10, 2014 at 4:53 PM, Charles R Harris wrote: > > On Wed, Sep 10, 2014 at 10:52 AM, Pauli Virtanen wrote: >> >> 09.09.2014, 22:52, Charles R Harris kirjoitti: >> > 1. Should the operator accept array_like for one of the arguments? >> > 2. Does it need to handle __numpy_ufunc__, or will >> > __array_priority__ serve? >> >> I think the __matmul__ operator implementation should follow that of >> __mul__. >> >> [clip] >> > 3. Do we want PyArray_Matmul in the numpy API? >> > 4. Should a matmul function be supplied by the multiarray module? 
>> > >> > If 3 and 4 are wanted, should they use the __numpy_ufunc__ machinery, or >> > will __array_priority__ serve? >> >> dot() function deals with __numpy_ufunc__, and the matmul() function >> should behave similarly. >> >> It seems dot() uses __array_priority__ for selection of output return >> subclass, so matmul() probably needs do the same thing. >> >> > Note that the type number operators, __add__ and such, currently use >> > __numpy_ufunc__ in combination with __array_priority__, this in addition >> > to >> > the fact that they are by default using ufuncs that do the same. I'd >> > rather >> > that the __*__ operators simply rely on __array_priority__. >> >> The whole business of __array_priority__ and __numpy_ufunc__ in the >> binary ops is solely about when ____ should yield the execution to >> __r__ of the other object. >> >> The rule of operation currently is: "__rmul__ before __numpy_ufunc__" >> >> If you remove the __numpy_ufunc__ handling, it becomes: "__rmul__ before >> __numpy_ufunc__, except if array_priority happens to be smaller than >> that of the other class and your class is not an ndarray subclass". >> >> The following binops also do not IIRC respect __array_priority__ in >> preferring right-hand operand: >> >> - in-place operations >> - comparisons >> >> One question here is whether it's possible to change the behavior of >> __array_priority__ here at all, or whether changes are possible only in >> the context of adding new attributes telling Numpy what to do. > > I was tempted to make it a generalized ufunc, which would take care of a lot > of things, but there is a lot of overhead in those functions. Sounds like > the easiest thing is to make it similar to dot, although having an inplace > versions complicates the type selection a bit. Can we please fix the overhead instead of adding more half-complete implementations of the same concepts? I feel like this usually ends up slowing things down in the end, as optimization efforts get divided... My vote is: __matmul__/__rmatmul__ do the standard dispatch stuff that all __op__ methods do (so I guess check __array_priority__ or whatever it is we always do). I'd also be okay with ignoring __array_priority__ on the grounds that __numpy_ufunc__ is better, and there's no existing code relying on __array_priority__ support in __matmul__. Having decided that we are actually going to run, they dispatch unconditionally to np.newdot(a, b) (or newdot(a, b, out=a) for the in-place version), similarly to how e.g. __add__ dispatches to np.add. newdot acts like a standard gufunc with all the standard niceties, including __numpy_ufunc__ dispatch. ("newdot" here is intended as a placeholder name, maybe it should be np.linalg.matmul or something else to be bikeshed later. I also vote that eventually 'dot' become an alias for this function, but whether to do that is an orthogonal discussion for later.) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ndbecker2 at gmail.com Thu Sep 11 10:27:51 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 11 Sep 2014 10:27:51 -0400 Subject: [Numpy-discussion] why does u.resize return None? 
Message-ID: It would be useful if u.resize returned the new array, so it could be used for chaining operations -- -- Those who don't understand recursion are doomed to repeat it From hoogendoorn.eelco at gmail.com Thu Sep 11 10:45:53 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 11 Sep 2014 16:45:53 +0200 Subject: [Numpy-discussion] why does u.resize return None? In-Reply-To: References: Message-ID: agreed; I never saw the logic in returning none either. On Thu, Sep 11, 2014 at 4:27 PM, Neal Becker wrote: > It would be useful if u.resize returned the new array, so it could be used > for > chaining operations > > -- > -- Those who don't understand recursion are doomed to repeat it > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Sep 11 10:59:28 2014 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 11 Sep 2014 15:59:28 +0100 Subject: [Numpy-discussion] why does u.resize return None? In-Reply-To: References: Message-ID: On Thu, Sep 11, 2014 at 3:27 PM, Neal Becker wrote: > It would be useful if u.resize returned the new array, so it could be used for > chaining operations Same reason why list.sort() and friends return None. https://docs.python.org/2/faq/design.html#why-doesn-t-list-sort-return-the-sorted-list -- Robert Kern From ndbecker2 at gmail.com Thu Sep 11 10:56:02 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 11 Sep 2014 10:56:02 -0400 Subject: [Numpy-discussion] why does u.resize return None? References: Message-ID: https://github.com/numpy/numpy/issues/5064 Eelco Hoogendoorn wrote: > agreed; I never saw the logic in returning none either. > > On Thu, Sep 11, 2014 at 4:27 PM, Neal Becker wrote: > >> It would be useful if u.resize returned the new array, so it could be used >> for >> chaining operations >> >> -- >> -- Those who don't understand recursion are doomed to repeat it >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> -- -- Those who don't understand recursion are doomed to repeat it From jhaiduce at gmail.com Thu Sep 11 11:46:35 2014 From: jhaiduce at gmail.com (John Haiducek) Date: Thu, 11 Sep 2014 11:46:35 -0400 Subject: [Numpy-discussion] How to get docs for functions processed by numpy.vectorize? In-Reply-To: References: Message-ID: <5411C3DB.8020501@gmail.com> When I apply numpy.vectorize() to a function, documentation tools behave inconsistently with regard to the new, vectorized function. The function's __doc__ attribute does contain the docstring of the original function as expected, but the built-in help() command displays the documentation of the numpy.vectorize class, and sphinx-autodoc fails to display the function at all. Is there a way to get sphinx-autodoc and the built-in help() command to display the docstring of the function and not something else? For instance: >>> import numpy as np >>> def myfunc(x): ... "Square x" ... return x**2 ... 
>>> myfunc=np.vectorize(myfunc)
>>> print myfunc.__doc__
Square x
>>> help(myfunc)
(displays documentation of np.vectorize)

From charlesr.harris at gmail.com  Thu Sep 11 12:10:50 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 11 Sep 2014 10:10:50 -0600
Subject: [Numpy-discussion] @ operator
In-Reply-To: 
References: 
Message-ID: 

On Wed, Sep 10, 2014 at 10:08 PM, Nathaniel Smith wrote:

> On Wed, Sep 10, 2014 at 4:53 PM, Charles R Harris
> wrote:
>
> [clip]
>
> Can we please fix the overhead instead of adding more half-complete
> implementations of the same concepts? I feel like this usually ends up
> slowing things down in the end, as optimization efforts get divided...
>
> My vote is:
>
> [clip]
>
> ("newdot" here is intended as a placeholder name, maybe it should be
> np.linalg.matmul or something else to be bikeshed later. I also vote
> that eventually 'dot' become an alias for this function, but whether
> to do that is an orthogonal discussion for later.)
>

If we went the ufunc route, I think we would want three of them, matxvec,
vecxmat, and matxmat, because the best inner loops would be different in
the three cases. They couldn't be straight ufuncs themselves, as we don't
need the other options, `reduce`, etc., but they can't be exactly like the
linalg machinery either, because we do want subclasses to be able to
override. Hmm...

The ufunc machinery has some funky aspects. For instance, there are
hardwired checks for `__radd__` and other such operators in
PyUFunc_GenericFunction that allow subclasses to override the ufunc. Those
options should really be part of the PyUFuncObject.

Chuck

From charlesr.harris at gmail.com  Thu Sep 11 21:47:15 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 11 Sep 2014 19:47:15 -0600
Subject: [Numpy-discussion] @ operator
In-Reply-To: 
References: 
Message-ID: 

On Thu, Sep 11, 2014 at 10:10 AM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

> On Wed, Sep 10, 2014 at 10:08 PM, Nathaniel Smith wrote:
>
>> [clip]
>> >> The following binops also do not IIRC respect __array_priority__ in
>> >> preferring the right-hand operand:
>> >>
>> >> - in-place operations
>> >> - comparisons
>>
>> [clip]
>
> The ufunc machinery has some funky aspects. For instance, there are
> hardwired checks for `__radd__` and other such operators in
> PyUFunc_GenericFunction that allow subclasses to override the ufunc. Those
> options should really be part of the PyUFuncObject.
>

Here are some timing results for the inner product of an integer array
comparing inner, ufunc (inner1d), and einsum implementations. The np.long
type was chosen because it is implemented for inner1d and to avoid the
effect of using cblas.

In [43]: a = ones(100000, dtype=np.long)

In [44]: timeit inner(a,a)
10000 loops, best of 3: 55.5 µs per loop

In [45]: timeit inner1d(a,a)
10000 loops, best of 3: 56.2 µs per loop

In [46]: timeit einsum('...i,...i',a,a)
10000 loops, best of 3: 43.8 µs per loop

In [47]: a = ones(1, dtype=np.long)

In [48]: timeit inner(a,a)
1000000 loops, best of 3: 484 ns per loop

In [49]: timeit inner1d(a,a)
1000000 loops, best of 3: 741 ns per loop

In [50]: timeit einsum('...i,...i',a,a)
1000000 loops, best of 3: 811 ns per loop

For big arrays, einsum has a better inner loop. For small arrays, einsum
and inner1d suffer from call overhead.
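A rough stdlib equivalent of the benchmark above, for anyone who wants to
reproduce the numbers outside IPython (a sketch: it assumes inner1d is
importable from the semi-private numpy.core.umath_tests module, and the
repeat/number counts are arbitrary):

import timeit

setup = """
import numpy as np
from numpy.core.umath_tests import inner1d
a = np.ones(100000, dtype=np.long)
"""

for stmt in ["np.inner(a, a)",
             "inner1d(a, a)",
             "np.einsum('...i,...i', a, a)"]:
    # best of 3 repeats, 10000 calls each, reported in microseconds per call
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=10000))
    print("%-30s %6.1f us per loop" % (stmt, best / 10000 * 1e6))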
Note that the einsum overhead could be improved by special casing, and that
the various loops could be consolidated into best of breed.

The ufunc implementation is easy to do for all cases.

Chuck

From njs at pobox.com  Thu Sep 11 22:01:14 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 11 Sep 2014 22:01:14 -0400
Subject: [Numpy-discussion] @ operator
In-Reply-To: 
References: 
Message-ID: 

On Thu, Sep 11, 2014 at 12:10 PM, Charles R Harris wrote:
>
> On Wed, Sep 10, 2014 at 10:08 PM, Nathaniel Smith wrote:
>>
>> [clip]
>>
> If we went the ufunc route, I think we would want three of them, matxvec,
> vecxmat, and matxmat, because the best inner loops would be different in
> the three cases,

Couldn't we write a single inner loop like:

void ufunc_loop(blah blah) {
    if (arg1_shape[0] == 1 && arg2_shape[1] == 1) {
        call DOT
    } else if (arg2_shape[0] == 1) {
        call GEMV
    } else if (...) {
        ...
    } else {
        call GEMM
    }
}

?

I can't see any reason that this would be measurably slower than
having multiple ufunc loops: the checks are extremely cheap, will
usually only be done once (because most @ calls will only enter the
inner loop once), and if they are done repeatedly in the same call
then they'll come out the same way every time and thus be predictable
branches.

> but they couldn't be straight ufuncs themselves, as we don't
> need the other options, `reduce`, etc.,

Not sure what you mean -- ATM gufuncs don't support reduce anyway, but
if someone felt like implementing it then it would be cool to get
dot.reduce for free -- it is a useful/meaningful operation. Is there
some reason that supporting more ufunc features is bad?

> but they can't be exactly like the
> linalg machinery, because we do want subclasses to be able to override.

Subclasses can override ufuncs using __numpy_ufunc__.

> Hmm...
>
> The ufunc machinery has some funky aspects. For instance, there are
> hardwired checks for `__radd__` and other such operators in
> PyUFunc_GenericFunction that allow subclasses to override the ufunc.
> Those options should really be part of the PyUFuncObject.

Are you mentioning this b/c it's an annoying thing that talking about
ufuncs reminded you of, or is there a specific impact on __matmul__
that you see?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From charlesr.harris at gmail.com  Thu Sep 11 22:03:03 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 11 Sep 2014 20:03:03 -0600
Subject: [Numpy-discussion] @ operator
In-Reply-To: 
References: 
Message-ID: 

On Thu, Sep 11, 2014 at 7:47 PM, Charles R Harris wrote:

> On Thu, Sep 11, 2014 at 10:10 AM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
> [clip]
>
> Here are some timing results for the inner product of an integer array
> comparing inner, ufunc (inner1d), and einsum implementations.
>
> [clip]
>
> For big arrays, einsum has a better inner loop. For small arrays, einsum
> and inner1d suffer from call overhead. Note that the einsum overhead could
> be improved by special casing, and that the various loops could be
> consolidated into best of breed.
>
> The ufunc implementation is easy to do for all cases.
>

Same thing for doubles. The speedup due to cblas can be seen, and the
iterator overhead becomes more visible.
In [51]: a = ones(100000, dtype=np.double)

In [52]: timeit inner(a,a)
10000 loops, best of 3: 32.1 µs per loop

In [53]: timeit inner1d(a,a)
10000 loops, best of 3: 134 µs per loop

In [54]: timeit einsum('...i,...i',a,a)
10000 loops, best of 3: 42 µs per loop

In [55]: a = ones(1, dtype=np.double)

In [56]: timeit inner(a,a)
1000000 loops, best of 3: 336 ns per loop

In [57]: timeit inner1d(a,a)
1000000 loops, best of 3: 742 ns per loop

In [58]: timeit einsum('...i,...i',a,a)
1000000 loops, best of 3: 809 ns per loop

But stacked vectors make a difference:

In [59]: a = ones((2, 1), dtype=np.double)

In [60]: timeit for i in range(2): inner(a[i],a[i])
1000000 loops, best of 3: 1.39 µs per loop

In [61]: timeit inner1d(a,a)
1000000 loops, best of 3: 749 ns per loop

In [62]: timeit einsum('...i,...i',a,a)
1000000 loops, best of 3: 888 ns per loop

Chuck

From charlesr.harris at gmail.com  Thu Sep 11 22:12:43 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 11 Sep 2014 20:12:43 -0600
Subject: [Numpy-discussion] @ operator
In-Reply-To: 
References: 
Message-ID: 

On Thu, Sep 11, 2014 at 8:01 PM, Nathaniel Smith wrote:

> On Thu, Sep 11, 2014 at 12:10 PM, Charles R Harris
> wrote:
>
> [clip]
>
> Couldn't we write a single inner loop like:
>
> void ufunc_loop(blah blah) {
>     [clip]
> }
> ?

Not for generalized ufuncs, different signatures, or if linearized, more
info on dimensions. What you show is essentially what dot does now for
cblas enabled functions. But note, we need more than the simple '@', we
also need stacks of vectors, and turning vectors into matrices, and then
back into vectors seems unnecessarily complicated.

> I can't see any reason that this would be measurably slower than
> having multiple ufunc loops: the checks are extremely cheap, will
> usually only be done once (because most @ calls will only enter the
> inner loop once), and if they are done repeatedly in the same call
> then they'll come out the same way every time and thus be predictable
> branches.

It's not the loops that are expensive.
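The fixed per-call cost in question is easy to see directly: time the same
call on a length-1 and a length-100000 array and separate the two
components. A rough sketch with the stdlib timeit module (the array sizes
and loop counts are arbitrary):

import timeit

setup = "import numpy as np; a1 = np.ones(1); aN = np.ones(100000)"
# seconds per call at n=1 (essentially pure call overhead) and n=100000
t1 = min(timeit.repeat("np.inner(a1, a1)", setup=setup,
                       repeat=3, number=100000)) / 100000
tN = min(timeit.repeat("np.inner(aN, aN)", setup=setup,
                       repeat=3, number=1000)) / 1000
print("fixed per-call overhead: ~%.0f ns" % (t1 * 1e9))
print("cost per element at n=100000: ~%.2f ns" % ((tN - t1) / 100000 * 1e9))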
> > but they couldn't be straight ufuncs themselves, as we don't
> > need the other options, `reduce`, etc.,
>
> Not sure what you mean -- ATM gufuncs don't support reduce anyway, but
> if someone felt like implementing it then it would be cool to get
> dot.reduce for free -- it is a useful/meaningful operation. Is there
> some reason that supporting more ufunc features is bad?
>
> > but they can't be exactly like the
> > linalg machinery, because we do want subclasses to be able to override.
>
> Subclasses can override ufuncs using __numpy_ufunc__.

Yep.

> > Hmm...
> >
> > The ufunc machinery has some funky aspects. For instance, there are
> > hardwired checks for `__radd__` and other such operators in
> > PyUFunc_GenericFunction that allow subclasses to override the ufunc.
> > Those options should really be part of the PyUFuncObject.
>
> Are you mentioning this b/c it's an annoying thing that talking about
> ufuncs reminded you of, or is there a specific impact on __matmul__
> that you see?

Just because it seems out of place. numpy_ufunc is definitely a better way
to do this.

Chuck

From njs at pobox.com  Thu Sep 11 22:49:21 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 11 Sep 2014 22:49:21 -0400
Subject: [Numpy-discussion] @ operator
In-Reply-To: 
References: 
Message-ID: 

On Thu, Sep 11, 2014 at 10:12 PM, Charles R Harris wrote:
>
> On Thu, Sep 11, 2014 at 8:01 PM, Nathaniel Smith wrote:
>>
>> [clip]
>
> Not for generalized ufuncs, different signatures, or if linearized, more
> info on dimensions.

This sentence no verb, but I think the point you might be raising is:
we don't actually have the technical capability to define a single
gufunc for @, because the matmat, matvec, vecmat, and vecvec forms
have different gufunc signatures ("mn,nk->mk", "mn,n->m", "n,nk->k",
and "n,n->" respectively, I think)?
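For concreteness, the four cases can already be spelled out with einsum,
which broadcasts over any leading "stack" dimensions; a rough illustration
of the signatures above (not the proposed API):

import numpy as np

A = np.ones((10, 3, 4))                          # stack of 3x4 matrices
B = np.ones((10, 4, 5))                          # stack of 4x5 matrices
v = np.ones((10, 4))                             # stack of 4-vectors
w = np.ones((10, 3))                             # stack of 3-vectors

matmat = np.einsum('...mn,...nk->...mk', A, B)   # "mn,nk->mk", shape (10, 3, 5)
matvec = np.einsum('...mn,...n->...m', A, v)     # "mn,n->m",   shape (10, 3)
vecmat = np.einsum('...n,...nk->...k', w, A)     # "n,nk->k",   shape (10, 4)
vecvec = np.einsum('...n,...n->...', v, v)       # "n,n->",     shape (10,)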
This is true, but Jaime has said he's willing to look at fixing this :-): http://thread.gmane.org/gmane.comp.python.numeric.general/58669/focus=58670 ...and fundamentally, it's very difficult to solve this anywhere else except the ufunc internals. When we first enter a function like newdot, we need to check for special overloads like __numpy_ufunc__ *before* we do any array_like->ndarray coercion. But we have to do array_like->ndarray coercion before we know what the shape/ndim of the our inputs is. And in the wrapper-around-multiple-ufuncs approach, we have to check the shape/ndim before we choose which ufunc to dispatch to. So __numpy_ufunc__ won't get checked until after we've already coerced to ndarray, oops... OTOH the ufunc internals already know how to do this dance, so it's at least straightforward (if not necessarily trivial) to insert the correct logic once in the correct place. > What you show is essentially what dot does now for cblas > enabled functions. But note, we need more than the simple '@', we also need > stacks of vectors, and turning vectors into matrices, and then back into > vectors seems unnecessarily complicated. By the time we get into the data/shape/strides world that ufuncs work with, converting vectors->matrices is literally just adding a single entry to the shape+strides arrays. This feels trivial and cheap to me? Or do you just mean that we do actually want broadcasting matvec, vecmat, vecvec gufuncs? I agree with this too but it seems orthogonal to me -- we can have those in any case, even if newdot doesn't literally call them. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Thu Sep 11 23:18:02 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 11 Sep 2014 21:18:02 -0600 Subject: [Numpy-discussion] @ operator In-Reply-To: References: Message-ID: On Thu, Sep 11, 2014 at 8:49 PM, Nathaniel Smith wrote: > On Thu, Sep 11, 2014 at 10:12 PM, Charles R Harris > wrote: > > > > On Thu, Sep 11, 2014 at 8:01 PM, Nathaniel Smith wrote: > >> > >> On Thu, Sep 11, 2014 at 12:10 PM, Charles R Harris > >> wrote: > >> > > >> > On Wed, Sep 10, 2014 at 10:08 PM, Nathaniel Smith > wrote: > >> >> > >> >> My vote is: > >> >> > >> >> __matmul__/__rmatmul__ do the standard dispatch stuff that all __op__ > >> >> methods do (so I guess check __array_priority__ or whatever it is we > >> >> always do). I'd also be okay with ignoring __array_priority__ on the > >> >> grounds that __numpy_ufunc__ is better, and there's no existing code > >> >> relying on __array_priority__ support in __matmul__. > >> >> > >> >> Having decided that we are actually going to run, they dispatch > >> >> unconditionally to np.newdot(a, b) (or newdot(a, b, out=a) for the > >> >> in-place version), similarly to how e.g. __add__ dispatches to > np.add. > >> >> > >> >> newdot acts like a standard gufunc with all the standard niceties, > >> >> including __numpy_ufunc__ dispatch. > >> >> > >> >> ("newdot" here is intended as a placeholder name, maybe it should be > >> >> np.linalg.matmul or something else to be bikeshed later. I also vote > >> >> that eventually 'dot' become an alias for this function, but whether > >> >> to do that is an orthogonal discussion for later.) 
> >> >> > >> > If we went the ufunc route, I think we would want three of them, > >> > matxvec, > >> > vecxmat, and matxmat, because the best inner loops would be different > in > >> > the > >> > three cases, > >> > >> Couldn't we write a single inner loop like: > >> > >> void ufunc_loop(blah blah) { > >> if (arg1_shape[0] == 1 && arg2_shape[1] == 1) { > >> call DOT > >> } else if (arg2_shape[0] == 1) { > >> call GEMV > >> } else if (...) { > >> ... > >> } else { > >> call GEMM > >> } > >> } > >> ? > > > > Not for generalized ufuncs, different signatures, or if linearized, more > > info on dimensions. > > This sentence no verb, but I think the point you might be raising is: > we don't actually have the technical capability to define a single > gufunc for @, because the matmat, matvec, vecmat, and vecvec forms > have different gufunc signatures ("mn,nk->mk", "mn,n->m", "n,nk->k", > and "n,n->" respectively, I think)? > > This is true, but Jaime has said he's willing to look at fixing this :-): > > http://thread.gmane.org/gmane.comp.python.numeric.general/58669/focus=58670 > > Don't see the need here, the loops are not complicated. > ...and fundamentally, it's very difficult to solve this anywhere else > except the ufunc internals. When we first enter a function like > newdot, we need to check for special overloads like __numpy_ufunc__ > *before* we do any array_like->ndarray coercion. But we have to do > array_like->ndarray coercion before we know what the shape/ndim of the > our inputs is. True. > And in the wrapper-around-multiple-ufuncs approach, we > have to check the shape/ndim before we choose which ufunc to dispatch > to. True. I don't see a problem there and the four ufuncs would be useful in themselves. I think they should be part of the multiarray module methods if we go that way. So __numpy_ufunc__ won't get checked until after we've already > coerced to ndarray, oops... OTOH the ufunc internals already know how > to do this dance, so it's at least straightforward (if not necessarily > trivial) to insert the correct logic once in the correct place. > I'm thinking that the four ufuncs being overridden by user subtypes should be sufficient. > > > What you show is essentially what dot does now for cblas > > enabled functions. But note, we need more than the simple '@', we also > need > > stacks of vectors, and turning vectors into matrices, and then back into > > vectors seems unnecessarily complicated. > > By the time we get into the data/shape/strides world that ufuncs work > with, converting vectors->matrices is literally just adding a single > entry to the shape+strides arrays. This feels trivial and cheap to me? > And moving things around, then removing. It's an option if we just want to use matrix multiplication for everything. I don't think there is any speed advantage one way or the other, although there are currently size limitations that are easier to block in the 1D case than the matrix case. In practice 2**31 per dimension for floating point is probably plenty. > > Or do you just mean that we do actually want broadcasting matvec, > vecmat, vecvec gufuncs? I agree with this too but it seems orthogonal > to me -- we can have those in any case, even if newdot doesn't > literally call them. > Yes. But if we have them, we might as well use them. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Fri Sep 12 01:09:15 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 12 Sep 2014 01:09:15 -0400 Subject: [Numpy-discussion] @ operator In-Reply-To: References: Message-ID: On Thu, Sep 11, 2014 at 11:18 PM, Charles R Harris wrote: > > On Thu, Sep 11, 2014 at 8:49 PM, Nathaniel Smith wrote: >> >> On Thu, Sep 11, 2014 at 10:12 PM, Charles R Harris >> wrote: >> > >> > On Thu, Sep 11, 2014 at 8:01 PM, Nathaniel Smith wrote: >> >> >> >> On Thu, Sep 11, 2014 at 12:10 PM, Charles R Harris >> >> wrote: >> >> > >> >> > On Wed, Sep 10, 2014 at 10:08 PM, Nathaniel Smith >> >> > wrote: >> >> >> >> >> >> My vote is: >> >> >> >> >> >> __matmul__/__rmatmul__ do the standard dispatch stuff that all >> >> >> __op__ >> >> >> methods do (so I guess check __array_priority__ or whatever it is we >> >> >> always do). I'd also be okay with ignoring __array_priority__ on the >> >> >> grounds that __numpy_ufunc__ is better, and there's no existing code >> >> >> relying on __array_priority__ support in __matmul__. >> >> >> >> >> >> Having decided that we are actually going to run, they dispatch >> >> >> unconditionally to np.newdot(a, b) (or newdot(a, b, out=a) for the >> >> >> in-place version), similarly to how e.g. __add__ dispatches to >> >> >> np.add. >> >> >> >> >> >> newdot acts like a standard gufunc with all the standard niceties, >> >> >> including __numpy_ufunc__ dispatch. >> >> >> >> >> >> ("newdot" here is intended as a placeholder name, maybe it should be >> >> >> np.linalg.matmul or something else to be bikeshed later. I also vote >> >> >> that eventually 'dot' become an alias for this function, but whether >> >> >> to do that is an orthogonal discussion for later.) >> >> >> >> >> > If we went the ufunc route, I think we would want three of them, >> >> > matxvec, >> >> > vecxmat, and matxmat, because the best inner loops would be different >> >> > in >> >> > the >> >> > three cases, >> >> >> >> Couldn't we write a single inner loop like: >> >> >> >> void ufunc_loop(blah blah) { >> >> if (arg1_shape[0] == 1 && arg2_shape[1] == 1) { >> >> call DOT >> >> } else if (arg2_shape[0] == 1) { >> >> call GEMV >> >> } else if (...) { >> >> ... >> >> } else { >> >> call GEMM >> >> } >> >> } >> >> ? >> > >> > Not for generalized ufuncs, different signatures, or if linearized, more >> > info on dimensions. >> >> This sentence no verb, but I think the point you might be raising is: >> we don't actually have the technical capability to define a single >> gufunc for @, because the matmat, matvec, vecmat, and vecvec forms >> have different gufunc signatures ("mn,nk->mk", "mn,n->m", "n,nk->k", >> and "n,n->" respectively, I think)? >> >> This is true, but Jaime has said he's willing to look at fixing this :-): >> >> http://thread.gmane.org/gmane.comp.python.numeric.general/58669/focus=58670 >> > > Don't see the need here, the loops are not complicated. > >> >> ...and fundamentally, it's very difficult to solve this anywhere else >> except the ufunc internals. When we first enter a function like >> newdot, we need to check for special overloads like __numpy_ufunc__ >> *before* we do any array_like->ndarray coercion. But we have to do >> array_like->ndarray coercion before we know what the shape/ndim of the >> our inputs is. > > > True. > >> >> And in the wrapper-around-multiple-ufuncs approach, we >> have to check the shape/ndim before we choose which ufunc to dispatch >> to. > > > True. I don't see a problem there and the four ufuncs would be useful in > themselves. 
I think they should be part of the multiarray module methods if > we go that way. Not sure what you mean about multiarray module methods -- it's kinda tricky to expose ufuncs from multiarray.so isn't it, b/c of the split between multiarray.so and umath.so? >> So __numpy_ufunc__ won't get checked until after we've already >> coerced to ndarray, oops... OTOH the ufunc internals already know how >> to do this dance, so it's at least straightforward (if not necessarily >> trivial) to insert the correct logic once in the correct place. > > > I'm thinking that the four ufuncs being overridden by user subtypes should > be sufficient. Maybe I don't understand what you're proposing. Suppose we get handed a random 3rd party type that has a __numpy_ufunc__ attribute, but we know nothing else about it. What do we do? Pick one of the 4 ufuncs at random and call it? >> >> >> > What you show is essentially what dot does now for cblas >> > enabled functions. But note, we need more than the simple '@', we also >> > need >> > stacks of vectors, and turning vectors into matrices, and then back into >> > vectors seems unnecessarily complicated. >> >> By the time we get into the data/shape/strides world that ufuncs work >> with, converting vectors->matrices is literally just adding a single >> entry to the shape+strides arrays. This feels trivial and cheap to me? > > > And moving things around, then removing. It's an option if we just want to > use matrix multiplication for everything. I don't think there is any speed > advantage one way or the other, although there are currently size > limitations that are easier to block in the 1D case than the matrix case. In > practice 2**31 per dimension for floating point is probably plenty. I guess we'll have to implement the 2d blocking sooner or later, and then it won't much matter whether the 1d blocking is simpler, because implementing *anything* for 1d blocking will still be more complicated than just using the 2d blocking code. (Assuming DGEMM is just as fast as DDOT/DGEMV, which seems likely.) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From antony.lee at berkeley.edu Fri Sep 12 01:54:37 2014 From: antony.lee at berkeley.edu (Antony Lee) Date: Thu, 11 Sep 2014 22:54:37 -0700 Subject: [Numpy-discussion] Broadcasting with np.logical_and.reduce Message-ID: Hi, I thought that ufunc.reduce performs broadcasting, but it seems a bit confused by boolean arrays: In [1]: add.reduce([array([1, 2]), array([1])]) Out[1]: array([2, 3]) In [2]: logical_and.reduce([array([True, False], dtype=bool), array([True], dtype=bool)]) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 logical_and.reduce([array([True, False], dtype=bool), array([True], dtype=bool)]) ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() Am I missing something here? Thanks, Antony -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From sebastian at sipsolutions.net  Fri Sep 12 03:48:06 2014
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 12 Sep 2014 09:48:06 +0200
Subject: [Numpy-discussion] Broadcasting with np.logical_and.reduce
In-Reply-To: 
References: 
Message-ID: <1410508086.13211.1.camel@sebastian-t440>

On Do, 2014-09-11 at 22:54 -0700, Antony Lee wrote:
> Hi,
> I thought that ufunc.reduce performs broadcasting, but it seems a bit
> confused by boolean arrays:
>
> In [1]: add.reduce([array([1, 2]), array([1])])
> Out[1]: array([2, 3])
> In [2]: logical_and.reduce([array([True, False], dtype=bool),
> array([True], dtype=bool)])
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> ----> 1 logical_and.reduce([array([True, False], dtype=bool),
> array([True], dtype=bool)])
>
> ValueError: The truth value of an array with more than one element is
> ambiguous. Use a.any() or a.all()
>
> Am I missing something here?

`np.asarray([array([1, 2]), array([1])])` is an object array, not a
boolean array. You probably want to concatenate them.

- Sebastian

From antony.lee at berkeley.edu  Fri Sep 12 05:04:53 2014
From: antony.lee at berkeley.edu (Antony Lee)
Date: Fri, 12 Sep 2014 02:04:53 -0700
Subject: [Numpy-discussion] Broadcasting with np.logical_and.reduce
In-Reply-To: <1410508086.13211.1.camel@sebastian-t440>
References: <1410508086.13211.1.camel@sebastian-t440>
Message-ID: 

I am not using asarray here. Sorry, but I don't see how this is relevant --
my comparison with np.add.reduce is simply that when a list of float arrays
is passed to np.add.reduce, broadcasting happens as usual, but not when a
list of bool arrays is passed to np.logical_and.reduce.

2014-09-12 0:48 GMT-07:00 Sebastian Berg :

> On Do, 2014-09-11 at 22:54 -0700, Antony Lee wrote:
>
> [clip]
>
> `np.asarray([array([1, 2]), array([1])])` is an object array, not a
> boolean array. You probably want to concatenate them.
>
> - Sebastian

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From robert.kern at gmail.com Fri Sep 12 05:22:06 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 12 Sep 2014 10:22:06 +0100 Subject: [Numpy-discussion] Broadcasting with np.logical_and.reduce In-Reply-To: References: <1410508086.13211.1.camel@sebastian-t440> Message-ID: On Fri, Sep 12, 2014 at 10:04 AM, Antony Lee wrote: > I am not using asarray here. Sorry, but I don't see how this is relevant -- > my comparison with np.add.reduce is simply that when a list of float arrays > is passed to np.add.reduce, broadcasting happens as usual, but not when a > list of bool arrays is passed to np.logical_and.reduce. But np.logical_and.reduce() *does* use asarray() when it is given a list object (all ufunc .reduce() methods do this). In both cases, you get a dtype=object array. This means that the ufunc will use the dtype=object inner loop, not the dtype=bool inner loop. For np.add, this isn't a problem. It just calls the __add__() method on the first object which, since it's an ndarray, calls np.add() again to do the actual work, this time using the appropriate dtype inner loop for the inner objects. But np.logical_and is different! For the dtype=object inner loop, it directly calls bool(x) on each item of the object array; it doesn't defer to any other method that might do the computation. bool(almost_any_ndarray) raises the ValueError that you saw. np.logical_and.reduce([x, y]) is not the same as np.logical_and(x, y). You can see how the dtype=object inner loop of np.logical_and() works by directly constructing dtype=object shape-() arrays: [~] |14> x array(None, dtype=object) [~] |15> x[()] = np.array([True, False]) [~] |16> x array(array([ True, False], dtype=bool), dtype=object) [~] |17> y = np.array(None, dtype=object) [~] |18> y[()] = np.array([[True], [False]]) [~] |19> y array(array([[ True], [False]], dtype=bool), dtype=object) [~] |20> np.logical_and(x, y) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 np.logical_and(x, y) ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() -- Robert Kern From charlesr.harris at gmail.com Fri Sep 12 08:07:53 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Sep 2014 06:07:53 -0600 Subject: [Numpy-discussion] @ operator In-Reply-To: References: Message-ID: On Thu, Sep 11, 2014 at 11:09 PM, Nathaniel Smith wrote: > On Thu, Sep 11, 2014 at 11:18 PM, Charles R Harris > wrote: > > > > On Thu, Sep 11, 2014 at 8:49 PM, Nathaniel Smith wrote: > >> > >> On Thu, Sep 11, 2014 at 10:12 PM, Charles R Harris > >> wrote: > >> > > >> > On Thu, Sep 11, 2014 at 8:01 PM, Nathaniel Smith > wrote: > >> >> > >> >> On Thu, Sep 11, 2014 at 12:10 PM, Charles R Harris > >> >> wrote: > >> >> > > >> >> > On Wed, Sep 10, 2014 at 10:08 PM, Nathaniel Smith > >> >> > wrote: > >> >> >> > >> >> >> My vote is: > >> >> >> > >> >> >> __matmul__/__rmatmul__ do the standard dispatch stuff that all > >> >> >> __op__ > >> >> >> methods do (so I guess check __array_priority__ or whatever it is > we > >> >> >> always do). I'd also be okay with ignoring __array_priority__ on > the > >> >> >> grounds that __numpy_ufunc__ is better, and there's no existing > code > >> >> >> relying on __array_priority__ support in __matmul__. 
> >> >> >> > >> >> >> Having decided that we are actually going to run, they dispatch > >> >> >> unconditionally to np.newdot(a, b) (or newdot(a, b, out=a) for the > >> >> >> in-place version), similarly to how e.g. __add__ dispatches to > >> >> >> np.add. > >> >> >> > >> >> >> newdot acts like a standard gufunc with all the standard niceties, > >> >> >> including __numpy_ufunc__ dispatch. > >> >> >> > >> >> >> ("newdot" here is intended as a placeholder name, maybe it should > be > >> >> >> np.linalg.matmul or something else to be bikeshed later. I also > vote > >> >> >> that eventually 'dot' become an alias for this function, but > whether > >> >> >> to do that is an orthogonal discussion for later.) > >> >> >> > >> >> > If we went the ufunc route, I think we would want three of them, > >> >> > matxvec, > >> >> > vecxmat, and matxmat, because the best inner loops would be > different > >> >> > in > >> >> > the > >> >> > three cases, > >> >> > >> >> Couldn't we write a single inner loop like: > >> >> > >> >> void ufunc_loop(blah blah) { > >> >> if (arg1_shape[0] == 1 && arg2_shape[1] == 1) { > >> >> call DOT > >> >> } else if (arg2_shape[0] == 1) { > >> >> call GEMV > >> >> } else if (...) { > >> >> ... > >> >> } else { > >> >> call GEMM > >> >> } > >> >> } > >> >> ? > >> > > >> > Not for generalized ufuncs, different signatures, or if linearized, > more > >> > info on dimensions. > >> > >> This sentence no verb, but I think the point you might be raising is: > >> we don't actually have the technical capability to define a single > >> gufunc for @, because the matmat, matvec, vecmat, and vecvec forms > >> have different gufunc signatures ("mn,nk->mk", "mn,n->m", "n,nk->k", > >> and "n,n->" respectively, I think)? > >> > >> This is true, but Jaime has said he's willing to look at fixing this > :-): > >> > >> > http://thread.gmane.org/gmane.comp.python.numeric.general/58669/focus=58670 > >> > > > > Don't see the need here, the loops are not complicated. > > > >> > >> ...and fundamentally, it's very difficult to solve this anywhere else > >> except the ufunc internals. When we first enter a function like > >> newdot, we need to check for special overloads like __numpy_ufunc__ > >> *before* we do any array_like->ndarray coercion. But we have to do > >> array_like->ndarray coercion before we know what the shape/ndim of the > >> our inputs is. > > > > > > True. > > > >> > >> And in the wrapper-around-multiple-ufuncs approach, we > >> have to check the shape/ndim before we choose which ufunc to dispatch > >> to. > > > > > > True. I don't see a problem there and the four ufuncs would be useful in > > themselves. I think they should be part of the multiarray module methods > if > > we go that way. > > Not sure what you mean about multiarray module methods -- it's kinda > tricky to expose ufuncs from multiarray.so isn't it, b/c of the split > between multiarray.so and umath.so? > Using the umath c-api is no worse than using the other c-api's. Using ufuncs from umath is trickier, IIRC multiarray initializes the ops that use them on load, but I'd define the new functions in multiarray, not umath. Generating the generalized functions can take place either during module load, or by using the static variable pattern on first function call, i.e., static ufunc myfunc = NULL; if (myfunc == NULL) { myfunc = make_ufunc(...); } I tend towards the latter as I want to use the functions in the multiarray c-api. The umath module uses the former. 
The vector dot functions are already available in the descr and can be
passed to a generic loop (overhead), but what I would like to do at some
point is bring all the loops together in their own file in multiarray.

> >> So __numpy_ufunc__ won't get checked until after we've already
> >> coerced to ndarray, oops... OTOH the ufunc internals already know how
> >> to do this dance, so it's at least straightforward (if not necessarily
> >> trivial) to insert the correct logic once in the correct place.
> >
> > I'm thinking that the four ufuncs being overridden by user subtypes
> > should be sufficient.
>
> Maybe I don't understand what you're proposing.

I'd like to have the four ufuncs vecvec, vecmat, matvec, and matmat in
multiarray. Each of those can be overridden by subtypes using the
__numpy_ufunc__ mechanism. Then matmul would, as you say, make the input
arrays and then call the appropriate function depending on dimensions.

> Suppose we get handed a random 3rd party type that has a __numpy_ufunc__
> attribute, but we know nothing else about it. What do we do? Pick one of
> the 4 ufuncs at random and call it?

Every ufunc, when called, uses the numpy_ufunc mechanism to check for
overrides; it is built into the ufunc. So a subclass would treat the four
new functions like it treats any other ufunc. That would be one of the
benefits of using ufuncs. The call chain would be matmul -> new_ufunc ->
subclass; the subclass would not need to bother with matmul itself.

> >> > What you show is essentially what dot does now for cblas
> >> > enabled functions. But note, we need more than the simple '@', we
> >> > also need stacks of vectors, and turning vectors into matrices, and
> >> > then back into vectors seems unnecessarily complicated.
> >>
> >> By the time we get into the data/shape/strides world that ufuncs work
> >> with, converting vectors->matrices is literally just adding a single
> >> entry to the shape+strides arrays. This feels trivial and cheap to me?
> >
> > And moving things around, then removing. It's an option if we just want
> > to use matrix multiplication for everything. I don't think there is any
> > speed advantage one way or the other, although there are currently size
> > limitations that are easier to block in the 1D case than the matrix
> > case. In practice 2**31 per dimension for floating point is probably
> > plenty.
>
> I guess we'll have to implement the 2d blocking sooner or later, and
> then it won't much matter whether the 1d blocking is simpler, because
> implementing *anything* for 1d blocking will still be more complicated
> than just using the 2d blocking code. (Assuming DGEMM is just as fast
> as DDOT/DGEMV, which seems likely.)

Yeah, but that can be pushed off a while, at least until we have a working
implementation.

Chuck

From pierre.haessig at crans.org  Fri Sep 12 08:56:58 2014
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Fri, 12 Sep 2014 14:56:58 +0200
Subject: [Numpy-discussion] How to get docs for functions processed by
 numpy.vectorize?
In-Reply-To: <5411C3DB.8020501@gmail.com>
References: <5411C3DB.8020501@gmail.com>
Message-ID: <5412ED9A.9070101@crans.org>

Hi,

I tried to follow the calling logic behind the help function and I've
arrived at the 3rd-4th level underground with the pydoc.render_doc
function. Here some logic inspects the `thing` that is getting documented,
as shown below.
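A quick way to see what that inspection decides for a vectorized function
(a sketch; the exact printed type name may vary between numpy versions):

import inspect
import numpy as np

f = np.vectorize(lambda x: x**2)
print(type(f))               # the vectorize class, not a plain function
print(inspect.isroutine(f))  # False, so pydoc documents type(f) instead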
The fact that the docstring of the vectorize function is not read may
relate to the fact that the type of a vectorized function is not
`function` but `numpy.lib.function_base.vectorize` instead.

Here is the code that does the magic, for further inspection:

pydoc.render_doc??

def render_doc(thing, title='Python Library Documentation: %s', forceload=0):
    """Render text documentation, given an object or a path to an object."""
    object, name = resolve(thing, forceload)
    desc = describe(object)
    module = inspect.getmodule(object)
    if name and '.' in name:
        desc += ' in ' + name[:name.rfind('.')]
    elif module and module is not object:
        desc += ' in module ' + module.__name__
    if type(object) is _OLD_INSTANCE_TYPE:
        # If the passed object is an instance of an old-style class,
        # document its available methods instead of its value.
        object = object.__class__
    elif not (inspect.ismodule(object) or
              inspect.isclass(object) or
              inspect.isroutine(object) or
              inspect.isgetsetdescriptor(object) or
              inspect.ismemberdescriptor(object) or
              isinstance(object, property)):
        # If the passed object is a piece of data or an instance,
        # document its available methods instead of its value.
        object = type(object)
        desc += ' object'
    return title % desc + '\n\n' + text.document(object, name)

-- 
Pierre

From antony.lee at berkeley.edu  Fri Sep 12 12:44:25 2014
From: antony.lee at berkeley.edu (Antony Lee)
Date: Fri, 12 Sep 2014 09:44:25 -0700
Subject: [Numpy-discussion] Broadcasting with np.logical_and.reduce
In-Reply-To: 
References: <1410508086.13211.1.camel@sebastian-t440>
Message-ID: 

I see. I went back to the documentation of ufunc.reduce and this is not
explicitly mentioned, although a posteriori it makes sense; perhaps this
can be made clearer there?

Antony

2014-09-12 2:22 GMT-07:00 Robert Kern :

> On Fri, Sep 12, 2014 at 10:04 AM, Antony Lee wrote:
> > I am not using asarray here. Sorry, but I don't see how this is
> > relevant -- my comparison with np.add.reduce is simply that when a list
> > of float arrays is passed to np.add.reduce, broadcasting happens as
> > usual, but not when a list of bool arrays is passed to
> > np.logical_and.reduce.
>
> But np.logical_and.reduce() *does* use asarray() when it is given a
> list object (all ufunc .reduce() methods do this).
>
> [clip]

From robert.kern at gmail.com  Fri Sep 12 12:46:56 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 12 Sep 2014 17:46:56 +0100
Subject: [Numpy-discussion] Broadcasting with np.logical_and.reduce
In-Reply-To: 
References: <1410508086.13211.1.camel@sebastian-t440>
Message-ID: 

On Fri, Sep 12, 2014 at 5:44 PM, Antony Lee wrote:
> I see. I went back to the documentation of ufunc.reduce and this is not
> explicitly mentioned, although a posteriori it makes sense; perhaps this
> can be made clearer there?

Please recommend the documentation you would like to see.

-- 
Robert Kern

From robert.kern at gmail.com  Fri Sep 12 13:46:15 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 12 Sep 2014 18:46:15 +0100
Subject: [Numpy-discussion] Broadcasting with np.logical_and.reduce
In-Reply-To: 
References: <1410508086.13211.1.camel@sebastian-t440>
Message-ID: 

On Fri, Sep 12, 2014 at 5:46 PM, Robert Kern wrote:
> On Fri, Sep 12, 2014 at 5:44 PM, Antony Lee wrote:
>> I see. I went back to the documentation of ufunc.reduce and this is not
>> explicitly mentioned, although a posteriori it makes sense; perhaps this
>> can be made clearer there?
>
> Please recommend the documentation you would like to see.

Specifically, the behavior I described is the interaction of several
different things, but you don't mention which part of it "is not
explicitly mentioned".

-- 
Robert Kern

From charlesr.harris at gmail.com  Fri Sep 12 17:38:31 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 12 Sep 2014 15:38:31 -0600
Subject: [Numpy-discussion] Remove numpy/core/src/multiarray/testcalcs.py
 file
Message-ID: 

Hi All,

There is an old file, numpy/core/src/multiarray/testcalcs.py, that looks
like a forgotten leftover from the original 2009 datetime work. Is there
any reason that this file should not be removed?

Chuck

From antony.lee at berkeley.edu  Fri Sep 12 18:56:26 2014
From: antony.lee at berkeley.edu (Antony Lee)
Date: Fri, 12 Sep 2014 15:56:26 -0700
Subject: [Numpy-discussion] Broadcasting with np.logical_and.reduce
In-Reply-To: 
References: <1410508086.13211.1.camel@sebastian-t440>
Message-ID: 

I read the "Methods" section of the ufuncs doc page (
http://docs.scipy.org/doc/numpy/reference/ufuncs.html#methods) again and I
think this could be made clearer simply by replacing the first sentence

from "All ufuncs have four methods."
to "All ufuncs have five methods that operate on array-like objects." (yes, there's also "at", which seems to have been added later to the doc...) This would make it somewhat clearer that "logical_and.reduce([array([True, False], dtype=bool), array([True], dtype=bool)])" interprets the single list argument as an array-like (of dtype object) rather than as an iterable over which to reduce (as python's builtin reduce would). In fact there is another point in that paragraph that could be improved; namely "axis" does not have to be an integer for "reduce". Antony 2014-09-12 10:46 GMT-07:00 Robert Kern : > On Fri, Sep 12, 2014 at 5:46 PM, Robert Kern > wrote: > > On Fri, Sep 12, 2014 at 5:44 PM, Antony Lee > wrote: > >> I see. I went back to the documentation of ufunc.reduce and this is not > >> explicitly mentioned although a posteriori it makes sense; perhaps this > can > >> be made clearer there? > > > > Please recommend the documentation you would like to see. > > Specifically, the behavior I described is the interaction of several > different things, but you don't mention which part of it "is not > explicitly mentioned". > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnelsonchem at gmail.com Sun Sep 14 22:22:41 2014 From: rnelsonchem at gmail.com (Ryan Nelson) Date: Sun, 14 Sep 2014 22:22:41 -0400 Subject: [Numpy-discussion] Question about broadcasting vs for loop performance Message-ID: Hello all, I have a question about the performance of broadcasting versus Python for loops. I have the following sample code that approximates some simulation I'd like to do: ## Test Code ## import numpy as np def lorentz(x, pos, inten, hwhm): return inten*( hwhm**2 / ( (x - pos)**2 + hwhm**2 ) ) poss = np.random.rand(100) intens = np.random.rand(100) xs = np.linspace(0,10,10000) def first_try(): sim_inten = np.zeros(xs.shape) for freq, inten in zip(poss, intens): sim_inten += lorentz(xs, freq, inten, 5.0) return sim_inten def second_try(): sim_inten2 = lorentz(xs.reshape((-1,1)), poss, intens, 5.0) sim_inten2 = sim_inten2.sum(axis=1) return sim_inten2 print np.array_equal(first_try(), second_try()) ## End Test ## Running this script prints "True" for the final equality test. However, IPython's %timeit magic, gives ~10 ms for first_try and ~30 ms for second_try. I tried this on Windows 7 (Anaconda Python) and on a Linux machine both with Python 2.7 and Numpy 1.8.2. I understand in principle why broadcasting should be faster than Python loops, but I'm wondering why I'm getting worse results with the pure Numpy function. Is there some general rules for when broadcasting might give worse performance than a Python loop? Thanks Ryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnelsonchem at gmail.com Sun Sep 14 22:53:13 2014 From: rnelsonchem at gmail.com (Ryan Nelson) Date: Sun, 14 Sep 2014 22:53:13 -0400 Subject: [Numpy-discussion] Question about broadcasting vs for loop performance In-Reply-To: References: Message-ID: I think I figured out my own question. I guess that the broadcasting approach is generating a very large 2D array in memory, which takes a bit of extra time. 
I gathered this from reading the last example on the following site: http://wiki.scipy.org/EricsBroadcastingDoc I tried this again with a much smaller "xs" array (~100 points) and the broadcasting version was much faster. Thanks Ryan Note: The link to the Scipy wiki page above is broken at the bottom of Numpy's broadcasting page, otherwise I would have seen that earlier. Sorry for the noise. On Sun, Sep 14, 2014 at 10:22 PM, Ryan Nelson wrote: > Hello all, > > I have a question about the performance of broadcasting versus Python for > loops. I have the following sample code that approximates some simulation > I'd like to do: > > ## Test Code ## > > import numpy as np > > > def lorentz(x, pos, inten, hwhm): > > return inten*( hwhm**2 / ( (x - pos)**2 + hwhm**2 ) ) > > > poss = np.random.rand(100) > > intens = np.random.rand(100) > > xs = np.linspace(0,10,10000) > > > def first_try(): > > sim_inten = np.zeros(xs.shape) > > for freq, inten in zip(poss, intens): > > sim_inten += lorentz(xs, freq, inten, 5.0) > > return sim_inten > > > def second_try(): > > sim_inten2 = lorentz(xs.reshape((-1,1)), poss, intens, 5.0) > > sim_inten2 = sim_inten2.sum(axis=1) > > return sim_inten2 > > > print np.array_equal(first_try(), second_try()) > > > ## End Test ## > > > Running this script prints "True" for the final equality test. However, > IPython's %timeit magic, gives ~10 ms for first_try and ~30 ms for > second_try. I tried this on Windows 7 (Anaconda Python) and on a Linux > machine both with Python 2.7 and Numpy 1.8.2. > > > I understand in principle why broadcasting should be faster than Python > loops, but I'm wondering why I'm getting worse results with the pure Numpy > function. Is there some general rules for when broadcasting might give > worse performance than a Python loop? > > > Thanks > > > Ryan > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mads.ipsen at gmail.com Mon Sep 15 04:16:26 2014 From: mads.ipsen at gmail.com (Mads Ipsen) Date: Mon, 15 Sep 2014 10:16:26 +0200 Subject: [Numpy-discussion] Tracking and inspecting numpy objects Message-ID: <5416A05A.2000905@gmail.com> Hi, I am trying to inspect the reference count of numpy arrays generated by my application. Initially, I thought I could inspect the tracked objects using gc.get_objects(), but, with respect to numpy objects, the returned container is empty. For example: import numpy import gc data = numpy.ones(1024).reshape((32,32)) objs = [o for o in gc.get_objects() if isinstance(o, numpy.ndarray)] print objs # Prints empty list print gc.is_tracked(data) # Print False Why is this? Also, is there some other technique I can use to inspect all numpy generated objects? Thanks in advance. Best regards, Mads -- +---------------------------------------------------------+ | Mads Ipsen | +----------------------+----------------------------------+ | G?seb?ksvej 7, 4. 
tv | phone: +45-29716388 | | DK-2500 Valby | email: mads.ipsen at gmail.com | | Denmark | map : www.tinyurl.com/ns52fpa | +----------------------+----------------------------------+ From sebastian at sipsolutions.net Mon Sep 15 05:48:48 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 15 Sep 2014 11:48:48 +0200 Subject: [Numpy-discussion] Linear algebra functions on empty arrays Message-ID: <1410774528.7696.9.camel@sebastian-t440> Hey all, for https://github.com/numpy/numpy/pull/3861/files I would like to allow 0-sized dimensions for generalized ufuncs, meaning both that a gufunc has to be able to handle this and that it *can* handle it at all. However, lapack does not support this, so it needs some explicit fixing. Also, some of the linalg functions currently explicitly allow empty arrays and others explicitly disallow them. For example, QR and eigvals do not allow them, but on the other hand solve explicitly does (though it most probably never worked, simply because lapack does not support it). So I am wondering if there is some convention for this, or what convention we should implement. Regards, Sebastian -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From hoogendoorn.eelco at gmail.com Mon Sep 15 06:05:17 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Mon, 15 Sep 2014 12:05:17 +0200 Subject: [Numpy-discussion] Tracking and inspecting numpy objects In-Reply-To: <1410774912.7696.13.camel@sebastian-t440> References: <5416A05A.2000905@gmail.com> <1410774912.7696.13.camel@sebastian-t440> Message-ID: On Mon, Sep 15, 2014 at 11:55 AM, Sebastian Berg wrote: > On Mo, 2014-09-15 at 10:16 +0200, Mads Ipsen wrote: > > Hi, > > > > I am trying to inspect the reference count of numpy arrays generated by > > my application. > > > > Initially, I thought I could inspect the tracked objects using > > gc.get_objects(), but, with respect to numpy objects, the returned > > container is empty. For example: > > > > import numpy > > import gc > > > > data = numpy.ones(1024).reshape((32,32)) > > > > objs = [o for o in gc.get_objects() if isinstance(o, numpy.ndarray)] > > > > print objs # Prints empty list > > print gc.is_tracked(data) # Print False > > > > Why is this? Also, is there some other technique I can use to inspect > > all numpy generated objects? > > > > Two reasons. First of all, unless your array is an object arrays (or a > structured one with objects in it), there are no objects to track. The > array is a single python object without any referenced objects (except > possibly its `arr.base`). > > Second of all -- and this is an issue -- numpy doesn't actually > implement the traverse slot, so it won't even work for object arrays > (numpy object arrays do not support circular garbage collection at this > time, please feel free to implement it ;)). > > - Sebastian > > Does this answer why the ndarray object itself isn't tracked though? I must say I find this puzzling; the only thing I can think of is that the python compiler notices that data isn't used anymore after its creation, and deletes it right after its creation as an optimization, but that conflicts with my own experience of the GC. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Sep 15 06:10:59 2014 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Sep 2014 11:10:59 +0100 Subject: [Numpy-discussion] Tracking and inspecting numpy objects In-Reply-To: References: <5416A05A.2000905@gmail.com> <1410774912.7696.13.camel@sebastian-t440> Message-ID: On Mon, Sep 15, 2014 at 11:05 AM, Eelco Hoogendoorn wrote: > Does this answer why the ndarray object itself isn't tracked though? I must > say I find this puzzling; the only thing I can think of is that the python > compiler notices that data isn't used anymore after its creation, and > deletes it right after its creation as an optimization, but that conflicts > with my own experience of the GC. The "gc" that "gc.is_tracked()" refers to is just the cyclical garbage detector, not the usual reference counting memory management that all Python objects participate in. If your object cannot participate in reference cycles (like most ndarrays), it doesn't need to be tracked. 
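A minimal illustration of the distinction:

import gc
import numpy

print gc.is_tracked([])                      # True: a list can participate in cycles
print gc.is_tracked(numpy.ones(3))           # False: a plain ndarray cannot
print gc.is_tracked(numpy.empty(3, object))  # also False today, only because ndarray
                                             # does not implement the traverse slot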
-- Robert Kern From sebastian at sipsolutions.net Mon Sep 15 06:11:55 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 15 Sep 2014 12:11:55 +0200 Subject: [Numpy-discussion] Tracking and inspecting numpy objects In-Reply-To: References: <5416A05A.2000905@gmail.com> <1410774912.7696.13.camel@sebastian-t440> Message-ID: <1410775915.8320.1.camel@sebastian-t440> On Mo, 2014-09-15 at 12:05 +0200, Eelco Hoogendoorn wrote: > > > > On Mon, Sep 15, 2014 at 11:55 AM, Sebastian Berg > wrote: > On Mo, 2014-09-15 at 10:16 +0200, Mads Ipsen wrote: > > Hi, > > > > I am trying to inspect the reference count of numpy arrays > generated by > > my application. > > > > Initially, I thought I could inspect the tracked objects > using > > gc.get_objects(), but, with respect to numpy objects, the > returned > > container is empty. For example: > > > > import numpy > > import gc > > > > data = numpy.ones(1024).reshape((32,32)) > > > > objs = [o for o in gc.get_objects() if isinstance(o, > numpy.ndarray)] > > > > print objs # Prints empty list > > print gc.is_tracked(data) # Print False > > > > Why is this? Also, is there some other technique I can use > to inspect > > all numpy generated objects? > > > > Two reasons. First of all, unless your array is an object > arrays (or a > structured one with objects in it), there are no objects to > track. The > array is a single python object without any referenced objects > (except > possibly its `arr.base`). > > Second of all -- and this is an issue -- numpy doesn't > actually > implement the traverse slot, so it won't even work for object > arrays > (numpy object arrays do not support circular garbage > collection at this > time, please feel free to implement it ;)). > > - Sebastian > > > > > > > Does this answer why the ndarray object itself isn't tracked though? I > must say I find this puzzling; the only thing I can think of is that > the python compiler notices that data isn't used anymore after its > creation, and deletes it right after its creation as an optimization, > but that conflicts with my own experience of the GC. > > Not sure if it does, but my quick try and error says: In [15]: class T(tuple): ....: pass ....: In [16]: t = T() In [17]: objs = [o for o in gc.get_objects() if isinstance(o, T)] In [18]: objs Out[18]: [()] In [19]: a = 123. In [20]: objs = [o for o in gc.get_objects() if isinstance(o, float)] In [21]: objs Out[21]: [] So I guess nothing is tracked, unless it contains things, and numpy arrays don't say they can contain things (i.e. no traverse). - Sebastian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From mads.ipsen at gmail.com Mon Sep 15 07:06:15 2014 From: mads.ipsen at gmail.com (Mads Ipsen) Date: Mon, 15 Sep 2014 13:06:15 +0200 Subject: [Numpy-discussion] Tracking and inspecting numpy objects In-Reply-To: <1410775915.8320.1.camel@sebastian-t440> References: <5416A05A.2000905@gmail.com> <1410774912.7696.13.camel@sebastian-t440> <1410775915.8320.1.camel@sebastian-t440> Message-ID: <5416C827.4090605@gmail.com> Thanks to everybody for taking time to answer! 
Best regards, Mads On 15/09/14 12:11, Sebastian Berg wrote: > On Mo, 2014-09-15 at 12:05 +0200, Eelco Hoogendoorn wrote: >> >> >> >> On Mon, Sep 15, 2014 at 11:55 AM, Sebastian Berg >> wrote: >> On Mo, 2014-09-15 at 10:16 +0200, Mads Ipsen wrote: >> > Hi, >> > >> > I am trying to inspect the reference count of numpy arrays >> generated by >> > my application. >> > >> > Initially, I thought I could inspect the tracked objects >> using >> > gc.get_objects(), but, with respect to numpy objects, the >> returned >> > container is empty. For example: >> > >> > import numpy >> > import gc >> > >> > data = numpy.ones(1024).reshape((32,32)) >> > >> > objs = [o for o in gc.get_objects() if isinstance(o, >> numpy.ndarray)] >> > >> > print objs # Prints empty list >> > print gc.is_tracked(data) # Print False >> > >> > Why is this? Also, is there some other technique I can use >> to inspect >> > all numpy generated objects? >> > >> >> Two reasons. First of all, unless your array is an object >> arrays (or a >> structured one with objects in it), there are no objects to >> track. The >> array is a single python object without any referenced objects >> (except >> possibly its `arr.base`). >> >> Second of all -- and this is an issue -- numpy doesn't >> actually >> implement the traverse slot, so it won't even work for object >> arrays >> (numpy object arrays do not support circular garbage >> collection at this >> time, please feel free to implement it ;)). >> >> - Sebastian >> >> >> >> >> >> >> Does this answer why the ndarray object itself isn't tracked though? I >> must say I find this puzzling; the only thing I can think of is that >> the python compiler notices that data isn't used anymore after its >> creation, and deletes it right after its creation as an optimization, >> but that conflicts with my own experience of the GC. >> >> > > Not sure if it does, but my quick try and error says: > In [15]: class T(tuple): > ....: pass > ....: > > In [16]: t = T() > > In [17]: objs = [o for o in gc.get_objects() if isinstance(o, T)] > > In [18]: objs > Out[18]: [()] > > In [19]: a = 123. > > In [20]: objs = [o for o in gc.get_objects() if isinstance(o, float)] > > In [21]: objs > Out[21]: [] > > So I guess nothing is tracked, unless it contains things, and numpy > arrays don't say they can contain things (i.e. no traverse). > > - Sebastian > > > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- +---------------------------------------------------------+ | Mads Ipsen | +----------------------+----------------------------------+ | G?seb?ksvej 7, 4. 
tv | phone: +45-29716388 | | DK-2500 Valby | email: mads.ipsen at gmail.com | | Denmark | map : www.tinyurl.com/ns52fpa | +----------------------+----------------------------------+ From josef.pktd at gmail.com Mon Sep 15 07:07:11 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Sep 2014 07:07:11 -0400 Subject: [Numpy-discussion] Linear algebra functions on empty arrays In-Reply-To: <1410774528.7696.9.camel@sebastian-t440> References: <1410774528.7696.9.camel@sebastian-t440> Message-ID: On Mon, Sep 15, 2014 at 5:48 AM, Sebastian Berg wrote: > Hey all, > > for https://github.com/numpy/numpy/pull/3861/files I would like to allow > 0-sized dimensions for generalized ufuncs, meaning both that a gufunc has > to be able to handle this and that it *can* handle it at all. > However, lapack does not support this, so it needs some explicit fixing. > Also, some of the linalg functions currently explicitly allow empty arrays > and others explicitly disallow them. > > For example, QR and eigvals do not allow them, but on the other hand > solve explicitly does (though it most probably never worked, simply because > lapack does not support it). So I am wondering if there is some convention > for this, or what convention we should implement. What does an empty square matrix/array look like? np.linalg.solve can have an empty rhs, but what is the shape of an empty lhs, `a`? If I do a QR(arr) with arr.shape=(0, 5), what is R supposed to be ? I just wrote some loops over linalg.qr, but I always initialized explicitly. I didn't manage to figure out how empty arrays would be useful. If an empty square matrix can only be of shape (0, 0), then it's no use (in my applications). Josef > > Regards, > > Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Mon Sep 15 07:26:05 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 15 Sep 2014 13:26:05 +0200 Subject: [Numpy-discussion] Linear algebra functions on empty arrays In-Reply-To: References: <1410774528.7696.9.camel@sebastian-t440> Message-ID: <1410780365.8320.9.camel@sebastian-t440> On Mo, 2014-09-15 at 07:07 -0400, josef.pktd at gmail.com wrote: > On Mon, Sep 15, 2014 at 5:48 AM, Sebastian Berg > wrote: > > Hey all, > > > > for https://github.com/numpy/numpy/pull/3861/files I would like to allow > > 0-sized dimensions for generalized ufuncs, meaning both that a gufunc has > > to be able to handle this and that it *can* handle it at all. > > However, lapack does not support this, so it needs some explicit fixing. > > Also, some of the linalg functions currently explicitly allow empty arrays > > and others explicitly disallow them. > > > > For example, QR and eigvals do not allow them, but on the other hand > > solve explicitly does (though it most probably never worked, simply because > > lapack does not support it). So I am wondering if there is some convention > > for this, or what convention we should implement. > > What does an empty square matrix/array look like? > > np.linalg.solve can have an empty rhs, but what is the shape of an empty lhs, `a`? > > If I do a QR(arr) with arr.shape=(0, 5), what is R supposed to be ? > QR may be more difficult, since R itself may not be empty, begging the question of whether you want to error out or fill it sensibly. Cholesky would require (0, 0), for example, and for eigenvalues it would somewhat make sense too: the (0, 0) matrix has 0 eigenvalues.
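To make the current inconsistency concrete (a rough sketch of the behavior as described above, not checked against every numpy version):

import numpy as np

a = np.empty((0, 0))
b = np.empty((0, 5))

np.linalg.solve(a, b)   # explicitly allowed by the checks
                        # (whether lapack copes is another question)
np.linalg.eigvals(a)    # explicitly rejected with a LinAlgError, though a
                        # length-0 result would be the natural answer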
I did not go through them all, but I would like to figure out whether we should aim to generally allow it, or maybe just allow it for some special ones. - Sebastian > > I just wrote some loops over linalg.qr, but I always initialized explicitly. > > I didn't manage to figure out how empty arrays would be useful. > > If an empty square matrix can only only be of shape (0, 0), then it's > no use (in my applications). > > > Josef > > > > > > Regards, > > > > Sebastian > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From hoogendoorn.eelco at gmail.com Mon Sep 15 07:31:31 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Mon, 15 Sep 2014 13:31:31 +0200 Subject: [Numpy-discussion] Tracking and inspecting numpy objects In-Reply-To: <5416C827.4090605@gmail.com> References: <5416A05A.2000905@gmail.com> <1410774912.7696.13.camel@sebastian-t440> <1410775915.8320.1.camel@sebastian-t440> <5416C827.4090605@gmail.com> Message-ID: I figured the reference to the object through the local scope would also be tracked by the GC somehow, since the entire stack frame can be regarded as a separate object itself, but apparently not. On Mon, Sep 15, 2014 at 1:06 PM, Mads Ipsen wrote: > Thanks to everybody for taking time to answer! > > Best regards, > > Mads > > On 15/09/14 12:11, Sebastian Berg wrote: > > On Mo, 2014-09-15 at 12:05 +0200, Eelco Hoogendoorn wrote: > >> > >> > >> > >> On Mon, Sep 15, 2014 at 11:55 AM, Sebastian Berg > >> wrote: > >> On Mo, 2014-09-15 at 10:16 +0200, Mads Ipsen wrote: > >> > Hi, > >> > > >> > I am trying to inspect the reference count of numpy arrays > >> generated by > >> > my application. > >> > > >> > Initially, I thought I could inspect the tracked objects > >> using > >> > gc.get_objects(), but, with respect to numpy objects, the > >> returned > >> > container is empty. For example: > >> > > >> > import numpy > >> > import gc > >> > > >> > data = numpy.ones(1024).reshape((32,32)) > >> > > >> > objs = [o for o in gc.get_objects() if isinstance(o, > >> numpy.ndarray)] > >> > > >> > print objs # Prints empty list > >> > print gc.is_tracked(data) # Print False > >> > > >> > Why is this? Also, is there some other technique I can use > >> to inspect > >> > all numpy generated objects? > >> > > >> > >> Two reasons. First of all, unless your array is an object > >> arrays (or a > >> structured one with objects in it), there are no objects to > >> track. The > >> array is a single python object without any referenced objects > >> (except > >> possibly its `arr.base`). > >> > >> Second of all -- and this is an issue -- numpy doesn't > >> actually > >> implement the traverse slot, so it won't even work for object > >> arrays > >> (numpy object arrays do not support circular garbage > >> collection at this > >> time, please feel free to implement it ;)). > >> > >> - Sebastian > >> > >> > >> > >> > >> > >> > >> Does this answer why the ndarray object itself isn't tracked though? 
I > >> must say I find this puzzling; the only thing I can think of is that > >> the python compiler notices that data isn't used anymore after its > >> creation, and deletes it right after its creation as an optimization, > >> but that conflicts with my own experience of the GC. > >> > >> > > > > Not sure if it does, but my quick try and error says: > > In [15]: class T(tuple): > > ....: pass > > ....: > > > > In [16]: t = T() > > > > In [17]: objs = [o for o in gc.get_objects() if isinstance(o, T)] > > > > In [18]: objs > > Out[18]: [()] > > > > In [19]: a = 123. > > > > In [20]: objs = [o for o in gc.get_objects() if isinstance(o, float)] > > > > In [21]: objs > > Out[21]: [] > > > > So I guess nothing is tracked, unless it contains things, and numpy > > arrays don't say they can contain things (i.e. no traverse). > > > > - Sebastian > > > > > > > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > +---------------------------------------------------------+ > | Mads Ipsen | > +----------------------+----------------------------------+ > | G?seb?ksvej 7, 4. tv | phone: +45-29716388 | > | DK-2500 Valby | email: mads.ipsen at gmail.com | > | Denmark | map : www.tinyurl.com/ns52fpa | > +----------------------+----------------------------------+ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Sep 15 08:07:16 2014 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 15 Sep 2014 13:07:16 +0100 Subject: [Numpy-discussion] Tracking and inspecting numpy objects In-Reply-To: References: <5416A05A.2000905@gmail.com> <1410774912.7696.13.camel@sebastian-t440> <1410775915.8320.1.camel@sebastian-t440> <5416C827.4090605@gmail.com> Message-ID: On Mon, Sep 15, 2014 at 12:31 PM, Eelco Hoogendoorn wrote: > I figured the reference to the object through the local scope would also be > tracked by the GC somehow, since the entire stack frame can be regarded as a > separate object itself, but apparently not. Objects are "tracked", not references. An object is only considered "tracked" by the cyclic GC if the cyclic GC needs to traverse the object's referents when looking for cycles. It is not necessarily "tracked" just because it is referred to by an object that is "tracked". If an object does not refer to other objects, *OR* if such references cannot create a cycle for whatever reason, then the object does not need to be tracked by the GC. Most ndarrays fall in that second category. They do have references to their base ndarray, if a view, and their dtype object, but under most circumstances these cannot cause a cycle. As Sebastian mentioned, though, dtype=object arrays can create cycles and should be tracked. We need to implement the "tp_traverse" slot in order to do so. 
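For the record, a small sketch of the consequence (a self-referencing object array is simply invisible to the collector):

import gc
import numpy as np

a = np.empty(1, dtype=object)
a[0] = a                # a reference cycle through an object array
print gc.is_tracked(a)  # False: without tp_traverse the GC cannot see into it
del a
gc.collect()            # the cycle is unreachable now, but will never be freed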
-- Robert Kern From josef.pktd at gmail.com Mon Sep 15 08:08:08 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Sep 2014 08:08:08 -0400 Subject: [Numpy-discussion] Linear algebra functions on empty arrays In-Reply-To: <1410780365.8320.9.camel@sebastian-t440> References: <1410774528.7696.9.camel@sebastian-t440> <1410780365.8320.9.camel@sebastian-t440> Message-ID: On Mon, Sep 15, 2014 at 7:26 AM, Sebastian Berg wrote: > On Mo, 2014-09-15 at 07:07 -0400, josef.pktd at gmail.com wrote: >> On Mon, Sep 15, 2014 at 5:48 AM, Sebastian Berg >> wrote: >> > Hey all, >> > >> > for https://github.com/numpy/numpy/pull/3861/files I would like to allow >> > 0-sized dimensions for generalized ufuncs, meaning that the gufunc has >> > to be able to handle this, but also that it *can* handle it at all. >> > However lapack does not support this, so it needs some explicit fixing. >> > Also some of the linalg functions currently explicitly allow and others >> > explicitly disallow empty arrays. >> > >> > For example the QR and eigvals does not allow it, but on the other hand >> > solve explicitly does (most probably never did, simply because lapack >> > does not). So I am wondering if there is some convention for this, or >> > what convention we should implement. >> >> What does an empty square matrix/array look like? >> >> np.linalg.solve can have empty rhs, but shape of empty lhs, `a`, is ? >> >> If I do a QR(arr) with arr.shape=(0, 5), what is R supposed to be ? >> > > QR may be more difficult since R may itself could not be empty, begging > the question if you want to error out or fill it sensibly. I shouldn't have tried it again (I got this a few times last week): >>> ze = np.ones((z.shape[1], 0)) >>> np.linalg.qr(ze) ** On entry to DGEQRF parameter number 7 had an illegal value crash z.shape[1] is 3 >>> np.__version__ '1.6.1' I think, I would prefer an exception if the output would require a empty square matrix with shape > (0, 0) I don't see any useful fill value. > Cholesky would require (0, 0) for example and for eigenvalues it would > somewhat make sense too, the (0, 0) matrix has 0 eigenvalues. > I did not go through them all, but I would like to figure out whether we > should aim to generally allow it, or maybe just allow it for some > special ones. If the return square array has shape (0, 0), then it would make sense, but I haven't run into a case for it yet. np.cholesky(np.ones((0, 0))) ? (I didn't try since my interpreter is crashed. :) Josef > > - Sebastian > >> >> I just wrote some loops over linalg.qr, but I always initialized explicitly. >> >> I didn't manage to figure out how empty arrays would be useful. >> >> If an empty square matrix can only only be of shape (0, 0), then it's >> no use (in my applications). 
>> >> >> Josef >> >> >> > >> > Regards, >> > >> > Sebastian >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ben.root at ou.edu Mon Sep 15 09:50:51 2014 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 15 Sep 2014 09:50:51 -0400 Subject: [Numpy-discussion] Question about broadcasting vs for loop performance In-Reply-To: References: Message-ID: Broadcasting, by itself, should not be creating large arrays in memory. It uses stride tricks to make the array appear larger, while simply reusing the same memory block. This is why it is so valuable because it doesn't make a copy. Now, what may be happening is that the resulting calculation from the broadcasted arrays is too large to easily fit into the cpu cache, so the subsequent summation might be hitting performance penalties for that. Essentially, your first example may be a poor-man's implementation of data chunking. I bet if you ran these performance metrics over a wide range of sizes, you will see some interesting results. Cheers! Ben Root On Sun, Sep 14, 2014 at 10:53 PM, Ryan Nelson wrote: > I think I figured out my own question. I guess that the broadcasting > approach is generating a very large 2D array in memory, which takes a bit > of extra time. I gathered this from reading the last example on the > following site: > http://wiki.scipy.org/EricsBroadcastingDoc > I tried this again with a much smaller "xs" array (~100 points) and the > broadcasting version was much faster. > Thanks > > Ryan > > Note: The link to the Scipy wiki page above is broken at the bottom of > Numpy's broadcasting page, otherwise I would have seen that earlier. Sorry > for the noise. > > On Sun, Sep 14, 2014 at 10:22 PM, Ryan Nelson > wrote: > >> Hello all, >> >> I have a question about the performance of broadcasting versus Python for >> loops. I have the following sample code that approximates some simulation >> I'd like to do: >> >> ## Test Code ## >> >> import numpy as np >> >> >> def lorentz(x, pos, inten, hwhm): >> >> return inten*( hwhm**2 / ( (x - pos)**2 + hwhm**2 ) ) >> >> >> poss = np.random.rand(100) >> >> intens = np.random.rand(100) >> >> xs = np.linspace(0,10,10000) >> >> >> def first_try(): >> >> sim_inten = np.zeros(xs.shape) >> >> for freq, inten in zip(poss, intens): >> >> sim_inten += lorentz(xs, freq, inten, 5.0) >> >> return sim_inten >> >> >> def second_try(): >> >> sim_inten2 = lorentz(xs.reshape((-1,1)), poss, intens, 5.0) >> >> sim_inten2 = sim_inten2.sum(axis=1) >> >> return sim_inten2 >> >> >> print np.array_equal(first_try(), second_try()) >> >> >> ## End Test ## >> >> >> Running this script prints "True" for the final equality test. However, >> IPython's %timeit magic, gives ~10 ms for first_try and ~30 ms for >> second_try. I tried this on Windows 7 (Anaconda Python) and on a Linux >> machine both with Python 2.7 and Numpy 1.8.2. >> >> >> I understand in principle why broadcasting should be faster than Python >> loops, but I'm wondering why I'm getting worse results with the pure Numpy >> function. 
Is there some general rules for when broadcasting might give >> worse performance than a Python loop? >> >> >> Thanks >> >> >> Ryan >> >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Sep 16 15:27:32 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 16 Sep 2014 13:27:32 -0600 Subject: [Numpy-discussion] Is this a bug? Message-ID: Hi All, It turns out that gufuncs will broadcast the last dimension if it is one. For instance, inner1d has signature `(n), (n) -> ()`, yet In [27]: inner1d([1,1,1], [1]) Out[27]: 3 In [28]: inner1d([1,1,1], [1,1]) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 inner1d([1,1,1], [1,1]) ValueError: inner1d: Operand 1 has a mismatch in its core dimension 0, with gufunc signature (i),(i)->() (size 2 is different from 3) I'd think this is a bug, as the dimensions should match. Note that scalar 1 will be promoted to [1] in this case. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Sep 16 15:39:01 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 16 Sep 2014 13:39:01 -0600 Subject: [Numpy-discussion] Is this a bug? In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 1:27 PM, Charles R Harris wrote: > Hi All, > > It turns out that gufuncs will broadcast the last dimension if it is one. > For instance, inner1d has signature `(n), (n) -> ()`, yet > > In [27]: inner1d([1,1,1], [1]) > Out[27]: 3 > > In [28]: inner1d([1,1,1], [1,1]) > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > in () > ----> 1 inner1d([1,1,1], [1,1]) > > ValueError: inner1d: Operand 1 has a mismatch in its core dimension 0, > with gufunc signature (i),(i)->() (size 2 is different from 3) > > > I'd think this is a bug, as the dimensions should match. Note that scalar > 1 will be promoted to [1] in this case. > > Thoughts? > > This also holds for matrix_multiply In [33]: matrix_multiply(eye(3), [[1]]) Out[33]: array([[ 1.], [ 1.], [ 1.]]) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Sep 16 15:42:43 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Sep 2014 15:42:43 -0400 Subject: [Numpy-discussion] Is this a bug? In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 3:27 PM, Charles R Harris wrote: > Hi All, > > It turns out that gufuncs will broadcast the last dimension if it is one. > For instance, inner1d has signature `(n), (n) -> ()`, yet > > In [27]: inner1d([1,1,1], [1]) > Out[27]: 3 Yes, this looks totally wrong to me too... broadcasting is a feature of auto-vectorizing a core operation over a set of dimensions, it shouldn't be applied to the dimensions of the core operation itself like this. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From josef.pktd at gmail.com Tue Sep 16 15:55:41 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 16 Sep 2014 15:55:41 -0400 Subject: [Numpy-discussion] Is this a bug? 
In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 3:42 PM, Nathaniel Smith wrote: > On Tue, Sep 16, 2014 at 3:27 PM, Charles R Harris > wrote: >> Hi All, >> >> It turns out that gufuncs will broadcast the last dimension if it is one. >> For instance, inner1d has signature `(n), (n) -> ()`, yet >> >> In [27]: inner1d([1,1,1], [1]) >> Out[27]: 3 > > Yes, this looks totally wrong to me too... broadcasting is a feature > of auto-vectorizing a core operation over a set of dimensions, it > shouldn't be applied to the dimensions of the core operation itself > like this. Are these functions doing any numerical shortcuts in this case? If yes, this would be convenient. inner1d(x, weights) with weights is either (n, ) or () if weights == 1: return x.sum() else: return inner1d(x, weights) Josef > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Tue Sep 16 16:04:08 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Sep 2014 16:04:08 -0400 Subject: [Numpy-discussion] Linear algebra functions on empty arrays In-Reply-To: <1410774528.7696.9.camel@sebastian-t440> References: <1410774528.7696.9.camel@sebastian-t440> Message-ID: On 15 Sep 2014 05:49, "Sebastian Berg" wrote: > For example the QR and eigvals does not allow it, but on the other hand > solve explicitly does (most probably never did, simply because lapack > does not). So I am wondering if there is some convention for this, or > what convention we should implement. To me the obvious convention would be that whenever there's a unique obvious answer that satisfies the operation's invariants, then we should prefer to implement it (though possibly with low priority), even if this means papering over lapack edge cases. This is consistent with how e.g. we already define sum([]) and prod([]) and empty matrix products, etc. Of course this requires some thinking... e.g. the empty matrix is a null matrix, b/c given empty_vec = np.ones((0,)) empty_mat = np.ones((0, 0)) then we have empty_vec @ empty_mat @ empty_vec = empty_vec @ empty_vec = sum([]) = 0 and therefore empty_mat is not positive definite. np.linalg.cholesky raises an error on non-positive-definite matrices in general (e.g. try np.linalg.cholesky(np.zeros((1, 1)))), so I guess cholesky shouldn't handle empty matrices. For eigvals, I guess empty_mat @ empty_vec = empty_vec, meaning that empty_vec is a arguably an eigenvector with some indeterminate eigenvalue? Or maybe the fact that scalar * empty_vec = empty_vec for ever scalar means that empty_vec should be counted as a zero vector, and thus be ineligible to be an eigenvector. Saying that the empty matrix has zero eigenvectors or eigenvalues seems pretty intuitive. I don't see any trouble with defining qr for empty matrices either. -n From charlesr.harris at gmail.com Tue Sep 16 16:10:33 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 16 Sep 2014 14:10:33 -0600 Subject: [Numpy-discussion] Is this a bug? In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 1:55 PM, wrote: > On Tue, Sep 16, 2014 at 3:42 PM, Nathaniel Smith wrote: > > On Tue, Sep 16, 2014 at 3:27 PM, Charles R Harris > > wrote: > >> Hi All, > >> > >> It turns out that gufuncs will broadcast the last dimension if it is > one. 
> >> For instance, inner1d has signature `(n), (n) -> ()`, yet > >> > >> In [27]: inner1d([1,1,1], [1]) > >> Out[27]: 3 > > > > Yes, this looks totally wrong to me too... broadcasting is a feature > > of auto-vectorizing a core operation over a set of dimensions, it > > shouldn't be applied to the dimensions of the core operation itself > > like this. > > Are these functions doing any numerical shortcuts in this case? > > If yes, this would be convenient. > > inner1d(x, weights) with weights is either (n, ) or () > > if weights == 1: > return x.sum() > else: > return inner1d(x, weights) > > That depends on the inner inner loop ;) Currently inner1d inner loop multiplies and adds so not as efficient as a sum in the scalar case. However, it is probably faster than an if statement. In [4]: timeit inner1d(a, 1) 10000 loops, best of 3: 56.4 ?s per loop In [5]: timeit a.sum() 10000 loops, best of 3: 48.3 ?s per loop Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Sep 16 16:26:41 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Sep 2014 16:26:41 -0400 Subject: [Numpy-discussion] Is this a bug? In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 3:55 PM, wrote: > On Tue, Sep 16, 2014 at 3:42 PM, Nathaniel Smith wrote: >> On Tue, Sep 16, 2014 at 3:27 PM, Charles R Harris >> wrote: >>> Hi All, >>> >>> It turns out that gufuncs will broadcast the last dimension if it is one. >>> For instance, inner1d has signature `(n), (n) -> ()`, yet >>> >>> In [27]: inner1d([1,1,1], [1]) >>> Out[27]: 3 >> >> Yes, this looks totally wrong to me too... broadcasting is a feature >> of auto-vectorizing a core operation over a set of dimensions, it >> shouldn't be applied to the dimensions of the core operation itself >> like this. > > Are these functions doing any numerical shortcuts in this case? > > If yes, this would be convenient. > > inner1d(x, weights) with weights is either (n, ) or () > > if weights == 1: > return x.sum() > else: > return inner1d(x, weights) Yes, if this is the behaviour you want then I think you should write this if statement :-). This case isn't general enough to build directly into inner1d IMHO. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From jaime.frio at gmail.com Tue Sep 16 16:31:59 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 16 Sep 2014 13:31:59 -0700 Subject: [Numpy-discussion] Is this a bug? In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 12:27 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > Hi All, > > It turns out that gufuncs will broadcast the last dimension if it is one. > For instance, inner1d has signature `(n), (n) -> ()`, yet > > In [27]: inner1d([1,1,1], [1]) > Out[27]: 3 > > In [28]: inner1d([1,1,1], [1,1]) > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > in () > ----> 1 inner1d([1,1,1], [1,1]) > > ValueError: inner1d: Operand 1 has a mismatch in its core dimension 0, > with gufunc signature (i),(i)->() (size 2 is different from 3) > > > I'd think this is a bug, as the dimensions should match. Note that scalar > 1 will be promoted to [1] in this case. > > Thoughts? 
> If it is a bug, it is an extended one, because it is the same behavior of einsum: >>> np.einsum('i,i', [1,1,1], [1]) 3 >>> np.einsum('i,i', [1,1,1], [1,1]) Traceback (most recent call last): File "", line 1, in ValueError: operands could not be broadcast together with remapped shapes [origi nal->remapped]: (3,)->(3,) (2,)->(2,) And I think it is a conscious design decision, there is a comment about broadcasting missing core dimensions here: https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1940 and the code makes it very explicit that input argument dimensions with the same label are broadcast to a common shape, see here: https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1956 I kind of expect numpy to broadcast whenever possible, so this doesn't feel wrong to me. That said, it is hard to come up with convincing examples of how this behavior would be useful in any practical context. But changing something that has been working like that for so long seems like a risky thing. And I cannot come with a convincing example of why it would be harmful either. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Sep 16 16:51:35 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Sep 2014 16:51:35 -0400 Subject: [Numpy-discussion] Is this a bug? In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 4:31 PM, Jaime Fern?ndez del R?o wrote: > If it is a bug, it is an extended one, because it is the same behavior of > einsum: > >>>> np.einsum('i,i', [1,1,1], [1]) > 3 >>>> np.einsum('i,i', [1,1,1], [1,1]) > Traceback (most recent call last): > File "", line 1, in > ValueError: operands could not be broadcast together with remapped shapes > [origi > nal->remapped]: (3,)->(3,) (2,)->(2,) > > And I think it is a conscious design decision, there is a comment about > broadcasting missing core dimensions here: > > https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1940 "intentional" and "sensible" are not always the same thing :-). That said, it isn't totally obvious to me what the correct behaviour for einsum is in this case. > and the code makes it very explicit that input argument dimensions with the > same label are broadcast to a common shape, see here: > > https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1956 > > I kind of expect numpy to broadcast whenever possible, so this doesn't feel > wrong to me. The case Chuck is talking about is like if we allowed matrix multiplication between an array with shape (n, 1) with an array with (k, m), because (n, 1) can be broadcast to (n, k). This feels VERY wrong to me, will certainly hide many bugs, and is definitely not how it works right now (for np.dot, anyway; apparently it does work that way for the brand-new gufunc np.linalg.matrix_multiply, but this must be an accident). > That said, it is hard to come up with convincing examples of how this > behavior would be useful in any practical context. But changing something > that has been working like that for so long seems like a risky thing. And I > cannot come with a convincing example of why it would be harmful either. gufuncs are very new. -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Tue Sep 16 18:26:23 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 16 Sep 2014 16:26:23 -0600 Subject: [Numpy-discussion] Is this a bug? In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 2:51 PM, Nathaniel Smith wrote: > On Tue, Sep 16, 2014 at 4:31 PM, Jaime Fern?ndez del R?o > wrote: > > If it is a bug, it is an extended one, because it is the same behavior of > > einsum: > > > >>>> np.einsum('i,i', [1,1,1], [1]) > > 3 > >>>> np.einsum('i,i', [1,1,1], [1,1]) > > Traceback (most recent call last): > > File "", line 1, in > > ValueError: operands could not be broadcast together with remapped shapes > > [origi > > nal->remapped]: (3,)->(3,) (2,)->(2,) > > > > And I think it is a conscious design decision, there is a comment about > > broadcasting missing core dimensions here: > > > > > https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1940 > > "intentional" and "sensible" are not always the same thing :-). That > said, it isn't totally obvious to me what the correct behaviour for > einsum is in this case. > > > and the code makes it very explicit that input argument dimensions with > the > > same label are broadcast to a common shape, see here: > > > > > https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1956 > > > > I kind of expect numpy to broadcast whenever possible, so this doesn't > feel > > wrong to me. > > The case Chuck is talking about is like if we allowed matrix > multiplication between an array with shape (n, 1) with an array with > (k, m), because (n, 1) can be broadcast to (n, k). This feels VERY > wrong to me, will certainly hide many bugs, and is definitely not how > it works right now (for np.dot, anyway; apparently it does work that > way for the brand-new gufunc np.linalg.matrix_multiply, but this must > be an accident). > > > That said, it is hard to come up with convincing examples of how this > > behavior would be useful in any practical context. But changing something > > that has been working like that for so long seems like a risky thing. > And I > > cannot come with a convincing example of why it would be harmful either. > > gufuncs are very new. > > Or at least newly used. They've been sitting around for years with little use and less testing. This is probably (easily?) fixable as the shape of the operands is available. In [22]: [d.shape for d in nditer([[1,1,1], [[1,1,1]]*3]).operands] Out[22]: [(3,), (3, 3)] In [23]: [d.shape for d in nditer([[[1,1,1]], [[1,1,1]]*3]).operands] Out[23]: [(1, 3), (3, 3)] Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Tue Sep 16 18:56:47 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 16 Sep 2014 15:56:47 -0700 Subject: [Numpy-discussion] Is this a bug? 
In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 3:26 PM, Charles R Harris wrote: > > > On Tue, Sep 16, 2014 at 2:51 PM, Nathaniel Smith wrote: > >> On Tue, Sep 16, 2014 at 4:31 PM, Jaime Fern?ndez del R?o >> wrote: >> > If it is a bug, it is an extended one, because it is the same behavior >> of >> > einsum: >> > >> >>>> np.einsum('i,i', [1,1,1], [1]) >> > 3 >> >>>> np.einsum('i,i', [1,1,1], [1,1]) >> > Traceback (most recent call last): >> > File "", line 1, in >> > ValueError: operands could not be broadcast together with remapped >> shapes >> > [origi >> > nal->remapped]: (3,)->(3,) (2,)->(2,) >> > >> > And I think it is a conscious design decision, there is a comment about >> > broadcasting missing core dimensions here: >> > >> > >> https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1940 >> >> "intentional" and "sensible" are not always the same thing :-). That >> said, it isn't totally obvious to me what the correct behaviour for >> einsum is in this case. >> >> > and the code makes it very explicit that input argument dimensions with >> the >> > same label are broadcast to a common shape, see here: >> > >> > >> https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1956 >> > >> > I kind of expect numpy to broadcast whenever possible, so this doesn't >> feel >> > wrong to me. >> >> The case Chuck is talking about is like if we allowed matrix >> multiplication between an array with shape (n, 1) with an array with >> (k, m), because (n, 1) can be broadcast to (n, k). This feels VERY >> wrong to me, will certainly hide many bugs, and is definitely not how >> it works right now (for np.dot, anyway; apparently it does work that >> way for the brand-new gufunc np.linalg.matrix_multiply, but this must >> be an accident). >> >> > That said, it is hard to come up with convincing examples of how this >> > behavior would be useful in any practical context. But changing >> something >> > that has been working like that for so long seems like a risky thing. >> And I >> > cannot come with a convincing example of why it would be harmful either. >> >> gufuncs are very new. >> >> > Or at least newly used. They've been sitting around for years with little > use and less testing. This is probably (easily?) fixable as the shape of > the operands is available. > > In [22]: [d.shape for d in nditer([[1,1,1], [[1,1,1]]*3]).operands] > Out[22]: [(3,), (3, 3)] > > In [23]: [d.shape for d in nditer([[[1,1,1]], [[1,1,1]]*3]).operands] > Out[23]: [(1, 3), (3, 3)] > > If we agree that it is broken, which I still am not fully sure of, then yes, it is very easy to fix. I have been looking into that code quite a bit lately, so I could patch something up pretty quick. Are we OK with the appending of size 1 dimensions to complete the core dimensions? That is, should matrix_multiply([1,1,1], [[1],[1],[1]]) work, or should it complain about the first argument having less dimensions than the core dimensions in the signature? Lastly, there is an interesting side effect of the way this broadcasting is handled: if a gufunc specifies a core dimension in an output argument only, and an `out` kwarg is not passed in, then the output array will have that core dimension set to be of size 1, e.g. if the signature of `f` is '(),()->(a)', then f(1, 2).shape is (1,). 
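In python terms, the resolution amounts to something like this (a hand-wavy sketch of what the C code does, not the actual implementation):

def resolve_core_dim(sizes, label, op_dim):
    # `sizes` maps signature labels to the sizes resolved so far
    bound = sizes.get(label)
    if bound is None or bound == 1:
        sizes[label] = op_dim    # first occurrence binds the label
    elif op_dim != 1 and op_dim != bound:
        raise ValueError("core dimension mismatch")
    # an op_dim of 1 is silently broadcast up to the bound size, and an
    # output-only label that never gets bound defaults to size 1, which
    # is where the f(1, 2).shape == (1,) above comes from

sizes = {}
resolve_core_dim(sizes, 'i', 3)  # inner1d's first operand binds i=3
resolve_core_dim(sizes, 'i', 1)  # a second operand of size 1 slips through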
This has always felt funny to me, and I think that an unspecified dimension in an output array should either be specified by a passed out array, or raise an error about an unspecified core dimension or something like that. Does this sound right? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewm at redtetrahedron.org Tue Sep 16 19:03:27 2014 From: ewm at redtetrahedron.org (Eric Moore) Date: Tue, 16 Sep 2014 19:03:27 -0400 Subject: [Numpy-discussion] Is this a bug? In-Reply-To: References: Message-ID: On Tuesday, September 16, 2014, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Sep 16, 2014 at 3:26 PM, Charles R Harris < > charlesr.harris at gmail.com > > wrote: > >> >> >> On Tue, Sep 16, 2014 at 2:51 PM, Nathaniel Smith > > wrote: >> >>> On Tue, Sep 16, 2014 at 4:31 PM, Jaime Fern?ndez del R?o >>> >> > wrote: >>> > If it is a bug, it is an extended one, because it is the same behavior >>> of >>> > einsum: >>> > >>> >>>> np.einsum('i,i', [1,1,1], [1]) >>> > 3 >>> >>>> np.einsum('i,i', [1,1,1], [1,1]) >>> > Traceback (most recent call last): >>> > File "", line 1, in >>> > ValueError: operands could not be broadcast together with remapped >>> shapes >>> > [origi >>> > nal->remapped]: (3,)->(3,) (2,)->(2,) >>> > >>> > And I think it is a conscious design decision, there is a comment about >>> > broadcasting missing core dimensions here: >>> > >>> > >>> https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1940 >>> >>> "intentional" and "sensible" are not always the same thing :-). That >>> said, it isn't totally obvious to me what the correct behaviour for >>> einsum is in this case. >>> >>> > and the code makes it very explicit that input argument dimensions >>> with the >>> > same label are broadcast to a common shape, see here: >>> > >>> > >>> https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1956 >>> > >>> > I kind of expect numpy to broadcast whenever possible, so this doesn't >>> feel >>> > wrong to me. >>> >>> The case Chuck is talking about is like if we allowed matrix >>> multiplication between an array with shape (n, 1) with an array with >>> (k, m), because (n, 1) can be broadcast to (n, k). This feels VERY >>> wrong to me, will certainly hide many bugs, and is definitely not how >>> it works right now (for np.dot, anyway; apparently it does work that >>> way for the brand-new gufunc np.linalg.matrix_multiply, but this must >>> be an accident). >>> >>> > That said, it is hard to come up with convincing examples of how this >>> > behavior would be useful in any practical context. But changing >>> something >>> > that has been working like that for so long seems like a risky thing. >>> And I >>> > cannot come with a convincing example of why it would be harmful >>> either. >>> >>> gufuncs are very new. >>> >>> >> Or at least newly used. They've been sitting around for years with little >> use and less testing. This is probably (easily?) fixable as the shape of >> the operands is available. >> >> In [22]: [d.shape for d in nditer([[1,1,1], [[1,1,1]]*3]).operands] >> Out[22]: [(3,), (3, 3)] >> >> In [23]: [d.shape for d in nditer([[[1,1,1]], [[1,1,1]]*3]).operands] >> Out[23]: [(1, 3), (3, 3)] >> >> > If we agree that it is broken, which I still am not fully sure of, then > yes, it is very easy to fix. 
I have been looking into that code quite a bit > lately, so I could patch something up pretty quick. > > Are we OK with the appending of size 1 dimensions to complete the core > dimensions? That is, should matrix_multiply([1,1,1], [[1],[1],[1]]) work, > or should it complain about the first argument having less dimensions than > the core dimensions in the signature? > > Lastly, there is an interesting side effect of the way this broadcasting > is handled: if a gufunc specifies a core dimension in an output argument > only, and an `out` kwarg is not passed in, then the output array will have > that core dimension set to be of size 1, e.g. if the signature of `f` is > '(),()->(a)', then f(1, 2).shape is (1,). This has always felt funny to me, > and I think that an unspecified dimension in an output array should either > be specified by a passed out array, or raise an error about an unspecified > core dimension or something like that. Does this sound right? > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > Given this and the earlier discussion about improvements to this code, I wonder if it wouldn't be worth implemented the logic in python first. This way there is something to test against, and something to play while all of the cases are sorted out. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Sep 16 19:07:02 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 16 Sep 2014 17:07:02 -0600 Subject: [Numpy-discussion] Is this a bug? In-Reply-To: References: Message-ID: On Tue, Sep 16, 2014 at 4:56 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Sep 16, 2014 at 3:26 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, Sep 16, 2014 at 2:51 PM, Nathaniel Smith wrote: >> >>> On Tue, Sep 16, 2014 at 4:31 PM, Jaime Fern?ndez del R?o >>> wrote: >>> > If it is a bug, it is an extended one, because it is the same behavior >>> of >>> > einsum: >>> > >>> >>>> np.einsum('i,i', [1,1,1], [1]) >>> > 3 >>> >>>> np.einsum('i,i', [1,1,1], [1,1]) >>> > Traceback (most recent call last): >>> > File "", line 1, in >>> > ValueError: operands could not be broadcast together with remapped >>> shapes >>> > [origi >>> > nal->remapped]: (3,)->(3,) (2,)->(2,) >>> > >>> > And I think it is a conscious design decision, there is a comment about >>> > broadcasting missing core dimensions here: >>> > >>> > >>> https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1940 >>> >>> "intentional" and "sensible" are not always the same thing :-). That >>> said, it isn't totally obvious to me what the correct behaviour for >>> einsum is in this case. >>> >>> > and the code makes it very explicit that input argument dimensions >>> with the >>> > same label are broadcast to a common shape, see here: >>> > >>> > >>> https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c#L1956 >>> > >>> > I kind of expect numpy to broadcast whenever possible, so this doesn't >>> feel >>> > wrong to me. >>> >>> The case Chuck is talking about is like if we allowed matrix >>> multiplication between an array with shape (n, 1) with an array with >>> (k, m), because (n, 1) can be broadcast to (n, k). 
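Eric's prototype idea is attractive because the core-dimension matching rules are easy to state in pure Python. A toy sketch of what such a prototype could look like (match_signature is a hypothetical helper; strict matching only, with broadcasting of the loop dimensions deliberately left out):

    def match_signature(sig, shapes):
        """Map core-dimension labels in `sig`, e.g. '(i),(i)->()', to sizes,
        raising on mismatch instead of silently broadcasting."""
        ins, _, out = sig.partition('->')
        cores = [c.strip('()').split(',') if c.strip('()') else []
                 for c in ins.split('),(')]
        sizes = {}
        for core, shape in zip(cores, shapes):
            if len(core) > len(shape):
                raise ValueError("operand has fewer dimensions than its core")
            # core dimensions are the trailing axes of each operand
            for label, n in zip(core, shape[len(shape) - len(core):]):
                if sizes.setdefault(label, n) != n:
                    raise ValueError("core dimension %r mismatch" % label)
        return sizes

    >>> match_signature('(i),(i)->()', [(2, 3), (3,)])   # loop dim 2, core i = 3
    {'i': 3}
    >>> match_signature('(i),(i)->()', [(3,), (1,)])     # no core broadcasting
    Traceback (most recent call last):
      ...
    ValueError: core dimension 'i' mismatch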
From charlesr.harris at gmail.com Tue Sep 16 19:07:02 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 16 Sep 2014 17:07:02 -0600
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: Message-ID:

On Tue, Sep 16, 2014 at 4:56 PM, Jaime Fernández del Río <jaime.frio at gmail.com> wrote:

> [...]
>
> If we agree that it is broken, which I still am not fully sure of, then
> yes, it is very easy to fix. I have been looking into that code quite a bit
> lately, so I could patch something up pretty quick.

That would be nice... I've been starting to look through the code and didn't relish it.

> Are we OK with the appending of size 1 dimensions to complete the core
> dimensions? That is, should matrix_multiply([1,1,1], [[1],[1],[1]]) work,
> or should it complain about the first argument having fewer dimensions than
> the core dimensions in the signature?

Yes, I think we need to keep that part. It is even essential ;)

> Lastly, there is an interesting side effect of the way this broadcasting
> is handled: if a gufunc specifies a core dimension in an output argument
> only, and an `out` kwarg is not passed in, then the output array will have
> that core dimension set to be of size 1 [...] Does this sound right?

Uh, I need to get my head around that before commenting.

Chuck

From charlesr.harris at gmail.com Tue Sep 16 19:11:07 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 16 Sep 2014 17:11:07 -0600
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: Message-ID:

On Tue, Sep 16, 2014 at 5:03 PM, Eric Moore wrote:

> [...]
>
> Given this and the earlier discussion about improvements to this code, I
> wonder if it wouldn't be worth implementing the logic in python first. This
> way there is something to test against, and something to play with while
> all of the cases are sorted out.

I've got a couple of generalized functions whose tests turned this up. Speaking of which, they are tentatively named

mulvecvec
mulvecmat
mulmatvec
mulmatmat

and work on stacked matrices and vectors. I can see using 'dot' instead of 'mul', and any other suggestions would be welcome. I've also made it easier to specify generalized functions in the code generator, but given the multiple loops, I haven't settled on a good way of using generic loops.

Chuck
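For reference, Chuck's four tentative loops correspond to einsum calls like the following on stacked operands (einsum is used here purely to illustrate the intended signatures; the mulvec*/mulmat* gufuncs are his work-in-progress names and exist in no released numpy):

    import numpy as np

    a = np.ones((10, 3))       # a stack of ten 3-vectors
    B = np.ones((10, 3, 4))    # a stack of ten 3x4 matrices
    C = np.ones((10, 4, 3))    # a stack of ten 4x3 matrices

    np.einsum('...i,...i->...', a, a)        # mulvecvec, signature (i),(i)->()
    np.einsum('...i,...ij->...j', a, B)      # mulvecmat, signature (i),(i,j)->(j)
    np.einsum('...ij,...j->...i', C, a)      # mulmatvec, signature (i,j),(j)->(i)
    np.einsum('...ij,...jk->...ik', B, C)    # mulmatmat, signature (i,j),(j,k)->(i,k)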
From njs at pobox.com Tue Sep 16 19:32:32 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 16 Sep 2014 19:32:32 -0400
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: Message-ID:

On Tue, Sep 16, 2014 at 6:56 PM, Jaime Fernández del Río wrote:

> [...]
>
> Are we OK with the appending of size 1 dimensions to complete the core
> dimensions? That is, should matrix_multiply([1,1,1], [[1],[1],[1]]) work, or
> should it complain about the first argument having fewer dimensions than the
> core dimensions in the signature?

I think that by default, gufuncs should definitely *not* allow this.

Example case 1: qr can be applied equally well to a (1, n) array or an (n, 1) array, but with different results. If the user passes in an (n,) array, then how do we know which one they wanted?

Example case 2: matrix multiplication, as you know :-), is a case where I do think we should allow for a bit more cleverness with the core dimensions... but the appropriate cleverness is much more subtle than just "prepend size 1 dimensions until things fit". Instead, for the first argument you need to prepend, for the second argument you need to append, and then you need to remove the corresponding dimensions from the output. Specific cases:

# Your version gives:
matmul([1, 1, 1], [[1], [1], [1]]).shape == (1, 1)
# But this should be (1,) (try it with np.dot)

# Your version gives:
matmul([[1, 1, 1]], [1, 1, 1]) -> error, (1, 3) and (1, 3) are not conformable
# But this should work (second argument should be treated as (3, 1), not (1, 3))

So the default should be to be strict about core dimensions, unless explicitly requested otherwise by the person defining the gufunc.

> Lastly, there is an interesting side effect of the way this broadcasting is
> handled: if a gufunc specifies a core dimension in an output argument only,
> and an `out` kwarg is not passed in, then the output array will have that
> core dimension set to be of size 1 [...] Does this sound right?

Does this have any use cases? My vote is that we simply disallow this until we have concrete uses and can decide how to do it properly. That way there won't be any backcompat concerns to deal with later.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
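Nathaniel's prepend/append rule can be pinned down in a few lines of Python. A rough model for 1-d and 2-d operands (matmul_fixup is a hypothetical helper; np.dot stands in for the '(m,n),(n,p)->(m,p)' core loop):

    import numpy as np

    def matmul_fixup(a, b):
        # Prepend a size-1 core dimension to a 1-d first argument, append
        # one to a 1-d second argument, then drop them from the output.
        a, b = np.asarray(a), np.asarray(b)
        a_vec, b_vec = a.ndim == 1, b.ndim == 1
        if a_vec:
            a = a[np.newaxis, :]          # (n,) -> (1, n)
        if b_vec:
            b = b[:, np.newaxis]          # (n,) -> (n, 1)
        out = np.dot(a, b)
        if b_vec:
            out = out[..., 0]             # remove the appended dimension
        if a_vec:                         # remove the prepended dimension
            out = out[..., 0] if b_vec else out[..., 0, :]
        return out

    # matmul_fixup([1, 1, 1], [[1], [1], [1]]).shape == (1,), matching np.dot
    # matmul_fixup([[1, 1, 1]], [1, 1, 1]).shape == (1,), matching np.dot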
From jaime.frio at gmail.com Tue Sep 16 20:31:01 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Tue, 16 Sep 2014 17:31:01 -0700
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: Message-ID:

On Tue, Sep 16, 2014 at 4:32 PM, Nathaniel Smith wrote:

> [...]
>
> I think that by default, gufuncs should definitely *not* allow this.

Too late! ;-)

I just put together some working code and sent a PR implementing the behavior that Charles asked for:

https://github.com/numpy/numpy/pull/5077

Should we keep the discussion here, or take it over there?

Jaime

> [...]

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

From njs at pobox.com Tue Sep 16 21:06:56 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 16 Sep 2014 21:06:56 -0400
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: Message-ID:

On Tue, Sep 16, 2014 at 8:31 PM, Jaime Fernández del Río wrote:

> [...]
>
> I just put together some working code and sent a PR implementing the
> behavior that Charles asked for:
>
> https://github.com/numpy/numpy/pull/5077
>
> Should we keep the discussion here, or take it over there?

I guess the default is, design discussions here where people can chime in, code finickiness over there to avoid boring people? So that would suggest keeping the discussion here until we've resolved the high-level debate about what the behaviour should even be. But it isn't a huge issue either way...

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From jaime.frio at gmail.com Wed Sep 17 01:51:10 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Tue, 16 Sep 2014 22:51:10 -0700
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: Message-ID:

On Tue, Sep 16, 2014 at 4:32 PM, Nathaniel Smith wrote:

> [...]
>
> So the default should be to be strict about core dimensions, unless
> explicitly requested otherwise by the person defining the gufunc.

#5077 now implements this behavior, which I agree is a sensible thing to do. And it doesn't seem very likely that anyone (numpy tests aside!) is expecting the old behavior to hold.

> > Lastly, there is an interesting side effect of the way this broadcasting is
> > handled [...]
>
> Does this have any use cases? My vote is that we simply disallow this
> until we have concrete uses and can decide how to do it properly. That
> way there won't be any backcompat concerns to deal with later.

The obvious example is "compute all pairwise distances." You can define a gufunc with signature (n,d)->(m) that does that, and wrap it in a Python function that makes sure that it always gets called with an array with m = n * (n - 1) for the last dimension of the out parameter. If you do not specify the out array, it will create one with m = 1, or with the implementation in #5077, m = -1, which I suppose will fail badly with a cryptic, apparently unrelated error.

I don't think either of these has any practical use, and think there are only three sensible options here:

1. Raise an error if a core dimension wasn't specified by the passed arrays (in the face of ambiguity...)
2. Do not allow dimensions in the output arrays that are not also in one of the input arrays.
3. Provide a method for the gufunc to decide on its own what m should be based on the other core dimensions.

I like the idea of 3, but we need a lot of discussion on whether it really is a good idea, and what the best mechanism to implement it is. I think 2 would be a mistake, unless something like 3 was in place, for being too restrictive. And I see 1 as the quickest short term solution to fix something that is broken, or would be if gufuncs were being more extensively used out there.

Jaime

From sebastian at sipsolutions.net Wed Sep 17 04:30:30 2014
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 17 Sep 2014 10:30:30 +0200
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: Message-ID: <1410942630.19651.1.camel@sebastian-t440>

On Di, 2014-09-16 at 16:51 -0400, Nathaniel Smith wrote:

> [...]
>
> The case Chuck is talking about is like if we allowed matrix
> multiplication between an array with shape (n, 1) with an array with
> (k, m), because (n, 1) can be broadcast to (n, k). This feels VERY
> wrong to me, will certainly hide many bugs, and is definitely not how
> it works right now (for np.dot, anyway; apparently it does work that
> way for the brand-new gufunc np.linalg.matrix_multiply, but this must
> be an accident).

Agreed, the only argument to not change it right away would be being afraid of breaking user code abusing the kind of thing Josef mentioned.

- Sebastian

> [...]
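To make the pairwise-distance case concrete, the wrapper Jaime describes might look like this (pairwise_gufunc is hypothetical, so a slow pure-Python fallback is included to make the sketch runnable; m = n*(n-1) counts ordered pairs):

    import numpy as np

    def pairwise_distances(x, pairwise_gufunc=None):
        # The Python layer computes m = n*(n-1) and allocates the out array,
        # because a '(n,d)->(m)' gufunc has no way to derive m from n itself.
        x = np.asarray(x, dtype=float)
        n = x.shape[-2]
        out = np.empty(x.shape[:-2] + (n * (n - 1),))
        if pairwise_gufunc is not None:        # hypothetical gufunc path
            return pairwise_gufunc(x, out=out)
        k = 0                                  # fallback: all ordered pairs
        for i in range(n):
            for j in range(n):
                if i != j:
                    out[..., k] = np.sqrt(((x[..., i, :] - x[..., j, :]) ** 2).sum(-1))
                    k += 1
        return out

    # pairwise_distances(np.random.rand(10, 5, 3)).shape == (10, 20)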
From charlesr.harris at gmail.com Wed Sep 17 08:33:26 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 17 Sep 2014 06:33:26 -0600
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: <1410942630.19651.1.camel@sebastian-t440> References: <1410942630.19651.1.camel@sebastian-t440> Message-ID:

On Wed, Sep 17, 2014 at 2:30 AM, Sebastian Berg wrote:

> [...]
>
> Agreed, the only argument to not change it right away would be being
> afraid of breaking user code abusing the kind of thing Josef mentioned.

It *is* a big change. I think of the gufuncs as working with matrices and vectors, the array version of the matrix class. In that case the signature shapes must be preserved and there should be no broadcasting within the signature. The matrix class itself fails in that regard:

In [1]: matrix(eye(3)) + matrix([[1,1,1]])
Out[1]:
matrix([[ 2., 1., 1.],
        [ 1., 2., 1.],
        [ 1., 1., 2.]])

Which is all wrong for a matrix type.

It would also be nice if the order could be made part of the signature, as DGEMM and friends like one of the argument axes to be contiguous, but I don't see a clean way to do that. The gufuncs do have an order parameter which should probably default to 'C' if the arrays/vectors are stacked. I think the default is currently 'K'. Hmm, we could make 'K' refer to the last one or two dimensions in the inputs. OTOH, that isn't needed for types not handled by BLAS. Or it could be handled in the inner loops.

Chuck

From sebastian at sipsolutions.net Wed Sep 17 08:48:14 2014
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 17 Sep 2014 14:48:14 +0200
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: <1410942630.19651.1.camel@sebastian-t440> Message-ID: <1410958094.21667.4.camel@sebastian-t440>

On Mi, 2014-09-17 at 06:33 -0600, Charles R Harris wrote:

> [...]
>
> It would also be nice if the order could be made part of the signature,
> as DGEMM and friends like one of the argument axes to be contiguous,
> but I don't see a clean way to do that. [...]

This is a different discussion, right? It would be nice to have an order flag for the core dimensions. The gufunc itself should not care at all about the outer ones. All the orders for the core dimensions would probably be nice, including no contiguity being enforced (or actually, maybe we can define 'K' to mean that in this context). To be honest, if 'K' means that, it seems like a decent default.

- Sebastian

From charlesr.harris at gmail.com Wed Sep 17 08:57:49 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 17 Sep 2014 06:57:49 -0600
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: <1410958094.21667.4.camel@sebastian-t440> References: <1410942630.19651.1.camel@sebastian-t440> <1410958094.21667.4.camel@sebastian-t440> Message-ID:

On Wed, Sep 17, 2014 at 6:48 AM, Sebastian Berg wrote:

> This is a different discussion, right? It would be nice to have an order
> flag for the core dimensions. The gufunc itself should not care at all
> about the outer ones.

Right. It is possible to check all these things in the loop, but the loop code grows...

> All the orders for the core dimensions would probably be nice, including
> no contiguity being enforced (or actually, maybe we can define 'K' to
> mean that in this context). To be honest, if 'K' means that, it seems
> like a decent default.

With regards to the main topic, we could extend the signature notation, using `[...]` instead of `(...)` for the new behavior.

Chuck

From charlesr.harris at gmail.com Wed Sep 17 16:27:30 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 17 Sep 2014 14:27:30 -0600
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: <1410942630.19651.1.camel@sebastian-t440> <1410958094.21667.4.camel@sebastian-t440> Message-ID:

On Wed, Sep 17, 2014 at 6:57 AM, Charles R Harris wrote:

> [...]
>
> With regards to the main topic, we could extend the signature notation,
> using `[...]` instead of `(...)` for the new behavior.
Or we could add a new function, PyUFunc_StrictGeneralizedFunction, with the new behavior. That might be the safe way to go.

Chuck

From jaime.frio at gmail.com Wed Sep 17 17:01:02 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Wed, 17 Sep 2014 14:01:02 -0700
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: <1410942630.19651.1.camel@sebastian-t440> <1410958094.21667.4.camel@sebastian-t440> Message-ID:

On Wed, Sep 17, 2014 at 1:27 PM, Charles R Harris wrote:

> [...]
>
> Or we could add a new function, PyUFunc_StrictGeneralizedFunction, with
> the new behavior. That might be the safe way to go.

That sounds good to me, the current flow is that 'ufunc_generic_call', which is the function in the tp_call slot of the PyUFunc object, calls 'PyUFunc_GenericFunction', which will call 'PyUFunc_GeneralizedFunction' if the 'core_enabled' member variable is set to 1. We could have a new 'PyUFunc_StrictFromFuncAndDataAndSignature' that sets the 'core_enabled' variable to e.g. 2, and then dispatch on this value in 'PyUFunc_GenericFunction' to the new 'PyUFunc_StrictGeneralizedFunction'.

This will also give us a better sandbox to experiment with all the other enhancements we have been talking about: frozen dimensions, optional dimensions, computed dimensions...

I am guessing we still want to deprecate the old behavior in the next release and remove it entirely in a couple more, right?

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

From charlesr.harris at gmail.com Wed Sep 17 17:29:21 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 17 Sep 2014 15:29:21 -0600
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: <1410942630.19651.1.camel@sebastian-t440> <1410958094.21667.4.camel@sebastian-t440> Message-ID:

On Wed, Sep 17, 2014 at 3:01 PM, Jaime Fernández del Río wrote:

> [...]
>
> This will also give us a better sandbox to experiment with all the other
> enhancements we have been talking about: frozen dimensions, optional
> dimensions, computed dimensions...

That sounds good, it is cleaner than the other solutions. The new constructor will need to be in the interface and the interface version updated.

> I am guessing we still want to deprecate the old behavior in the next
> release and remove it entirely in a couple more, right?

Don't know. It is in the interface, so might want to just deprecate it and leave it laying around. Could maybe add an argument to the new constructor that sets the `core_enabled` value so we don't need to keep adding new functions to the api. If so, should probably be an enum in the include file so valid values get passed.

Chuck

From charlesr.harris at gmail.com Wed Sep 17 17:37:34 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 17 Sep 2014 15:37:34 -0600
Subject: [Numpy-discussion] Is this a bug?
In-Reply-To: References: <1410942630.19651.1.camel@sebastian-t440> <1410958094.21667.4.camel@sebastian-t440> Message-ID:

On Wed, Sep 17, 2014 at 3:29 PM, Charles R Harris wrote:

> [...]
>
> Could maybe add an argument to the new constructor that sets the
> `core_enabled` value so we don't need to keep adding new functions to the
> api. If so, should probably be an enum in the include file so valid values
> get passed.

And then Ufunc in the code_generator could be modified to take both utype (core_enabled value) and signature and use the new constructor.

Chuck

From encukou at gmail.com Thu Sep 18 12:31:23 2014
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 18 Sep 2014 18:31:23 +0200
Subject: [Numpy-discussion] (no subject)
Message-ID:

Hello,
Over at Python-ideas, there is a thread [0] about the following discrepancy:

>>> numpy.array(float('inf')) // 1
inf
>>> float('inf') // 1
nan

There are reasons for either result, but I believe it would be very nice if either Python or Numpy changed, so they would give the same value. If any of you have reasons to defend Numpy's (or Python's) choice, or otherwise want to chime in, please post there.

Thanks,
Petr Viktorin

[0] https://mail.python.org/pipermail/python-ideas/2014-September/029365.html

From encukou at gmail.com Thu Sep 18 12:34:23 2014
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 18 Sep 2014 18:34:23 +0200
Subject: [Numpy-discussion] float('inf') // 1 = ?
Message-ID:

(Apologies for the lack of subject earlier)

Hello,
Over at Python-ideas, there is a thread [0] about the following discrepancy:

>>> numpy.array(float('inf')) // 1
inf
>>> float('inf') // 1
nan

There are reasons for either result, but I believe it would be very nice if either Python or Numpy changed, so they would give the same value.
If any of you have reasons to defend Numpy's (or Python's) choice, or otherwise want to chime in, please post there. Thanks, Petr Viktorin [0] https://mail.python.org/pipermail/python-ideas/2014-September/029365.html From ben.root at ou.edu Thu Sep 18 12:48:30 2014 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 18 Sep 2014 12:48:30 -0400 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: Message-ID: My vote is that NumPy is correct here. I see no reason why

>>> float('inf') / 1
and
>>> float('inf') // 1

should return different results. Ben Root On Thu, Sep 18, 2014 at 12:31 PM, Petr Viktorin wrote: > Hello, > Over at Python-ideas, there is a thread [0] about the following > discrepancy: > > >>> numpy.array(float('inf')) // 1 > inf > >>> float('inf') // 1 > nan > > There are reasons for either result, but I believe it would be very > nice if either Python or Numpy changed, so they would give the same > value. > If any of you have reasons to defend Numpy's (or Python's) choice, or > otherwise want to chime in, please post there. > > Thanks, > Petr Viktorin > > > [0] > https://mail.python.org/pipermail/python-ideas/2014-September/029365.html > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From encukou at gmail.com Thu Sep 18 12:55:08 2014 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 18 Sep 2014 18:55:08 +0200 Subject: [Numpy-discussion] float('inf') // 1 = ? Message-ID: Sorry for the lack of subject before. On Thu, Sep 18, 2014 at 6:48 PM, Benjamin Root wrote: > My vote is that NumPy is correct here. I see no reason why >>>> float('inf') / 1 > and >>>> float('inf') // 1 > > should return different results. I recommend reading the python-ideas thread; there are some arguments for both sides. > On Thu, Sep 18, 2014 at 12:31 PM, Petr Viktorin wrote: >> >> Hello, >> Over at Python-ideas, there is a thread [0] about the following >> discrepancy: >> >> >>> numpy.array(float('inf')) // 1 >> inf >> >>> float('inf') // 1 >> nan >> >> There are reasons for either result, but I believe it would be very >> nice if either Python or Numpy changed, so they would give the same >> value. >> If any of you have reasons to defend Numpy's (or Python's) choice, or >> otherwise want to chime in, please post there. >> >> Thanks, >> Petr Viktorin >> >> >> [0] >> https://mail.python.org/pipermail/python-ideas/2014-September/029365.html >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Thu Sep 18 13:01:38 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 18 Sep 2014 10:01:38 -0700 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: Message-ID: Well, First of all, numpy and the python math module have a number of differences when it comes to handling these kinds of special cases -- and I think that: 1) numpy needs to do what makes the most sense for numpy and NOT mirror the math lib. 2) the use-cases of the math lib and numpy are different, so they maybe _should_ have different handling of this kind of thing.
3) I'm not sure that the core devs think these kinds of issues are "wrong" enough to break backward compatibility in subtle ways. But it's a fun topic in any case, and maybe numpy's behavior could be improved. > My vote is that NumPy is correct here. I see no reason why > >>> float('inf') / 1 > and > >>> float('inf') // 1 > > should return different results. > Well, one argument is that "floor division" is supposed to return an integer value, and that inf is NOT an integer value. The integral part of infinity doesn't exist and thus is Not a Number. You also get some weird edge cases around the mod operator. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Sep 18 13:13:58 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 18 Sep 2014 19:13:58 +0200 Subject: [Numpy-discussion] float('inf') // 1 = ? In-Reply-To: References: Message-ID: <1411060438.3474.2.camel@sebastian-laptop> On Thu, 2014-09-18 at 18:55 +0200, Petr Viktorin wrote: > Sorry for the lack of subject before. > > On Thu, Sep 18, 2014 at 6:48 PM, Benjamin Root wrote: > > My vote is that NumPy is correct here. I see no reason why > >>>> float('inf') / 1 > > and > >>>> float('inf') // 1 > > Didn't read python ideas I have to admit (at least not enough to see many arguments). My biggest argument is that numpy does exactly this:

>>> import math
>>> math.floor(float('inf')/1)

Maybe I am naive in thinking that floordivide is basically a divide+floor operation, but arguing for NaN seems half arsed to me on first sight. Either it is `inf` and the equivalence Benjamin Root noted holds or you make it an error. NaN is not reasonable for an error return *especially* not for the standard lib (it *might* be for numpy). - Sebastian > > should return different results. > > I recommend reading the python-ideas thread; there are some arguments > for both sides. > > > On Thu, Sep 18, 2014 at 12:31 PM, Petr Viktorin wrote: > >> > >> Hello, > >> Over at Python-ideas, there is a thread [0] about the following > >> discrepancy: > >> > >> >>> numpy.array(float('inf')) // 1 > >> inf > >> >>> float('inf') // 1 > >> nan > >> > >> There are reasons for either result, but I believe it would be very > >> nice if either Python or Numpy changed, so they would give the same > >> value. > >> If any of you have reasons to defend Numpy's (or Python's) choice, or > >> otherwise want to chime in, please post there.
> >> > >> Thanks, > >> Petr Viktorin > >> > >> > >> [0] > >> https://mail.python.org/pipermail/python-ideas/2014-September/029365.html > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jjhelmus at gmail.com Thu Sep 18 13:14:51 2014 From: jjhelmus at gmail.com (Jonathan Helmus) Date: Thu, 18 Sep 2014 12:14:51 -0500 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: Message-ID: <541B130B.8030307@gmail.com> On 09/18/2014 12:01 PM, Chris Barker wrote: > Well, > > First of all, numpy and the python math module have a number of > differences when it comes to handling these kinds of special cases -- > and I think that: > > 1) numpy needs to do what makes the most sense for numpy and NOT > mirror the math lib. > > 2) the use-cases of the math lib and numpy are different, so they > maybe _should_ have different handling of this kind of thing. > > 3) I'm not sure that the core devs think these kinds of issues are > "wrong" enough to break backward compatibility in subtle ways. > > But it's a fun topic in any case, and maybe numpy's behavior could be > improved. > > My vote is that NumPy is correct here. I see no reason why > >>> float('inf') / 1 > and > >>> float('inf') // 1 > > should return different results. > > > Well, one argument is that "floor division" is supposed to return an > integer value, and that inf is NOT an integer value. The integral part > of infinity doesn't exist and thus is Not a Number. > But nan is not an integer value either:

>>> float('inf') // 1
nan
>>> int(float('inf') // 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot convert float NaN to integer

Perhaps float('inf') // 1 should raise a ValueError directly since there is no proper way to perform the floor division on infinity. - Jonathan Helmus > You also get some weird edge cases around the mod operator. > > -Chris > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From encukou at gmail.com Thu Sep 18 13:44:09 2014 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 18 Sep 2014 19:44:09 +0200 Subject: [Numpy-discussion] (no subject) In-Reply-To: <541B130B.8030307@gmail.com> References: <541B130B.8030307@gmail.com> Message-ID: On Thu, Sep 18, 2014 at 7:14 PM, Jonathan Helmus wrote: > On 09/18/2014 12:01 PM, Chris Barker wrote: > > Well, > > First of all, numpy and the python math module have a number of differences > when it comes to handling these kinds of special cases -- and I think that: > > 1) numpy needs to do what makes the most sense for numpy and NOT mirror the > math lib. Sure.
> 2) the use-cases of the math lib and numpy are different, so they maybe > _should_ have different handling of this kind of thing. If you have a reason for the difference, I'd like to hear it. > 3) I'm not sure that the core devs think these kinds of issues are "wrong" > enough to break backward compatibility in subtle ways. I'd be perfectly fine with it being documented and tested (in CPython) as either a design mistake or design choice. > But it's a fun topic in any case, and maybe numpy's behavior could be > improved. >> >> My vote is that NumPy is correct here. I see no reason why >> >>> float('inf') / 1 >> and >> >>> float('inf') // 1 >> >> should return different results. > > > Well, one argument is that "floor division" is supposed to return an integer > value, and that inf is NOT an integer value. The integral part of infinity > doesn't exist and thus is Not a Number. > > > But nan is not an integer value either: > >>>> float('inf') // 1 > nan >>>> int(float('inf') // 1) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > ValueError: cannot convert float NaN to integer > > Perhaps float('inf') // 1 should raise a ValueError directly since there is > no proper way to perform the floor division on infinity. inf is not even a *real* number; a lot of operations don't make mathematical sense on it. But most are defined anyway, and quite sanely. From chris.barker at noaa.gov Thu Sep 18 14:40:28 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 18 Sep 2014 11:40:28 -0700 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: <541B130B.8030307@gmail.com> Message-ID: On Thu, Sep 18, 2014 at 10:44 AM, Petr Viktorin wrote: > > 2) the use-cases of the math lib and numpy are different, so they maybe > > _should_ have different handling of this kind of thing. > > If you have a reason for the difference, I'd like to hear it. For one, numpy does array operations, and you really don't want a ValueError (or any Exception) raised when perhaps only one value in a huge array has an issue. The other is that numpy users are potentially more sophisticated with regard to numeric computing issues, and in any case, need to prioritize different things -- like performance over safety. > > But nan is not an integer value either: > I meant conceptually. Sure -- it's not any number at all -- a NaN can be arrived at many ways, all it means is something happened for which there was not an appropriate numerical answer -- even inf or -inf. So, the question is: is the integer part of inf infinity? Or is it undefined, and therefore NaN? I can't imagine a use case where it would matter, which is probably why numpy returns inf. > Perhaps float('inf') // 1 should raise a ValueError directly since there > is > > no proper way to perform the floor division on infinity. > not in numpy for sure -- but I don't see the point in the math lib either, let the NaN propagate and deal with it later if you need to -- that's what they are for. - Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Sep 18 14:41:49 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 18 Sep 2014 20:41:49 +0200 Subject: [Numpy-discussion] float('inf') // 1 = ?
In-Reply-To: <1411060438.3474.2.camel@sebastian-laptop> References: <1411060438.3474.2.camel@sebastian-laptop> Message-ID: <1411065709.19963.12.camel@sebastian-t440> On Do, 2014-09-18 at 19:13 +0200, Sebastian Berg wrote: > On Thu, 2014-09-18 at 18:55 +0200, Petr Viktorin wrote: > > Sorry for the lack of subject before. > > > > On Thu, Sep 18, 2014 at 6:48 PM, Benjamin Root wrote: > > > My vote is that NumPy is correct here. I see no reason why > > >>>> float('inf') / 1 > > > and > > >>>> float('inf') // 1 > > > > > Didn't read python ideas I have to admit (at least not enough to see > many arguments). My biggest argument is that numpy does exactly this: > > >>> import math > >>> math.floor(float('inf')/1) > > Maybe I am naive in thinking that floordivide is basically a divide+floor > operation, but arguing for NaN seems half arsed to me on first sight. > Either it is `inf` and the equivalence Benjamin Root noted holds or you > make it an error. NaN is not reasonable for an error return *especially* > not for the standard lib (it *might* be for numpy). > Ok, sorry, that was too quick. Since we already have a "special" value like Inf, it is OK to not give an error in the standard lib, since Inf-Inf is also NaN, etc. However, I read the arguments as "Inf is not an integral number", and I frankly don't understand that at all. Infinity isn't a real number either!? If you represent the result as an IEEE float (integral or not) you can use Inf as a valid result IMO. Arguably all of the limits:

floor(float) -> Inf
float // 1 -> Inf

exist for float -> Inf and can be represented. Also IEEE seems to define the floor operation like this and I don't see a reason to violate `a//b == floor(a/b)`. As long as the result is represented as an IEEE floating point, which knows Infinity, I argue that it is correct. If it is not an IEEE floating point, it is an error in any case. - Sebastian > - Sebastian > > > > should return different results. > > > > I recommend reading the python-ideas thread; there are some arguments > > for both sides. > > > > > On Thu, Sep 18, 2014 at 12:31 PM, Petr Viktorin wrote: > > >> > > >> Hello, > > >> Over at Python-ideas, there is a thread [0] about the following > > >> discrepancy: > > >> > > >> >>> numpy.array(float('inf')) // 1 > > >> inf > > >> >>> float('inf') // 1 > > >> nan > > >> > > >> There are reasons for either result, but I believe it would be very > > >> nice if either Python or Numpy changed, so they would give the same > > >> value. > > >> If any of you have reasons to defend Numpy's (or Python's) choice, or > > >> otherwise want to chime in, please post there.
> > >> > > >> Thanks, > > >> Petr Viktorin > > >> > > >> > > >> [0] > > >> https://mail.python.org/pipermail/python-ideas/2014-September/029365.html > > >> _______________________________________________ > > >> NumPy-Discussion mailing list > > >> NumPy-Discussion at scipy.org > > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From jjhelmus at gmail.com Thu Sep 18 15:30:33 2014 From: jjhelmus at gmail.com (Jonathan Helmus) Date: Thu, 18 Sep 2014 14:30:33 -0500 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: <541B130B.8030307@gmail.com> Message-ID: <541B32D9.6070501@gmail.com> On 09/18/2014 12:44 PM, Petr Viktorin wrote: > On Thu, Sep 18, 2014 at 7:14 PM, Jonathan Helmus wrote: >> On 09/18/2014 12:01 PM, Chris Barker wrote: >> >> Well, >> >> First of all, numpy and the python math module have a number of differences >> when it comes to handling these kinds of special cases -- and I think that: >> >> 1) numpy needs to do what makes the most sense for numpy and NOT mirror the >> math lib. > Sure. > >> 2) the use-cases of the math lib and numpy are different, so they maybe >> _should_ have different handling of this kind of thing. > If you have a reason for the difference, I'd like to hear it. > >> 3) I'm not sure that the core devs think these kinds of issues are "wrong" >> enough to break backward compatibility in subtle ways. > I'd be perfectly fine with it being documented and tested (in CPython) > as either a design mistake or design choice. > >> But it's a fun topic in any case, and maybe numpy's behavior could be >> improved. >>> My vote is that NumPy is correct here. I see no reason why >>>>>> float('inf') / 1 >>> and >>>>>> float('inf') // 1 >>> should return different results. >> >> Well, one argument is that "floor division" is supposed to return an integer >> value, and that inf is NOT an integer value. The integral part of infinity >> doesn't exist and thus is Not a Number. >> >> >> But nan is not an integer value either: >> >>>>> float('inf') // 1 >> nan >>>>> int(float('inf') // 1) >> Traceback (most recent call last): >> File "<stdin>", line 1, in <module> >> ValueError: cannot convert float NaN to integer >> >> Perhaps float('inf') // 1 should raise a ValueError directly since there is >> no proper way to perform the floor division on infinity. > inf is not even a *real* number; a lot of operations don't make > mathematical sense on it. But most are defined anyway, and quite > sanely. But in IEEE-754 inf is a valid floating point number (whereas NaN is not) and has well-defined arithmetic, specifically inf / 1 == inf and RoundToIntegral(inf) == inf. In the numpy example, the numpy.array(float('inf')) statement creates an array containing a float32 or float64 representation of inf.
After this I would expect a floor division to return inf since that is what IEEE-754 arithmetic specifies. For me the question is if the floor division should also perform a cast to an integer type. Since inf cannot be represented in most common integer formats this should raise an exception. Since // does not normally perform a cast, for example type(float(5) // 2) == float, the point is moot. The real question is if Python floats follow IEEE-754 arithmetic or not. If they do not, what standard are they going to follow? - Jonathan Helmus From insertinterestingnamehere at gmail.com Thu Sep 18 15:46:08 2014 From: insertinterestingnamehere at gmail.com (Ian Henriksen) Date: Thu, 18 Sep 2014 13:46:08 -0600 Subject: [Numpy-discussion] (no subject) In-Reply-To: <541B32D9.6070501@gmail.com> References: <541B130B.8030307@gmail.com> <541B32D9.6070501@gmail.com> Message-ID: On Thu, Sep 18, 2014 at 1:30 PM, Jonathan Helmus wrote: > On 09/18/2014 12:44 PM, Petr Viktorin wrote: > > On Thu, Sep 18, 2014 at 7:14 PM, Jonathan Helmus > wrote: > >> On 09/18/2014 12:01 PM, Chris Barker wrote: > >> > >> Well, > >> > >> First of all, numpy and the python math module have a number of > differences > >> when it comes to handling these kinds of special cases -- and I think > that: > >> > >> 1) numpy needs to do what makes the most sense for numpy and NOT mirror > the > >> math lib. > > Sure. > > > >> 2) the use-cases of the math lib and numpy are different, so they maybe > >> _should_ have different handling of this kind of thing. > > If you have a reason for the difference, I'd like to hear it. > > > >> 3) I'm not sure that the core devs think these kinds of issues are > "wrong" > >> enough to break backward compatibility in subtle ways. > > I'd be perfectly fine with it being documented and tested (in CPython) > > as either a design mistake or design choice. > > > >> But it's a fun topic in any case, and maybe numpy's behavior could be > >> improved. > >>> My vote is that NumPy is correct here. I see no reason why > >>>>>> float('inf') / 1 > >>> and > >>>>>> float('inf') // 1 > >>> should return different results. > >> > >> Well, one argument is that "floor division" is supposed to return an > integer > >> value, and that inf is NOT an integer value. The integral part of > infinity > >> doesn't exist and thus is Not a Number. > >> > >> > >> But nan is not an integer value either: > >> > >>>>> float('inf') // 1 > >> nan > >>>>> int(float('inf') // 1) > >> Traceback (most recent call last): > >> File "<stdin>", line 1, in <module> > >> ValueError: cannot convert float NaN to integer > >> > >> Perhaps float('inf') // 1 should raise a ValueError directly since > there is > >> no proper way to perform the floor division on infinity. > > inf is not even a *real* number; a lot of operations don't make > > mathematical sense on it. But most are defined anyway, and quite > > sanely. > > But in IEEE-754 inf is a valid floating point number (whereas NaN is > not) and has well-defined arithmetic, specifically inf / 1 == inf and > RoundToIntegral(inf) == inf. In the numpy example, the > numpy.array(float('inf')) statement creates an array containing a > float32 or float64 representation of inf. After this I would expect a > floor division to return inf since that is what IEEE-754 arithmetic > specifies. > > For me the question is if the floor division should also perform a cast > to an integer type. Since inf cannot be represented in most common > integer formats this should raise an exception.
Since // does not > normally perform a cast, for example type(float(5) // 2) == float, the > point is moot. > > The real question is if Python floats follow IEEE-754 arithmetic or > not. If they do not, what standard are they going to follow? > > - Jonathan Helmus > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Agreed. It's definitely best to follow the IEEE conventions. That will be the most commonly expected behavior, especially in ambiguous cases like this. -Ian Henriksen -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmhobson at gmail.com Fri Sep 19 10:47:46 2014 From: pmhobson at gmail.com (Paul Hobson) Date: Fri, 19 Sep 2014 07:47:46 -0700 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: Hey Ben, Side note: I've had to do the same thing for stitching curvilinear model grid coordinates together. Using pandas DataFrames indexed by `i` and `j` is really good for this. You can offset the indices directly, unstack the DF, and pandas will align for you. Happy to send an example along if you're curious. -p On Mon, Sep 8, 2014 at 9:55 AM, Benjamin Root wrote: > A use case would be "image stitching" or even data tiling. I have had to > implement something like this at work (so, I can't share it, unfortunately) > and it even goes so far as to allow the caller to specify how much the > tiles can overlap and such. The specification is ungodly hideous and I > doubt I would be willing to share it even if I could lest I release > code-thulu upon the world... > > I think just having this generalized stack feature would be a nice start. > Tetris could be built on top of that later. (Although, I do vote for at > least 3 or 4 dimensional stacking, if possible). > > Cheers! > Ben Root > > > On Mon, Sep 8, 2014 at 12:41 PM, Eelco Hoogendoorn < > hoogendoorn.eelco at gmail.com> wrote: > >> Sturla: im not sure if the intention is always unambiguous, for such more >> flexible arrangements. >> >> Also, I doubt such situations arise often in practice; if the arrays arnt >> a grid, they are probably a nested grid, and the code would most naturally >> concatenate them with nested calls to a stacking function. >> >> However, some form of nd-stack function would be neat in my opinion. >> >> On Mon, Sep 8, 2014 at 6:10 PM, Jaime Fernández del Río < >> jaime.frio at gmail.com> wrote: >> >>> On Mon, Sep 8, 2014 at 7:41 AM, Sturla Molden >>> wrote: >>> >>>> Stefan Otte wrote: >>>> >>>> > stack([[a, b], [c, d]]) >>>> > >>>> > In my case `stack` replaced `hstack` and `vstack` almost completely. >>>> > >>>> > If you're interested in including it in numpy I created a pull request >>>> > [1]. I'm looking forward to getting some feedback! >>>> >>>> As far as I can see, it uses hstack and vstack. But that means a and b >>>> have >>>> to have the same number of rows, c and d must have the same number of >>>> rows, >>>> and hstack((a,b)) and hstack((c,d)) must have the same number of >>>> columns. >>>> >>>> Thus it requires a regularity like this:
>>>>
>>>> AAAABB
>>>> AAAABB
>>>> CCCDDD
>>>> CCCDDD
>>>> CCCDDD
>>>> CCCDDD
>>>>
>>>> What if we just ignore this constraint, and only require the output to >>>> be >>>> rectangular?
Now we have a 'tetris game':
>>>>
>>>> AAAABB
>>>> AAAABB
>>>> CCCCBB
>>>> CCCCBB
>>>> CCCCDD
>>>> CCCCDD
>>>>
>>>> or
>>>>
>>>> AAAABB
>>>> AAAABB
>>>> CCCCBB
>>>> CCCCBB
>>>> CCCCBB
>>>> CCCCBB
>>>>
>>>> This should be 'stackable', yes? Or perhaps we need another stacking >>>> function for this, say numpy.tetris? >>>> >>>> And while we're at it, what about higher dimensions? should there be an >>>> ndstack function too? >>>> >>> >>> This is starting to look like the second time in a row Stefan tries to >>> extend numpy with a simple convenience function, and he gets tricked into >>> implementing some sophisticated algorithm... >>> >>> For his next PR I expect nothing less than an NP-complete problem. ;-) >>> >>> >>>> Jaime >>>
>>> --
>>> (\__/)
>>> ( O.o)
>>> ( > <) This is Conejo. Copy Conejo into your signature and help him in his plans for world domination.
>>>
>>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From demitri.muna at gmail.com Sun Sep 21 17:10:03 2014 From: demitri.muna at gmail.com (Demitri Muna) Date: Sun, 21 Sep 2014 17:10:03 -0400 Subject: [Numpy-discussion] Numpy 'None' comparison FutureWarning Message-ID: Hi, I just encountered the following in my code: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. I'm very concerned about this. This is a very common programming pattern (lazy loading):

class A(object):
    def __init__(self):
        self._some_array = None

    @property
    def some_array(self):
        if self._some_array == None:
            # perform some expensive setup of array
        return self._some_array

It seems to me that the new behavior will break this pattern. I think that redefining the "==" operator is a little too aggressive here. It strikes me as very nonstandard and not at all obvious to someone reading the code that the comparison is a very special case for numpy objects. Unless there's some aspect I'm missing here, I think an element-wise comparator should be more explicit. Cheers, Demitri _________________________________________ Demitri Muna Department of Astronomy The Ohio State University http://trillianverse.org http://scicoder.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sun Sep 21 17:19:56 2014 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 21 Sep 2014 11:19:56 -1000 Subject: [Numpy-discussion] Numpy 'None' comparison FutureWarning In-Reply-To: References: Message-ID: <541F40FC.6010105@hawaii.edu> On 2014/09/21, 11:10 AM, Demitri Muna wrote: > Hi, > > I just encountered the following in my code: > > FutureWarning: comparison to `None` will result in an elementwise object > comparison in the future. > > I'm very concerned about this.
This is a very common programming pattern > (lazy loading):
>
> class A(object):
>     def __init__(self):
>         self._some_array = None
>
>     @property
>     def some_array(self):
>         if self._some_array == None:
>             # perform some expensive setup of array
>         return self._some_array
>
> It seems to me that the new behavior will break this pattern. I think > that redefining the "==" operator is a little too aggressive here. It > strikes me as very nonstandard and not at all obvious to someone reading > the code that the comparison is a very special case for numpy objects. > Unless there's some aspect I'm missing here, I think an element-wise > comparator should be more explicit. I think what you are missing is that the standard Python idiom for this use case is "if self._some_array is None:". This will continue to work, regardless of whether the object being checked is an ndarray or any other Python object. Eric > > Cheers, > Demitri > > _________________________________________ > Demitri Muna > > Department of Astronomy > The Ohio State University > > http://trillianverse.org > http://scicoder.org > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From shoyer at gmail.com Sun Sep 21 19:50:12 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 21 Sep 2014 16:50:12 -0700 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type Message-ID: pandas has some hacks to support custom types of data that numpy can't handle well enough or at all. Examples include datetime and Categorical [1], and others like GeoArray [2] that haven't made it into pandas yet. Most of these look like numpy arrays but with custom dtypes and type-specific methods/properties. But clearly nobody is particularly excited about writing the C necessary to implement custom dtypes [3]. Nor do we need the ndarray ABI. In many cases, writing C may not actually even be necessary for performance reasons, e.g., categorical can be fast enough just by wrapping an integer ndarray for the internal storage and using vectorized operations. And even if it is necessary, I think we'd all rather write Cython than C. It's great for pandas to write its own ndarray-like wrappers (*not* subclasses) that work with pandas, but it's a shame that there isn't a standard interface like the ndarray to make these arrays useable for the rest of the scientific Python ecosystem. For example, pandas has loads of fixes for np.datetime64, but nobody seems to be up for porting them to numpy (I doubt it would be easy). I know these sorts of concerns are not new, but I wish I had a sense of what the solution looks like. Is anyone actively working on these issues? Does the fix belong in numpy, pandas, blaze or a new project? I'd love to get a sense of where things stand and how I could help -- without writing any C :). Thanks, Stephan [1] https://github.com/pydata/pandas/pull/7217 [2] https://github.com/geopandas/geopandas/issues/166 [3] https://github.com/numpy/numpy-dtypes -------------- next part -------------- An HTML attachment was scrubbed...
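To make the "wrapping an integer ndarray" idea concrete, here is a rough sketch of what such a categorical could look like. This is an invented minimal class, not pandas' actual Categorical (which also tracks ordering, missing values, and much more):

import numpy as np

class SimpleCategorical(object):
    """Sketch: distinct values stored once, one integer code per element."""

    def __init__(self, values):
        # a single np.unique call yields both the sorted category labels
        # and the integer code of every element
        self.categories, self.codes = np.unique(values, return_inverse=True)

    def __eq__(self, value):
        # vectorized equality: look up the category's code once,
        # then compare small integers elementwise
        idx = np.searchsorted(self.categories, value)
        if idx == len(self.categories) or self.categories[idx] != value:
            return np.zeros(self.codes.shape, dtype=bool)
        return self.codes == idx

    def value_counts(self):
        # counting codes is just an integer histogram
        return np.bincount(self.codes, minlength=len(self.categories))

All the per-element work happens inside numpy's existing C loops, which is the sense in which no new C would be required.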
URL: From charlesr.harris at gmail.com Sun Sep 21 20:13:39 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 21 Sep 2014 18:13:39 -0600 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: On Sun, Sep 21, 2014 at 5:50 PM, Stephan Hoyer wrote: > pandas has some hacks to support custom types of data that numpy > can't handle well enough or at all. Examples include datetime and > Categorical [1], and others like GeoArray [2] that haven't made it into > pandas yet. > > Most of these look like numpy arrays but with custom dtypes and > type-specific methods/properties. But clearly nobody is particularly excited > about writing the C necessary to implement custom dtypes [3]. Nor do > we need the ndarray ABI. > > In many cases, writing C may not actually even be necessary for > performance reasons, e.g., categorical can be fast enough just by wrapping > an integer ndarray for the internal storage and using vectorized > operations. And even if it is necessary, I think we'd all rather write > Cython than C. > > It's great for pandas to write its own ndarray-like wrappers (*not* > subclasses) that work with pandas, but it's a shame that there isn't a > standard interface like the ndarray to make these arrays useable for the > rest of the scientific Python ecosystem. For example, pandas has loads of > fixes for np.datetime64, but nobody seems to be up for porting them to > numpy (I doubt it would be easy). > > I know these sorts of concerns are not new, but I wish I had a sense of > what the solution looks like. Is anyone actively working on these issues? > Does the fix belong in numpy, pandas, blaze or a new project? I'd love to > get a sense of where things stand and how I could help -- without writing > any C :). > > I haven't thought much about this myself, but others (Nathaniel?) have, and it would be good to explore the topic and maybe put together some examples/templates to make this approach easier. Input from someone with some experience would be *much* appreciated. The datetime problem persists and I've been thinking it would be nice to replace the current implementation with something simpler that can be stolen from elsewhere. It would be nice to hear how someone else dealt with the problem. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sun Sep 21 21:30:55 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 21 Sep 2014 21:30:55 -0400 Subject: [Numpy-discussion] Numpy 'None' comparison FutureWarning In-Reply-To: <541F40FC.6010105@hawaii.edu> References: <541F40FC.6010105@hawaii.edu> Message-ID: That being said, I do wonder about related situations where the lhs of the equal sign might be an array, or it might be a None and you are comparing against another numpy array. In those situations, you aren't trying to compare against None, you are just checking if two objects are equivalent. When is this change planned? Ben Root On Sun, Sep 21, 2014 at 5:19 PM, Eric Firing wrote: > On 2014/09/21, 11:10 AM, Demitri Muna wrote: > > Hi, > > > > I just encountered the following in my code: > > > > FutureWarning: comparison to `None` will result in an elementwise object > > comparison in the future. > > > > I'm very concerned about this.
This is a very common programming pattern > > (lazy loading): > > > > class A(object): > > def __init__(self): > > self._some_array = None > > > > @property > > def some_array(self): > > if self._some_array == None: > > # perform some expensive setup of array > > return self._some_array > > > > It seems to me that the new behavior will break this pattern. I think > > that redefining the "==" operator is a little too aggressive here. It > > strikes me as very nonstandard and not at all obvious to someone reading > > the code that the comparison is a very special case for numpy objects. > > Unless there's some aspect I'm missing here, I think an element-wise > > comparator should be more explicit. > > > I think what you are missing is that the standard Python idiom for this > use case is "if self._some_array is None:". This will continue to work, > regardless of whether the object being checked is an ndarray or any > other Python object. > > Eric > > > > > Cheers, > > Demitri > > > > _________________________________________ > > Demitri Muna > > > > Department of Astronomy > > Le Ohio State University > > > > http://trillianverse.org > > http://scicoder.org > > > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From demitri.muna at gmail.com Sun Sep 21 22:02:26 2014 From: demitri.muna at gmail.com (Demitri Muna) Date: Sun, 21 Sep 2014 22:02:26 -0400 Subject: [Numpy-discussion] Numpy 'None' comparison FutureWarning In-Reply-To: <541F40FC.6010105@hawaii.edu> References: <541F40FC.6010105@hawaii.edu> Message-ID: On Sep 21, 2014, at 5:19 PM, Eric Firing wrote: > I think what you are missing is that the standard Python idiom for this > use case is "if self._some_array is None:". This will continue to work, > regardless of whether the object being checked is an ndarray or any > other Python object. That's an alternative, but I think it's a subtle distinction that will be lost on many users. I still think that this is something that can easily trip up many people; it's not clear from looking at the code that this is the behavior; it's "hidden". At the very least, I strongly suggest that the warning point this out, e.g. "FutureWarning: comparison to `None` will result in an elementwise object comparison in the future; use 'value is None' as an alternative." Assume: a = np.array([1, 2, 3, 4]) b = np.array([None, None, None, None]) What is the result of "a == None"? Is it "np.array([False, False, False, False])"? What about the second case? Is the result of "b == None" -> np.array([True, True, True, True])? If so, then if (b == None): ... will always evaluate to "True" if b is "None" or *any* Numpy array, and that's clearly unexpected behavior. On Sep 21, 2014, at 9:30 PM, Benjamin Root wrote: > That being said, I do wonder about related situations where the lhs of the equal sign might be an array, or it might be a None and you are comparing against another numpy array. In those situations, you aren't trying to compare against None, you are just checking if two objects are equivalent. Right. 
With this change, using "==" with numpy arrays now sometimes means "are these equivalent" and other times "element-wise comparison". The potential for inadvertent bugs is far greater than what convenience this redefinition of a very basic operator might offer. Any scenario where (a == b) != (b == a) is asking for trouble. Cheers, Demitri _________________________________________ Demitri Muna Department of Astronomy The Ohio State University http://trillianverse.org http://scicoder.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Sep 21 22:53:57 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 21 Sep 2014 22:53:57 -0400 Subject: [Numpy-discussion] Numpy 'None' comparison FutureWarning In-Reply-To: References: <541F40FC.6010105@hawaii.edu> Message-ID: On 22 Sep 2014 03:02, "Demitri Muna" wrote: > > > On Sep 21, 2014, at 5:19 PM, Eric Firing wrote: > >> I think what you are missing is that the standard Python idiom for this >> use case is "if self._some_array is None:". This will continue to work, >> regardless of whether the object being checked is an ndarray or any >> other Python object. > > > That's an alternative, but I think it's a subtle distinction that will be lost on many users. I still think that this is something that can easily trip up many people; it's not clear from looking at the code that this is the behavior; it's "hidden". At the very least, I strongly suggest that the warning point this out, e.g. > > "FutureWarning: comparison to `None` will result in an elementwise object comparison in the future; use 'value is None' as an alternative." Making messages clearer is always welcome, and we devs aren't always in the best position to do so because we're too close to the issues to see which parts are confusing to outsiders - perhaps you'd like to submit a pull request with this? > Assume: > > a = np.array([1, 2, 3, 4]) > b = np.array([None, None, None, None]) > > What is the result of "a == None"? Is it "np.array([False, False, False, False])"? After this change, yes. > What about the second case? Is the result of "b == None" -> np.array([True, True, True, True])? Yes again. (Notice that this is also a subtle and confusing point for many users - how many people realize that if they want to get the latter result they have to write np.equal(b, None)?) > If so, then > > if (b == None): > ... > > will always evaluate to "True" if b is "None" or *any* Numpy array, and that's clearly unexpected behavior. No, that's not how numpy arrays interact with if statements. This is independent of the handling of 'arr == None': 'if multi_element_array' is always an error, because an if statement by definition requires a single true/false decision (it can't execute both branches after all!), but a multi-element array by definition contains multiple values that might have contradictory truthiness. Currently, 'b == x' returns an array in every situation *except* when x happens to be 'None'. After this change, 'b == x' will *always* return an array, so 'if b == x' will always raise an error. > > On Sep 21, 2014, at 9:30 PM, Benjamin Root wrote: > >> That being said, I do wonder about related situations where the lhs of the equal sign might be an array, or it might be a None and you are comparing against another numpy array. In those situations, you aren't trying to compare against None, you are just checking if two objects are equivalent. Benjamin, can you give a more concrete example?
Right now the *only* time == on arrays checks for equivalence is when the object being compared against is None, in which case == pretends to be 'is' because of this mysterious special case. In every other case it does a broadcasted ==, which is very different. > Right. With this change, using "==" with numpy arrays now sometimes means "are these equivalent" and other times "element-wise comparison". Err, you have this backwards :-). Right now == means element-wise comparison except in this one special case, where it doesn't. After the change, it will mean element-wise comparison consistently in all cases. > The potential for inadvertent bugs is far greater than what convenience this redefinition of a very basic operator might offer. Any scenario where > > (a == b) != (b == a) > > is asking for trouble. That would be unfortunate, yes, but fortunately it doesn't apply here :-). 'a == b' and 'b == a' currently always return the same thing, and there are no plans to change this - we'll be changing what both of them mean at the same time. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Sep 21 23:31:30 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 21 Sep 2014 23:31:30 -0400 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: On Sun, Sep 21, 2014 at 7:50 PM, Stephan Hoyer wrote: > pandas has some hacks to support custom types of data for which numpy can't > handle well enough or at all. Examples include datetime and Categorical [1], > and others like GeoArray [2] that haven't make it into pandas yet. > > Most of these look like numpy arrays but with custom dtypes and type > specific methods/properties. But clearly nobody is particularly excited > about writing the the C necessary to implement custom dtypes [3]. Nor is do > we need the ndarray ABI. > > In many cases, writing C may not actually even be necessary for performance > reasons, e.g., categorical can be fast enough just by wrapping an integer > ndarray for the internal storage and using vectorized operations. And even > if it is necessary, I think we'd all rather write Cython than C. > > It's great for pandas to write its own ndarray-like wrappers (*not* > subclasses) that work with pandas, but it's a shame that there isn't a > standard interface like the ndarray to make these arrays useable for the > rest of the scientific Python ecosystem. For example, pandas has loads of > fixes for np.datetime64, but nobody seems to be up for porting them to numpy > (I doubt it would be easy). Writing them in the first place probably wasn't easy either :-). I don't really know why pandas spends so much effort on reimplementing stuff and papering over numpy limitations instead of fixing things upstream so that everyone can benefit. I assume they have reasons, and I could make some general guesses at what some of them might be, but if you want to know what they are -- which is presumably the first step in changing the situation -- you'll have to ask them, not us :-). > I know these sort of concerns are not new, but I wish I had a sense of what > the solution looks like. Is anyone actively working on these issues? Does > the fix belong in numpy, pandas, blaze or a new project? I'd love to get a > sense of where things stand and how I could help -- without writing any C > :). 
I think there are three parts: For stuff that's literally just fixing bugs in stuff that numpy already has, we'd certainly be happy to accept those bug fixes. Probably there are things we can do to make this easier, I dunno. I'd love to see some of numpy's internals moving into Cython to make them easier to hack on, but this won't be simple because right now using Cython to implement a module is really an all-or-nothing affair; making it possible to mix Cython with numpy's existing C code will require upstream changes in Cython. For cases where people genuinely want to implement a new array-like type (e.g. DataFrame or scipy.sparse), numpy provides a fair amount of support for this already (e.g., the various hooks that allow things like np.asarray(mydf) or np.sin(mydf) to work), and we're working on adding more over time (e.g., __numpy_ufunc__). My feeling though is that in most of the cases you mention, implementing a new array-like type is huge overkill. ndarray's interface is vast and reimplementing even 90% of it is a huge effort. For most of the cases that people seem to run into in practice, the solution is to enhance numpy's dtype interface so that it's possible for mere mortals to implement new dtypes, e.g. by just subclassing np.dtype. This is totally doable and would enable a ton of awesomeness, but it requires someone with the time to sit down and work on it, and no-one has volunteered yet. Unfortunately it does require hacking on C code though. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From jeffreback at gmail.com Mon Sep 22 10:34:55 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Mon, 22 Sep 2014 10:34:55 -0400 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: Hopefully this is not TL;DR! There are 3 'dtype'-likes that exist in pandas that could in theory mostly be migrated back to numpy. These currently exist as the .values, in other words the object to which pandas defers data storage and computation for some/most operations.

1) SparseArray: This is the basis for SparseSeries. It is ndarray-like (it's actually an ndarray sub-class) and optimized for the 1-d case.
My guess is that @wesm created this because it a) didn't exist in numpy, and b) didn't want scipy as an explicit dependency (at the time), late 2011.

2) datetime support: This is not a target dtype per se, but really a reimplementation over the top of datetime64[ns], with the associated scalar Timestamp which is a proper sub-class of datetime.datetime. I believe @wesm created this because numpy datetime support was (and still is to some extent) just completely broken (though better in 1.7+). It doesn't support proper timezones, the display is always in the local timezone, and the scalar type (np.datetime64) is not extensible at all (e.g. there is no easy way to have custom printing or parsing). These are all well known by the numpy community and have seen some recent proposals to remedy.

3) pd.Categorical: This was another class wesm wrote several years ago. It actually *could* be a numpy sub-class, though it's a bit awkward as it's really a numpy-like sub-class that contains 2 ndarray-like arrays, and is more appropriately implemented as a container of multiple-ndarrays.

So when we added support for Categoricals recently, why didn't we, say, try to push a categorical dtype? I think there are several reasons, in no particular order:

- pd.Categorical is really a container of multiple ndarrays, and is ndarray-like. Further its API is somewhat constrained. It was simpler to make a python container class rather than try to sub-class ndarray and basically override / throw out many methods (as a lot of computation methods simply don't make sense between 2 categoricals). You can make a case that this *should not* be in numpy for this reason.

- The changes in pandas for the 3 cases outlined above were mostly on how to integrate these with the top-level containers (Series/DataFrame), rather than actually writing / re-writing a new dtype for a ndarray class. We always try to reuse, so we just try to extend the ndarray-like rather than create a new one from scratch.

- Getting for example a Categorical dtype into numpy probably would take a pretty long cycle time. I think you need a champion for new features to really push them. It hasn't happened with datetime and that's been a while (of course it's possible that pandas diverted some of this need)

- API design: I think this is a big issue actually. When I added Categorical container support, I didn't want to change the API of Categorical much (and it pretty much worked out that way, mainly adding to it). So, say we took the path of assuming that numpy would have a nice categorical data dtype. We would almost certainly have to wrap it in something to provide needed functionality that would necessarily be missing in an initial version. (of course eventually that may not be necessary).

- So the 'nobody wants to write in C' argument is true for datetimes, but not for SparseArray/Categorical. In fact much of that code is just calling out to numpy (though some cython code too).

- from a performance perspective, numpy needs a really good hashtable in order to support proper factorizing, which @wesm co-opted klib to do (see this thread here for a discussion on this).

So I know I am repeating myself, but it comes down to this. The API/interface of the delegated methods needs to be defined. For ndarrays it is long established and well-known. So easy to gear pandas to that. However with a *newer* type that is not the case, so pandas can easily decide, hey this is the most correct behavior, let's do it this way, nothing to break, no back compat needed.
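To make "factorizing" concrete: it means mapping arbitrary values to a table of distinct values plus dense integer codes. A sketch of the sort-based version that plain numpy supports today (the function name is invented for illustration; the klib hashtable approach is O(n) rather than O(n log n) and preserves order of first appearance):

import numpy as np

def factorize_via_sort(values):
    # np.unique sorts, so the returned codes index into the *sorted* uniques
    uniques, codes = np.unique(np.asarray(values), return_inverse=True)
    return codes, uniques

For example, factorize_via_sort(['b', 'a', 'b', 'c']) gives codes [1, 0, 1, 2] and uniques ['a', 'b', 'c'].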
Jeff On Sun, Sep 21, 2014 at 11:31 PM, Nathaniel Smith wrote: > On Sun, Sep 21, 2014 at 7:50 PM, Stephan Hoyer wrote: > > pandas has some hacks to support custom types of data for which numpy > can't > > handle well enough or at all. Examples include datetime and Categorical > [1], > > and others like GeoArray [2] that haven't make it into pandas yet. > > > > Most of these look like numpy arrays but with custom dtypes and type > > specific methods/properties. But clearly nobody is particularly excited > > about writing the the C necessary to implement custom dtypes [3]. Nor is > do > > we need the ndarray ABI. > > > > In many cases, writing C may not actually even be necessary for > performance > > reasons, e.g., categorical can be fast enough just by wrapping an integer > > ndarray for the internal storage and using vectorized operations. And > even > > if it is necessary, I think we'd all rather write Cython than C. > > > > It's great for pandas to write its own ndarray-like wrappers (*not* > > subclasses) that work with pandas, but it's a shame that there isn't a > > standard interface like the ndarray to make these arrays useable for the > > rest of the scientific Python ecosystem. For example, pandas has loads of > > fixes for np.datetime64, but nobody seems to be up for porting them to > numpy > > (I doubt it would be easy). > > Writing them in the first place probably wasn't easy either :-). I > don't really know why pandas spends so much effort on reimplementing > stuff and papering over numpy limitations instead of fixing things > upstream so that everyone can benefit. I assume they have reasons, and > I could make some general guesses at what some of them might be, but > if you want to know what they are -- which is presumably the first > step in changing the situation -- you'll have to ask them, not us :-). > > > I know these sort of concerns are not new, but I wish I had a sense of > what > > the solution looks like. Is anyone actively working on these issues? Does > > the fix belong in numpy, pandas, blaze or a new project? I'd love to get > a > > sense of where things stand and how I could help -- without writing any C > > :). > > I think there are there are three parts: > > For stuff that's literally just fixing bugs in stuff that numpy > already has, then we'd certainly be happy to accept those bug fixes. > Probably there are things we can do to make this easier, I dunno. I'd > love to see some of numpy's internals moving into Cython to make them > easier to hack on, but this won't be simple because right now using > Cython to implement a module is really an all-or-nothing affair; > making it possible to mix Cython with numpy's existing C code will > require upstream changes in Cython. > > For cases where people genuinely want to implement a new array-like > types (e.g. DataFrame or scipy.sparse) then numpy provides a fair > amount of support for this already (e.g., the various hooks that allow > things like np.asarray(mydf) or np.sin(mydf) to work), and we're > working on adding more over time (e.g., __numpy_ufunc__). > > My feeling though is that in most of the cases you mention, > implementing a new array-like type is huge overkill. ndarray's > interface is vast and reimplementing even 90% of it is a huge effort. > For most of the cases that people seem to run into in practice, the > solution is to enhance numpy's dtype interface so that it's possible > for mere mortals to implement new dtypes, e.g. by just subclassing > np.dtype. 
This is totally doable and would enable a ton of > awesomeness, but it requires someone with the time to sit down and > work on it, and no-one has volunteered yet. Unfortunately it does > require hacking on C code though. > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Sep 23 02:42:13 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 22 Sep 2014 23:42:13 -0700 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: On Sun, Sep 21, 2014 at 8:31 PM, Nathaniel Smith wrote: > For cases where people genuinely want to implement a new array-like > types (e.g. DataFrame or scipy.sparse) then numpy provides a fair > amount of support for this already (e.g., the various hooks that allow > things like np.asarray(mydf) or np.sin(mydf) to work), and we're > working on adding more over time (e.g., __numpy_ufunc__). > Agreed, numpy does a great job of this. It has been a surprising pleasure to integrate with numpy for my custom array-like types in xray. __numpy_ufunc__ will let us add a few more neat tricks. > My feeling though is that in most of the cases you mention, > implementing a new array-like type is huge overkill. ndarray's > interface is vast and reimplementing even 90% of it is a huge effort. > For most of the cases that people seem to run into in practice, the > solution is to enhance numpy's dtype interface so that it's possible > for mere mortals to implement new dtypes, e.g. by just subclassing > np.dtype. This is totally doable and would enable a ton of > awesomeness, but it requires someone with the time to sit down and > work on it, and no-one has volunteered yet. Unfortunately it does > require hacking on C code though. > Something to allow mere mortals such as myself to implement new dtypes sounds wonderful! Would it be useful to prototype something like this in pure Python? That sounds like a task that I could be up for. Like I said, I expect a (mostly) pure Python solution, at least for categorical and datetime, would be more maintainable, and performant enough, for use in pandas (given that this is basically the current approach), as long as the bottlenecks are dealt with appropriately. Anyone else interested in hacking on this with me? For what it's worth, I am not convinced that it is that terrible to reimplement most of the ndarray interface. As long as your object looks pretty much like an ndarray with a custom dtype, it should be quite straightforward to wrap the underlying array's methods/properties (a sketch of such a wrapper follows below). So I'm not too scared of that option, although I agree that it is a complete waste to do it again and again. Nathaniel and Jeff -- thank you so much for the detailed replies. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed...
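A minimal sketch of such a wrapper, with illustrative names only (WrappedArray is not an actual pandas or numpy class):

import numpy as np

class WrappedArray(object):
    """Ndarray-like object that stores an ndarray and forwards to it."""
    def __init__(self, values):
        self._values = np.asarray(values)

    # forward the core ndarray properties
    @property
    def shape(self):
        return self._values.shape

    @property
    def dtype(self):
        return self._values.dtype

    def __getitem__(self, key):
        # re-wrap on indexing so slices stay in the wrapper type
        return type(self)(self._values[key])

    def __array__(self):
        # lets np.asarray(x), and hence most numpy functions, see the data
        return self._values

    def __repr__(self):
        return 'WrappedArray(%r)' % (self._values,)

x = WrappedArray([1.0, 2.0, 3.0])
np.sin(x)    # works via __array__, though it returns a plain ndarray
x[1:]        # WrappedArray(array([ 2.,  3.]))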
URL: From cournape at gmail.com Tue Sep 23 03:19:12 2014 From: cournape at gmail.com (David Cournapeau) Date: Tue, 23 Sep 2014 08:19:12 +0100 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: On Mon, Sep 22, 2014 at 4:31 AM, Nathaniel Smith wrote: > On Sun, Sep 21, 2014 at 7:50 PM, Stephan Hoyer wrote: > > pandas has some hacks to support custom types of data for which numpy > can't > > handle well enough or at all. Examples include datetime and Categorical > [1], > > and others like GeoArray [2] that haven't make it into pandas yet. > > > > Most of these look like numpy arrays but with custom dtypes and type > > specific methods/properties. But clearly nobody is particularly excited > > about writing the the C necessary to implement custom dtypes [3]. Nor is > do > > we need the ndarray ABI. > > > > In many cases, writing C may not actually even be necessary for > performance > > reasons, e.g., categorical can be fast enough just by wrapping an integer > > ndarray for the internal storage and using vectorized operations. And > even > > if it is necessary, I think we'd all rather write Cython than C. > > > > It's great for pandas to write its own ndarray-like wrappers (*not* > > subclasses) that work with pandas, but it's a shame that there isn't a > > standard interface like the ndarray to make these arrays useable for the > > rest of the scientific Python ecosystem. For example, pandas has loads of > > fixes for np.datetime64, but nobody seems to be up for porting them to > numpy > > (I doubt it would be easy). > > Writing them in the first place probably wasn't easy either :-). I > don't really know why pandas spends so much effort on reimplementing > stuff and papering over numpy limitations instead of fixing things > upstream so that everyone can benefit. I assume they have reasons, and > I could make some general guesses at what some of them might be, but > if you want to know what they are -- which is presumably the first > step in changing the situation -- you'll have to ask them, not us :-). > > > I know these sort of concerns are not new, but I wish I had a sense of > what > > the solution looks like. Is anyone actively working on these issues? Does > > the fix belong in numpy, pandas, blaze or a new project? I'd love to get > a > > sense of where things stand and how I could help -- without writing any C > > :). > > I think there are there are three parts: > > For stuff that's literally just fixing bugs in stuff that numpy > already has, then we'd certainly be happy to accept those bug fixes. > Probably there are things we can do to make this easier, I dunno. I'd > love to see some of numpy's internals moving into Cython to make them > easier to hack on, but this won't be simple because right now using > Cython to implement a module is really an all-or-nothing affair; > making it possible to mix Cython with numpy's existing C code will > require upstream changes in Cython. > For cases where people genuinely want to implement a new array-like > types (e.g. DataFrame or scipy.sparse) then numpy provides a fair > amount of support for this already (e.g., the various hooks that allow > things like np.asarray(mydf) or np.sin(mydf) to work), and we're > working on adding more over time (e.g., __numpy_ufunc__). > > My feeling though is that in most of the cases you mention, > implementing a new array-like type is huge overkill. ndarray's > interface is vast and reimplementing even 90% of it is a huge effort. 
> For most of the cases that people seem to run into in practice, the > solution is to enhance numpy's dtype interface so that it's possible > for mere mortals to implement new dtypes, e.g. by just subclassing > np.dtype. This is totally doable and would enable a ton of > awesomeness, but it requires someone with the time to sit down and > work on it, and no-one has volunteered yet. Unfortunately it does > require hacking on C code though. > While preparing my tutorial on NumPy C internals a year ago, I tried to get a "basic" dtype implemented in cython, and there were various issues even when doing all of it in cython (I can't remember the details now). Solving this would be a good first step. There were (are?) also some issues regarding type precedence in ufuncs involving a new dtype: numpy hardcodes that long double is the highest-precision floating point type, for example, and there were similar issues regarding datetime handling. That does not matter for completely new types that don't require interactions with others (categorical?). Would it help to prepare a set of "implement your own dtype" notebooks? I have a starting point from last year's tutorial (the corresponding slides were never shown for lack of time). David > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Tue Sep 23 05:31:46 2014 From: toddrjen at gmail.com (Todd) Date: Tue, 23 Sep 2014 11:31:46 +0200 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: On Mon, Sep 22, 2014 at 5:31 AM, Nathaniel Smith wrote: > On Sun, Sep 21, 2014 at 7:50 PM, Stephan Hoyer wrote: > My feeling though is that in most of the cases you mention, > implementing a new array-like type is huge overkill. ndarray's > interface is vast and reimplementing even 90% of it is a huge effort.
>> For most of the cases that people seem to run into in practice, the >> solution is to enhance numpy's dtype interface so that it's possible >> for mere mortals to implement new dtypes, e.g. by just subclassing >> np.dtype. This is totally doable and would enable a ton of >> awesomeness, but it requires someone with the time to sit down and >> work on it, and no-one has volunteered yet. Unfortunately it does >> require hacking on C code though. >> > > I'm unclear about the last sentence. Do you mean "improving the dtype > system will require hacking on C code" or "even if we improve the dtype > system dtypes will still have to be written in C"? > What ends up making this hard is that every place numpy does anything with a dtype needs to be at least audited and probably changed. All of that is in C right now, and most of it would likely remain C after the fact, simply because the rest of numpy is in C. Improving the dtype system requires working on C code. Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Sep 23 09:34:28 2014 From: travis at continuum.io (Travis Oliphant) Date: Tue, 23 Sep 2014 08:34:28 -0500 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: On Sun, Sep 21, 2014 at 6:50 PM, Stephan Hoyer wrote: > pandas has some hacks to support custom types of data for which numpy > can't handle well enough or at all. Examples include datetime and > Categorical [1], and others like GeoArray [2] that haven't make it into > pandas yet. > > Most of these look like numpy arrays but with custom dtypes and type > specific methods/properties. But clearly nobody is particularly excited > about writing the the C necessary to implement custom dtypes [3]. Nor is do > we need the ndarray ABI. > > In many cases, writing C may not actually even be necessary for > performance reasons, e.g., categorical can be fast enough just by wrapping > an integer ndarray for the internal storage and using vectorized > operations. And even if it is necessary, I think we'd all rather write > Cython than C. > > It's great for pandas to write its own ndarray-like wrappers (*not* > subclasses) that work with pandas, but it's a shame that there isn't a > standard interface like the ndarray to make these arrays useable for the > rest of the scientific Python ecosystem. For example, pandas has loads of > fixes for np.datetime64, but nobody seems to be up for porting them to > numpy (I doubt it would be easy). > > I know these sort of concerns are not new, but I wish I had a sense of > what the solution looks like. Is anyone actively working on these issues? > Does the fix belong in numpy, pandas, blaze or a new project? I'd love to > get a sense of where things stand and how I could help -- without writing > any C :). > > Hey Stephan, There are not easy answers to your questions. The reason is that NumPy's dtype system is not extensible enough with its fixed set of "builtin" data-types and its bolted-on "user-defined" datatypes. The implementation was adapted from the *descriptor* notion that was in Numeric (written almost 20 years ago). While a significant improvement over Numeric, the dtype system in NumPy still has several limitations: 1) it was not designed to add new fundamental data-types without breaking the ABI (most of the ABI breakage between 1.3 and 1.7, due to the addition of np.datetime, has been pushed to a small corner, but it is still there).
2) The user-defined data-type system which is present is not well tested and likely incomplete: it was the best I could come up with at the time NumPy first came out, with a bit of input from people like Fernando Perez and Francesc Alted. 3) It is far easier than in Numeric to add new data-types (that was a big part of the effort of NumPy), but it is still not as easy as one would like to add new data-types (either fundamental ones requiring recompilation of NumPy, or 'user-defined' data-types requiring C-code). I believe this system has served us well, but it needs to be replaced eventually. I think it can be replaced fairly seamlessly in a largely backward compatible way (though requiring re-compilation of dependencies). Fixing the dtype system is a fundamental effort behind several projects we are working on at Continuum: datashape, dynd, and numba. These projects are addressing fundamental limitations in a way that can lead to a significantly improved framework for scientific and tabular computing in Python. In the meantime, NumPy can continue to improve in small ways and in orthogonal ways (like the new __numpy_ufunc__ mechanism which allows ufuncs to work more seamlessly with different kinds of array-like objects). This kind of effort, as well as the improved buffer protocol in Python, means that multiple array-like objects can co-exist and use each other's data. Right now, I think that is the best current way to address the data-type limitations of NumPy. Another small project is possible today --- one could today use Numba or Cython to generate user-defined data-types for existing NumPy. That would be an interesting project and would certainly help to understand the limitations of the user-defined data-type framework without making people write C-code. You could use a meta-class and some code-generation techniques so that by defining a particular class you end up with a user-defined data-type for NumPy. Even while we have been addressing the fundamental limitations of NumPy with our new tools at Continuum, replacing NumPy is a big undertaking because of its large user-base. While I personally think that NumPy could be replaced for new users as early as next year with a combination of dynd and numba, the big install base of NumPy means that many people (including the company I work with, Continuum) will be supporting NumPy 1.X and Pandas and the rest of the NumPy-Stack for many years to come. So, even if you see me working and advocating new technology, that should never be construed as somehow ignoring or abandoning the current technology base. I remain deeply interested in the success of the scientific computing community --- even though I am not currently contributing a lot of code directly myself. As dynd and numba mature, I think it will be clear to more people how to proceed. For example, just recently the thought emerged that because dynd addresses some of the major needs that Pandas has, it may be possible very soon for dynd to replace NumPy as the foundational container for Pandas data-frames. Because Pandas' use of the NumPy API is limited, this is an easier undertaking than having dynd replace NumPy itself. And given that the new data-types of dynd (missing data, categorical types, variable-length strings, etc.) are some of the key areas that Pandas has work-arounds for, it may be a straightforward project. For those not aware: dynd is cython code that wraps the C++ library libdynd.
Currently libdynd is not complete, so working on dynd may require some improvements / fixes to libdynd. However, the dynd layer should be accessible to many people. The libdynd layer is also fairly straightforward C++. I strongly believe that the combination of libdynd and dynd is a much easier foundation to work on and maintain than the NumPy code base. I say this after having personally spent over a decade on the Numeric code-base and then the NumPy code base. The NumPy "C" code-base has been improved since I left it by the excellent work of several patient developers --- but it is not easy to transmit the knowledge necessary to understand the code-base sufficiently to maintain it without creating backward compatibility issues. So, while I continue to support the NumPy code base and its extensions (personally, through Numfocus, and through Continuum) and believe it will be relevant for many years, I also believe the future lies in renewing the NumPy code base with a combination of dynd and numba, with more emphasis on the high-level APIs like pandas and blaze. The good news is that this means: 1) a lot more code in Python or Cython, 2) compatibility with the PyPy world as part of a long-term effort to heal the rift that exists between scientific use of Python and "web use" of Python. In the end, all of this is good news for Python and scientific computing. More and better tools will continue to be written, with better interop between them. There are many places to jump in and help: dynd, libdynd, datashape, blaze, numba, scipy, scikits, numpy, pandas, and even a new project you create that enhances some aspect of any of these or does something like use Cython or Numba to create NumPy user-defined data-types from a Python class-specification. I agree it can be hard to know where things will eventually end up, and therefore where to spend your effort. All I can tell you is what I've decided and where I am pushing and promoting. Best, -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Tue Sep 23 09:55:16 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Tue, 23 Sep 2014 09:55:16 -0400 Subject: [Numpy-discussion] Dataframe memory info printing Message-ID: For the 0.15.0 release of pandas (coming the 2nd week of Oct), we are going to include memory info printing: see here: https://github.com/pydata/pandas/pull/7619 This will be controllable by an option, display.memory_usage. My question to the community: should this be True by default, i.e. show the memory usage? (This only applies to df.info().) There is really no performance impact here. Pls let us know! thanks Jeff

>>> df.info(memory_usage=True)
Int64Index: 10000000 entries, 0 to 9999999
Data columns (total 5 columns):
date        datetime64[ns]
float       float64
int         int64
smallint    int16
string      object
dtypes: datetime64[ns](1), float64(1), int16(1), int64(1), object(1)
memory usage: 324.2 MB

-------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Sep 23 10:00:16 2014 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 23 Sep 2014 10:00:16 -0400 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: Travis, Thank you for your perspective on this issue. Such input is always valuable in helping us see where we came from and where we might go. My perspective on NumPy is fairly different, having come into Python right after the whole Numeric/NumArray transition to NumPy.
One of the things that really sold me on NumPy was not just how simple it was for me to use out of the box, but how easy it was to explicitly state that something needed to be of one type or another. The dtype notation was fairly simple and straight-forward. -- We should not underestimate the value of simple-to-write and simple-to-read notations in Python -- We can go ahead and put as many bells and whistles into the underlying infrastructure as you want, but if we can't design a simple notation language to utilize it, then it will never catch on. This isn't criticism of the work being done in dynd or the other projects; rather, it is a call for innovation. I don't know how I would design such a notation language, but we need that "ah-ha!" moment from *somebody*. I expressed this back at the NumPy BoF this summer. I would love an improved notation system that Matplotlib could take advantage of, one that would facilitate the plotting of more complicated graphs. But I am also not really interested in seeing NumPy turn into Pandas. Nothing wrong with Pandas; I just like the idea of modularity and I think it has suited the community well. Striking the right balance is going to be extremely important. Cheers! Ben Root On Tue, Sep 23, 2014 at 9:34 AM, Travis Oliphant wrote: > > On Sun, Sep 21, 2014 at 6:50 PM, Stephan Hoyer wrote: > >> pandas has some hacks to support custom types of data for which numpy >> can't handle well enough or at all. Examples include datetime and >> Categorical [1], and others like GeoArray [2] that haven't make it into >> pandas yet. >> >> Most of these look like numpy arrays but with custom dtypes and type >> specific methods/properties. But clearly nobody is particularly excited >> about writing the the C necessary to implement custom dtypes [3]. Nor is do >> we need the ndarray ABI. >> >> In many cases, writing C may not actually even be necessary for >> performance reasons, e.g., categorical can be fast enough just by wrapping >> an integer ndarray for the internal storage and using vectorized >> operations. And even if it is necessary, I think we'd all rather write >> Cython than C. >> >> It's great for pandas to write its own ndarray-like wrappers (*not* >> subclasses) that work with pandas, but it's a shame that there isn't a >> standard interface like the ndarray to make these arrays useable for the >> rest of the scientific Python ecosystem. For example, pandas has loads of >> fixes for np.datetime64, but nobody seems to be up for porting them to >> numpy (I doubt it would be easy). >> >> I know these sort of concerns are not new, but I wish I had a sense of >> what the solution looks like. Is anyone actively working on these issues? >> Does the fix belong in numpy, pandas, blaze or a new project? I'd love to >> get a sense of where things stand and how I could help -- without writing >> any C :). >> >> > Hey Stephan, > > There are not easy answers to your questions. The reason is that NumPy's > dtype system is not extensible enough with its fixed set of "builtin" > data-types and its bolted-on "user-defined" datatypes. The implementation > was adapted from the *descriptor* notion that was in Numeric (written > almost 20 years ago).
While a significant improvement over Numeric, the > dtype system in NumPy still has several limitations: > > 1) it was not designed to add new fundamental data-types without > breaking the ABI (most of the ABI breakage between 1.3 and 1.7 due to the > addition of np.datetime has been pushed to a small corner but it is still > there). > > 2) The user-defined data-type system which is present is not well > tested and likely incomplete: it was the best I could come up with at the > time NumPy first came out with a bit of input from people like Fernando > Perez and Francesc Alted. > > 3) It is far easier than in Numeric to add new data-types (that was a > big part of the effort of NumPy), but it is still not as easy as one would > like to add new data-types (either fundamental ones requiring recompilation > of NumPy or 'user-defined' data-types requiring C-code. > > I believe this system has served us well, but it needs to be replaced > eventually. I think it can be replaced fairly seamlessly in a largely > backward compatible way (though requiring re-compilation of dependencies). > Fixing the dtype system is a fundamental effort behind several projects > we are working on at Continuum: datashape, dynd, and numba. These > projects are addressing fundamental limitations in a way that can lead to a > significantly improved framework for scientific and tabular computing in > Python. > > In the mean-time, NumPy can continue to improve in small ways and in > orthogonal ways (like the new __numpy_ufunc__ mechanism which allows ufuncs > to work more seamlessly with different kinds of array-like objects). > This kind of effort as well as the improved buffer protocol in Python, > mean that multiple array-like objects can co-exist and use each-other's > data. Right now, I think that is the best current way to address the > data-type limitations of NumPy. > > Another small project is possible today --- one could today use Numba or > Cython to generate user-defined data-types for existing NumPy. That would > be an interesting project and would certainly help to understand the > limitations of the user-defined data-type framework without making people > write C-code. You could use a meta-class and some code-generation > techniques so that by defining a particular class you end-up with a > user-defined data-type for NumPy. > > Even while we have been addressing the fundamental limitations of NumPy > with our new tools at Continuum, replacing NumPy is a big undertaking > because of its large user-base. While I personally think that NumPy could > be replaced for new users as early as next year with a combination of dynd > and numba, the big install base of NumPy means that many people (including > the company I work with, Continuum) will be supporting NumPy 1.X and Pandas > and the rest of the NumPy-Stack for many years to come. > > So, even if you see me working and advocating new technology, that should > never be construed as somehow ignoring or abandoning the current technology > base. I remain deeply interested in the success of the scientific > computing community --- even though I am not currently contributing a lot > of code directly myself. As dynd and numba mature, I think it will be > clear to more people how to proceed. > > For example, just recently the thought emerged that because dynd addresses > some of the major needs that Pandas has, it may be possible very soon for > dynd to replace NumPy as the foundational container for Pandas data-frames. 
> Because Pandas use of the NumPy API is limited, this is an easier > undertaking than having dynd replace NumPy itself. And given that the new > data-types of dynd: missing-data, categorical types, variable-length > strings, etc. are some of the key areas that Pandas has work-arounds for, > it may be a straight-forward project. > > For those not aware: dynd is cython code that wraps the C++ library > libdynd. Currently libdynd is not complete, so working on dynd may > require some improvements / fixes to libdynd. However, the dynd layer > should be accessible to many people. The libdynd layer is also fairly > straightforward C++. I strongly believe that the combination of libdynd > and dynd is a much easier foundation to work on and maintain than the NumPy > code base. I say this after having personally spent over a decade on > the Numeric code-base and then the NumPy code base. The NumPy "C" > code-base has been improved since I left it by the excellent work of > several patient developers --- but it is not easy to transmit the knowledge > necessary to understand the code-base sufficient to maintain it without > creating backward compatibility issues. > > So, while I continue to support the NumPy code base and its extensions > (personally, through Numfocus, and through Continuum) and believe it will > be relevant for many years, I also believe the future lies in renewing the > NumPy code base with a combination of dynd and numba with more emphasis on > the high-level APIs like pandas and blaze. The good news is that this > means: 1) a lot more code in Python or Cython, 2) compatibility with the > PyPy world as part of a long term effort to heal the rift that exists > between scientific-use of Python and "web-use" of Python. > > In the end, all of this is good news for Python and scientific computing. > More and better tools will continue to be written with better interop > between them. There are many places to jump in and help: dynd, libdynd, > datashape, blaze, numba, scipy, scikits, numpy, pandas, and even a new > project you create that enhances some aspect of any of these or does > something like use Cython or Numba to create NumPy user-defined data-types > from a Python class-specification. > > I agree it can be hard to know where things will eventually end up and so > therefore where to spend your effort. All I can tell you is what I've > decided and where I am pushing and promoting. > > Best, > > -Travis > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Sep 23 18:59:46 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 23 Sep 2014 16:59:46 -0600 Subject: [Numpy-discussion] Changes to the generalized functions. Message-ID: Hi All, The question has come up as to whether or not to treat the new gufunc behavior as a bug fix, keeping the old constructor name, or to have a different constructor. Keeping the name makes life easier as we don't need to edit the code where numpy currently uses gufuncs, but is risky if some third party depends on the old behavior. The gufuncs have been part of numpy since the 1.3 release, and google doesn't turn up any uses that I can see apart from repeats of numpy code. We can also make fixes if needed during the 1.10 beta release cycle. Even so, it is a bit of a risk.
To spread the blame, if any, please weigh in on the following. 1. Yes, it is a bug, keep the current name and fix the behavior. 2. No, we need to be conservative and use a new function. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Sep 23 19:31:30 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 23 Sep 2014 17:31:30 -0600 Subject: [Numpy-discussion] Changes to the generalized functions. In-Reply-To: References: Message-ID: On Tue, Sep 23, 2014 at 4:59 PM, Charles R Harris wrote: > Hi All, > > The question has come up as to whether or not to treat the new gufunc > behavior as a bug fix, keeping the old constructor name, or to have a > different constructor. Keeping the name makes life easier as we don't need > to edit the code where numpy currently uses gufuncs, but is risky if some > third party depends on the old behavior. The gufuncs have been part of > numpy since the 1.3 release, and google doesn't turn up any uses that I can > see apart from repeats of numpy code. We can also make fixes if needed > during the 1.10 beta release cycle. Even so, it is a bit of a risk. To > spread the blame, if any, please weigh in on the following. > > > 1. Yes, it is a bug, keep the current name and fix the behavior. > 2. No, we need to be conservative and use a new function. > > To clarify the changes: currently, if an input array does not have as many dimensions as indicated by the signature, it is filled out with ones to match the signature and broadcast against the other array. It is also the case that if a dimension in the signature is 1, then it is broadcast against the corresponding dimension of the other argument. That is, the parts identified in the signature are treated as little arrays. The proposed change is to require that the inputs have at least as many dimensions as the signature, and that core dimensions of 1 are not broadcast. That is, the parts identified in the signature are treated as vectors or matrices rather than little arrays. A small example of the difference follows below. Chuck -------------- next part -------------- An HTML attachment was scrubbed...
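A demonstration using the test gufunc that ships with numpy (numpy.core.umath_tests here; the module's location and the exact failure mode vary by version, so treat this as a sketch):

import numpy as np
from numpy.core.umath_tests import matrix_multiply  # signature: (m,n),(n,p)->(m,p)

a = np.ones((5, 3, 2))   # a stack of five (3, 2) matrices
b = np.ones((2, 4))      # a single (2, 4) matrix, broadcast across the stack
matrix_multiply(a, b).shape   # -> (5, 3, 4); this part is unchanged

# Under the current rules, a 1-d input is padded with a 1 to fit the core
# signature, and a core dimension of 1 broadcasts against the other operand.
# Under the proposed rules, both of these are errors instead:
matrix_multiply(np.ones(2), b)   # current: shape (1, 4); proposed: raises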
URL: From huruomu at gmail.com Tue Sep 23 23:21:15 2014 From: huruomu at gmail.com (Romu Hu) Date: Wed, 24 Sep 2014 11:21:15 +0800 Subject: [Numpy-discussion] All numpy-f2py tests failed Message-ID: <542238AB.2090408@gmail.com> Hi, I'm using python27-numpy-f2py-1.7.1-9.el6.x86_64 from RHEL6, the package has a test directory "/usr/lib64/python2.7/site-packages/numpy/f2py/tests", when I run unittest in the directory, all 358 testcases fail: # cd /usr/lib64/python2.7/site-packages/numpy/f2py/tests # python27 -m unittest discover -v ====================================================================== ERROR: test_c_copy_in_from_23casttype (test_array_from_pyobj.test_BOOL_gen) ---------------------------------------------------------------------- Traceback (most recent call last): File "", line 5, in setUp File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 126, in __new__ obj._init(name) File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 132, in _init self.type_num = getattr(wrap,'NPY_'+self.NAME) AttributeError: 'NoneType' object has no attribute 'NPY_BOOL' ====================================================================== ERROR: test_c_in_from_23casttype (test_array_from_pyobj.test_BOOL_gen) ---------------------------------------------------------------------- Traceback (most recent call last): File "", line 5, in setUp File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 126, in __new__ obj._init(name) File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 132, in _init self.type_num = getattr(wrap,'NPY_'+self.NAME) AttributeError: 'NoneType' object has no attribute 'NPY_BOOL' ...... ...... 
====================================================================== ERROR: test_optional_from_23seq (test_array_from_pyobj.test_USHORT_gen) ---------------------------------------------------------------------- Traceback (most recent call last): File "", line 5, in setUp File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 126, in __new__ obj._init(name) File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 132, in _init self.type_num = getattr(wrap,'NPY_'+self.NAME) AttributeError: 'NoneType' object has no attribute 'NPY_USHORT' ====================================================================== ERROR: test_optional_from_2seq (test_array_from_pyobj.test_USHORT_gen) ---------------------------------------------------------------------- Traceback (most recent call last): File "", line 5, in setUp File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 126, in __new__ obj._init(name) File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 132, in _init self.type_num = getattr(wrap,'NPY_'+self.NAME) AttributeError: 'NoneType' object has no attribute 'NPY_USHORT' ====================================================================== ERROR: test_optional_none (test_array_from_pyobj.test_USHORT_gen) ---------------------------------------------------------------------- Traceback (most recent call last): File "", line 5, in setUp File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 126, in __new__ obj._init(name) File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 132, in _init self.type_num = getattr(wrap,'NPY_'+self.NAME) AttributeError: 'NoneType' object has no attribute 'NPY_USHORT' ====================================================================== ERROR: test_in_out (test_array_from_pyobj.test_intent) ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 270, in test_in_out assert_equal(str(intent.in_.out),'intent(in,out)') File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 67, in __getattr__ return self.__class__(self.intent_list+[name]) File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", line 62, in __init__ flags |= getattr(wrap,'F2PY_INTENT_'+i.upper()) AttributeError: 'NoneType' object has no attribute 'F2PY_INTENT_IN' ---------------------------------------------------------------------- Ran 358 tests in 0.047s FAILED (errors=358) It seems that all tests fail because the 'wrap' variable is None. Am I using a wrong way to run the tests? Any idea? 
Thanks Romu From charlesr.harris at gmail.com Wed Sep 24 00:20:54 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 23 Sep 2014 22:20:54 -0600 Subject: [Numpy-discussion] All numpy-f2py tests failed In-Reply-To: <542238AB.2090408@gmail.com> References: <542238AB.2090408@gmail.com> Message-ID: On Tue, Sep 23, 2014 at 9:21 PM, Romu Hu wrote: > Hi, > > I'm using python27-numpy-f2py-1.7.1-9.el6.x86_64 from RHEL6, the package > has a test directory > "/usr/lib64/python2.7/site-packages/numpy/f2py/tests", when I run > unittest in the directory, all 358 testcases fail: > > # cd /usr/lib64/python2.7/site-packages/numpy/f2py/tests > # python27 -m unittest discover -v > > ====================================================================== > ERROR: test_c_copy_in_from_23casttype (test_array_from_pyobj.test_BOOL_gen) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "", line 5, in setUp > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 126, in __new__ > obj._init(name) > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 132, in _init > self.type_num = getattr(wrap,'NPY_'+self.NAME) > AttributeError: 'NoneType' object has no attribute 'NPY_BOOL' > > ====================================================================== > ERROR: test_c_in_from_23casttype (test_array_from_pyobj.test_BOOL_gen) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "", line 5, in setUp > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 126, in __new__ > obj._init(name) > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 132, in _init > self.type_num = getattr(wrap,'NPY_'+self.NAME) > AttributeError: 'NoneType' object has no attribute 'NPY_BOOL' > > ...... > ...... 
> > ====================================================================== > ERROR: test_optional_from_23seq (test_array_from_pyobj.test_USHORT_gen) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "", line 5, in setUp > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 126, in __new__ > obj._init(name) > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 132, in _init > self.type_num = getattr(wrap,'NPY_'+self.NAME) > AttributeError: 'NoneType' object has no attribute 'NPY_USHORT' > > ====================================================================== > ERROR: test_optional_from_2seq (test_array_from_pyobj.test_USHORT_gen) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "", line 5, in setUp > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 126, in __new__ > obj._init(name) > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 132, in _init > self.type_num = getattr(wrap,'NPY_'+self.NAME) > AttributeError: 'NoneType' object has no attribute 'NPY_USHORT' > > ====================================================================== > ERROR: test_optional_none (test_array_from_pyobj.test_USHORT_gen) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "", line 5, in setUp > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 126, in __new__ > obj._init(name) > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 132, in _init > self.type_num = getattr(wrap,'NPY_'+self.NAME) > AttributeError: 'NoneType' object has no attribute 'NPY_USHORT' > > ====================================================================== > ERROR: test_in_out (test_array_from_pyobj.test_intent) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 270, in test_in_out > assert_equal(str(intent.in_.out),'intent(in,out)') > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 67, in __getattr__ > return self.__class__(self.intent_list+[name]) > File > > "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", > line 62, in __init__ > flags |= getattr(wrap,'F2PY_INTENT_'+i.upper()) > AttributeError: 'NoneType' object has no attribute 'F2PY_INTENT_IN' > > ---------------------------------------------------------------------- > Ran 358 tests in 0.047s > > FAILED (errors=358) > > > It seems that all tests fail because the 'wrap' variable is None. Am I > using a wrong way to run the tests? Any idea? > > Looks pretty drastic, but I expect the problem is that numpy uses nose for running tests, not unittest. Try charris at localhost [tests (master)]$ nosetests ---------------------------------------------------------------------- Ran 379 tests in 12.457s OK Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed Sep 24 00:30:55 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 23 Sep 2014 22:30:55 -0600 Subject: [Numpy-discussion] All numpy-f2py tests failed In-Reply-To: References: <542238AB.2090408@gmail.com> Message-ID: On Tue, Sep 23, 2014 at 10:20 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Tue, Sep 23, 2014 at 9:21 PM, Romu Hu wrote: > >> Hi, >> >> I'm using python27-numpy-f2py-1.7.1-9.el6.x86_64 from RHEL6, the package >> has a test directory >> "/usr/lib64/python2.7/site-packages/numpy/f2py/tests", when I run >> unittest in the directory, all 358 testcases fail: >> >> # cd /usr/lib64/python2.7/site-packages/numpy/f2py/tests >> # python27 -m unittest discover -v >> >> ====================================================================== >> ERROR: test_c_copy_in_from_23casttype >> (test_array_from_pyobj.test_BOOL_gen) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "", line 5, in setUp >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 126, in __new__ >> obj._init(name) >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 132, in _init >> self.type_num = getattr(wrap,'NPY_'+self.NAME) >> AttributeError: 'NoneType' object has no attribute 'NPY_BOOL' >> >> ====================================================================== >> ERROR: test_c_in_from_23casttype (test_array_from_pyobj.test_BOOL_gen) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "", line 5, in setUp >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 126, in __new__ >> obj._init(name) >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 132, in _init >> self.type_num = getattr(wrap,'NPY_'+self.NAME) >> AttributeError: 'NoneType' object has no attribute 'NPY_BOOL' >> >> ...... >> ...... 
>> >> ====================================================================== >> ERROR: test_optional_from_23seq (test_array_from_pyobj.test_USHORT_gen) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "", line 5, in setUp >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 126, in __new__ >> obj._init(name) >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 132, in _init >> self.type_num = getattr(wrap,'NPY_'+self.NAME) >> AttributeError: 'NoneType' object has no attribute 'NPY_USHORT' >> >> ====================================================================== >> ERROR: test_optional_from_2seq (test_array_from_pyobj.test_USHORT_gen) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "", line 5, in setUp >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 126, in __new__ >> obj._init(name) >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 132, in _init >> self.type_num = getattr(wrap,'NPY_'+self.NAME) >> AttributeError: 'NoneType' object has no attribute 'NPY_USHORT' >> >> ====================================================================== >> ERROR: test_optional_none (test_array_from_pyobj.test_USHORT_gen) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "", line 5, in setUp >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 126, in __new__ >> obj._init(name) >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 132, in _init >> self.type_num = getattr(wrap,'NPY_'+self.NAME) >> AttributeError: 'NoneType' object has no attribute 'NPY_USHORT' >> >> ====================================================================== >> ERROR: test_in_out (test_array_from_pyobj.test_intent) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 270, in test_in_out >> assert_equal(str(intent.in_.out),'intent(in,out)') >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 67, in __getattr__ >> return self.__class__(self.intent_list+[name]) >> File >> >> "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py", >> line 62, in __init__ >> flags |= getattr(wrap,'F2PY_INTENT_'+i.upper()) >> AttributeError: 'NoneType' object has no attribute 'F2PY_INTENT_IN' >> >> ---------------------------------------------------------------------- >> Ran 358 tests in 0.047s >> >> FAILED (errors=358) >> >> >> It seems that all tests fail because the 'wrap' variable is None. Am I >> using a wrong way to run the tests? Any idea? >> >> > Looks pretty drastic, but I expect the problem is that numpy uses nose for > running tests, not unittest. 
Try > > charris at localhost [tests (master)]$ nosetests > > > > ---------------------------------------------------------------------- > Ran 379 tests in 12.457s > > OK > > But this doesn't work with the installed package. From ipython, or a terminal, do In [8]: np.f2py.test() Running unit tests for numpy.f2py NumPy version 1.10.0.dev-4083883 NumPy is installed in /home/charris/.local/lib/python2.7/site-packages/numpy Python version 2.7.5 (default, Jun 25 2014, 10:19:55) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] nose version 1.3.0 ....................................................................................................................................................................................................................................................................................................................................................................... ---------------------------------------------------------------------- Ran 359 tests in 1.737s OK Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Sep 24 00:52:25 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 23 Sep 2014 22:52:25 -0600 Subject: [Numpy-discussion] All numpy-f2py tests failed In-Reply-To: References: <542238AB.2090408@gmail.com> Message-ID: On Tue, Sep 23, 2014 at 10:30 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Tue, Sep 23, 2014 at 10:20 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> Looks like you need to do >>> import numpy.f2py as f2py >>> f2py.test() To run tests with the redhat package. Redhat splits it up, so things are a bit weird. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From huruomu at gmail.com Wed Sep 24 01:48:14 2014 From: huruomu at gmail.com (Romu Hu) Date: Wed, 24 Sep 2014 13:48:14 +0800 Subject: [Numpy-discussion] All numpy-f2py tests failed In-Reply-To: References: <542238AB.2090408@gmail.com> Message-ID: <54225B1E.1030202@gmail.com> On 2014/9/24 12:52, Charles R Harris wrote: > On Tue, Sep 23, 2014 at 10:30 PM, Charles R Harris > > wrote: > > On Tue, Sep 23, 2014 at 10:20 PM, Charles R Harris > > wrote: > > > > > > > Looks like you need to do > > >>> import numpy.f2py as f2py > >>> f2py.test() I tried it but no test was run: Run python in the python27 env: $ scl enable python27 "python" Python 2.7.5 (default, Aug 13 2014, 02:49:07) [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> Run the test: >>> import numpy.f2py as f2py >>> f2py.test() Running unit tests for numpy.f2py NumPy version 1.7.1 NumPy is installed in /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy Python version 2.7.5 (default, Aug 13 2014, 02:49:07) [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] nose version 1.3.0 S ---------------------------------------------------------------------- Ran 0 tests in 0.452s OK (SKIP=1) Files of the python27-numpy-f2py-1.7.1-9.el6.x86_64 package: /opt/rh/python27/root/usr/bin/f2py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/__init__.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/__init__.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/__init__.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/__version__.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/__version__.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/__version__.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/auxfuncs.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/auxfuncs.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/auxfuncs.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/capi_maps.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/capi_maps.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/capi_maps.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/cb_rules.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/cb_rules.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/cb_rules.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/cfuncs.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/cfuncs.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/cfuncs.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/common_rules.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/common_rules.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/common_rules.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/crackfortran.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/crackfortran.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/crackfortran.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/diagnose.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/diagnose.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/diagnose.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/f2py2e.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/f2py2e.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/f2py2e.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/f2py_testing.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/f2py_testing.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/f2py_testing.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/f90mod_rules.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/f90mod_rules.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/f90mod_rules.pyo 
/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/func2subr.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/func2subr.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/func2subr.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/info.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/info.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/info.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/rules.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/rules.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/rules.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/setup.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/setup.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/setup.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/setupscons.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/setupscons.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/setupscons.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/src /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/src/fortranobject.c /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/src/fortranobject.h /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/array_from_pyobj /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/array_from_pyobj/wrapmodule.c /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/assumed_shape /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/assumed_shape/.f2py_f2cmap /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/assumed_shape/foo_free.f90 /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/assumed_shape/foo_mod.f90 /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/assumed_shape/foo_use.f90 /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/assumed_shape/precision.f90 /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/kind /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/kind/foo.f90 /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/mixed /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/mixed/foo.f /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/mixed/foo_fixed.f90 /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/mixed/foo_free.f90 /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/size /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/src/size/foo.f90 /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_array_from_pyobj.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_assumed_shape.py 
/opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_assumed_shape.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_assumed_shape.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_callback.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_callback.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_callback.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_kind.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_kind.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_kind.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_mixed.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_mixed.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_mixed.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_character.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_character.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_character.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_complex.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_complex.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_complex.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_integer.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_integer.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_integer.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_logical.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_logical.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_logical.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_real.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_real.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_return_real.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_size.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_size.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/test_size.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/util.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/util.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/tests/util.pyo /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/use_rules.py /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/use_rules.pyc /opt/rh/python27/root/usr/lib64/python2.7/site-packages/numpy/f2py/use_rules.pyo /opt/rh/python27/root/usr/share/man/man1/f2py.1.gz Thanks Romu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From saullogiovani at gmail.com Wed Sep 24 12:36:05 2014 From: saullogiovani at gmail.com (Saullo Castro) Date: Wed, 24 Sep 2014 18:36:05 +0200 Subject: [Numpy-discussion] PR #5109 - interpolation function for polar coordinates Message-ID: Dear all, today I've submitted a pull request: https://github.com/numpy/numpy/pull/5109 in order to include an interpolation function for angular coordinates, since using np.interp for this purpose is cumbersome. I kindly ask for your feedback and/or questions. Greetings, Saullo -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Sep 24 13:08:29 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 24 Sep 2014 10:08:29 -0700 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: On Tue, Sep 23, 2014 at 4:40 AM, Eric Moore wrote: > Improving the dtype system requires working on c code. > yes -- it sure does. But I think that is a bit of a Red Herring. I'm barely competent in C, and don't like it much, but the real barrier to entry for me is not that it's in C, but that it's really complex and hard to hack on, as it wasn't designed to support custom dtypes, etc. from the start. There is a lot of ugly code in there that has been hacked in to support various functionality over time. If there was a clean dtype-extension system in C, then A) it wouldn't be bad C to write, and B) would be pretty easy to make a Cython-wrapped version. Travis gave a nice vision for the future, but in the meantime, I'm wondering: Could we hack in a generic "custom dtype" dtype object into the current system that would delegate everything to the dtype object -- in a truly object-oriented way. I'm imagining that this custom dtype object would be a pyObject and thus very hackable, easy to make a new subclass, etc -- essentially like making a new class in python that emulates one of the built-in type interfaces. This would be slow as a dog -- if inside that C loop, numpy would have to call out to python to do anyting, maybe as simple as arithmetic, but it would be clean, extensible system, and a good way for folks to plug in and try out new dtypes when performance didn't matter, or as prototypes for something that would get plugged in at the C level later once the API was worked out. Is this even possible without too much hacking to the current dtype system? Would it be as simple as adding a bit to the object dtype? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilshire461 at gmail.com Wed Sep 24 13:23:27 2014 From: wilshire461 at gmail.com (John) Date: Wed, 24 Sep 2014 12:23:27 -0500 Subject: [Numpy-discussion] Building numpy with OpenMP support Message-ID: I am in the process of trying to build numpy with OpenMP support but have had several issues. Has anyone else built it with success that could offer some guidance in what needs to be passed at build time. 
For reference I am using Atlas 3.10.2 built with OpenMP as well (-F alg -fopenmp) Thanks, JB From tjhnson at gmail.com Wed Sep 24 13:23:46 2014 From: tjhnson at gmail.com (T J) Date: Wed, 24 Sep 2014 12:23:46 -0500 Subject: [Numpy-discussion] Round away from zero (towards +/- infinity) Message-ID: Is there a ufunc for rounding away from zero? Or do I need to do x2 = sign(x) * ceil(abs(x)) whenever I want to round away from zero? Maybe the following is better? x_ceil = ceil(x) x_floor = floor(x) x2 = where(x >= 0, x_ceil, x_floor) Python's round function goes away from zero, so I am looking for the NumPy equivalent (and using vectorize() seems undesirable). In this sense, it seems that having a ufunc for this type of rounding could be helpful. Aside: Is there interest in a more general around() that allows users to specify alternative tie-breaking rules, with the default staying 'round half to nearest even'? [1] Also, what is the difference between NumPy's fix() and trunc() functions? It seems like they achieve the same goal. trunc() was added in 1.3.0. So is fix() just legacy? --- [1] http://stackoverflow.com/questions/16000574/tie-breaking-of-round-with-numpy -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Wed Sep 24 14:52:11 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 24 Sep 2014 11:52:11 -0700 Subject: [Numpy-discussion] Add `nrows` to `genfromtxt` Message-ID: There is a PR in github that adds a new keyword to the genfromtxt function, to limit the number of rows that actually get read in: https://github.com/numpy/numpy/pull/5103 It is mostly ready to go, and several devs have looked at it without complaining. Since it is an API change, I wanted to check here: if no one has any strong opposition, I will be merging it sometime tomorrow. You have been warned... ;-) Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Wed Sep 24 15:20:15 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 24 Sep 2014 15:20:15 -0400 Subject: [Numpy-discussion] Add `nrows` to `genfromtxt` In-Reply-To: References: Message-ID: <5423196F.4000507@gmail.com> On 9/24/2014 2:52 PM, Jaime Fern?ndez del R?o wrote: > There is a PR in github that adds a new keyword to the genfromtxt function, to limit the number of rows that actually get read in: > https://github.com/numpy/numpy/pull/5103 Sorry to come late to this party, but it seems to me that more versatile than an `nrows` keyword for the number of rows would be a "rows" keyword for a slice argument. fwiw, Alan Isaac From jaime.frio at gmail.com Wed Sep 24 16:54:38 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 24 Sep 2014 13:54:38 -0700 Subject: [Numpy-discussion] Changes to the generalized functions. In-Reply-To: References: Message-ID: On Tue, Sep 23, 2014 at 4:31 PM, Charles R Harris wrote: > > > On Tue, Sep 23, 2014 at 4:59 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> The question has come up as the whether of not to treat the new gufunc >> behavior as a bug fix, keeping the old constructor name, or have a >> different constructor. 
Keeping the name makes life easier as we don't need >> to edit the code where numpy currently uses gufuncs, but is risky if some >> third party depends on the old behavior. The gufuncs have been part of >> numpy since the 1.3 release, and google doesn't turn up any uses that I can >> see apart from repeats of numpy code. We can also make fixes if needed >> during the 1.10 beta release cycle. Even so, it is a bit of a risk. To >> spread the blame, if any, please weigh in on the following. >> >> >> 1. Yes, it is a bug, keep the current name and fix the behavior. >> 2. No, we need to be conservative and use a new function. >> >> To clarify the changes. Currently, if an input array does not have as > many dimensions as indicated by the signature, it is filled out with ones > to match the signature as well as broadcasting to the other array. It is > also the case that if a dimension in the signature is 1, then it is > broadcast with the data in the other signature. That is, the parts > identified in the signature are treated as little arrays. The proposed > change is to require that the inputs have at least as many dimensions as > the signature and that dimensions of 1 are not broadcast. That is, the > parts identified in the signature are treated as vectors or matrices rather > than little arrays. > It looks like we get to keep all the blame to ourselves... ;-) After sleeping on it, I am leaning more and more towards Nathaniel's proposal. Make it a bug, but leave it ready to be a warning if needed, preferably triggered by something like #define STRICT_SIGNATURE and a bunch of #ifdef's scattered through the code. It shouldn't be much more work than what already is in place. We would merge another commit right before the beta cycle removing the unused code, and if anyone complains revert that commit, remove the #define, and keep it for one release before deprecating it. Testing for both possibilities is going to be a little trickier to get right, but should be doable. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Sep 24 17:30:37 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 24 Sep 2014 15:30:37 -0600 Subject: [Numpy-discussion] Changes to the generalized functions. In-Reply-To: References: Message-ID: On Wed, Sep 24, 2014 at 2:54 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Sep 23, 2014 at 4:31 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, Sep 23, 2014 at 4:59 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> The question has come up as the whether of not to treat the new gufunc >>> behavior as a bug fix, keeping the old constructor name, or have a >>> different constructor. Keeping the name makes life easier as we don't need >>> to edit the code where numpy currently uses gufuncs, but is risky if some >>> third party depends on the old behavior. The gufuncs have been part of >>> numpy since the 1.3 release, and google doesn't turn up any uses that I can >>> see apart from repeats of numpy code. We can also make fixes if needed >>> during the 1.10 beta release cycle. Even so, it is a bit of a risk. To >>> spread the blame, if any, please weigh in on the following. >>> >>> >>> 1. Yes, it is a bug, keep the current name and fix the behavior. >>> 2. 
No, we need to be conservative and use a new function. >>> >>> To clarify the changes. Currently, if an input array does not have as >> many dimensions as indicated by the signature, it is filled out with ones >> to match the signature as well as broadcasting to the other array. It is >> also the case that if a dimension in the signature is 1, then it is >> broadcast with the data in the other signature. That is, the parts >> identified in the signature are treated as little arrays. The proposed >> change is to require that the inputs have at least as many dimensions as >> the signature and that dimensions of 1 are not broadcast. That is, the >> parts identified in the signature are treated as vectors or matrices rather >> than little arrays. >> > > It looks like we get to keep all the blame to ourselves... ;-) > > After sleeping on it, I am leaning more and more towards Nathaniel's > proposal. Make it a bug, but leave it ready to be a warning if needed, > preferably triggered by something like #define STRICT_SIGNATURE and a bunch > of #ifdef's scattered through the code. It shouldn't be much more work than > what already is in place. We would merge another commit right before the > beta cycle removing the unused code, and if anyone complains revert that > commit, remove the #define, and keep it for one release before deprecating > it. > > Testing for both possibilities is going to be a little trickier to get > right, but should be doable. > I've been headed the other way, but using a simplified name for the new function: PyUFunc_FromFuncAndDataAndSignature2, which is along the lines of PyArray_MatrixProduct and PyArray_MatrixProduct2 in the numpy API. I do like Nathaniel's suggestion, but I also like taking chances and need to fight the urge ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From saullogiovani at gmail.com Wed Sep 24 17:57:38 2014 From: saullogiovani at gmail.com (Saullo Castro) Date: Wed, 24 Sep 2014 23:57:38 +0200 Subject: [Numpy-discussion] Interpolation using `np.interp()` with periodic x-coordinates Message-ID: >From the closed pull request PR #5109: https://github.com/numpy/numpy/pull/5109 it came out that the a good implementation would be adding a parameter `period`. I would like to know about the community's interest for this implementation. The modification are shown here: https://github.com/saullocastro/numpy/compare/interp_with_period?expand=1 Please, let me know about your feedback. Regards, Saullo -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Sep 24 18:11:05 2014 From: travis at continuum.io (Travis Oliphant) Date: Wed, 24 Sep 2014 17:11:05 -0500 Subject: [Numpy-discussion] Custom dtypes without C -- or, a standard ndarray-like type In-Reply-To: References: Message-ID: This could actually be done by using the structured dtype pretty easily. The hard work would be improving the ufunc and generalized ufunc mechanism to handle structured data-types. Numba actually provides some of this already, so if you have NumPy + Numba you can do this sort of thing now. -Travis On Wed, Sep 24, 2014 at 12:08 PM, Chris Barker wrote: > On Tue, Sep 23, 2014 at 4:40 AM, Eric Moore > wrote: > >> Improving the dtype system requires working on c code. >> > > yes -- it sure does. But I think that is a bit of a Red Herring. 
I'm > barely competent in C, and don't like it much, but the real barrier to > entry for me is not that it's in C, but that it's really complex and hard > to hack on, as it wasn't designed to support custom dtypes, etc. from the > start. There is a lot of ugly code in there that has been hacked in to > support various functionality over time. If there was a clean > dtype-extension system in C, then A) it wouldn't be bad C to write, and B) > would be pretty easy to make a Cython-wrapped version. > > Travis gave a nice vision for the future, but in the meantime, I'm > wondering: > > Could we hack in a generic "custom dtype" dtype object into the current > system that would delegate everything to the dtype object -- in a truly > object-oriented way. I'm imagining that this custom dtype object would be a > pyObject and thus very hackable, easy to make a new subclass, etc -- > essentially like making a new class in python that emulates one of the > built-in type interfaces. > > This would be slow as a dog -- if inside that C loop, numpy would have to > call out to python to do anyting, maybe as simple as arithmetic, but it > would be clean, extensible system, and a good way for folks to plug in and > try out new dtypes when performance didn't matter, or as prototypes for > something that would get plugged in at the C level later once the API was > worked out. > > Is this even possible without too much hacking to the current dtype > system? Would it be as simple as adding a bit to the object dtype? > > -Chris > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Thu Sep 25 10:36:57 2014 From: aron at ahmadia.net (Aron Ahmadia) Date: Thu, 25 Sep 2014 10:36:57 -0400 Subject: [Numpy-discussion] Fwd: Sport Rock Friday? In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: *Brian Joseph* Date: Thursday, September 25, 2014 Subject: Sport Rock Friday? To: Aron Ahmadia Hey Aron want to hit up Sport Rock tomorrow night? Meg and I would like to go. Meg has to work until 8, so how does 8:30 sound? -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Thu Sep 25 10:37:58 2014 From: aron at ahmadia.net (Aron Ahmadia) Date: Thu, 25 Sep 2014 10:37:58 -0400 Subject: [Numpy-discussion] Sport Rock Friday? In-Reply-To: References: Message-ID: Sorry for the SPAM folks. Itchy smartphone :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cimrman3 at ntc.zcu.cz Thu Sep 25 11:21:38 2014 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 25 Sep 2014 17:21:38 +0200 Subject: [Numpy-discussion] ANN: SfePy 2014.3 Message-ID: <54243302.1010701@ntc.zcu.cz> I am pleased to announce release 2014.3 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method or by the isogeometric analysis (preliminary support). 
It is distributed under the new BSD license. Home page: http://sfepy.org Mailing list: http://groups.google.com/group/sfepy-devel Git (source) repository, issue tracker, wiki: http://github.com/sfepy Highlights of this release -------------------------- - isogeometric analysis (IGA) speed-up by C implementation of NURBS basis evaluation - generalized linear combination boundary conditions that work between different fields/variables and support non-homogeneous periodic conditions - non-constant essential boundary conditions given by a function in IGA - reorganized and improved documentation For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1 (rather long and technical). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Vladimir Lukes, Matyas Novak, Zhihua Ouyang, Jaroslav Vondrejc From warren.weckesser at gmail.com Thu Sep 25 19:56:02 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Thu, 25 Sep 2014 19:56:02 -0400 Subject: [Numpy-discussion] Online docs for numpy are for version 1.8 Message-ID: Pinging the webmeisters: numpy 1.9 is released, but the docs at http://docs.scipy.org/doc/numpy/ are still for version 1.8. Warren From damian.avila at continuum.io Thu Sep 25 23:34:36 2014 From: damian.avila at continuum.io (Damian Avila) Date: Fri, 26 Sep 2014 00:34:36 -0300 Subject: [Numpy-discussion] ANN: Bokeh 0.6.1 release Message-ID: On behalf of the Bokeh team, I am very happy to announce the release of Bokeh version 0.6.1! Bokeh is a Python library for visualizing large and realtime datasets on the web. Its goal is to provide to developers (and domain experts) with capabilities to easily create novel and powerful visualizations that extract insight from local or remote (possibly large) data sets, and to easily publish those visualization to the web for others to explore and interact with. This point release includes several bug fixes and improvements over our most recent 0.6.0 release: * Toolbar enhancements * bokeh-server fixes * Improved documentation * Button widgets * Google map support in the Python side * Code cleanup in the JS side and examples * New examples See the CHANGELOG for full details. In upcoming releases, you should expect to see more new layout capabilities (colorbar axes, better grid plots and improved annotations), additional tools, even more widgets and more charts, R language bindings, Blaze integration and cloud hosting for Bokeh apps. Don't forget to check out the full documentation, interactive gallery, and tutorial at http://bokeh.pydata.org as well as the Bokeh IPython notebook nbviewer index (including all the tutorials) at: http://nbviewer.ipython.org/github/ContinuumIO/bokeh-notebooks/blob/master/index.ipynb If you are using Anaconda or miniconda, you can install with conda: conda install bokeh Alternatively, you can install with pip: pip install bokeh BokehJS is also available by CDN for use in standalone javascript applications: http://cdn.pydata.org/bokeh-0.6.1.min.js http://cdn.pydata.org/bokeh-0.6.1.min.css Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/continuumio/bokeh Questions can be directed to the Bokeh mailing list: bokeh at continuum.io If you have interest in helping to develop Bokeh, please get involved! Cheers, Dami?n Avila damian.avila at continuum.io -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From lanceboyle at qwest.net Fri Sep 26 04:41:55 2014
From: lanceboyle at qwest.net (Jerry)
Date: Fri, 26 Sep 2014 01:41:55 -0700
Subject: [Numpy-discussion] Hamming etc. windows are wrong
Message-ID: <32A9B832-5B17-4C3D-B573-336E4700FA36@qwest.net>

I've noticed that the Hamming window (and I suppose other windows) provided by Octave, SciPy****, and NumPy***** is wrong, returning a window that is symmetric rather than one that is "DFT symmetric." This is easily spotted by looking at the first and last points of the window; if they are the same values, then the window is incorrect.

These windows are normally applied to data that are equally spaced, thus implying that their DFTs are periodic. Let's assume that we are windowing N data points, indexed from 0 to N-1 as is normal in signal processing. Then the DFT is also N points long, also indexed from 0 to N-1. The periodic extension principle implies that the point indexed by N in the DFT domain has the same value as the point indexed by 0. The window can also be imagined to have the same kind of periodic extension, so that what would be the N-th point of the window matches the N-th point of the data with the same weight as was applied to the 0-th point of the data, but without duplicating that point at N-1.

This mistake is widespread, infecting many software packages. (I am sending this same note to the NumPy, SciPy, and Octave lists.) It is certainly wrong in most textbooks, including the classic, Oppenheim & Schafer, 1975. However, the definitive and widely referenced paper on windows by Fredric J. Harris, "On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform," Proceedings of the IEEE, vol. 66, no. 1, January 1978,* discusses this problem, both its roots and its correction. For example, quoting from the paper:

"Since the DFT essentially considers sequences to be periodic, we can consider the missing end point to be the beginning of the next period of the periodic extension of this sequence. In fact, under the periodic extension, the next sample (at 16 s in Fig. 1.) is indistinguishable from the sample at zero seconds.

"This apparent lack of symmetry due to the missing (but implied) end point is a source of confusion in sampled window design. This can be traced to the early work related to convergence factors for the partial sums of the Fourier series. The partial sums (or the finite Fourier transform) always include an odd number of points and exhibit even symmetry about the origin. Hence much of the literature and many software libraries incorporate windows designed with true even symmetry rather than the implied symmetry with the missing end point!"

...and later...

"We will now catalog some well-known (and some not well-known) windows. For each window we will comment on the justification for its use and identify its significant parameters. All the windows will be presented as even (about the origin) sequences with an odd number of points. To convert the window to DFT even, the right endpoint will be discarded and the sequence will be shifted so that the left end point coincides with the origin."

Note that he limits himself therefore to even-numbered "DFT even" windows, but the rule remains--honor the implied periodicity even for windows of odd length. Lest one consider this a trivial difference, consider the following Octave-Matlab code (corresponding NumPy and SciPy code would yield the same results).

x = ones(8, 1); # An even sequence; its DFT should be real.
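# (hWrong below is the usual symmetric 8-point window; hRight keeps the
# first 8 points of a 9-point symmetric window, dropping the duplicated
# endpoint so as to honor the implied periodic extension argued above.)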
hWrong = hamming(8)
h9 = hamming(9);
hRight = h9(1:8)
xWrong = x .* hWrong;    # Multiplication by "ones"
xRight = x .* hRight;    # shown for clarity.
XWrong = fft(xWrong)     # Contains nonzero imaginary part.
XRight = fft(xRight)     # Real, as expected from evenness of x.
XWrongMag = abs(XWrong)  # The magnitudes are
XRightMag = abs(XRight)  # also different.

This causes the following output:

hWrong =
   0.080000
   0.253195
   0.642360
   0.954446
   0.954446
   0.642360
   0.253195
   0.080000

hRight =
   0.080000
   0.214731
   0.540000
   0.865269
   1.000000
   0.865269
   0.540000
   0.214731

XWrong =
   3.86000 + 0.00000i
  -1.76795 - 0.73231i
   0.13889 + 0.13889i
   0.01906 + 0.04602i
   0.00000 + 0.00000i
   0.01906 - 0.04602i
   0.13889 - 0.13889i
  -1.76795 + 0.73231i

XRight =
   4.32000 + 0.00000i
  -1.84000 + 0.00000i
   0.00000 + 0.00000i
   0.00000 + 0.00000i
   0.00000 + 0.00000i
   0.00000 - 0.00000i
   0.00000 - 0.00000i
  -1.84000 - 0.00000i

XWrongMag =
   3.86000
   1.91362
   0.19642
   0.04981
   0.00000
   0.04981
   0.19642
   1.91362

XRightMag =
   4.32000
   1.84000
   0.00000
   0.00000
   0.00000
   0.00000
   0.00000
   1.84000

This news will likely upset many and I will be interested to see what arguments will flow in favor of keeping the status quo--I'm sure one will be, "we've always done it this way"--and others will be more substantial, possibly expressing decent arguments why the current situation is useful for some applications. In any case, I refer again to the Harris paper. So rather than host an argument on this list, this is what I propose: Do what Matlab** does. Acknowledge both uses by making a flag to handle the "periodic" or the "symmetric" cases. The Matlab default is "symmetric", which is of course unfortunate, but at least such inclusion in Octave, NumPy, and SciPy would retain compatibility with the existing usage. Then it's up to the user whether to shoot him/herself in the foot, assuming that such a decision is guided by actually referring to the documentation for the package being used and not blindly using the default.

Jerry Bauck

* http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1455106&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1455106
** http://www.mathworks.com/help/signal/ref/hamming.html
*** http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.hamming.html
**** I am just learning Python-NumPy-SciPy but it appears as though the SciPy situation is that the documentation page above *** mentions the flag but the flag has not been implemented into SciPy itself. I would be glad to stand corrected.
***** http://docs.scipy.org/doc/numpy/reference/generated/numpy.hamming.html

From davidmenhur at gmail.com Fri Sep 26 05:56:32 2014
From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=)
Date: Fri, 26 Sep 2014 11:56:32 +0200
Subject: [Numpy-discussion] [SciPy-Dev] Hamming etc. windows are wrong
In-Reply-To: <32A9B832-5B17-4C3D-B573-336E4700FA36@qwest.net>
References: <32A9B832-5B17-4C3D-B573-336E4700FA36@qwest.net>
Message-ID:

On 26 September 2014 10:41, Jerry wrote:
> **** I am just learning Python-NumPy-SciPy but it appears as though the
> SciPy situation is that the documentation page above *** mentions the flag
> but the flag has not been implemented into SciPy itself. I would be glad to
> stand corrected.

Regarding this, you can look at the source:

    odd = M % 2
    if not sym and not odd:
        w = w[:-1]

so it is implemented in Scipy, explicitly limited to even windows. It is not in Numpy, though (but that can be fixed).

I think a reference is worth adding to the documentation, to make the intent and application of the sym flag clear.
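For example, a quick check of the flag's effect (the endpoint behavior, not the exact values, is the point here):

    from scipy import signal

    print(signal.hamming(8))             # symmetric: first and last samples equal
    print(signal.hamming(8, sym=False))  # periodic "DFT even": computed from a
                                         # 9-point window with the duplicated
                                         # endpoint dropped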
I am sure a PR will be most appreciated.

/David.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lanceboyle at qwest.net Fri Sep 26 06:39:56 2014
From: lanceboyle at qwest.net (Jerry)
Date: Fri, 26 Sep 2014 03:39:56 -0700
Subject: [Numpy-discussion] [SciPy-Dev] Hamming etc. windows are wrong
In-Reply-To: 
References: <32A9B832-5B17-4C3D-B573-336E4700FA36@qwest.net>
Message-ID: <47D57189-401F-4C37-B646-8565BFF7958A@qwest.net>

On Sep 26, 2014, at 2:56 AM, Daπid wrote:

> On 26 September 2014 10:41, Jerry wrote:
> **** I am just learning Python-NumPy-SciPy but it appears as though the SciPy situation is that the documentation page above *** mentions the flag but the flag has not been implemented into SciPy itself. I would be glad to stand corrected.

Hmm.... Earlier I was getting a Python error when I tried to supply the second argument, sym. That was the reason for my comment but now it accepts the second argument without complaint.

> Regarding this, you can look at the source:
>
>     odd = M % 2
>     if not sym and not odd:
>         w = w[:-1]
>
> so it is implemented in Scipy, explicitly limited to even windows.

I can't see any reason this should be limited to even-length windows. Like I said in my original note, the principle applies regardless of even- or odd-length windows.

Jerry

> It is not in Numpy, though (but that can be fixed).
>
> I think a reference is worth adding to the documentation, to make the intent and application of the sym flag clear. I am sure a PR will be most appreciated.
>
> /David.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tom.augspurger88 at gmail.com Fri Sep 26 10:22:00 2014
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Fri, 26 Sep 2014 07:22:00 -0700 (PDT)
Subject: [Numpy-discussion] ANN: Bokeh 0.6.1 release
In-Reply-To: 
References: 
Message-ID: <70e53b8a-ec10-4b49-a54f-6de2e10e7053@googlegroups.com>

Congrats on the release. I've been playing with Bokeh the last couple weeks and it's really good.

There was some talk about expanding the ecosystem page in the pandas docs to show off brief examples for the various projects (btw I think we should rename that to **pydata** ecosystem, instead of pandas ecosystem).

Would you say the API for something like a ColumnDataSource example (like the car example) is stable enough to include in our docs?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bryanv at continuum.io Fri Sep 26 16:18:45 2014
From: bryanv at continuum.io (Bryan Van de Ven)
Date: Fri, 26 Sep 2014 15:18:45 -0500
Subject: [Numpy-discussion] ANN: Bokeh 0.6.1 release
In-Reply-To: <70e53b8a-ec10-4b49-a54f-6de2e10e7053@googlegroups.com>
References: <70e53b8a-ec10-4b49-a54f-6de2e10e7053@googlegroups.com>
Message-ID: 

Hi Tom,

I would wait for 0.7, where we are going to make a few changes to the plotting.py interface. That API started out as a "stateful, implicit current plot" style interface, and experience has shown that this does not always work well, especially in the IPython notebook where you can have any order of execution of the cells. It is basically not possible to reason about which plot is the "current plot" and we got lots of support issues around this. The interface is not changing drastically, but it is changing some.
Basically instead of "line(...)" acting on some implicit plot, you will have "p.line(...)" acting on some explicitly specified plot "p". Currently 0.7 looks to be released late October/early November, all the examples will be updated at that time, and I would not expect that API to change any time soon after.

Bryan

On Sep 26, 2014, at 9:22 AM, Tom Augspurger wrote:

> Congrats on the release. I've been playing with Bokeh the last couple weeks and it's really good.
>
> There was some talk about expanding the ecosystem page in the pandas docs to show off brief examples for the various projects (btw I think we should rename that to **pydata** ecosystem, instead of pandas ecosystem).
>
> Would you say the API for something like a ColumnDataSource example (like the car example) is stable enough to include in our docs?
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From pmhobson at gmail.com Fri Sep 26 17:03:46 2014
From: pmhobson at gmail.com (Paul Hobson)
Date: Fri, 26 Sep 2014 14:03:46 -0700
Subject: [Numpy-discussion] [SciPy-Dev] Hamming etc. windows are wrong
In-Reply-To: <47D57189-401F-4C37-B646-8565BFF7958A@qwest.net>
References: <32A9B832-5B17-4C3D-B573-336E4700FA36@qwest.net>
 <47D57189-401F-4C37-B646-8565BFF7958A@qwest.net>
Message-ID: 

On Fri, Sep 26, 2014 at 3:39 AM, Jerry wrote:
>
> On Sep 26, 2014, at 2:56 AM, Daπid wrote:
>
> On 26 September 2014 10:41, Jerry wrote:
>
>> **** I am just learning Python-NumPy-SciPy but it appears as though the
>> SciPy situation is that the documentation page above *** mentions the flag
>> but the flag has not been implemented into SciPy itself. I would be glad to
>> stand corrected.
>
> Hmm.... Earlier I was getting a Python error when I tried to supply the
> second argument, sym. That was the reason for my comment but now it accepts
> the second argument without complaint.
>
Just curious: had you imported pylab at any point? Your sample code looks like you did several "from xxx import *" and it might have clobbered the scipy version (or vice versa).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lanceboyle at qwest.net Fri Sep 26 20:10:10 2014
From: lanceboyle at qwest.net (Jerry)
Date: Fri, 26 Sep 2014 17:10:10 -0700
Subject: [Numpy-discussion] [SciPy-Dev] Hamming etc. windows are wrong
In-Reply-To: 
References: <32A9B832-5B17-4C3D-B573-336E4700FA36@qwest.net>
 <47D57189-401F-4C37-B646-8565BFF7958A@qwest.net>
Message-ID: <6D3A944A-E4F3-4B79-8E7A-443758AE1358@qwest.net>

On Sep 26, 2014, at 2:03 PM, Paul Hobson wrote:

> On Fri, Sep 26, 2014 at 3:39 AM, Jerry wrote:
>
> On Sep 26, 2014, at 2:56 AM, Daπid wrote:
>
>> On 26 September 2014 10:41, Jerry wrote:
>> **** I am just learning Python-NumPy-SciPy but it appears as though the SciPy situation is that the documentation page above *** mentions the flag but the flag has not been implemented into SciPy itself. I would be glad to stand corrected.
>
> Hmm.... Earlier I was getting a Python error when I tried to supply the second argument, sym. That was the reason for my comment but now it accepts the second argument without complaint.
>
> Just curious: had you imported pylab at any point? Your sample code looks like you did several "from xxx import *" and it might have clobbered the scipy version (or vice versa).

I don't recall what I had imported, but probably scipy, as in "import scipy", or maybe "import scipy as sc."
I do believe that I was working interactively so I don't have any actual evidence at this point.

But I just had this thought--maybe I was in numpy and not scipy. For some reason both numpy and scipy provide a hamming function (why?) so maybe I was using the numpy version by accident which IIRC does not have the second argument.

My sample code is in Octave. 8^)

Jerry

> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Sat Sep 27 06:37:12 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 27 Sep 2014 11:37:12 +0100
Subject: [Numpy-discussion] [SciPy-Dev] Hamming etc. windows are wrong
In-Reply-To: <6D3A944A-E4F3-4B79-8E7A-443758AE1358@qwest.net>
References: <32A9B832-5B17-4C3D-B573-336E4700FA36@qwest.net>
 <47D57189-401F-4C37-B646-8565BFF7958A@qwest.net>
 <6D3A944A-E4F3-4B79-8E7A-443758AE1358@qwest.net>
Message-ID: 

On Sat, Sep 27, 2014 at 1:10 AM, Jerry wrote:
> I don't recall what I had imported, but probably scipy, as in "import
> scipy", or maybe "import scipy as sc." I do believe that I was working
> interactively so I don't have any actual evidence at this point.
>
> But I just had this thought--maybe I was in numpy and not scipy. For some
> reason both numpy and scipy provide a hamming function (why?) so maybe I was
> using the numpy version by accident which IIRC does not have the second
> argument.
>
> My sample code is in Octave. 8^)

scipy's hamming() function is scipy.signal.hamming(). It's in the subpackage, and you need to import it like so:

  from scipy import signal
  signal.hamming(...)

There *is* a scipy.hamming() function as well, which is just an alias for numpy.hamming().

  [~]
  |1> import scipy

  [~]
  |2> import numpy

  [~]
  |3> scipy.hamming is numpy.hamming
  True

Explaining why is a long story, but let's just say "historical reasons" and leave it at that. You almost never want to just "import scipy". All of the scipy goodness is in the subpackages.

--
Robert Kern

From darcamo at gmail.com Sat Sep 27 12:37:59 2014
From: darcamo at gmail.com (Darlan Cavalcante Moreira)
Date: Sat, 27 Sep 2014 13:37:59 -0300
Subject: [Numpy-discussion] Strange bug in SVD when numpy is installed with pip
Message-ID: <87vbo96mu0.fsf@gmail.com>

Some time ago I have reported a bug about the linalg.matrix_rank in numpy for complex matrices. This was quickly fixed, and to take advantage of the fix I'm now using numpy 1.9.0 installed through pip, instead of the version from my system (Ubuntu 14.04, with numpy version 1.8.1).

However, I have now encountered a very strange bug in the SVD function, but only when numpy is manually installed (with pip in my case). When I calculate the SVD of complex matrices with more columns than rows, the last rows of the returned V_H matrix are all equal to zeros. This does not happen for all shapes, but for the ones where this happens it will always happen.

This can be reproduced with the code below. You can change the sizes of M and N and it happens for other sizes where N > M, but not all of them.
--8<---------------cut here---------------start------------->8---
import numpy as np

M = 8   # Number of rows
N = 12  # Number of columns

# Calculate the SVD of a non-square complex random matrix
[U, S, V_H] = np.linalg.svd(np.random.randn(M, N) + 1j*np.random.randn(M, N),
                            full_matrices=True)

# Calculate the norm of the submatrix formed by the last N-M rows
if np.linalg.norm(V_H[M-N:]) < 1e-30:
    print("Bug!")
else:
    print("No Bug")
# See the N-M rows. They are all equal to zeros
print(V_H[M-N:])
--8<---------------cut here---------------end--------------->8---

The original matrix can still be obtained from the decomposition, since the zero rows correspond to zero singular values due to the fact that the original matrix has more columns than rows. However, since the user asked for 'full_matrices' here (the default), returning all zeroes for these extra rows is not useful.

In order to isolate the bug I tried installing some different numpy versions in different virtualenvs. I tried versions 1.6, 1.7, 1.8.1 and 1.9 and the bug appears in all of them. Since it does not happen if I use the version 1.8.1 installed through the Ubuntu package manager, I imagine it is due to some issue when pip compiles numpy locally.

Note: If I run numpy.testing.test() all tests are OK for all numpy versions I have tested. The only fail is a known fail and I get "OK (KNOWNFAIL=1)".

--
Darlan Cavalcante Moreira

From charlesr.harris at gmail.com Sat Sep 27 13:33:18 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 27 Sep 2014 11:33:18 -0600
Subject: [Numpy-discussion] Strange bug in SVD when numpy is installed with pip
In-Reply-To: <87vbo96mu0.fsf@gmail.com>
References: <87vbo96mu0.fsf@gmail.com>
Message-ID: 

On Sat, Sep 27, 2014 at 10:37 AM, Darlan Cavalcante Moreira < darcamo at gmail.com> wrote:

> Some time ago I have reported a bug about the linalg.matrix_rank in
> numpy for complex matrices. This was quickly fixed, and to take
> advantage of the fix I'm now using numpy 1.9.0 installed through pip,
> instead of the version from my system (Ubuntu 14.04, with numpy version
> 1.8.1).
>
> However, I have now encountered a very strange bug in the SVD function,
> but only when numpy is manually installed (with pip in my case). When I
> calculate the SVD of complex matrices with more columns than rows, the
> last rows of the returned V_H matrix are all equal to zeros. This does
> not happen for all shapes, but for the ones where this happens it will
> always happen.
>
> This can be reproduced with the code below. You can change the sizes of
> M and N and it happens for other sizes where N > M, but not all of them.
>
> --8<---------------cut here---------------start------------->8---
> import numpy as np
>
> M = 8   # Number of rows
> N = 12  # Number of columns
>
> # Calculate the SVD of a non-square complex random matrix
> [U, S, V_H] = np.linalg.svd(np.random.randn(M, N) + 1j*np.random.randn(M, N),
>                             full_matrices=True)
>
> # Calculate the norm of the submatrix formed by the last N-M rows
> if np.linalg.norm(V_H[M-N:]) < 1e-30:
>     print("Bug!")
> else:
>     print("No Bug")
> # See the N-M rows. They are all equal to zeros
> print(V_H[M-N:])
> --8<---------------cut here---------------end--------------->8---
>
> The original matrix can still be obtained from the decomposition, since
> the zero rows correspond to zero singular values due to the fact that
> the original matrix has more columns than rows. However, since the user
> asked for 'full_matrices' here (the default), returning all zeroes for
> these extra rows is not useful.
>
> In order to isolate the bug I tried installing some different numpy
> versions in different virtualenvs. I tried versions 1.6, 1.7, 1.8.1 and
> 1.9 and the bug appears in all of them. Since it does not happen if I
> use the version 1.8.1 installed through the Ubuntu package manager, I
> imagine it is due to some issue when pip compiles numpy locally.
>
> Note: If I run numpy.testing.test() all tests are OK for all numpy
> versions I have tested. The only fail is a known fail and I get
> "OK (KNOWNFAIL=1)".
>

What does `np.__config__.show()` show?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From darcamo at gmail.com Sat Sep 27 15:46:10 2014
From: darcamo at gmail.com (Darlan Cavalcante Moreira)
Date: Sat, 27 Sep 2014 16:46:10 -0300
Subject: [Numpy-discussion] Strange bug in SVD when numpy is installed with pip
In-Reply-To: 
References: <87vbo96mu0.fsf@gmail.com>
Message-ID: <87tx3s7sot.fsf@gmail.com>

>>> np.__config__.show()
lapack_info:
  NOT AVAILABLE
lapack_opt_info:
  NOT AVAILABLE
blas_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
blas_src_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
lapack_src_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
blas_opt_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

With so many "NOT AVAILABLE" I'm now surprised SVD is even defined.

Charles R Harris writes:
> On Sat, Sep 27, 2014 at 10:37 AM, Darlan Cavalcante Moreira <
> darcamo at gmail.com> wrote:
>
>> Some time ago I have reported a bug about the linalg.matrix_rank in
>> numpy for complex matrices. This was quickly fixed, and to take
>> advantage of the fix I'm now using numpy 1.9.0 installed through pip,
>> instead of the version from my system (Ubuntu 14.04, with numpy version
>> 1.8.1).
>>
>> However, I have now encountered a very strange bug in the SVD function,
>> but only when numpy is manually installed (with pip in my case). When I
>> calculate the SVD of complex matrices with more columns than rows, the
>> last rows of the returned V_H matrix are all equal to zeros. This does
>> not happen for all shapes, but for the ones where this happens it will
>> always happen.
>>
>> This can be reproduced with the code below. You can change the sizes of
>> M and N and it happens for other sizes where N > M, but not all of them.
>>
>> --8<---------------cut here---------------start------------->8---
>> import numpy as np
>>
>> M = 8   # Number of rows
>> N = 12  # Number of columns
>>
>> # Calculate the SVD of a non-square complex random matrix
>> [U, S, V_H] = np.linalg.svd(np.random.randn(M, N) + 1j*np.random.randn(M, N),
>>                             full_matrices=True)
>>
>> # Calculate the norm of the submatrix formed by the last N-M rows
>> if np.linalg.norm(V_H[M-N:]) < 1e-30:
>>     print("Bug!")
>> else:
>>     print("No Bug")
>> # See the N-M rows. They are all equal to zeros
>> print(V_H[M-N:])
>> --8<---------------cut here---------------end--------------->8---
>>
>> The original matrix can still be obtained from the decomposition, since
>> the zero rows correspond to zero singular values due to the fact that
>> the original matrix has more columns than rows. However, since the user
>> asked for 'full_matrices' here (the default), returning all zeroes for
>> these extra rows is not useful.
>>
>> In order to isolate the bug I tried installing some different numpy
>> versions in different virtualenvs. I tried versions 1.6, 1.7, 1.8.1 and
>> 1.9 and the bug appears in all of them. Since it does not happen if I
>> use the version 1.8.1 installed through the Ubuntu package manager, I
>> imagine it is due to some issue when pip compiles numpy locally.
>>
>> Note: If I run numpy.testing.test() all tests are OK for all numpy
>> versions I have tested. The only fail is a known fail and I get
>> "OK (KNOWNFAIL=1)".
>>
> What does `np.__config__.show()` show?
>
> Chuck
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Sent with my mu4e

From charlesr.harris at gmail.com Sat Sep 27 16:37:10 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 27 Sep 2014 14:37:10 -0600
Subject: [Numpy-discussion] Strange bug in SVD when numpy is installed with pip
In-Reply-To: <87tx3s7sot.fsf@gmail.com>
References: <87vbo96mu0.fsf@gmail.com> <87tx3s7sot.fsf@gmail.com>
Message-ID: 

On Sat, Sep 27, 2014 at 1:46 PM, Darlan Cavalcante Moreira < darcamo at gmail.com> wrote:

> >>> np.__config__.show()
> lapack_info:
>   NOT AVAILABLE
> lapack_opt_info:
>   NOT AVAILABLE
> blas_info:
>   NOT AVAILABLE
> atlas_threads_info:
>   NOT AVAILABLE
> blas_src_info:
>   NOT AVAILABLE
> atlas_blas_info:
>   NOT AVAILABLE
> lapack_src_info:
>   NOT AVAILABLE
> openblas_info:
>   NOT AVAILABLE
> atlas_blas_threads_info:
>   NOT AVAILABLE
> blas_mkl_info:
>   NOT AVAILABLE
> blas_opt_info:
>   NOT AVAILABLE
> atlas_info:
>   NOT AVAILABLE
> lapack_mkl_info:
>   NOT AVAILABLE
> mkl_info:
>   NOT AVAILABLE
>
> With so many "NOT AVAILABLE" I'm now surprised SVD is even defined.
>

Looks like it is falling back to numpy's internal versions. You might try installing ATLAS to see if that makes a difference.

BTW, the numpy custom is bottom posting.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From darcamo at gmail.com Sat Sep 27 18:10:26 2014
From: darcamo at gmail.com (darcamo at gmail.com)
Date: Sat, 27 Sep 2014 19:10:26 -0300
Subject: [Numpy-discussion] Strange bug in SVD when numpy is installed with pip
In-Reply-To: 
References: <87vbo96mu0.fsf@gmail.com> <87tx3s7sot.fsf@gmail.com>
Message-ID: 

Thanks Charles! I installed atlas and lapack and then reinstalled numpy with pip. It works correctly now.

An easy fix for this for now is raising an exception when numpy is using its internal version for SVD and the user asked for the full matrices. The exception message can instruct the user to either install lapack and recompile numpy, or set full_matrices to False (if that is enough and the user can't reinstall numpy with lapack for some reason). This would avoid the bug hunting I had to do in my code.
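Something along these lines could work in the meantime as a user-side guard (just a sketch; the `np.__config__.get_info` lookup and the 'lapack_opt' key are assumptions about what numpy exposes, and a real fix would live next to the lapack_lite fallback itself):

    import numpy as np

    def svd_full(a):
        # Refuse full_matrices=True when numpy appears to have been built
        # against its bundled lapack_lite rather than an external LAPACK,
        # since that is the configuration that produced the zero rows above.
        if not np.__config__.get_info('lapack_opt'):
            raise RuntimeError(
                "numpy was built without an optimized LAPACK; rebuild "
                "against LAPACK/ATLAS or call svd with full_matrices=False")
        return np.linalg.svd(a, full_matrices=True)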
On 27/09/2014 17:37, "Charles R Harris" wrote:
>
> On Sat, Sep 27, 2014 at 1:46 PM, Darlan Cavalcante Moreira <
> darcamo at gmail.com> wrote:
>
>> >>> np.__config__.show()
>> lapack_info:
>>   NOT AVAILABLE
>> lapack_opt_info:
>>   NOT AVAILABLE
>> blas_info:
>>   NOT AVAILABLE
>> atlas_threads_info:
>>   NOT AVAILABLE
>> blas_src_info:
>>   NOT AVAILABLE
>> atlas_blas_info:
>>   NOT AVAILABLE
>> lapack_src_info:
>>   NOT AVAILABLE
>> openblas_info:
>>   NOT AVAILABLE
>> atlas_blas_threads_info:
>>   NOT AVAILABLE
>> blas_mkl_info:
>>   NOT AVAILABLE
>> blas_opt_info:
>>   NOT AVAILABLE
>> atlas_info:
>>   NOT AVAILABLE
>> lapack_mkl_info:
>>   NOT AVAILABLE
>> mkl_info:
>>   NOT AVAILABLE
>>
>> With so many "NOT AVAILABLE" I'm now surprised SVD is even defined.
>>
> Looks like it is falling back to numpy's internal versions. You might try
> installing ATLAS to see if that makes a difference.
>
> BTW, the numpy custom is bottom posting.
>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From daverz at gmail.com Sun Sep 28 15:05:59 2014
From: daverz at gmail.com (Dave Cook)
Date: Sun, 28 Sep 2014 12:05:59 -0700
Subject: [Numpy-discussion] Hamming etc. windows are wrong
In-Reply-To: <32A9B832-5B17-4C3D-B573-336E4700FA36@qwest.net>
References: <32A9B832-5B17-4C3D-B573-336E4700FA36@qwest.net>
Message-ID: 

On Fri, Sep 26, 2014 at 1:41 AM, Jerry wrote:

> I've noticed that the Hamming window (and I suppose other windows)
> provided by Octave, SciPy****, and NumPy***** is wrong, returning a window
> that is symmetric rather than one that is "DFT symmetric." This is easily
> spotted by looking at the first and last points of the window; if they are
> the same values, then the window is incorrect.
>
If you use signal.get_window(), the default is sym=False:

    def get_window(window, Nx, fftbins=True):
        # snip
        sym = not fftbins

Dave Cook
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jzwinck at gmail.com Tue Sep 30 06:49:47 2014
From: jzwinck at gmail.com (John Zwinck)
Date: Tue, 30 Sep 2014 18:49:47 +0800
Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names
Message-ID: 

I first proposed this on GitHub: https://github.com/numpy/numpy/issues/5134 ; jaimefrio requested that I bring it to this list for discussion.

My proposal is to add a keys() method to NumPy's array class ndarray. The behavior would be to return self.dtype.names, i.e. the "column names" for a structured array (and None when dtype.names is None, which it is for pure numeric arrays without named columns).

I originally proposed to add a values() method also, but I am tabling that for now so we needn't discuss it in this thread.

The motivation is to enhance the ability to use duck typing with NumPy arrays, Python dicts, and other types like Pandas DataFrames, h5py Files, and more. It's a fairly common thing to want to get the "keys" of a container, where "keys" is understood to be a sequence of values one can pass to __getitem__(), and this is exactly what I'm aiming at.

Thoughts?

John Zwinck

From hoogendoorn.eelco at gmail.com Tue Sep 30 07:29:08 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Tue, 30 Sep 2014 13:29:08 +0200
Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names
In-Reply-To: 
References: 
Message-ID: 

Sounds fair to me.
Indeed the ducktyping argument makes sense, and I have a hard time imagining any namespace conflicts or other confusion. Should this attribute return None for non-structured arrays, or simply be undefined?

On Tue, Sep 30, 2014 at 12:49 PM, John Zwinck wrote:

> I first proposed this on GitHub:
> https://github.com/numpy/numpy/issues/5134 ; jaimefrio requested that
> I bring it to this list for discussion.
>
> My proposal is to add a keys() method to NumPy's array class ndarray.
> The behavior would be to return self.dtype.names, i.e. the "column
> names" for a structured array (and None when dtype.names is None,
> which it is for pure numeric arrays without named columns).
>
> I originally proposed to add a values() method also, but I am tabling
> that for now so we needn't discuss it in this thread.
>
> The motivation is to enhance the ability to use duck typing with NumPy
> arrays, Python dicts, and other types like Pandas DataFrames, h5py
> Files, and more. It's a fairly common thing to want to get the "keys"
> of a container, where "keys" is understood to be a sequence of values
> one can pass to __getitem__(), and this is exactly what I'm aiming at.
>
> Thoughts?
>
> John Zwinck
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ferdinand.bayard at strains.fr Tue Sep 30 09:15:10 2014
From: ferdinand.bayard at strains.fr (Bayard)
Date: Tue, 30 Sep 2014 15:15:10 +0200
Subject: [Numpy-discussion] f2py and debug mode
Message-ID: <542AACDE.7010408@strains.fr>

Hello to all. I'm aiming to wrap a Fortran program into Python. I started to work with f2py, and am trying to set up a debug mode where I could reach breakpoints in the Fortran module launched by Python. I've been looking at the existing posts, but have not seen anything like that.

I'm used to working with Visual Studio 2012 and the Intel Fortran compiler, and I have tried to get to that point by doing:

1) Run f2py -m to get the *.c wrapper
2) Embed it in a C project in Visual Studio, together with fortranobject.c and fortranobject.h,
3) Create a solution which also contains my Fortran files compiled as a lib,
4) Generate in debug mode a "dll" with extension pyd (to get to that point, the "main" function in Fortran is named "_main").

I compiled without any error, and I reach breakpoints in the C wrapper, but not in Fortran, and the Fortran code seems not to be executed (whereas it is when compiling with f2py -c). Trying to understand the f2py code, I noticed that f2py is not only writing the C wrapper, but also compiling it in a specific way.

Is there a way to get a debug mode in Visual Studio with f2py (some members of the team are used to it)? Any alternative tool we should use for debugging?

Thanks for answering,
Ferdinand

---
This email contains no viruses or malware because avast! Antivirus protection is active.
http://www.avast.com

From ben.root at ou.edu Tue Sep 30 09:19:49 2014
From: ben.root at ou.edu (Benjamin Root)
Date: Tue, 30 Sep 2014 09:19:49 -0400
Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names
In-Reply-To: 
References: 
Message-ID: 

I am also +1. I have already used structured arrays to do keyword-based string formatting. This makes sense as well. Would this enable keyword argument expansion?
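(A quick sketch of what I have in mind -- the wrapper class here is hypothetical, standing in for the proposed method -- since f(**obj) only looks for keys() plus __getitem__:)

    import numpy as np

    # Hypothetical stand-in for the proposed ndarray.keys(): any object
    # exposing keys() and __getitem__ supports ** keyword expansion.
    class KeyedRow(object):
        def __init__(self, row):
            self._row = row  # one record of a structured array
        def keys(self):
            return self._row.dtype.names
        def __getitem__(self, name):
            return self._row[name]

    a = np.array([(1, 2.5)], dtype=[('x', 'i4'), ('y', 'f8')])
    print("{x} and {y}".format(**KeyedRow(a[0])))  # prints "1 and 2.5"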
On Tue, Sep 30, 2014 at 7:29 AM, Eelco Hoogendoorn <
hoogendoorn.eelco at gmail.com> wrote:

> Sounds fair to me. Indeed the duck-typing argument makes sense, and I have
> a hard time imagining any namespace conflicts or other confusion. Should
> this attribute return None for non-structured arrays, or simply be
> undefined?
>
> On Tue, Sep 30, 2014 at 12:49 PM, John Zwinck wrote:
>
>> I first proposed this on GitHub:
>> https://github.com/numpy/numpy/issues/5134 ; jaimefrio requested that
>> I bring it to this list for discussion.
>>
>> My proposal is to add a keys() method to NumPy's array class ndarray.
>> The behavior would be to return self.dtype.names, i.e. the "column
>> names" for a structured array (and None when dtype.names is None,
>> which it is for pure numeric arrays without named columns).
>>
>> I originally proposed to add a values() method also, but I am tabling
>> that for now so we needn't discuss it in this thread.
>>
>> The motivation is to enhance the ability to use duck typing with NumPy
>> arrays, Python dicts, and other types like Pandas DataFrames, h5py
>> Files, and more. It's a fairly common thing to want to get the "keys"
>> of a container, where "keys" is understood to be a sequence of values
>> one can pass to __getitem__(), and this is exactly what I'm aiming at.
>>
>> Thoughts?
>>
>> John Zwinck
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shoyer at gmail.com Tue Sep 30 14:05:21 2014
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 30 Sep 2014 11:05:21 -0700
Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names
In-Reply-To: 
References: 
Message-ID: 

I like this idea. But I am -1 on returning None if the array is
unstructured. I expect .keys(), if present, to always return an iterable.

In fact, this would break some of my existing code, which checks for the
existence of "keys" as a way to do duck typed checks for dictionary like
objects (e.g., including pandas.DataFrame):
https://github.com/xray/xray/blob/v0.3/xray/core/utils.py#L165
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hoogendoorn.eelco at gmail.com Tue Sep 30 16:21:09 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Tue, 30 Sep 2014 22:21:09 +0200
Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names
In-Reply-To: 
References: 
Message-ID: 

So a non-structured array should return an empty list/iterable as its
keys? That doesn't seem right to me, but perhaps you have a compelling
example to the contrary.

I mean, wouldn't we want the duck-typing to fail if it isn't a structured
array? Throwing an AttributeError seems like the best thing to do, from a
duck-typing perspective.

On Tue, Sep 30, 2014 at 8:05 PM, Stephan Hoyer wrote:

> I like this idea. But I am -1 on returning None if the array is
> unstructured. I expect .keys(), if present, to always return an iterable.
>
> In fact, this would break some of my existing code, which checks for the
> existence of "keys" as a way to do duck typed checks for dictionary like
> objects (e.g., including pandas.DataFrame):
> https://github.com/xray/xray/blob/v0.3/xray/core/utils.py#L165
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hoogendoorn.eelco at gmail.com Tue Sep 30 16:22:07 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Tue, 30 Sep 2014 22:22:07 +0200
Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names
In-Reply-To: 
References: 
Message-ID: 

On more careful reading of your words, I think we agree; indeed, if keys()
is present it should return an iterable; but I don't think it should be
present for non-structured arrays.

On Tue, Sep 30, 2014 at 10:21 PM, Eelco Hoogendoorn <
hoogendoorn.eelco at gmail.com> wrote:

> So a non-structured array should return an empty list/iterable as its
> keys? That doesn't seem right to me, but perhaps you have a compelling
> example to the contrary.
>
> I mean, wouldn't we want the duck-typing to fail if it isn't a structured
> array? Throwing an AttributeError seems like the best thing to do, from a
> duck-typing perspective.
>
> On Tue, Sep 30, 2014 at 8:05 PM, Stephan Hoyer wrote:
>
>> I like this idea. But I am -1 on returning None if the array is
>> unstructured. I expect .keys(), if present, to always return an iterable.
>>
>> In fact, this would break some of my existing code, which checks for the
>> existence of "keys" as a way to do duck typed checks for dictionary like
>> objects (e.g., including pandas.DataFrame):
>> https://github.com/xray/xray/blob/v0.3/xray/core/utils.py#L165
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shoyer at gmail.com Tue Sep 30 16:30:43 2014
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 30 Sep 2014 13:30:43 -0700
Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names
In-Reply-To: 
References: 
Message-ID: 

On Tue, Sep 30, 2014 at 1:22 PM, Eelco Hoogendoorn <
hoogendoorn.eelco at gmail.com> wrote:

> On more careful reading of your words, I think we agree; indeed, if keys()
> is present it should return an iterable; but I don't think it should be
> present for non-structured arrays.
>

Indeed, I think we do agree. The attribute can simply be missing (e.g.,
accessing it raises AttributeError) for non-structured arrays.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: