From charlesr.harris at gmail.com Sun Jan 1 19:04:08 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 1 Jan 2017 17:04:08 -0700 Subject: [Numpy-discussion] NumPy 1.12.0rc2 release. Message-ID: Hi All, I'm pleased to announce the NumPy 1.12.0rc2 New Year's release. This release supports Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be downloaded from PiPY , the tarball and zip files may be downloaded from Github . The release notes and files hashes may also be found at Github . NumPy 1.12.0rc 2 is the result of 413 pull requests submitted by 139 contributors and comprises a large number of fixes and improvements. Among the many improvements it is difficult to pick out just a few as standing above the others, but the following may be of particular interest or indicate areas likely to have future consequences. * Order of operations in ``np.einsum`` can now be optimized for large speed improvements. * New ``signature`` argument to ``np.vectorize`` for vectorizing with core dimensions. * The ``keepdims`` argument was added to many functions. * New context manager for testing warnings * Support for BLIS in numpy.distutils * Much improved support for PyPy (not yet finished) Enjoy, Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 2 20:36:22 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 18:36:22 -0700 Subject: [Numpy-discussion] Deprecating matrices. Message-ID: Hi All, Just throwing this click bait out for discussion. Now that the `@` operator is available and things seem to be moving towards Python 3, especially in the classroom, we should consider the real possibility of deprecating the matrix type and later removing it. No doubt there are old scripts that require them, but older versions of numpy are available for those who need to run old scripts. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jan 2 21:00:56 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 3 Jan 2017 15:00:56 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris wrote: > Hi All, > > Just throwing this click bait out for discussion. Now that the `@` > operator is available and things seem to be moving towards Python 3, > especially in the classroom, we should consider the real possibility of > deprecating the matrix type and later removing it. No doubt there are old > scripts that require them, but older versions of numpy are available for > those who need to run old scripts. > > Thoughts? > Clearly deprecate in the docs now, and warn only later imho. We can't warn before we have a good solution for scipy.sparse matrices, which have matrix semantics and return matrix instances. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 2 21:26:32 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Jan 2017 21:26:32 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers wrote: > > > On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> Just throwing this click bait out for discussion. 
Now that the `@` >> operator is available and things seem to be moving towards Python 3, >> especially in the classroom, we should consider the real possibility of >> deprecating the matrix type and later removing it. No doubt there are old >> scripts that require them, but older versions of numpy are available for >> those who need to run old scripts. >> >> Thoughts? >> > > Clearly deprecate in the docs now, and warn only later imho. We can't warn > before we have a good solution for scipy.sparse matrices, which have matrix > semantics and return matrix instances. > > Ralf > How about dropping python 2 support at the same time, then we can all be in a @ world. Josef > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 2 21:27:09 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 19:27:09 -0700 Subject: [Numpy-discussion] Default type for functions that accumulate integers Message-ID: Hi All, Currently functions like trace use the C long type as the default accumulator for integer types of lesser precision: dtype : dtype, optional > Determines the data-type of the returned array and of the accumulator > where the elements are summed. If dtype has the value None and `a` is > of integer type of precision less than the default integer > precision, then the default integer precision is used. Otherwise, > the precision is the same as that of `a`. > The problem with this is that the precision of long varies with the platform so that the result varies, see gh-8433 for a complaint about this. There are two possible alternatives that seem reasonable to me: 1. Use 32 bit accumulators on 32 bit platforms and 64 bit accumulators on 64 bit platforms. 2. Always use 64 bit accumulators. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jan 2 21:46:08 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Jan 2017 18:46:08 -0800 Subject: [Numpy-discussion] Default type for functions that accumulate integers In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris wrote: > Hi All, > > Currently functions like trace use the C long type as the default > accumulator for integer types of lesser precision: > >> dtype : dtype, optional >> Determines the data-type of the returned array and of the accumulator >> where the elements are summed. If dtype has the value None and `a` is >> of integer type of precision less than the default integer >> precision, then the default integer precision is used. Otherwise, >> the precision is the same as that of `a`. > > > The problem with this is that the precision of long varies with the platform > so that the result varies, see gh-8433 for a complaint about this. There > are two possible alternatives that seem reasonable to me: > > Use 32 bit accumulators on 32 bit platforms and 64 bit accumulators on 64 > bit platforms. > Always use 64 bit accumulators. This is a special case of a more general question: right now we use the default integer precision (i.e., what you get from np.array([1]), or np.arange, or np.dtype(int)), and it turns out that the default integer precision itself varies in confusing ways, and this is a common source of bugs. 
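(A minimal sketch of the kind of silent surprise being described here, assuming a NumPy of this era where the default integer and the integer accumulators follow the C long; the commented results differ by platform:)

>>> import numpy as np
>>> np.dtype(int)            # int64 on 64-bit Linux/macOS, int32 on Windows and on 32-bit builds
>>> a = np.full(10, 2**30, dtype=np.int32)
>>> a.sum()                  # 10737418240 where the accumulator is 64-bit; wraps around where it is 32-bit
>>> a.sum(dtype=np.int64)    # forcing the accumulator dtype gives the same answer on every platform
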
Specifically: right now it's 32-bit on 32-bit builds, and 64-bit on 64-bit builds, except on Windows where it's always 32-bit. This matches the default precision of Python 2 'int'. So some options include: - make the default integer precision 64-bits everywhere - make the default integer precision 32-bits on 32-bit systems, and 64-bits on 64-bit systems (including Windows) - leave the default integer precision the same, but make accumulators 64-bits everywhere - leave the default integer precision the same, but make accumulators 64-bits on 64-bit systems (including Windows) - ... Given the prevalence of 64-bit systems these days, and the fact that the current setup makes it very easy to write code that seems to work when tested on a 64-bit system but that silently returns incorrect results on 32-bit systems, it sure would be nice if we could switch to a 64-bit default everywhere. (You could still get 32-bit integers, of course, you'd just have to ask for them explicitly.) Things we'd need to know more about before making a decision: - compatibility: if we flip this switch, how much code breaks? In general correct numpy-using code has to be prepared to handle np.dtype(int) being 64-bits, and in fact there might be more code that accidentally assumes that np.dtype(int) is always 64-bits than there is code that assumes it is always 32-bits. But that's theory; to know how bad this is we would need to try actually running some projects test suites and see whether they break or not. - speed: there's probably some cost to using 64-bit integers on 32-bit systems; how big is the penalty in practice? -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Mon Jan 2 22:11:01 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Jan 2017 19:11:01 -0800 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 6:26 PM, wrote: > > > On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers wrote: >> >> >> >> On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris >> wrote: >>> >>> Hi All, >>> >>> Just throwing this click bait out for discussion. Now that the `@` >>> operator is available and things seem to be moving towards Python 3, >>> especially in the classroom, we should consider the real possibility of >>> deprecating the matrix type and later removing it. No doubt there are old >>> scripts that require them, but older versions of numpy are available for >>> those who need to run old scripts. >>> >>> Thoughts? >> >> >> Clearly deprecate in the docs now, and warn only later imho. We can't warn >> before we have a good solution for scipy.sparse matrices, which have matrix >> semantics and return matrix instances. >> >> Ralf > > > How about dropping python 2 support at the same time, then we can all be in > a @ world. > > Josef Let's not yoke together two (mostly) unrelated controversial discussions? I doubt we'll be able to remove either Python 2 or matrix support before 2020 at the earliest, so the discussion now is just about how to communicate to users that they should not be using 'matrix'. -n -- Nathaniel J. Smith -- https://vorpus.org From charlesr.harris at gmail.com Mon Jan 2 22:12:00 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 20:12:00 -0700 Subject: [Numpy-discussion] Deprecating matrices. 
In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 7:26 PM, wrote: > > > On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers > wrote: > >> >> >> On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> Just throwing this click bait out for discussion. Now that the `@` >>> operator is available and things seem to be moving towards Python 3, >>> especially in the classroom, we should consider the real possibility of >>> deprecating the matrix type and later removing it. No doubt there are old >>> scripts that require them, but older versions of numpy are available for >>> those who need to run old scripts. >>> >>> Thoughts? >>> >> >> Clearly deprecate in the docs now, and warn only later imho. We can't >> warn before we have a good solution for scipy.sparse matrices, which have >> matrix semantics and return matrix instances. >> >> Ralf >> > > How about dropping python 2 support at the same time, then we can all be > in a @ world. > > The "@" operator works with matrices already, what causes problems is the combination of matrices with 1-D arrays. That can be fixed, I think. The big problem is probably the lack of "@" in Python 2.7. I wonder if there is any chance of getting it backported to 2.7 before support is dropped in 2020? I expect it would be a fight, but I also suspect it would not be difficult to do if the proposal was accepted. Then at some future date sparse could simply start returning arrays. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 2 22:15:43 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 20:15:43 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 8:12 PM, Charles R Harris wrote: > > > On Mon, Jan 2, 2017 at 7:26 PM, wrote: > >> >> >> On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers >> wrote: >> >>> >>> >>> On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> Hi All, >>>> >>>> Just throwing this click bait out for discussion. Now that the `@` >>>> operator is available and things seem to be moving towards Python 3, >>>> especially in the classroom, we should consider the real possibility of >>>> deprecating the matrix type and later removing it. No doubt there are old >>>> scripts that require them, but older versions of numpy are available for >>>> those who need to run old scripts. >>>> >>>> Thoughts? >>>> >>> >>> Clearly deprecate in the docs now, and warn only later imho. We can't >>> warn before we have a good solution for scipy.sparse matrices, which have >>> matrix semantics and return matrix instances. >>> >>> Ralf >>> >> >> How about dropping python 2 support at the same time, then we can all be >> in a @ world. >> >> > The "@" operator works with matrices already, what causes problems is the > combination of matrices with 1-D arrays. That can be fixed, I think. The > big problem is probably the lack of "@" in Python 2.7. I wonder if there is > any chance of getting it backported to 2.7 before support is dropped in > 2020? I expect it would be a fight, but I also suspect it would not be > difficult to do if the proposal was accepted. Then at some future date > sparse could simply start returning arrays. > Hmm, matrix-scalar multiplication will be a problem. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Mon Jan 2 22:29:09 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Jan 2017 19:29:09 -0800 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 7:12 PM, Charles R Harris wrote: > > > On Mon, Jan 2, 2017 at 7:26 PM, wrote: [...] >> How about dropping python 2 support at the same time, then we can all be >> in a @ world. >> > > The "@" operator works with matrices already, what causes problems is the > combination of matrices with 1-D arrays. That can be fixed, I think. The > big problem is probably the lack of "@" in Python 2.7. I wonder if there is > any chance of getting it backported to 2.7 before support is dropped in > 2020? I expect it would be a fight, but I also suspect it would not be > difficult to do if the proposal was accepted. Then at some future date > sparse could simply start returning arrays. Unfortunately the chance of Python 2.7 adding support for "@" is best expressed as a denormal. -n -- Nathaniel J. Smith -- https://vorpus.org From charlesr.harris at gmail.com Mon Jan 2 22:54:19 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 20:54:19 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 8:29 PM, Nathaniel Smith wrote: > On Mon, Jan 2, 2017 at 7:12 PM, Charles R Harris > wrote: > > > > > > On Mon, Jan 2, 2017 at 7:26 PM, wrote: > [...] > >> How about dropping python 2 support at the same time, then we can all be > >> in a @ world. > >> > > > > The "@" operator works with matrices already, what causes problems is the > > combination of matrices with 1-D arrays. That can be fixed, I think. The > > big problem is probably the lack of "@" in Python 2.7. I wonder if there > is > > any chance of getting it backported to 2.7 before support is dropped in > > 2020? I expect it would be a fight, but I also suspect it would not be > > difficult to do if the proposal was accepted. Then at some future date > > sparse could simply start returning arrays. > > Unfortunately the chance of Python 2.7 adding support for "@" is best > expressed as a denormal. > That's what I figured ;) Hmm, matrices would work fine with the current combination of '*' (works for scalar muiltiplication) and '@' (works for matrices). So for Python3 code currently written for matrices can be reformed to be array compatible. But '@' for Python 2.7 would sure help... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jan 2 22:58:06 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Jan 2017 19:58:06 -0800 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 7:54 PM, Charles R Harris wrote: > > > On Mon, Jan 2, 2017 at 8:29 PM, Nathaniel Smith wrote: >> >> On Mon, Jan 2, 2017 at 7:12 PM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Jan 2, 2017 at 7:26 PM, wrote: >> [...] >> >> How about dropping python 2 support at the same time, then we can all >> >> be >> >> in a @ world. >> >> >> > >> > The "@" operator works with matrices already, what causes problems is >> > the >> > combination of matrices with 1-D arrays. That can be fixed, I think. >> > The >> > big problem is probably the lack of "@" in Python 2.7. I wonder if there >> > is >> > any chance of getting it backported to 2.7 before support is dropped in >> > 2020? 
I expect it would be a fight, but I also suspect it would not be >> > difficult to do if the proposal was accepted. Then at some future date >> > sparse could simply start returning arrays. >> >> Unfortunately the chance of Python 2.7 adding support for "@" is best >> expressed as a denormal. > > > That's what I figured ;) Hmm, matrices would work fine with the current > combination of '*' (works for scalar muiltiplication) and '@' (works for > matrices). So for Python3 code currently written for matrices can be > reformed to be array compatible. But '@' for Python 2.7 would sure help... I mean, it can just use arrays + dot(). It's not as elegant as '@', but given that almost everyone has already switched it's clearly not *that* bad... -n -- Nathaniel J. Smith -- https://vorpus.org From lists at onerussian.com Tue Jan 3 12:00:04 2017 From: lists at onerussian.com (Yaroslav Halchenko) Date: Tue, 3 Jan 2017 12:00:04 -0500 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: References: Message-ID: <20170103170004.GA7160@onerussian.com> On Tue, 11 Oct 2016, Peter Creasey wrote: > >> I agree with Sebastian and Nathaniel. I don't think we can deviating from > >> the existing behavior (int ** int -> int) without breaking lots of existing > >> code, and if we did, yes, we would need a new integer power function. > >> I think it's better to preserve the existing behavior when it gives > >> sensible results, and error when it doesn't. Adding another function > >> float_power for the case that is currently broken seems like the right way > >> to go. > I actually suspect that the amount of code broken by int**int->float > may be relatively small (though extremely annoying for those that it > happens to, and it would definitely be good to have statistics). I > mean, Numpy silently transitioned to int32+uint64->float64 not so long > ago which broke my code, but the world didn?t end. > If the primary argument against int**int->float seems to be the > difficulty of managing the transition, with int**int->Error being the > seen as the required yet *very* painful intermediate step for the > large fraction of the int**int users who didn?t care if it was int or > float (e.g. the output is likely to be cast to float in the next step > anyway), and fail loudly for those users who need int**int->int, then > if you are prepared to risk a less conservative transition (i.e. we > think that latter group is small enough) you could skip the error on > users and just throw a warning for a couple of releases, along the > lines of: > WARNING int**int -> int is going to be deprecated in favour of > int**int->float in Numpy 1.16. To avoid seeing this message, either > use ?from numpy import __future_float_power__? or explicitly set the > type of one of your inputs to float, or use the new ipower(x,y) > function for integer powers. Sorry for coming too late to the discussion and after PR "addressing" the issue by issuing an error was merged [1]. I got burnt by new behavior while trying to build fresh pandas release on Debian (we are freezing for release way too soon ;) ) -- some pandas tests failed since they rely on previous non-erroring behavior and we got numpy 1.12.0~b1 which included [1] in unstable/testing (candidate release) now. I quickly glanced over the discussion but I guess I have missed actual description of the problem being fixed here... what was it?? 
previous behavior, int**int->int made sense to me as it seemed to be consistent with casting Python's pow result to int, somewhat fulfilling desired promise for in-place operations and being inline with built-in pow results as far as I see it (up to casting). Current handling and error IMHO is going against rudimentary algebra, where numbers can be brought to negative power (integer or not). [1] https://github.com/numpy/numpy/pull/8231 -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From sebastian at sipsolutions.net Tue Jan 3 12:08:41 2017 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 03 Jan 2017 18:08:41 +0100 Subject: [Numpy-discussion] Default type for functions that accumulate integers In-Reply-To: References: Message-ID: <1483463321.27223.46.camel@sipsolutions.net> On Mo, 2017-01-02 at 18:46 -0800, Nathaniel Smith wrote: > On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris > wrote: > > > > Hi All, > > > > Currently functions like trace use the C long type as the default > > accumulator for integer types of lesser precision: > > > > Things we'd need to know more about before making a decision: > - compatibility: if we flip this switch, how much code breaks? In > general correct numpy-using code has to be prepared to handle > np.dtype(int) being 64-bits, and in fact there might be more code > that > accidentally assumes that np.dtype(int) is always 64-bits than there > is code that assumes it is always 32-bits. But that's theory; to know > how bad this is we would need to try actually running some projects > test suites and see whether they break or not. > - speed: there's probably some cost to using 64-bit integers on 32- > bit > systems; how big is the penalty in practice? > I agree with trying to switch the default in general first, I don't like the idea of having two different "defaults". There are two issues, one is the change on Python 2 (no inheritance of Python int by default numpy type) and any issues due to increased precision (more RAM usage, code actually expects lower precision somehow, etc.). Cannot say I know for sure, but I would be extremely surprised if there is a speed difference between 32bit vs. 64bit architectures, except the general slowdown you get due to bus speeds, etc. when going to higher bit width. If the inheritance for some reason is a bigger issue, we might limit the change to Python 3. For other possible problems, I think we may have difficulties assessing how much is affected. The problem is, that the most affected thing should be projects only being used on windows, or so. Bigger projects should work fine already (they are more likely to get better due to not being tested as well on 32bit long platforms, especially 64bit windows). Of course limiting the change to python 3, could have the advantage of not affecting older projects which are possibly more likely to be specifically using the current behaviour. So, I would be open to trying the change, I think the idea of at least changing it in python 3 has been brought up a couple of times, including by Julian, so maybe it is time to give it a shot.... 
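(For reference, a small sketch of the two details mentioned above, namely the width of the default integer, which follows the C long, and, on Python 2 only, its inheritance from the builtin int:)

>>> import numpy as np
>>> np.dtype(np.int_).itemsize   # 8 on 64-bit Linux/macOS, 4 on Windows and on 32-bit builds
>>> issubclass(np.int_, int)     # True on Python 2, False on Python 3
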
It would be interesting to see if anyone knows projects that may be affected (for example because they are designed to only run on windows or limited hardware), and if avoiding to change anything in python 2 might mitigate problems here as well (additionally to avoiding the inheritance change)? Best, Sebastian > -n > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From shoyer at gmail.com Tue Jan 3 12:37:27 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 3 Jan 2017 09:37:27 -0800 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: <20170103170004.GA7160@onerussian.com> References: <20170103170004.GA7160@onerussian.com> Message-ID: On Tue, Jan 3, 2017 at 9:00 AM, Yaroslav Halchenko wrote: > Sorry for coming too late to the discussion and after PR "addressing" > the issue by issuing an error was merged [1]. I got burnt by new > behavior while trying to build fresh pandas release on Debian (we are > freezing for release way too soon ;) ) -- some pandas tests failed since > they rely on previous non-erroring behavior and we got numpy 1.12.0~b1 > which included [1] in unstable/testing (candidate release) now. > > I quickly glanced over the discussion but I guess I have missed > actual description of the problem being fixed here... what was it?? > > previous behavior, int**int->int made sense to me as it seemed to be > consistent with casting Python's pow result to int, somewhat fulfilling > desired promise for in-place operations and being inline with built-in > pow results as far as I see it (up to casting). I believe this is exactly the behavior we preserved. Rather, we turned some cases that previously often gave wrong results (involving negative integer powers) into errors. The pandas test suite triggered this behavior, but not intentionally, and should be fixed in the next release: https://github.com/pandas-dev/pandas/pull/14498 -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jan 3 13:15:14 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 3 Jan 2017 11:15:14 -0700 Subject: [Numpy-discussion] Default type for functions that accumulate integers In-Reply-To: <1483463321.27223.46.camel@sipsolutions.net> References: <1483463321.27223.46.camel@sipsolutions.net> Message-ID: On Tue, Jan 3, 2017 at 10:08 AM, Sebastian Berg wrote: > On Mo, 2017-01-02 at 18:46 -0800, Nathaniel Smith wrote: > > On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris > > wrote: > > > > > > Hi All, > > > > > > Currently functions like trace use the C long type as the default > > > accumulator for integer types of lesser precision: > > > > > > > > > > Things we'd need to know more about before making a decision: > > - compatibility: if we flip this switch, how much code breaks? In > > general correct numpy-using code has to be prepared to handle > > np.dtype(int) being 64-bits, and in fact there might be more code > > that > > accidentally assumes that np.dtype(int) is always 64-bits than there > > is code that assumes it is always 32-bits. But that's theory; to know > > how bad this is we would need to try actually running some projects > > test suites and see whether they break or not. > > - speed: there's probably some cost to using 64-bit integers on 32- > > bit > > systems; how big is the penalty in practice? 
> > > > I agree with trying to switch the default in general first, I don't > like the idea of having two different "defaults". > > There are two issues, one is the change on Python 2 (no inheritance of > Python int by default numpy type) and any issues due to increased > precision (more RAM usage, code actually expects lower precision > somehow, etc.). > Cannot say I know for sure, but I would be extremely surprised if there > is a speed difference between 32bit vs. 64bit architectures, except the > general slowdown you get due to bus speeds, etc. when going to higher > bit width. > > If the inheritance for some reason is a bigger issue, we might limit > the change to Python 3. For other possible problems, I think we may > have difficulties assessing how much is affected. The problem is, that > the most affected thing should be projects only being used on windows, > or so. Bigger projects should work fine already (they are more likely > to get better due to not being tested as well on 32bit long platforms, > especially 64bit windows). > > Of course limiting the change to python 3, could have the advantage of > not affecting older projects which are possibly more likely to be > specifically using the current behaviour. > > So, I would be open to trying the change, I think the idea of at least > changing it in python 3 has been brought up a couple of times, > including by Julian, so maybe it is time to give it a shot.... > > It would be interesting to see if anyone knows projects that may be > affected (for example because they are designed to only run on windows > or limited hardware), and if avoiding to change anything in python 2 > might mitigate problems here as well (additionally to avoiding the > inheritance change)? > There have been a number of reports of problems due to the inheritance stemming both from the changing precision and, IIRC, from differences in print format or some such. So I don't expect that there will be no problems, but they will probably not be difficult to fix. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Tue Jan 3 14:31:45 2017 From: toddrjen at gmail.com (Todd) Date: Tue, 3 Jan 2017 14:31:45 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 8:36 PM, Charles R Harris wrote: > Hi All, > > Just throwing this click bait out for discussion. Now that the `@` > operator is available and things seem to be moving towards Python 3, > especially in the classroom, we should consider the real possibility of > deprecating the matrix type and later removing it. No doubt there are old > scripts that require them, but older versions of numpy are available for > those who need to run old scripts. > > Thoughts? > > Chuck > > What if the matrix class was split out into its own project, perhaps as a scikit. That way those who really need it can still use it. If there is sufficient desire for it, those who need it can maintain it. If not, it will hopefully it will take long enough for it to bitrot that everyone has transitioned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Tue Jan 3 14:54:28 2017 From: ben.v.root at gmail.com (Benjamin Root) Date: Tue, 3 Jan 2017 14:54:28 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: That's not a bad idea. Matplotlib is currently considering something similar for its mlab module. 
It has been there since the beginning, but it is very outdated and very out-of-scope for matplotlib. However, there are still lots of code out there that depends on it. So, we are looking to split it off as its own package. The details still need to be worked out (should we initially depend on the package and simply alias its import with a DeprecationWarning, or should we go cold turkey and have a good message explaining the change). Ben Root On Tue, Jan 3, 2017 at 2:31 PM, Todd wrote: > On Mon, Jan 2, 2017 at 8:36 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> Just throwing this click bait out for discussion. Now that the `@` >> operator is available and things seem to be moving towards Python 3, >> especially in the classroom, we should consider the real possibility of >> deprecating the matrix type and later removing it. No doubt there are old >> scripts that require them, but older versions of numpy are available for >> those who need to run old scripts. >> >> Thoughts? >> >> Chuck >> >> > What if the matrix class was split out into its own project, perhaps as a > scikit. That way those who really need it can still use it. If there is > sufficient desire for it, those who need it can maintain it. If not, it > will hopefully it will take long enough for it to bitrot that everyone has > transitioned. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Jan 3 14:59:47 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Jan 2017 20:59:47 +0100 Subject: [Numpy-discussion] Default type for functions that accumulate integers References: Message-ID: <20170103205947.69473d91@fsol> On Mon, 2 Jan 2017 18:46:08 -0800 Nathaniel Smith wrote: > > So some options include: > - make the default integer precision 64-bits everywhere > - make the default integer precision 32-bits on 32-bit systems, and > 64-bits on 64-bit systems (including Windows) Either of those two would be the best IMO. Intuitively, I think people would expect 32-bit ints in 32-bit processes by default, and 64-bit ints in 64-bit processes likewise. So I would slightly favour the latter option. > - leave the default integer precision the same, but make accumulators > 64-bits everywhere > - leave the default integer precision the same, but make accumulators > 64-bits on 64-bit systems (including Windows) Both of these options introduce a confusing discrepancy. > - speed: there's probably some cost to using 64-bit integers on 32-bit > systems; how big is the penalty in practice? Ok, I have fired up a Windows VM to compare 32-bit and 64-bit builds. Numpy version is 1.11.2, Python version is 3.5.2. Keep in mind those are Anaconda builds of Numpy, with MKL enabled for linear algebra; YMMV. For each benchmark, the first number is the result on the 32-bit build, the second number on the 64-bit build. 
Simple arithmetic ----------------- >>> v = np.ones(1024**2, dtype='int32') >>> %timeit v + v # 1.73 ms per loop | 1.78 ms per loop >>> %timeit v * v # 1.77 ms per loop | 1.79 ms per loop >>> %timeit v // v # 5.89 ms per loop | 5.39 ms per loop >>> v = np.ones(1024**2, dtype='int64') >>> %timeit v + v # 3.54 ms per loop | 3.54 ms per loop >>> %timeit v * v # 5.61 ms per loop | 3.52 ms per loop >>> %timeit v // v # 17.1 ms per loop | 13.9 ms per loop Linear algebra -------------- >>> m = np.ones((1024,1024), dtype='int32') >>> %timeit m @ m # 556 ms per loop | 569 ms per loop >>> m = np.ones((1024,1024), dtype='int64') >>> %timeit m @ m # 3.81 s per loop | 1.01 s per loop Sorting ------- >>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int32') >>> %timeit np.sort(v) # 43.4 ms per loop | 44 ms per loop >>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int64') >>> %timeit np.sort(v) # 61.5 ms per loop | 45.5 ms per loop Indexing -------- >>> v = np.ones(1024**2, dtype='int32') >>> %timeit v[v[::-1]] # 2.38 ms per loop | 4.63 ms per loop >>> v = np.ones(1024**2, dtype='int64') >>> %timeit v[v[::-1]] # 6.9 ms per loop | 3.63 ms per loop Quick summary: - for very simple operations, 32b and 64b builds can have the same perf on each given bitwidth (though speed is uniformly halved on 64-bit integers when the given operation is SIMD-vectorized) - for more sophisticated operations (such as element-wise multiplication or division, or quicksort, but much more so on the matrix product), 32b builds are competitive with 64b builds on 32-bit ints, but lag behind on 64-bit ints - for indexing, it's desirable to use a "native" width integer, regardless of whether that means 32- or 64-bit Of course the numbers will vary depend on the platform (read: compiler), but some aspects of this comparison will probably translate to other platforms. Regards Antoine. From bryanv at continuum.io Tue Jan 3 15:07:55 2017 From: bryanv at continuum.io (Bryan Van de Ven) Date: Tue, 3 Jan 2017 14:07:55 -0600 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> There's a good chance that bokeh.charts will be split off into a separately distributed package as well. Hopefully being a much smaller, pure Python project makes it a more accessible target for anyone interested in maintaining it, and if no one is interested in it anymore, well that fact becomes easier to judge. I think it would be a reasonable approach here for the same reasons. Bryan > On Jan 3, 2017, at 13:54, Benjamin Root wrote: > > That's not a bad idea. Matplotlib is currently considering something similar for its mlab module. It has been there since the beginning, but it is very outdated and very out-of-scope for matplotlib. However, there are still lots of code out there that depends on it. So, we are looking to split it off as its own package. The details still need to be worked out (should we initially depend on the package and simply alias its import with a DeprecationWarning, or should we go cold turkey and have a good message explaining the change). > > Ben Root > > >> On Tue, Jan 3, 2017 at 2:31 PM, Todd wrote: >>> On Mon, Jan 2, 2017 at 8:36 PM, Charles R Harris wrote: >>> Hi All, >>> >>> Just throwing this click bait out for discussion. 
Now that the `@` operator is available and things seem to be moving towards Python 3, especially in the classroom, we should consider the real possibility of deprecating the matrix type and later removing it. No doubt there are old scripts that require them, but older versions of numpy are available for those who need to run old scripts. >>> >>> Thoughts? >>> >>> Chuck >>> >> >> What if the matrix class was split out into its own project, perhaps as a scikit. That way those who really need it can still use it. If there is sufficient desire for it, those who need it can maintain it. If not, it will hopefully it will take long enough for it to bitrot that everyone has transitioned. >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at onerussian.com Tue Jan 3 16:46:59 2017 From: lists at onerussian.com (Yaroslav Halchenko) Date: Tue, 3 Jan 2017 16:46:59 -0500 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: References: <20170103170004.GA7160@onerussian.com> Message-ID: <20170103214659.GB7160@onerussian.com> On Tue, 03 Jan 2017, Stephan Hoyer wrote: > On Tue, Jan 3, 2017 at 9:00 AM, Yaroslav Halchenko > wrote: > Sorry for coming too late to the discussion and after PR "addressing" > the issue by issuing an error was merged [1].A I got burnt by new > behavior while trying to build fresh pandas release on Debian (we are > freezing for release way too soon ;) ) -- some pandas tests failed since > they rely on previous non-erroring behavior and we gotA numpy 1.12.0~b1 > which included [1] in unstable/testing (candidate release) now. > I quickly glanced over the discussion but I guess I have missed > actual description of the problem being fixed here...A what was it?? > previous behavior, int**int->int made sense to me as it seemed to be > consistent with casting Python's pow result to int, somewhat fulfilling > desired promise for in-place operations and being inline with built-in > pow results as far as I see it (up to casting). > I believe this is exactly the behavior we preserved. Rather, we turned > some cases that previously often gave wrong results (involving negative > integer powers) into errors. hm... testing on current master (first result is from python's pow) $> python -c "import numpy; print('numpy version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" ('numpy version: ', '1.13.0.dev0+02e2ea8') 0.25 Traceback (most recent call last): File "", line 1, in ValueError: Integers to negative integer powers are not allowed. testing on Debian's packaged beta $> python -c "import numpy; print('numpy version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" ('numpy version: ', '1.12.0b1') 0.25 Traceback (most recent call last): File "", line 1, in ValueError: Integers to negative integer powers are not allowed. testing on stable debian box with elderly numpy, where it does behave sensibly: $> python -c "import numpy; print('numpy version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" ('numpy version: ', '1.8.2') 0.25 0 what am I missing? 
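(A minimal illustration of the workarounds touched on in this thread, namely casting to float or using the float_power function added in NumPy 1.12, for code that needs negative integer exponents:)

>>> import numpy as np
>>> a = np.array(2)
>>> a.astype(float) ** -2    # 0.25, cast the base to float
>>> a ** -2.0                # 0.25, or use a float exponent
>>> np.float_power(a, -2)    # 0.25, always computes in float
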
> The pandas test suite triggered this behavior, but not intentionally, and > should be fixed in the next release: > https://github.com/pandas-dev/pandas/pull/14498 I don't think that was the full set of cases, e.g. (git)hopa/sid-i386:~exppsy/pandas[bf-i386] $> nosetests -s -v pandas/tests/test_expressions.py:TestExpressions.test_mixed_arithmetic_series test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) ... ERROR ====================================================================== ERROR: test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 223, in test_mixed_arithmetic_series self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 164, in run_series test_flex=False, **kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 93, in run_arithmetic_test expected = op(df, other) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 715, in wrapper result = wrap_results(safe_na_op(lvalues, rvalues)) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 676, in safe_na_op return na_op(lvalues, rvalues) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 652, in na_op raise_on_error=True, **eval_kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", line 210, in evaluate **eval_kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", line 63, in _evaluate_standard return op(a, b) ValueError: Integers to negative integer powers are not allowed. and being paranoid, I have rebuilt exact current master of pandas with master numpy in PYTHONPATH: (git)hopa:~exppsy/pandas[master]git $> PYTHONPATH=/home/yoh/proj/numpy nosetests -s -v pandas/tests/test_expressions.py:TestExpressions.test_mixed_arithmetic_series test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) ... 
ERROR ====================================================================== ERROR: test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 223, in test_mixed_arithmetic_series self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 164, in run_series test_flex=False, **kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 93, in run_arithmetic_test expected = op(df, other) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 715, in wrapper result = wrap_results(safe_na_op(lvalues, rvalues)) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 676, in safe_na_op return na_op(lvalues, rvalues) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 652, in na_op raise_on_error=True, **eval_kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", line 210, in evaluate **eval_kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", line 63, in _evaluate_standard return op(a, b) ValueError: Integers to negative integer powers are not allowed. ---------------------------------------------------------------------- Ran 1 test in 0.015s FAILED (errors=1) $> git describe --tags v0.19.0-303-gb957f6f $> PYTHONPATH=/home/yoh/proj/numpy python -c "import numpy; print('numpy version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" ('numpy version: ', '1.13.0.dev0+02e2ea8') 0.25 Traceback (most recent call last): File "", line 1, in ValueError: Integers to negative integer powers are not allowed. -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From njs at pobox.com Tue Jan 3 18:05:09 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 3 Jan 2017 15:05:09 -0800 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: <20170103214659.GB7160@onerussian.com> References: <20170103170004.GA7160@onerussian.com> <20170103214659.GB7160@onerussian.com> Message-ID: It's possible we should back off to just issuing a deprecation warning in 1.12? On Jan 3, 2017 1:47 PM, "Yaroslav Halchenko" wrote: > > On Tue, 03 Jan 2017, Stephan Hoyer wrote: > > > On Tue, Jan 3, 2017 at 9:00 AM, Yaroslav Halchenko < > lists at onerussian.com> > > wrote: > > > Sorry for coming too late to the discussion and after PR > "addressing" > > the issue by issuing an error was merged [1].A I got burnt by new > > behavior while trying to build fresh pandas release on Debian (we > are > > freezing for release way too soon ;) ) -- some pandas tests failed > since > > they rely on previous non-erroring behavior and we gotA numpy > 1.12.0~b1 > > which included [1] in unstable/testing (candidate release) now. > > > I quickly glanced over the discussion but I guess I have missed > > actual description of the problem being fixed here...A what was > it?? 
> > > previous behavior, int**int->int made sense to me as it seemed to be > > consistent with casting Python's pow result to int, somewhat > fulfilling > > desired promise for in-place operations and being inline with > built-in > > pow results as far as I see it (up to casting). > > > I believe this is exactly the behavior we preserved. Rather, we turned > > some cases that previously often gave wrong results (involving > negative > > integer powers) into errors. > > hm... testing on current master (first result is from python's pow) > > $> python -c "import numpy; print('numpy version: ', numpy.__version__); > a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" > ('numpy version: ', '1.13.0.dev0+02e2ea8') > 0.25 > Traceback (most recent call last): > File "", line 1, in > ValueError: Integers to negative integer powers are not allowed. > > > testing on Debian's packaged beta > > $> python -c "import numpy; print('numpy version: ', numpy.__version__); > a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" > ('numpy version: ', '1.12.0b1') > 0.25 > Traceback (most recent call last): > File "", line 1, in > ValueError: Integers to negative integer powers are not allowed. > > > testing on stable debian box with elderly numpy, where it does behave > sensibly: > > $> python -c "import numpy; print('numpy version: ', numpy.__version__); > a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" > ('numpy version: ', '1.8.2') > 0.25 > 0 > > what am I missing? > > > The pandas test suite triggered this behavior, but not intentionally, > and > > should be fixed in the next release: > > https://github.com/pandas-dev/pandas/pull/14498 > > I don't think that was the full set of cases, e.g. > > (git)hopa/sid-i386:~exppsy/pandas[bf-i386] > $> nosetests -s -v pandas/tests/test_expressions. > py:TestExpressions.test_mixed_arithmetic_series > test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) > ... ERROR > > ====================================================================== > ERROR: test_mixed_arithmetic_series (pandas.tests.test_ > expressions.TestExpressions) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 223, in test_mixed_arithmetic_series > self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 164, in run_series > test_flex=False, **kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 93, in run_arithmetic_test > expected = op(df, other) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 715, in wrapper > result = wrap_results(safe_na_op(lvalues, rvalues)) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 676, in safe_na_op > return na_op(lvalues, rvalues) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 652, in na_op > raise_on_error=True, **eval_kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", > line 210, in evaluate > **eval_kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", > line 63, in _evaluate_standard > return op(a, b) > ValueError: Integers to negative integer powers are not allowed. 
> > > and being paranoid, I have rebuilt exact current master of pandas with > master numpy in PYTHONPATH: > > (git)hopa:~exppsy/pandas[master]git > $> PYTHONPATH=/home/yoh/proj/numpy nosetests -s -v > pandas/tests/test_expressions.py:TestExpressions.test_mixed_ > arithmetic_series > test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) > ... ERROR > > ====================================================================== > ERROR: test_mixed_arithmetic_series (pandas.tests.test_ > expressions.TestExpressions) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 223, in test_mixed_arithmetic_series > self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 164, in run_series > test_flex=False, **kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 93, in run_arithmetic_test > expected = op(df, other) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 715, in wrapper > result = wrap_results(safe_na_op(lvalues, rvalues)) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 676, in safe_na_op > return na_op(lvalues, rvalues) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 652, in na_op > raise_on_error=True, **eval_kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", > line 210, in evaluate > **eval_kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", > line 63, in _evaluate_standard > return op(a, b) > ValueError: Integers to negative integer powers are not allowed. > > ---------------------------------------------------------------------- > Ran 1 test in 0.015s > > FAILED (errors=1) > > $> git describe --tags > v0.19.0-303-gb957f6f > > $> PYTHONPATH=/home/yoh/proj/numpy python -c "import numpy; print('numpy > version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); > print(pow(numpy.array(a), b))" > > ('numpy version: ', '1.13.0.dev0+02e2ea8') > 0.25 > Traceback (most recent call last): > File "", line 1, in > ValueError: Integers to negative integer powers are not allowed. > > > -- > Yaroslav O. Halchenko > Center for Open Neuroscience http://centerforopenneuroscience.org > Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 > Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 > WWW: http://www.linkedin.com/in/yarik > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Jan 3 19:09:55 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 3 Jan 2017 16:09:55 -0800 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: References: <20170103170004.GA7160@onerussian.com> <20170103214659.GB7160@onerussian.com> Message-ID: On Tue, Jan 3, 2017 at 3:05 PM, Nathaniel Smith wrote: > It's possible we should back off to just issuing a deprecation warning in > 1.12? > > On Jan 3, 2017 1:47 PM, "Yaroslav Halchenko" wrote: > >> hm... 
testing on current master (first result is from python's pow) >> >> $> python -c "import numpy; print('numpy version: ', numpy.__version__); >> a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" >> ('numpy version: ', '1.13.0.dev0+02e2ea8') >> 0.25 >> Traceback (most recent call last): >> File "", line 1, in >> ValueError: Integers to negative integer powers are not allowed. >> >> >> testing on Debian's packaged beta >> >> $> python -c "import numpy; print('numpy version: ', numpy.__version__); >> a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" >> ('numpy version: ', '1.12.0b1') >> 0.25 >> Traceback (most recent call last): >> File "", line 1, in >> ValueError: Integers to negative integer powers are not allowed. >> >> >> testing on stable debian box with elderly numpy, where it does behave >> sensibly: >> >> $> python -c "import numpy; print('numpy version: ', numpy.__version__); >> a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" >> ('numpy version: ', '1.8.2') >> 0.25 >> 0 >> >> what am I missing? >> >> 2 ** -2 should be 0.25. On old versions of NumPy, you see the the incorrect answer 0. We are now preferring to give an error rather than the wrong answer. > > The pandas test suite triggered this behavior, but not intentionally, >> and >> > should be fixed in the next release: >> > https://github.com/pandas-dev/pandas/pull/14498 >> >> I don't think that was the full set of cases, e.g. >> >> (git)hopa/sid-i386:~exppsy/pandas[bf-i386] >> $> nosetests -s -v pandas/tests/test_expressions. >> py:TestExpressions.test_mixed_arithmetic_series >> test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) >> ... ERROR >> >> ====================================================================== >> ERROR: test_mixed_arithmetic_series (pandas.tests.test_expressions >> .TestExpressions) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", >> line 223, in test_mixed_arithmetic_series >> self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", >> line 164, in run_series >> test_flex=False, **kwargs) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", >> line 93, in run_arithmetic_test >> expected = op(df, other) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line >> 715, in wrapper >> result = wrap_results(safe_na_op(lvalues, rvalues)) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line >> 676, in safe_na_op >> return na_op(lvalues, rvalues) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line >> 652, in na_op >> raise_on_error=True, **eval_kwargs) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", >> line 210, in evaluate >> **eval_kwargs) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", >> line 63, in _evaluate_standard >> return op(a, b) >> ValueError: Integers to negative integer powers are not allowed. >> > Agreed, it looks like pandas still has this issue in the test suite. Nonetheless, I don't think this should be an issue for users -- pandas defines all handling of arithmetic to numpy. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lists at onerussian.com Tue Jan 3 21:24:43 2017 From: lists at onerussian.com (Yaroslav Halchenko) Date: Tue, 3 Jan 2017 21:24:43 -0500 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: References: <20170103170004.GA7160@onerussian.com> <20170103214659.GB7160@onerussian.com> Message-ID: <20170104022443.GC7160@onerussian.com> On Tue, 03 Jan 2017, Stephan Hoyer wrote: > >> testing on stable debian box with elderly numpy, where it does behave > >> sensibly: > >> $> python -c "import numpy; print('numpy version: ', numpy.__version__); > >> a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" > >> ('numpy version: ', '1.8.2') > >> 0.25 > >> 0 > >> what am I missing? > 2 ** -2 should be 0.25. > On old versions of NumPy, you see the the incorrect answer 0. We are now > preferring to give an error rather than the wrong answer. it is correct up to casting/truncating to an int for the desire to maintain the int data type -- the same as >>> int(0.25) 0 >>> 1/4 0 or even >>> np.arange(5)/4 array([0, 0, 0, 0, 1]) so it is IMHO more of a documented feature and I don't see why pow needs to get all so special. Sure thing, in the bring future, unless in-place operation is demanded I would have voted for consistent float output. -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From alex.rogozhnikov at yandex.ru Thu Jan 5 04:12:24 2017 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Thu, 5 Jan 2017 13:12:24 +0400 Subject: [Numpy-discussion] From Python to Numpy In-Reply-To: <62BD0BF9-534F-4A63-98E0-CE1C0137F805@inria.fr> References: <62BD0BF9-534F-4A63-98E0-CE1C0137F805@inria.fr> Message-ID: <31A0AC3C-8E06-4B51-A82C-44E52C5C7A58@yandex.ru> > 31 ???. 2016 ?., ? 2:09, Nicolas P. Rougier ???????(?): > >> >> On 30 Dec 2016, at 20:36, Alex Rogozhnikov wrote: >> >> Hi Nicolas, >> that's a very nice work! >> >>> Comments/questions/fixes/ideas are of course welcome. >> >> Boids example brought my attention too, some comments on it: >> - I find using complex numbers here very natural, this should speed up things and also shorten the code (rotating without einsum, etc.) >> - you probably can speed up things with going to sparse arrays >> - and you can go to really large numbers of 'birds' if you combine it with preliminary splitting of space into squares, thus analyze only birds from close squares >> >> Also I think worth adding some operations with HSV / HSL color spaces as those can be visualized easily e.g. on some photo. >> >> Thanks, >> Alex. > > > Thanks. > > I'm not sure to know how to use complex with this example. Could you elaborate ? Position and velocity are encoded by complex numbers. Rotation is multiplication by exp(i \phi), translating is adding a complex number. Distance = abs(x - y). I think, that's all operations you need, but maybe I miss something. > > For the preliminary splitting, a quadtree (scipy KDTree) could also help a lot but I wanted to stick to numpy only. > A simpler square splitting as you suggest could make thing faster but require some work. I'm not sure yet I see how to restrict analysis to close squares. > > Nicolas > > >> >> >> >>> 23 ???. 2016 ?., ? 12:14, Kiko ???????(?): >>> >>> >>> >>> 2016-12-22 17:44 GMT+01:00 Nicolas P. 
Rougier : >>> >>> Dear all, >>> >>> I've just put online a (kind of) book on Numpy and more specifically about vectorization methods. It's not yet finished, has not been reviewed and it's a bit rough around the edges. But I think there are some material that can be interesting. I'm specifically happy with the boids example that show a nice combination of numpy and matplotlib strengths. >>> >>> Book is online at: http://www.labri.fr/perso/nrougier/from-python-to-numpy/ >>> Sources are available at: https://github.com/rougier/from-python-to-numpy >>> >>> >>> Comments/questions/fixes/ideas are of course welcome. >>> >>> Wow!!! Beautiful. >>> >>> Thanks for sharing. >>> >>> >>> >>> Nicolas >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From loic.esteve at ymail.com Thu Jan 5 15:11:43 2017 From: loic.esteve at ymail.com (=?UTF-8?B?TG/Dr2MgRXN0w6h2ZQ==?=) Date: Thu, 5 Jan 2017 21:11:43 +0100 Subject: [Numpy-discussion] Proposed change in memmap offset attribute Message-ID: Dear all, I have a PR at https://github.com/numpy/numpy/pull/8443 that proposes to change the value of the offset attribute of memmap objects. At the moment it is not the offset into the memmap file (as the docstring would lead you to believe) but this modulo mmap.ALLOCATIONGRANULARITY. It was deemed best to double-check on the mailing list whether anyone could think of a good reason why this is the case and/or whether anyone was using this property of the offset attribute. If you have comments about this proposed change, it is probably best if you do it on the PR in order to keep the discussion all in the same place. Cheers, Lo?c From ralf.gommers at gmail.com Fri Jan 6 19:19:12 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Jan 2017 13:19:12 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Wed, Jan 4, 2017 at 9:07 AM, Bryan Van de Ven wrote: > There's a good chance that bokeh.charts will be split off into a > separately distributed package as well. Hopefully being a much smaller, > pure Python project makes it a more accessible target for anyone interested > in maintaining it, and if no one is interested in it anymore, well that > fact becomes easier to judge. I think it would be a reasonable approach > here for the same reasons. > > Bryan > > On Jan 3, 2017, at 13:54, Benjamin Root wrote: > > That's not a bad idea. Matplotlib is currently considering something > similar for its mlab module. It has been there since the beginning, but it > is very outdated and very out-of-scope for matplotlib. However, there are > still lots of code out there that depends on it. So, we are looking to > split it off as its own package. 
The details still need to be worked out > (should we initially depend on the package and simply alias its import with > a DeprecationWarning, or should we go cold turkey and have a good message > explaining the change). > > Don't go cold turkey please, that still would break a lot of code. Even with a good message, breaking things isn't great. > > Ben Root > > > On Tue, Jan 3, 2017 at 2:31 PM, Todd wrote: > >> On Mon, Jan 2, 2017 at 8:36 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> Just throwing this click bait out for discussion. Now that the `@` >>> operator is available and things seem to be moving towards Python 3, >>> especially in the classroom, we should consider the real possibility of >>> deprecating the matrix type and later removing it. No doubt there are old >>> scripts that require them, but older versions of numpy are available for >>> those who need to run old scripts. >>> >>> Thoughts? >>> >>> Chuck >>> >>> >> What if the matrix class was split out into its own project, perhaps as a >> scikit. >> > Something like "npmatrix" would be a better name, we'd like to keep scikit- for active well-maintained projects I'd think. > That way those who really need it can still use it. If there is >> sufficient desire for it, those who need it can maintain it. If not, it >> will hopefully it will take long enough for it to bitrot that everyone has >> transitioned. >> > This sounds like a reasonable idea. Timeline could be something like: 1. Now: create new package, deprecate np.matrix in docs. 2. In say 1.5 years: start issuing visible deprecation warnings in numpy 3. After 2020: remove matrix from numpy. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Fri Jan 6 20:21:36 2017 From: perimosocordiae at gmail.com (CJ Carey) Date: Fri, 6 Jan 2017 19:21:36 -0600 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers wrote: > This sounds like a reasonable idea. Timeline could be something like: > > 1. Now: create new package, deprecate np.matrix in docs. > 2. In say 1.5 years: start issuing visible deprecation warnings in numpy > 3. After 2020: remove matrix from numpy. > > Ralf > I think this sounds reasonable, and reminds me of the deliberate deprecation process taken for scipy.weave. I guess we'll see how successful it was when 0.19 is released. The major problem I have with removing numpy matrices is the effect on scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and often produces numpy.matrix results when densifying. The two are coupled tightly enough that if numpy matrices go away, all of the existing sparse matrix classes will have to go at the same time. I don't think that would be the end of the world, but it's definitely something that should happen while scipy is still pre-1.0, if it's ever going to happen. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Jan 6 20:28:48 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Jan 2017 14:28:48 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey wrote: > > On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers > wrote: > >> This sounds like a reasonable idea. 
Timeline could be something like: >> >> 1. Now: create new package, deprecate np.matrix in docs. >> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >> 3. After 2020: remove matrix from numpy. >> >> Ralf >> > > I think this sounds reasonable, and reminds me of the deliberate > deprecation process taken for scipy.weave. I guess we'll see how successful > it was when 0.19 is released. > > The major problem I have with removing numpy matrices is the effect on > scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and > often produces numpy.matrix results when densifying. The two are coupled > tightly enough that if numpy matrices go away, all of the existing sparse > matrix classes will have to go at the same time. > > I don't think that would be the end of the world, > Not the end of the world literally, but the impact would be pretty major. I think we're stuck with scipy.sparse, and may at some point will add a new sparse *array* implementation next to it. For scipy we will have to add a dependency on the new npmatrix package or vendor it. Ralf > but it's definitely something that should happen while scipy is still > pre-1.0, if it's ever going to happen. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jan 6 20:37:13 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 6 Jan 2017 20:37:13 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers wrote: > > > On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey > wrote: > >> >> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >> wrote: >> >>> This sounds like a reasonable idea. Timeline could be something like: >>> >>> 1. Now: create new package, deprecate np.matrix in docs. >>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >>> 3. After 2020: remove matrix from numpy. >>> >>> Ralf >>> >> >> I think this sounds reasonable, and reminds me of the deliberate >> deprecation process taken for scipy.weave. I guess we'll see how successful >> it was when 0.19 is released. >> >> The major problem I have with removing numpy matrices is the effect on >> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >> often produces numpy.matrix results when densifying. The two are coupled >> tightly enough that if numpy matrices go away, all of the existing sparse >> matrix classes will have to go at the same time. >> >> I don't think that would be the end of the world, >> > > Not the end of the world literally, but the impact would be pretty major. > I think we're stuck with scipy.sparse, and may at some point will add a new > sparse *array* implementation next to it. For scipy we will have to add a > dependency on the new npmatrix package or vendor it. > That sounds to me like moving maintenance of numpy.matrix from numpy to scipy, if scipy.sparse is one of the main users and still depends on it. Josef > > Ralf > > > >> but it's definitely something that should happen while scipy is still >> pre-1.0, if it's ever going to happen. 
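For anyone skimming the thread, the change being argued for boils down to the following comparison. This is just current NumPy behaviour, not tied to any particular proposal:

    import numpy as np

    a = np.arange(4.).reshape(2, 2)
    b = np.eye(2)

    a @ b              # matrix multiplication on plain ndarrays (Python >= 3.5)
    a.dot(b)           # the same product, spelled so it also works on Python 2
    a * b              # element-wise product; this is where ndarray and np.matrix differ

    m = np.matrix(a)
    m * np.matrix(b)   # np.matrix overloads * to mean matrix multiplication

With @ available on ndarrays, the notational convenience that np.matrix offered in teaching is largely covered, which is the argument running through this thread.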
>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 6 20:52:59 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 6 Jan 2017 18:52:59 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Fri, Jan 6, 2017 at 6:37 PM, wrote: > > > > On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers > wrote: > >> >> >> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey >> wrote: >> >>> >>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >>> wrote: >>> >>>> This sounds like a reasonable idea. Timeline could be something like: >>>> >>>> 1. Now: create new package, deprecate np.matrix in docs. >>>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >>>> 3. After 2020: remove matrix from numpy. >>>> >>>> Ralf >>>> >>> >>> I think this sounds reasonable, and reminds me of the deliberate >>> deprecation process taken for scipy.weave. I guess we'll see how successful >>> it was when 0.19 is released. >>> >>> The major problem I have with removing numpy matrices is the effect on >>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >>> often produces numpy.matrix results when densifying. The two are coupled >>> tightly enough that if numpy matrices go away, all of the existing sparse >>> matrix classes will have to go at the same time. >>> >>> I don't think that would be the end of the world, >>> >> >> Not the end of the world literally, but the impact would be pretty major. >> I think we're stuck with scipy.sparse, and may at some point will add a new >> sparse *array* implementation next to it. For scipy we will have to add a >> dependency on the new npmatrix package or vendor it. >> > > That sounds to me like moving maintenance of numpy.matrix from numpy to > scipy, if scipy.sparse is one of the main users and still depends on it. > What I was thinking was encouraging folks to use `arr.dot(...)` or `@` instead of `*` for matrix multiplication, keeping `*` for scalar multiplication. If those operations were defined for matrices, then at some point sparse could go to arrays and it would not be noticeable except for the treatment of 1-D arrays -- which admittedly might be a bit tricky. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 02:59:32 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Jan 2017 20:59:32 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 2:52 PM, Charles R Harris wrote: > > > On Fri, Jan 6, 2017 at 6:37 PM, wrote: > >> >> >> >> On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers >> wrote: >> >>> >>> >>> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey >>> wrote: >>> >>>> >>>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >>>> wrote: >>>> >>>>> This sounds like a reasonable idea. Timeline could be something like: >>>>> >>>>> 1. Now: create new package, deprecate np.matrix in docs. >>>>> 2. 
In say 1.5 years: start issuing visible deprecation warnings in >>>>> numpy >>>>> 3. After 2020: remove matrix from numpy. >>>>> >>>>> Ralf >>>>> >>>> >>>> I think this sounds reasonable, and reminds me of the deliberate >>>> deprecation process taken for scipy.weave. I guess we'll see how successful >>>> it was when 0.19 is released. >>>> >>>> The major problem I have with removing numpy matrices is the effect on >>>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >>>> often produces numpy.matrix results when densifying. The two are coupled >>>> tightly enough that if numpy matrices go away, all of the existing sparse >>>> matrix classes will have to go at the same time. >>>> >>>> I don't think that would be the end of the world, >>>> >>> >>> Not the end of the world literally, but the impact would be pretty >>> major. I think we're stuck with scipy.sparse, and may at some point will >>> add a new sparse *array* implementation next to it. For scipy we will have >>> to add a dependency on the new npmatrix package or vendor it. >>> >> >> That sounds to me like moving maintenance of numpy.matrix from numpy to >> scipy, if scipy.sparse is one of the main users and still depends on it. >> > Maintenance costs are pretty low, and are partly still for numpy (it has to keep subclasses like np.matrix working. I'm not too worried about the effort. The purpose here is to remove np.matrix from numpy so beginners will never see it. Educating sparse matrix users is a lot easier, and there are a lot less such users. > What I was thinking was encouraging folks to use `arr.dot(...)` or `@` > instead of `*` for matrix multiplication, keeping `*` for scalar > multiplication. > I don't think that change in behavior of `*` is doable. > If those operations were defined for matrices, > Why if? They are defined, and work as expected as far as I can tell. > then at some point sparse could go to arrays and it would not be > noticeable except for the treatment of 1-D arrays -- which admittedly might > be a bit tricky. > I'd like that to be feasible, but especially given that any such change would not break code but rather silently change numerical values, it's likely not a healthy idea. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jan 7 03:39:43 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 7 Jan 2017 00:39:43 -0800 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Fri, Jan 6, 2017 at 11:59 PM, Ralf Gommers wrote: > > > On Sat, Jan 7, 2017 at 2:52 PM, Charles R Harris > wrote: >> >> >> >> On Fri, Jan 6, 2017 at 6:37 PM, wrote: >>> >>> >>> >>> >>> On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers >>> wrote: >>>> >>>> >>>> >>>> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey >>>> wrote: >>>>> >>>>> >>>>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >>>>> wrote: >>>>>> >>>>>> This sounds like a reasonable idea. Timeline could be something like: >>>>>> >>>>>> 1. Now: create new package, deprecate np.matrix in docs. >>>>>> 2. In say 1.5 years: start issuing visible deprecation warnings in >>>>>> numpy >>>>>> 3. After 2020: remove matrix from numpy. >>>>>> >>>>>> Ralf >>>>> >>>>> >>>>> I think this sounds reasonable, and reminds me of the deliberate >>>>> deprecation process taken for scipy.weave. I guess we'll see how successful >>>>> it was when 0.19 is released. 
>>>>> >>>>> The major problem I have with removing numpy matrices is the effect on >>>>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >>>>> often produces numpy.matrix results when densifying. The two are coupled >>>>> tightly enough that if numpy matrices go away, all of the existing sparse >>>>> matrix classes will have to go at the same time. >>>>> >>>>> I don't think that would be the end of the world, >>>> >>>> >>>> Not the end of the world literally, but the impact would be pretty >>>> major. I think we're stuck with scipy.sparse, and may at some point will add >>>> a new sparse *array* implementation next to it. For scipy we will have to >>>> add a dependency on the new npmatrix package or vendor it. >>> >>> >>> That sounds to me like moving maintenance of numpy.matrix from numpy to >>> scipy, if scipy.sparse is one of the main users and still depends on it. > > > Maintenance costs are pretty low, and are partly still for numpy (it has to > keep subclasses like np.matrix working. I'm not too worried about the > effort. The purpose here is to remove np.matrix from numpy so beginners will > never see it. Educating sparse matrix users is a lot easier, and there are a > lot less such users. > >> >> What I was thinking was encouraging folks to use `arr.dot(...)` or `@` >> instead of `*` for matrix multiplication, keeping `*` for scalar >> multiplication. > > > I don't think that change in behavior of `*` is doable. I guess it would be technically possible to have matrix.__mul__ issue a deprecation warning before matrix.__init__ does, to try and encourage people to switch to using .dot and/or @, and thus make it easier to later port their code to regular arrays? I'm not immediately seeing how this would help much though, since there would still be this second porting step required. Especially since there's still lots of room for things to break at that second step due to matrix's insistence that everything be 2d always, and my impression is that users are more annoyed by two-step migrations than one-step migrations. -n -- Nathaniel J. Smith -- https://vorpus.org From ralf.gommers at gmail.com Sat Jan 7 03:52:19 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Jan 2017 21:52:19 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 9:39 PM, Nathaniel Smith wrote: > On Fri, Jan 6, 2017 at 11:59 PM, Ralf Gommers > wrote: > > > > > > On Sat, Jan 7, 2017 at 2:52 PM, Charles R Harris < > charlesr.harris at gmail.com> > > wrote: > >> > >> > >> > >> On Fri, Jan 6, 2017 at 6:37 PM, wrote: > >>> > >>> > >>> > >>> > >>> On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers > >>> wrote: > >>>> > >>>> > >>>> > >>>> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey > >>>> wrote: > >>>>> > >>>>> > >>>>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers > > >>>>> wrote: > >>>>>> > >>>>>> This sounds like a reasonable idea. Timeline could be something > like: > >>>>>> > >>>>>> 1. Now: create new package, deprecate np.matrix in docs. > >>>>>> 2. In say 1.5 years: start issuing visible deprecation warnings in > >>>>>> numpy > >>>>>> 3. After 2020: remove matrix from numpy. > >>>>>> > >>>>>> Ralf > >>>>> > >>>>> > >>>>> I think this sounds reasonable, and reminds me of the deliberate > >>>>> deprecation process taken for scipy.weave. I guess we'll see how > successful > >>>>> it was when 0.19 is released. 
> >>>>> > >>>>> The major problem I have with removing numpy matrices is the effect > on > >>>>> scipy.sparse, which mostly-consistently mimics numpy.matrix > semantics and > >>>>> often produces numpy.matrix results when densifying. The two are > coupled > >>>>> tightly enough that if numpy matrices go away, all of the existing > sparse > >>>>> matrix classes will have to go at the same time. > >>>>> > >>>>> I don't think that would be the end of the world, > >>>> > >>>> > >>>> Not the end of the world literally, but the impact would be pretty > >>>> major. I think we're stuck with scipy.sparse, and may at some point > will add > >>>> a new sparse *array* implementation next to it. For scipy we will > have to > >>>> add a dependency on the new npmatrix package or vendor it. > >>> > >>> > >>> That sounds to me like moving maintenance of numpy.matrix from numpy to > >>> scipy, if scipy.sparse is one of the main users and still depends on > it. > > > > > > Maintenance costs are pretty low, and are partly still for numpy (it has > to > > keep subclasses like np.matrix working. I'm not too worried about the > > effort. The purpose here is to remove np.matrix from numpy so beginners > will > > never see it. Educating sparse matrix users is a lot easier, and there > are a > > lot less such users. > > > >> > >> What I was thinking was encouraging folks to use `arr.dot(...)` or `@` > >> instead of `*` for matrix multiplication, keeping `*` for scalar > >> multiplication. > > > > > > I don't think that change in behavior of `*` is doable. > > I guess it would be technically possible to have matrix.__mul__ issue > a deprecation warning before matrix.__init__ does, to try and > encourage people to switch to using .dot and/or @, and thus make it > easier to later port their code to regular arrays? Yes, but that's not very relevant. I'm saying "not doable" since after the debacle with changing diag return to a view my understanding is we decided that it's a bad idea to make changes that don't break code but return different numerical results. There's no good way to work around that here. With something as widely used as np.matrix, you simply cannot rely on people porting code. You just need to phase out np.matrix in a way that breaks code but never changes behavior silently (even across multiple releases). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jni.soma at gmail.com Sat Jan 7 05:12:16 2017 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Sat, 7 Jan 2017 21:12:16 +1100 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: <2cd571f0-6391-46f3-92d2-1a0bc84e3466@Spark> Hi all! I've been lurking on this discussion, and don't have too much to add except to encourage a fast deprecation: I can't wait for sparse matrices to have an element-wise multiply operator. 
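For reference, the current spelling is the other way around: for scipy.sparse matrices * is matrix multiplication, element-wise multiplication goes through a method, and densifying hands back an np.matrix. A small illustration:

    import numpy as np
    import scipy.sparse as sp

    A = sp.csr_matrix(np.eye(3))
    B = sp.csr_matrix(np.arange(9.).reshape(3, 3))

    C = A * B           # matrix multiplication, matching np.matrix semantics
    D = A.multiply(B)   # element-wise product, the method spelling needed today
    M = A.todense()     # returns an np.matrix instance, not an ndarray

A sparse *array* class, as discussed elsewhere in the thread, would make * element-wise and thus consistent with ndarrays.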
On 7 Jan 2017, 7:52 PM +1100, Ralf Gommers , wrote: > > > On Sat, Jan 7, 2017 at 9:39 PM, Nathaniel Smith wrote: > > On Fri, Jan 6, 2017 at 11:59 PM, Ralf Gommers wrote: > > > > > > > > > On Sat, Jan 7, 2017 at 2:52 PM, Charles R Harris > > > wrote: > > >> > > >> > > >> > > >> On Fri, Jan 6, 2017 at 6:37 PM, wrote: > > >>> > > >>> > > >>> > > >>> > > >>> On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers > > >>> wrote: > > >>>> > > >>>> > > >>>> > > >>>> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey > > >>>> wrote: > > >>>>> > > >>>>> > > >>>>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers > > >>>>> wrote: > > >>>>>> > > >>>>>> This sounds like a reasonable idea. Timeline could be something like: > > >>>>>> > > >>>>>> 1. Now: create new package, deprecate np.matrix in docs. > > >>>>>> 2. In say 1.5 years: start issuing visible deprecation warnings in > > >>>>>> numpy > > >>>>>> 3. After 2020: remove matrix from numpy. > > >>>>>> > > >>>>>> Ralf > > >>>>> > > >>>>> > > >>>>> I think this sounds reasonable, and reminds me of the deliberate > > >>>>> deprecation process taken for scipy.weave. I guess we'll see how successful > > >>>>> it was when 0.19 is released. > > >>>>> > > >>>>> The major problem I have with removing numpy matrices is the effect on > > >>>>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and > > >>>>> often produces numpy.matrix results when densifying. The two are coupled > > >>>>> tightly enough that if numpy matrices go away, all of the existing sparse > > >>>>> matrix classes will have to go at the same time. > > >>>>> > > >>>>> I don't think that would be the end of the world, > > >>>> > > >>>> > > >>>> Not the end of the world literally, but the impact would be pretty > > >>>> major. I think we're stuck with scipy.sparse, and may at some point will add > > >>>> a new sparse *array* implementation next to it. For scipy we will have to > > >>>> add a dependency on the new npmatrix package or vendor it. > > >>> > > >>> > > >>> That sounds to me like moving maintenance of numpy.matrix from numpy to > > >>> scipy, if scipy.sparse is one of the main users and still depends on it. > > > > > > > > > Maintenance costs are pretty low, and are partly still for numpy (it has to > > > keep subclasses like np.matrix working. I'm not too worried about the > > > effort. The purpose here is to remove np.matrix from numpy so beginners will > > > never see it. Educating sparse matrix users is a lot easier, and there are a > > > lot less such users. > > > > > >> > > >> What I was thinking was encouraging folks to use `arr.dot(...)` or `@` > > >> instead of `*` for matrix multiplication, keeping `*` for scalar > > >> multiplication. > > > > > > > > > I don't think that change in behavior of `*` is doable. > > > > I guess it would be technically possible to have matrix.__mul__ issue > > a deprecation warning before matrix.__init__ does, to try and > > encourage people to switch to using .dot and/or @, and thus make it > > easier to later port their code to regular arrays? > > Yes, but that's not very relevant. I'm saying "not doable" since after the debacle with changing diag return to a view my understanding is we decided that it's a bad idea to make changes that don't break code but return different numerical results. There's no good way to work around that here. > > With something as widely used as np.matrix, you simply cannot rely on people porting code. 
You just need to phase out np.matrix in a way that breaks code but never changes behavior silently (even across multiple releases). > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sat Jan 7 14:33:08 2017 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 7 Jan 2017 14:33:08 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: <2cd571f0-6391-46f3-92d2-1a0bc84e3466@Spark> References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> <2cd571f0-6391-46f3-92d2-1a0bc84e3466@Spark> Message-ID: Hi All, It seems there are two steps that can be taken now and are needed no matter what: 1. Add numpy documentation describing the preferred way to handle matrices, extolling the virtues of @, and move np.matrix documentation to a deprecated section 2. Start on a new `sparse` class that is based on regular arrays (and uses `__array_func__` instead of prepare/wrap?). All the best, Marten From toddrjen at gmail.com Sat Jan 7 15:31:13 2017 From: toddrjen at gmail.com (Todd) Date: Sat, 7 Jan 2017 15:31:13 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Jan 6, 2017 20:28, "Ralf Gommers" wrote: On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey wrote: > > On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers > wrote: > >> This sounds like a reasonable idea. Timeline could be something like: >> >> 1. Now: create new package, deprecate np.matrix in docs. >> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >> 3. After 2020: remove matrix from numpy. >> >> Ralf >> > > I think this sounds reasonable, and reminds me of the deliberate > deprecation process taken for scipy.weave. I guess we'll see how successful > it was when 0.19 is released. > > The major problem I have with removing numpy matrices is the effect on > scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and > often produces numpy.matrix results when densifying. The two are coupled > tightly enough that if numpy matrices go away, all of the existing sparse > matrix classes will have to go at the same time. > > I don't think that would be the end of the world, > Not the end of the world literally, but the impact would be pretty major. I think we're stuck with scipy.sparse, and may at some point will add a new sparse *array* implementation next to it. For scipy we will have to add a dependency on the new npmatrix package or vendor it. Ralf > but it's definitely something that should happen while scipy is still > pre-1.0, if it's ever going to happen. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion So what about this: 1. Create a sparse array class 2. (optional) Refactor the sparse matrix class to be based on the sparse array class (may not be feasible) 3. Copy the spare matrix class into the matrix package 4. Deprecate the scipy sparse matrix class 5. 
Remove the scipy sparse matrix class when the numpy matrix class I don't know about the timeline, but this would just need to be done by 2020. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 16:22:48 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 10:22:48 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> <2cd571f0-6391-46f3-92d2-1a0bc84e3466@Spark> Message-ID: On Sun, Jan 8, 2017 at 8:33 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Hi All, > > It seems there are two steps that can be taken now and are needed no > matter what: > > 1. Add numpy documentation describing the preferred way to handle > matrices, extolling the virtues of @, and move np.matrix documentation > to a deprecated section > That would be good to do asap. Any volunteers? > > 2. Start on a new `sparse` class that is based on regular arrays There are two efforts that I know of in this direction: https://github.com/perimosocordiae/sparray https://github.com/ev-br/sparr (and > uses `__array_func__` instead of prepare/wrap?). > Getting __array_func__ finally into a released version of numpy will be a major improvement for sparse matrix behavior (like making np.dot(some_matrix) work) in itself. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 16:29:15 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 10:29:15 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sun, Jan 8, 2017 at 9:31 AM, Todd wrote: > > > On Jan 6, 2017 20:28, "Ralf Gommers" wrote: > > > > On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey > wrote: > >> >> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >> wrote: >> >>> This sounds like a reasonable idea. Timeline could be something like: >>> >>> 1. Now: create new package, deprecate np.matrix in docs. >>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >>> 3. After 2020: remove matrix from numpy. >>> >>> Ralf >>> >> >> I think this sounds reasonable, and reminds me of the deliberate >> deprecation process taken for scipy.weave. I guess we'll see how successful >> it was when 0.19 is released. >> >> The major problem I have with removing numpy matrices is the effect on >> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >> often produces numpy.matrix results when densifying. The two are coupled >> tightly enough that if numpy matrices go away, all of the existing sparse >> matrix classes will have to go at the same time. >> >> I don't think that would be the end of the world, >> > > Not the end of the world literally, but the impact would be pretty major. > I think we're stuck with scipy.sparse, and may at some point will add a new > sparse *array* implementation next to it. For scipy we will have to add a > dependency on the new npmatrix package or vendor it. > > Ralf > > > >> but it's definitely something that should happen while scipy is still >> pre-1.0, if it's ever going to happen. 
>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > So what about this: > > 1. Create a sparse array class > 2. (optional) Refactor the sparse matrix class to be based on the sparse > array class (may not be feasible) > 3. Copy the spare matrix class into the matrix package > 4. Deprecate the scipy sparse matrix class > 5. Remove the scipy sparse matrix class when the numpy matrix class > It looks to me like we're getting a bit off track here. The sparse matrices in scipy are heavily used, and despite rough edges pretty good at what they do. Deprecating them is not a goal. The actual goal for the exercise that started this thread (at least as I see it) is to remove np.matrix from numpy itself so users (that don't know the difference) will only use ndarrays. And the few users that prefer np.matrix for teaching can now switch because of @, so their preference should have disappeared. To reach that goal, no deprecation or backwards incompatible changes to scipy.sparse are needed. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 7 18:26:07 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Jan 2017 16:26:07 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers wrote: > > > On Sun, Jan 8, 2017 at 9:31 AM, Todd wrote: > >> >> >> On Jan 6, 2017 20:28, "Ralf Gommers" wrote: >> >> >> >> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey >> wrote: >> >>> >>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >>> wrote: >>> >>>> This sounds like a reasonable idea. Timeline could be something like: >>>> >>>> 1. Now: create new package, deprecate np.matrix in docs. >>>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >>>> 3. After 2020: remove matrix from numpy. >>>> >>>> Ralf >>>> >>> >>> I think this sounds reasonable, and reminds me of the deliberate >>> deprecation process taken for scipy.weave. I guess we'll see how successful >>> it was when 0.19 is released. >>> >>> The major problem I have with removing numpy matrices is the effect on >>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >>> often produces numpy.matrix results when densifying. The two are coupled >>> tightly enough that if numpy matrices go away, all of the existing sparse >>> matrix classes will have to go at the same time. >>> >>> I don't think that would be the end of the world, >>> >> >> Not the end of the world literally, but the impact would be pretty major. >> I think we're stuck with scipy.sparse, and may at some point will add a new >> sparse *array* implementation next to it. For scipy we will have to add a >> dependency on the new npmatrix package or vendor it. >> >> Ralf >> >> >> >>> but it's definitely something that should happen while scipy is still >>> pre-1.0, if it's ever going to happen. 
>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> So what about this: >> >> 1. Create a sparse array class >> 2. (optional) Refactor the sparse matrix class to be based on the sparse >> array class (may not be feasible) >> 3. Copy the spare matrix class into the matrix package >> 4. Deprecate the scipy sparse matrix class >> 5. Remove the scipy sparse matrix class when the numpy matrix class >> > > It looks to me like we're getting a bit off track here. The sparse > matrices in scipy are heavily used, and despite rough edges pretty good at > what they do. Deprecating them is not a goal. > > The actual goal for the exercise that started this thread (at least as I > see it) is to remove np.matrix from numpy itself so users (that don't know > the difference) will only use ndarrays. And the few users that prefer > np.matrix for teaching can now switch because of @, so their preference > should have disappeared. > > To reach that goal, no deprecation or backwards incompatible changes to > scipy.sparse are needed. > What is the way forward with sparse? That looks like the biggest blocker on the road to a matrix free NumPy. I don't see moving the matrix package elsewhere as a solution for that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 18:35:32 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 12:35:32 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris wrote: > > > On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers > wrote: > >> >> It looks to me like we're getting a bit off track here. The sparse >> matrices in scipy are heavily used, and despite rough edges pretty good at >> what they do. Deprecating them is not a goal. >> >> The actual goal for the exercise that started this thread (at least as I >> see it) is to remove np.matrix from numpy itself so users (that don't know >> the difference) will only use ndarrays. And the few users that prefer >> np.matrix for teaching can now switch because of @, so their preference >> should have disappeared. >> >> To reach that goal, no deprecation or backwards incompatible changes to >> scipy.sparse are needed. >> > > What is the way forward with sparse? That looks like the biggest blocker > on the road to a matrix free NumPy. I don't see moving the matrix package > elsewhere as a solution for that. > Why not? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 7 18:42:02 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Jan 2017 16:42:02 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 4:35 PM, Ralf Gommers wrote: > > > On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers >> wrote: >> >>> >>> It looks to me like we're getting a bit off track here. 
The sparse >>> matrices in scipy are heavily used, and despite rough edges pretty good at >>> what they do. Deprecating them is not a goal. >>> >>> The actual goal for the exercise that started this thread (at least as I >>> see it) is to remove np.matrix from numpy itself so users (that don't know >>> the difference) will only use ndarrays. And the few users that prefer >>> np.matrix for teaching can now switch because of @, so their preference >>> should have disappeared. >>> >>> To reach that goal, no deprecation or backwards incompatible changes to >>> scipy.sparse are needed. >>> >> >> What is the way forward with sparse? That looks like the biggest blocker >> on the road to a matrix free NumPy. I don't see moving the matrix package >> elsewhere as a solution for that. >> > > Why not? > > Because it doesn't get rid of matrices in SciPy, not does one gain a scalar multiplication operator for sparse. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 18:51:19 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 12:51:19 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sun, Jan 8, 2017 at 12:42 PM, Charles R Harris wrote: > > > On Sat, Jan 7, 2017 at 4:35 PM, Ralf Gommers > wrote: > >> >> >> On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers >>> wrote: >>> >>>> >>>> It looks to me like we're getting a bit off track here. The sparse >>>> matrices in scipy are heavily used, and despite rough edges pretty good at >>>> what they do. Deprecating them is not a goal. >>>> >>>> The actual goal for the exercise that started this thread (at least as >>>> I see it) is to remove np.matrix from numpy itself so users (that don't >>>> know the difference) will only use ndarrays. And the few users that prefer >>>> np.matrix for teaching can now switch because of @, so their preference >>>> should have disappeared. >>>> >>>> To reach that goal, no deprecation or backwards incompatible changes to >>>> scipy.sparse are needed. >>>> >>> >>> What is the way forward with sparse? That looks like the biggest blocker >>> on the road to a matrix free NumPy. I don't see moving the matrix package >>> elsewhere as a solution for that. >>> >> >> Why not? >> >> > Because it doesn't get rid of matrices in SciPy, not does one gain a > scalar multiplication operator for sparse. > That's a different goal though. You can reach the "get matrix out of numpy" goal fairly easily (docs and packaging work), but if you insist on coupling it to major changes to scipy.sparse (a lot more work + backwards compat break), then what will likely happen is: nothing. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 7 19:24:06 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Jan 2017 17:24:06 -0700 Subject: [Numpy-discussion] Deprecating matrices. 
In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 4:51 PM, Ralf Gommers wrote: > > > On Sun, Jan 8, 2017 at 12:42 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 7, 2017 at 4:35 PM, Ralf Gommers >> wrote: >> >>> >>> >>> On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> >>>> On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers >>>> wrote: >>>> >>>>> >>>>> It looks to me like we're getting a bit off track here. The sparse >>>>> matrices in scipy are heavily used, and despite rough edges pretty good at >>>>> what they do. Deprecating them is not a goal. >>>>> >>>>> The actual goal for the exercise that started this thread (at least as >>>>> I see it) is to remove np.matrix from numpy itself so users (that don't >>>>> know the difference) will only use ndarrays. And the few users that prefer >>>>> np.matrix for teaching can now switch because of @, so their preference >>>>> should have disappeared. >>>>> >>>>> To reach that goal, no deprecation or backwards incompatible changes >>>>> to scipy.sparse are needed. >>>>> >>>> >>>> What is the way forward with sparse? That looks like the biggest >>>> blocker on the road to a matrix free NumPy. I don't see moving the matrix >>>> package elsewhere as a solution for that. >>>> >>> >>> Why not? >>> >>> >> Because it doesn't get rid of matrices in SciPy, not does one gain a >> scalar multiplication operator for sparse. >> > > That's a different goal though. You can reach the "get matrix out of > numpy" goal fairly easily (docs and packaging work), but if you insist on > coupling it to major changes to scipy.sparse (a lot more work + backwards > compat break), then what will likely happen is: nothing. > Could always remove matrix from the top level namespace and make it private. It still needs to reside someplace as long as sparse uses it. Fixing sparse is more work, but we have three years and it won't be getting any easier as time goes on. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Sat Jan 7 19:31:27 2017 From: perimosocordiae at gmail.com (CJ Carey) Date: Sat, 7 Jan 2017 18:31:27 -0600 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: I agree with Ralf; coupling these changes to sparse is a bad idea. I think that scipy.sparse will be an important consideration during the deprecation process, though, perhaps as an indicator of how painful the transition might be for third party code. I'm +1 for splitting matrices out into a standalone package. On Jan 7, 2017 5:51 PM, "Ralf Gommers" wrote: On Sun, Jan 8, 2017 at 12:42 PM, Charles R Harris wrote: > > > On Sat, Jan 7, 2017 at 4:35 PM, Ralf Gommers > wrote: > >> >> >> On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers >>> wrote: >>> >>>> >>>> It looks to me like we're getting a bit off track here. The sparse >>>> matrices in scipy are heavily used, and despite rough edges pretty good at >>>> what they do. Deprecating them is not a goal. >>>> >>>> The actual goal for the exercise that started this thread (at least as >>>> I see it) is to remove np.matrix from numpy itself so users (that don't >>>> know the difference) will only use ndarrays. 
And the few users that prefer >>>> np.matrix for teaching can now switch because of @, so their preference >>>> should have disappeared. >>>> >>>> To reach that goal, no deprecation or backwards incompatible changes to >>>> scipy.sparse are needed. >>>> >>> >>> What is the way forward with sparse? That looks like the biggest blocker >>> on the road to a matrix free NumPy. I don't see moving the matrix package >>> elsewhere as a solution for that. >>> >> >> Why not? >> >> > Because it doesn't get rid of matrices in SciPy, not does one gain a > scalar multiplication operator for sparse. > That's a different goal though. You can reach the "get matrix out of numpy" goal fairly easily (docs and packaging work), but if you insist on coupling it to major changes to scipy.sparse (a lot more work + backwards compat break), then what will likely happen is: nothing. Ralf _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 7 20:09:03 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Jan 2017 18:09:03 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 5:31 PM, CJ Carey wrote: > I agree with Ralf; coupling these changes to sparse is a bad idea. > > I think that scipy.sparse will be an important consideration during the > deprecation process, though, perhaps as an indicator of how painful the > transition might be for third party code. > > I'm +1 for splitting matrices out into a standalone package. > Decoupled or not, sparse still needs to be dealt with. What is the plan? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 20:47:51 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 14:47:51 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sun, Jan 8, 2017 at 2:09 PM, Charles R Harris wrote: > > > On Sat, Jan 7, 2017 at 5:31 PM, CJ Carey > wrote: > >> I agree with Ralf; coupling these changes to sparse is a bad idea. >> >> I think that scipy.sparse will be an important consideration during the >> deprecation process, though, perhaps as an indicator of how painful the >> transition might be for third party code. >> >> I'm +1 for splitting matrices out into a standalone package. >> > > Decoupled or not, sparse still needs to be dealt with. What is the plan? > My view would be: - keep current sparse matrices as is (with improvements, like __numpy_func__ and the various performance improvements that regularly get done) - once one of the sparse *array* implementations progresses far enough, merge that and encourage people to switch over - in the far future, once packages like scikit-learn have switched to the new sparse arrays, the sparse matrices could potentially also be split off as a separate package, in the same way as we did for weave and now can do for npmatrix. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Sat Jan 7 23:00:25 2017 From: perimosocordiae at gmail.com (CJ Carey) Date: Sat, 7 Jan 2017 22:00:25 -0600 Subject: [Numpy-discussion] Deprecating matrices. 
In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: > Decoupled or not, sparse still needs to be dealt with. What is the plan? > My view would be: - keep current sparse matrices as is (with improvements, like __numpy_func__ and the various performance improvements that regularly get done) - once one of the sparse *array* implementations progresses far enough, merge that and encourage people to switch over - in the far future, once packages like scikit-learn have switched to the new sparse arrays, the sparse matrices could potentially also be split off as a separate package, in the same way as we did for weave and now can do for npmatrix. I think that's the best way forward as well. This can happen independently of numpy matrix changes, and doesn't leave users with silently broken code. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Sun Jan 8 10:17:03 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Sun, 8 Jan 2017 16:17:03 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: Hi everyone, I was stalking the deprecating the numpy.matrix discussion on the other thread and I wondered maybe the mailing list is a better place for the discussion about something I've been meaning to ask the dev members. I thought mailing lists are something we dumped using together with ICQ and geocities stuff but apparently not :-) Anyways, first thing is first: I have been in need of the ill-conditioned warning behavior of matlab (and possibly other software suites) for my own work. So I looked around in the numpy issues and found https://github.com/numpy/numpy/issues/3755 some time ago. Then I've learned from @rkern that there were C translations involved in the numpy source and frankly I couldn't even find the entry point of how the project is structured so I've switched to SciPy side where things are a bit more convenient. Next to teaching me more about f2py machinery, I have noticed that the linear algebra module is a bit less competitive than the usual level of scipy though it is definitely a personal opinion. So in order to get the ill-conditioning (or at least the condition number) I've wrapped up a PR using the expert routines of LAPACK (which is I think ready to merge) but still it is far from the contemporary software convenience that you generally get. https://github.com/scipy/scipy/pull/6775 The "assume_a" keyword introduced here is hopefully modular enough that should there be any need for more structures we can simply keep adding to the list without any backwards compatibility. It will be at least offering more options than what we have currently. The part that I would like to discuss requires a bit of intro so please bear with me. Let me copy/paste the part from the old PR: Around many places online, we can witness the rant about numpy/scipy not letting the users know about the conditioning for example Mike Croucher's blog and numpy/numpy#3755 Since we don't have any central backslash function that optimizes depending on the input data, should we create a function, let's name it with the matlab equivalent for the time being linsolve such that it automatically calls for the right solver? This way, we can avoid writing new functions for each LAPACK driver . As a reference here is a SO thread that summarizes the linsolve functionality. 
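To make the intent concrete, usage of the keyword from that PR would look roughly like the sketch below; the value names shown here ('pos' for a symmetric positive definite system, alongside 'gen', 'sym', 'her' for the other ?GE/SY/HE/POSVX drivers) follow the PR as I read it and may still change:

    import numpy as np
    from scipy import linalg

    rng = np.random.RandomState(0)
    A = rng.rand(5, 5)
    A = A + A.T + 10 * np.eye(5)    # symmetric positive definite by construction
    b = rng.rand(5)

    x = linalg.solve(A, b)                   # no hint given: the general solver is used
    x = linalg.solve(A, b, assume_a='pos')   # hint: dispatch to the ?POSVX expert driver,
                                             # which also returns rcond and so can warn
                                             # about ill-conditioning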
I'm sure you are aware, but just for completeness, the linear equation solvers are often built around the concept of polyalgorithm which is a fancy way of saying that the array is tested consecutively for certain structures and the checks are ordered in such a way that the simpler structure is tested the sooner. E.g. first check for diagonal matrix, then for upper/lower triangular then permuted triangular then symmetrical and so on. Here is also another example from AdvanPix http://www.advanpix.com/2016/ 10/07/architecture-of-linear-systems-solver/ Now, according to what I have coded and optimized as much as I can, a pure Python is not acceptable as an overhead during these checks. It would definitely be a noticeable slowdown if this was in place in the existing linalg.solve however I think this is certainly doable in the low-level C/FORTRAN level. CPython is certainly faster but I think only a straight C/FORTRAN implementation would cut it. Note that we only need the discovery of the structure then we can pass to the dedicated solver as is. Hence I'm not saying that we should recode the existing solve functionality. We already have the branching in place to ?GE/SY/HE/POSVX routines. ------- The second issue about the current linalg.solve function is when trying to solve for right inverse e.g. xA = b. Again with some copy/paste: The right inversion is currently a bit annoying, that is to say if we would like to compute, say, BA^{-1}, then the user has to explicitly transpose the explicitly transposed equation to avoid using an explicit inv(whose use should be discouraged anyways) x = scipy.linalg.solve(A.T, B.T).T. Since expert drivers come with a trans switch that can internally handle whether to solve the transposed or the regular equation, these routines avoid the A.T off-the-shelf. I am wondering what might be the best way to add a "r_inv" keyword such that the B.T is also handled at the FORTRAN level instead such that the user can simply write "solve(A,B, r_inv=True)". Because we don't have a backslash operation we could at least provide this much as convenience I guess. I would love to have go at it but I'm definitely not competent enough in C/FORTRAN at the production level so I was wondering whether I could get some help about this. Anyways, I hope I could make my point with a rather lengthy post. Please let me know if this is a plausible feature ilhan PS: In case gmail links won't be parsed, here are the inline links MC blog: http://www.walkingrandomly.com/?p=5092 SO thread : http://stackoverflow.com/questions/18553210/how-to- implement-matlabs-mldivide-a-k-a-the-backslash-operator linsolve/mldivide page : http://nl.mathworks.com/help/ matlab/ref/mldivide.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jan 9 05:33:56 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 9 Jan 2017 23:33:56 +1300 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 4:17 AM, Ilhan Polat wrote: > > Hi everyone, > > I was stalking the deprecating the numpy.matrix discussion on the other > thread and I wondered maybe the mailing list is a better place for the > discussion about something I've been meaning to ask the dev members. 
I > thought mailing lists are something we dumped using together with ICQ and > geocities stuff but apparently not :-) > > Anyways, first thing is first: I have been in need of the ill-conditioned > warning behavior of matlab (and possibly other software suites) for my own > work. So I looked around in the numpy issues and found > https://github.com/numpy/numpy/issues/3755 some time ago. Then I've > learned from @rkern that there were C translations involved in the numpy > source and frankly I couldn't even find the entry point of how the project > is structured so I've switched to SciPy side where things are a bit more > convenient. Next to teaching me more about f2py machinery, I have noticed > that the linear algebra module is a bit less competitive than the usual > level of scipy though it is definitely a personal opinion. > > So in order to get the ill-conditioning (or at least the condition number) > I've wrapped up a PR using the expert routines of LAPACK (which is I think > ready to merge) but still it is far from the contemporary software > convenience that you generally get. > > https://github.com/scipy/scipy/pull/6775 > > The "assume_a" keyword introduced here is hopefully modular enough that > should there be any need for more structures we can simply keep adding to > the list without any backwards compatibility. It will be at least offering > more options than what we have currently. The part that I would like to > discuss requires a bit of intro so please bear with me. Let me copy/paste > the part from the old PR: > > Around many places online, we can witness the rant about numpy/scipy not > letting the users know about the conditioning for example Mike Croucher's > blog and numpy/numpy#3755 > > > Since we don't have any central backslash function that optimizes > depending on the input data, should we create a function, let's name it > with the matlab equivalent for the time being linsolve such that it > automatically calls for the right solver? This way, we can avoid writing > new functions for each LAPACK driver . As a reference here is a SO thread > > that summarizes the linsolve > functionality. > Note that you're proposing a new scipy feature (right?) on the numpy list.... This sounds like a good idea to me. As a former heavy Matlab user I remember a lot of things to dislike, but "\" behavior was quite nice. > I'm sure you are aware, but just for completeness, the linear equation > solvers are often built around the concept of polyalgorithm which is a > fancy way of saying that the array is tested consecutively for certain > structures and the checks are ordered in such a way that the simpler > structure is tested the sooner. E.g. first check for diagonal matrix, then > for upper/lower triangular then permuted triangular then symmetrical and so > on. Here is also another example from AdvanPix > http://www.advanpix.com/2016/10/07/architecture-of-linear-systems-solver/ > > Now, according to what I have coded and optimized as much as I can, a pure > Python is not acceptable as an overhead during these checks. It would > definitely be a noticeable slowdown if this was in place in the existing > linalg.solve however I think this is certainly doable in the low-level > C/FORTRAN level. > How much is a noticeable slowdown? Note that we still have the current interfaces available for users that know what they need, so a nice convenience function that is say 5-10% slower would not be the end of the world. 
Ralf > CPython is certainly faster but I think only a straight C/FORTRAN > implementation would cut it. Note that we only need the discovery of the > structure then we can pass to the dedicated solver as is. Hence I'm not > saying that we should recode the existing solve functionality. We already > have the branching in place to ?GE/SY/HE/POSVX routines. > > ------- > > The second issue about the current linalg.solve function is when trying to > solve for right inverse e.g. xA = b. Again with some copy/paste: The right > inversion is currently a bit annoying, that is to say if we would like to > compute, say, BA^{-1}, then the user has to explicitly transpose the > explicitly transposed equation to avoid using an explicit inv(whose use > should be discouraged anyways) > x = scipy.linalg.solve(A.T, B.T).T. > > Since expert drivers come with a trans switch that can internally handle > whether to solve the transposed or the regular equation, these routines > avoid the A.T off-the-shelf. I am wondering what might be the best way to > add a "r_inv" keyword such that the B.T is also handled at the FORTRAN > level instead such that the user can simply write "solve(A,B, r_inv=True)". > Because we don't have a backslash operation we could at least provide this > much as convenience I guess. > > I would love to have go at it but I'm definitely not competent enough in > C/FORTRAN at the production level so I was wondering whether I could get > some help about this. Anyways, I hope I could make my point with a rather > lengthy post. Please let me know if this is a plausible feature > > ilhan > > PS: In case gmail links won't be parsed, here are the inline links > > MC blog: http://www.walkingrandomly.com/?p=5092 > SO thread : http://stackoverflow.com/questions/18553210/how-to-implement > -matlabs-mldivide-a-k-a-the-backslash-operator > linsolve/mldivide page : http://nl.mathworks.com/help/m > atlab/ref/mldivide.html > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Mon Jan 9 06:27:57 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Mon, 9 Jan 2017 12:27:57 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: > Note that you're proposing a new scipy feature (right?) on the numpy list.... > This sounds like a good idea to me. As a former heavy Matlab user I remember a lot of things to dislike, but "\" behavior was quite nice. Correct, I am not sure where this might go in. It seemed like a NumPy array operation (touching array elements rapidly etc. can also be added for similar functionalities other than solve) hence the NumPy list. But of course it can be pushed as an exclusive SciPy feature. I'm not sure what the outlook on np.linalg.solve is. > How much is a noticeable slowdown? Note that we still have the current interfaces available for users that know what they need, so a nice convenience function that is say 5-10% slower would not be the end of the world. the fastest case was around 150-400% slower but of course it might be the case that I'm not using the fastest methods. It was mostly shuffling things around and using np.any on them in the pure python3 case. I will cook up something again for the baseline as soon as I have time. 
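For reference, the kind of pure-Python structure checks being benchmarked can be sketched as below; this is only an illustration of the approach (and of where the extra passes over the data come from), not the actual benchmark code:

import numpy as np

def detect_structure(A):
    """Cheap ordered checks, simplest structure first (illustrative only)."""
    lower_empty = not np.any(np.tril(A, k=-1))  # strictly lower triangle all zero?
    upper_empty = not np.any(np.triu(A, k=1))   # strictly upper triangle all zero?
    if lower_empty and upper_empty:
        return 'diagonal'
    if lower_empty:
        return 'upper triangular'
    if upper_empty:
        return 'lower triangular'
    if np.array_equal(A, A.conj().T):
        return 'hermitian/symmetric'
    return 'general'

Each check is itself vectorized, but np.tril/np.triu allocate full-size temporaries and every test is a separate pass over the array, which is where the Python-level overhead shows up.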
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bryanv at continuum.io Mon Jan 9 14:08:28 2017 From: bryanv at continuum.io (Bryan Van de Ven) Date: Mon, 9 Jan 2017 13:08:28 -0600 Subject: [Numpy-discussion] ANN: Bokeh 0.12.4 Released Message-ID: Hi all, On behalf of the Bokeh team, I am pleased to announce the release of version 0.12.4 of Bokeh! Please see the announcement post at: https://bokeh.github.io/blog/2017/1/6/release-0-12-4/ which has more information as well as live demonstrations. If you are using Anaconda/miniconda, you can install it with conda: conda install -c bokeh bokeh Alternatively, you can also install it with pip: pip install bokeh Full information including details about how to use and obtain BokehJS are at: http://bokeh.pydata.org/en/0.12.4/docs/installation.html Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/bokeh/bokeh Documentation is available at http://bokeh.pydata.org/en/0.12.4 There are over 200 total contributors to Bokeh and their time and effort help make Bokeh such an amazing project and community. Thank you again for your contributions. Finally (as always), for questions, technical assistance or if you're interested in contributing, questions can be directed to the Bokeh mailing list: bokeh at continuum.io or the Gitter Chat room: https://gitter.im/bokeh/bokeh Thanks, Bryan Van de Ven From josef.pktd at gmail.com Mon Jan 9 14:30:20 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Jan 2017 14:30:20 -0500 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 6:27 AM, Ilhan Polat wrote: > > Note that you're proposing a new scipy feature (right?) on the numpy > list.... > > > This sounds like a good idea to me. As a former heavy Matlab user I > remember a lot of things to dislike, but "\" behavior was quite nice. > > Correct, I am not sure where this might go in. It seemed like a NumPy > array operation (touching array elements rapidly etc. can also be added for > similar functionalities other than solve) hence the NumPy list. But of > course it can be pushed as an exclusive SciPy feature. I'm not sure what > the outlook on np.linalg.solve is. > > > > How much is a noticeable slowdown? Note that we still have the current > interfaces available for users that know what they need, so a nice > convenience function that is say 5-10% slower would not be the end of the > world. > > the fastest case was around 150-400% slower but of course it might be the > case that I'm not using the fastest methods. It was mostly shuffling things > around and using np.any on them in the pure python3 case. I will cook up > something again for the baseline as soon as I have time. > > > All this checks sound a bit expensive, if we have almost always completely unstructured arrays that don't satisfy any special matrix pattern. In analogy to the type proliferation in Julia to handle those cases: Is there a way to attach information to numpy arrays that for example signals that a 2d array is hermitian, banded or diagonal or ...? (After second thought: maybe completely unstructured is not too expensive to detect if the checks are short-circuited, one off diagonal element nonzero - not diagonal, two opposite diagonal different - not symmetric, ...) 
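One way to read the "attach information" question above: a small ndarray subclass could carry a structure tag, so a solver front end trusts the tag when present and only falls back to the short-circuited element scans otherwise. This is just a sketch of one possible pattern, not an existing NumPy or SciPy feature:

import numpy as np

class TaggedArray(np.ndarray):
    """ndarray subclass carrying an optional 'structure' tag (hypothetical)."""

    def __new__(cls, data, structure=None):
        obj = np.asarray(data).view(cls)
        obj.structure = structure   # e.g. 'diagonal', 'triangular', 'hermitian'
        return obj

    def __array_finalize__(self, obj):
        # propagate the tag through views and slices
        self.structure = getattr(obj, 'structure', None)

A = TaggedArray(np.diag([1.0, 2.0, 3.0]), structure='diagonal')
print(A.structure)   # 'diagonal' -- a dispatcher could skip the scans entirely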
Josef > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Mon Jan 9 20:09:48 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Tue, 10 Jan 2017 02:09:48 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: Indeed, generic is the cheapest discovery including the worst case that only the last off-diagonal element is nonzero, a pseudo code is first remove the diagonals check the remaining parts for nonzero, then check the upper triangle then lower, then morally triangularness from zero structure if any then bandedness and so on. If you have access to matlab, then you can set the sparse monitor to verbose mode " spparms('spumoni', 1) " and perform a backslash operation on sparse matrices. It will spit out what it does during the checks. A = sparse([0 2 0 1 0; 4 -1 -1 0 0; 0 0 0 3 -6; -2 0 0 0 2; 0 0 4 2 0]); B = sparse([8; -1; -18; 8; 20]); spparms('spumoni',1) x = A\B So every test in the polyalgorithm is cheaper than the next one. I'm not exactly sure what might be the best strategy yet hence the question. It's really interesting that LAPACK doesn't have this type of fast checks. On Mon, Jan 9, 2017 at 8:30 PM, wrote: > > > On Mon, Jan 9, 2017 at 6:27 AM, Ilhan Polat wrote: > >> > Note that you're proposing a new scipy feature (right?) on the numpy >> list.... >> >> > This sounds like a good idea to me. As a former heavy Matlab user I >> remember a lot of things to dislike, but "\" behavior was quite nice. >> >> Correct, I am not sure where this might go in. It seemed like a NumPy >> array operation (touching array elements rapidly etc. can also be added for >> similar functionalities other than solve) hence the NumPy list. But of >> course it can be pushed as an exclusive SciPy feature. I'm not sure what >> the outlook on np.linalg.solve is. >> >> >> > How much is a noticeable slowdown? Note that we still have the current >> interfaces available for users that know what they need, so a nice >> convenience function that is say 5-10% slower would not be the end of the >> world. >> >> the fastest case was around 150-400% slower but of course it might be the >> case that I'm not using the fastest methods. It was mostly shuffling things >> around and using np.any on them in the pure python3 case. I will cook up >> something again for the baseline as soon as I have time. >> >> >> > All this checks sound a bit expensive, if we have almost always completely > unstructured arrays that don't satisfy any special matrix pattern. > > In analogy to the type proliferation in Julia to handle those cases: Is > there a way to attach information to numpy arrays that for example signals > that a 2d array is hermitian, banded or diagonal or ...? > > (After second thought: maybe completely unstructured is not too expensive > to detect if the checks are short-circuited, one off diagonal element > nonzero - not diagonal, two opposite diagonal different - not symmetric, > ...) 
> > Josef > > > > >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 9 20:29:25 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Jan 2017 17:29:25 -0800 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 5:09 PM, Ilhan Polat wrote: > So every test in the polyalgorithm is cheaper than the next one. I'm not exactly sure what might be the best strategy yet hence the question. It's really interesting that LAPACK doesn't have this type of fast checks. In Fortran LAPACK, if you have a special structured matrix, you usually explicitly use packed storage and call the appropriate function type on it. It's only when you go to a system that only has a generic, unstructured dense matrix data type that it makes sense to do those kinds of checks. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Mon Jan 9 20:55:03 2017 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Mon, 9 Jan 2017 20:55:03 -0500 Subject: [Numpy-discussion] From Python to Numpy In-Reply-To: <4C818029-F4B8-4893-8E3E-42C24221EC49@inria.fr> References: <4C818029-F4B8-4893-8E3E-42C24221EC49@inria.fr> Message-ID: I also have been stalking this email thread. First, excellent book! Regarding the vectorization example mentioned above, one thing to note is that it increases the order of the algorithm relative to the pure python. The vectorized approach uses correlate, which requires ~(len(seq) * len(sub)) FLOPs. In the case where the first element in sub is not equal to the vast majority of elements in seq, the basic approach requires ~len(seq) comparisons. Note that is the case in the SO answer. One fairly common thing I have seen in vectorized approaches is that the memory or operations required scales worse than strictly required. It may or may not be an issue, largely depends on the specifics of how its used, but it usually indicates a better approach exists. That may be worth mentioning here. Given that, I tried to come up with an "ideal" approach. stride_tricks can be used to convert seq to a 2D array, and then ideally each row could be compared to sub. However I can't think of how to do that with numpy function calls other than compare each element in the 2D array, requiring O(n_sub*n_seq) operations again. array_equal is an example of that. Python list equality scales better, for instance if x = [0]*n and y = [1]*n, x == y is very fast and the time is independent of the value of n. It seems a generalized ufunc "all_equal" with signature (i),(i)->() and short circuit logic once the first non equal element is encountered would be an important performance improvement. In the ideal case it is dramatically faster, and even if every element must be compared then its still probably meaningfully faster since the boolean intermediate array isn't created. Even better would be to get the axis argument in place for generalized ufuncs. Then this problem could be vectorized in one line with far better performance. 
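To make the stride_tricks idea above concrete, here is a small sketch of the windowed comparison (my own illustration, not the code from the book or the proposed gufunc); it also shows why the cost stays at roughly len(seq)*len(sub) comparisons without a short-circuiting all_equal:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def find_sub(seq, sub):
    seq = np.ascontiguousarray(seq)
    sub = np.asarray(sub)
    n = seq.size - sub.size + 1
    stride = seq.strides[0]
    # windows[i] is a view of seq[i:i+len(sub)] -- no data is copied
    windows = as_strided(seq, shape=(n, sub.size), strides=(stride, stride))
    # (windows == sub) still performs n*len(sub) comparisons and builds a
    # boolean temporary; a short-circuiting all_equal gufunc would stop each
    # row at the first mismatch and skip the temporary entirely.
    return np.flatnonzero((windows == sub).all(axis=1))

# find_sub([1, 8, 2, 3, 1, 2, 3], [1, 2, 3]) -> array([4])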
If others think this is a good idea I will post an issue and attempt a solution. On Sat, Dec 31, 2016 at 5:23 AM, Nicolas P. Rougier < Nicolas.Rougier at inria.fr> wrote: > > > I?ve seen vectorisation taken to the extreme, with negative consequences > in terms of both speed and readability, in both Python and MATLAB > codebases, so I would suggest some discussion / wisdom about when not to > vectorise. > > > I agree and there is actually a warning in the introduction about > readability vs speed with an example showing a clever optimization (by > Jaime Fern?ndez del R?o) that is hardly readable for the non-experts > (including myself). > > > Nicolas > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Mon Jan 9 22:10:13 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Tue, 10 Jan 2017 04:10:13 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: Yes, that's precisely the case but when we know the structure we can just choose the appropriate solver anyhow with a little bit of overhead. What I mean is that, to my knowledge, FORTRAN routines for checking for triangularness etc. are absent. On Tue, Jan 10, 2017 at 2:29 AM, Robert Kern wrote: > On Mon, Jan 9, 2017 at 5:09 PM, Ilhan Polat wrote: > > > So every test in the polyalgorithm is cheaper than the next one. I'm not > exactly sure what might be the best strategy yet hence the question. It's > really interesting that LAPACK doesn't have this type of fast checks. > > In Fortran LAPACK, if you have a special structured matrix, you usually > explicitly use packed storage and call the appropriate function type on it. > It's only when you go to a system that only has a generic, unstructured > dense matrix data type that it makes sense to do those kinds of checks. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 9 22:16:33 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Jan 2017 19:16:33 -0800 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 7:10 PM, Ilhan Polat wrote: > > Yes, that's precisely the case but when we know the structure we can just choose the appropriate solver anyhow with a little bit of overhead. What I mean is that, to my knowledge, FORTRAN routines for checking for triangularness etc. are absent. I'm responding to that. The reason that they don't have those FORTRAN routines for testing for structure inside of a generic dense matrix is that in FORTRAN it's more natural (and efficient) to just use the explicit packed structure and associated routines instead. You would only use a generic dense matrix if you know that there isn't structure in the matrix. So there are no routines for detecting that structure in generic dense matrices. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ilhanpolat at gmail.com Tue Jan 10 05:58:17 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Tue, 10 Jan 2017 11:58:17 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: I've done some benchmarking and it seems that the packed storage comes with a runtime penalty which agrees with a few links I've found online https://blog.debroglie.net/2013/09/01/lapack-and-packed-storage/ http://stackoverflow.com/questions/8941678/lapack-are-operations-on-packed-storage-matrices-faster The access of individual elements in packed stored matrices is expected to be more costly than in full storage, because of the more complicated indexing necessary. Hence, I am not sure if this justifies the absence just by having a dedicated solver for a prescribed structure. Existence of these polyalgorithms in matlab and not having in lapack should not imply FORTRAN users always know the structure in their matrices. I will also ask in LAPACK message board about this for some context. But thanks tough. As usual there is more to it than meets the eye probably, ilhan On Tue, Jan 10, 2017 at 4:16 AM, Robert Kern wrote: > On Mon, Jan 9, 2017 at 7:10 PM, Ilhan Polat wrote: > > > > Yes, that's precisely the case but when we know the structure we can > just choose the appropriate solver anyhow with a little bit of overhead. > What I mean is that, to my knowledge, FORTRAN routines for checking for > triangularness etc. are absent. > > I'm responding to that. The reason that they don't have those FORTRAN > routines for testing for structure inside of a generic dense matrix is that > in FORTRAN it's more natural (and efficient) to just use the explicit > packed structure and associated routines instead. You would only use a > generic dense matrix if you know that there isn't structure in the matrix. > So there are no routines for detecting that structure in generic dense > matrices. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Tue Jan 10 12:26:35 2017 From: perimosocordiae at gmail.com (CJ Carey) Date: Tue, 10 Jan 2017 11:26:35 -0600 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: I agree that this seems more like a scipy feature than a numpy feature. Users with structured matrices often use a sparse matrix format, though the API for using them in solvers could use some work. (I have a work-in-progress PR along those lines here: https://github.com/scipy/scipy/pull/6331) Perhaps this polyalgorithm approach could be used to dispatch sparse matrices to the appropriate solver, while optionally checking dense matrices for structure before dispatching them as well. Usage might look like: # if A is sparse, use scipy.sparse.linalg.solve, otherwise use scipy.linalg.solve scipy.linalg.generic_solve(A, b) # converts A to banded representation and calls scipy.linalg.solveh_banded, regardless if A is sparse or dense scipy.linalg.generic_solve(A, b, symmetric=True, banded=(-5, 5)) # runs possibly-expensive checks, then dispatches to the appropriate solver scipy.linalg.generic_solve(A, b, detect_structure=True) (I'm not advocating for "generic_solve" as the final name, I just needed a placeholder.) 
On Tue, Jan 10, 2017 at 4:58 AM, Ilhan Polat wrote: > I've done some benchmarking and it seems that the packed storage comes > with a runtime penalty which agrees with a few links I've found online > > https://blog.debroglie.net/2013/09/01/lapack-and-packed-storage/ > http://stackoverflow.com/questions/8941678/lapack-are- > operations-on-packed-storage-matrices-faster > > The access of individual elements in packed stored matrices is expected to > be more costly than in full storage, because of the more complicated > indexing necessary. Hence, I am not sure if this justifies the absence just > by having a dedicated solver for a prescribed structure. > > Existence of these polyalgorithms in matlab and not having in lapack > should not imply FORTRAN users always know the structure in their matrices. > I will also ask in LAPACK message board about this for some context. > > But thanks tough. As usual there is more to it than meets the eye > probably, > ilhan > > > > > On Tue, Jan 10, 2017 at 4:16 AM, Robert Kern > wrote: > >> On Mon, Jan 9, 2017 at 7:10 PM, Ilhan Polat wrote: >> > >> > Yes, that's precisely the case but when we know the structure we can >> just choose the appropriate solver anyhow with a little bit of overhead. >> What I mean is that, to my knowledge, FORTRAN routines for checking for >> triangularness etc. are absent. >> >> I'm responding to that. The reason that they don't have those FORTRAN >> routines for testing for structure inside of a generic dense matrix is that >> in FORTRAN it's more natural (and efficient) to just use the explicit >> packed structure and associated routines instead. You would only use a >> generic dense matrix if you know that there isn't structure in the matrix. >> So there are no routines for detecting that structure in generic dense >> matrices. >> >> -- >> Robert Kern >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Tue Jan 10 19:28:27 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Wed, 11 Jan 2017 01:28:27 +0100 Subject: [Numpy-discussion] making np.gradient support unevenly spaced data Message-ID: Hi all, I have implemented a proposed enhancement for the np.gradient function that allows to compute the gradient on non uniform grids. (PR: https://github.com/numpy/numpy/pull/8446) The proposed implementation has a behaviour/signature is similar to that of Matlab/Octave. As argument it can take: 1. A single scalar to specify a sample distance for all dimensions. 2. N scalars to specify a constant sample distance for each dimension. i.e. `dx`, `dy`, `dz`, ... 3. N arrays to specify the coordinates of the values along each dimension of F. The length of the array must match the size of the corresponding dimension 4. Any combination of N scalars/arrays with the meaning of 2. and 3. e.g., you can do the following: >>> f = np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float) >>> dx = 2. >>> y = [1., 1.5, 3.5] >>> np.gradient(f, dx, y) [array([[ 1. , 1. , -0.5], [ 1. , 1. , -0.5]]), array([[ 2. , 2. , 2. ], [ 2. , 1.7, 0.5]])] It should not break any existing code since as of 1.12 only scalars or list of scalars are allowed. 
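For readers who want to check the numbers above, a rough pure-NumPy version of the interior-point formula for a non-uniform 1-D grid is sketched below (simple first-order one-sided differences at the ends; the PR's actual edge handling may differ):

import numpy as np

def gradient_nonuniform_1d(f, x):
    f = np.asarray(f, dtype=float)
    x = np.asarray(x, dtype=float)
    h = np.diff(x)                 # spacings, h[i] = x[i+1] - x[i]
    hd, hs = h[:-1], h[1:]         # backward/forward spacing at interior points
    out = np.empty_like(f)
    # second-order accurate 3-point formula on a non-uniform grid
    out[1:-1] = (-hs / (hd * (hd + hs)) * f[:-2]
                 + (hs - hd) / (hd * hs) * f[1:-1]
                 + hd / (hs * (hd + hs)) * f[2:])
    out[0] = (f[1] - f[0]) / h[0]
    out[-1] = (f[-1] - f[-2]) / h[-1]
    return out

# Applied row-wise with x = [1., 1.5, 3.5], this reproduces the second array
# in the example above: [2., 2., 2.] and [2., 1.7, 0.5].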
A possible alternative API could be pass arrays of sampling steps instead of the coordinates. On the one hand, this would have the advantage of having "differences" both in the scalar case and in the array case. On the other hand, if you are dealing with non uniformly-spaced data (e.g, data is mapped on a grid or it is a time-series), in most cases you already have the coordinates/timestamps. Therefore, in the case of difference as argument, you would almost always have a call np.diff before np.gradient. In the end, I would rather prefer the coordinates option since IMHO it is more handy, I don't think that would be too much "surprising" and it is what Matlab already does. Also, it could not easily lead to "silly" mistakes since the length have to match the size of the corresponding dimension. What do you think? Thanks Alessandro -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Jan 10 20:27:07 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 10 Jan 2017 17:27:07 -0800 Subject: [Numpy-discussion] From Python to Numpy In-Reply-To: References: <4C818029-F4B8-4893-8E3E-42C24221EC49@inria.fr> Message-ID: <4004103797063557632@unknownmsgid> > It seems a generalized ufunc "all_equal" with signature (i),(i)->() and short circuit logic once the first non equal element is encountered would be an important performance improvement. How does array_equal() perform? -CHB From harrigan.matthew at gmail.com Fri Jan 13 11:02:47 2017 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Fri, 13 Jan 2017 11:02:47 -0500 Subject: [Numpy-discussion] From Python to Numpy In-Reply-To: <4004103797063557632@unknownmsgid> References: <4C818029-F4B8-4893-8E3E-42C24221EC49@inria.fr> <4004103797063557632@unknownmsgid> Message-ID: I coded up an all_equal gufunc here . Benchmark results are also in that repo. For the specific problem in the book which started this, its 40x faster than the optimized code in the book. For large arrays which have any early non equal element, its dramatically faster (1000x) than the current alternative. For large arrays which are all equal, its ~10% faster due to eliminating the intermediate boolean array. For tiny arrays its much faster due to a single function call instead of at least two, but its debatable how relevant speed is for tiny problems. Disclaimer: this is my first ufunc I have every written. On Tue, Jan 10, 2017 at 8:27 PM, Chris Barker - NOAA Federal < chris.barker at noaa.gov> wrote: > > It seems a generalized ufunc "all_equal" with signature (i),(i)->() and > short circuit logic once the first non equal element is encountered would > be an important performance improvement. > > How does array_equal() perform? 
> > -CHB > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jan 15 18:43:41 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 15 Jan 2017 16:43:41 -0700 Subject: [Numpy-discussion] NumPy 1.12.0 release Message-ID: Hi All, I'm pleased to announce the NumPy 1.12.0 release. This release supports Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be downloaded from PiPY , the tarball and zip files may be downloaded from Github . The release notes and files hashes may also be found at Github . NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 contributors and comprises a large number of fixes and improvements. Among the many improvements it is difficult to pick out just a few as standing above the others, but the following may be of particular interest or indicate areas likely to have future consequences. * Order of operations in ``np.einsum`` can now be optimized for large speed improvements. * New ``signature`` argument to ``np.vectorize`` for vectorizing with core dimensions. * The ``keepdims`` argument was added to many functions. * New context manager for testing warnings * Support for BLIS in numpy.distutils * Much improved support for PyPy (not yet finished) Enjoy, Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From tcaswell at gmail.com Mon Jan 16 01:00:34 2017 From: tcaswell at gmail.com (Thomas Caswell) Date: Mon, 16 Jan 2017 06:00:34 +0000 Subject: [Numpy-discussion] question about long doubles on ppc64el Message-ID: Folks, Over at h5py we are trying to get a release out and have discovered (via debian) that on ppc64el there is an apparent disagreement between the size of a native long double according to hdf5 and numpy. For all of the gorey details see: https://github.com/h5py/h5py/issues/817 . In short, `np.longdouble` seems to be `np.float128` and according to the docs should map to the native 'long double'. However, hdf5 provides a `H5T_NATIVE_LDOUBLE` which should also refer to the native 'long double', but seems to be a 64 bit float. Anyone on this list have a ppc64el machine (or experience with) that can provide some guidance here? Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From jenshnielsen at gmail.com Mon Jan 16 03:47:20 2017 From: jenshnielsen at gmail.com (Jens Nielsen) Date: Mon, 16 Jan 2017 08:47:20 +0000 Subject: [Numpy-discussion] question about long doubles on ppc64el In-Reply-To: References: Message-ID: According to https://docs.scipy.org/doc/numpy-dev/user/basics.types.html#extended-precision numpy long doubles are typically zero padded to 128 bits on 64 bit systems could that be the reason? On Mon, 16 Jan 2017 at 07:00 Thomas Caswell wrote: > Folks, > > Over at h5py we are trying to get a release out and have discovered (via > debian) that on ppc64el there is an apparent disagreement between the size > of a native long double according to hdf5 and numpy. > > For all of the gorey details see: https://github.com/h5py/h5py/issues/817 > . > > In short, `np.longdouble` seems to be `np.float128` and according to the > docs should map to the native 'long double'. However, hdf5 provides a > `H5T_NATIVE_LDOUBLE` which should also refer to the native 'long double', > but seems to be a 64 bit float. 
> > Anyone on this list have a ppc64el machine (or experience with) that can > provide some guidance here? > > Tom > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jan 16 04:42:06 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 16 Jan 2017 22:42:06 +1300 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: References: Message-ID: On Mon, Jan 16, 2017 at 12:43 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > Hi All, > > I'm pleased to announce the NumPy 1.12.0 release. This release supports > Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be > downloaded from PiPY > , the tarball > and zip files may be downloaded from Github > . The release notes > and files hashes may also be found at Github > . > > NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 > contributors and comprises a large number of fixes and improvements. Among > the many improvements it is difficult to pick out just a few as standing > above the others, but the following may be of particular interest or > indicate areas likely to have future consequences. > > * Order of operations in ``np.einsum`` can now be optimized for large > speed improvements. > * New ``signature`` argument to ``np.vectorize`` for vectorizing with core > dimensions. > * The ``keepdims`` argument was added to many functions. > * New context manager for testing warnings > * Support for BLIS in numpy.distutils > * Much improved support for PyPy (not yet finished) > Thanks for all the heavy lifting on this one Chuck! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From max_linke at gmx.de Mon Jan 16 08:38:11 2017 From: max_linke at gmx.de (Max Linke) Date: Mon, 16 Jan 2017 14:38:11 +0100 Subject: [Numpy-discussion] GSoC 2017: NumFocus will be an umbrella organization Message-ID: Hi Organizations can start submitting applications for Google Summer of Code 2017 on January 19 (and the deadline is February 9) https://developers.google.com/open-source/gsoc/timeline?hl=en NumFOCUS will be applying again this year. If you want to work with us please let me know and if you apply as an organization yourself or under a different umbrella organization please tell me as well. If you participate with us it would be great if you start to add possible projects to the ideas page on github soon. We some general information for mentors on github. https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-mentors.md We also have a template for ideas that might help. It lists the things Google likes to see. https://github.com/numfocus/gsoc/blob/master/2017/ideas-list-skeleton.md In case you participated in earlier years with NumFOCUS there are some small changes this year. Raniere won't be the admin this year. Instead I'm going to be the admin. We are also planning to include two explicit rules when a student should be failed, they have to communicate regularly and commit code into your development branch at the end of the summer. 
best, Max From charlesr.harris at gmail.com Mon Jan 16 10:47:19 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2017 08:47:19 -0700 Subject: [Numpy-discussion] question about long doubles on ppc64el In-Reply-To: References: Message-ID: On Sun, Jan 15, 2017 at 11:00 PM, Thomas Caswell wrote: > Folks, > > Over at h5py we are trying to get a release out and have discovered (via > debian) that on ppc64el there is an apparent disagreement between the size > of a native long double according to hdf5 and numpy. > > For all of the gorey details see: https://github.com/h5py/h5py/issues/817 > . > > In short, `np.longdouble` seems to be `np.float128` and according to the > docs should map to the native 'long double'. However, hdf5 provides a > `H5T_NATIVE_LDOUBLE` which should also refer to the native 'long double', > but seems to be a 64 bit float. > > Anyone on this list have a ppc64el machine (or experience with) that can > provide some guidance here? > I believe the ppc64 long double is IBM double double, i.e., two doubles for 128 bits. It isn't IEEE compliant and probably not very portable. It is possible that different compilers could treat it differently or it may be flagged to be treated in some specific way. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jan 16 11:55:59 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 16 Jan 2017 08:55:59 -0800 Subject: [Numpy-discussion] question about long doubles on ppc64el In-Reply-To: References: Message-ID: Hi, On Sun, Jan 15, 2017 at 10:00 PM, Thomas Caswell wrote: > Folks, > > Over at h5py we are trying to get a release out and have discovered (via > debian) that on ppc64el there is an apparent disagreement between the size > of a native long double according to hdf5 and numpy. > > For all of the gorey details see: https://github.com/h5py/h5py/issues/817 . > > In short, `np.longdouble` seems to be `np.float128` and according to the > docs should map to the native 'long double'. However, hdf5 provides a > `H5T_NATIVE_LDOUBLE` which should also refer to the native 'long double', > but seems to be a 64 bit float. > > Anyone on this list have a ppc64el machine (or experience with) that can > provide some guidance here? I know that long double on numpy for the PPC on Mac G4 (power64 arch) is the twin double, as expected, so I'd be surprised if that wasn't true for numpy on ppc64el . Do you want a login for the G4 running Jessie? If so, send me your public key off-list? Cheers, Matthew From nyh at scylladb.com Tue Jan 17 04:55:58 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Tue, 17 Jan 2017 11:55:58 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties Message-ID: Hi, I'm looking for a way to find a random sample of C different items out of N items, with a some desired probabilty Pi for each item i. I saw that numpy has a function that supposedly does this, numpy.random.choice (with replace=False and a probabilities array), but looking at the algorithm actually implemented, I am wondering in what sense are the probabilities Pi actually obeyed... To me, the code doesn't seem to be doing the right thing... Let me explain: Consider a simple numerical example: We have 3 items, and need to pick 2 different ones randomly. Let's assume the desired probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4. 
Working out the equations there is exactly one solution here: The random outcome of numpy.random.choice in this case should be [1,2] at probability 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is indeed a solution for the desired probabilities because it yields item 1 in [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = 0.2+0.6 = 0.8 = 2*P2, etc. However, the algorithm in numpy.random.choice's replace=False generates, if I understand correctly, different probabilities for the outcomes: I believe in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333, and [2,3] at probability 0.53333. My question is how does this result fit the desired probabilities? If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, then the expect number of "1" results we'll get per drawing is 0.23333 + 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and for "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice common than item 1 as we originally desired (we asked for probabilities 0.2, 0.4, 0.4 for the individual items!). -- Nadav Har'El nyh at scylladb.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Jan 17 08:56:42 2017 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 17 Jan 2017 08:56:42 -0500 Subject: [Numpy-discussion] NumPy 1.12.0 release References: Message-ID: Charles R Harris wrote: > Hi All, > > I'm pleased to announce the NumPy 1.12.0 release. This release supports > Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be > downloaded from PiPY > , the tarball > and zip files may be downloaded from Github > . The release notes > and files hashes may also be found at Github > . > > NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 > contributors and comprises a large number of fixes and improvements. Among > the many improvements it is difficult to pick out just a few as standing > above the others, but the following may be of particular interest or > indicate areas likely to have future consequences. > > * Order of operations in ``np.einsum`` can now be optimized for large > speed improvements. > * New ``signature`` argument to ``np.vectorize`` for vectorizing with core > dimensions. > * The ``keepdims`` argument was added to many functions. > * New context manager for testing warnings > * Support for BLIS in numpy.distutils > * Much improved support for PyPy (not yet finished) > > Enjoy, > > Chuck I've installed via pip3 on linux x86_64, which gives me a wheel. My question is, am I loosing significant performance choosing this pre-built binary vs. compiling myself? For example, my processor might have some more features than the base version used to build wheels. From tcaswell at gmail.com Tue Jan 17 11:55:12 2017 From: tcaswell at gmail.com (Thomas Caswell) Date: Tue, 17 Jan 2017 16:55:12 +0000 Subject: [Numpy-discussion] [REL] matplotlib v2.0.0 Message-ID: Folks, We are happy to announce the release of (long delayed) matplotlib 2.0! This release completely overhauls the default style of the plots. The source tarball and wheels for Mac, Win, and manylinux for python 2.7, 3.4-3.6 are available on pypi pip install --upgrade matplotlib and conda packages for Mac, Win, linux for python 2.7, 3.4-3.6 are available from conda-forge conda install matplotlib -c conda-forge Highlights include: - 'viridis' is default color map instead of jet. - Modernized the default color cycle. 
- Many more functions respect the color cycle. - Line dash patterns scale with linewidth. - Change default font to DejaVu, now supports most Western alphabets (including Greek, Cyrillic and Latin with diacritics), math symbols and emoji out of the box. - Faster text rendering. - Improved auto-limits. - Ticks out and only on the right and bottom spines by default. - Improved auto-ticking, particularly for log scales and dates. - Improved image support (imshow respects scales and eliminated a class of artifacts). For a full list of the default changes (along with how to revert them) please see http://matplotlib.org/users/dflt_style_changes.html and http://matplotlib.org/users/whats_new.html#new-in-matplotlib-2-0. There were a number of small API changes documented at http://matplotlib.org/api/api_changes.html#api-changes-in-2-0-0 I would like to thank everyone who helped on this release in anyway. The people at 2015 scipy BOF where this got started, users who provided feedback and suggestions along the way, the beta-testers, Nathaniel, Stefan and Eric for the new color maps, and all of the documentation and code contributors. Please report any issues to matplotlib-users at python.org (will have to join to post un-moderated) or https://github.com/matplotlib/matplotlib/issues . Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Tue Jan 17 12:18:59 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Tue, 17 Jan 2017 18:18:59 +0100 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties Message-ID: Hi Nadav, I may be wrong, but I think that the result of the current implementation is actually the expected one. Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4 P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) Now, P([1]) = 0.2 and P([2]) = 0.4. However: P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, once normalised, translate into 1/3 and 2/3 respectively) Therefore P([1,2]) = 0.7/3 = 0.23333 Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 What am I missing? Alessandro 2017-01-17 13:00 GMT+01:00 : > Hi, I'm looking for a way to find a random sample of C different items out > of N items, with a some desired probabilty Pi for each item i. > > I saw that numpy has a function that supposedly does this, > numpy.random.choice (with replace=False and a probabilities array), but > looking at the algorithm actually implemented, I am wondering in what sense > are the probabilities Pi actually obeyed... > > To me, the code doesn't seem to be doing the right thing... Let me explain: > > Consider a simple numerical example: We have 3 items, and need to pick 2 > different ones randomly. Let's assume the desired probabilities for item 1, > 2 and 3 are: 0.2, 0.4 and 0.4. > > Working out the equations there is exactly one solution here: The random > outcome of numpy.random.choice in this case should be [1,2] at probability > 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is indeed > a solution for the desired probabilities because it yields item 1 in > [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = > 0.2+0.6 = 0.8 = 2*P2, etc. 
> > However, the algorithm in numpy.random.choice's replace=False generates, if > I understand correctly, different probabilities for the outcomes: I believe > in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333, > and [2,3] at probability 0.53333. > > My question is how does this result fit the desired probabilities? > > If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, > then the expect number of "1" results we'll get per drawing is 0.23333 + > 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and for > "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice > common than item 1 as we originally desired (we asked for probabilities > 0.2, 0.4, 0.4 for the individual items!). > > > -- > Nadav Har'El > nyh at scylladb.com > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: attachments/20170117/d1f0a1db/attachment-0001.html> > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------ > > End of NumPy-Discussion Digest, Vol 124, Issue 24 > ************************************************* > -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jan 17 13:02:42 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 17 Jan 2017 10:02:42 -0800 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: References: Message-ID: Hi, On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker wrote: > Charles R Harris wrote: > >> Hi All, >> >> I'm pleased to announce the NumPy 1.12.0 release. This release supports >> Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be >> downloaded from PiPY >> , the tarball >> and zip files may be downloaded from Github >> . The release notes >> and files hashes may also be found at Github >> . >> >> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >> contributors and comprises a large number of fixes and improvements. Among >> the many improvements it is difficult to pick out just a few as standing >> above the others, but the following may be of particular interest or >> indicate areas likely to have future consequences. >> >> * Order of operations in ``np.einsum`` can now be optimized for large >> speed improvements. >> * New ``signature`` argument to ``np.vectorize`` for vectorizing with core >> dimensions. >> * The ``keepdims`` argument was added to many functions. >> * New context manager for testing warnings >> * Support for BLIS in numpy.distutils >> * Much improved support for PyPy (not yet finished) >> >> Enjoy, >> >> Chuck > > I've installed via pip3 on linux x86_64, which gives me a wheel. 
My > question is, am I loosing significant performance choosing this pre-built > binary vs. compiling myself? For example, my processor might have some more > features than the base version used to build wheels. I guess you are thinking about using this built wheel on some other machine? You'd have to be lucky for that to work; the wheel depends on the symbols it found at build time, which may not exist in the same places on your other machine. If it does work, the speed will primarily depend on your BLAS library. The pypi wheels should be pretty fast; they are built with OpenBLAS, which is at or near top of range for speed, across a range of platforms. Cheers, Matthew From nyh at scylladb.com Tue Jan 17 16:13:26 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Tue, 17 Jan 2017 23:13:26 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com wrote: > Hi Nadav, > > I may be wrong, but I think that the result of the current implementation > is actually the expected one. > Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4 > > P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) > Yes, this formula does fit well with the actual algorithm in the code. But, my question is *why* we want this formula to be correct: > Now, P([1]) = 0.2 and P([2]) = 0.4. However: > P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) > P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, > once normalised, translate into 1/3 and 2/3 respectively) > Therefore P([1,2]) = 0.7/3 = 0.23333 > Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 > Right, these are the numbers that the algorithm in the current code, and the formula above, produce: P([1,2]) = P([1,3]) = 0.23333 P([2,3]) = 0.53333 What I'm puzzled about is that these probabilities do not really fullfill the given probability vector 0.2, 0.4, 0.4... Let me try to explain explain: Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three items in the first place? One reasonable interpretation is that the user wants in his random picks to see item 1 half the time of item 2 or 3. For example, maybe item 1 costs twice as much as item 2 or 3, so picking it half as often will result in an equal expenditure on each item. If the user randomly picks the items individually (a single item at a time), he indeed gets exactly this distribution: 0.2 of the time item 1, 0.4 of the time item 2, 0.4 of the time item 3. Now, what happens if he picks not individual items, but pairs of different items using numpy.random.choice with two items, replace=false? Suddenly, the distribution of the individual items in the results get skewed: If we look at the expected number of times we'll see each item in one draw of a random pair, we will get: E(1) = P([1,2]) + P([1,3]) = 0.46666 E(2) = P([1,2]) + P([2,3]) = 0.76666 E(3) = P([1,3]) + P([2,3]) = 0.76666 Or renormalizing by dividing by 2: P(1) = 0.233333 P(2) = 0.383333 P(3) = 0.383333 As you can see this is not quite the probabilities we wanted (which were 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more often than we wanted, and item 2 and 3 were used a bit less often! So that brought my question of why we consider these numbers right. 
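A quick empirical check of these numbers (items labelled 0, 1, 2 instead of 1, 2, 3; the sample size is arbitrary):

import numpy as np
from collections import Counter

rng = np.random.RandomState(0)
p = [0.2, 0.4, 0.4]
n = 200000
pairs, items = Counter(), Counter()
for _ in range(n):
    draw = tuple(sorted(rng.choice(3, size=2, replace=False, p=p)))
    pairs[draw] += 1
    items.update(draw)

print({k: round(v / n, 3) for k, v in sorted(pairs.items())})
# ~ {(0, 1): 0.233, (0, 2): 0.233, (1, 2): 0.533}
print({k: round(v / (2 * n), 3) for k, v in sorted(items.items())})
# ~ {0: 0.233, 1: 0.383, 2: 0.383}   (per-item frequencies, not 0.2/0.4/0.4)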
In this example, it's actually possible to get the right item distribution, if we pick the pair outcomes with the following probabilties: P([1,2]) = 0.2 (not 0.233333 as above) P([1,3]) = 0.2 P([2,3]) = 0.6 (not 0.533333 as above) Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 Interestingly, fixing things like I suggest is not always possible. Consider a different probability-vector example for three items - 0.99, 0.005, 0.005. Now, no matter which algorithm we use for randomly picking pairs from these three items, *each* returned pair will inevitably contain one of the two very-low-probability items, so each of those items will appear in roughly half the pairs, instead of in a vanishingly small percentage as we hoped. But in other choices of probabilities (like the one in my original example), there is a solution. For 2-out-of-3 sampling we can actually show a system of three linear equations in three variables, so there is always one solution but if this solution has components not valid as probabilities (not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005, 0.005 example. > What am I missing? > > Alessandro > > > 2017-01-17 13:00 GMT+01:00 : > >> Hi, I'm looking for a way to find a random sample of C different items out >> of N items, with a some desired probabilty Pi for each item i. >> >> I saw that numpy has a function that supposedly does this, >> numpy.random.choice (with replace=False and a probabilities array), but >> looking at the algorithm actually implemented, I am wondering in what >> sense >> are the probabilities Pi actually obeyed... >> >> To me, the code doesn't seem to be doing the right thing... Let me >> explain: >> >> Consider a simple numerical example: We have 3 items, and need to pick 2 >> different ones randomly. Let's assume the desired probabilities for item >> 1, >> 2 and 3 are: 0.2, 0.4 and 0.4. >> >> Working out the equations there is exactly one solution here: The random >> outcome of numpy.random.choice in this case should be [1,2] at probability >> 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is indeed >> a solution for the desired probabilities because it yields item 1 in >> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = >> 0.2+0.6 = 0.8 = 2*P2, etc. >> >> However, the algorithm in numpy.random.choice's replace=False generates, >> if >> I understand correctly, different probabilities for the outcomes: I >> believe >> in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333, >> and [2,3] at probability 0.53333. >> >> My question is how does this result fit the desired probabilities? >> >> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, >> then the expect number of "1" results we'll get per drawing is 0.23333 + >> 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and >> for >> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice >> common than item 1 as we originally desired (we asked for probabilities >> 0.2, 0.4, 0.4 for the individual items!). >> >> >> -- >> Nadav Har'El >> nyh at scylladb.com >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL: > ts/20170117/d1f0a1db/attachment-0001.html> >> >> ------------------------------ >> >> Subject: Digest Footer >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> ------------------------------ >> >> End of NumPy-Discussion Digest, Vol 124, Issue 24 >> ************************************************* >> > > > > -- > -------------------------------------------------------------------------- > NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain > confidential information and are intended for the sole use of the > recipient(s) named above. If you are not the intended recipient of this > message you are hereby notified that any dissemination or copying of this > message is strictly prohibited. If you have received this e-mail in error, > please notify the sender either by telephone or by e-mail and delete the > material from any computer. Thank you. > -------------------------------------------------------------------------- > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 17 17:25:39 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Jan 2017 17:25:39 -0500 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 4:13 PM, Nadav Har'El wrote: > > > On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com wrote: >> >> Hi Nadav, >> >> I may be wrong, but I think that the result of the current implementation is actually the expected one. >> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4 >> >> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) > > > Yes, this formula does fit well with the actual algorithm in the code. But, my question is *why* we want this formula to be correct: > >> >> Now, P([1]) = 0.2 and P([2]) = 0.4. However: >> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) >> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, once normalised, translate into 1/3 and 2/3 respectively) >> Therefore P([1,2]) = 0.7/3 = 0.23333 >> Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 > > > Right, these are the numbers that the algorithm in the current code, and the formula above, produce: > > P([1,2]) = P([1,3]) = 0.23333 > P([2,3]) = 0.53333 > > What I'm puzzled about is that these probabilities do not really fullfill the given probability vector 0.2, 0.4, 0.4... > Let me try to explain explain: > > Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three items in the first place? > > One reasonable interpretation is that the user wants in his random picks to see item 1 half the time of item 2 or 3. > For example, maybe item 1 costs twice as much as item 2 or 3, so picking it half as often will result in an equal expenditure on each item. > > If the user randomly picks the items individually (a single item at a time), he indeed gets exactly this distribution: 0.2 of the time item 1, 0.4 of the time item 2, 0.4 of the time item 3. > > Now, what happens if he picks not individual items, but pairs of different items using numpy.random.choice with two items, replace=false? 
> Suddenly, the distribution of the individual items in the results get skewed: If we look at the expected number of times we'll see each item in one draw of a random pair, we will get: > > E(1) = P([1,2]) + P([1,3]) = 0.46666 > E(2) = P([1,2]) + P([2,3]) = 0.76666 > E(3) = P([1,3]) + P([2,3]) = 0.76666 > > Or renormalizing by dividing by 2: > > P(1) = 0.233333 > P(2) = 0.383333 > P(3) = 0.383333 > > As you can see this is not quite the probabilities we wanted (which were 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more often than we wanted, and item 2 and 3 were used a bit less often! > > So that brought my question of why we consider these numbers right. > > In this example, it's actually possible to get the right item distribution, if we pick the pair outcomes with the following probabilties: > > P([1,2]) = 0.2 (not 0.233333 as above) > P([1,3]) = 0.2 > P([2,3]) = 0.6 (not 0.533333 as above) > > Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 > > Interestingly, fixing things like I suggest is not always possible. Consider a different probability-vector example for three items - 0.99, 0.005, 0.005. Now, no matter which algorithm we use for randomly picking pairs from these three items, *each* returned pair will inevitably contain one of the two very-low-probability items, so each of those items will appear in roughly half the pairs, instead of in a vanishingly small percentage as we hoped. > > But in other choices of probabilities (like the one in my original example), there is a solution. For 2-out-of-3 sampling we can actually show a system of three linear equations in three variables, so there is always one solution but if this solution has components not valid as probabilities (not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005, 0.005 example. I think the underlying problem is that in the sampling space the events (1, 2) (1, 3) (2, 3) are correlated and because of the discreteness an arbitrary marginal distribution on the individual events 1, 2, 3 is not possible. related aside: I'm not able (or willing to spend the time) on the math, but I just went through something similar for survey sampling in finite population (e.g. survey two out of 3 individuals, where 3 is the population), leading to the Horvitz?Thompson estimator. The books have chapters on different sampling schemes and derivation of the marginal and joint probability to be surveyed. (I gave up on sampling without replacement, and assume we have a large population where it doesn't make a difference.) In some of the sampling schemes they pick sequentially and adjust the probabilities for the remaining individuals. That seems to provide more flexibility to create a desired or optimal sampling scheme. Josef > > > >> >> What am I missing? >> >> Alessandro >> >> >> 2017-01-17 13:00 GMT+01:00 : >>> >>> Hi, I'm looking for a way to find a random sample of C different items out >>> of N items, with a some desired probabilty Pi for each item i. >>> >>> I saw that numpy has a function that supposedly does this, >>> numpy.random.choice (with replace=False and a probabilities array), but >>> looking at the algorithm actually implemented, I am wondering in what sense >>> are the probabilities Pi actually obeyed... >>> >>> To me, the code doesn't seem to be doing the right thing... Let me explain: >>> >>> Consider a simple numerical example: We have 3 items, and need to pick 2 >>> different ones randomly. 
Let's assume the desired probabilities for item 1, >>> 2 and 3 are: 0.2, 0.4 and 0.4. >>> >>> Working out the equations there is exactly one solution here: The random >>> outcome of numpy.random.choice in this case should be [1,2] at probability >>> 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is indeed >>> a solution for the desired probabilities because it yields item 1 in >>> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = >>> 0.2+0.6 = 0.8 = 2*P2, etc. >>> >>> However, the algorithm in numpy.random.choice's replace=False generates, if >>> I understand correctly, different probabilities for the outcomes: I believe >>> in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333, >>> and [2,3] at probability 0.53333. >>> >>> My question is how does this result fit the desired probabilities? >>> >>> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, >>> then the expect number of "1" results we'll get per drawing is 0.23333 + >>> 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and for >>> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice >>> common than item 1 as we originally desired (we asked for probabilities >>> 0.2, 0.4, 0.4 for the individual items!). >>> >>> >>> -- >>> Nadav Har'El >>> nyh at scylladb.com >>> -------------- next part -------------- >>> An HTML attachment was scrubbed... >>> URL: < https://mail.scipy.org/pipermail/numpy-discussion/attachments/20170117/d1f0a1db/attachment-0001.html > >>> >>> ------------------------------ >>> >>> Subject: Digest Footer >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> ------------------------------ >>> >>> End of NumPy-Discussion Digest, Vol 124, Issue 24 >>> ************************************************* >> >> >> >> >> -- >> -------------------------------------------------------------------------- >> NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. >> -------------------------------------------------------------------------- >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Tue Jan 17 18:58:05 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Wed, 18 Jan 2017 00:58:05 +0100 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: 2017-01-17 22:13 GMT+01:00 Nadav Har'El : > > On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com > wrote: > >> Hi Nadav, >> >> I may be wrong, but I think that the result of the current implementation >> is actually the expected one. 
>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4 >> >> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >> > > Yes, this formula does fit well with the actual algorithm in the code. > But, my question is *why* we want this formula to be correct: > > Just a note: this formula is correct and it is one of statistics fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability + https://en.wikipedia.org/wiki/Bayes%27_theorem Thus, the result we get from random.choice IMHO definitely makes sense. Of course, I think we could always discuss about implementing other sampling methods if they are useful to some application. > >> Now, P([1]) = 0.2 and P([2]) = 0.4. However: >> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) >> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, >> once normalised, translate into 1/3 and 2/3 respectively) >> Therefore P([1,2]) = 0.7/3 = 0.23333 >> Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 >> > > Right, these are the numbers that the algorithm in the current code, and > the formula above, produce: > > P([1,2]) = P([1,3]) = 0.23333 > P([2,3]) = 0.53333 > > What I'm puzzled about is that these probabilities do not really fullfill > the given probability vector 0.2, 0.4, 0.4... > Let me try to explain explain: > > Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three > items in the first place? > > One reasonable interpretation is that the user wants in his random picks > to see item 1 half the time of item 2 or 3. > For example, maybe item 1 costs twice as much as item 2 or 3, so picking > it half as often will result in an equal expenditure on each item. > > If the user randomly picks the items individually (a single item at a > time), he indeed gets exactly this distribution: 0.2 of the time item 1, > 0.4 of the time item 2, 0.4 of the time item 3. > > Now, what happens if he picks not individual items, but pairs of different > items using numpy.random.choice with two items, replace=false? > Suddenly, the distribution of the individual items in the results get > skewed: If we look at the expected number of times we'll see each item in > one draw of a random pair, we will get: > > E(1) = P([1,2]) + P([1,3]) = 0.46666 > E(2) = P([1,2]) + P([2,3]) = 0.76666 > E(3) = P([1,3]) + P([2,3]) = 0.76666 > > Or renormalizing by dividing by 2: > > P(1) = 0.233333 > P(2) = 0.383333 > P(3) = 0.383333 > > As you can see this is not quite the probabilities we wanted (which were > 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more > often than we wanted, and item 2 and 3 were used a bit less often! > p is not the probability of the output but the one of the source finite population. I think that if you want to preserve that distribution, as Josef pointed out, you have to make extractions independent, that is either sample with replacement or approximate an infinite population (that is basically the same thing). But of course in this case you will also end up with events [X,X]. > So that brought my question of why we consider these numbers right. > > In this example, it's actually possible to get the right item > distribution, if we pick the pair outcomes with the following probabilties: > > P([1,2]) = 0.2 (not 0.233333 as above) > P([1,3]) = 0.2 > P([2,3]) = 0.6 (not 0.533333 as above) > > Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 > > Interestingly, fixing things like I suggest is not always possible. 
> Consider a different probability-vector example for three items - 0.99, > 0.005, 0.005. Now, no matter which algorithm we use for randomly picking > pairs from these three items, *each* returned pair will inevitably contain > one of the two very-low-probability items, so each of those items will > appear in roughly half the pairs, instead of in a vanishingly small > percentage as we hoped. > > But in other choices of probabilities (like the one in my original > example), there is a solution. For 2-out-of-3 sampling we can actually show > a system of three linear equations in three variables, so there is always > one solution but if this solution has components not valid as probabilities > (not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005, > 0.005 example. > > > >> What am I missing? >> >> Alessandro >> >> >> 2017-01-17 13:00 GMT+01:00 : >> >>> Hi, I'm looking for a way to find a random sample of C different items >>> out >>> of N items, with a some desired probabilty Pi for each item i. >>> >>> I saw that numpy has a function that supposedly does this, >>> numpy.random.choice (with replace=False and a probabilities array), but >>> looking at the algorithm actually implemented, I am wondering in what >>> sense >>> are the probabilities Pi actually obeyed... >>> >>> To me, the code doesn't seem to be doing the right thing... Let me >>> explain: >>> >>> Consider a simple numerical example: We have 3 items, and need to pick 2 >>> different ones randomly. Let's assume the desired probabilities for item >>> 1, >>> 2 and 3 are: 0.2, 0.4 and 0.4. >>> >>> Working out the equations there is exactly one solution here: The random >>> outcome of numpy.random.choice in this case should be [1,2] at >>> probability >>> 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is >>> indeed >>> a solution for the desired probabilities because it yields item 1 in >>> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = >>> 0.2+0.6 = 0.8 = 2*P2, etc. >>> >>> However, the algorithm in numpy.random.choice's replace=False generates, >>> if >>> I understand correctly, different probabilities for the outcomes: I >>> believe >>> in this case it generates [1,2] at probability 0.23333, [1,3] also >>> 0.2333, >>> and [2,3] at probability 0.53333. >>> >>> My question is how does this result fit the desired probabilities? >>> >>> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, >>> then the expect number of "1" results we'll get per drawing is 0.23333 + >>> 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and >>> for >>> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice >>> common than item 1 as we originally desired (we asked for probabilities >>> 0.2, 0.4, 0.4 for the individual items!). >>> >>> >>> -- >>> Nadav Har'El >>> nyh at scylladb.com >>> -------------- next part -------------- >>> An HTML attachment was scrubbed... 
>>> URL: >> ts/20170117/d1f0a1db/attachment-0001.html> >>> >>> ------------------------------ >>> >>> Subject: Digest Footer >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> ------------------------------ >>> >>> End of NumPy-Discussion Digest, Vol 124, Issue 24 >>> ************************************************* >>> >> >> >> >> -- >> ------------------------------------------------------------ >> -------------- >> NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may >> contain confidential information and are intended for the sole use of the >> recipient(s) named above. If you are not the intended recipient of this >> message you are hereby notified that any dissemination or copying of this >> message is strictly prohibited. If you have received this e-mail in error, >> please notify the sender either by telephone or by e-mail and delete the >> material from any computer. Thank you. >> ------------------------------------------------------------ >> -------------- >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jan 17 19:14:14 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 17 Jan 2017 16:14:14 -0800 Subject: [Numpy-discussion] [SciPy-Dev] NumPy 1.12.0 release In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker wrote: > Matthew Brett wrote: > >> Hi, >> >> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker wrote: >>> Charles R Harris wrote: >>> >>>> Hi All, >>>> >>>> I'm pleased to announce the NumPy 1.12.0 release. This release supports >>>> Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be >>>> downloaded from PiPY >>>> , the >>>> tarball and zip files may be downloaded from Github >>>> . The release notes >>>> and files hashes may also be found at Github >>>> . >>>> >>>> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >>>> contributors and comprises a large number of fixes and improvements. >>>> Among >>>> the many improvements it is difficult to pick out just a few as >>>> standing above the others, but the following may be of particular >>>> interest or indicate areas likely to have future consequences. >>>> >>>> * Order of operations in ``np.einsum`` can now be optimized for large >>>> speed improvements. >>>> * New ``signature`` argument to ``np.vectorize`` for vectorizing with >>>> core dimensions. 
>>>> * The ``keepdims`` argument was added to many functions. >>>> * New context manager for testing warnings >>>> * Support for BLIS in numpy.distutils >>>> * Much improved support for PyPy (not yet finished) >>>> >>>> Enjoy, >>>> >>>> Chuck >>> >>> I've installed via pip3 on linux x86_64, which gives me a wheel. My >>> question is, am I loosing significant performance choosing this pre-built >>> binary vs. compiling myself? For example, my processor might have some >>> more features than the base version used to build wheels. >> >> I guess you are thinking about using this built wheel on some other >> machine? You'd have to be lucky for that to work; the wheel depends >> on the symbols it found at build time, which may not exist in the same >> places on your other machine. >> >> If it does work, the speed will primarily depend on your BLAS library. >> >> The pypi wheels should be pretty fast; they are built with OpenBLAS, >> which is at or near top of range for speed, across a range of >> platforms. >> >> Cheers, >> >> Matthew > > I installed using pip3 install, and it installed a wheel package. I did not > build it - aren't wheels already compiled packages? So isn't it built for > the common denominator architecture, not necessarily as fast as one I built > myself on my own machine? My question is, on x86_64, is this potential > difference large enough to bother with not using precompiled wheel packages? Ah - my guess is that you'd be hard pressed to make a numpy that is as fast as the precompiled wheel. The OpenBLAS library included in numpy selects the routines for your CPU at run-time, so they will generally be fast on your CPU. You might be able to get equivalent or even better performance with a ATLAS BLAS library recompiled on your exact machine, but that's quite a serious investment of time to get working, and you'd have to benchmark to find if you were really doing any better. Cheers, Matthew From njs at pobox.com Tue Jan 17 19:20:12 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Jan 2017 16:20:12 -0800 Subject: [Numpy-discussion] [SciPy-Dev] NumPy 1.12.0 release In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker wrote: > Matthew Brett wrote: > >> Hi, >> >> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker wrote: >>> Charles R Harris wrote: >>> >>>> Hi All, >>>> >>>> I'm pleased to announce the NumPy 1.12.0 release. This release supports >>>> Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be >>>> downloaded from PiPY >>>> , the >>>> tarball and zip files may be downloaded from Github >>>> . The release notes >>>> and files hashes may also be found at Github >>>> . >>>> >>>> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >>>> contributors and comprises a large number of fixes and improvements. >>>> Among >>>> the many improvements it is difficult to pick out just a few as >>>> standing above the others, but the following may be of particular >>>> interest or indicate areas likely to have future consequences. >>>> >>>> * Order of operations in ``np.einsum`` can now be optimized for large >>>> speed improvements. >>>> * New ``signature`` argument to ``np.vectorize`` for vectorizing with >>>> core dimensions. >>>> * The ``keepdims`` argument was added to many functions. 
>>>> * New context manager for testing warnings >>>> * Support for BLIS in numpy.distutils >>>> * Much improved support for PyPy (not yet finished) >>>> >>>> Enjoy, >>>> >>>> Chuck >>> >>> I've installed via pip3 on linux x86_64, which gives me a wheel. My >>> question is, am I loosing significant performance choosing this pre-built >>> binary vs. compiling myself? For example, my processor might have some >>> more features than the base version used to build wheels. >> >> I guess you are thinking about using this built wheel on some other >> machine? You'd have to be lucky for that to work; the wheel depends >> on the symbols it found at build time, which may not exist in the same >> places on your other machine. >> >> If it does work, the speed will primarily depend on your BLAS library. >> >> The pypi wheels should be pretty fast; they are built with OpenBLAS, >> which is at or near top of range for speed, across a range of >> platforms. >> >> Cheers, >> >> Matthew > > I installed using pip3 install, and it installed a wheel package. I did not > build it - aren't wheels already compiled packages? So isn't it built for > the common denominator architecture, not necessarily as fast as one I built > myself on my own machine? My question is, on x86_64, is this potential > difference large enough to bother with not using precompiled wheel packages? Ultimately, it's going to depend on all sorts of things, including most importantly your actual code. Like most speed questions, the only real way to know is to try it and measure the difference. The wheels do ship with a fast BLAS (OpenBLAS configured to automatically adapt to your CPU at runtime), so the performance will at least be reasonable. Possible improvements would include using a different and somehow better BLAS (MKL might be faster in some cases), tweaking your compiler options to take advantage of whatever SIMD ISAs your particular CPU supports (numpy's build system doesn't do this automatically but in principle you could do it by hand -- were you bothering before? does it even make a difference in practice? I dunno), and using a new compiler (the linux wheels use a somewhat ancient version of gcc for Reasons; newer compilers are better at optimizing -- how much does it matter? again I dunno). Basically: if you want to experiment and report back then I think we'd all be interested to hear; OTOH if you aren't feeling particularly curious/ambitious then I wouldn't worry about it :-). -n -- Nathaniel J. Smith -- https://vorpus.org From josef.pktd at gmail.com Tue Jan 17 19:51:25 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Jan 2017 19:51:25 -0500 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 6:58 PM, alebarde at gmail.com wrote: > > > 2017-01-17 22:13 GMT+01:00 Nadav Har'El : > >> >> On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com >> wrote: >> >>> Hi Nadav, >>> >>> I may be wrong, but I think that the result of the current >>> implementation is actually the expected one. >>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and >>> 0.4 >>> >>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >>> >> >> Yes, this formula does fit well with the actual algorithm in the code. 
>> But, my question is *why* we want this formula to be correct: >> >> Just a note: this formula is correct and it is one of statistics > fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability + > https://en.wikipedia.org/wiki/Bayes%27_theorem > Thus, the result we get from random.choice IMHO definitely makes sense. Of > course, I think we could always discuss about implementing other sampling > methods if they are useful to some application. > > >> >>> Now, P([1]) = 0.2 and P([2]) = 0.4. However: >>> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) >>> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, >>> once normalised, translate into 1/3 and 2/3 respectively) >>> Therefore P([1,2]) = 0.7/3 = 0.23333 >>> Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 >>> >> >> Right, these are the numbers that the algorithm in the current code, and >> the formula above, produce: >> >> P([1,2]) = P([1,3]) = 0.23333 >> P([2,3]) = 0.53333 >> >> What I'm puzzled about is that these probabilities do not really fullfill >> the given probability vector 0.2, 0.4, 0.4... >> Let me try to explain explain: >> >> Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three >> items in the first place? >> >> One reasonable interpretation is that the user wants in his random picks >> to see item 1 half the time of item 2 or 3. >> For example, maybe item 1 costs twice as much as item 2 or 3, so picking >> it half as often will result in an equal expenditure on each item. >> >> If the user randomly picks the items individually (a single item at a >> time), he indeed gets exactly this distribution: 0.2 of the time item 1, >> 0.4 of the time item 2, 0.4 of the time item 3. >> >> Now, what happens if he picks not individual items, but pairs of >> different items using numpy.random.choice with two items, replace=false? >> Suddenly, the distribution of the individual items in the results get >> skewed: If we look at the expected number of times we'll see each item in >> one draw of a random pair, we will get: >> >> E(1) = P([1,2]) + P([1,3]) = 0.46666 >> E(2) = P([1,2]) + P([2,3]) = 0.76666 >> E(3) = P([1,3]) + P([2,3]) = 0.76666 >> >> Or renormalizing by dividing by 2: >> >> P(1) = 0.233333 >> P(2) = 0.383333 >> P(3) = 0.383333 >> >> As you can see this is not quite the probabilities we wanted (which were >> 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more >> often than we wanted, and item 2 and 3 were used a bit less often! >> > > p is not the probability of the output but the one of the source finite > population. I think that if you want to preserve that distribution, as > Josef pointed out, you have to make extractions independent, that is either > sample with replacement or approximate an infinite population (that is > basically the same thing). But of course in this case you will also end up > with events [X,X]. > With replacement and keeping duplicates the results might also be similar in the pattern of the marginal probabilities https://onlinecourses.science.psu.edu/stat506/node/17 Another approach in survey sampling is also to drop duplicates in with replacement sampling, but then the sample size itself is random. (again I didn't try to understand the small print) (another related aside: The problem with discrete sample space in small samples shows up also in calculating hypothesis tests, e.g. fisher's exact or similar. 
Because, we only get a few discrete possibilities in the sample space, it is not possible to construct a test that has exactly the desired type 1 error.) Josef > > >> So that brought my question of why we consider these numbers right. >> >> In this example, it's actually possible to get the right item >> distribution, if we pick the pair outcomes with the following probabilties: >> >> P([1,2]) = 0.2 (not 0.233333 as above) >> P([1,3]) = 0.2 >> P([2,3]) = 0.6 (not 0.533333 as above) >> >> Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 >> >> Interestingly, fixing things like I suggest is not always possible. >> Consider a different probability-vector example for three items - 0.99, >> 0.005, 0.005. Now, no matter which algorithm we use for randomly picking >> pairs from these three items, *each* returned pair will inevitably contain >> one of the two very-low-probability items, so each of those items will >> appear in roughly half the pairs, instead of in a vanishingly small >> percentage as we hoped. >> >> But in other choices of probabilities (like the one in my original >> example), there is a solution. For 2-out-of-3 sampling we can actually show >> a system of three linear equations in three variables, so there is always >> one solution but if this solution has components not valid as probabilities >> (not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005, >> 0.005 example. >> >> >> >>> What am I missing? >>> >>> Alessandro >>> >>> >>> 2017-01-17 13:00 GMT+01:00 : >>> >>>> Hi, I'm looking for a way to find a random sample of C different items >>>> out >>>> of N items, with a some desired probabilty Pi for each item i. >>>> >>>> I saw that numpy has a function that supposedly does this, >>>> numpy.random.choice (with replace=False and a probabilities array), but >>>> looking at the algorithm actually implemented, I am wondering in what >>>> sense >>>> are the probabilities Pi actually obeyed... >>>> >>>> To me, the code doesn't seem to be doing the right thing... Let me >>>> explain: >>>> >>>> Consider a simple numerical example: We have 3 items, and need to pick 2 >>>> different ones randomly. Let's assume the desired probabilities for >>>> item 1, >>>> 2 and 3 are: 0.2, 0.4 and 0.4. >>>> >>>> Working out the equations there is exactly one solution here: The random >>>> outcome of numpy.random.choice in this case should be [1,2] at >>>> probability >>>> 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is >>>> indeed >>>> a solution for the desired probabilities because it yields item 1 in >>>> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = >>>> 0.2+0.6 = 0.8 = 2*P2, etc. >>>> >>>> However, the algorithm in numpy.random.choice's replace=False >>>> generates, if >>>> I understand correctly, different probabilities for the outcomes: I >>>> believe >>>> in this case it generates [1,2] at probability 0.23333, [1,3] also >>>> 0.2333, >>>> and [2,3] at probability 0.53333. >>>> >>>> My question is how does this result fit the desired probabilities? >>>> >>>> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, >>>> then the expect number of "1" results we'll get per drawing is 0.23333 + >>>> 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and >>>> for >>>> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT >>>> twice >>>> common than item 1 as we originally desired (we asked for probabilities >>>> 0.2, 0.4, 0.4 for the individual items!). 
From Jerome.Kieffer at esrf.fr  Wed Jan 18 02:15:06 2017
From: Jerome.Kieffer at esrf.fr (Jerome Kieffer)
Date: Wed, 18 Jan 2017 08:15:06 +0100
Subject: [Numpy-discussion] NumPy 1.12.0 release
In-Reply-To:
References:
Message-ID: <20170118081506.4ccd1cee@lintaillefer.esrf.fr>

On Tue, 17 Jan 2017 08:56:42 -0500
Neal Becker wrote:

> I've installed via pip3 on linux x86_64, which gives me a wheel. My
> question is, am I loosing significant performance choosing this pre-built
> binary vs. compiling myself? For example, my processor might have some more
> features than the base version used to build wheels.

Hi,

I have done some benchmarking (%timeit) for my code running in a
jupyter-notebook within a venv installed with pip+manylinux wheels
versus ipython and debian packages (on the same computer).
I noticed the debian installation was ~20% faster.
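The check itself was nothing fancy; a stand-alone sketch of the same kind
of measurement is below (the two toy kernels are only illustrative
stand-ins -- the real timings were of my own application code, as said
above):

import numpy as np
import timeit

a = np.random.random((1000, 1000))

# run the identical script under both installations and compare the numbers
print("matmul :", timeit.timeit(lambda: a @ a, number=20))
print("ufuncs :", timeit.timeit(lambda: np.exp(a).sum(), number=200))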
I did not investigate further if those 20% came from the manylinux (I suspect) or from the notebook infrastructure. HTH, -- J?r?me Kieffer From nathan12343 at gmail.com Wed Jan 18 02:27:28 2017 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 18 Jan 2017 07:27:28 +0000 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> References: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> Message-ID: I've seen reports on the anaconda mailing list of people seeing similar speed ups when they compile e.g. Numpy with a recent gcc. Anaconda has the same issue as manylinux in that they need to use versions of GCC available on CentOS 5. Given the upcoming official EOL for CentOS5, it might make sense to think about making a pep for a CentOS 6-based manylinux2 docker image, which will allow compiling with a newer GCC. On Tue, Jan 17, 2017 at 9:15 PM Jerome Kieffer wrote: > On Tue, 17 Jan 2017 08:56:42 -0500 > > Neal Becker wrote: > > > > > I've installed via pip3 on linux x86_64, which gives me a wheel. My > > > question is, am I loosing significant performance choosing this pre-built > > > binary vs. compiling myself? For example, my processor might have some > more > > > features than the base version used to build wheels. > > > > Hi, > > > > I have done some benchmarking (%timeit) for my code running in a > > jupyter-notebook within a venv installed with pip+manylinux wheels > > versus ipython and debian packages (on the same computer). > > I noticed the debian installation was ~20% faster. > > > > I did not investigate further if those 20% came from the manylinux (I > > suspect) or from the notebook infrastructure. > > > > HTH, > > -- > > J?r?me Kieffer > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Jan 18 03:28:43 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 18 Jan 2017 21:28:43 +1300 Subject: [Numpy-discussion] GSoC 2017: NumFocus will be an umbrella organization In-Reply-To: References: Message-ID: Hi Max, On Tue, Jan 17, 2017 at 2:38 AM, Max Linke wrote: > Hi > > Organizations can start submitting applications for Google Summer of Code > 2017 on January 19 (and the deadline is February 9) > > https://developers.google.com/open-source/gsoc/timeline?hl=en Thanks for bringing this up, and for organizing the NumFOCUS participation! > NumFOCUS will be applying again this year. If you want to work with us > please let me know and if you apply as an organization yourself or under a > different umbrella organization please tell me as well. I suspect we won't participate at all, but if we do then it's likely under the PSF umbrella as we have done previously. @all: in practice working on NumPy is just far too hard for most GSoC students. Previous years we've registered and generated ideas, but not gotten any students. We're also short on maintainer capacity. So I propose to not participate this year. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nyh at scylladb.com Wed Jan 18 03:35:30 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Wed, 18 Jan 2017 10:35:30 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 1:58 AM, alebarde at gmail.com wrote: > > > 2017-01-17 22:13 GMT+01:00 Nadav Har'El : > >> >> On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com >> wrote: >> >>> Hi Nadav, >>> >>> I may be wrong, but I think that the result of the current >>> implementation is actually the expected one. >>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and >>> 0.4 >>> >>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >>> >> >> Yes, this formula does fit well with the actual algorithm in the code. >> But, my question is *why* we want this formula to be correct: >> >> Just a note: this formula is correct and it is one of statistics > fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability + > https://en.wikipedia.org/wiki/Bayes%27_theorem > Hi, Yes, of course the formula is correct, but it doesn't mean we're not applying it in the wrong context. I'll be honest here: I came to numpy.random.choice after I actually coded a similar algorithm (with the same results) myself, because like you I thought this was the "obvious" and correct algorithm. Only then I realized that its output doesn't actually produce the desired probabilities specified by the user - even in the cases where that is possible. And I started wondering if existing libraries - like numpy - do this differently. And it turns out, numpy does it (basically) in the same way as my algorithm. > > Thus, the result we get from random.choice IMHO definitely makes sense. > Let's look at what the user asked this function, and what it returns: User asks: please give me random pairs of the three items, where item 1 has probability 0.2, item 2 has 0.4, and 3 has 0.4. Function returns: random pairs, where if you make many random returned results (as in the law of large numbers) and look at the items they contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is 0.38333. These are not (quite) the probabilities the user asked for... Can you explain a sense where the user's requested probabilities (0.2, 0.4, 0.4) are actually adhered in the results which random.choice returns? Thanks, Nadav Har'El. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Wed Jan 18 04:00:50 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Wed, 18 Jan 2017 10:00:50 +0100 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: 2017-01-18 9:35 GMT+01:00 Nadav Har'El : > > On Wed, Jan 18, 2017 at 1:58 AM, alebarde at gmail.com > wrote: > >> >> >> 2017-01-17 22:13 GMT+01:00 Nadav Har'El : >> >>> >>> On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com >>> wrote: >>> >>>> Hi Nadav, >>>> >>>> I may be wrong, but I think that the result of the current >>>> implementation is actually the expected one. >>>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and >>>> 0.4 >>>> >>>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >>>> >>> >>> Yes, this formula does fit well with the actual algorithm in the code. 
>>> But, my question is *why* we want this formula to be correct: >>> >>> Just a note: this formula is correct and it is one of statistics >> fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability >> + https://en.wikipedia.org/wiki/Bayes%27_theorem >> > > Hi, > > Yes, of course the formula is correct, but it doesn't mean we're not > applying it in the wrong context. > > I'll be honest here: I came to numpy.random.choice after I actually coded > a similar algorithm (with the same results) myself, because like you I > thought this was the "obvious" and correct algorithm. Only then I realized > that its output doesn't actually produce the desired probabilities > specified by the user - even in the cases where that is possible. And I > started wondering if existing libraries - like numpy - do this differently. > And it turns out, numpy does it (basically) in the same way as my algorithm. > > >> >> Thus, the result we get from random.choice IMHO definitely makes sense. >> > > Let's look at what the user asked this function, and what it returns: > > User asks: please give me random pairs of the three items, where item 1 > has probability 0.2, item 2 has 0.4, and 3 has 0.4. > > Function returns: random pairs, where if you make many random returned > results (as in the law of large numbers) and look at the items they > contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is > 0.38333. > These are not (quite) the probabilities the user asked for... > > Can you explain a sense where the user's requested probabilities (0.2, > 0.4, 0.4) are actually adhered in the results which random.choice returns? > I think that the question the user is asking by specifying p is a slightly different one: "please give me random pairs of the three items extracted from a population of 3 items where item 1 has probability of being extracted of 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extract items once extracted." > Thanks, > Nadav Har'El. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From nyh at scylladb.com Wed Jan 18 04:52:45 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Wed, 18 Jan 2017 11:52:45 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 11:00 AM, alebarde at gmail.com wrote: > Let's look at what the user asked this function, and what it returns: > >> >> User asks: please give me random pairs of the three items, where item 1 >> has probability 0.2, item 2 has 0.4, and 3 has 0.4. 
>> >> Function returns: random pairs, where if you make many random returned >> results (as in the law of large numbers) and look at the items they >> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is >> 0.38333. >> These are not (quite) the probabilities the user asked for... >> >> Can you explain a sense where the user's requested probabilities (0.2, >> 0.4, 0.4) are actually adhered in the results which random.choice returns? >> > > I think that the question the user is asking by specifying p is a slightly > different one: > "please give me random pairs of the three items extracted from a > population of 3 items where item 1 has probability of being extracted of > 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extract items once > extracted." > You are right, if that is what the user wants, numpy.random.choice does the right thing. I'm just wondering whether this is actually what users want, and whether they understand this is what they are getting. As I said, I expected it to generate pairs with, empirically, the desired distribution of individual items. The documentation of numpy.random.choice seemed to me (wrongly) that it implis that that's what it does. So I was surprised to realize that it does not. Nadav. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Wed Jan 18 06:43:25 2017 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 18 Jan 2017 12:43:25 +0100 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: References: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> Message-ID: <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> The version of gcc used will make a large difference in some places. E.g. the AVX2 integer ufuncs require something around 4.5 to work and in general the optimization level of gcc has improved greatly since the clang competition showed up around that time. centos 5 has 4.1 which is really ancient. I though the wheels used newer gccs also on centos 5? On 18.01.2017 08:27, Nathan Goldbaum wrote: > I've seen reports on the anaconda mailing list of people seeing similar > speed ups when they compile e.g. Numpy with a recent gcc. Anaconda has > the same issue as manylinux in that they need to use versions of GCC > available on CentOS 5. > > Given the upcoming official EOL for CentOS5, it might make sense to > think about making a pep for a CentOS 6-based manylinux2 docker image, > which will allow compiling with a newer GCC. > > On Tue, Jan 17, 2017 at 9:15 PM Jerome Kieffer > wrote: > > On Tue, 17 Jan 2017 08:56:42 -0500 > > Neal Becker > wrote: > > > > > I've installed via pip3 on linux x86_64, which gives me a wheel. My > > > question is, am I loosing significant performance choosing this > pre-built > > > binary vs. compiling myself? For example, my processor might have > some more > > > features than the base version used to build wheels. > > > > Hi, > > > > I have done some benchmarking (%timeit) for my code running in a > > jupyter-notebook within a venv installed with pip+manylinux wheels > > versus ipython and debian packages (on the same computer). > > I noticed the debian installation was ~20% faster. > > > > I did not investigate further if those 20% came from the manylinux (I > > suspect) or from the notebook infrastructure. 
> > > > HTH, > > -- > > J?r?me Kieffer > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > From ndbecker2 at gmail.com Wed Jan 18 07:00:18 2017 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 18 Jan 2017 07:00:18 -0500 Subject: [Numpy-discussion] [SciPy-Dev] NumPy 1.12.0 release References: Message-ID: Nathaniel Smith wrote: > On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker wrote: >> Matthew Brett wrote: >> >>> Hi, >>> >>> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker >>> wrote: >>>> Charles R Harris wrote: >>>> >>>>> Hi All, >>>>> >>>>> I'm pleased to announce the NumPy 1.12.0 release. This release >>>>> supports Python 2.7 and 3.4-3.6. Wheels for all supported Python >>>>> versions may be downloaded from PiPY >>>>> , the >>>>> tarball and zip files may be downloaded from Github >>>>> . The release >>>>> notes and files hashes may also be found at Github >>>>> . >>>>> >>>>> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >>>>> contributors and comprises a large number of fixes and improvements. >>>>> Among >>>>> the many improvements it is difficult to pick out just a few as >>>>> standing above the others, but the following may be of particular >>>>> interest or indicate areas likely to have future consequences. >>>>> >>>>> * Order of operations in ``np.einsum`` can now be optimized for large >>>>> speed improvements. >>>>> * New ``signature`` argument to ``np.vectorize`` for vectorizing with >>>>> core dimensions. >>>>> * The ``keepdims`` argument was added to many functions. >>>>> * New context manager for testing warnings >>>>> * Support for BLIS in numpy.distutils >>>>> * Much improved support for PyPy (not yet finished) >>>>> >>>>> Enjoy, >>>>> >>>>> Chuck >>>> >>>> I've installed via pip3 on linux x86_64, which gives me a wheel. My >>>> question is, am I loosing significant performance choosing this >>>> pre-built >>>> binary vs. compiling myself? For example, my processor might have some >>>> more features than the base version used to build wheels. >>> >>> I guess you are thinking about using this built wheel on some other >>> machine? You'd have to be lucky for that to work; the wheel depends >>> on the symbols it found at build time, which may not exist in the same >>> places on your other machine. >>> >>> If it does work, the speed will primarily depend on your BLAS library. >>> >>> The pypi wheels should be pretty fast; they are built with OpenBLAS, >>> which is at or near top of range for speed, across a range of >>> platforms. >>> >>> Cheers, >>> >>> Matthew >> >> I installed using pip3 install, and it installed a wheel package. I did >> not >> build it - aren't wheels already compiled packages? So isn't it built >> for the common denominator architecture, not necessarily as fast as one I >> built >> myself on my own machine? My question is, on x86_64, is this potential >> difference large enough to bother with not using precompiled wheel >> packages? > > Ultimately, it's going to depend on all sorts of things, including > most importantly your actual code. Like most speed questions, the only > real way to know is to try it and measure the difference. 
> > The wheels do ship with a fast BLAS (OpenBLAS configured to > automatically adapt to your CPU at runtime), so the performance will > at least be reasonable. Possible improvements would include using a > different and somehow better BLAS (MKL might be faster in some cases), > tweaking your compiler options to take advantage of whatever SIMD ISAs > your particular CPU supports (numpy's build system doesn't do this > automatically but in principle you could do it by hand -- were you > bothering before? does it even make a difference in practice? I > dunno), and using a new compiler (the linux wheels use a somewhat > ancient version of gcc for Reasons; newer compilers are better at > optimizing -- how much does it matter? again I dunno). > > Basically: if you want to experiment and report back then I think we'd > all be interested to hear; OTOH if you aren't feeling particularly > curious/ambitious then I wouldn't worry about it :-). > > -n > Yes, I always add -march=native, which should pickup whatever SIMD is available. So my question was primarily if I should bother. Thanks for the detailed answer. From ndbecker2 at gmail.com Wed Jan 18 07:02:01 2017 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 18 Jan 2017 07:02:01 -0500 Subject: [Numpy-discussion] NumPy 1.12.0 release References: Message-ID: Matthew Brett wrote: > On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker wrote: >> Matthew Brett wrote: >> >>> Hi, >>> >>> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker >>> wrote: >>>> Charles R Harris wrote: >>>> >>>>> Hi All, >>>>> >>>>> I'm pleased to announce the NumPy 1.12.0 release. This release >>>>> supports Python 2.7 and 3.4-3.6. Wheels for all supported Python >>>>> versions may be downloaded from PiPY >>>>> , the >>>>> tarball and zip files may be downloaded from Github >>>>> . The release >>>>> notes and files hashes may also be found at Github >>>>> . >>>>> >>>>> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >>>>> contributors and comprises a large number of fixes and improvements. >>>>> Among >>>>> the many improvements it is difficult to pick out just a few as >>>>> standing above the others, but the following may be of particular >>>>> interest or indicate areas likely to have future consequences. >>>>> >>>>> * Order of operations in ``np.einsum`` can now be optimized for large >>>>> speed improvements. >>>>> * New ``signature`` argument to ``np.vectorize`` for vectorizing with >>>>> core dimensions. >>>>> * The ``keepdims`` argument was added to many functions. >>>>> * New context manager for testing warnings >>>>> * Support for BLIS in numpy.distutils >>>>> * Much improved support for PyPy (not yet finished) >>>>> >>>>> Enjoy, >>>>> >>>>> Chuck >>>> >>>> I've installed via pip3 on linux x86_64, which gives me a wheel. My >>>> question is, am I loosing significant performance choosing this >>>> pre-built >>>> binary vs. compiling myself? For example, my processor might have some >>>> more features than the base version used to build wheels. >>> >>> I guess you are thinking about using this built wheel on some other >>> machine? You'd have to be lucky for that to work; the wheel depends >>> on the symbols it found at build time, which may not exist in the same >>> places on your other machine. >>> >>> If it does work, the speed will primarily depend on your BLAS library. >>> >>> The pypi wheels should be pretty fast; they are built with OpenBLAS, >>> which is at or near top of range for speed, across a range of >>> platforms. 
>>> >>> Cheers, >>> >>> Matthew >> >> I installed using pip3 install, and it installed a wheel package. I did >> not >> build it - aren't wheels already compiled packages? So isn't it built >> for the common denominator architecture, not necessarily as fast as one I >> built >> myself on my own machine? My question is, on x86_64, is this potential >> difference large enough to bother with not using precompiled wheel >> packages? > > Ah - my guess is that you'd be hard pressed to make a numpy that is as > fast as the precompiled wheel. The OpenBLAS library included in > numpy selects the routines for your CPU at run-time, so they will > generally be fast on your CPU. You might be able to get equivalent > or even better performance with a ATLAS BLAS library recompiled on > your exact machine, but that's quite a serious investment of time to > get working, and you'd have to benchmark to find if you were really > doing any better. > > Cheers, > > Matthew OK, so at least for BLAS things should be pretty well optimized. From cournape at gmail.com Wed Jan 18 07:15:16 2017 From: cournape at gmail.com (David Cournapeau) Date: Wed, 18 Jan 2017 12:15:16 +0000 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> References: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> Message-ID: On Wed, Jan 18, 2017 at 11:43 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > The version of gcc used will make a large difference in some places. > E.g. the AVX2 integer ufuncs require something around 4.5 to work and in > general the optimization level of gcc has improved greatly since the > clang competition showed up around that time. centos 5 has 4.1 which is > really ancient. > I though the wheels used newer gccs also on centos 5? > I don't know if it is mandatory for many wheels, but it is possilbe to build w/ gcc 4.8 at least, and still binary compatibility with centos 5.X and above, though I am not sure about the impact on speed. It has been quite some time already that building numpy/scipy with gcc 4.1 causes troubles with errors and even crashes anyway, so you definitely want to use a more recent compiler in any case. David > On 18.01.2017 08:27, Nathan Goldbaum wrote: > > I've seen reports on the anaconda mailing list of people seeing similar > > speed ups when they compile e.g. Numpy with a recent gcc. Anaconda has > > the same issue as manylinux in that they need to use versions of GCC > > available on CentOS 5. > > > > Given the upcoming official EOL for CentOS5, it might make sense to > > think about making a pep for a CentOS 6-based manylinux2 docker image, > > which will allow compiling with a newer GCC. > > > > On Tue, Jan 17, 2017 at 9:15 PM Jerome Kieffer > > wrote: > > > > On Tue, 17 Jan 2017 08:56:42 -0500 > > > > Neal Becker > > wrote: > > > > > > > > > I've installed via pip3 on linux x86_64, which gives me a wheel. > My > > > > > question is, am I loosing significant performance choosing this > > pre-built > > > > > binary vs. compiling myself? For example, my processor might have > > some more > > > > > features than the base version used to build wheels. > > > > > > > > Hi, > > > > > > > > I have done some benchmarking (%timeit) for my code running in a > > > > jupyter-notebook within a venv installed with pip+manylinux wheels > > > > versus ipython and debian packages (on the same computer). > > > > I noticed the debian installation was ~20% faster. 
> > > > > > > > I did not investigate further if those 20% came from the manylinux (I > > > > suspect) or from the notebook infrastructure. > > > > > > > > HTH, > > > > -- > > > > J?r?me Kieffer > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at scipy.org > > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jan 18 07:59:18 2017 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 18 Jan 2017 04:59:18 -0800 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> References: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> Message-ID: On Wed, Jan 18, 2017 at 3:43 AM, Julian Taylor wrote: > The version of gcc used will make a large difference in some places. > E.g. the AVX2 integer ufuncs require something around 4.5 to work and in > general the optimization level of gcc has improved greatly since the > clang competition showed up around that time. centos 5 has 4.1 which is > really ancient. > I though the wheels used newer gccs also on centos 5? The wheels are built with gcc 4.8, which is the last version that you can get to build for centos 5. When we bump to centos 6 as the minimum supported, we'll be able to switch to gcc 5.3.1. -n -- Nathaniel J. Smith -- https://vorpus.org From josef.pktd at gmail.com Wed Jan 18 08:53:24 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Jan 2017 08:53:24 -0500 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El wrote: > > On Wed, Jan 18, 2017 at 11:00 AM, alebarde at gmail.com > wrote: > >> Let's look at what the user asked this function, and what it returns: >> >>> >>> User asks: please give me random pairs of the three items, where item 1 >>> has probability 0.2, item 2 has 0.4, and 3 has 0.4. >>> >>> Function returns: random pairs, where if you make many random returned >>> results (as in the law of large numbers) and look at the items they >>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is >>> 0.38333. >>> These are not (quite) the probabilities the user asked for... >>> >>> Can you explain a sense where the user's requested probabilities (0.2, >>> 0.4, 0.4) are actually adhered in the results which random.choice returns? >>> >> >> I think that the question the user is asking by specifying p is a >> slightly different one: >> "please give me random pairs of the three items extracted from a >> population of 3 items where item 1 has probability of being extracted of >> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extract items once >> extracted." >> > > You are right, if that is what the user wants, numpy.random.choice does > the right thing. > > I'm just wondering whether this is actually what users want, and whether > they understand this is what they are getting. 
> > As I said, I expected it to generate pairs with, empirically, the desired > distribution of individual items. The documentation of numpy.random.choice > seemed to me (wrongly) that it implis that that's what it does. So I was > surprised to realize that it does not. > As Alessandro and you showed, the function returns something that makes sense. If the user wants something different, then they need to look for a different function, which is however difficult if it doesn't have a solution in general. Sounds to me a bit like a Monty Hall problem. Whether we like it or not, or find it counter intuitive, it is what it is given the sampling scheme. Having more sampling schemes would be useful, but it's not possible to implement sampling schemes with impossible properties Josef > > Nadav. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jan 18 09:30:48 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Jan 2017 09:30:48 -0500 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 8:53 AM, wrote: > > > On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El wrote: > >> >> On Wed, Jan 18, 2017 at 11:00 AM, alebarde at gmail.com >> wrote: >> >>> Let's look at what the user asked this function, and what it returns: >>> >>>> >>>> User asks: please give me random pairs of the three items, where item 1 >>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4. >>>> >>>> Function returns: random pairs, where if you make many random returned >>>> results (as in the law of large numbers) and look at the items they >>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is >>>> 0.38333. >>>> These are not (quite) the probabilities the user asked for... >>>> >>>> Can you explain a sense where the user's requested probabilities (0.2, >>>> 0.4, 0.4) are actually adhered in the results which random.choice returns? >>>> >>> >>> I think that the question the user is asking by specifying p is a >>> slightly different one: >>> "please give me random pairs of the three items extracted from a >>> population of 3 items where item 1 has probability of being extracted of >>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extract items once >>> extracted." >>> >> >> You are right, if that is what the user wants, numpy.random.choice does >> the right thing. >> >> I'm just wondering whether this is actually what users want, and whether >> they understand this is what they are getting. >> >> As I said, I expected it to generate pairs with, empirically, the desired >> distribution of individual items. The documentation of numpy.random.choice >> seemed to me (wrongly) that it implis that that's what it does. So I was >> surprised to realize that it does not. >> > > As Alessandro and you showed, the function returns something that makes > sense. If the user wants something different, then they need to look for a > different function, which is however difficult if it doesn't have a > solution in general. > > Sounds to me a bit like a Monty Hall problem. Whether we like it or not, > or find it counter intuitive, it is what it is given the sampling scheme. 
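For the 3-item, k=2 example above there is a sampling scheme that does reproduce the requested per-item frequencies: put probabilities on the pairs themselves and solve for them so that each item's inclusion probability equals k*p[i]. A hedged sketch of that idea (this is not what numpy.random.choice implements, and for other n, k and p the system can have no non-negative solution):

import numpy as np
from itertools import combinations

p = np.array([0.2, 0.4, 0.4])               # requested per-item frequencies
n, k = 3, 2
pairs = list(combinations(range(n), k))     # [(0, 1), (0, 2), (1, 2)]

# A[i, j] = 1 if item i belongs to pair j; solve A q = k * p together with sum(q) = 1
A = np.array([[i in pair for pair in pairs] for i in range(n)], dtype=float)
rhs = np.append(k * p, 1.0)
q = np.linalg.lstsq(np.vstack([A, np.ones(len(pairs))]), rhs)[0]
q = np.clip(q, 0, None)                     # guard against floating-point round-off
q /= q.sum()
print(dict(zip(pairs, np.round(q, 3))))     # {(0, 1): 0.2, (0, 2): 0.2, (1, 2): 0.6}

# drawing whole pairs with these probabilities yields items 0, 1, 2
# at the requested 0.2 / 0.4 / 0.4 rates
rng = np.random.RandomState(0)
print(pairs[rng.choice(len(pairs), p=q)])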
> > Having more sampling schemes would be useful, but it's not possible to > implement sampling schemes with impossible properties. > BTW: sampling 3 out of 3 without replacement is even worse No matter what sampling scheme and what selection probabilities we use, we always have every element with probability 1 in the sample. (Which in survey statistics implies that the sampling error or standard deviation of any estimate of a population mean or total is zero. Which I found weird. How can you do statistics and get an estimate that doesn't have any uncertainty associated with it?) Josef > > Josef > > > >> >> Nadav. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nyh at scylladb.com Wed Jan 18 10:12:57 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Wed, 18 Jan 2017 17:12:57 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 4:30 PM, wrote: > > > Having more sampling schemes would be useful, but it's not possible to >> implement sampling schemes with impossible properties. >> >> > > BTW: sampling 3 out of 3 without replacement is even worse > > No matter what sampling scheme and what selection probabilities we use, we > always have every element with probability 1 in the sample. > I agree. The random-sample function of the type I envisioned will be able to reproduce the desired probabilities in some cases (like the example I gave) but not in others. Because doing this correctly involves a set of n linear equations in comb(n,k) variables, it can have no solution, or many solutions, depending on the n and k, and the desired probabilities. A function of this sort could return an error if it can't achieve the desired probabilities. But in many cases (the 0.2, 0.4, 0.4 example I gave was just something random I tried) there will be a way to achieve exactly the desired distribution. I guess I'll need to write this new function myself :-) Because my use case definitely requires that the output of the random items produced matches the required probabilities (when possible). Thanks, Nadav. -------------- next part -------------- An HTML attachment was scrubbed... URL: From max_linke at gmx.de Wed Jan 18 10:18:24 2017 From: max_linke at gmx.de (Max Linke) Date: Wed, 18 Jan 2017 16:18:24 +0100 Subject: [Numpy-discussion] GSoC 2017: NumFocus will be an umbrella organization In-Reply-To: References: Message-ID: <1b0dc33a-d608-81b0-7211-71ee3fd5e37a@gmx.de> On 01/18/2017 09:28 AM, Ralf Gommers wrote: > Hi Max, > > On Tue, Jan 17, 2017 at 2:38 AM, Max Linke > wrote: > > Hi > > Organizations can start submitting applications for Google Summer of > Code 2017 on January 19 (and the deadline is February 9) > > https://developers.google.com/open-source/gsoc/timeline?hl=en > > > > Thanks for bringing this up, and for organizing the NumFOCUS > participation! > > > NumFOCUS will be applying again this year. If you want to work with > us please let me know and if you apply as an organization yourself > or under a different umbrella organization please tell me as well. > > > I suspect we won't participate at all, but if we do then it's likely > under the PSF umbrella as we have done previously. Thanks for letting me now. 
If you decide to participate with the PSF please write me a private mail so that I can update the NumFOCUS gsoc page accordingly. > > @all: in practice working on NumPy is just far too hard for most > GSoC students. Previous years we've registered and generated ideas, > but not gotten any students. We're also short on maintainer capacity. > So I propose to not participate this year. > > Ralf > > > > _______________________________________________ NumPy-Discussion > mailing list NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > From pierre.schnizer at helmholtz-berlin.de Sat Jan 21 03:23:43 2017 From: pierre.schnizer at helmholtz-berlin.de (Schnizer, Pierre) Date: Sat, 21 Jan 2017 08:23:43 +0000 Subject: [Numpy-discussion] Building external c modules with mingw64 / numpy Message-ID: <243DBD016692E54EB12F37B87C66E70E815DB8@didag1> Dear all, I built an external c-module (pygsl) using mingw 64 from msys2 mingw64-gcc compiler. This built required some changes to numpy.distutils to get the ?python setup.py config? and ?python setup.py build? working. In this process I replaced 2 files in numpy.distutils from numpy git repository: - numpy.dist_utils.misc_utils.py version ec0e046 on 14 Dec 2016 - numpy.dist_utils. mingw32ccompiler.py version ec0e046 on 14 Dec 2016 mingw32ccompiler.py required to be modified to get it work n preprocessor had to be defined as I am using setup.py config n specifying the runtime library search path to the linker n include path of the vcrtruntime I attached a patch reflecting the changes I had to make to file mingw32ccompile.py If this information is useful I am happy to answer questions Sincerely yours Pierre PS Version infos: Python: Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32 Numpy: >> help(numpy.version) Help on module numpy.version in numpy: DATA full_version = '1.12.0' git_revision = '561f1accf861ad8606ea2dd723d2be2b09a2dffa' release = True short_version = '1.12.0' version = '1.12.0' gcc.exe (Rev2, Built by MSYS2 project) 6.2.0 ________________________________ Helmholtz-Zentrum Berlin f?r Materialien und Energie GmbH Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. Aufsichtsrat: Vorsitzender Dr. Karl Eugen Huthmacher, stv. Vorsitzende Dr. Jutta Koch-Unterseher Gesch?ftsf?hrung: Prof. Dr. Anke Rita Kaysser-Pyzalla, Thomas Frederking Sitz Berlin, AG Charlottenburg, 89 HRB 5583 Postadresse: Hahn-Meitner-Platz 1 D-14109 Berlin http://www.helmholtz-berlin.de -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: python_numpy_mingw_git.diff Type: application/octet-stream Size: 2448 bytes Desc: python_numpy_mingw_git.diff URL: From josef.pktd at gmail.com Sat Jan 21 10:10:53 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Jan 2017 10:10:53 -0500 Subject: [Numpy-discussion] offset in fill diagonal Message-ID: Is there a simple way to fill in diagonal elements in an array for other than main diagonal? As far as I can see, the diagxxx functions that have offset can only read and not inplace modify, and the functions for modifying don't have offset and only allow changing the main diagonal. Usecase: creating banded matrices (2-D arrays) similar to toeplitz. Josef -------------- next part -------------- An HTML attachment was scrubbed... 
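One way to get an offset is to slice the array before handing it to fill_diagonal, so that the k-th diagonal of the original array becomes the main diagonal of the view. A small sketch (fill_offset_diagonal is just an illustrative name; the replies that follow give this idea plus a boolean-mask alternative):

import numpy as np

def fill_offset_diagonal(a, val, k=0):
    # k > 0: k-th diagonal above the main one; k < 0: k-th diagonal below it
    if k >= 0:
        np.fill_diagonal(a[:, k:], val)
    else:
        np.fill_diagonal(a[-k:, :], val)

a = np.zeros((5, 5))
for k, v in zip((-1, 0, 1), (1.0, 2.0, 3.0)):
    fill_offset_diagonal(a, v, k)
# a is now tridiagonal (banded): 1 on the subdiagonal, 2 on the main
# diagonal, 3 on the superdiagonal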
URL: From jtaylor.debian at googlemail.com Sat Jan 21 10:23:33 2017 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 21 Jan 2017 16:23:33 +0100 Subject: [Numpy-discussion] offset in fill diagonal In-Reply-To: References: Message-ID: On 21.01.2017 16:10, josef.pktd at gmail.com wrote: > Is there a simple way to fill in diagonal elements in an array for other > than main diagonal? > > As far as I can see, the diagxxx functions that have offset can only > read and not inplace modify, and the functions for modifying don't have > offset and only allow changing the main diagonal. > > Usecase: creating banded matrices (2-D arrays) similar to toeplitz. > you can construct index arrays or boolean masks to index using the np.tri* functions. e.g. a = np.arange(5*5).reshape(5,5) band = np.tri(5, 5, 1, dtype=np.bool) & ~np.tri(5, 5, -2, dtype=np.bool) a[band] = -1 From insertinterestingnamehere at gmail.com Sat Jan 21 14:26:12 2017 From: insertinterestingnamehere at gmail.com (Ian Henriksen) Date: Sat, 21 Jan 2017 19:26:12 +0000 Subject: [Numpy-discussion] offset in fill diagonal In-Reply-To: References: Message-ID: On Sat, Jan 21, 2017 at 9:23 AM Julian Taylor wrote: > On 21.01.2017 16:10, josef.pktd at gmail.com wrote: > > Is there a simple way to fill in diagonal elements in an array for other > > than main diagonal? > > > > As far as I can see, the diagxxx functions that have offset can only > > read and not inplace modify, and the functions for modifying don't have > > offset and only allow changing the main diagonal. > > > > Usecase: creating banded matrices (2-D arrays) similar to toeplitz. > > > > you can construct index arrays or boolean masks to index using the > np.tri* functions. > e.g. > > a = np.arange(5*5).reshape(5,5) > band = np.tri(5, 5, 1, dtype=np.bool) & ~np.tri(5, 5, -2, dtype=np.bool) > a[band] = -1 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion You can slice the array you're filling before passing it to fill_diagonal. For example: import numpy as np a = np.zeros((4, 4)) b = np.ones(3) np.fill_diagonal(a[1:], b) np.fill_diagonal(a[:,1:], -b) yields array([[ 0., -1., 0., 0.], [ 1., 0., -1., 0.], [ 0., 1., 0., -1.], [ 0., 0., 1., 0.]]) Hope this helps, Ian Henriksen -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Jan 23 06:40:51 2017 From: cournape at gmail.com (David Cournapeau) Date: Mon, 23 Jan 2017 11:40:51 +0000 Subject: [Numpy-discussion] Numpy 1.11.3, scipy 0.18.1, MSVC 2015 and crashes in complex functions Message-ID: Hi there, While building the latest scipy on top of numpy 1.11.3, I have noticed crashes while running the scipy test suite, in scipy.special (e.g. in scipy.special hyp0f1 test).. This only happens on windows for python 3.5 (where we use MSVC 2015 compiler). Applying some violence to distutils, I re-built numpy/scipy with debug symbols, and the debugger claims that crashes happen inside scipy.special ufunc cython code, when calling clog or csqrt. I first suspected a compiler bug, but disabling those functions in numpy, to force using our own versions in npymath, made the problem go away. I am a bit suspicious about the whole thing as neither conda's or gholke's wheel crashed. Has anybody else encountered this ? David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From evgeny.burovskiy at gmail.com Mon Jan 23 06:46:02 2017 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Mon, 23 Jan 2017 14:46:02 +0300 Subject: [Numpy-discussion] Numpy 1.11.3, scipy 0.18.1, MSVC 2015 and crashes in complex functions In-Reply-To: References: Message-ID: Related to https://github.com/scipy/scipy/issues/6336? 23.01.2017 14:40 ???????????? "David Cournapeau" ???????: > Hi there, > > While building the latest scipy on top of numpy 1.11.3, I have noticed > crashes while running the scipy test suite, in scipy.special (e.g. in > scipy.special hyp0f1 test).. This only happens on windows for python 3.5 > (where we use MSVC 2015 compiler). > > Applying some violence to distutils, I re-built numpy/scipy with debug > symbols, and the debugger claims that crashes happen inside scipy.special > ufunc cython code, when calling clog or csqrt. I first suspected a compiler > bug, but disabling those functions in numpy, to force using our own > versions in npymath, made the problem go away. > > I am a bit suspicious about the whole thing as neither conda's or gholke's > wheel crashed. Has anybody else encountered this ? > > David > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Jan 23 07:02:01 2017 From: cournape at gmail.com (David Cournapeau) Date: Mon, 23 Jan 2017 12:02:01 +0000 Subject: [Numpy-discussion] Numpy 1.11.3, scipy 0.18.1, MSVC 2015 and crashes in complex functions In-Reply-To: References: Message-ID: Indeed. I wrongly assumed that since gholke's wheels did not crash, they did not run into that issue. That sounds like an ABI issue, since I suspect intel math library supports C99 complex numbers. I will add info on that issue then, David On Mon, Jan 23, 2017 at 11:46 AM, Evgeni Burovski < evgeny.burovskiy at gmail.com> wrote: > Related to https://github.com/scipy/scipy/issues/6336? > 23.01.2017 14:40 ???????????? "David Cournapeau" > ???????: > >> Hi there, >> >> While building the latest scipy on top of numpy 1.11.3, I have noticed >> crashes while running the scipy test suite, in scipy.special (e.g. in >> scipy.special hyp0f1 test).. This only happens on windows for python 3.5 >> (where we use MSVC 2015 compiler). >> >> Applying some violence to distutils, I re-built numpy/scipy with debug >> symbols, and the debugger claims that crashes happen inside scipy.special >> ufunc cython code, when calling clog or csqrt. I first suspected a compiler >> bug, but disabling those functions in numpy, to force using our own >> versions in npymath, made the problem go away. >> >> I am a bit suspicious about the whole thing as neither conda's or >> gholke's wheel crashed. Has anybody else encountered this ? >> >> David >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peridot.faceted at gmail.com Mon Jan 23 07:27:43 2017 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 23 Jan 2017 12:27:43 +0000 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El wrote: > On Wed, Jan 18, 2017 at 4:30 PM, wrote: > > > > Having more sampling schemes would be useful, but it's not possible to > implement sampling schemes with impossible properties. > > > > BTW: sampling 3 out of 3 without replacement is even worse > > No matter what sampling scheme and what selection probabilities we use, we > always have every element with probability 1 in the sample. > > > I agree. The random-sample function of the type I envisioned will be able > to reproduce the desired probabilities in some cases (like the example I > gave) but not in others. Because doing this correctly involves a set of n > linear equations in comb(n,k) variables, it can have no solution, or many > solutions, depending on the n and k, and the desired probabilities. A > function of this sort could return an error if it can't achieve the desired > probabilities. > It seems to me that the basic problem here is that the numpy.random.choice docstring fails to explain what the function actually does when called with weights and without replacement. Clearly there are different expectations; I think numpy.random.choice chose one that is easy to explain and implement but not necessarily what everyone expects. So the docstring should be clarified. Perhaps a Notes section: When numpy.random.choice is called with replace=False and non-uniform probabilities, the resulting distribution of samples is not obvious. numpy.random.choice effectively follows the procedure: when choosing the kth element in a set, the probability of element i occurring is p[i] divided by the total probability of all not-yet-chosen (and therefore eligible) elements. This approach is always possible as long as the sample size is no larger than the population, but it means that the probability that element i occurs in the sample is not exactly p[i]. Anne > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 23 09:33:57 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 23 Jan 2017 08:33:57 -0600 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 6:27 AM, Anne Archibald wrote: > > On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El wrote: >> >> On Wed, Jan 18, 2017 at 4:30 PM, wrote: >>> >>>> Having more sampling schemes would be useful, but it's not possible to implement sampling schemes with impossible properties. >>> >>> BTW: sampling 3 out of 3 without replacement is even worse >>> >>> No matter what sampling scheme and what selection probabilities we use, we always have every element with probability 1 in the sample. >> >> I agree. The random-sample function of the type I envisioned will be able to reproduce the desired probabilities in some cases (like the example I gave) but not in others. Because doing this correctly involves a set of n linear equations in comb(n,k) variables, it can have no solution, or many solutions, depending on the n and k, and the desired probabilities. A function of this sort could return an error if it can't achieve the desired probabilities. 
> > It seems to me that the basic problem here is that the numpy.random.choice docstring fails to explain what the function actually does when called with weights and without replacement. Clearly there are different expectations; I think numpy.random.choice chose one that is easy to explain and implement but not necessarily what everyone expects. So the docstring should be clarified. Perhaps a Notes section: > > When numpy.random.choice is called with replace=False and non-uniform probabilities, the resulting distribution of samples is not obvious. numpy.random.choice effectively follows the procedure: when choosing the kth element in a set, the probability of element i occurring is p[i] divided by the total probability of all not-yet-chosen (and therefore eligible) elements. This approach is always possible as long as the sample size is no larger than the population, but it means that the probability that element i occurs in the sample is not exactly p[i]. I don't object to some Notes, but I would probably phrase it more like we are providing the standard definition of the jargon term "sampling without replacement" in the case of non-uniform probabilities. To my mind (or more accurately, with my background), "replace=False" obviously picks out the implemented procedure, and I would have been incredibly surprised if it did anything else. If the option were named "unique=True", then I would have needed some more documentation to let me know exactly how it was implemented. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Mon Jan 23 09:52:56 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Mon, 23 Jan 2017 15:52:56 +0100 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: 2017-01-23 15:33 GMT+01:00 Robert Kern : > On Mon, Jan 23, 2017 at 6:27 AM, Anne Archibald > wrote: > > > > On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El wrote: > >> > >> On Wed, Jan 18, 2017 at 4:30 PM, wrote: > >>> > >>>> Having more sampling schemes would be useful, but it's not possible > to implement sampling schemes with impossible properties. > >>> > >>> BTW: sampling 3 out of 3 without replacement is even worse > >>> > >>> No matter what sampling scheme and what selection probabilities we > use, we always have every element with probability 1 in the sample. > >> > >> I agree. The random-sample function of the type I envisioned will be > able to reproduce the desired probabilities in some cases (like the example > I gave) but not in others. Because doing this correctly involves a set of n > linear equations in comb(n,k) variables, it can have no solution, or many > solutions, depending on the n and k, and the desired probabilities. A > function of this sort could return an error if it can't achieve the desired > probabilities. > > > > It seems to me that the basic problem here is that the > numpy.random.choice docstring fails to explain what the function actually > does when called with weights and without replacement. Clearly there are > different expectations; I think numpy.random.choice chose one that is easy > to explain and implement but not necessarily what everyone expects. So the > docstring should be clarified. Perhaps a Notes section: > > > > When numpy.random.choice is called with replace=False and non-uniform > probabilities, the resulting distribution of samples is not obvious. 
> numpy.random.choice effectively follows the procedure: when choosing the > kth element in a set, the probability of element i occurring is p[i] > divided by the total probability of all not-yet-chosen (and therefore > eligible) elements. This approach is always possible as long as the sample > size is no larger than the population, but it means that the probability > that element i occurs in the sample is not exactly p[i]. > > I don't object to some Notes, but I would probably phrase it more like we > are providing the standard definition of the jargon term "sampling without > replacement" in the case of non-uniform probabilities. To my mind (or more > accurately, with my background), "replace=False" obviously picks out the > implemented procedure, and I would have been incredibly surprised if it did > anything else. If the option were named "unique=True", then I would have > needed some more documentation to let me know exactly how it was > implemented. > > FWIW, I totally agree with Robert > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Mon Jan 23 10:22:42 2017 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 23 Jan 2017 15:22:42 +0000 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 3:34 PM Robert Kern wrote: > I don't object to some Notes, but I would probably phrase it more like we > are providing the standard definition of the jargon term "sampling without > replacement" in the case of non-uniform probabilities. To my mind (or more > accurately, with my background), "replace=False" obviously picks out the > implemented procedure, and I would have been incredibly surprised if it did > anything else. If the option were named "unique=True", then I would have > needed some more documentation to let me know exactly how it was > implemented. > It is what I would have expected too, but we have a concrete example of a user who expected otherwise; where one user speaks up, there are probably more who didn't (some of whom probably have code that's not doing what they think it does). So for the cost of adding a Note, why not help some of them? As for the standardness of the definition: I don't know, have you a reference where it is defined? More natural to me would be to have a list of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time). I'm hesitant to claim ours is a standard definition unless it's in a textbook somewhere. But I don't insist on my phrasing. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
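For reference, the procedure described above can be written out in a few lines of pure Python. This is only a sketch of the effective behaviour (choice_no_replace is an illustrative name, not numpy's actual implementation):

import numpy as np

def choice_no_replace(n, k, p, rng=np.random):
    # repeatedly renormalise p over the not-yet-chosen items, draw one, drop it
    remaining = list(range(n))
    p = np.asarray(p, dtype=float).copy()
    picked = []
    for _ in range(k):
        j = rng.choice(len(remaining), p=p / p.sum())
        picked.append(remaining.pop(j))
        p = np.delete(p, j)
    return picked

print(choice_no_replace(3, 2, [0.2, 0.4, 0.4]))   # e.g. [1, 2]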
URL: From nyh at scylladb.com Mon Jan 23 10:41:57 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Mon, 23 Jan 2017 17:41:57 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 4:52 PM, alebarde at gmail.com wrote: > > > 2017-01-23 15:33 GMT+01:00 Robert Kern : > >> >> I don't object to some Notes, but I would probably phrase it more like we >> are providing the standard definition of the jargon term "sampling without >> replacement" in the case of non-uniform probabilities. To my mind (or more >> accurately, with my background), "replace=False" obviously picks out the >> implemented procedure, and I would have been incredibly surprised if it did >> anything else. If the option were named "unique=True", then I would have >> needed some more documentation to let me know exactly how it was >> implemented. >> >> FWIW, I totally agree with Robert > With my own background (MSc. in Mathematics), I agree that this algorithm is indeed the most natural one. And as I said, when I wanted to implement something myself when I wanted to choose random combinations (k out of n items), I wrote exactly the same one. But when it didn't produce the desired probabilities (even in cases where I knew that doing this was possible), I wrongly assumed numpy would do things differently - only to realize it uses exactly the same algorithm. So clearly, the documentation didn't quite explain what it does or doesn't do. Also, Robert, I'm curious: beyond explaining why the existing algorithm is reasonable (which I agree), could you give me an example of where it is actually *useful* for sampling? Let me give you an illustrative counter-example: Let's imagine a country that a country has 3 races: 40% Lilliputians, 40% Blefuscans, an 20% Yahoos (immigrants from a different section of the book ;-)). Gulliver wants to take a poll, and needs to sample people from all these races with appropriate proportions. These races live in different parts of town, so to pick a random person he needs to first pick one of the races and then a random person from that part of town. If he picks one respondent at a time, he uses numpy.random.choice(3, size=1,p=[0.4,0.4,0.2])) to pick the part of town, and then a person from that part - he gets the desired 40% / 40% / 20% division of races. Now imagine that Gulliver can interview two respondents each day, so he needs to pick two people each time. If he picks 2 choices of part-of-town *with* replacement, numpy.random.choice(3, size=2,p=[0.4,0.4,0.2]), that's also fine: he may need to take two people from the same part of town, or two from two different parts of town, but in any case will still get the desired 40% / 40% / 20% division between the races of the people he interviews. But consider that we are told that if two people from the same race meet in Gulliver's interview room, the two start chatting between themselves, and waste Gulliver's time. So he prefers to interview two people of *different* races. That's sampling without replacement. So he uses numpy.random.choice(size=2,p=[0.4,0.4,0.2],replace=False) to pick two different parts of town, and one person from each. But then he looks at his logs, and discovers he actually interviewed the races at 38% / 38% / 23% proportions - not the 40%/40%/20% he wanted. So the opinions of the Yahoos were over-counted in this poll! 
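The 38% / 38% / 23% outcome is easy to confirm empirically; a small simulation sketch of the two-interviews-per-day scheme:

import numpy as np

rng = np.random.RandomState(42)
p = [0.4, 0.4, 0.2]                     # Lilliputians, Blefuscans, Yahoos
days = 100000
picks = np.concatenate([rng.choice(3, size=2, replace=False, p=p)
                        for _ in range(days)])
print(np.bincount(picks) / picks.size)  # roughly [0.383, 0.383, 0.233]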
I know that this is a silly example (made even sillier by the names of races I used), but I wonder if you could give me an example where the current behavior of replace=False is genuinely useful. Not that I'm saying that fixing this problem is easy (I'm still struggling with it myself in the general case of size < n-1). Nadav. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 23 10:47:54 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 23 Jan 2017 09:47:54 -0600 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 9:22 AM, Anne Archibald wrote: > > > On Mon, Jan 23, 2017 at 3:34 PM Robert Kern wrote: >> >> I don't object to some Notes, but I would probably phrase it more like we are providing the standard definition of the jargon term "sampling without replacement" in the case of non-uniform probabilities. To my mind (or more accurately, with my background), "replace=False" obviously picks out the implemented procedure, and I would have been incredibly surprised if it did anything else. If the option were named "unique=True", then I would have needed some more documentation to let me know exactly how it was implemented. > > > It is what I would have expected too, but we have a concrete example of a user who expected otherwise; where one user speaks up, there are probably more who didn't (some of whom probably have code that's not doing what they think it does). So for the cost of adding a Note, why not help some of them? That's why I said I'm fine with adding a Note. I'm just suggesting a re-wording so that the cautious language doesn't lead anyone who is familiar with the jargon to think we're doing something ad hoc while still providing the details for those who aren't so familiar. > As for the standardness of the definition: I don't know, have you a reference where it is defined? More natural to me would be to have a list of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time). I'm hesitant to claim ours is a standard definition unless it's in a textbook somewhere. But I don't insist on my phrasing. Textbook, I'm not so sure, but it is the *only* definition I've ever encountered in the literature: http://epubs.siam.org/doi/abs/10.1137/0209009 http://www.sciencedirect.com/science/article/pii/S002001900500298X -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From nyh at scylladb.com Mon Jan 23 11:08:18 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Mon, 23 Jan 2017 18:08:18 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 5:47 PM, Robert Kern wrote: > > > As for the standardness of the definition: I don't know, have you a > reference where it is defined? More natural to me would be to have a list > of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time). > I'm hesitant to claim ours is a standard definition unless it's in a > textbook somewhere. But I don't insist on my phrasing. > > Textbook, I'm not so sure, but it is the *only* definition I've ever > encountered in the literature: > > http://epubs.siam.org/doi/abs/10.1137/0209009 > Very interesting. 
This paper (PDF available if you search for its name in Google) explicitly mentions one of the uses of this algorithm is "multistage sampling", which appears to be exactly the same thing as in the hypothetical Gulliver example I gave in my earlier mail. And yet, I showed in my mail that this algorithm does NOT reproduce the desired frequency of the different sampling units... Moreover, this paper doesn't explain why you need the "without replacement" for this use case (everything seems easier, and the desired probabilities are reproduced, with replacement). In my story I gave a funny excuse why "without replacement" might be warrented, but if you're interested I can tell you a bit about my actual use case, with a more serious reason why I want without replacement. > http://www.sciencedirect.com/science/article/pii/S002001900500298X > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 23 11:08:29 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 23 Jan 2017 10:08:29 -0600 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 9:41 AM, Nadav Har'El wrote: > > On Mon, Jan 23, 2017 at 4:52 PM, alebarde at gmail.com wrote: >> >> 2017-01-23 15:33 GMT+01:00 Robert Kern : >>> >>> I don't object to some Notes, but I would probably phrase it more like we are providing the standard definition of the jargon term "sampling without replacement" in the case of non-uniform probabilities. To my mind (or more accurately, with my background), "replace=False" obviously picks out the implemented procedure, and I would have been incredibly surprised if it did anything else. If the option were named "unique=True", then I would have needed some more documentation to let me know exactly how it was implemented. >>> >> FWIW, I totally agree with Robert > > With my own background (MSc. in Mathematics), I agree that this algorithm is indeed the most natural one. And as I said, when I wanted to implement something myself when I wanted to choose random combinations (k out of n items), I wrote exactly the same one. But when it didn't produce the desired probabilities (even in cases where I knew that doing this was possible), I wrongly assumed numpy would do things differently - only to realize it uses exactly the same algorithm. So clearly, the documentation didn't quite explain what it does or doesn't do. In my experience, I have seen "without replacement" mean only one thing. If the docstring had said "returns unique items", I'd agree that it doesn't explain what it does or doesn't do. The only issue is that "without replacement" is jargon, and it is good to recapitulate the definitions of such terms for those who aren't familiar with them. > Also, Robert, I'm curious: beyond explaining why the existing algorithm is reasonable (which I agree), could you give me an example of where it is actually *useful* for sampling? The references I previously quoted list a few. One is called "multistage sampling proportional to size". The idea being that you draw (without replacement) from a larger units (say, congressional districts) before sampling within them. 
It is similar to the situation you outline, but it is probably more useful at a different scale, like lots of larger units (where your algorithm is likely to provide no solution) rather than a handful. It is probably less useful in terms of survey design, where you are trying to *design* a process to get a result, than it is in queueing theory and related fields, where you are trying to *describe* and simulate a process that is pre-defined. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From edwardlrichards at gmail.com Wed Jan 25 15:14:50 2017 From: edwardlrichards at gmail.com (Edward Richards) Date: Wed, 25 Jan 2017 12:14:50 -0800 Subject: [Numpy-discussion] Checking matrix condition number Message-ID: <5889073A.6050403@gmail.com> What is the best way to make sure that a matrix inversion makes any sense before preforming it? I am currently struggling to understand some results from matrix inversions in my work, and I would like to see if I am dealing with an ill-conditioned problem. It is probably user error, but I don't like having the possibility hanging over my head. I naively put a call to np.linalg.cond into my code; all of my cores went to 100% and a few minutes later I got a number. To be fair A is 6400 elements square, but this takes ~20x more time than the inversion. This is not really practical for what I am doing, is there a better way? This is partly in response to Ilhan Polat's post about introducing the A\b operator to numpy. I also couldn't check the Numpy mailing list archives to see if this has been asked before, the numpy-discussion gmane link isn't working for me at all. Thanks for your time, Ned From ilhanpolat at gmail.com Thu Jan 26 04:29:45 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Thu, 26 Jan 2017 10:29:45 +0100 Subject: [Numpy-discussion] Checking matrix condition number In-Reply-To: <5889073A.6050403@gmail.com> References: <5889073A.6050403@gmail.com> Message-ID: I've indeed opened an issue for this : https://github.com/numpy/numpy/issues/8090 . Recently, I've included the LAPACK routines into SciPy dev version that will come with version 0.19. Then you can use ?GECON, ?POCON and other ?XXCON routines for yourself or wait a bit more until I have time to implement it on the SciPy side. @rkern told me that for NumPy, C translations are involved but I couldn't find an entrance point to contribute for yet. It's a bit above my abilities to fully grasp the way of working in NumPy. You can read more in https://github.com/numpy/numpy/issues/3755 Best, ilhan On Wed, Jan 25, 2017 at 9:14 PM, Edward Richards wrote: > What is the best way to make sure that a matrix inversion makes any sense > before preforming it? I am currently struggling to understand some results > from matrix inversions in my work, and I would like to see if I am dealing > with an ill-conditioned problem. It is probably user error, but I don't > like having the possibility hanging over my head. > > I naively put a call to np.linalg.cond into my code; all of my cores went > to 100% and a few minutes later I got a number. To be fair A is 6400 > elements square, but this takes ~20x more time than the inversion. This is > not really practical for what I am doing, is there a better way? > > This is partly in response to Ilhan Polat's post about introducing the A\b > operator to numpy. I also couldn't check the Numpy mailing list archives to > see if this has been asked before, the numpy-discussion gmane link isn't > working for me at all. 
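For completeness, a hedged sketch of the rcond route mentioned above. It assumes a SciPy new enough (0.19 or later) to expose the LAPACK ?GECON wrappers, and that dgecon takes the LU factors plus the 1-norm of the original matrix (check the docstring of your SciPy version). The point is that the estimate reuses the O(n^3) factorization you would need for the solve anyway, instead of the full SVD behind np.linalg.cond:

import numpy as np
from scipy.linalg import lu_factor
from scipy.linalg.lapack import dgecon      # exposed from SciPy 0.19 onwards

A = np.random.rand(2000, 2000)

anorm = np.linalg.norm(A, 1)                # 1-norm of the original matrix
lu, piv = lu_factor(A)                      # the same factorization you would reuse for solving
rcond, info = dgecon(lu, anorm)             # estimates 1 / cond_1(A)
print("estimated 1-norm condition number:", 1.0 / rcond)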
> > Thanks for your time, > Ned > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jan 27 13:24:16 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 27 Jan 2017 18:24:16 +0000 Subject: [Numpy-discussion] Numpy development version wheels for testing Message-ID: Hi, I've taken advantage of the new travis-ci cron job feature [1] to set up daily builds of numpy manylinux and OSX wheels for the current trunk, uploading to: https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com The numpy build process already builds Ubuntu Precise numpy wheels for the current trunk, available at [2], but the cron-job manylinux wheels have the following advantages: * they are built the same way as our usual pypi wheels, using openblas, and so will be closer to the eventual numpy distributed wheel; * manylinux wheels will install on all the travis-ci containers, not just the Precise container; * manylinux wheels don't need any extra packages installed by apt, because they are self-contained. There's an example of use at https://github.com/matthew-brett/nibabel/blob/use-pre/.travis.yml#L23 Cheers, Matthew [1] https://docs.travis-ci.com/user/cron-jobs [2] https://f66d8a5767b134cb96d3-4ffdece11fd3f72855e4665bc61c7445.ssl.cf2.rackcdn.com From evgeny.burovskiy at gmail.com Sat Jan 28 06:37:28 2017 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Sat, 28 Jan 2017 14:37:28 +0300 Subject: [Numpy-discussion] Numpy development version wheels for testing In-Reply-To: References: Message-ID: On Fri, Jan 27, 2017 at 9:24 PM, Matthew Brett wrote: > Hi, > > I've taken advantage of the new travis-ci cron job feature [1] to set > up daily builds of numpy manylinux and OSX wheels for the current > trunk, uploading to: > > https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com > > The numpy build process already builds Ubuntu Precise numpy wheels for > the current trunk, available at [2], but the cron-job manylinux wheels > have the following advantages: > > * they are built the same way as our usual pypi wheels, using > openblas, and so will be closer to the eventual numpy distributed > wheel; > * manylinux wheels will install on all the travis-ci containers, not > just the Precise container; > * manylinux wheels don't need any extra packages installed by apt, > because they are self-contained. > > There's an example of use at > https://github.com/matthew-brett/nibabel/blob/use-pre/.travis.yml#L23 > > Cheers, > > Matthew > > [1] https://docs.travis-ci.com/user/cron-jobs > [2] https://f66d8a5767b134cb96d3-4ffdece11fd3f72855e4665bc61c7445.ssl.cf2.rackcdn.com This is great, thank you Matthew! From faltet at gmail.com Sun Jan 29 08:07:48 2017 From: faltet at gmail.com (Francesc Alted) Date: Sun, 29 Jan 2017 14:07:48 +0100 Subject: [Numpy-discussion] ANN: numexpr 2.6.2 released! Message-ID: ========================= Announcing Numexpr 2.6.2 ========================= What's new ========== This is a maintenance release that fixes several issues, with special emphasis in keeping compatibility with newer NumPy versions. Also, initial support for POWER processors is here. Thanks to Oleksandr Pavlyk, Alexander Shadchin, Breno Leitao, Fernando Seiti Furusato and Antonio Valentino for their nice contributions. 
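As a quick taste of the kind of expression the package accelerates (a minimal sketch, assuming numexpr and numpy are importable):

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = ne.evaluate("3*a + 4*b")    # evaluated multi-threaded, without large temporaries
np.testing.assert_allclose(c, 3*a + 4*b)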
In case you want to know more in detail what has changed in this version, see: https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst What's Numexpr ============== Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for a some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring more heavy dependencies. Where I can find Numexpr? ========================= The project is hosted at GitHub in: https://github.com/pydata/numexpr You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy data! -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From spluque at gmail.com Tue Jan 31 19:56:30 2017 From: spluque at gmail.com (Seb) Date: Tue, 31 Jan 2017 18:56:30 -0600 Subject: [Numpy-discussion] composing Euler rotation matrices Message-ID: <87h94e4tkx.fsf@otaria.sebmel.org> Hello, I'm trying to compose Euler rotation matrices shown in https://en.wikipedia.org/wiki/Euler_angles#Rotation_matrix. For example, The Z1Y2X3 Tait-Bryan rotation shown in the table can be represented in Numpy using the function: def z1y2x3(alpha, beta, gamma): """Rotation matrix given Euler angles""" return np.array([[np.cos(alpha) * np.cos(beta), np.cos(alpha) * np.sin(beta) * np.sin(gamma) - np.cos(gamma) * np.sin(alpha), np.sin(alpha) * np.sin(gamma) + np.cos(alpha) * np.cos(gamma) * np.sin(beta)], [np.cos(beta) * np.sin(alpha), np.cos(alpha) * np.cos(gamma) + np.sin(alpha) * np.sin(beta) * np.sin(gamma), np.cos(gamma) * np.sin(alpha) * np.sin(beta) - np.cos(alpha) * np.sin(gamma)], [-np.sin(beta), np.cos(beta) * np.sin(gamma), np.cos(beta) * np.cos(gamma)]]) which given alpha, beta, gamma as: angles = np.radians(np.array([30, 20, 10])) returns the following matrix: In [31]: z1y2x3(angles[0], angles[1], angles[2]) Out[31]: array([[ 0.81379768, -0.44096961, 0.37852231], [ 0.46984631, 0.88256412, 0.01802831], [-0.34202014, 0.16317591, 0.92541658]]) If I understand correctly, one should be able to compose this matrix by multiplying the rotation matrices that it is made of. However, I cannot reproduce this matrix via composition; i.e. by multiplying the underlying rotation matrices. Any tips would be appreciated. -- Seb From jfoxrabinovitz at gmail.com Tue Jan 31 21:23:55 2017 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Tue, 31 Jan 2017 21:23:55 -0500 Subject: [Numpy-discussion] composing Euler rotation matrices In-Reply-To: <87h94e4tkx.fsf@otaria.sebmel.org> References: <87h94e4tkx.fsf@otaria.sebmel.org> Message-ID: Could you show what you are doing to get the statement "However, I cannot reproduce this matrix via composition; i.e. by multiplying the underlying rotation matrices.". 
I would guess something involving the `*` operator instead of `@`, but guessing probably won't help you solve your issue. -Joe On Tue, Jan 31, 2017 at 7:56 PM, Seb wrote: > Hello, > > I'm trying to compose Euler rotation matrices shown in > https://en.wikipedia.org/wiki/Euler_angles#Rotation_matrix. For > example, The Z1Y2X3 Tait-Bryan rotation shown in the table can be > represented in Numpy using the function: > > def z1y2x3(alpha, beta, gamma): > """Rotation matrix given Euler angles""" > return np.array([[np.cos(alpha) * np.cos(beta), > np.cos(alpha) * np.sin(beta) * np.sin(gamma) - > np.cos(gamma) * np.sin(alpha), > np.sin(alpha) * np.sin(gamma) + > np.cos(alpha) * np.cos(gamma) * np.sin(beta)], > [np.cos(beta) * np.sin(alpha), > np.cos(alpha) * np.cos(gamma) + > np.sin(alpha) * np.sin(beta) * np.sin(gamma), > np.cos(gamma) * np.sin(alpha) * np.sin(beta) - > np.cos(alpha) * np.sin(gamma)], > [-np.sin(beta), np.cos(beta) * np.sin(gamma), > np.cos(beta) * np.cos(gamma)]]) > > which given alpha, beta, gamma as: > > angles = np.radians(np.array([30, 20, 10])) > > returns the following matrix: > > In [31]: z1y2x3(angles[0], angles[1], angles[2]) > Out[31]: > > array([[ 0.81379768, -0.44096961, 0.37852231], > [ 0.46984631, 0.88256412, 0.01802831], > [-0.34202014, 0.16317591, 0.92541658]]) > > If I understand correctly, one should be able to compose this matrix by > multiplying the rotation matrices that it is made of. However, I cannot > reproduce this matrix via composition; i.e. by multiplying the > underlying rotation matrices. Any tips would be appreciated. > > -- > Seb > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spluque at gmail.com Tue Jan 31 22:27:35 2017 From: spluque at gmail.com (Seb) Date: Tue, 31 Jan 2017 21:27:35 -0600 Subject: [Numpy-discussion] composing Euler rotation matrices References: <87h94e4tkx.fsf@otaria.sebmel.org> Message-ID: <87d1f24ml4.fsf@otaria.sebmel.org> On Tue, 31 Jan 2017 21:23:55 -0500, Joseph Fox-Rabinovitz wrote: > Could you show what you are doing to get the statement "However, I > cannot reproduce this matrix via composition; i.e. by multiplying the > underlying rotation matrices.". I would guess something involving the > `*` operator instead of `@`, but guessing probably won't help you > solve your issue. Sure, although composition is not something I can take credit for, as it's a well-described operation for generating linear transformations. It is the matrix multiplication of two or more transformation matrices. In the case of Euler transformations, it's matrices specifying rotations around 3 orthogonal axes by 3 given angles. I'm using `numpy.dot' to perform matrix multiplication on 2D arrays representing matrices. However, it's not obvious from the link I provided what particular rotation matrices are multiplied and in what order (i.e. what composition) is used to arrive at the Z1Y2X3 rotation matrix shown. Perhaps I'm not understanding the conventions used therein. 
This is one of my attempts at reproducing that rotation matrix via
composition:

---<--------------------cut here---------------start------------------->---
import numpy as np

angles = np.radians(np.array([30, 20, 10]))

def z1y2x3(alpha, beta, gamma):
    """Z1Y2X3 rotation matrix given Euler angles"""
    return np.array([[np.cos(alpha) * np.cos(beta),
                      np.cos(alpha) * np.sin(beta) * np.sin(gamma) -
                      np.cos(gamma) * np.sin(alpha),
                      np.sin(alpha) * np.sin(gamma) +
                      np.cos(alpha) * np.cos(gamma) * np.sin(beta)],
                     [np.cos(beta) * np.sin(alpha),
                      np.cos(alpha) * np.cos(gamma) +
                      np.sin(alpha) * np.sin(beta) * np.sin(gamma),
                      np.cos(gamma) * np.sin(alpha) * np.sin(beta) -
                      np.cos(alpha) * np.sin(gamma)],
                     [-np.sin(beta), np.cos(beta) * np.sin(gamma),
                      np.cos(beta) * np.cos(gamma)]])

euler_mat = z1y2x3(angles[0], angles[1], angles[2])

## Now via composition

def rotation_matrix(theta, axis, active=False):
    """Generate rotation matrix for a given axis

    Parameters
    ----------
    theta: numeric, optional
        The angle (degrees) by which to perform the rotation.  Default is
        0, which means return the coordinates of the vector in the rotated
        coordinate system, when rotate_vectors=False.
    axis: int, optional
        Axis around which to perform the rotation (x=0; y=1; z=2)
    active: bool, optional
        Whether to return active transformation matrix.

    Returns
    -------
    numpy.ndarray
        3x3 rotation matrix
    """
    theta = np.radians(theta)
    if axis == 0:
        R_theta = np.array([[1, 0, 0],
                            [0, np.cos(theta), -np.sin(theta)],
                            [0, np.sin(theta), np.cos(theta)]])
    elif axis == 1:
        R_theta = np.array([[np.cos(theta), 0, np.sin(theta)],
                            [0, 1, 0],
                            [-np.sin(theta), 0, np.cos(theta)]])
    else:
        R_theta = np.array([[np.cos(theta), -np.sin(theta), 0],
                            [np.sin(theta), np.cos(theta), 0],
                            [0, 0, 1]])
    if active:
        R_theta = np.transpose(R_theta)
    return R_theta

## The rotations are given as active
xmat = rotation_matrix(angles[2], 0, active=True)
ymat = rotation_matrix(angles[1], 1, active=True)
zmat = rotation_matrix(angles[0], 2, active=True)
## The operation seems to imply this composition
euler_comp_mat = np.dot(xmat, np.dot(ymat, zmat))
---<--------------------cut here---------------end--------------------->---

I believe the matrices `euler_mat' and `euler_comp_mat' should be the
same, but they aren't, so it's unclear to me what particular composition
is meant to produce the matrix specified by this Z1Y2X3 transformation.
What am I missing?

--
Seb

From shoyer at gmail.com Tue Jan 31 23:19:08 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 31 Jan 2017 20:19:08 -0800
Subject: [Numpy-discussion] ANN: xarray v0.9 released
Message-ID: 

I'm pleased to announce the release of the latest major version of
xarray, v0.9.

xarray is an open source project and Python package that provides a
toolkit and data structures for N-dimensional labeled arrays. Its
approach combines an API inspired by pandas with the Common Data Model
for self-described scientific data.

This release includes five months' worth of enhancements and bug fixes
from 24 contributors, including some significant changes to the data
model that are not fully backwards compatible.

Highlights include:

- Coordinates are now optional in the xarray data model, even for
  dimensions.
- Changes to caching, lazy loading and pickling to improve xarray's
  experience for parallel computing.
- Improvements for accessing and manipulating pandas.MultiIndex levels.
- Many new methods and functions, including quantile(), cumsum(),
  cumprod(), combine_first(), set_index(), reset_index(),
  reorder_levels(), full_like(), zeros_like(), ones_like(),
  open_dataarray(), compute(), Dataset.info(), testing.assert_equal(),
  testing.assert_identical(), and testing.assert_allclose().

For more details, read the full release notes:
http://xarray.pydata.org/en/latest/whats-new.html

You can install xarray with pip or conda:

pip install xarray
conda install -c conda-forge xarray

Best,
Stephan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
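As a brief hand-written sketch (not part of the announcement above; the
dimension names, coordinate labels and values are invented), this is
roughly what the labeled-array API and a couple of the v0.9 additions
mentioned in the highlights look like:

import numpy as np
import xarray as xr

# Hypothetical labeled array: 3 time steps x 2 sites (names are invented).
# No coordinate is supplied for "time"; coordinates being optional even
# for dimensions is one of the v0.9 changes listed above.
arr = xr.DataArray(np.arange(6).reshape(3, 2),
                   dims=("time", "space"),
                   coords={"space": ["site_a", "site_b"]})

# Two of the newly added functions/methods:
print(arr.cumsum(dim="time"))  # cumulative sum along a labeled dimension
print(xr.zeros_like(arr))      # same dims/coords/shape, filled with zeros

Both calls preserve the dimension names and any attached coordinates,
which is the main point of working with labeled arrays.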