From matti.picus at gmail.com Tue Feb 4 10:08:25 2020
From: matti.picus at gmail.com (Matti Picus)
Date: Tue, 4 Feb 2020 17:08:25 +0200
Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics
Message-ID: 

Together with Sayed Adel (cc) and Ralf, I am pleased to put the draft version of NEP 38 [0] up for discussion. As per NEP 0, this is the next step in the community accepting the approach laid out in the NEP. The NEP PR [1] has already garnered a fair amount of discussion about the viability of Universal SIMD Intrinsics, so I will try to capture some of that here as well.

Abstract

While compilers are getting better at using hardware-specific routines to optimize code, they sometimes do not produce optimal results. Also, we would like to be able to copy binary optimized C-extension modules from one machine to another with the same base architecture (x86, ARM, PowerPC) but with different capabilities, without recompiling. We already have a mechanism in the ufunc machinery to build alternative loops indexed by CPU feature name: at import (in InitOperators), the loop function that matches the run-time CPU info is chosen from the candidates. This NEP proposes a mechanism to build on that for many more features and architectures. The proposed steps are to:

- Establish a set of well-defined, architecture-agnostic, universal intrinsics which capture features available across architectures.
- Capture these universal intrinsics in a set of C macros and use the macros to build code paths for sets of features, from the baseline up to the maximum set of features available on that architecture. Offer these as a limited number of compiled alternative code paths.
- At runtime, discover which CPU features are available, and choose from among the possible code paths accordingly.

Motivation and Scope

Traditionally NumPy has counted on compilers to generate optimal code specifically for the target architecture. However, few users today compile NumPy locally for their machines. Most use the binary packages, which must provide run-time support for the lowest-common-denominator CPU architecture. Thus NumPy cannot take advantage of more advanced CPU features, since they may not be available on all users' systems. The ufunc machinery already has a loop-selection protocol based on dtypes, so it is easy to extend this to also select an optimal loop for the CPU features actually available at runtime.

Traditionally, these features have been exposed through intrinsics, which are compiler-specific instructions that map directly to assembly instructions. Recently there were discussions about the effectiveness of adding more intrinsics (e.g., `gh-11113`_ for AVX optimizations for floats). In the past, architecture-specific code was added to NumPy for fast avx512 routines in various ufuncs, using the mechanism described above to choose the best loop for the architecture. However, that code is not generic and does not generalize to other architectures. Recently, OpenCV moved to using universal intrinsics in its Hardware Abstraction Layer (HAL), which provides a nice abstraction for common shared Single Instruction Multiple Data (SIMD) constructs. This NEP proposes a similar mechanism for NumPy. There are three stages to using the mechanism:

- Infrastructure is provided in the code for abstract intrinsics. The ufunc machinery will be extended using sets of these abstract intrinsics, so that a single ufunc will be expressed as a set of loops, going from a minimal to a maximal set of possibly available intrinsics.
- At compile time, compiler macros and CPU detection are used to turn the abstract intrinsics into concrete intrinsic calls. Any intrinsic not available on the platform (either because the CPU does not support it, and so it cannot be tested, or because the abstract intrinsic has no parallel concrete intrinsic on that platform) will not cause an error; instead, the corresponding loop is simply not produced and not added to the set of possibilities.
- At runtime, the CPU detection code will further limit the set of loops available, and the optimal one will be chosen for the ufunc.

The current NEP proposes only to use the runtime feature detection and optimal loop selection mechanism for ufuncs. Future NEPs may propose other uses for the proposed solution.
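To illustrate the runtime stage, here is a rough Python sketch of the selection idea. The loop names and feature sets are made up for illustration; the real mechanism will live in C inside the ufunc machinery.

    # Illustration only: candidate loops for one ufunc, keyed by the CPU
    # features each compiled code path requires (empty set = baseline loop).
    candidate_loops = {
        frozenset(): "baseline_loop",
        frozenset({"SSE42"}): "sse42_loop",
        frozenset({"AVX2"}): "avx2_loop",
        frozenset({"AVX512F"}): "avx512f_loop",
    }

    def pick_loop(cpu_features):
        """Pick the most capable loop whose required features are all present."""
        supported = {name for name, present in cpu_features.items() if present}
        usable = [needed for needed in candidate_loops if needed <= supported]
        return candidate_loops[max(usable, key=len)]

    # Hypothetical runtime feature report for one machine.
    print(pick_loop({"SSE42": True, "AVX2": True, "AVX512F": False}))
    # -> "avx2_loop"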
Usage and Impact

The end user will be able to get a list of intrinsics available for their platform and compiler. Optionally, the user may be able to specify which of the loops available at runtime will be used, perhaps via an environment variable, to enable benchmarking the impact of the different loops. There should be no direct impact on naive end users: the results of all the loops should be identical to within a small number (1-3?) of ULPs. On the other hand, users with more powerful machines should notice a significant performance boost.

Binary releases - wheels on PyPI and conda packages

The binaries released by this process will be larger, since they include all possible loops for the architecture. Some packagers may prefer to limit the number of loops in order to limit the size of the binaries; we would hope they would still support a wide range of families of architectures. Note this problem already exists in the Intel MKL offering, where the binary package includes an extensive set of alternative shared objects (DLLs) for various CPU alternatives.

Source builds

See "Detailed Description" below. A source build where the packager knows the details of the target machine could theoretically produce a smaller binary by choosing, via command line arguments, to compile only the loops needed by the target.

How to run benchmarks to assess performance benefits

Adding more code which uses intrinsics will make the code harder to maintain. Therefore, such code should only be added if it yields a significant performance benefit. Assessing this performance benefit can be nontrivial. To aid with this, the implementation for this NEP will add a way to select which instruction sets can be used at runtime via environment variables (name TBD). This ability is critical for CI code verification.

Diagnostics

A new dictionary __cpu_features__ will be available to Python. The keys are the CPU feature names; the value for each is a boolean indicating whether that feature is available on the running machine. Various new private C functions will be used internally to query available features. These might be exposed via specific C-extension modules for testing.
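As a sketch of the intended usage (the exact attribute name and where it will live are implementation details not fixed by the NEP, so treat this as illustrative only):

    import numpy as np

    # The NEP proposes a dictionary mapping feature names to booleans.
    # Its final location is not settled here, so fall back to a stand-in dict.
    cpu_features = getattr(np, "__cpu_features__",
                           {"SSE42": True, "AVX2": True, "AVX512F": False})

    supported = sorted(name for name, present in cpu_features.items() if present)
    print("SIMD features detected at runtime:", supported)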
Workflow for adding a new CPU architecture-specific optimization

NumPy will always have a baseline C implementation for any code that may be a candidate for SIMD vectorization. If a contributor wants to add SIMD support for some architecture (typically the one of most interest to them), this is the proposed workflow: TODO (see https://github.com/numpy/numpy/pull/13516#issuecomment-558859638, needs to be worked out more).

Reuse by other projects

It would be nice if the universal intrinsics were available to other libraries like SciPy or Astropy that also build ufuncs, but that is not an explicit goal of the first implementation of this NEP.

-----------------------------------------------------------------------------------

My biased summary of select comments from the PR:

(Raghuveer): A very similar SIMD library has been proposed for C++. Here are the links to the details:
1. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0214r8.pdf
2. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/n4808.pdf
There is good discussion there on the minimal/common set of instructions across architectures (which narrows down to loads, stores, arithmetic, compare, bitwise and shuffle instructions). Based on my developer experience so far, these instructions aren't by themselves enough to implement and optimize NumPy ufuncs. As I pointed out earlier, I think I would find it useful to learn the workflow for using instructions that don't fit in the universal intrinsic framework.

(Raghuveer) gave a well laid out table of the currently proposed universal intrinsics by use: load/store, reorder, operators, conversions, arithmetic and misc [2], which led to a long response from Sayed [3] with some sample code, demonstrating how more complex operations can be built up from the primitives.
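(To give a flavour of what "building more complex operations from the primitives" means: this is not the code from [3], just an illustration with made-up Python stand-ins for the load/store/arithmetic primitives.)

    import numpy as np

    # Made-up stand-ins for abstract SIMD primitives; not real NumPy APIs.
    def vec_load(buf, i, width):        # load one "vector register"
        return np.asarray(buf[i:i + width])

    def vec_store(buf, i, vec):         # store one "vector register"
        buf[i:i + len(vec)] = vec

    def vec_muladd(a, b, c):            # multiply-add composed from mul + add
        return a * b + c

    def axpy(out, x, y, alpha, width=4):
        """out = alpha*x + y, one simulated vector register at a time."""
        n = len(x) - len(x) % width
        for i in range(0, n, width):
            vec_store(out, i, vec_muladd(alpha, vec_load(x, i, width),
                                         vec_load(y, i, width)))
        for i in range(n, len(x)):      # scalar tail for leftover elements
            out[i] = alpha * x[i] + y[i]

    x, y, out = np.arange(10.0), np.ones(10), np.empty(10)
    axpy(out, x, y, 2.0)                # out == 2*x + y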
(catree) mentioned the Simd Library [4] and Halide [5] and asked about maintainability.

(Ralf) responded [6] with concerns about competent developer bandwidth for code review. He also mentioned that our CI system currently supports all the architectures we are targeting (x86, aarch64, s390x, ppc64le), although some of these machines may not have the most advanced hardware to support the latest intrinsics.

I apologize if my summary is not accurate; please correct any mistakes or misconceptions.

----------------------------------------------------------------------------------------

Barring complete rejection of the idea here, we will be pushing forward with PRs to implement this. Comments either on the mailing list or in those PRs are welcome.

Matti

[0] https://numpy.org/neps/nep-0038-SIMD-optimizations.html
[1] https://github.com/numpy/numpy/pull/15228
[2] https://github.com/numpy/numpy/pull/15228#issuecomment-580479336
[3] https://github.com/numpy/numpy/pull/15228#issuecomment-580605718
[4] https://github.com/ermig1979/Simd
[5] https://halide-lang.org
[6] https://github.com/numpy/numpy/pull/15228#issuecomment-581029991

From daniele at grinta.net Tue Feb 4 13:00:36 2020
From: daniele at grinta.net (Daniele Nicolodi)
Date: Tue, 4 Feb 2020 11:00:36 -0700
Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics
In-Reply-To: 
References: 
Message-ID: 

On 04-02-2020 08:08, Matti Picus wrote:
> Together with Sayed Adel (cc) and Ralf, I am pleased to put the draft
> version of NEP 38 [0] up for discussion. As per NEP 0, this is the next
> step in the community accepting the approach laid out in the NEP. The
> NEP PR [1] has already garnered a fair amount of discussion about the
> viability of Universal SIMD Intrinsics, so I will try to capture some of
> that here as well.

Hello,

more interesting prior art may be found in VOLK https://www.libvolk.org. VOLK is developed mainly to be used in GNU Radio, and this is reflected in the available kernels and in the supported data types; I think the approach used there may be of interest.

Cheers,
Dan

From raghuveer.devulapalli at intel.com Tue Feb 4 14:36:28 2020
From: raghuveer.devulapalli at intel.com (Devulapalli, Raghuveer)
Date: Tue, 4 Feb 2020 19:36:28 +0000
Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics
In-Reply-To: 
References: 
Message-ID: 

Hi everyone,

I know I had raised these questions in the PR, but wanted to post them on the mailing list as well.

1) Once NumPy adds the framework and an initial set of universal intrinsics, if contributors want to leverage a new architecture-specific SIMD instruction, will they be expected to add a software implementation of this instruction for all other architectures too?

2) On whom does the burden lie to ensure that new implementations are benchmarked and show benefits on every architecture? What happens if optimizing a ufunc leads to improving performance on one architecture and worsens performance on another?

Thanks,
Raghuveer

-----Original Message-----
From: NumPy-Discussion On Behalf Of Daniele Nicolodi
Sent: Tuesday, February 4, 2020 10:01 AM
To: numpy-discussion at python.org
Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics

On 04-02-2020 08:08, Matti Picus wrote:
> Together with Sayed Adel (cc) and Ralf, I am pleased to put the draft
> version of NEP 38 [0] up for discussion. As per NEP 0, this is the
> next step in the community accepting the approach laid out in the
> NEP. The NEP PR [1] has already garnered a fair amount of discussion
> about the viability of Universal SIMD Intrinsics, so I will try to
> capture some of that here as well.

Hello,

more interesting prior art may be found in VOLK https://www.libvolk.org. VOLK is developed mainly to be used in GNU Radio, and this is reflected in the available kernels and in the supported data types; I think the approach used there may be of interest.

Cheers,
Dan
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

From einstein.edison at gmail.com Tue Feb 4 14:59:54 2020
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Tue, 4 Feb 2020 19:59:54 +0000
Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics
In-Reply-To: 
References: 
Message-ID: 

[snip]

> 1) Once NumPy adds the framework and an initial set of universal intrinsics, if contributors want to leverage a new architecture-specific SIMD instruction, will they be expected to add a software implementation of this instruction for all other architectures too?

In my opinion, if the instructions are lower, then yes. For example, one cannot add AVX-512 without also adding, for example, AVX-256 and AVX-128 and SSE*. However, I would not expect one person or team to be an expert in all assemblies, so intrinsics for one architecture can be developed independently of another.
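A minimal sketch of that implication chain (the feature names and the exact grouping are made up here to illustrate the point, not NumPy's actual build configuration):

    # Illustration: each SIMD feature builds on the one(s) listed for it.
    IMPLIED = {
        "AVX512F": ["AVX2"],
        "AVX2": ["AVX"],
        "AVX": ["SSE42"],
        "SSE42": ["SSE2"],
        "SSE2": [],
    }

    def required_features(target):
        """All features that must be implemented before 'target' can be added."""
        needed, stack = [], [target]
        while stack:
            feature = stack.pop()
            if feature not in needed:
                needed.append(feature)
                stack.extend(IMPLIED.get(feature, []))
        return needed

    print(required_features("AVX512F"))
    # -> ['AVX512F', 'AVX2', 'AVX', 'SSE42', 'SSE2']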
> 2) On whom does the burden lie to ensure that new implementations are benchmarked and show benefits on every architecture? What happens if optimizing a ufunc leads to improving performance on one architecture and worsens performance on another?

I would look at this from a maintainability point of view. If we are increasing the code size by 20% for a certain ufunc, there must be a demonstrable 20% increase in performance on any CPU. That is to say, micro-optimisation will be unwelcome, and code readability will be preferable. Usually we ask the submitter of the PR to test the PR with a machine they have on hand, and I would be inclined to keep this trend of self-reporting. Of course, if someone else came along and reported a performance regression of, say, 10%, then we have increased code by 20% with only a net 5% gain in performance, and the PR will have to be reverted.

[snip]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net Tue Feb 4 16:06:02 2020
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 04 Feb 2020 13:06:02 -0800
Subject: [Numpy-discussion] NumPy Community Meeting Wednesday, Feb. 05
Message-ID: 

Hi all,

There will be a NumPy Community meeting Wednesday February 5 at 11 am Pacific Time. Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From charlesr.harris at gmail.com Tue Feb 4 16:17:04 2020
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 4 Feb 2020 14:17:04 -0700
Subject: [Numpy-discussion] manylinux upgrade for numpy wheels
Message-ID: 

Hi All,

Thought now would be a good time to decide on upgrading manylinux for the 1.19 release so that we can make sure that everything works as expected. The choices are

manylinux1 -- CentOS 5, currently used, gcc 4.2 (in practice 4.5), only supports i686, x86_64.
manylinux2010 -- CentOS 6, gcc 4.5, only supports i686, x86_64.
manylinux2014 -- CentOS 7, gcc 4.8, supports many more architectures.

The main advantage of manylinux2014 is that it supports many new architectures, some of which we are already testing against. The main disadvantage is that it requires pip >= 19.x, which may not be much of a problem 4 months from now but will undoubtedly cause some installation problems. Unfortunately, the compiler remains archaic, but folks interested in performance should be using a performance-oriented distribution or compiling for their native architecture.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com Tue Feb 4 17:36:42 2020
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 4 Feb 2020 14:36:42 -0800
Subject: [Numpy-discussion] manylinux upgrade for numpy wheels
In-Reply-To: 
References: 
Message-ID: 

Pretty sure the 2010 and 2014 images both have much newer compilers than that.

There are still a lot of users on CentOS 6, so I'd still stick to 2010 for now on x86_64 at least. We could potentially start adding 2014 wheels for the other platforms where we currently don't ship wheels -- gotta be better than nothing, right?

There probably still is some tail of end users whose pip is too old to know about 2010 wheels. I don't know how big that tail is. If we wanted to be really careful, we could ship both manylinux1 and manylinux2010 wheels for a bit -- pip will automatically pick the latest one it recognizes -- and see what the download numbers look like.

On Tue, Feb 4, 2020, 13:18 Charles R Harris wrote:
> Hi All,
>
> Thought now would be a good time to decide on upgrading manylinux for the
> 1.19 release so that we can make sure that everything works as expected.
> The choices are
>
> manylinux1 -- CentOS 5,
> currently used, gcc 4.2 (in practice 4.5), only supports i686, x86_64.
> manylinux2010 -- CentOS 6, > gcc 4.5, only supports i686, x86_64. > manylinux2014 -- CentOS 7, > gcc 4.8, supports many more architectures. > > The main advantage of manylinux2014 is that it supports many new > architectures, some of which we are already testing against. The main > disadvantage is that it requires pip >= 19.x, which may not be much of a > problem 4 months from now but will undoubtedly cause some installation > problems. Unfortunately, the compiler remains archaic, but folks interested > in performance should be using a performance oriented distribution or > compiling for their native architecture. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 5 04:06:17 2020 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 5 Feb 2020 09:06:17 +0000 Subject: [Numpy-discussion] manylinux upgrade for numpy wheels In-Reply-To: References: Message-ID: Hi, On Tue, Feb 4, 2020 at 10:38 PM Nathaniel Smith wrote: > > Pretty sure the 2010 and 2014 images both have much newer compilers than that. > > There are still a lot of users on CentOS 6, so I'd still stick to 2010 for now on x86_64 at least. We could potentially start adding 2014 wheels for the other platforms where we currently don't ship wheels ? gotta be better than nothing, right? > > There probably still is some tail of end users whose pip is too old to know about 2010 wheels. I don't know how big that tail is. If we wanted to be really careful, we could ship both manylinux1 and manylinux2010 wheels for a bit ? pip will automatically pick the latest one it recognizes ? and see what the download numbers look like. That all sounds right to me too. Cheers, Matthew > On Tue, Feb 4, 2020, 13:18 Charles R Harris wrote: >> >> Hi All, >> >> Thought now would be a good time to decide on upgrading manylinux for the 1.19 release so that we can make sure that everything works as expected. The choices are >> >> manylinux1 -- CentOS 5, currently used, gcc 4.2 (in practice 4.5), only supports i686, x86_64. >> manylinux2010 -- CentOS 6, gcc 4.5, only supports i686, x86_64. >> manylinux2014 -- CentOS 7, gcc 4.8, supports many more architectures. >> >> The main advantage of manylinux2014 is that it supports many new architectures, some of which we are already testing against. The main disadvantage is that it requires pip >= 19.x, which may not be much of a problem 4 months from now but will undoubtedly cause some installation problems. Unfortunately, the compiler remains archaic, but folks interested in performance should be using a performance oriented distribution or compiling for their native architecture. 
>>
>> Chuck
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From t3kcit at gmail.com Wed Feb 5 11:01:04 2020
From: t3kcit at gmail.com (Andreas Mueller)
Date: Wed, 5 Feb 2020 11:01:04 -0500
Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules
In-Reply-To: 
References: 
Message-ID: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com>

A bit late to the NEP 37 party.

I just wanted to say that at least from my perspective it seems a great solution that will help sklearn move towards more flexible compute engines. I think one of the biggest issues is array creation (including random arrays), and that's handled quite nicely with NEP 37.

There's some discussion on the scikit-learn side here:
https://github.com/scikit-learn/scikit-learn/pull/14963
https://github.com/scikit-learn/scikit-learn/issues/11447

Two different groups of people tried to use __array_function__ to delegate to MxNet and CuPy respectively in scikit-learn, and ran into the same issues.

There are some remaining issues in sklearn that will not be handled by NEP 37, but they go beyond NumPy in some sense. Just to briefly bring them up:

- We use scipy.linalg in many places, and we would need to do a separate dispatch to check whether we can use module.linalg instead (that might be an issue for many libraries, but I'm not sure).

- Some models have several possible optimization algorithms, some of which are pure numpy and some of which are Cython. If someone provides a different array module, we might want to choose an algorithm that is actually supported by that module. While this exact issue is maybe sklearn-specific, a similar issue could appear for most downstream libs that use Cython in some places. Many Cython algorithms could be implemented in pure numpy with a potential slowdown, but once we have NEP 37 there might be a benefit to having a pure NumPy implementation as an alternative code path.

Anyway, NEP 37 seems a great step in the right direction and would enable sklearn to actually dispatch in some places. Dispatching based just on __array_function__ does not seem really feasible so far.

Best,
Andreas Mueller

On 1/6/20 11:29 PM, Stephan Hoyer wrote:
> I am pleased to present a new NumPy Enhancement Proposal for
> discussion: "NEP-37: A dispatch protocol for NumPy-like modules."
> Feedback would be very welcome!
>
> The full text follows. The rendered proposal can also be found online
> at https://numpy.org/neps/nep-0037-array-module.html
>
> Best,
> Stephan Hoyer
>
> ===================================================
> NEP 37 -- A dispatch protocol for NumPy-like modules
> ===================================================
>
> :Author: Stephan Hoyer
> :Author: Hameer Abbasi
> :Author: Sebastian Berg
> :Status: Draft
> :Type: Standards Track
> :Created: 2019-12-29
>
> Abstract
> --------
>
> NEP-18's ``__array_function__`` has been a mixed success. Some projects (e.g.,
> dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted it. Others
> (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we propose a new
> protocol, ``__array_module__``, that we expect could eventually subsume most
> use-cases for ``__array_function__``.
The protocol requires explicit > adoption > by both users and library authors, which ensures backwards > compatibility, and > is also significantly simpler than ``__array_function__``, both of > which we > expect will make it easier to adopt. > > Why ``__array_function__`` hasn't been enough > --------------------------------------------- > > There are two broad ways in which NEP-18 has fallen short of its goals: > > 1. **Maintainability concerns**. `__array_function__` has significant > ? ?implications for libraries that use it: > > ? ?- Projects like `PyTorch > ? ? ?`_, `JAX > ? ? ?`_ and even `scipy.sparse > ? ? ?`_ have been > reluctant to > ? ? ?implement `__array_function__` in part because they are concerned > about > ? ? ?**breaking existing code**: users expect NumPy functions like > ? ? ?``np.concatenate`` to return NumPy arrays. This is a fundamental > ? ? ?limitation of the ``__array_function__`` design, which we chose > to allow > ? ? ?overriding the existing ``numpy`` namespace. > ? ?- ``__array_function__`` currently requires an "all or nothing" > approach to > ? ? ?implementing NumPy's API. There is no good pathway for **incremental > ? ? ?adoption**, which is particularly problematic for established > projects > ? ? ?for which adopting ``__array_function__`` would result in breaking > ? ? ?changes. > ? ?- It is no longer possible to use **aliases to NumPy functions** within > ? ? ?modules that support overrides. For example, both CuPy and JAX set > ? ? ?``result_type = np.result_type``. > ? ?- Implementing **fall-back mechanisms** for unimplemented NumPy > functions > ? ? ?by using NumPy's implementation is hard to get right (but see the > ? ? ?`version from dask `_), > because > ? ? ?``__array_function__`` does not present a consistent interface. > ? ? ?Converting all arguments of array type requires recursing into > generic > ? ? ?arguments of the form ``*args, **kwargs``. > > 2. **Limitations on what can be overridden.** ``__array_function__`` > has some > ? ?important gaps, most notably array creation and coercion functions: > > ? ?- **Array creation** routines (e.g., ``np.arange`` and those in > ? ? ?``np.random``) need some other mechanism for indicating what type of > ? ? ?arrays to create. `NEP 36 > `_ > ? ? ?proposed adding optional ``like=`` arguments to functions without > ? ? ?existing array arguments. However, we still lack any mechanism to > ? ? ?override methods on objects, such as those needed by > ? ? ?``np.random.RandomState``. > ? ?- **Array conversion** can't reuse the existing coercion functions like > ? ? ?``np.asarray``, because ``np.asarray`` sometimes means "convert to an > ? ? ?exact ``np.ndarray``" and other times means "convert to something > _like_ > ? ? ?a NumPy array." This led to the `NEP 30 > ? ? ?`_ > proposal for > ? ? ?a separate ``np.duckarray`` function, but this still does not > resolve how > ? ? ?to cast one duck array into a type matching another duck array. > > ``get_array_module`` and the ``__array_module__`` protocol > ---------------------------------------------------------- > > We propose a new user-facing mechanism for dispatching to a duck-array > implementation, ``numpy.get_array_module``. ``get_array_module`` > performs the > same type resolution as ``__array_function__`` and returns a module > with an API > promised to match the standard interface of ``numpy`` that can implement > operations on all provided array types. 
> > The protocol itself is both simpler and more powerful than > ``__array_function__``, because it doesn't need to worry about actually > implementing functions. We believe it resolves most of the > maintainability and > functionality limitations of ``__array_function__``. > > The new protocol is opt-in, explicit and with local control; see > :ref:`appendix-design-choices` for discussion on the importance of > these design > features. > > The array module contract > ========================= > > Modules returned by ``get_array_module``/``__array_module__`` should > make a > best effort to implement NumPy's core functionality on new array types(s). > Unimplemented functionality should simply be omitted (e.g., accessing an > unimplemented function should raise ``AttributeError``). In the future, we > anticipate codifying a protocol for requesting restricted subsets of > ``numpy``; > see :ref:`requesting-restricted-subsets` for more details. > > How to use ``get_array_module`` > =============================== > > Code that wants to support generic duck arrays should explicitly call > ``get_array_module`` to determine an appropriate array module from > which to > call functions, rather than using the ``numpy`` namespace directly. For > example: > > .. code:: python > > ? ? # calls the appropriate version of np.something for x and y > ? ? module = np.get_array_module(x, y) > ? ? module.something(x, y) > > Both array creation and array conversion are supported, because > dispatching is > handled by ``get_array_module`` rather than via the types of function > arguments. For example, to use random number generation functions or > methods, > we can simply pull out the appropriate submodule: > > .. code:: python > > ? ? def duckarray_add_random(array): > ? ? ? ? module = np.get_array_module(array) > ? ? ? ? noise = module.random.randn(*array.shape) > ? ? ? ? return array + noise > > We can also write the duck-array ``stack`` function from `NEP 30 > `_, without > the need > for a new ``np.duckarray`` function: > > .. code:: python > > ? ? def duckarray_stack(arrays): > ? ? ? ? module = np.get_array_module(*arrays) > ? ? ? ? arrays = [module.asarray(arr) for arr in arrays] > ? ? ? ? shapes = {arr.shape for arr in arrays} > ? ? ? ? if len(shapes) != 1: > ? ? ? ? ? ? raise ValueError('all input arrays must have the same shape') > ? ? ? ? expanded_arrays = [arr[module.newaxis, ...] for arr in arrays] > ? ? ? ? return module.concatenate(expanded_arrays, axis=0) > > By default, ``get_array_module`` will return the ``numpy`` module if no > arguments are arrays. This fall-back can be explicitly controlled by > providing > the ``module`` keyword-only argument. It is also possible to indicate > that an > exception should be raised instead of returning a default array module by > setting ``module=None``. > > How to implement ``__array_module__`` > ===================================== > > Libraries implementing a duck array type that want to support > ``get_array_module`` need to implement the corresponding protocol, > ``__array_module__``. This new protocol is based on Python's dispatch > protocol > for arithmetic, and is essentially a simpler version of > ``__array_function__``. > > Only one argument is passed into ``__array_module__``, a Python > collection of > unique array types passed into ``get_array_module``, i.e., all > arguments with > an ``__array_module__`` attribute. 
> > The special method should either return an namespace with an API matching > ``numpy``, or ``NotImplemented``, indicating that it does not know how to > handle the operation: > > .. code:: python > > ? ? class MyArray: > ? ? ? ? def __array_module__(self, types): > ? ? ? ? ? ? if not all(issubclass(t, MyArray) for t in types): > ? ? ? ? ? ? ? ? return NotImplemented > ? ? ? ? ? ? return my_array_module > > Returning custom objects from ``__array_module__`` > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > ``my_array_module`` will typically, but need not always, be a Python > module. > Returning a custom objects (e.g., with functions implemented via > ``__getattr__``) may be useful for some advanced use cases. > > For example, custom objects could allow for partial implementations of > duck > array modules that fall-back to NumPy (although this is not recommended in > general because such fall-back behavior can be error prone): > > .. code:: python > > ? ? class MyArray: > ? ? ? ? def __array_module__(self, types): > ? ? ? ? ? ? if all(issubclass(t, MyArray) for t in types): > ? ? ? ? ? ? ? ? return ArrayModule() > ? ? ? ? ? ? else: > ? ? ? ? ? ? ? ? return NotImplemented > > ? ? class ArrayModule: > ? ? ? ? def __getattr__(self, name): > ? ? ? ? ? ? import base_module > ? ? ? ? ? ? return getattr(base_module, name, getattr(numpy, name)) > > Subclassing from ``numpy.ndarray`` > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > All of the same guidance about well-defined type casting hierarchies from > NEP-18 still applies. ``numpy.ndarray`` itself contains a matching > implementation of ``__array_module__``, ?which is convenient for > subclasses: > > .. code:: python > > ? ? class ndarray: > ? ? ? ? def __array_module__(self, types): > ? ? ? ? ? ? if all(issubclass(t, ndarray) for t in types): > ? ? ? ? ? ? ? ? return numpy > ? ? ? ? ? ? else: > ? ? ? ? ? ? ? ? return NotImplemented > > NumPy's internal machinery > ========================== > > The type resolution rules of ``get_array_module`` follow the same model as > Python and NumPy's existing dispatch protocols: subclasses are called > before > super-classes, and otherwise left to right. ``__array_module__`` is > guaranteed > to be called only ?a single time on each unique type. > > The actual implementation of `get_array_module` will be in C, but > should be > equivalent to this Python code: > > .. code:: python > > ? ? def get_array_module(*arrays, default=numpy): > ? ? ? ? implementing_arrays, types = > _implementing_arrays_and_types(arrays) > ? ? ? ? if not implementing_arrays and default is not None: > ? ? ? ? ? ? return default > ? ? ? ? for array in implementing_arrays: > ? ? ? ? ? ? module = array.__array_module__(types) > ? ? ? ? ? ? if module is not NotImplemented: > ? ? ? ? ? ? ? ? return module > ? ? ? ? raise TypeError("no common array module found") > > ? ? def _implementing_arrays_and_types(relevant_arrays): > ? ? ? ? types = [] > ? ? ? ? implementing_arrays = [] > ? ? ? ? for array in relevant_arrays: > ? ? ? ? ? ? t = type(array) > ? ? ? ? ? ? if t not in types and hasattr(t, '__array_module__'): > ? ? ? ? ? ? ? ? types.append(t) > ? ? ? ? ? ? ? ? # Subclasses before superclasses, otherwise left to right > ? ? ? ? ? ? ? ? index = len(implementing_arrays) > ? ? ? ? ? ? ? ? for i, old_array in enumerate(implementing_arrays): > ? ? ? ? ? ? ? ? ? ? if issubclass(t, type(old_array)): > ? ? ? ? ? ? ? ? ? ? ? ? index = i > ? ? ? ? ? ? ? ? ? ? ? ? break > ? ? ? ? ? ? ? ? implementing_arrays.insert(index, array) > ? ? ? ? 
return implementing_arrays, types > > Relationship with ``__array_ufunc__`` and ``__array_function__`` > ---------------------------------------------------------------- > > These older protocols have distinct use-cases and should remain > =============================================================== > > ``__array_module__`` is intended to resolve limitations of > ``__array_function__``, so it is natural to consider whether it could > entirely > replace ``__array_function__``. This would offer dual benefits: (1) > simplifying > the user-story about how to override NumPy and (2) removing the slowdown > associated with checking for dispatch when calling every NumPy function. > > However, ``__array_module__`` and ``__array_function__`` are pretty > different > from a user perspective: it requires explicit calls to > ``get_array_function``, > rather than simply reusing original ``numpy`` functions. This is > probably fine > for *libraries* that rely on duck-arrays, but may be frustratingly > verbose for > interactive use. > > Some of the dispatching use-cases for ``__array_ufunc__`` are also > solved by > ``__array_module__``, but not all of them. For example, it is still > useful to > be able to define non-NumPy ufuncs (e.g., from Numba or SciPy) in a > generic way > on non-NumPy arrays (e.g., with dask.array). > > Given their existing adoption and distinct use cases, we don't think > it makes > sense to remove or deprecate ``__array_function__`` and > ``__array_ufunc__`` at > this time. > > Mixin classes to implement ``__array_function__`` and ``__array_ufunc__`` > ========================================================================= > > Despite the user-facing differences, ``__array_module__`` and a module > implementing NumPy's API still contain sufficient functionality needed to > implement dispatching with the existing duck array protocols. > > For example, the following mixin classes would provide sensible > defaults for > these special methods in terms of ``get_array_module`` and > ``__array_module__``: > > .. code:: python > > ? ? class ArrayUfuncFromModuleMixin: > > ? ? ? ? def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): > ? ? ? ? ? ? arrays = inputs + kwargs.get('out', ()) > ? ? ? ? ? ? try: > ? ? ? ? ? ? ? ? array_module = np.get_array_module(*arrays) > ? ? ? ? ? ? except TypeError: > ? ? ? ? ? ? ? ? return NotImplemented > > ? ? ? ? ? ? try: > ? ? ? ? ? ? ? ? # Note this may have false positive matches, if > ufunc.__name__ > ? ? ? ? ? ? ? ? # matches the name of a ufunc defined by NumPy. > Unfortunately > ? ? ? ? ? ? ? ? # there is no way to determine in which module a ufunc was > ? ? ? ? ? ? ? ? # defined. > ? ? ? ? ? ? ? ? new_ufunc = getattr(array_module, ufunc.__name__) > ? ? ? ? ? ? except AttributeError: > ? ? ? ? ? ? ? ? return NotImplemented > > ? ? ? ? ? ? try: > ? ? ? ? ? ? ? ? callable = getattr(new_ufunc, method) > ? ? ? ? ? ? except AttributeError: > ? ? ? ? ? ? ? ? return NotImplemented > > ? ? ? ? ? ? return callable(*inputs, **kwargs) > > ? ? class ArrayFunctionFromModuleMixin: > > ? ? ? ? def __array_function__(self, func, types, args, kwargs): > ? ? ? ? ? ? array_module = self.__array_module__(types) > ? ? ? ? ? ? if array_module is NotImplemented: > ? ? ? ? ? ? ? ? return NotImplemented > > ? ? ? ? ? ? # Traverse submodules to find the appropriate function > ? ? ? ? ? ? modules = func.__module__.split('.') > ? ? ? ? ? ? assert modules[0] == 'numpy' > ? ? ? ? ? ? for submodule in modules[1:]: > ? ? ? ? ? ? ? ? 
module = getattr(module, submodule, None) > ? ? ? ? ? ? new_func = getattr(module, func.__name__, None) > ? ? ? ? ? ? if new_func is None: > ? ? ? ? ? ? ? ? return NotImplemented > > ? ? ? ? ? ? return new_func(*args, **kwargs) > > To make it easier to write duck arrays, we could also add these mixin > classes > into ``numpy.lib.mixins`` (but the examples above may suffice). > > Alternatives considered > ----------------------- > > Naming > ====== > > We like the name ``__array_module__`` because it mirrors the existing > ``__array_function__`` and ``__array_ufunc__`` protocols. Another > reasonable > choice could be ``__array_namespace__``. > > It is less clear what the NumPy function that calls this protocol > should be > called (``get_array_module`` in this proposal). Some possible > alternatives: > ``array_module``, ``common_array_module``, ``resolve_array_module``, > ``get_namespace``, ``get_numpy``, ``get_numpylike_module``, > ``get_duck_array_module``. > > .. _requesting-restricted-subsets: > > Requesting restricted subsets of NumPy's API > ============================================ > > Over time, NumPy has accumulated a very large API surface, with over 600 > attributes in the top level ``numpy`` module alone. It is unlikely > that any > duck array library could or would want to implement all of these > functions and > classes, because the frequently used subset of NumPy is much smaller. > > We think it would be useful exercise to define "minimal" subset(s) of > NumPy's > API, omitting rarely used or non-recommended functionality. For example, > minimal NumPy might include ``stack``, but not the other stacking > functions > ``column_stack``, ``dstack``, ``hstack`` and ``vstack``. This could > clearly > indicate to duck array authors and users want functionality is core > and what > functionality they can skip. > > Support for requesting a restricted subset of NumPy's API would be a > natural > feature to include in ?``get_array_function`` and > ``__array_module__``, e.g., > > .. code:: python > > ? ? # array_module is only guaranteed to contain "minimal" NumPy > ? ? array_module = np.get_array_module(*arrays, request='minimal') > > To facilitate testing with NumPy and use with any valid duck array > library, > NumPy itself would return restricted versions of the ``numpy`` module when > ``get_array_module`` is called only on NumPy arrays. Omitted functions > would > simply not exist. > > Unfortunately, we have not yet figured out what these restricted > subsets should > be, so it doesn't make sense to do this yet. When/if we do, we could > either add > new keyword arguments to ``get_array_module`` or add new top level > functions, > e.g., ``get_minimal_array_module``. We would also need to add either a new > protocol patterned off of ``__array_module__`` (e.g., > ``__array_module_minimal__``), or could add an optional second argument to > ``__array_module__`` (catching errors with ``try``/``except``). > > A new namespace for implicit dispatch > ===================================== > > Instead of supporting overrides in the main `numpy` namespace with > ``__array_function__``, we could create a new opt-in namespace, e.g., > ``numpy.api``, with versions of NumPy functions that support > dispatching. These > overrides would need new opt-in protocols, e.g., > ``__array_function_api__`` > patterned off of ``__array_function__``. 
> > This would resolve the biggest limitations of ``__array_function__`` > by being > opt-in and would also allow for unambiguously overriding functions like > ``asarray``, because ``np.api.asarray`` would always mean "convert an > array-like object." ?But it wouldn't solve all the dispatching needs > met by > ``__array_module__``, and would leave us with supporting a > considerably more > complex protocol both for array users and implementors. > > We could potentially implement such a new namespace *via* the > ``__array_module__`` protocol. Certainly some users would find this > convenient, > because it is slightly less boilerplate. But this would leave users with a > confusing choice: when should they use `get_array_module` vs. > `np.api.something`. Also, we would have to add and maintain a whole > new module, > which is considerably more expensive than merely adding a function. > > Dispatching on both types and arrays instead of only types > ========================================================== > > Instead of supporting dispatch only via unique array types, we could also > support dispatch via array objects, e.g., by passing an ``arrays`` > argument as > part of the ``__array_module__`` protocol. This could potentially be > useful for > dispatch for arrays with metadata, such provided by Dask and Pint, but > would > impose costs in terms of type safety and complexity. > > For example, a library that supports arrays on both CPUs and GPUs > might decide > on which device to create a new arrays from functions like ``ones`` > based on > input arguments: > > .. code:: python > > ? ? class Array: > ? ? ? ? def __array_module__(self, types, arrays): > ? ? ? ? ? ? useful_arrays = tuple(a in arrays if isinstance(a, Array)) > ? ? ? ? ? ? if not useful_arrays: > ? ? ? ? ? ? ? ? return NotImplemented > ? ? ? ? ? ? prefer_gpu = any(a.prefer_gpu for a in useful_arrays) > ? ? ? ? ? ? return ArrayModule(prefer_gpu) > > ? ? class ArrayModule: > ? ? ? ? def __init__(self, prefer_gpu): > ? ? ? ? ? ? self.prefer_gpu = prefer_gpu > > ? ? ? ? def __getattr__(self, name): > ? ? ? ? ? ? import base_module > ? ? ? ? ? ? base_func = getattr(base_module, name) > ? ? ? ? ? ? return functools.partial(base_func, > prefer_gpu=self.prefer_gpu) > > This might be useful, but it's not clear if we really need it. Pint > seems to > get along OK without any explicit array creation routines (favoring > multiplication by units, e.g., ``np.ones(5) * ureg.m``), and for the > most part > Dask is also OK with existing ``__array_function__`` style overides (e.g., > favoring ``np.ones_like`` over ``np.ones``). Choosing whether to place > an array > on the CPU or GPU could be solved by `making array creation lazy > `_. > > .. _appendix-design-choices: > > Appendix: design choices for API overrides > ------------------------------------------ > > There is a large range of possible design choices for overriding > NumPy's API. > Here we discuss three major axes of the design decision that guided > our design > for ``__array_module__``. > > Opt-in vs. opt-out for users > ============================ > > The ``__array_ufunc__`` and ``__array_function__`` protocols provide a > mechanism for overriding NumPy functions *within NumPy's existing > namespace*. > This means that users need to explicitly opt-out if they do not want any > overridden behavior, e.g., by casting arrays with ``np.asarray()``. 
> > In theory, this approach lowers the barrier for adopting these > protocols in > user code and libraries, because code that uses the standard NumPy > namespace is > automatically compatible. But in practice, this hasn't worked out. For > example, > most well-maintained libraries that use NumPy follow the best practice of > casting all inputs with ``np.asarray()``, which they would have to > explicitly > relax to use ``__array_function__``. Our experience has been that making a > library compatible with a new duck array type typically requires at > least a > small amount of work to accommodate differences in the data model and > operations > that can be implemented efficiently. > > These opt-out approaches also considerably complicate backwards > compatibility > for libraries that adopt these protocols, because by opting in as a > library > they also opt-in their users, whether they expect it or not. For > winning over > libraries that have been unable to adopt ``__array_function__``, an opt-in > approach seems like a must. > > Explicit vs. implicit choice of implementation > ============================================== > > Both ``__array_ufunc__`` and ``__array_function__`` have implicit > control over > dispatching: the dispatched functions are determined via the appropriate > protocols in every function call. This generalizes well to handling many > different types of objects, as evidenced by its use for implementing > arithmetic > operators in Python, but it has two downsides: > > 1. *Speed*: it imposes additional overhead in every function call, > because each > ? ?function call needs to inspect each of its arguments for overrides. > This is > ? ?why arithmetic on builtin Python numbers is slow. > 2. *Readability*: it is not longer immediately evident to readers of > code what > ? ?happens when a function is called, because the function's > implementation > ? ?could be overridden by any of its arguments. > > In contrast, importing a new library (e.g., ``import ?dask.array as > da``) with > an API matching NumPy is entirely explicit. There is no overhead from > dispatch > or ambiguity about which implementation is being used. > > Explicit and implicit choice of implementations are not mutually exclusive > options. Indeed, most implementations of NumPy API overrides via > ``__array_function__`` that we are familiar with (namely, dask, CuPy and > sparse, but not Pint) also include an explicit way to use their version of > NumPy's API by importing a module directly (``dask.array``, ``cupy`` or > ``sparse``, respectively). > > Local vs. non-local vs. global control > ====================================== > > The final design axis is how users control the choice of API: > > - **Local control**, as exemplified by multiple dispatch and Python > protocols for > ? arithmetic, determines which implementation to use either by > checking types > ? or calling methods on the direct arguments of a function. > - **Non-local control** such as `np.errstate > ? > `_ > ? overrides behavior with global-state via function decorators or > ? context-managers. Control is determined hierarchically, via the > inner-most > ? context. > - **Global control** provides a mechanism for users to set default > behavior, > ? either via function calls or configuration files. For example, > matplotlib > ? allows setting a global choice of plotting backend. > > Local control is generally considered a best practice for API design, > because > control flow is entirely explicit, which makes it the easiest to > understand. 
> Non-local and global control are occasionally used, but generally > either due to > ignorance or a lack of better alternatives. > > In the case of duck typing for NumPy's public API, we think non-local > or global > control would be mistakes, mostly because they **don't compose well**. > If one > library sets/needs one set of overrides and then internally calls a > routine > that expects another set of overrides, the resulting behavior may be very > surprising. Higher order functions are especially problematic, because the > context in which functions are evaluated may not be the context in > which they > are defined. > > One class of override use cases where we think non-local and global > control are > appropriate is for choosing a backend system that is guaranteed to have an > entirely consistent interface, such as a faster alternative > implementation of > ``numpy.fft`` on NumPy arrays. However, these are out of scope for the > current > proposal, which is focused on duck arrays. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Feb 5 12:37:53 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 5 Feb 2020 11:37:53 -0600 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> Message-ID: On Wed, Feb 5, 2020 at 10:01 AM Andreas Mueller wrote: > A bit late to the NEP 37 party. > I just wanted to say that at least from my perspective it seems a great > solution that will help sklearn move towards more flexible compute engines. > I think one of the biggest issues is array creation (including random > arrays), and that's handled quite nicely with NEP 37. > > There's some discussion on the scikit-learn side here: > https://github.com/scikit-learn/scikit-learn/pull/14963 > https://github.com/scikit-learn/scikit-learn/issues/11447 > > Two different groups of people tried to use __array_function__ to delegate > to MxNet and CuPy respectively in scikit-learn, and ran into the same > issues. > > There's some remaining issues in sklearn that will not be handled by NEP > 37 but they go beyond NumPy in some sense. > Just to briefly bring them up: > > - We use scipy.linalg in many places, and we would need to do a separate > dispatching to check whether we can use module.linalg instead > (that might be an issue for many libraries but I'm not sure). > That is an issue, and goes in the opposite direction we need - scipy.linalg is a superset of numpy.linalg, so we'd like to encourage using scipy. This is something we may want to consider fixing by making the dispatch decorator public in numpy and adopting in scipy. Cheers, Ralf > > - Some models have several possible optimization algorithms, some of which > are pure numpy and some which are Cython. If someone provides a different > array module, > we might want to choose an algorithm that is actually supported by that > module. While this exact issue is maybe sklearn specific, a similar issue > could appear for most downstream libs that use Cython in some places. 
> Many Cython algorithms could be implemented in pure numpy with a > potential slowdown, but once we have NEP 37 there might be a benefit to > having a pure NumPy implementation as an alternative code path. > > > Anyway, NEP 37 seems a great step in the right direction and would enable > sklearn to actually dispatch in some places. Dispatching just based on > __array_function__ seems not really feasible so far. > > Best, > Andreas Mueller > > > On 1/6/20 11:29 PM, Stephan Hoyer wrote: > > I am pleased to present a new NumPy Enhancement Proposal for discussion: > "NEP-37: A dispatch protocol for NumPy-like modules." Feedback would be > very welcome! > > The full text follows. The rendered proposal can also be found online at > https://numpy.org/neps/nep-0037-array-module.html > > Best, > Stephan Hoyer > > =================================================== > NEP 37 ? A dispatch protocol for NumPy-like modules > =================================================== > > :Author: Stephan Hoyer > :Author: Hameer Abbasi > :Author: Sebastian Berg > :Status: Draft > :Type: Standards Track > :Created: 2019-12-29 > > Abstract > -------- > > NEP-18's ``__array_function__`` has been a mixed success. Some projects > (e.g., > dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted it. Others > (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we propose a new > protocol, ``__array_module__``, that we expect could eventually subsume > most > use-cases for ``__array_function__``. The protocol requires explicit > adoption > by both users and library authors, which ensures backwards compatibility, > and > is also significantly simpler than ``__array_function__``, both of which we > expect will make it easier to adopt. > > Why ``__array_function__`` hasn't been enough > --------------------------------------------- > > There are two broad ways in which NEP-18 has fallen short of its goals: > > 1. **Maintainability concerns**. `__array_function__` has significant > implications for libraries that use it: > > - Projects like `PyTorch > `_, `JAX > `_ and even `scipy.sparse > `_ have been reluctant > to > implement `__array_function__` in part because they are concerned > about > **breaking existing code**: users expect NumPy functions like > ``np.concatenate`` to return NumPy arrays. This is a fundamental > limitation of the ``__array_function__`` design, which we chose to > allow > overriding the existing ``numpy`` namespace. > - ``__array_function__`` currently requires an "all or nothing" > approach to > implementing NumPy's API. There is no good pathway for **incremental > adoption**, which is particularly problematic for established projects > for which adopting ``__array_function__`` would result in breaking > changes. > - It is no longer possible to use **aliases to NumPy functions** within > modules that support overrides. For example, both CuPy and JAX set > ``result_type = np.result_type``. > - Implementing **fall-back mechanisms** for unimplemented NumPy > functions > by using NumPy's implementation is hard to get right (but see the > `version from dask `_), > because > ``__array_function__`` does not present a consistent interface. > Converting all arguments of array type requires recursing into generic > arguments of the form ``*args, **kwargs``. > > 2. 
**Limitations on what can be overridden.** ``__array_function__`` has > some > important gaps, most notably array creation and coercion functions: > > - **Array creation** routines (e.g., ``np.arange`` and those in > ``np.random``) need some other mechanism for indicating what type of > arrays to create. `NEP 36 >`_ > proposed adding optional ``like=`` arguments to functions without > existing array arguments. However, we still lack any mechanism to > override methods on objects, such as those needed by > ``np.random.RandomState``. > - **Array conversion** can't reuse the existing coercion functions like > ``np.asarray``, because ``np.asarray`` sometimes means "convert to an > exact ``np.ndarray``" and other times means "convert to something > _like_ > a NumPy array." This led to the `NEP 30 > `_ > proposal for > a separate ``np.duckarray`` function, but this still does not resolve > how > to cast one duck array into a type matching another duck array. > > ``get_array_module`` and the ``__array_module__`` protocol > ---------------------------------------------------------- > > We propose a new user-facing mechanism for dispatching to a duck-array > implementation, ``numpy.get_array_module``. ``get_array_module`` performs > the > same type resolution as ``__array_function__`` and returns a module with > an API > promised to match the standard interface of ``numpy`` that can implement > operations on all provided array types. > > The protocol itself is both simpler and more powerful than > ``__array_function__``, because it doesn't need to worry about actually > implementing functions. We believe it resolves most of the maintainability > and > functionality limitations of ``__array_function__``. > > The new protocol is opt-in, explicit and with local control; see > :ref:`appendix-design-choices` for discussion on the importance of these > design > features. > > The array module contract > ========================= > > Modules returned by ``get_array_module``/``__array_module__`` should make a > best effort to implement NumPy's core functionality on new array types(s). > Unimplemented functionality should simply be omitted (e.g., accessing an > unimplemented function should raise ``AttributeError``). In the future, we > anticipate codifying a protocol for requesting restricted subsets of > ``numpy``; > see :ref:`requesting-restricted-subsets` for more details. > > How to use ``get_array_module`` > =============================== > > Code that wants to support generic duck arrays should explicitly call > ``get_array_module`` to determine an appropriate array module from which to > call functions, rather than using the ``numpy`` namespace directly. For > example: > > .. code:: python > > # calls the appropriate version of np.something for x and y > module = np.get_array_module(x, y) > module.something(x, y) > > Both array creation and array conversion are supported, because > dispatching is > handled by ``get_array_module`` rather than via the types of function > arguments. For example, to use random number generation functions or > methods, > we can simply pull out the appropriate submodule: > > .. code:: python > > def duckarray_add_random(array): > module = np.get_array_module(array) > noise = module.random.randn(*array.shape) > return array + noise > > We can also write the duck-array ``stack`` function from `NEP 30 > `_, without the > need > for a new ``np.duckarray`` function: > > .. 
code:: python > > def duckarray_stack(arrays): > module = np.get_array_module(*arrays) > arrays = [module.asarray(arr) for arr in arrays] > shapes = {arr.shape for arr in arrays} > if len(shapes) != 1: > raise ValueError('all input arrays must have the same shape') > expanded_arrays = [arr[module.newaxis, ...] for arr in arrays] > return module.concatenate(expanded_arrays, axis=0) > > By default, ``get_array_module`` will return the ``numpy`` module if no > arguments are arrays. This fall-back can be explicitly controlled by > providing > the ``module`` keyword-only argument. It is also possible to indicate that > an > exception should be raised instead of returning a default array module by > setting ``module=None``. > > How to implement ``__array_module__`` > ===================================== > > Libraries implementing a duck array type that want to support > ``get_array_module`` need to implement the corresponding protocol, > ``__array_module__``. This new protocol is based on Python's dispatch > protocol > for arithmetic, and is essentially a simpler version of > ``__array_function__``. > > Only one argument is passed into ``__array_module__``, a Python collection > of > unique array types passed into ``get_array_module``, i.e., all arguments > with > an ``__array_module__`` attribute. > > The special method should either return a namespace with an API matching > ``numpy``, or ``NotImplemented``, indicating that it does not know how to > handle the operation: > > .. code:: python > > class MyArray: > def __array_module__(self, types): > if not all(issubclass(t, MyArray) for t in types): > return NotImplemented > return my_array_module > > Returning custom objects from ``__array_module__`` > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > ``my_array_module`` will typically, but need not always, be a Python > module. > Returning custom objects (e.g., with functions implemented via > ``__getattr__``) may be useful for some advanced use cases. > > For example, custom objects could allow for partial implementations of duck > array modules that fall-back to NumPy (although this is not recommended in > general because such fall-back behavior can be error prone): > > .. code:: python > > class MyArray: > def __array_module__(self, types): > if all(issubclass(t, MyArray) for t in types): > return ArrayModule() > else: > return NotImplemented > > class ArrayModule: > def __getattr__(self, name): > import base_module > return getattr(base_module, name, getattr(numpy, name)) > > Subclassing from ``numpy.ndarray`` > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > All of the same guidance about well-defined type casting hierarchies from > NEP-18 still applies. ``numpy.ndarray`` itself contains a matching > implementation of ``__array_module__``, which is convenient for > subclasses: > > .. code:: python > > class ndarray: > def __array_module__(self, types): > if all(issubclass(t, ndarray) for t in types): > return numpy > else: > return NotImplemented > > NumPy's internal machinery > ========================== > > The type resolution rules of ``get_array_module`` follow the same model as > Python and NumPy's existing dispatch protocols: subclasses are called > before > super-classes, and otherwise left to right. ``__array_module__`` is > guaranteed > to be called only a single time on each unique type. > > The actual implementation of `get_array_module` will be in C, but should be > equivalent to this Python code: > > ..
code:: python > > def get_array_module(*arrays, default=numpy): > implementing_arrays, types = _implementing_arrays_and_types(arrays) > if not implementing_arrays and default is not None: > return default > for array in implementing_arrays: > module = array.__array_module__(types) > if module is not NotImplemented: > return module > raise TypeError("no common array module found") > > def _implementing_arrays_and_types(relevant_arrays): > types = [] > implementing_arrays = [] > for array in relevant_arrays: > t = type(array) > if t not in types and hasattr(t, '__array_module__'): > types.append(t) > # Subclasses before superclasses, otherwise left to right > index = len(implementing_arrays) > for i, old_array in enumerate(implementing_arrays): > if issubclass(t, type(old_array)): > index = i > break > implementing_arrays.insert(index, array) > return implementing_arrays, types > > Relationship with ``__array_ufunc__`` and ``__array_function__`` > ---------------------------------------------------------------- > > These older protocols have distinct use-cases and should remain > =============================================================== > > ``__array_module__`` is intended to resolve limitations of > ``__array_function__``, so it is natural to consider whether it could > entirely > replace ``__array_function__``. This would offer dual benefits: (1) > simplifying > the user-story about how to override NumPy and (2) removing the slowdown > associated with checking for dispatch when calling every NumPy function. > > However, ``__array_module__`` and ``__array_function__`` are pretty > different > from a user perspective: it requires explicit calls to > ``get_array_module``, > rather than simply reusing original ``numpy`` functions. This is probably > fine > for *libraries* that rely on duck-arrays, but may be frustratingly verbose > for > interactive use. > > Some of the dispatching use-cases for ``__array_ufunc__`` are also solved > by > ``__array_module__``, but not all of them. For example, it is still useful > to > be able to define non-NumPy ufuncs (e.g., from Numba or SciPy) in a > generic way > on non-NumPy arrays (e.g., with dask.array). > > Given their existing adoption and distinct use cases, we don't think it > makes > sense to remove or deprecate ``__array_function__`` and > ``__array_ufunc__`` at > this time. > > Mixin classes to implement ``__array_function__`` and ``__array_ufunc__`` > ========================================================================== > > Despite the user-facing differences, ``__array_module__`` and a module > implementing NumPy's API still contain sufficient functionality needed to > implement dispatching with the existing duck array protocols. > > For example, the following mixin classes would provide sensible defaults > for > these special methods in terms of ``get_array_module`` and > ``__array_module__``: > > .. code:: python > > class ArrayUfuncFromModuleMixin: > > def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): > arrays = inputs + kwargs.get('out', ()) > try: > array_module = np.get_array_module(*arrays) > except TypeError: > return NotImplemented > > try: > # Note this may have false positive matches, if > ufunc.__name__ > # matches the name of a ufunc defined by NumPy. > Unfortunately > # there is no way to determine in which module a ufunc was > # defined.
> new_ufunc = getattr(array_module, ufunc.__name__) > except AttributeError: > return NotImplemented > > try: > callable = getattr(new_ufunc, method) > except AttributeError: > return NotImplemented > > return callable(*inputs, **kwargs) > > class ArrayFunctionFromModuleMixin: > > def __array_function__(self, func, types, args, kwargs): > array_module = self.__array_module__(types) > if array_module is NotImplemented: > return NotImplemented > > # Traverse submodules to find the appropriate function > modules = func.__module__.split('.') > assert modules[0] == 'numpy' > module = array_module > for submodule in modules[1:]: > module = getattr(module, submodule, None) > new_func = getattr(module, func.__name__, None) > if new_func is None: > return NotImplemented > > return new_func(*args, **kwargs) > > To make it easier to write duck arrays, we could also add these mixin > classes > into ``numpy.lib.mixins`` (but the examples above may suffice). > > Alternatives considered > ----------------------- > > Naming > ====== > > We like the name ``__array_module__`` because it mirrors the existing > ``__array_function__`` and ``__array_ufunc__`` protocols. Another > reasonable > choice could be ``__array_namespace__``. > > It is less clear what the NumPy function that calls this protocol should be > called (``get_array_module`` in this proposal). Some possible alternatives: > ``array_module``, ``common_array_module``, ``resolve_array_module``, > ``get_namespace``, ``get_numpy``, ``get_numpylike_module``, > ``get_duck_array_module``. > > .. _requesting-restricted-subsets: > > Requesting restricted subsets of NumPy's API > ============================================ > > Over time, NumPy has accumulated a very large API surface, with over 600 > attributes in the top level ``numpy`` module alone. It is unlikely that any > duck array library could or would want to implement all of these functions > and > classes, because the frequently used subset of NumPy is much smaller. > > We think it would be a useful exercise to define "minimal" subset(s) of > NumPy's > API, omitting rarely used or non-recommended functionality. For example, > minimal NumPy might include ``stack``, but not the other stacking functions > ``column_stack``, ``dstack``, ``hstack`` and ``vstack``. This could clearly > indicate to duck array authors and users what functionality is core and > what > functionality they can skip. > > Support for requesting a restricted subset of NumPy's API would be a > natural > feature to include in ``get_array_module`` and ``__array_module__``, > e.g., > > .. code:: python > > # array_module is only guaranteed to contain "minimal" NumPy > array_module = np.get_array_module(*arrays, request='minimal') > > To facilitate testing with NumPy and use with any valid duck array library, > NumPy itself would return restricted versions of the ``numpy`` module when > ``get_array_module`` is called only on NumPy arrays. Omitted functions > would > simply not exist. > > Unfortunately, we have not yet figured out what these restricted subsets > should > be, so it doesn't make sense to do this yet. When/if we do, we could > either add > new keyword arguments to ``get_array_module`` or add new top level > functions, > e.g., ``get_minimal_array_module``. We would also need to add either a new > protocol patterned off of ``__array_module__`` (e.g., > ``__array_module_minimal__``), or could add an optional second argument to > ``__array_module__`` (catching errors with ``try``/``except``).
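As a rough sketch of that last option, the ``try``/``except`` probing could look like the helper below; the ``request`` keyword and the helper itself are hypothetical, not part of the proposal:

.. code:: python

    def _call_array_module(array, types, request=None):
        # Hypothetical: ask for a restricted subset when the implementation
        # accepts a second argument, otherwise fall back to the one-argument
        # form of __array_module__.
        try:
            return array.__array_module__(types, request=request)
        except TypeError:
            return array.__array_module__(types)

One drawback of this pattern is that it cannot distinguish "does not accept a second argument" from a genuine ``TypeError`` raised inside the implementation, which would be an argument for a separate protocol name instead.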
> > A new namespace for implicit dispatch > ===================================== > > Instead of supporting overrides in the main `numpy` namespace with > ``__array_function__``, we could create a new opt-in namespace, e.g., > ``numpy.api``, with versions of NumPy functions that support dispatching. > These > overrides would need new opt-in protocols, e.g., ``__array_function_api__`` > patterned off of ``__array_function__``. > > This would resolve the biggest limitations of ``__array_function__`` by > being > opt-in and would also allow for unambiguously overriding functions like > ``asarray``, because ``np.api.asarray`` would always mean "convert an > array-like object." But it wouldn't solve all the dispatching needs met by > ``__array_module__``, and would leave us with supporting a considerably > more > complex protocol both for array users and implementors. > > We could potentially implement such a new namespace *via* the > ``__array_module__`` protocol. Certainly some users would find this > convenient, > because it is slightly less boilerplate. But this would leave users with a > confusing choice: when should they use `get_array_module` vs. > `np.api.something`. Also, we would have to add and maintain a whole new > module, > which is considerably more expensive than merely adding a function. > > Dispatching on both types and arrays instead of only types > =========================================================== > > Instead of supporting dispatch only via unique array types, we could also > support dispatch via array objects, e.g., by passing an ``arrays`` > argument as > part of the ``__array_module__`` protocol. This could potentially be > useful for > dispatch for arrays with metadata, such as those provided by Dask and Pint, > but would > impose costs in terms of type safety and complexity. > > For example, a library that supports arrays on both CPUs and GPUs might > decide > on which device to create new arrays from functions like ``ones`` based > on > input arguments: > > .. code:: python > > class Array: > def __array_module__(self, types, arrays): > useful_arrays = tuple(a for a in arrays if isinstance(a, Array)) > if not useful_arrays: > return NotImplemented > prefer_gpu = any(a.prefer_gpu for a in useful_arrays) > return ArrayModule(prefer_gpu) > > class ArrayModule: > def __init__(self, prefer_gpu): > self.prefer_gpu = prefer_gpu > > def __getattr__(self, name): > import base_module > base_func = getattr(base_module, name) > return functools.partial(base_func, prefer_gpu=self.prefer_gpu) > > This might be useful, but it's not clear if we really need it. Pint seems > to > get along OK without any explicit array creation routines (favoring > multiplication by units, e.g., ``np.ones(5) * ureg.m``), and for the most > part > Dask is also OK with existing ``__array_function__`` style overrides (e.g., > favoring ``np.ones_like`` over ``np.ones``). Choosing whether to place an > array > on the CPU or GPU could be solved by `making array creation lazy > `_. > > .. _appendix-design-choices: > > Appendix: design choices for API overrides > ------------------------------------------ > > There is a large range of possible design choices for overriding NumPy's > API. > Here we discuss three major axes of the design decision that guided our > design > for ``__array_module__``. > > Opt-in vs. opt-out for users > ============================ > > The ``__array_ufunc__`` and ``__array_function__`` protocols provide a > mechanism for overriding NumPy functions *within NumPy's existing > namespace*.
> This means that users need to explicitly opt-out if they do not want any > overridden behavior, e.g., by casting arrays with ``np.asarray()``. > > In theory, this approach lowers the barrier for adopting these protocols in > user code and libraries, because code that uses the standard NumPy > namespace is > automatically compatible. But in practice, this hasn't worked out. For > example, > most well-maintained libraries that use NumPy follow the best practice of > casting all inputs with ``np.asarray()``, which they would have to > explicitly > relax to use ``__array_function__``. Our experience has been that making a > library compatible with a new duck array type typically requires at least a > small amount of work to accommodate differences in the data model and > operations > that can be implemented efficiently. > > These opt-out approaches also considerably complicate backwards > compatibility > for libraries that adopt these protocols, because by opting in as a library > they also opt in their users, whether they expect it or not. For winning > over > libraries that have been unable to adopt ``__array_function__``, an opt-in > approach seems like a must. > > Explicit vs. implicit choice of implementation > ============================================== > > Both ``__array_ufunc__`` and ``__array_function__`` have implicit control > over > dispatching: the dispatched functions are determined via the appropriate > protocols in every function call. This generalizes well to handling many > different types of objects, as evidenced by its use for implementing > arithmetic > operators in Python, but it has two downsides: > > 1. *Speed*: it imposes additional overhead in every function call, because > each > function call needs to inspect each of its arguments for overrides. > This is > why arithmetic on builtin Python numbers is slow. > 2. *Readability*: it is no longer immediately evident to readers of code > what > happens when a function is called, because the function's implementation > could be overridden by any of its arguments. > > In contrast, importing a new library (e.g., ``import dask.array as da``) > with > an API matching NumPy is entirely explicit. There is no overhead from > dispatch > or ambiguity about which implementation is being used. > > Explicit and implicit choice of implementations are not mutually exclusive > options. Indeed, most implementations of NumPy API overrides via > ``__array_function__`` that we are familiar with (namely, dask, CuPy and > sparse, but not Pint) also include an explicit way to use their version of > NumPy's API by importing a module directly (``dask.array``, ``cupy`` or > ``sparse``, respectively). > > Local vs. non-local vs. global control > ====================================== > > The final design axis is how users control the choice of API: > > - **Local control**, as exemplified by multiple dispatch and Python > protocols for > arithmetic, determines which implementation to use either by checking > types > or calling methods on the direct arguments of a function. > - **Non-local control** such as `np.errstate > <https://docs.scipy.org/doc/numpy/reference/generated/numpy.errstate.html>`_ > overrides behavior with global state via function decorators or > context-managers. Control is determined hierarchically, via the > inner-most > context. > - **Global control** provides a mechanism for users to set default > behavior, > either via function calls or configuration files.
For example, matplotlib > allows setting a global choice of plotting backend. > > Local control is generally considered a best practice for API design, > because > control flow is entirely explicit, which makes it the easiest to > understand. > Non-local and global control are occasionally used, but generally either > due to > ignorance or a lack of better alternatives. > > In the case of duck typing for NumPy's public API, we think non-local or > global > control would be mistakes, mostly because they **don't compose well**. If > one > library sets/needs one set of overrides and then internally calls a routine > that expects another set of overrides, the resulting behavior may be very > surprising. Higher order functions are especially problematic, because the > context in which functions are evaluated may not be the context in which > they > are defined. > > One class of override use cases where we think non-local and global > control are > appropriate is for choosing a backend system that is guaranteed to have an > entirely consistent interface, such as a faster alternative implementation > of > ``numpy.fft`` on NumPy arrays. However, these are out of scope for the > current > proposal, which is focused on duck arrays.

From wieser.eric+numpy at gmail.com Wed Feb 5 13:13:52 2020 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Wed, 5 Feb 2020 18:13:52 +0000 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> Message-ID: > scipy.linalg is a superset of numpy.linalg This isn't completely accurate - numpy.linalg supports almost all operations* over stacks of matrices via gufuncs, but scipy.linalg does not appear to. Eric *: not lstsq due to an ungeneralizable public API On Wed, 5 Feb 2020 at 17:38, Ralf Gommers wrote: > > > On Wed, Feb 5, 2020 at 10:01 AM Andreas Mueller wrote: > >> A bit late to the NEP 37 party. >> I just wanted to say that at least from my perspective it seems a great >> solution that will help sklearn move towards more flexible compute engines. >> I think one of the biggest issues is array creation (including random >> arrays), and that's handled quite nicely with NEP 37. >> >> There's some discussion on the scikit-learn side here: >> https://github.com/scikit-learn/scikit-learn/pull/14963 >> https://github.com/scikit-learn/scikit-learn/issues/11447 >> >> Two different groups of people tried to use __array_function__ to >> delegate to MxNet and CuPy respectively in scikit-learn, and ran into the >> same issues. >> >> There's some remaining issues in sklearn that will not be handled by NEP >> 37 but they go beyond NumPy in some sense. >> Just to briefly bring them up: >> >> - We use scipy.linalg in many places, and we would need to do a separate >> dispatching to check whether we can use module.linalg instead >> (that might be an issue for many libraries but I'm not sure). >> > > That is an issue, and goes in the opposite direction we need - > scipy.linalg is a superset of numpy.linalg, so we'd like to encourage using > scipy.
This is something we may want to consider fixing by making the > dispatch decorator public in numpy and adopting in scipy. > > Cheers, > Ralf > > > >> >> - Some models have several possible optimization algorithms, some of >> which are pure numpy and some which are Cython. If someone provides a >> different array module, >> we might want to choose an algorithm that is actually supported by that >> module. While this exact issue is maybe sklearn specific, a similar issue >> could appear for most downstream libs that use Cython in some places. >> Many Cython algorithms could be implemented in pure numpy with a >> potential slowdown, but once we have NEP 37 there might be a benefit to >> having a pure NumPy implementation as an alternative code path. >> >> >> Anyway, NEP 37 seems a great step in the right direction and would enable >> sklearn to actually dispatch in some places. Dispatching just based on >> __array_function__ seems not really feasible so far. >> >> Best, >> Andreas Mueller
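For context on the dispatch-decorator idea above: the decorator NumPy uses to attach ``__array_function__`` dispatch to its own functions currently lives in the private ``numpy.core.overrides`` module. A rough sketch of what adopting it for a ``scipy.linalg`` function could look like follows; it is purely illustrative, since neither a public location for the decorator nor its use in SciPy has been agreed:

.. code:: python

    # Illustrative only: array_function_dispatch is a private NumPy helper today.
    from numpy.core.overrides import array_function_dispatch

    def _lu_dispatcher(a, permute_l=None, overwrite_a=None, check_finite=None):
        # The dispatcher only tells the override machinery which arguments
        # to search for __array_function__ implementations.
        return (a,)

    @array_function_dispatch(_lu_dispatcher)
    def lu(a, permute_l=False, overwrite_a=False, check_finite=True):
        ...  # existing scipy.linalg implementation, unchanged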
From ralf.gommers at gmail.com Wed Feb 5 13:22:19 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 5 Feb 2020 12:22:19 -0600 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> Message-ID: On Wed, Feb 5, 2020 at 12:14 PM Eric Wieser wrote: > > scipy.linalg is a superset of numpy.linalg > > This isn't completely accurate - numpy.linalg supports almost all > operations* over stacks of matrices via gufuncs, but scipy.linalg does not > appear to. > > Eric > > *: not lstsq due to an ungeneralizable public API > That's true for `qr` as well I believe. Indeed some functions have diverged slightly, but that's not on purpose, more like a lack of time to coordinate. We would like to fix that so everything is in sync and fully API-compatible again. Ralf
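To make the stacked-matrices point concrete: almost all ``numpy.linalg`` operations are gufuncs that broadcast over leading dimensions, while their ``scipy.linalg`` counterparts operate on a single 2-D matrix. A minimal illustration, with ``inv`` chosen because it generalizes cleanly and the shapes arbitrary:

.. code:: python

    import numpy as np
    import scipy.linalg

    stacked = np.random.rand(4, 3, 3)   # a stack of four 3x3 matrices
    np.linalg.inv(stacked).shape        # (4, 3, 3): inverts each matrix in the stack
    # scipy.linalg.inv(stacked)         # raises ValueError: expects a single 2-D square matrix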
>>> Many Cython algorithms could be implemented in pure numpy with a >>> potential slowdown, but once we have NEP 37 there might be a benefit to >>> having a pure NumPy implementation as an alternative code path. >>> >>> >>> Anyway, NEP 37 seems a great step in the right direction and would >>> enable sklearn to actually dispatch in some places. Dispatching just based >>> on __array_function__ seems not really feasible so far. >>> >>> Best, >>> Andreas Mueller >>> >>> >>> On 1/6/20 11:29 PM, Stephan Hoyer wrote: >>> >>> I am pleased to present a new NumPy Enhancement Proposal for discussion: >>> "NEP-37: A dispatch protocol for NumPy-like modules." Feedback would be >>> very welcome! >>> >>> The full text follows. The rendered proposal can also be found online at >>> https://numpy.org/neps/nep-0037-array-module.html >>> >>> Best, >>> Stephan Hoyer >>> >>> =================================================== >>> NEP 37 ? A dispatch protocol for NumPy-like modules >>> =================================================== >>> >>> :Author: Stephan Hoyer >>> :Author: Hameer Abbasi >>> :Author: Sebastian Berg >>> :Status: Draft >>> :Type: Standards Track >>> :Created: 2019-12-29 >>> >>> Abstract >>> -------- >>> >>> NEP-18's ``__array_function__`` has been a mixed success. Some projects >>> (e.g., >>> dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted it. >>> Others >>> (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we propose a >>> new >>> protocol, ``__array_module__``, that we expect could eventually subsume >>> most >>> use-cases for ``__array_function__``. The protocol requires explicit >>> adoption >>> by both users and library authors, which ensures backwards >>> compatibility, and >>> is also significantly simpler than ``__array_function__``, both of which >>> we >>> expect will make it easier to adopt. >>> >>> Why ``__array_function__`` hasn't been enough >>> --------------------------------------------- >>> >>> There are two broad ways in which NEP-18 has fallen short of its goals: >>> >>> 1. **Maintainability concerns**. `__array_function__` has significant >>> implications for libraries that use it: >>> >>> - Projects like `PyTorch >>> `_, `JAX >>> `_ and even >>> `scipy.sparse >>> `_ have been >>> reluctant to >>> implement `__array_function__` in part because they are concerned >>> about >>> **breaking existing code**: users expect NumPy functions like >>> ``np.concatenate`` to return NumPy arrays. This is a fundamental >>> limitation of the ``__array_function__`` design, which we chose to >>> allow >>> overriding the existing ``numpy`` namespace. >>> - ``__array_function__`` currently requires an "all or nothing" >>> approach to >>> implementing NumPy's API. There is no good pathway for **incremental >>> adoption**, which is particularly problematic for established >>> projects >>> for which adopting ``__array_function__`` would result in breaking >>> changes. >>> - It is no longer possible to use **aliases to NumPy functions** >>> within >>> modules that support overrides. For example, both CuPy and JAX set >>> ``result_type = np.result_type``. >>> - Implementing **fall-back mechanisms** for unimplemented NumPy >>> functions >>> by using NumPy's implementation is hard to get right (but see the >>> `version from dask `_), >>> because >>> ``__array_function__`` does not present a consistent interface. >>> Converting all arguments of array type requires recursing into >>> generic >>> arguments of the form ``*args, **kwargs``. >>> >>> 2. 
**Limitations on what can be overridden.** ``__array_function__`` has >>> some >>> important gaps, most notably array creation and coercion functions: >>> >>> - **Array creation** routines (e.g., ``np.arange`` and those in >>> ``np.random``) need some other mechanism for indicating what type of >>> arrays to create. `NEP 36 < >>> https://github.com/numpy/numpy/pull/14715>`_ >>> proposed adding optional ``like=`` arguments to functions without >>> existing array arguments. However, we still lack any mechanism to >>> override methods on objects, such as those needed by >>> ``np.random.RandomState``. >>> - **Array conversion** can't reuse the existing coercion functions >>> like >>> ``np.asarray``, because ``np.asarray`` sometimes means "convert to >>> an >>> exact ``np.ndarray``" and other times means "convert to something >>> _like_ >>> a NumPy array." This led to the `NEP 30 >>> `_ >>> proposal for >>> a separate ``np.duckarray`` function, but this still does not >>> resolve how >>> to cast one duck array into a type matching another duck array. >>> >>> ``get_array_module`` and the ``__array_module__`` protocol >>> ---------------------------------------------------------- >>> >>> We propose a new user-facing mechanism for dispatching to a duck-array >>> implementation, ``numpy.get_array_module``. ``get_array_module`` >>> performs the >>> same type resolution as ``__array_function__`` and returns a module with >>> an API >>> promised to match the standard interface of ``numpy`` that can implement >>> operations on all provided array types. >>> >>> The protocol itself is both simpler and more powerful than >>> ``__array_function__``, because it doesn't need to worry about actually >>> implementing functions. We believe it resolves most of the >>> maintainability and >>> functionality limitations of ``__array_function__``. >>> >>> The new protocol is opt-in, explicit and with local control; see >>> :ref:`appendix-design-choices` for discussion on the importance of these >>> design >>> features. >>> >>> The array module contract >>> ========================= >>> >>> Modules returned by ``get_array_module``/``__array_module__`` should >>> make a >>> best effort to implement NumPy's core functionality on new array >>> types(s). >>> Unimplemented functionality should simply be omitted (e.g., accessing an >>> unimplemented function should raise ``AttributeError``). In the future, >>> we >>> anticipate codifying a protocol for requesting restricted subsets of >>> ``numpy``; >>> see :ref:`requesting-restricted-subsets` for more details. >>> >>> How to use ``get_array_module`` >>> =============================== >>> >>> Code that wants to support generic duck arrays should explicitly call >>> ``get_array_module`` to determine an appropriate array module from which >>> to >>> call functions, rather than using the ``numpy`` namespace directly. For >>> example: >>> >>> .. code:: python >>> >>> # calls the appropriate version of np.something for x and y >>> module = np.get_array_module(x, y) >>> module.something(x, y) >>> >>> Both array creation and array conversion are supported, because >>> dispatching is >>> handled by ``get_array_module`` rather than via the types of function >>> arguments. For example, to use random number generation functions or >>> methods, >>> we can simply pull out the appropriate submodule: >>> >>> .. 
code:: python >>> >>> def duckarray_add_random(array): >>> module = np.get_array_module(array) >>> noise = module.random.randn(*array.shape) >>> return array + noise >>> >>> We can also write the duck-array ``stack`` function from `NEP 30 >>> `_, without >>> the need >>> for a new ``np.duckarray`` function: >>> >>> .. code:: python >>> >>> def duckarray_stack(arrays): >>> module = np.get_array_module(*arrays) >>> arrays = [module.asarray(arr) for arr in arrays] >>> shapes = {arr.shape for arr in arrays} >>> if len(shapes) != 1: >>> raise ValueError('all input arrays must have the same shape') >>> expanded_arrays = [arr[module.newaxis, ...] for arr in arrays] >>> return module.concatenate(expanded_arrays, axis=0) >>> >>> By default, ``get_array_module`` will return the ``numpy`` module if no >>> arguments are arrays. This fall-back can be explicitly controlled by >>> providing >>> the ``module`` keyword-only argument. It is also possible to indicate >>> that an >>> exception should be raised instead of returning a default array module by >>> setting ``module=None``. >>> >>> How to implement ``__array_module__`` >>> ===================================== >>> >>> Libraries implementing a duck array type that want to support >>> ``get_array_module`` need to implement the corresponding protocol, >>> ``__array_module__``. This new protocol is based on Python's dispatch >>> protocol >>> for arithmetic, and is essentially a simpler version of >>> ``__array_function__``. >>> >>> Only one argument is passed into ``__array_module__``, a Python >>> collection of >>> unique array types passed into ``get_array_module``, i.e., all arguments >>> with >>> an ``__array_module__`` attribute. >>> >>> The special method should either return an namespace with an API matching >>> ``numpy``, or ``NotImplemented``, indicating that it does not know how to >>> handle the operation: >>> >>> .. code:: python >>> >>> class MyArray: >>> def __array_module__(self, types): >>> if not all(issubclass(t, MyArray) for t in types): >>> return NotImplemented >>> return my_array_module >>> >>> Returning custom objects from ``__array_module__`` >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> >>> ``my_array_module`` will typically, but need not always, be a Python >>> module. >>> Returning a custom objects (e.g., with functions implemented via >>> ``__getattr__``) may be useful for some advanced use cases. >>> >>> For example, custom objects could allow for partial implementations of >>> duck >>> array modules that fall-back to NumPy (although this is not recommended >>> in >>> general because such fall-back behavior can be error prone): >>> >>> .. code:: python >>> >>> class MyArray: >>> def __array_module__(self, types): >>> if all(issubclass(t, MyArray) for t in types): >>> return ArrayModule() >>> else: >>> return NotImplemented >>> >>> class ArrayModule: >>> def __getattr__(self, name): >>> import base_module >>> return getattr(base_module, name, getattr(numpy, name)) >>> >>> Subclassing from ``numpy.ndarray`` >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> >>> All of the same guidance about well-defined type casting hierarchies from >>> NEP-18 still applies. ``numpy.ndarray`` itself contains a matching >>> implementation of ``__array_module__``, which is convenient for >>> subclasses: >>> >>> .. 
code:: python >>> >>> class ndarray: >>> def __array_module__(self, types): >>> if all(issubclass(t, ndarray) for t in types): >>> return numpy >>> else: >>> return NotImplemented >>> >>> NumPy's internal machinery >>> ========================== >>> >>> The type resolution rules of ``get_array_module`` follow the same model >>> as >>> Python and NumPy's existing dispatch protocols: subclasses are called >>> before >>> super-classes, and otherwise left to right. ``__array_module__`` is >>> guaranteed >>> to be called only a single time on each unique type. >>> >>> The actual implementation of `get_array_module` will be in C, but should >>> be >>> equivalent to this Python code: >>> >>> .. code:: python >>> >>> def get_array_module(*arrays, default=numpy): >>> implementing_arrays, types = >>> _implementing_arrays_and_types(arrays) >>> if not implementing_arrays and default is not None: >>> return default >>> for array in implementing_arrays: >>> module = array.__array_module__(types) >>> if module is not NotImplemented: >>> return module >>> raise TypeError("no common array module found") >>> >>> def _implementing_arrays_and_types(relevant_arrays): >>> types = [] >>> implementing_arrays = [] >>> for array in relevant_arrays: >>> t = type(array) >>> if t not in types and hasattr(t, '__array_module__'): >>> types.append(t) >>> # Subclasses before superclasses, otherwise left to right >>> index = len(implementing_arrays) >>> for i, old_array in enumerate(implementing_arrays): >>> if issubclass(t, type(old_array)): >>> index = i >>> break >>> implementing_arrays.insert(index, array) >>> return implementing_arrays, types >>> >>> Relationship with ``__array_ufunc__`` and ``__array_function__`` >>> ---------------------------------------------------------------- >>> >>> These older protocols have distinct use-cases and should remain >>> =============================================================== >>> >>> ``__array_module__`` is intended to resolve limitations of >>> ``__array_function__``, so it is natural to consider whether it could >>> entirely >>> replace ``__array_function__``. This would offer dual benefits: (1) >>> simplifying >>> the user-story about how to override NumPy and (2) removing the slowdown >>> associated with checking for dispatch when calling every NumPy function. >>> >>> However, ``__array_module__`` and ``__array_function__`` are pretty >>> different >>> from a user perspective: it requires explicit calls to >>> ``get_array_function``, >>> rather than simply reusing original ``numpy`` functions. This is >>> probably fine >>> for *libraries* that rely on duck-arrays, but may be frustratingly >>> verbose for >>> interactive use. >>> >>> Some of the dispatching use-cases for ``__array_ufunc__`` are also >>> solved by >>> ``__array_module__``, but not all of them. For example, it is still >>> useful to >>> be able to define non-NumPy ufuncs (e.g., from Numba or SciPy) in a >>> generic way >>> on non-NumPy arrays (e.g., with dask.array). >>> >>> Given their existing adoption and distinct use cases, we don't think it >>> makes >>> sense to remove or deprecate ``__array_function__`` and >>> ``__array_ufunc__`` at >>> this time. 
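A minimal sketch of the ``__array_ufunc__`` use-case mentioned above, assuming dask and scipy are installed: ``scipy.special.erf`` is a true ufunc but lives outside the ``numpy`` namespace, so a namespace returned by ``get_array_module`` would not contain it, while ``__array_ufunc__`` already covers it today.

.. code:: python

    import dask.array as da
    import scipy.special

    x = da.ones((1000,), chunks=100)
    # dispatches through dask.array's __array_ufunc__ and stays lazy
    y = scipy.special.erf(x)
    print(type(y))   # dask.array.core.Array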
>>> >>> Mixin classes to implement ``__array_function__`` and ``__array_ufunc__`` >>> ========================================================================= >>> >>> Despite the user-facing differences, ``__array_module__`` and a module >>> implementing NumPy's API still contain sufficient functionality needed to >>> implement dispatching with the existing duck array protocols. >>> >>> For example, the following mixin classes would provide sensible defaults >>> for >>> these special methods in terms of ``get_array_module`` and >>> ``__array_module__``: >>> >>> .. code:: python >>> >>> class ArrayUfuncFromModuleMixin: >>> >>> def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): >>> arrays = inputs + kwargs.get('out', ()) >>> try: >>> array_module = np.get_array_module(*arrays) >>> except TypeError: >>> return NotImplemented >>> >>> try: >>> # Note this may have false positive matches, if >>> ufunc.__name__ >>> # matches the name of a ufunc defined by NumPy. >>> Unfortunately >>> # there is no way to determine in which module a ufunc >>> was >>> # defined. >>> new_ufunc = getattr(array_module, ufunc.__name__) >>> except AttributeError: >>> return NotImplemented >>> >>> try: >>> callable = getattr(new_ufunc, method) >>> except AttributeError: >>> return NotImplemented >>> >>> return callable(*inputs, **kwargs) >>> >>> class ArrayFunctionFromModuleMixin: >>> >>> def __array_function__(self, func, types, args, kwargs): >>> array_module = self.__array_module__(types) >>> if array_module is NotImplemented: >>> return NotImplemented >>> >>> # Traverse submodules to find the appropriate function >>> modules = func.__module__.split('.') >>> assert modules[0] == 'numpy' >>> for submodule in modules[1:]: >>> module = getattr(module, submodule, None) >>> new_func = getattr(module, func.__name__, None) >>> if new_func is None: >>> return NotImplemented >>> >>> return new_func(*args, **kwargs) >>> >>> To make it easier to write duck arrays, we could also add these mixin >>> classes >>> into ``numpy.lib.mixins`` (but the examples above may suffice). >>> >>> Alternatives considered >>> ----------------------- >>> >>> Naming >>> ====== >>> >>> We like the name ``__array_module__`` because it mirrors the existing >>> ``__array_function__`` and ``__array_ufunc__`` protocols. Another >>> reasonable >>> choice could be ``__array_namespace__``. >>> >>> It is less clear what the NumPy function that calls this protocol should >>> be >>> called (``get_array_module`` in this proposal). Some possible >>> alternatives: >>> ``array_module``, ``common_array_module``, ``resolve_array_module``, >>> ``get_namespace``, ``get_numpy``, ``get_numpylike_module``, >>> ``get_duck_array_module``. >>> >>> .. _requesting-restricted-subsets: >>> >>> Requesting restricted subsets of NumPy's API >>> ============================================ >>> >>> Over time, NumPy has accumulated a very large API surface, with over 600 >>> attributes in the top level ``numpy`` module alone. It is unlikely that >>> any >>> duck array library could or would want to implement all of these >>> functions and >>> classes, because the frequently used subset of NumPy is much smaller. >>> >>> We think it would be useful exercise to define "minimal" subset(s) of >>> NumPy's >>> API, omitting rarely used or non-recommended functionality. For example, >>> minimal NumPy might include ``stack``, but not the other stacking >>> functions >>> ``column_stack``, ``dstack``, ``hstack`` and ``vstack``. 
This could >>> clearly >>> indicate to duck array authors and users want functionality is core and >>> what >>> functionality they can skip. >>> >>> Support for requesting a restricted subset of NumPy's API would be a >>> natural >>> feature to include in ``get_array_function`` and ``__array_module__``, >>> e.g., >>> >>> .. code:: python >>> >>> # array_module is only guaranteed to contain "minimal" NumPy >>> array_module = np.get_array_module(*arrays, request='minimal') >>> >>> To facilitate testing with NumPy and use with any valid duck array >>> library, >>> NumPy itself would return restricted versions of the ``numpy`` module >>> when >>> ``get_array_module`` is called only on NumPy arrays. Omitted functions >>> would >>> simply not exist. >>> >>> Unfortunately, we have not yet figured out what these restricted subsets >>> should >>> be, so it doesn't make sense to do this yet. When/if we do, we could >>> either add >>> new keyword arguments to ``get_array_module`` or add new top level >>> functions, >>> e.g., ``get_minimal_array_module``. We would also need to add either a >>> new >>> protocol patterned off of ``__array_module__`` (e.g., >>> ``__array_module_minimal__``), or could add an optional second argument >>> to >>> ``__array_module__`` (catching errors with ``try``/``except``). >>> >>> A new namespace for implicit dispatch >>> ===================================== >>> >>> Instead of supporting overrides in the main `numpy` namespace with >>> ``__array_function__``, we could create a new opt-in namespace, e.g., >>> ``numpy.api``, with versions of NumPy functions that support >>> dispatching. These >>> overrides would need new opt-in protocols, e.g., >>> ``__array_function_api__`` >>> patterned off of ``__array_function__``. >>> >>> This would resolve the biggest limitations of ``__array_function__`` by >>> being >>> opt-in and would also allow for unambiguously overriding functions like >>> ``asarray``, because ``np.api.asarray`` would always mean "convert an >>> array-like object." But it wouldn't solve all the dispatching needs met >>> by >>> ``__array_module__``, and would leave us with supporting a considerably >>> more >>> complex protocol both for array users and implementors. >>> >>> We could potentially implement such a new namespace *via* the >>> ``__array_module__`` protocol. Certainly some users would find this >>> convenient, >>> because it is slightly less boilerplate. But this would leave users with >>> a >>> confusing choice: when should they use `get_array_module` vs. >>> `np.api.something`. Also, we would have to add and maintain a whole new >>> module, >>> which is considerably more expensive than merely adding a function. >>> >>> Dispatching on both types and arrays instead of only types >>> ========================================================== >>> >>> Instead of supporting dispatch only via unique array types, we could also >>> support dispatch via array objects, e.g., by passing an ``arrays`` >>> argument as >>> part of the ``__array_module__`` protocol. This could potentially be >>> useful for >>> dispatch for arrays with metadata, such provided by Dask and Pint, but >>> would >>> impose costs in terms of type safety and complexity. >>> >>> For example, a library that supports arrays on both CPUs and GPUs might >>> decide >>> on which device to create a new arrays from functions like ``ones`` >>> based on >>> input arguments: >>> >>> .. 
code:: python >>> >>> class Array: >>> def __array_module__(self, types, arrays): >>> useful_arrays = tuple(a in arrays if isinstance(a, Array)) >>> if not useful_arrays: >>> return NotImplemented >>> prefer_gpu = any(a.prefer_gpu for a in useful_arrays) >>> return ArrayModule(prefer_gpu) >>> >>> class ArrayModule: >>> def __init__(self, prefer_gpu): >>> self.prefer_gpu = prefer_gpu >>> >>> def __getattr__(self, name): >>> import base_module >>> base_func = getattr(base_module, name) >>> return functools.partial(base_func, >>> prefer_gpu=self.prefer_gpu) >>> >>> This might be useful, but it's not clear if we really need it. Pint >>> seems to >>> get along OK without any explicit array creation routines (favoring >>> multiplication by units, e.g., ``np.ones(5) * ureg.m``), and for the >>> most part >>> Dask is also OK with existing ``__array_function__`` style overides >>> (e.g., >>> favoring ``np.ones_like`` over ``np.ones``). Choosing whether to place >>> an array >>> on the CPU or GPU could be solved by `making array creation lazy >>> `_. >>> >>> .. _appendix-design-choices: >>> >>> Appendix: design choices for API overrides >>> ------------------------------------------ >>> >>> There is a large range of possible design choices for overriding NumPy's >>> API. >>> Here we discuss three major axes of the design decision that guided our >>> design >>> for ``__array_module__``. >>> >>> Opt-in vs. opt-out for users >>> ============================ >>> >>> The ``__array_ufunc__`` and ``__array_function__`` protocols provide a >>> mechanism for overriding NumPy functions *within NumPy's existing >>> namespace*. >>> This means that users need to explicitly opt-out if they do not want any >>> overridden behavior, e.g., by casting arrays with ``np.asarray()``. >>> >>> In theory, this approach lowers the barrier for adopting these protocols >>> in >>> user code and libraries, because code that uses the standard NumPy >>> namespace is >>> automatically compatible. But in practice, this hasn't worked out. For >>> example, >>> most well-maintained libraries that use NumPy follow the best practice of >>> casting all inputs with ``np.asarray()``, which they would have to >>> explicitly >>> relax to use ``__array_function__``. Our experience has been that making >>> a >>> library compatible with a new duck array type typically requires at >>> least a >>> small amount of work to accommodate differences in the data model and >>> operations >>> that can be implemented efficiently. >>> >>> These opt-out approaches also considerably complicate backwards >>> compatibility >>> for libraries that adopt these protocols, because by opting in as a >>> library >>> they also opt-in their users, whether they expect it or not. For winning >>> over >>> libraries that have been unable to adopt ``__array_function__``, an >>> opt-in >>> approach seems like a must. >>> >>> Explicit vs. implicit choice of implementation >>> ============================================== >>> >>> Both ``__array_ufunc__`` and ``__array_function__`` have implicit >>> control over >>> dispatching: the dispatched functions are determined via the appropriate >>> protocols in every function call. This generalizes well to handling many >>> different types of objects, as evidenced by its use for implementing >>> arithmetic >>> operators in Python, but it has two downsides: >>> >>> 1. *Speed*: it imposes additional overhead in every function call, >>> because each >>> function call needs to inspect each of its arguments for overrides. 
>>> This is >>> why arithmetic on builtin Python numbers is slow. >>> 2. *Readability*: it is not longer immediately evident to readers of >>> code what >>> happens when a function is called, because the function's >>> implementation >>> could be overridden by any of its arguments. >>> >>> In contrast, importing a new library (e.g., ``import dask.array as >>> da``) with >>> an API matching NumPy is entirely explicit. There is no overhead from >>> dispatch >>> or ambiguity about which implementation is being used. >>> >>> Explicit and implicit choice of implementations are not mutually >>> exclusive >>> options. Indeed, most implementations of NumPy API overrides via >>> ``__array_function__`` that we are familiar with (namely, dask, CuPy and >>> sparse, but not Pint) also include an explicit way to use their version >>> of >>> NumPy's API by importing a module directly (``dask.array``, ``cupy`` or >>> ``sparse``, respectively). >>> >>> Local vs. non-local vs. global control >>> ====================================== >>> >>> The final design axis is how users control the choice of API: >>> >>> - **Local control**, as exemplified by multiple dispatch and Python >>> protocols for >>> arithmetic, determines which implementation to use either by checking >>> types >>> or calling methods on the direct arguments of a function. >>> - **Non-local control** such as `np.errstate >>> < >>> https://docs.scipy.org/doc/numpy/reference/generated/numpy.errstate.html >>> >`_ >>> overrides behavior with global-state via function decorators or >>> context-managers. Control is determined hierarchically, via the >>> inner-most >>> context. >>> - **Global control** provides a mechanism for users to set default >>> behavior, >>> either via function calls or configuration files. For example, >>> matplotlib >>> allows setting a global choice of plotting backend. >>> >>> Local control is generally considered a best practice for API design, >>> because >>> control flow is entirely explicit, which makes it the easiest to >>> understand. >>> Non-local and global control are occasionally used, but generally either >>> due to >>> ignorance or a lack of better alternatives. >>> >>> In the case of duck typing for NumPy's public API, we think non-local or >>> global >>> control would be mistakes, mostly because they **don't compose well**. >>> If one >>> library sets/needs one set of overrides and then internally calls a >>> routine >>> that expects another set of overrides, the resulting behavior may be very >>> surprising. Higher order functions are especially problematic, because >>> the >>> context in which functions are evaluated may not be the context in which >>> they >>> are defined. >>> >>> One class of override use cases where we think non-local and global >>> control are >>> appropriate is for choosing a backend system that is guaranteed to have >>> an >>> entirely consistent interface, such as a faster alternative >>> implementation of >>> ``numpy.fft`` on NumPy arrays. However, these are out of scope for the >>> current >>> proposal, which is focused on duck arrays. 
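As a concrete illustration of the backend-style, non-local control mentioned in the last paragraph (explicitly out of scope for the NEP itself): ``scipy.fft`` already offers context-managed backend switching. The sketch below assumes a pyFFTW version that provides the ``pyfftw.interfaces.scipy_fft`` backend module.

.. code:: python

    import numpy as np
    import scipy.fft
    import pyfftw.interfaces.scipy_fft as pyfftw_fft

    x = np.arange(8, dtype=float)
    with scipy.fft.set_backend(pyfftw_fft):
        # inside the context, scipy.fft.* calls are served by pyFFTW
        y = scipy.fft.fft(x)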
>>> >>> _______________________________________________ >>> NumPy-Discussion mailing listNumPy-Discussion at python.orghttps://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Thu Feb 6 08:00:59 2020 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 6 Feb 2020 08:00:59 -0500 Subject: [Numpy-discussion] manylinux upgrade for numpy wheels In-Reply-To: References: Message-ID: Slightly off topic perhaps, it is recommended to perform custom compilation for best performance, yet is there an easy way to do this? I don't think a simple pip will do. On Wed, Feb 5, 2020 at 4:07 AM Matthew Brett wrote: > Hi, > > On Tue, Feb 4, 2020 at 10:38 PM Nathaniel Smith wrote: > > > > Pretty sure the 2010 and 2014 images both have much newer compilers than > that. > > > > There are still a lot of users on CentOS 6, so I'd still stick to 2010 > for now on x86_64 at least. We could potentially start adding 2014 wheels > for the other platforms where we currently don't ship wheels ? gotta be > better than nothing, right? > > > > There probably still is some tail of end users whose pip is too old to > know about 2010 wheels. I don't know how big that tail is. If we wanted to > be really careful, we could ship both manylinux1 and manylinux2010 wheels > for a bit ? pip will automatically pick the latest one it recognizes ? and > see what the download numbers look like. > > That all sounds right to me too. > > Cheers, > > Matthew > > > On Tue, Feb 4, 2020, 13:18 Charles R Harris > wrote: > >> > >> Hi All, > >> > >> Thought now would be a good time to decide on upgrading manylinux for > the 1.19 release so that we can make sure that everything works as > expected. The choices are > >> > >> manylinux1 -- CentOS 5, currently used, gcc 4.2 (in practice 4.5), only > supports i686, x86_64. > >> manylinux2010 -- CentOS 6, gcc 4.5, only supports i686, x86_64. > >> manylinux2014 -- CentOS 7, gcc 4.8, supports many more architectures. > >> > >> The main advantage of manylinux2014 is that it supports many new > architectures, some of which we are already testing against. The main > disadvantage is that it requires pip >= 19.x, which may not be much of a > problem 4 months from now but will undoubtedly cause some installation > problems. Unfortunately, the compiler remains archaic, but folks interested > in performance should be using a performance oriented distribution or > compiling for their native architecture. 
> >> > >> Chuck > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at python.org > >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- *Those who don't understand recursion are doomed to repeat it* -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldcroft at head.cfa.harvard.edu Thu Feb 6 08:24:16 2020 From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas) Date: Thu, 6 Feb 2020 08:24:16 -0500 Subject: [Numpy-discussion] manylinux upgrade for numpy wheels In-Reply-To: References: Message-ID: Our organization is still using CentOS-6, so my vote is for that. Thanks, Tom On Tue, Feb 4, 2020 at 5:38 PM Nathaniel Smith wrote: > Pretty sure the 2010 and 2014 images both have much newer compilers than > that. > > There are still a lot of users on CentOS 6, so I'd still stick to 2010 for > now on x86_64 at least. We could potentially start adding 2014 wheels for > the other platforms where we currently don't ship wheels ? gotta be better > than nothing, right? > > There probably still is some tail of end users whose pip is too old to > know about 2010 wheels. I don't know how big that tail is. If we wanted to > be really careful, we could ship both manylinux1 and manylinux2010 wheels > for a bit ? pip will automatically pick the latest one it recognizes ? and > see what the download numbers look like. > > On Tue, Feb 4, 2020, 13:18 Charles R Harris > wrote: > >> Hi All, >> >> Thought now would be a good time to decide on upgrading manylinux for the >> 1.19 release so that we can make sure that everything works as expected. >> The choices are >> >> manylinux1 -- CentOS 5, >> currently used, gcc 4.2 (in practice 4.5), only supports i686, x86_64. >> manylinux2010 -- CentOS 6, >> gcc 4.5, only supports i686, x86_64. >> manylinux2014 -- CentOS 7, >> gcc 4.8, supports many more architectures. >> >> The main advantage of manylinux2014 is that it supports many new >> architectures, some of which we are already testing against. The main >> disadvantage is that it requires pip >= 19.x, which may not be much of a >> problem 4 months from now but will undoubtedly cause some installation >> problems. Unfortunately, the compiler remains archaic, but folks interested >> in performance should be using a performance oriented distribution or >> compiling for their native architecture. >> >> Chuck >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
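(Regarding Neal Becker's question earlier in this thread about an easy way to get a locally compiled NumPy: one possible route, not the only one, is to have pip build from the source distribution instead of installing a manylinux wheel, so the local compiler and BLAS are used.)

    pip uninstall numpy
    pip install numpy --no-binary numpy

A ``site.cfg`` / ``~/.numpy-site.cfg`` file can then point the build at an optimized BLAS/LAPACK if one is available.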
URL: From shoyer at gmail.com Thu Feb 6 12:35:31 2020 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 6 Feb 2020 09:35:31 -0800 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> Message-ID: On Wed, Feb 5, 2020 at 8:02 AM Andreas Mueller wrote: > A bit late to the NEP 37 party. > I just wanted to say that at least from my perspective it seems a great > solution that will help sklearn move towards more flexible compute engines. > I think one of the biggest issues is array creation (including random > arrays), and that's handled quite nicely with NEP 37. > Andreas, thanks for sharing your feedback here! Your perspective is really appreciated. > - We use scipy.linalg in many places, and we would need to do a separate > dispatching to check whether we can use module.linalg instead > (that might be an issue for many libraries but I'm not sure). > This brings up a good question -- obviously the final decision here is up to SciPy maintainers, but how should we encourage SciPy to support dispatching? We could pretty easily make __array_function__ cover SciPy by simply exposing NumPy's internal utilities. SciPy could simply use the np.array_function_dispatch decorator internally and that would be enough. It is less clear how this could work for __array_module__, because __array_module__ and get_array_module() are not generic -- they refers explicitly to a NumPy like module. If we want to extend it to SciPy (for which I agree there are good use-cases), what should that look like? The obvious choices would be to either add a new protocol, e.g., __scipy_module__ (but then NumPy needs to know about SciPy), or to add some sort of "module request" parameter to np.get_array_module(), to indicate the requested API, e.g., np.get_array_module(*arrays, matching='scipy'). This is pretty similar to the "default" argument but would need to get passed into the __array_module__ protocol, too. > - Some models have several possible optimization algorithms, some of which > are pure numpy and some which are Cython. If someone provides a different > array module, > we might want to choose an algorithm that is actually supported by that > module. While this exact issue is maybe sklearn specific, a similar issue > could appear for most downstream libs that use Cython in some places. > Many Cython algorithms could be implemented in pure numpy with a > potential slowdown, but once we have NEP 37 there might be a benefit to > having a pure NumPy implementation as an alternative code path. > > > Anyway, NEP 37 seems a great step in the right direction and would enable > sklearn to actually dispatch in some places. Dispatching just based on > __array_function__ seems not really feasible so far. > > Best, > Andreas Mueller > > > On 1/6/20 11:29 PM, Stephan Hoyer wrote: > > I am pleased to present a new NumPy Enhancement Proposal for discussion: > "NEP-37: A dispatch protocol for NumPy-like modules." Feedback would be > very welcome! > > The full text follows. The rendered proposal can also be found online at > https://numpy.org/neps/nep-0037-array-module.html > > Best, > Stephan Hoyer > > =================================================== > NEP 37 ? 
A dispatch protocol for NumPy-like modules > =================================================== > > :Author: Stephan Hoyer > :Author: Hameer Abbasi > :Author: Sebastian Berg > :Status: Draft > :Type: Standards Track > :Created: 2019-12-29 > > Abstract > -------- > > NEP-18's ``__array_function__`` has been a mixed success. Some projects > (e.g., > dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted it. Others > (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we propose a new > protocol, ``__array_module__``, that we expect could eventually subsume > most > use-cases for ``__array_function__``. The protocol requires explicit > adoption > by both users and library authors, which ensures backwards compatibility, > and > is also significantly simpler than ``__array_function__``, both of which we > expect will make it easier to adopt. > > Why ``__array_function__`` hasn't been enough > --------------------------------------------- > > There are two broad ways in which NEP-18 has fallen short of its goals: > > 1. **Maintainability concerns**. `__array_function__` has significant > implications for libraries that use it: > > - Projects like `PyTorch > `_, `JAX > `_ and even `scipy.sparse > `_ have been reluctant > to > implement `__array_function__` in part because they are concerned > about > **breaking existing code**: users expect NumPy functions like > ``np.concatenate`` to return NumPy arrays. This is a fundamental > limitation of the ``__array_function__`` design, which we chose to > allow > overriding the existing ``numpy`` namespace. > - ``__array_function__`` currently requires an "all or nothing" > approach to > implementing NumPy's API. There is no good pathway for **incremental > adoption**, which is particularly problematic for established projects > for which adopting ``__array_function__`` would result in breaking > changes. > - It is no longer possible to use **aliases to NumPy functions** within > modules that support overrides. For example, both CuPy and JAX set > ``result_type = np.result_type``. > - Implementing **fall-back mechanisms** for unimplemented NumPy > functions > by using NumPy's implementation is hard to get right (but see the > `version from dask `_), > because > ``__array_function__`` does not present a consistent interface. > Converting all arguments of array type requires recursing into generic > arguments of the form ``*args, **kwargs``. > > 2. **Limitations on what can be overridden.** ``__array_function__`` has > some > important gaps, most notably array creation and coercion functions: > > - **Array creation** routines (e.g., ``np.arange`` and those in > ``np.random``) need some other mechanism for indicating what type of > arrays to create. `NEP 36 >`_ > proposed adding optional ``like=`` arguments to functions without > existing array arguments. However, we still lack any mechanism to > override methods on objects, such as those needed by > ``np.random.RandomState``. > - **Array conversion** can't reuse the existing coercion functions like > ``np.asarray``, because ``np.asarray`` sometimes means "convert to an > exact ``np.ndarray``" and other times means "convert to something > _like_ > a NumPy array." This led to the `NEP 30 > `_ > proposal for > a separate ``np.duckarray`` function, but this still does not resolve > how > to cast one duck array into a type matching another duck array. 
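A small sketch of the coercion gap described above, assuming dask is installed: ``np.asarray`` always produces a concrete ``numpy.ndarray``, so a library that wants "an array of the same kind as its input" cannot use it. ``np.get_array_module`` in the comment is the function this NEP proposes and does not exist yet.

.. code:: python

    import numpy as np
    import dask.array as da

    x = da.ones((4, 4), chunks=2)
    print(type(np.asarray(x)))   # numpy.ndarray -- the dask graph is computed

    # with this NEP one could write instead:
    #     module = np.get_array_module(x)
    #     module.asarray(x)        # would stay a dask array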
> > ``get_array_module`` and the ``__array_module__`` protocol > ---------------------------------------------------------- > > We propose a new user-facing mechanism for dispatching to a duck-array > implementation, ``numpy.get_array_module``. ``get_array_module`` performs > the > same type resolution as ``__array_function__`` and returns a module with > an API > promised to match the standard interface of ``numpy`` that can implement > operations on all provided array types. > > The protocol itself is both simpler and more powerful than > ``__array_function__``, because it doesn't need to worry about actually > implementing functions. We believe it resolves most of the maintainability > and > functionality limitations of ``__array_function__``. > > The new protocol is opt-in, explicit and with local control; see > :ref:`appendix-design-choices` for discussion on the importance of these > design > features. > > The array module contract > ========================= > > Modules returned by ``get_array_module``/``__array_module__`` should make a > best effort to implement NumPy's core functionality on new array types(s). > Unimplemented functionality should simply be omitted (e.g., accessing an > unimplemented function should raise ``AttributeError``). In the future, we > anticipate codifying a protocol for requesting restricted subsets of > ``numpy``; > see :ref:`requesting-restricted-subsets` for more details. > > How to use ``get_array_module`` > =============================== > > Code that wants to support generic duck arrays should explicitly call > ``get_array_module`` to determine an appropriate array module from which to > call functions, rather than using the ``numpy`` namespace directly. For > example: > > .. code:: python > > # calls the appropriate version of np.something for x and y > module = np.get_array_module(x, y) > module.something(x, y) > > Both array creation and array conversion are supported, because > dispatching is > handled by ``get_array_module`` rather than via the types of function > arguments. For example, to use random number generation functions or > methods, > we can simply pull out the appropriate submodule: > > .. code:: python > > def duckarray_add_random(array): > module = np.get_array_module(array) > noise = module.random.randn(*array.shape) > return array + noise > > We can also write the duck-array ``stack`` function from `NEP 30 > `_, without the > need > for a new ``np.duckarray`` function: > > .. code:: python > > def duckarray_stack(arrays): > module = np.get_array_module(*arrays) > arrays = [module.asarray(arr) for arr in arrays] > shapes = {arr.shape for arr in arrays} > if len(shapes) != 1: > raise ValueError('all input arrays must have the same shape') > expanded_arrays = [arr[module.newaxis, ...] for arr in arrays] > return module.concatenate(expanded_arrays, axis=0) > > By default, ``get_array_module`` will return the ``numpy`` module if no > arguments are arrays. This fall-back can be explicitly controlled by > providing > the ``module`` keyword-only argument. It is also possible to indicate that > an > exception should be raised instead of returning a default array module by > setting ``module=None``. > > How to implement ``__array_module__`` > ===================================== > > Libraries implementing a duck array type that want to support > ``get_array_module`` need to implement the corresponding protocol, > ``__array_module__``. 
This new protocol is based on Python's dispatch > protocol > for arithmetic, and is essentially a simpler version of > ``__array_function__``. > > Only one argument is passed into ``__array_module__``, a Python collection > of > unique array types passed into ``get_array_module``, i.e., all arguments > with > an ``__array_module__`` attribute. > > The special method should either return an namespace with an API matching > ``numpy``, or ``NotImplemented``, indicating that it does not know how to > handle the operation: > > .. code:: python > > class MyArray: > def __array_module__(self, types): > if not all(issubclass(t, MyArray) for t in types): > return NotImplemented > return my_array_module > > Returning custom objects from ``__array_module__`` > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > ``my_array_module`` will typically, but need not always, be a Python > module. > Returning a custom objects (e.g., with functions implemented via > ``__getattr__``) may be useful for some advanced use cases. > > For example, custom objects could allow for partial implementations of duck > array modules that fall-back to NumPy (although this is not recommended in > general because such fall-back behavior can be error prone): > > .. code:: python > > class MyArray: > def __array_module__(self, types): > if all(issubclass(t, MyArray) for t in types): > return ArrayModule() > else: > return NotImplemented > > class ArrayModule: > def __getattr__(self, name): > import base_module > return getattr(base_module, name, getattr(numpy, name)) > > Subclassing from ``numpy.ndarray`` > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > All of the same guidance about well-defined type casting hierarchies from > NEP-18 still applies. ``numpy.ndarray`` itself contains a matching > implementation of ``__array_module__``, which is convenient for > subclasses: > > .. code:: python > > class ndarray: > def __array_module__(self, types): > if all(issubclass(t, ndarray) for t in types): > return numpy > else: > return NotImplemented > > NumPy's internal machinery > ========================== > > The type resolution rules of ``get_array_module`` follow the same model as > Python and NumPy's existing dispatch protocols: subclasses are called > before > super-classes, and otherwise left to right. ``__array_module__`` is > guaranteed > to be called only a single time on each unique type. > > The actual implementation of `get_array_module` will be in C, but should be > equivalent to this Python code: > > .. 
code:: python > > def get_array_module(*arrays, default=numpy): > implementing_arrays, types = _implementing_arrays_and_types(arrays) > if not implementing_arrays and default is not None: > return default > for array in implementing_arrays: > module = array.__array_module__(types) > if module is not NotImplemented: > return module > raise TypeError("no common array module found") > > def _implementing_arrays_and_types(relevant_arrays): > types = [] > implementing_arrays = [] > for array in relevant_arrays: > t = type(array) > if t not in types and hasattr(t, '__array_module__'): > types.append(t) > # Subclasses before superclasses, otherwise left to right > index = len(implementing_arrays) > for i, old_array in enumerate(implementing_arrays): > if issubclass(t, type(old_array)): > index = i > break > implementing_arrays.insert(index, array) > return implementing_arrays, types > > Relationship with ``__array_ufunc__`` and ``__array_function__`` > ---------------------------------------------------------------- > > These older protocols have distinct use-cases and should remain > =============================================================== > > ``__array_module__`` is intended to resolve limitations of > ``__array_function__``, so it is natural to consider whether it could > entirely > replace ``__array_function__``. This would offer dual benefits: (1) > simplifying > the user-story about how to override NumPy and (2) removing the slowdown > associated with checking for dispatch when calling every NumPy function. > > However, ``__array_module__`` and ``__array_function__`` are pretty > different > from a user perspective: it requires explicit calls to > ``get_array_function``, > rather than simply reusing original ``numpy`` functions. This is probably > fine > for *libraries* that rely on duck-arrays, but may be frustratingly verbose > for > interactive use. > > Some of the dispatching use-cases for ``__array_ufunc__`` are also solved > by > ``__array_module__``, but not all of them. For example, it is still useful > to > be able to define non-NumPy ufuncs (e.g., from Numba or SciPy) in a > generic way > on non-NumPy arrays (e.g., with dask.array). > > Given their existing adoption and distinct use cases, we don't think it > makes > sense to remove or deprecate ``__array_function__`` and > ``__array_ufunc__`` at > this time. > > Mixin classes to implement ``__array_function__`` and ``__array_ufunc__`` > ========================================================================= > > Despite the user-facing differences, ``__array_module__`` and a module > implementing NumPy's API still contain sufficient functionality needed to > implement dispatching with the existing duck array protocols. > > For example, the following mixin classes would provide sensible defaults > for > these special methods in terms of ``get_array_module`` and > ``__array_module__``: > > .. code:: python > > class ArrayUfuncFromModuleMixin: > > def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): > arrays = inputs + kwargs.get('out', ()) > try: > array_module = np.get_array_module(*arrays) > except TypeError: > return NotImplemented > > try: > # Note this may have false positive matches, if > ufunc.__name__ > # matches the name of a ufunc defined by NumPy. > Unfortunately > # there is no way to determine in which module a ufunc was > # defined. 
> new_ufunc = getattr(array_module, ufunc.__name__) > except AttributeError: > return NotImplemented > > try: > callable = getattr(new_ufunc, method) > except AttributeError: > return NotImplemented > > return callable(*inputs, **kwargs) > > class ArrayFunctionFromModuleMixin: > > def __array_function__(self, func, types, args, kwargs): > array_module = self.__array_module__(types) > if array_module is NotImplemented: > return NotImplemented > > # Traverse submodules to find the appropriate function > modules = func.__module__.split('.') > assert modules[0] == 'numpy' > for submodule in modules[1:]: > module = getattr(module, submodule, None) > new_func = getattr(module, func.__name__, None) > if new_func is None: > return NotImplemented > > return new_func(*args, **kwargs) > > To make it easier to write duck arrays, we could also add these mixin > classes > into ``numpy.lib.mixins`` (but the examples above may suffice). > > Alternatives considered > ----------------------- > > Naming > ====== > > We like the name ``__array_module__`` because it mirrors the existing > ``__array_function__`` and ``__array_ufunc__`` protocols. Another > reasonable > choice could be ``__array_namespace__``. > > It is less clear what the NumPy function that calls this protocol should be > called (``get_array_module`` in this proposal). Some possible alternatives: > ``array_module``, ``common_array_module``, ``resolve_array_module``, > ``get_namespace``, ``get_numpy``, ``get_numpylike_module``, > ``get_duck_array_module``. > > .. _requesting-restricted-subsets: > > Requesting restricted subsets of NumPy's API > ============================================ > > Over time, NumPy has accumulated a very large API surface, with over 600 > attributes in the top level ``numpy`` module alone. It is unlikely that any > duck array library could or would want to implement all of these functions > and > classes, because the frequently used subset of NumPy is much smaller. > > We think it would be useful exercise to define "minimal" subset(s) of > NumPy's > API, omitting rarely used or non-recommended functionality. For example, > minimal NumPy might include ``stack``, but not the other stacking functions > ``column_stack``, ``dstack``, ``hstack`` and ``vstack``. This could clearly > indicate to duck array authors and users want functionality is core and > what > functionality they can skip. > > Support for requesting a restricted subset of NumPy's API would be a > natural > feature to include in ``get_array_function`` and ``__array_module__``, > e.g., > > .. code:: python > > # array_module is only guaranteed to contain "minimal" NumPy > array_module = np.get_array_module(*arrays, request='minimal') > > To facilitate testing with NumPy and use with any valid duck array library, > NumPy itself would return restricted versions of the ``numpy`` module when > ``get_array_module`` is called only on NumPy arrays. Omitted functions > would > simply not exist. > > Unfortunately, we have not yet figured out what these restricted subsets > should > be, so it doesn't make sense to do this yet. When/if we do, we could > either add > new keyword arguments to ``get_array_module`` or add new top level > functions, > e.g., ``get_minimal_array_module``. We would also need to add either a new > protocol patterned off of ``__array_module__`` (e.g., > ``__array_module_minimal__``), or could add an optional second argument to > ``__array_module__`` (catching errors with ``try``/``except``). 
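One way such a restricted namespace could be exposed, sketched here with a made-up allow-list purely for illustration (the NEP deliberately leaves the actual subset undefined): a thin proxy object on which omitted functions simply do not exist.

.. code:: python

    import numpy

    # hypothetical allow-list; the real "minimal" subset is not yet defined
    _MINIMAL_NAMES = {'asarray', 'stack', 'concatenate', 'mean', 'newaxis'}

    class MinimalNumPy:
        def __getattr__(self, name):
            if name not in _MINIMAL_NAMES:
                raise AttributeError(name)
            return getattr(numpy, name)

    minimal = MinimalNumPy()
    minimal.stack([numpy.zeros(3), numpy.ones(3)])   # works
    # minimal.hstack(...) raises AttributeError, as the contract requires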
> > A new namespace for implicit dispatch > ===================================== > > Instead of supporting overrides in the main `numpy` namespace with > ``__array_function__``, we could create a new opt-in namespace, e.g., > ``numpy.api``, with versions of NumPy functions that support dispatching. > These > overrides would need new opt-in protocols, e.g., ``__array_function_api__`` > patterned off of ``__array_function__``. > > This would resolve the biggest limitations of ``__array_function__`` by > being > opt-in and would also allow for unambiguously overriding functions like > ``asarray``, because ``np.api.asarray`` would always mean "convert an > array-like object." But it wouldn't solve all the dispatching needs met by > ``__array_module__``, and would leave us with supporting a considerably > more > complex protocol both for array users and implementors. > > We could potentially implement such a new namespace *via* the > ``__array_module__`` protocol. Certainly some users would find this > convenient, > because it is slightly less boilerplate. But this would leave users with a > confusing choice: when should they use `get_array_module` vs. > `np.api.something`. Also, we would have to add and maintain a whole new > module, > which is considerably more expensive than merely adding a function. > > Dispatching on both types and arrays instead of only types > ========================================================== > > Instead of supporting dispatch only via unique array types, we could also > support dispatch via array objects, e.g., by passing an ``arrays`` > argument as > part of the ``__array_module__`` protocol. This could potentially be > useful for > dispatch for arrays with metadata, such provided by Dask and Pint, but > would > impose costs in terms of type safety and complexity. > > For example, a library that supports arrays on both CPUs and GPUs might > decide > on which device to create a new arrays from functions like ``ones`` based > on > input arguments: > > .. code:: python > > class Array: > def __array_module__(self, types, arrays): > useful_arrays = tuple(a in arrays if isinstance(a, Array)) > if not useful_arrays: > return NotImplemented > prefer_gpu = any(a.prefer_gpu for a in useful_arrays) > return ArrayModule(prefer_gpu) > > class ArrayModule: > def __init__(self, prefer_gpu): > self.prefer_gpu = prefer_gpu > > def __getattr__(self, name): > import base_module > base_func = getattr(base_module, name) > return functools.partial(base_func, prefer_gpu=self.prefer_gpu) > > This might be useful, but it's not clear if we really need it. Pint seems > to > get along OK without any explicit array creation routines (favoring > multiplication by units, e.g., ``np.ones(5) * ureg.m``), and for the most > part > Dask is also OK with existing ``__array_function__`` style overides (e.g., > favoring ``np.ones_like`` over ``np.ones``). Choosing whether to place an > array > on the CPU or GPU could be solved by `making array creation lazy > `_. > > .. _appendix-design-choices: > > Appendix: design choices for API overrides > ------------------------------------------ > > There is a large range of possible design choices for overriding NumPy's > API. > Here we discuss three major axes of the design decision that guided our > design > for ``__array_module__``. > > Opt-in vs. opt-out for users > ============================ > > The ``__array_ufunc__`` and ``__array_function__`` protocols provide a > mechanism for overriding NumPy functions *within NumPy's existing > namespace*. 
> This means that users need to explicitly opt-out if they do not want any > overridden behavior, e.g., by casting arrays with ``np.asarray()``. > > In theory, this approach lowers the barrier for adopting these protocols in > user code and libraries, because code that uses the standard NumPy > namespace is > automatically compatible. But in practice, this hasn't worked out. For > example, > most well-maintained libraries that use NumPy follow the best practice of > casting all inputs with ``np.asarray()``, which they would have to > explicitly > relax to use ``__array_function__``. Our experience has been that making a > library compatible with a new duck array type typically requires at least a > small amount of work to accommodate differences in the data model and > operations > that can be implemented efficiently. > > These opt-out approaches also considerably complicate backwards > compatibility > for libraries that adopt these protocols, because by opting in as a library > they also opt-in their users, whether they expect it or not. For winning > over > libraries that have been unable to adopt ``__array_function__``, an opt-in > approach seems like a must. > > Explicit vs. implicit choice of implementation > ============================================== > > Both ``__array_ufunc__`` and ``__array_function__`` have implicit control > over > dispatching: the dispatched functions are determined via the appropriate > protocols in every function call. This generalizes well to handling many > different types of objects, as evidenced by its use for implementing > arithmetic > operators in Python, but it has two downsides: > > 1. *Speed*: it imposes additional overhead in every function call, because > each > function call needs to inspect each of its arguments for overrides. > This is > why arithmetic on builtin Python numbers is slow. > 2. *Readability*: it is not longer immediately evident to readers of code > what > happens when a function is called, because the function's implementation > could be overridden by any of its arguments. > > In contrast, importing a new library (e.g., ``import dask.array as da``) > with > an API matching NumPy is entirely explicit. There is no overhead from > dispatch > or ambiguity about which implementation is being used. > > Explicit and implicit choice of implementations are not mutually exclusive > options. Indeed, most implementations of NumPy API overrides via > ``__array_function__`` that we are familiar with (namely, dask, CuPy and > sparse, but not Pint) also include an explicit way to use their version of > NumPy's API by importing a module directly (``dask.array``, ``cupy`` or > ``sparse``, respectively). > > Local vs. non-local vs. global control > ====================================== > > The final design axis is how users control the choice of API: > > - **Local control**, as exemplified by multiple dispatch and Python > protocols for > arithmetic, determines which implementation to use either by checking > types > or calling methods on the direct arguments of a function. > - **Non-local control** such as `np.errstate > < > https://docs.scipy.org/doc/numpy/reference/generated/numpy.errstate.html > >`_ > overrides behavior with global-state via function decorators or > context-managers. Control is determined hierarchically, via the > inner-most > context. > - **Global control** provides a mechanism for users to set default > behavior, > either via function calls or configuration files. 
For example, matplotlib > allows setting a global choice of plotting backend. > > Local control is generally considered a best practice for API design, > because > control flow is entirely explicit, which makes it the easiest to > understand. > Non-local and global control are occasionally used, but generally either > due to > ignorance or a lack of better alternatives. > > In the case of duck typing for NumPy's public API, we think non-local or > global > control would be mistakes, mostly because they **don't compose well**. If > one > library sets/needs one set of overrides and then internally calls a routine > that expects another set of overrides, the resulting behavior may be very > surprising. Higher order functions are especially problematic, because the > context in which functions are evaluated may not be the context in which > they > are defined. > > One class of override use cases where we think non-local and global > control are > appropriate is for choosing a backend system that is guaranteed to have an > entirely consistent interface, such as a faster alternative implementation > of > ``numpy.fft`` on NumPy arrays. However, these are out of scope for the > current > proposal, which is focused on duck arrays. > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at python.orghttps://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Feb 6 15:20:15 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 06 Feb 2020 12:20:15 -0800 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> Message-ID: <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> On Thu, 2020-02-06 at 09:35 -0800, Stephan Hoyer wrote: > On Wed, Feb 5, 2020 at 8:02 AM Andreas Mueller > wrote: > > > - We use scipy.linalg in many places, and we would need to do a > > separate dispatching to check whether we can use module.linalg > > instead > > (that might be an issue for many libraries but I'm not sure). > > > > This brings up a good question -- obviously the final decision here > is up to SciPy maintainers, but how should we encourage SciPy to > support dispatching? > We could pretty easily make __array_function__ cover SciPy by simply > exposing NumPy's internal utilities. SciPy could simply use the > np.array_function_dispatch decorator internally and that would be > enough. Hmmm, in NumPy we can easily force basically 100% of (desired) coverage, i.e. JAX can return a namespace that implements everything. With SciPy that is already muss less feasible, and as you go to domain specific tools it seems implausible. `get_array_module` solves the issue of a library that wants to support all array likes. As long as: * most functions rely only on the NumPy API * the domain specific library is expected to implement support for specific array objects if necessary. E.g. sklearn can include special code for Dask support. Dask does not replace sklearn code. > It is less clear how this could work for __array_module__, because > __array_module__ and get_array_module() are not generic -- they > refers explicitly to a NumPy like module. 
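A minimal sketch of that division of labour, as a downstream library might write it once NEP 37 lands. ``get_array_module`` is stubbed out below because the proposed ``np.get_array_module`` does not exist yet, and the ``_fit_*`` helpers are invented names used only for illustration.

.. code:: python

    import numpy as np

    def get_array_module(*arrays, default=np):
        # stand-in for the proposed np.get_array_module
        for a in arrays:
            method = getattr(type(a), '__array_module__', None)
            if method is not None:
                module = method(a, {type(x) for x in arrays})
                if module is not NotImplemented:
                    return module
        return default

    def _fit_compiled(X):
        # stand-in for a Cython/compiled path that only accepts ndarrays
        return X.mean(axis=0)

    def _fit_generic(X, module):
        # relies only on names promised by the NumPy API contract
        return module.mean(module.asarray(X), axis=0)

    def fit(X):
        module = get_array_module(X)
        if module is np:
            return _fit_compiled(X)        # fast path for plain ndarrays
        return _fit_generic(X, module)     # generic duck-array path

    fit(np.arange(12.0).reshape(3, 4))     # plain ndarray: compiled path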
If we want to extend it to > SciPy (for which I agree there are good use-cases), what should that > look __array_module__` I suppose the question is here, where should the code reside? For SciPy, I agree there is a good reason why you may want to "reverse" the implementation. The code to support JAX arrays, should live inside JAX. One, probably silly, option is to return a "global" namespace, so that: np = get_array_module(*arrays).numpy` We have to distinct issues: Where should e.g. SciPy put a generic implementation (assuming they to provide implementations that only require NumPy-API support to not require overriding)? And, also if a library provides generic support, should we define a standard of how the context/namespace may be passed in/provided? sklearn's main namespace is expected to support many array objects/types, but it could be nice to pass in an already known context/namespace (say scikit-image already found it, and then calls scikit-learn internally). A "generic" namespace may even require this to infer the correct output array object. Another thing about backward compatibility: What is our vision there actually? This NEP will *not* give the *end user* the option to opt-in! Here, opt-in is really reserved to the *library user* (e.g. sklearn). (I did not realize this clearly before) Thinking about that for a bit now, that seems like the right choice. But it also means that the library requires an easy way of giving a FutureWarning, to notify the end-user of the upcoming change. The end- user will easily be able to convert to a NumPy array to keep the old behaviour. Once this warning is given (maybe during `get_array_module()`, the array module object/context would preferably be passed around, hopefully even between libraries. That provides a reasonable way to opt-in to the new behaviour without a warning (mainly for library users, end-users can silence the warning if they wish so). - Sebastian > The obvious choices would be to either add a new protocol, e.g., > __scipy_module__ (but then NumPy needs to know about SciPy), or to > add some sort of "module request" parameter to np.get_array_module(), > to indicate the requested API, e.g., np.get_array_module(*arrays, > matching='scipy'). This is pretty similar to the "default" argument > but would need to get passed into the __array_module__ protocol, too. > > > - Some models have several possible optimization algorithms, some > > of which are pure numpy and some which are Cython. If someone > > provides a different array module, > > we might want to choose an algorithm that is actually supported by > > that module. While this exact issue is maybe sklearn specific, a > > similar issue could appear for most downstream libs that use Cython > > in some places. > > Many Cython algorithms could be implemented in pure numpy with a > > potential slowdown, but once we have NEP 37 there might be a > > benefit to having a pure NumPy implementation as an alternative > > code path. > > > > > > Anyway, NEP 37 seems a great step in the right direction and would > > enable sklearn to actually dispatch in some places. Dispatching > > just based on __array_function__ seems not really feasible so far. > > > > Best, > > Andreas Mueller > > > > > > On 1/6/20 11:29 PM, Stephan Hoyer wrote: > > > I am pleased to present a new NumPy Enhancement Proposal for > > > discussion: "NEP-37: A dispatch protocol for NumPy-like modules." > > > Feedback would be very welcome! > > > > > > The full text follows. 
The rendered proposal can also be found > > > online at https://numpy.org/neps/nep-0037-array-module.html > > > > > > Best, > > > Stephan Hoyer > > > > > > =================================================== > > > NEP 37 ? A dispatch protocol for NumPy-like modules > > > =================================================== > > > > > > :Author: Stephan Hoyer > > > :Author: Hameer Abbasi > > > :Author: Sebastian Berg > > > :Status: Draft > > > :Type: Standards Track > > > :Created: 2019-12-29 > > > > > > Abstract > > > -------- > > > > > > NEP-18's ``__array_function__`` has been a mixed success. Some > > > projects (e.g., > > > dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted > > > it. Others > > > (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we > > > propose a new > > > protocol, ``__array_module__``, that we expect could eventually > > > subsume most > > > use-cases for ``__array_function__``. The protocol requires > > > explicit adoption > > > by both users and library authors, which ensures backwards > > > compatibility, and > > > is also significantly simpler than ``__array_function__``, both > > > of which we > > > expect will make it easier to adopt. > > > > > > Why ``__array_function__`` hasn't been enough > > > --------------------------------------------- > > > > > > There are two broad ways in which NEP-18 has fallen short of its > > > goals: > > > > > > 1. **Maintainability concerns**. `__array_function__` has > > > significant > > > implications for libraries that use it: > > > > > > - Projects like `PyTorch > > > `_, `JAX > > > `_ and even > > > `scipy.sparse > > > `_ have been > > > reluctant to > > > implement `__array_function__` in part because they are > > > concerned about > > > **breaking existing code**: users expect NumPy functions > > > like > > > ``np.concatenate`` to return NumPy arrays. This is a > > > fundamental > > > limitation of the ``__array_function__`` design, which we > > > chose to allow > > > overriding the existing ``numpy`` namespace. > > > - ``__array_function__`` currently requires an "all or > > > nothing" approach to > > > implementing NumPy's API. There is no good pathway for > > > **incremental > > > adoption**, which is particularly problematic for > > > established projects > > > for which adopting ``__array_function__`` would result in > > > breaking > > > changes. > > > - It is no longer possible to use **aliases to NumPy > > > functions** within > > > modules that support overrides. For example, both CuPy and > > > JAX set > > > ``result_type = np.result_type``. > > > - Implementing **fall-back mechanisms** for unimplemented > > > NumPy functions > > > by using NumPy's implementation is hard to get right (but > > > see the > > > `version from dask < > > > https://github.com/dask/dask/pull/5043>`_), because > > > ``__array_function__`` does not present a consistent > > > interface. > > > Converting all arguments of array type requires recursing > > > into generic > > > arguments of the form ``*args, **kwargs``. > > > > > > 2. **Limitations on what can be overridden.** > > > ``__array_function__`` has some > > > important gaps, most notably array creation and coercion > > > functions: > > > > > > - **Array creation** routines (e.g., ``np.arange`` and those > > > in > > > ``np.random``) need some other mechanism for indicating what > > > type of > > > arrays to create. 
`NEP 36 < > > > https://github.com/numpy/numpy/pull/14715>`_ > > > proposed adding optional ``like=`` arguments to functions > > > without > > > existing array arguments. However, we still lack any > > > mechanism to > > > override methods on objects, such as those needed by > > > ``np.random.RandomState``. > > > - **Array conversion** can't reuse the existing coercion > > > functions like > > > ``np.asarray``, because ``np.asarray`` sometimes means > > > "convert to an > > > exact ``np.ndarray``" and other times means "convert to > > > something _like_ > > > a NumPy array." This led to the `NEP 30 > > > `_ > > > proposal for > > > a separate ``np.duckarray`` function, but this still does > > > not resolve how > > > to cast one duck array into a type matching another duck > > > array. > > > > > > ``get_array_module`` and the ``__array_module__`` protocol > > > ---------------------------------------------------------- > > > > > > We propose a new user-facing mechanism for dispatching to a duck- > > > array > > > implementation, ``numpy.get_array_module``. ``get_array_module`` > > > performs the > > > same type resolution as ``__array_function__`` and returns a > > > module with an API > > > promised to match the standard interface of ``numpy`` that can > > > implement > > > operations on all provided array types. > > > > > > The protocol itself is both simpler and more powerful than > > > ``__array_function__``, because it doesn't need to worry about > > > actually > > > implementing functions. We believe it resolves most of the > > > maintainability and > > > functionality limitations of ``__array_function__``. > > > > > > The new protocol is opt-in, explicit and with local control; see > > > :ref:`appendix-design-choices` for discussion on the importance > > > of these design > > > features. > > > > > > The array module contract > > > ========================= > > > > > > Modules returned by ``get_array_module``/``__array_module__`` > > > should make a > > > best effort to implement NumPy's core functionality on new array > > > types(s). > > > Unimplemented functionality should simply be omitted (e.g., > > > accessing an > > > unimplemented function should raise ``AttributeError``). In the > > > future, we > > > anticipate codifying a protocol for requesting restricted subsets > > > of ``numpy``; > > > see :ref:`requesting-restricted-subsets` for more details. > > > > > > How to use ``get_array_module`` > > > =============================== > > > > > > Code that wants to support generic duck arrays should explicitly > > > call > > > ``get_array_module`` to determine an appropriate array module > > > from which to > > > call functions, rather than using the ``numpy`` namespace > > > directly. For > > > example: > > > > > > .. code:: python > > > > > > # calls the appropriate version of np.something for x and y > > > module = np.get_array_module(x, y) > > > module.something(x, y) > > > > > > Both array creation and array conversion are supported, because > > > dispatching is > > > handled by ``get_array_module`` rather than via the types of > > > function > > > arguments. For example, to use random number generation functions > > > or methods, > > > we can simply pull out the appropriate submodule: > > > > > > .. 
code:: python > > > > > > def duckarray_add_random(array): > > > module = np.get_array_module(array) > > > noise = module.random.randn(*array.shape) > > > return array + noise > > > > > > We can also write the duck-array ``stack`` function from `NEP 30 > > > `_, > > > without the need > > > for a new ``np.duckarray`` function: > > > > > > .. code:: python > > > > > > def duckarray_stack(arrays): > > > module = np.get_array_module(*arrays) > > > arrays = [module.asarray(arr) for arr in arrays] > > > shapes = {arr.shape for arr in arrays} > > > if len(shapes) != 1: > > > raise ValueError('all input arrays must have the same > > > shape') > > > expanded_arrays = [arr[module.newaxis, ...] for arr in > > > arrays] > > > return module.concatenate(expanded_arrays, axis=0) > > > > > > By default, ``get_array_module`` will return the ``numpy`` module > > > if no > > > arguments are arrays. This fall-back can be explicitly controlled > > > by providing > > > the ``module`` keyword-only argument. It is also possible to > > > indicate that an > > > exception should be raised instead of returning a default array > > > module by > > > setting ``module=None``. > > > > > > How to implement ``__array_module__`` > > > ===================================== > > > > > > Libraries implementing a duck array type that want to support > > > ``get_array_module`` need to implement the corresponding > > > protocol, > > > ``__array_module__``. This new protocol is based on Python's > > > dispatch protocol > > > for arithmetic, and is essentially a simpler version of > > > ``__array_function__``. > > > > > > Only one argument is passed into ``__array_module__``, a Python > > > collection of > > > unique array types passed into ``get_array_module``, i.e., all > > > arguments with > > > an ``__array_module__`` attribute. > > > > > > The special method should either return an namespace with an API > > > matching > > > ``numpy``, or ``NotImplemented``, indicating that it does not > > > know how to > > > handle the operation: > > > > > > .. code:: python > > > > > > class MyArray: > > > def __array_module__(self, types): > > > if not all(issubclass(t, MyArray) for t in types): > > > return NotImplemented > > > return my_array_module > > > > > > Returning custom objects from ``__array_module__`` > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > ``my_array_module`` will typically, but need not always, be a > > > Python module. > > > Returning a custom objects (e.g., with functions implemented via > > > ``__getattr__``) may be useful for some advanced use cases. > > > > > > For example, custom objects could allow for partial > > > implementations of duck > > > array modules that fall-back to NumPy (although this is not > > > recommended in > > > general because such fall-back behavior can be error prone): > > > > > > .. code:: python > > > > > > class MyArray: > > > def __array_module__(self, types): > > > if all(issubclass(t, MyArray) for t in types): > > > return ArrayModule() > > > else: > > > return NotImplemented > > > > > > class ArrayModule: > > > def __getattr__(self, name): > > > import base_module > > > return getattr(base_module, name, getattr(numpy, > > > name)) > > > > > > Subclassing from ``numpy.ndarray`` > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > All of the same guidance about well-defined type casting > > > hierarchies from > > > NEP-18 still applies. 
``numpy.ndarray`` itself contains a > > > matching > > > implementation of ``__array_module__``, which is convenient for > > > subclasses: > > > > > > .. code:: python > > > > > > class ndarray: > > > def __array_module__(self, types): > > > if all(issubclass(t, ndarray) for t in types): > > > return numpy > > > else: > > > return NotImplemented > > > > > > NumPy's internal machinery > > > ========================== > > > > > > The type resolution rules of ``get_array_module`` follow the same > > > model as > > > Python and NumPy's existing dispatch protocols: subclasses are > > > called before > > > super-classes, and otherwise left to right. ``__array_module__`` > > > is guaranteed > > > to be called only a single time on each unique type. > > > > > > The actual implementation of `get_array_module` will be in C, but > > > should be > > > equivalent to this Python code: > > > > > > .. code:: python > > > > > > def get_array_module(*arrays, default=numpy): > > > implementing_arrays, types = > > > _implementing_arrays_and_types(arrays) > > > if not implementing_arrays and default is not None: > > > return default > > > for array in implementing_arrays: > > > module = array.__array_module__(types) > > > if module is not NotImplemented: > > > return module > > > raise TypeError("no common array module found") > > > > > > def _implementing_arrays_and_types(relevant_arrays): > > > types = [] > > > implementing_arrays = [] > > > for array in relevant_arrays: > > > t = type(array) > > > if t not in types and hasattr(t, '__array_module__'): > > > types.append(t) > > > # Subclasses before superclasses, otherwise left > > > to right > > > index = len(implementing_arrays) > > > for i, old_array in > > > enumerate(implementing_arrays): > > > if issubclass(t, type(old_array)): > > > index = i > > > break > > > implementing_arrays.insert(index, array) > > > return implementing_arrays, types > > > > > > Relationship with ``__array_ufunc__`` and ``__array_function__`` > > > ---------------------------------------------------------------- > > > > > > These older protocols have distinct use-cases and should remain > > > =============================================================== > > > > > > ``__array_module__`` is intended to resolve limitations of > > > ``__array_function__``, so it is natural to consider whether it > > > could entirely > > > replace ``__array_function__``. This would offer dual benefits: > > > (1) simplifying > > > the user-story about how to override NumPy and (2) removing the > > > slowdown > > > associated with checking for dispatch when calling every NumPy > > > function. > > > > > > However, ``__array_module__`` and ``__array_function__`` are > > > pretty different > > > from a user perspective: it requires explicit calls to > > > ``get_array_function``, > > > rather than simply reusing original ``numpy`` functions. This is > > > probably fine > > > for *libraries* that rely on duck-arrays, but may be > > > frustratingly verbose for > > > interactive use. > > > > > > Some of the dispatching use-cases for ``__array_ufunc__`` are > > > also solved by > > > ``__array_module__``, but not all of them. For example, it is > > > still useful to > > > be able to define non-NumPy ufuncs (e.g., from Numba or SciPy) in > > > a generic way > > > on non-NumPy arrays (e.g., with dask.array). > > > > > > Given their existing adoption and distinct use cases, we don't > > > think it makes > > > sense to remove or deprecate ``__array_function__`` and > > > ``__array_ufunc__`` at > > > this time. 
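(A concrete, self-contained sketch of the resolution order described above -- the
class names and returned strings are purely illustrative, not part of the
proposal:)

.. code:: python

    class DiskArray:
        # toy duck array handled by some hypothetical "diskarray" namespace
        def __array_module__(self, types):
            if all(issubclass(t, DiskArray) for t in types):
                return "diskarray namespace"   # stand-in for a real module
            return NotImplemented

    class ChunkedDiskArray(DiskArray):
        # subclass with its own namespace; consulted before DiskArray
        def __array_module__(self, types):
            if all(issubclass(t, DiskArray) for t in types):
                return "chunked diskarray namespace"
            return NotImplemented

    # With the reference implementation above, the subclass wins regardless of
    # argument order, and each unique type is consulted only once:
    #     get_array_module(DiskArray(), ChunkedDiskArray())
    #         -> "chunked diskarray namespace"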
> > > > > > Mixin classes to implement ``__array_function__`` and > > > ``__array_ufunc__`` > > > ================================================================= > > > ======== > > > > > > Despite the user-facing differences, ``__array_module__`` and a > > > module > > > implementing NumPy's API still contain sufficient functionality > > > needed to > > > implement dispatching with the existing duck array protocols. > > > > > > For example, the following mixin classes would provide sensible > > > defaults for > > > these special methods in terms of ``get_array_module`` and > > > ``__array_module__``: > > > > > > .. code:: python > > > > > > class ArrayUfuncFromModuleMixin: > > > > > > def __array_ufunc__(self, ufunc, method, *inputs, > > > **kwargs): > > > arrays = inputs + kwargs.get('out', ()) > > > try: > > > array_module = np.get_array_module(*arrays) > > > except TypeError: > > > return NotImplemented > > > > > > try: > > > # Note this may have false positive matches, if > > > ufunc.__name__ > > > # matches the name of a ufunc defined by NumPy. > > > Unfortunately > > > # there is no way to determine in which module a > > > ufunc was > > > # defined. > > > new_ufunc = getattr(array_module, ufunc.__name__) > > > except AttributeError: > > > return NotImplemented > > > > > > try: > > > callable = getattr(new_ufunc, method) > > > except AttributeError: > > > return NotImplemented > > > > > > return callable(*inputs, **kwargs) > > > > > > class ArrayFunctionFromModuleMixin: > > > > > > def __array_function__(self, func, types, args, kwargs): > > > array_module = self.__array_module__(types) > > > if array_module is NotImplemented: > > > return NotImplemented > > > > > > # Traverse submodules to find the appropriate > > > function > > > modules = func.__module__.split('.') > > > assert modules[0] == 'numpy' > > > for submodule in modules[1:]: > > > module = getattr(module, submodule, None) > > > new_func = getattr(module, func.__name__, None) > > > if new_func is None: > > > return NotImplemented > > > > > > return new_func(*args, **kwargs) > > > > > > To make it easier to write duck arrays, we could also add these > > > mixin classes > > > into ``numpy.lib.mixins`` (but the examples above may suffice). > > > > > > Alternatives considered > > > ----------------------- > > > > > > Naming > > > ====== > > > > > > We like the name ``__array_module__`` because it mirrors the > > > existing > > > ``__array_function__`` and ``__array_ufunc__`` protocols. Another > > > reasonable > > > choice could be ``__array_namespace__``. > > > > > > It is less clear what the NumPy function that calls this protocol > > > should be > > > called (``get_array_module`` in this proposal). Some possible > > > alternatives: > > > ``array_module``, ``common_array_module``, > > > ``resolve_array_module``, > > > ``get_namespace``, ``get_numpy``, ``get_numpylike_module``, > > > ``get_duck_array_module``. > > > > > > .. _requesting-restricted-subsets: > > > > > > Requesting restricted subsets of NumPy's API > > > ============================================ > > > > > > Over time, NumPy has accumulated a very large API surface, with > > > over 600 > > > attributes in the top level ``numpy`` module alone. It is > > > unlikely that any > > > duck array library could or would want to implement all of these > > > functions and > > > classes, because the frequently used subset of NumPy is much > > > smaller. 
> > > > > > We think it would be useful exercise to define "minimal" > > > subset(s) of NumPy's > > > API, omitting rarely used or non-recommended functionality. For > > > example, > > > minimal NumPy might include ``stack``, but not the other stacking > > > functions > > > ``column_stack``, ``dstack``, ``hstack`` and ``vstack``. This > > > could clearly > > > indicate to duck array authors and users want functionality is > > > core and what > > > functionality they can skip. > > > > > > Support for requesting a restricted subset of NumPy's API would > > > be a natural > > > feature to include in ``get_array_function`` and > > > ``__array_module__``, e.g., > > > > > > .. code:: python > > > > > > # array_module is only guaranteed to contain "minimal" NumPy > > > array_module = np.get_array_module(*arrays, > > > request='minimal') > > > > > > To facilitate testing with NumPy and use with any valid duck > > > array library, > > > NumPy itself would return restricted versions of the ``numpy`` > > > module when > > > ``get_array_module`` is called only on NumPy arrays. Omitted > > > functions would > > > simply not exist. > > > > > > Unfortunately, we have not yet figured out what these restricted > > > subsets should > > > be, so it doesn't make sense to do this yet. When/if we do, we > > > could either add > > > new keyword arguments to ``get_array_module`` or add new top > > > level functions, > > > e.g., ``get_minimal_array_module``. We would also need to add > > > either a new > > > protocol patterned off of ``__array_module__`` (e.g., > > > ``__array_module_minimal__``), or could add an optional second > > > argument to > > > ``__array_module__`` (catching errors with ``try``/``except``). > > > > > > A new namespace for implicit dispatch > > > ===================================== > > > > > > Instead of supporting overrides in the main `numpy` namespace > > > with > > > ``__array_function__``, we could create a new opt-in namespace, > > > e.g., > > > ``numpy.api``, with versions of NumPy functions that support > > > dispatching. These > > > overrides would need new opt-in protocols, e.g., > > > ``__array_function_api__`` > > > patterned off of ``__array_function__``. > > > > > > This would resolve the biggest limitations of > > > ``__array_function__`` by being > > > opt-in and would also allow for unambiguously overriding > > > functions like > > > ``asarray``, because ``np.api.asarray`` would always mean > > > "convert an > > > array-like object." But it wouldn't solve all the dispatching > > > needs met by > > > ``__array_module__``, and would leave us with supporting a > > > considerably more > > > complex protocol both for array users and implementors. > > > > > > We could potentially implement such a new namespace *via* the > > > ``__array_module__`` protocol. Certainly some users would find > > > this convenient, > > > because it is slightly less boilerplate. But this would leave > > > users with a > > > confusing choice: when should they use `get_array_module` vs. > > > `np.api.something`. Also, we would have to add and maintain a > > > whole new module, > > > which is considerably more expensive than merely adding a > > > function. 
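(A rough sketch of the "optional second argument plus ``try``/``except``" idea
mentioned above -- the function name, the ``request`` keyword and the calling
convention are hypothetical, not part of the proposal, and the subclass-ordering
refinement is omitted for brevity:)

.. code:: python

    def get_array_module_with_request(*arrays, request='minimal', default=None):
        # collect the unique types that participate in the protocol
        types = [type(a) for a in arrays if hasattr(type(a), '__array_module__')]
        types = list(dict.fromkeys(types))   # de-duplicate, preserving order
        for array in arrays:
            if not hasattr(type(array), '__array_module__'):
                continue
            try:
                # newer implementations could accept a restriction request...
                module = array.__array_module__(types, request)
            except TypeError:
                # ...while existing single-argument implementations keep working
                module = array.__array_module__(types)
            if module is not NotImplemented:
                return module
        if default is not None:
            return default
        raise TypeError("no common array module found")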
> > > > > > Dispatching on both types and arrays instead of only types > > > ========================================================== > > > > > > Instead of supporting dispatch only via unique array types, we > > > could also > > > support dispatch via array objects, e.g., by passing an > > > ``arrays`` argument as > > > part of the ``__array_module__`` protocol. This could potentially > > > be useful for > > > dispatch for arrays with metadata, such provided by Dask and > > > Pint, but would > > > impose costs in terms of type safety and complexity. > > > > > > For example, a library that supports arrays on both CPUs and GPUs > > > might decide > > > on which device to create a new arrays from functions like > > > ``ones`` based on > > > input arguments: > > > > > > .. code:: python > > > > > > class Array: > > > def __array_module__(self, types, arrays): > > > useful_arrays = tuple(a in arrays if isinstance(a, > > > Array)) > > > if not useful_arrays: > > > return NotImplemented > > > prefer_gpu = any(a.prefer_gpu for a in useful_arrays) > > > return ArrayModule(prefer_gpu) > > > > > > class ArrayModule: > > > def __init__(self, prefer_gpu): > > > self.prefer_gpu = prefer_gpu > > > > > > def __getattr__(self, name): > > > import base_module > > > base_func = getattr(base_module, name) > > > return functools.partial(base_func, > > > prefer_gpu=self.prefer_gpu) > > > > > > This might be useful, but it's not clear if we really need it. > > > Pint seems to > > > get along OK without any explicit array creation routines > > > (favoring > > > multiplication by units, e.g., ``np.ones(5) * ureg.m``), and for > > > the most part > > > Dask is also OK with existing ``__array_function__`` style > > > overides (e.g., > > > favoring ``np.ones_like`` over ``np.ones``). Choosing whether to > > > place an array > > > on the CPU or GPU could be solved by `making array creation lazy > > > `_. > > > > > > .. _appendix-design-choices: > > > > > > Appendix: design choices for API overrides > > > ------------------------------------------ > > > > > > There is a large range of possible design choices for overriding > > > NumPy's API. > > > Here we discuss three major axes of the design decision that > > > guided our design > > > for ``__array_module__``. > > > > > > Opt-in vs. opt-out for users > > > ============================ > > > > > > The ``__array_ufunc__`` and ``__array_function__`` protocols > > > provide a > > > mechanism for overriding NumPy functions *within NumPy's existing > > > namespace*. > > > This means that users need to explicitly opt-out if they do not > > > want any > > > overridden behavior, e.g., by casting arrays with > > > ``np.asarray()``. > > > > > > In theory, this approach lowers the barrier for adopting these > > > protocols in > > > user code and libraries, because code that uses the standard > > > NumPy namespace is > > > automatically compatible. But in practice, this hasn't worked > > > out. For example, > > > most well-maintained libraries that use NumPy follow the best > > > practice of > > > casting all inputs with ``np.asarray()``, which they would have > > > to explicitly > > > relax to use ``__array_function__``. Our experience has been that > > > making a > > > library compatible with a new duck array type typically requires > > > at least a > > > small amount of work to accommodate differences in the data model > > > and operations > > > that can be implemented efficiently. 
> > > > > > These opt-out approaches also considerably complicate backwards > > > compatibility > > > for libraries that adopt these protocols, because by opting in as > > > a library > > > they also opt-in their users, whether they expect it or not. For > > > winning over > > > libraries that have been unable to adopt ``__array_function__``, > > > an opt-in > > > approach seems like a must. > > > > > > Explicit vs. implicit choice of implementation > > > ============================================== > > > > > > Both ``__array_ufunc__`` and ``__array_function__`` have implicit > > > control over > > > dispatching: the dispatched functions are determined via the > > > appropriate > > > protocols in every function call. This generalizes well to > > > handling many > > > different types of objects, as evidenced by its use for > > > implementing arithmetic > > > operators in Python, but it has two downsides: > > > > > > 1. *Speed*: it imposes additional overhead in every function > > > call, because each > > > function call needs to inspect each of its arguments for > > > overrides. This is > > > why arithmetic on builtin Python numbers is slow. > > > 2. *Readability*: it is not longer immediately evident to readers > > > of code what > > > happens when a function is called, because the function's > > > implementation > > > could be overridden by any of its arguments. > > > > > > In contrast, importing a new library (e.g., ``import dask.array > > > as da``) with > > > an API matching NumPy is entirely explicit. There is no overhead > > > from dispatch > > > or ambiguity about which implementation is being used. > > > > > > Explicit and implicit choice of implementations are not mutually > > > exclusive > > > options. Indeed, most implementations of NumPy API overrides via > > > ``__array_function__`` that we are familiar with (namely, dask, > > > CuPy and > > > sparse, but not Pint) also include an explicit way to use their > > > version of > > > NumPy's API by importing a module directly (``dask.array``, > > > ``cupy`` or > > > ``sparse``, respectively). > > > > > > Local vs. non-local vs. global control > > > ====================================== > > > > > > The final design axis is how users control the choice of API: > > > > > > - **Local control**, as exemplified by multiple dispatch and > > > Python protocols for > > > arithmetic, determines which implementation to use either by > > > checking types > > > or calling methods on the direct arguments of a function. > > > - **Non-local control** such as `np.errstate > > > < > > > https://docs.scipy.org/doc/numpy/reference/generated/numpy.errstate.html>`_ > > > overrides behavior with global-state via function decorators or > > > context-managers. Control is determined hierarchically, via the > > > inner-most > > > context. > > > - **Global control** provides a mechanism for users to set > > > default behavior, > > > either via function calls or configuration files. For example, > > > matplotlib > > > allows setting a global choice of plotting backend. > > > > > > Local control is generally considered a best practice for API > > > design, because > > > control flow is entirely explicit, which makes it the easiest to > > > understand. > > > Non-local and global control are occasionally used, but generally > > > either due to > > > ignorance or a lack of better alternatives. 
> > > > > > In the case of duck typing for NumPy's public API, we think non- > > > local or global > > > control would be mistakes, mostly because they **don't compose > > > well**. If one > > > library sets/needs one set of overrides and then internally calls > > > a routine > > > that expects another set of overrides, the resulting behavior may > > > be very > > > surprising. Higher order functions are especially problematic, > > > because the > > > context in which functions are evaluated may not be the context > > > in which they > > > are defined. > > > > > > One class of override use cases where we think non-local and > > > global control are > > > appropriate is for choosing a backend system that is guaranteed > > > to have an > > > entirely consistent interface, such as a faster alternative > > > implementation of > > > ``numpy.fft`` on NumPy arrays. However, these are out of scope > > > for the current > > > proposal, which is focused on duck arrays. > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From lxndr.rvs at gmail.com Mon Feb 10 21:07:40 2020 From: lxndr.rvs at gmail.com (Alexander Reeves) Date: Mon, 10 Feb 2020 18:07:40 -0800 Subject: [Numpy-discussion] Proposal - extend histograms api to allow uneven bins Message-ID: Greetings, I have a PR that warrants discussion according to @seberg. See https://github.com/numpy/numpy/pull/14278. It is an enhancement that fixes a bug. The original bug is that when using the fd estimator on a dataset with small inter-quartile range and large outliers, the current codebase produces more bins than memory allows. There are several related bug reports (see #11879, #10297, #8203). In terms of scope, I restricted my changes to conditions where np.histogram(bins='auto') defaults to the 'fd'. For the actual fix, I actually enhanced the API. I used a suggestion from @eric-wieser to merge empty histogram bins. In practice this solves the outsized bins issue. However @seberg is concerned that extending the API in this way may not be the way to go. For example, if you use "auto" once, and then re-use the bins, the uneven bins may not be what you want. Furthermore @eric-wieser is concerned that there may be a floating-point devil in the details. He advocates using the hypothesis testing package to increase our confidence that the current implementation adequately handles corner cases. I would like to do my part in improving the code base. I don't have strong opinions but I have to admit that I would like to eventually make a PR that resolves these bugs. This has been a PR half a year in the making after all. Thoughts? -areeves87 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Tue Feb 11 00:16:44 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 10 Feb 2020 23:16:44 -0600 Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics In-Reply-To: References: Message-ID: On Tue, Feb 4, 2020 at 2:00 PM Hameer Abbasi wrote: > ?snip? > > > 1) Once NumPy adds the framework and initial set of Universal Intrinsic, > if contributors want to leverage a new architecture specific SIMD > instruction, will they be expected to add software implementation of this > instruction for all other architectures too? > > In my opinion, if the instructions are lower, then yes. For example, one > cannot add AVX-512 without adding, for example adding AVX-256 and AVX-128 > and SSE*. However, I would not expect one person or team to be an expert > in all assemblies, so intrinsics for one architecture can be developed > independently of another. > I think this doesn't quite answer the question. If I understand correctly, it's about a single instruction (e.g. one needs "VEXP2PD" and it's missing from the supported AVX512 instructions in master). I think the answer is yes, it needs to be added for other architectures as well. Otherwise, if universal intrinsics are added ad-hoc and there's no guarantee that a universal instruction is available for all main supported platforms, then over time there won't be much that's "universal" about the framework. This is a different question though from adding a new ufunc implementation. I would expect accelerating ufuncs via intrinsics that are already supported to be much more common than having to add new intrinsics. Does that sound right? > > 2) On whom does the burden lie to ensure that new implementations are > benchmarked and shows benefits on every architecture? What happens if > optimizing an Ufunc leads to improving performance on one architecture and > worsens performance on another? > This is slightly hard to provide a recipe for. I suspect it may take a while before this becomes an issue, since we don't have much SIMD code to begin with. So adding new code with benchmarks will likely show improvements on all architectures (we should ensure benchmarks can be run via CI, otherwise it's too onerous). And if not and it's not easily fixable, the problematic platform could be skipped so performance there is unchanged. Only once there's existing universal intrinsics and then they're tweaked will we have to be much more careful I'd think. Cheers, Ralf > > I would look at this from a maintainability point of view. If we are > increasing the code size by 20% for a certain ufunc, there must be a > domonstrable 20% increase in performance on any CPU. That is to say, > micro-optimisation will be unwelcome, and code readability will be > preferable. Usually we ask the submitter of the PR to test the PR with a > machine they have on hand, and I would be inclined to keep this trend of > self-reporting. Of course, if someone else came along and reported a > performance regression of, say, 10%, then we have increased code by 20%, > with only a net 5% gain in performance, and the PR will have to be reverted. > > ?snip? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matti.picus at gmail.com Tue Feb 11 01:53:18 2020 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 11 Feb 2020 08:53:18 +0200 Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics In-Reply-To: References: Message-ID: <11f71397-0bac-aaba-3ee9-133ad098b894@gmail.com> On 11/2/20 7:16 am, Ralf Gommers wrote: > > > On Tue, Feb 4, 2020 at 2:00 PM Hameer Abbasi > > wrote: > > ?snip? > > > 1) Once NumPy adds the framework and initial set of Universal Intrinsic, if > contributors want to leverage a new architecture specific SIMD > instruction, will they be expected to add software implementation > of this instruction for all other architectures too? > > In my opinion, if the instructions are lower, then yes. For > example, one cannot add AVX-512 without adding, for example adding > AVX-256 and AVX-128 and SSE*.? However, I would not expect one > person or team to be an expert in all assemblies, so intrinsics > for one architecture can be developed independently of another. > > > I think this doesn't quite answer the question. If I understand > correctly, it's about a single instruction (e.g. one needs > |"VEXP2PD"and it's missing from the supported AVX512 instructions in > master). I think the answer is yes, it needs to be added for other > architectures as well. Otherwise, if universal intrinsics are added > ad-hoc and there's no guarantee that a universal instruction is > available for all main supported platforms, then over time there won't > be much that's "universal" about the framework.| > | > | > |This is a different question though from adding a new ufunc > implementation. I would expect accelerating ufuncs via intrinsics that > are already supported to be much more common than having to add new > intrinsics. Does that sound right? > | |Yes. Universal intrinsics are cross-platform. However the NEP is open to the possibility that certain architectures may have SIMD intrinsics that cannot be expressed in terms of intrinsics for other platforms, and so there may be a use case for architecture-specific loops. This is explicitly stated in the latest PR to the NEP: "|If the regression is not minimal, we may choose to keep the X86-specific code for that platform and use the universal intrisic code for other platforms." > | > | > > > > 2) On whom does the burden lie to ensure that new > implementations are benchmarked and shows benefits on every > architecture? What happens if optimizing an Ufunc leads to > improving performance on one architecture and worsens performance > on another? > > > This is slightly hard to provide a recipe for. I suspect it may take a > while before this becomes an issue, since we don't have much SIMD code > to begin with. So adding new code with benchmarks will likely show > improvements on all architectures (we should ensure benchmarks can be > run via CI, otherwise it's too onerous). And if not and it's not > easily fixable, the problematic platform could be skipped so > performance there is unchanged. 
On HEAD, out of the 89 ufuncs in numpy.core.code_generators.generate_umath.defdict, 34 have X86-specific simd loops: >>> [x for x in defdict.keys() if any([td.simd for td in defdict[x].type_descriptions])] ['add', 'subtract', 'multiply', 'conjugate', 'square', 'reciprocal', 'absolute', 'negative', 'greater', 'greater_equal', 'less', 'less_equal', 'equal', 'not_equal', 'logical_and', 'logical_not', 'logical_or', 'maximum', 'minimum', 'bitwise_and', 'bitwise_or', 'bitwise_xor', 'invert', 'left_shift', 'right_shift', 'cos', 'sin', 'exp', 'log', 'sqrt', 'ceil', 'trunc', 'floor', 'rint'] They would be the first targets for universal intrinsics. Of them I estimate that the ones with more than one loop for at least one dtype signature would be the most difficult, since these have different optimizations for avx2, fma, and/or avx512f: ['square', 'reciprocal', 'absolute', 'cos', 'sin', 'exp', 'log', 'sqrt', 'ceil', 'trunc', 'floor', 'rint'] The other 55 ufuncs, for completeness, are ['floor_divide', 'true_divide', 'fmod', '_ones_like', 'power', 'float_power', '_arg', 'positive', 'sign', 'logical_xor', 'clip', 'fmax', 'fmin', 'logaddexp', 'logaddexp2', 'heaviside', 'degrees', 'rad2deg', 'radians', 'deg2rad', 'arccos', 'arccosh', 'arcsin', 'arcsinh', 'arctan', 'arctanh', 'tan', 'cosh', 'sinh', 'tanh', 'exp2', 'expm1', 'log2', 'log10', 'log1p', 'cbrt', 'fabs', 'arctan2', 'remainder', 'divmod', 'hypot', 'isnan', 'isnat', 'isinf', 'isfinite', 'signbit', 'copysign', 'nextafter', 'spacing', 'modf', 'ldexp', 'frexp', 'gcd', 'lcm', 'matmul'] As for testing accuracy: we recently added a framework for testing ulp variation of ufuncs against "golden results" in numpy/core/tests/test_umath_accuracy. So far float32 is tested for exp, log, cos, sin. Others may be tested elsewhere by specific tests, for instance numpy/core/test/test_half.py has test_half_ufuncs. It is difficult to do benchmarking on CI: the machines that run CI vary too much. We would need to set aside a machine for this and carefully set it up to keep CPU speed and temperature constant. We do have benchmarks for ufuncs (they could always be improved). I think Pauli runs the benchmarks carefully on X86, and may even makes the results public, but that resource is not really on PR reviewers' radar. We could run benchmarks on the gcc build farm machines for other architectures. Those machines are shared but not heavily utilized. > Only once there's existing universal intrinsics and then they're > tweaked will we have to be much more careful I'd think. > > > > I would look at this from a maintainability point of view. If we > are increasing the code size by 20% for a certain ufunc, there > must be a domonstrable 20% increase in performance on any CPU. > That is to say, micro-optimisation will be unwelcome, and code > readability will be preferable. Usually we ask the submitter of > the PR to test the PR with a machine they have on hand, and I > would be inclined to keep this trend of self-reporting. Of course, > if someone else came along and reported a performance regression > of, say, 10%, then we have increased code by 20%, with only a net > 5% gain in performance, and the PR will have to be reverted. > > ?snip? > I think we should be careful not to increase the reviewer burden, and try to automate as much as possible. It would be nice if we could at some point set up a set of bots that can be triggered to run benchmarks for us and report in the PR the results. 
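As an aside, a rough sketch of the kind of ULP comparison used in the
accuracy-testing paragraph above (illustrative only, not the actual
test_umath_accuracy code):

    import numpy as np

    def max_ulp_error(actual, expected):
        # approximate ULP distance: absolute error divided by the spacing of
        # representable floats at the expected value's magnitude
        diff = np.abs(actual.astype(np.float64) - expected.astype(np.float64))
        return (diff / np.spacing(np.abs(expected))).max()

    x = np.linspace(0.01, 10.0, 10_000, dtype=np.float32)
    golden = np.exp(x.astype(np.float64)).astype(np.float32)   # reference values
    print(max_ulp_error(np.exp(x), golden))   # expected to be at most a few ULP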
Matti From raghuveer.devulapalli at intel.com Tue Feb 11 13:02:09 2020 From: raghuveer.devulapalli at intel.com (Devulapalli, Raghuveer) Date: Tue, 11 Feb 2020 18:02:09 +0000 Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics In-Reply-To: References: Message-ID: >> I think this doesn't quite answer the question. If I understand correctly, it's about a single instruction (e.g. one needs "VEXP2PD" and it's missing from the supported AVX512 instructions in master). I think the answer is yes, it needs to be added for other architectures as well. That adds a lot of overhead to write SIMD based optimizations which can discourage contributors. It?s also an unreasonable expectation that a developer be familiar with SIMD of all the architectures. On top of that the performance implications aren?t clear. Software implementations of hardware instructions might perform worse and might not even produce the same result. From: NumPy-Discussion On Behalf Of Ralf Gommers Sent: Monday, February 10, 2020 9:17 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics On Tue, Feb 4, 2020 at 2:00 PM Hameer Abbasi > wrote: ?snip? > 1) Once NumPy adds the framework and initial set of Universal Intrinsic, if contributors want to leverage a new architecture specific SIMD instruction, will they be expected to add software implementation of this instruction for all other architectures too? In my opinion, if the instructions are lower, then yes. For example, one cannot add AVX-512 without adding, for example adding AVX-256 and AVX-128 and SSE*. However, I would not expect one person or team to be an expert in all assemblies, so intrinsics for one architecture can be developed independently of another. I think this doesn't quite answer the question. If I understand correctly, it's about a single instruction (e.g. one needs "VEXP2PD" and it's missing from the supported AVX512 instructions in master). I think the answer is yes, it needs to be added for other architectures as well. Otherwise, if universal intrinsics are added ad-hoc and there's no guarantee that a universal instruction is available for all main supported platforms, then over time there won't be much that's "universal" about the framework. This is a different question though from adding a new ufunc implementation. I would expect accelerating ufuncs via intrinsics that are already supported to be much more common than having to add new intrinsics. Does that sound right? > 2) On whom does the burden lie to ensure that new implementations are benchmarked and shows benefits on every architecture? What happens if optimizing an Ufunc leads to improving performance on one architecture and worsens performance on another? This is slightly hard to provide a recipe for. I suspect it may take a while before this becomes an issue, since we don't have much SIMD code to begin with. So adding new code with benchmarks will likely show improvements on all architectures (we should ensure benchmarks can be run via CI, otherwise it's too onerous). And if not and it's not easily fixable, the problematic platform could be skipped so performance there is unchanged. Only once there's existing universal intrinsics and then they're tweaked will we have to be much more careful I'd think. Cheers, Ralf I would look at this from a maintainability point of view. If we are increasing the code size by 20% for a certain ufunc, there must be a domonstrable 20% increase in performance on any CPU. 
That is to say, micro-optimisation will be unwelcome, and code readability will be preferable. Usually we ask the submitter of the PR to test the PR with a machine they have on hand, and I would be inclined to keep this trend of self-reporting. Of course, if someone else came along and reported a performance regression of, say, 10%, then we have increased code by 20%, with only a net 5% gain in performance, and the PR will have to be reverted. ?snip? _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Feb 11 13:09:20 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 11 Feb 2020 10:09:20 -0800 Subject: [Numpy-discussion] NumPy Development Meeting - Triage Focus Message-ID: Hi all, Our bi-weekly triage-focused NumPy development meeting is tomorrow (Wednesday, Februrary 12) at 11 am Pacific Time. Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg I encourage everyone to notify us of issues or PRs that you feel should be prioritized or simply discussed briefly. Just comment on it so we can label it, or add your PR/issue to this weeks topics for discussion. Best regards, Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From rossbar15 at gmail.com Tue Feb 11 16:12:13 2020 From: rossbar15 at gmail.com (Ross Barnowski) Date: Tue, 11 Feb 2020 13:12:13 -0800 Subject: [Numpy-discussion] Proposal - extend histograms api to allow uneven bins In-Reply-To: References: Message-ID: Just a few thoughts re: the changes proposed in https://github.com/numpy/numpy/pull/14278 1. Though the PR is limited to the 'auto' kwarg, the issue of potential memory problems for the automated binning methods is a more general one (e.g. #15332 ). 2. The main concern that jumps out to me is downstream users who are relying on the implicit assumption of regular binning. This is of course bad practice and makes even less sense when using one of the bin estimators, so I'm not sure how big of a concern it is. However, there is likely downstream user code that relies on the regular binning assumption, especially since, as far as I know, NumPy has never implemented binning techniques that return irregular bins. 3. The astropy project have at least one estimator that returns irregular bins . I checked for issues related to irregular binning: though they have many of the same problems with the automatic bin estimators (i.e. memory problems for inputs with outliers), I didn't see anything specifically related to irregular binning I just wanted to add my two cents. The binning-data-with-outliers problem is very common in high-resolution spectroscopy, and I have seen practitioners rely on the assumption of regular binning (e.g. divide the `range` by the number of bins) to specify bin centers even though this is not the right way to do things. Thanks for taking the time to write up your work! 
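To make the regular-binning assumption in point 2 concrete, here is an
illustrative snippet (not code from the PR): reconstructing bin centers from
the `range` and the bin count quietly assumes evenly spaced bins, while
computing them from the returned edges stays correct even if an estimator ever
produces uneven bins:

    import numpy as np

    data = np.random.default_rng(0).normal(size=1000)
    counts, edges = np.histogram(data, bins='auto')

    # downstream pattern that quietly assumes evenly spaced bins
    width = (data.max() - data.min()) / len(counts)
    centers_assumed = data.min() + width * (np.arange(len(counts)) + 0.5)

    # pattern that works for any monotonically increasing edges, even or uneven
    centers_from_edges = 0.5 * (edges[:-1] + edges[1:])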
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From ralf.gommers at gmail.com Tue Feb 11 16:33:19 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 11 Feb 2020 15:33:19 -0600 Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics In-Reply-To: References: Message-ID: On Tue, Feb 11, 2020 at 12:03 PM Devulapalli, Raghuveer < raghuveer.devulapalli at intel.com> wrote: > >> I think this doesn't quite answer the question. If I understand > correctly, it's about a single instruction (e.g. one needs "VEXP2PD" and > it's missing from the supported AVX512 instructions in master). I think > the answer is yes, it needs to be added for other architectures as well. > > > > That adds a lot of overhead to write SIMD based optimizations which can > discourage contributors. > Keep in mind that a new universal intrinsics instruction is just a bunch of defines. That is way less work than writing a ufunc that uses that instruction. We can also ping a platform expert in case it's not obvious what the corresponding arch-specific instruction is - that's a bit of a chicken-and-egg problem; once we get going we hopefully get more interested people that can help each other out. > It?s also an unreasonable expectation that a developer be familiar with > SIMD of all the architectures. On top of that the performance implications > aren?t clear. Software implementations of hardware instructions might > perform worse and might not even produce the same result. > I think you are worrying about writing ufuncs here, not about adding an instruction. If the same result is not produced, we have CI that should fail - and if it does, we can deal with that by (if it's not easy to figure out) making that platform fall back to the generic non-SIMD version of the ufunc. Cheers, Ralf > > > *From:* NumPy-Discussion intel.com at python.org> *On Behalf Of *Ralf Gommers > *Sent:* Monday, February 10, 2020 9:17 PM > *To:* Discussion of Numerical Python > *Subject:* Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics > > > > > > > > On Tue, Feb 4, 2020 at 2:00 PM Hameer Abbasi > wrote: > > ?snip? > > > > > 1) Once NumPy adds the framework and initial set of Universal Intrinsic, > if contributors want to leverage a new architecture specific SIMD > instruction, will they be expected to add software implementation of this > instruction for all other architectures too? > > > > In my opinion, if the instructions are lower, then yes. For example, one > cannot add AVX-512 without adding, for example adding AVX-256 and AVX-128 > and SSE*. However, I would not expect one person or team to be an expert > in all assemblies, so intrinsics for one architecture can be developed > independently of another. > > > > I think this doesn't quite answer the question. If I understand correctly, > it's about a single instruction (e.g. one needs "VEXP2PD" and it's > missing from the supported AVX512 instructions in master). I think the > answer is yes, it needs to be added for other architectures as well. > Otherwise, if universal intrinsics are added ad-hoc and there's no > guarantee that a universal instruction is available for all main supported > platforms, then over time there won't be much that's "universal" about the > framework. > > > > This is a different question though from adding a new ufunc > implementation. I would expect accelerating ufuncs via intrinsics that are > already supported to be much more common than having to add new intrinsics. > Does that sound right? 
> > > > > > 2) On whom does the burden lie to ensure that new implementations are > benchmarked and shows benefits on every architecture? What happens if > optimizing an Ufunc leads to improving performance on one architecture and > worsens performance on another? > > > > This is slightly hard to provide a recipe for. I suspect it may take a > while before this becomes an issue, since we don't have much SIMD code to > begin with. So adding new code with benchmarks will likely show > improvements on all architectures (we should ensure benchmarks can be run > via CI, otherwise it's too onerous). And if not and it's not easily > fixable, the problematic platform could be skipped so performance there is > unchanged. > > > > Only once there's existing universal intrinsics and then they're tweaked > will we have to be much more careful I'd think. > > > > Cheers, > > Ralf > > > > > > > > I would look at this from a maintainability point of view. If we are > increasing the code size by 20% for a certain ufunc, there must be a > domonstrable 20% increase in performance on any CPU. That is to say, > micro-optimisation will be unwelcome, and code readability will be > preferable. Usually we ask the submitter of the PR to test the PR with a > machine they have on hand, and I would be inclined to keep this trend of > self-reporting. Of course, if someone else came along and reported a > performance regression of, say, 10%, then we have increased code by 20%, > with only a net 5% gain in performance, and the PR will have to be reverted. > > > > ?snip? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Wed Feb 12 02:19:14 2020 From: matti.picus at gmail.com (Matti Picus) Date: Wed, 12 Feb 2020 09:19:14 +0200 Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics In-Reply-To: References: Message-ID: <5475fdf3-b435-2266-4c0e-b45b38ebe21e@gmail.com> On 11/2/20 8:02 pm, Devulapalli, Raghuveer wrote: > > On top of that the performance implications aren?t clear. Software > implementations of hardware instructions might perform worse and might > not even produce the same result. > The proposal for universal intrinsics does not enable replacing an intrinsic on one platform with a software emulation on another: the intrinsics are meant to be compile-time defines that overlay the universal intrinsic with a platform specific one. In order to use a new intrinsic, it must have parallel intrinsics on the other platforms, or cannot be used there: "NPY_CPU_HAVE(FEATURE_NAME)" will always return false so the compiler will not even build a loop for that platform. I will try to clarify that intention in the NEP. I hope there will not be a demand to use many non-universal intrinsics in ufuncs, we will need to work this out on a case-by-case basis in each ufunc. Does that sound reasonable? Are there intrinsics you have already used that have no parallel on other platforms? 
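As an editorial aside: the two gates Matti describes (a loop is only built if every universal intrinsic it needs has a platform-specific parallel, and the built loops are then narrowed by the CPU features detected at run time) can be made concrete with a toy model. The sketch below is plain Python rather than the actual C-preprocessor machinery, and every loop name, feature name, and intrinsic name in it is purely illustrative, not taken from NumPy's sources.

```python
# Toy model of the gating described above, not the real implementation.
platform_intrinsics = {"load", "store", "add", "multiply"}   # what this arch can map to
runtime_features = {"baseline", "AVX2"}                      # what the running CPU reports

candidate_loops = [
    {"name": "baseline C loop", "feature": "baseline", "needs": set()},
    {"name": "AVX2 loop",       "feature": "AVX2",     "needs": {"load", "add", "store"}},
    {"name": "AVX512F loop",    "feature": "AVX512F",  "needs": {"load", "getexp", "store"}},
]

# "Compile time": drop any loop whose universal intrinsics have no parallel
# on this platform (here the hypothetical "getexp" has none, so that loop
# is simply never built).
built = [l for l in candidate_loops if l["needs"] <= platform_intrinsics]

# "Run time": of the loops that were built, keep those the CPU supports and
# pick the last, i.e. most specialized, one (the list is ordered that way).
usable = [l for l in built if l["feature"] in runtime_features]
print(usable[-1]["name"])   # -> "AVX2 loop"
```

In the real proposal both steps are macros and CPU-detection code in C; the snippet is only meant to make the two filtering stages easy to follow when reading the thread.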
Matti From charlesr.harris at gmail.com Wed Feb 12 11:08:02 2020 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 12 Feb 2020 09:08:02 -0700 Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics In-Reply-To: <5475fdf3-b435-2266-4c0e-b45b38ebe21e@gmail.com> References: <5475fdf3-b435-2266-4c0e-b45b38ebe21e@gmail.com> Message-ID: On Wed, Feb 12, 2020 at 12:19 AM Matti Picus wrote: > On 11/2/20 8:02 pm, Devulapalli, Raghuveer wrote: > > > > On top of that the performance implications aren?t clear. Software > > implementations of hardware instructions might perform worse and might > > not even produce the same result. > > > > The proposal for universal intrinsics does not enable replacing an > intrinsic on one platform with a software emulation on another: the > intrinsics are meant to be compile-time defines that overlay the > universal intrinsic with a platform specific one. In order to use a new > intrinsic, it must have parallel intrinsics on the other platforms, or > cannot be used there: "NPY_CPU_HAVE(FEATURE_NAME)" will always return > false so the compiler will not even build a loop for that platform. I > will try to clarify that intention in the NEP. > > > I hope there will not be a demand to use many non-universal intrinsics > in ufuncs, we will need to work this out on a case-by-case basis in each > ufunc. Does that sound reasonable? Are there intrinsics you have already > used that have no parallel on other platforms? > > Intrinsics are not an irreversible change, they are, after all, private. The question is whether they are sufficiently useful to justify the time spent on them. I don't think we will know that until we attempt actual implementations. There will probably be some changes as a result of experience, but that is normal. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Wed Feb 12 08:55:09 2020 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Wed, 12 Feb 2020 10:55:09 -0300 Subject: [Numpy-discussion] NEP 44 - Restructuring the NumPy Documentation Message-ID: Hi all, Please see the NEP below for a proposal to restructure the documentation of NumPy. The main goal here is to make the documentation more visible and organized, and also make contributions easier. Comments and feedback are welcome! See https://github.com/numpy/numpy/pull/15554 for details. Best, Melissa ---- NEP 44 ? Restructuring the NumPy Documentation Authors: Ralf Gommers, Melissa Mendon?a, Mars Lee Status: Draft Type: Process Created: 2020-02-11 Abstract ====== This document proposes a restructuring of the NumPy Documentation, both in form and content, with the goal of making it more organized and discoverable for beginners and experienced users. Motivation and Scope ================= See [here](numpy.org/devdocs) for the front page of the latest docs. The organization is quite confusing and illogical (e.g. user and developer docs are mixed). We propose the following: - Reorganizing the docs into the four categories mentioned in [1]; - Creating dedicated sections for Tutorials and How-Tos, including orientation on how to create new content; - Adding an Explanations section for key concepts and techniques that require deeper descriptions, some of which will be rearranged from the Reference Guide. Usage and Impact ============== The documentation is a fundamental part of any software project, especially open source projects. 
In the case of NumPy, many beginners might feel demotivated by the current structure of the documentation, since it is difficult to discover what to learn (unless the user has a clear view of what to look for in the Reference docs, which is not always the case). Looking at the results of a ?NumPy Tutorial? search on any search engine also gives an idea of the demand for this kind of content. Having official high-level documentation written using up-to-date content and techniques will certainly mean more users (and developers/contributors) are involved in the NumPy community. Backward compatibility ================== The restructuring will effectively demand a complete rewrite of links and some of the current content. Input from the community will be useful for identifying key links and pages that should not be broken. Detailed description =============== As discussed in the article [1], there are four categories of doc content: - Tutorials - How-to guides - Explanations - Reference guide We propose to use those categories as the ones we use (for writing and reviewing) whenever we add a new documentation section. The reasoning for this is that it is clearer both for developers/documentation writers and to users where each information should go, and the scope and tone of each document. For example, if explanations are mixed with basic tutorials, beginners might be overwhelmed and alienated. On the other hand, if the reference guide contains basic how-tos, it might be difficult for experienced users to find the information they need, quickly. Currently, there are many blogs and tutorials on the internet about NumPy or using NumPy. One of the issues with this is that if users search for this information and end up in an outdated (unofficial) tutorial before they find the current official documentation, they end up creating content that is confusing, especially for beginners. Having a better infrastructure for the documentation also aims to solve this problem by giving users high-level, up-to-date official documentation that can be easily updated. Status and ideas of each type of doc content ------------------------------------------------------------ * Reference guide NumPy has a quite complete reference guide. All functions are documented, most have examples, and most are cross-linked well with See Also sections. Further improving the reference guide is incremental work that can be done (and is being done) by many people. There are, however, many explanations in the reference guide. These can be moved to a more dedicated Explanations section on the docs. * How-to guides NumPy does not have many how-to?s. The subclassing and array ducktyping section may be an example of a how-to. Others that could be added are: - Parallelization (controlling BLAS multithreading with threadpoolctl, using multiprocessing, random number generation, etc.) - Storing and loading data (.npy/.npz format, text formats, Zarr, HDF5, Bloscpack, etc.) - Performance (memory layout, profiling, use with Numba, Cython, or Pythran) - Writing generic code that works with NumPy, Dask, CuPy, pydata/sparse, etc. * Explanations There is a reasonable amount of content on fundamental NumPy concepts such as indexing, vectorization, broadcasting, (g)ufuncs, and dtypes. This could be organized better and clarified to ensure it?s really about explaining the concepts and not mixed with tutorial or how-to like content. There are few explanations about anything other than those fundamental NumPy concepts. 
Some examples of concepts that could be expanded: - Copies vs. Views; - BLAS and other linear algebra libraries; - Fancy indexing. In addition, there are many explanations in the Reference Guide, which should be moved to this new dedicated Explanations section. * Tutorials There?s a lot of scope for writing better tutorials. We have a new NumPy for absolute beginners tutorial [3] (GSoD project of Anne Bonner). In addition we need a number of tutorials addressing different levels of experience with Python and NumPy. This could be done using engaging data sets, ideas or stories. For example, curve fitting with polynomials and functions in numpy.linalg could be done with the Keeling curve (decades worth of CO2 concentration in air measurements) rather than with synthetic random data. Ideas for tutorials (these capture the types of things that make sense, they?re not necessarily the exact topics we propose to implement): - Conway?s game of life with only NumPy (note: already in Nicolas Rougier?s book) - Using masked arrays to deal with missing data in time series measurements - Using Fourier transforms to analyze the Keeling curve data, and extrapolate it. - Geospatial data (e.g. lat/lon/time to create maps for every year via a stacked array, like gridMet data) - Using text data and dtypes (e.g. use speeches from different people, shape (n_speech, n_sentences, n_words)) The Preparing to Teach document [2] from the Software Carpentry Instructor Training materials is a nice summary of how to write effective lesson plans (and tutorials would be very similar). In addition to adding new tutorials, we also propose a How to write a tutorial document, which would help users contribute new high-quality content to the documentation. Data sets ------------- Using interesting data in the NumPy docs requires giving all users access to that data, either inside NumPy or in a separate package. The former is not the best idea, since it?s hard to do without increasing the size of NumPy significantly. Even for SciPy there has so far been no consensus on this (see scipy PR 8707 on adding a new scipy.datasets subpackage). So we?ll aim for a new (pure Python) package, named numpy-datasets or scipy-datasets or something similar. That package can take some lessons from how, e.g., scikit-learn ships data sets. Small data sets can be included in the repo, large data sets can be accessed via a downloader class or function. Related Work =========== Some examples of documentation organization in other projects: - Documentation for Jupyter: https://jupyter.org/documentation - Documentation for Python: https://docs.python.org/3/ - Documentation for TensorFlow: https://www.tensorflow.org/learn These projects make the intended audience for each part of the documentation more explicit, as well as previewing some of the content in each section. Implementation ============ Besides rewriting the current documentation to some extent, it would be ideal to have a technical infrastructure that would allow more contributions from the community. For example, if Jupyter Notebooks could be submitted as-is as tutorials or How-Tos, this might create more contributors and broaden the NumPy community. Similarly, if people could download some of the documentation in Notebook format, this would certainly mean people would use less outdated material for learning NumPy. It would also be interesting if the new structure for the documentation makes translations easier. 
Currently, the documentation for NumPy can be confusing, especially for beginners. Our proposal is to reorganize the docs in the following structure: * For users: - Absolute Beginners Tutorial - main Tutorials section - How To?s for common tasks with NumPy - Reference Guide - Explanations - F2Py Guide - Glossary * For developers/contributors: - Contributor?s Guide - Building and extending the documentation - Benchmarking - NumPy Enhancement Proposals * Meta information - Reporting bugs - Release Notes - About NumPy - License References and Footnotes ==================== [1] What nobody tells you about documentation. https://www.divio.com/blog/documentation/ [2] Preparing to Teach (from the Software Carpentry Instructor Training materials). https://carpentries.github.io/instructor-training/15-lesson-study/index.html [3] NumPy for absolute beginners Tutorial by Anne Bonner. https://numpy.org/devdocs/user/absolute_beginners.html Copyright ======== This document has been placed in the public domain. -- Melissa Weber Mendon?a -------------- next part -------------- An HTML attachment was scrubbed... URL: From raghuveer.devulapalli at intel.com Wed Feb 12 14:36:10 2020 From: raghuveer.devulapalli at intel.com (Devulapalli, Raghuveer) Date: Wed, 12 Feb 2020 19:36:10 +0000 Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics In-Reply-To: <5475fdf3-b435-2266-4c0e-b45b38ebe21e@gmail.com> References: <5475fdf3-b435-2266-4c0e-b45b38ebe21e@gmail.com> Message-ID: >> I hope there will not be a demand to use many non-universal intrinsics in ufuncs, we will need to work this out on a case-by-case basis in each ufunc. Does that sound reasonable? Are there intrinsics you have already used that have no parallel on other platforms? I think that is reasonable. It's hard to anticipate the future need and benefit of specialized intrinsics but I tried to make a list of some of the specialized intrinsics that are currently in use in NumPy that I don?t believe exist on other platforms (most of these actually don?t exist on AVX2 either). I am not an expert in ARM or VSX architecture, so please correct me if I am wrong. a. _mm512_mask_i32gather_ps b. _mm512_mask_i32scatter_ps/_mm512_mask_i32scatter_pd c. _mm512_maskz_loadu_pd/_mm512_maskz_loadu_ps d. _mm512_getexp_ps e. _mm512_getmant_ps f. _mm512_scalef_ps g. _mm512_permutex2var_ps, _mm512_permutex2var_pd h. _mm512_maskz_div_ps, _mm512_maskz_div_pd i. _mm512_permute_ps/_mm512_permute_pd j. _mm512_sqrt_ps/pd (I could be wrong on this one, but from the little google search I did, it seems like power ISA doesn?t have a vectorized sqrt instruction) Software implementations of these instructions is definitely possible. But some of them are not trivial to implement and are surely not going to be one line macro's either. I am also unsure of what implications this has on performance, but we will hopefully find out once we convert these to universal intrinsic and then benchmark. Raghuveer -----Original Message----- From: NumPy-Discussion On Behalf Of Matti Picus Sent: Tuesday, February 11, 2020 11:19 PM To: numpy-discussion at python.org Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics On 11/2/20 8:02 pm, Devulapalli, Raghuveer wrote: > > On top of that the performance implications aren?t clear. Software > implementations of hardware instructions might perform worse and might > not even produce the same result. 
> The proposal for universal intrinsics does not enable replacing an intrinsic on one platform with a software emulation on another: the intrinsics are meant to be compile-time defines that overlay the universal intrinsic with a platform specific one. In order to use a new intrinsic, it must have parallel intrinsics on the other platforms, or cannot be used there: "NPY_CPU_HAVE(FEATURE_NAME)" will always return false so the compiler will not even build a loop for that platform. I will try to clarify that intention in the NEP. I hope there will not be a demand to use many non-universal intrinsics in ufuncs, we will need to work this out on a case-by-case basis in each ufunc. Does that sound reasonable? Are there intrinsics you have already used that have no parallel on other platforms? Matti _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion From Jerome.Kieffer at esrf.fr Thu Feb 13 04:02:51 2020 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Thu, 13 Feb 2020 10:02:51 +0100 Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics In-Reply-To: References: <5475fdf3-b435-2266-4c0e-b45b38ebe21e@gmail.com> Message-ID: <20200213100251.71742e2e@lintaillefer.esrf.fr> On Wed, 12 Feb 2020 19:36:10 +0000 "Devulapalli, Raghuveer" wrote: > j. _mm512_sqrt_ps/pd (I could be wrong on this one, but from the little google search I did, it seems like power ISA doesn?t have a vectorized sqrt instruction) Hi, starting at Power7 (we are at Power9), the sqrt is available both in single and double precision: https://www.ibm.com/support/knowledgecenter/SSGH2K_12.1.0/com.ibm.xlc121.aix.doc/compiler_ref/vec_sqrt.html Cheers, -- J?r?me Kieffer tel +33 476 882 445 From opossumnano at gmail.com Thu Feb 13 10:41:28 2020 From: opossumnano at gmail.com (ASPP) Date: Thu, 13 Feb 2020 07:41:28 -0800 (PST) Subject: [Numpy-discussion] =?utf-8?b?W0FOTl0gMTPhtZfKsCBBZHZhbmNlZCBT?= =?utf-8?q?cientific_Programming_in_Python_in_Ghent=2C_Belgium=2C_31_Augus?= =?utf-8?b?dOKAlDUgU2VwdGVtYmVyLCAyMDIw?= Message-ID: <5e456e28.1c69fb81.9303c.f5bd@mx.google.com> 13?? Advanced Scientific Programming in Python ============================================== a Summer School by the ASPP faculty and the Ghent University https://aspp.school Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only few scientists have been trained to use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques and best practices which are standard in the industry, but especially tailored to the needs of a programming scientist. Lectures are devised to be interactive and to give the students enough time to acquire direct hands-on experience with the materials. Students will work in pairs throughout the school and will team up to practice the newly learned skills in a real programming project ? an entertaining computer game. We use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. 
We show how clean language design, ease of extensibility, and the great wealth of open source libraries for scientific computing and data visualization are driving Python to become a standard tool for the programming scientist. This school is targeted at Master or PhD students and Post-docs from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or R is absolutely required. Basic knowledge of Python and of a version control system such as git, subversion, mercurial, or bazaar is assumed. Participants without any prior experience with Python and/or git should work through the proposed introductory material before the course. We are striving hard to get a pool of students which is international and gender-balanced. Date & Location =============== 31 August?5 September, 2020. Ghent, Belgium. Application =========== You can apply online: https://aspp.school/wiki/applications Application deadline: 23:59 UTC, Sunday 24 May, 2020 There will be no deadline extension, so be sure to apply on time. Be sure to read the FAQ before applying: https://aspp.school/wiki/faq Participation is for free, i.e. no fee is charged! Accommodation in the student residence comes at no costs for participants. We are trying to arrange financial coverage for food expenses too, but this may not work. Participants however should take care of travel expenses by themselves. Program ======= ? Version control with git and how to contribute to open source projects with GitHub ? Best practices in data visualization ? Testing and debugging scientific code ? Advanced NumPy ? Organizing, documenting, and distributing scientific code ? Advanced scientific Python: context managers and generators ? Writing parallel applications in Python ? Profiling and speeding up scientific code with Cython and numba ? Programming in teams Faculty ======= ? Caterina Buizza, Personal Robotics Lab, Imperial College London UK ? Lisa Schwetlick, Experimental and Biological Psychology, Universit?t Potsdam Germany ? Nelle Varoquaux, CNRS, TIMC-IMAG, University Grenoble Alpes France ? Nicolas P. Rougier, Inria Bordeaux Sud-Ouest, Institute of Neurodegenerative Disease, University of Bordeaux France ? Pamela Hathway, Neural Reckoning, Imperial College London UK ? Pietro Berkes, NAGRA Kudelski, Lausanne Switzerland ? Rike-Benjamin Schuppner, Institute for Theoretical Biology, Humboldt-Universit?t zu Berlin Germany ? Tiziano Zito, Department of Psychology, Humboldt-Universit?t zu Berlin Germany ? Zbigniew J?drzejewski-Szmek, Red Hat Inc., Warsaw Poland Organizers ========== Head of the organization for ASPP and responsible for the scientific program: ? Tiziano Zito, Department of Psychology, Humboldt-Universit?t zu Berlin Germany Local team: ? Nina Turk, Photonics Research Group, INTEC, Ghent University ? imec Belgium ? Freya Acar, Office for Data and Information, City of Ghent Belgium ? Joan Juvert Institutional organizers: ? Wim Bogaerts, Photonics Research Group, INTEC, Ghent University ?imec Belgium ? Sven Degroeve, VIB-UGent Center for Medical Biotechnology, Ghent Belgium ? Jeroen Famaey, Department of Mathematics and Computer Science, University of Antwerp ? iMinds Belgium ? 
Bernard Manderick, Artificial Intelligence Lab, Vrije Universiteit Brussel Belgium

Website: https://aspp.school
Contact: info at aspp.school

From ralf.gommers at gmail.com Thu Feb 13 12:27:13 2020
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 13 Feb 2020 11:27:13 -0600
Subject: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics
In-Reply-To: References: <5475fdf3-b435-2266-4c0e-b45b38ebe21e@gmail.com>
Message-ID:

On Wed, Feb 12, 2020 at 1:37 PM Devulapalli, Raghuveer < raghuveer.devulapalli at intel.com> wrote:

> >> I hope there will not be a demand to use many non-universal intrinsics > in ufuncs, we will need to work this out on a case-by-case basis in each > ufunc. Does that sound reasonable? Are there intrinsics you have already > used that have no parallel on other platforms?
>
> I think that is reasonable. It's hard to anticipate the future need and > benefit of specialized intrinsics but I tried to make a list of some of the > specialized intrinsics that are currently in use in NumPy that I don't > believe exist on other platforms (most of these actually don't exist on > AVX2 either). I am not an expert in ARM or VSX architecture, so please > correct me if I am wrong.
>
> a. _mm512_mask_i32gather_ps
> b. _mm512_mask_i32scatter_ps/_mm512_mask_i32scatter_pd
> c. _mm512_maskz_loadu_pd/_mm512_maskz_loadu_ps
> d. _mm512_getexp_ps
> e. _mm512_getmant_ps
> f. _mm512_scalef_ps
> g. _mm512_permutex2var_ps, _mm512_permutex2var_pd
> h. _mm512_maskz_div_ps, _mm512_maskz_div_pd
> i. _mm512_permute_ps/_mm512_permute_pd
> j. _mm512_sqrt_ps/pd (I could be wrong on this one, but from the little > google search I did, it seems like power ISA doesn't have a vectorized sqrt > instruction)
>
> Software implementations of these instructions is definitely possible. But > some of them are not trivial to implement and are surely not going to be > one line macro's either. I am also unsure of what implications this has on > performance, but we will hopefully find out once we convert these to > universal intrinsic and then benchmark. 
> For these it seems like we don't want software implementations of the universal intrinsics - if there's no equivalent on PPC/ARM and there's enough value (performance gain given additional code complexity) in the additional AVX instructions, then we should still simply use AVX instructions directly. Ralf > Raghuveer > > -----Original Message----- > From: NumPy-Discussion intel.com at python.org> On Behalf Of Matti Picus > Sent: Tuesday, February 11, 2020 11:19 PM > To: numpy-discussion at python.org > Subject: Re: [Numpy-discussion] NEP 38 - Universal SIMD intrinsics > > On 11/2/20 8:02 pm, Devulapalli, Raghuveer wrote: > > > > On top of that the performance implications aren?t clear. Software > > implementations of hardware instructions might perform worse and might > > not even produce the same result. > > > > The proposal for universal intrinsics does not enable replacing an > intrinsic on one platform with a software emulation on another: the > intrinsics are meant to be compile-time defines that overlay the universal > intrinsic with a platform specific one. In order to use a new intrinsic, it > must have parallel intrinsics on the other platforms, or cannot be used > there: "NPY_CPU_HAVE(FEATURE_NAME)" will always return false so the > compiler will not even build a loop for that platform. I will try to > clarify that intention in the NEP. > > > I hope there will not be a demand to use many non-universal intrinsics in > ufuncs, we will need to work this out on a case-by-case basis in each > ufunc. Does that sound reasonable? Are there intrinsics you have already > used that have no parallel on other platforms? > > > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Feb 14 14:03:11 2020 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 14 Feb 2020 12:03:11 -0700 Subject: [Numpy-discussion] manylinux2010. Message-ID: Hi All, Just a note that I've moved the nightly NumPy wheels builds to manylinux2010. Downstream projects testing against those wheels should check that they are using pip >= 19.0 in order to fetch those wheels. If there are any problems, please note them here. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.harfouche at gmail.com Fri Feb 14 14:47:42 2020 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Fri, 14 Feb 2020 11:47:42 -0800 Subject: [Numpy-discussion] manylinux2010. In-Reply-To: References: Message-ID: Chuck, Cool stuff! Will manylinux1 wheels compiled with older numpy (say 1.14) work with manylinux2010 wheels? What is your recommendation for downstream projects that depend on Cython/numpy to do? Do you have a document we can read? Mark On Fri, Feb 14, 2020 at 11:05 AM Charles R Harris wrote: > Hi All, > > Just a note that I've moved the nightly NumPy wheels builds to > manylinux2010. Downstream projects testing against those wheels should > check that they are using pip >= 19.0 in order to fetch those wheels. If > there are any problems, please note them here. 
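For downstream testers reading along, a minimal way to confirm that a given environment can actually pick up the manylinux2010 wheels is to check the pip version in that interpreter; this is only an illustrative check, not an official recommendation from the thread.

```python
# manylinux2010 wheel tags are only recognized by pip >= 19.0; an older pip
# will simply not match the new nightly wheels.
import pip
print(pip.__version__)   # want >= 19.0
# to upgrade: python -m pip install --upgrade pip
```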
> > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Feb 14 15:55:21 2020 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 14 Feb 2020 13:55:21 -0700 Subject: [Numpy-discussion] manylinux2010. In-Reply-To: References: Message-ID: On Fri, Feb 14, 2020 at 12:48 PM Mark Harfouche wrote: > Chuck, > > Cool stuff! > > Will manylinux1 wheels compiled with older numpy (say 1.14) work with > manylinux2010 wheels? > > What is your recommendation for downstream projects that depend on > Cython/numpy to do? > > Do you have a document we can read? > > I don't think there will be any problems apart from pip versions, the reason for using manylinux2010 in the pre-release wheels is to discover if I'm wrong about that :) For documentation of the manylinux project, see PyPA , PEP 571 , and PEP 599 . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Feb 14 16:27:29 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 14 Feb 2020 13:27:29 -0800 Subject: [Numpy-discussion] Py-API: Deprecate `np.dtype(np.floating)` and similar dtype creation Message-ID: Hi all, In https://github.com/numpy/numpy/pull/15534 I would like to start deprecating creating dtypes from "abstract" scalar classes, such as: np.dtype(np.floating) is np.dtype(np.float64) While, at the same time, `isinstance(np.float32, np.floating)` is true. Right now `arr.astype(np.floating, copy=False)` and, more obviously, `arr.astype(np.dtype(np.floating), copy=False)` will cast a float32 array to float64. I think we should deprecate this, to consistently enable that in the future `dtype=np.floating` may choose to not cast a float32 array. Of course for the `astype` call the DeprecationWarning would be changed to a FutureWarning before we change the result value. A slight (but hopefully rare) annoyance is that `np.integer` might be used since it reads fairly well compared to `np.int_`. The large upstream packages such as SciPy or astropy seem to be clean in this regard, though (at least almost clean). Does anyone think this is a bad idea? To me these deprecations seem fairly straight forward, possibly flush out bugs/unintended behaviour, and necessary for consistent future behaviour. (More similar ones may have to follow). If there is some, but not much, hesitation, I can also add this to the NEP 41 draft. Although I currently feel it is the right thing to do even if we never had any new dtypes. - Sebastian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From nathan.goldbaum at gmail.com Fri Feb 14 16:39:21 2020 From: nathan.goldbaum at gmail.com (Nathan) Date: Fri, 14 Feb 2020 14:39:21 -0700 Subject: [Numpy-discussion] Py-API: Deprecate `np.dtype(np.floating)` and similar dtype creation In-Reply-To: References: Message-ID: For what it's worth, github search only finds two instances of this usage: https://github.com/search?q=%22np.dtype%28np.floating%29%22&type=Code On Fri, Feb 14, 2020 at 2:28 PM Sebastian Berg wrote: > Hi all, > > In https://github.com/numpy/numpy/pull/15534 I would like to start > deprecating creating dtypes from "abstract" scalar classes, such as: > > np.dtype(np.floating) is np.dtype(np.float64) > > While, at the same time, `isinstance(np.float32, np.floating)` is true. > > Right now `arr.astype(np.floating, copy=False)` and, more obviously, > `arr.astype(np.dtype(np.floating), copy=False)` will cast a float32 > array to float64. > > I think we should deprecate this, to consistently enable that in the > future `dtype=np.floating` may choose to not cast a float32 array. Of > course for the `astype` call the DeprecationWarning would be changed to > a FutureWarning before we change the result value. > > A slight (but hopefully rare) annoyance is that `np.integer` might be > used since it reads fairly well compared to `np.int_`. The large > upstream packages such as SciPy or astropy seem to be clean in this > regard, though (at least almost clean). > > Does anyone think this is a bad idea? To me these deprecations seem > fairly straight forward, possibly flush out bugs/unintended behaviour, > and necessary for consistent future behaviour. (More similar ones may > have to follow). > > If there is some, but not much, hesitation, I can also add this to the > NEP 41 draft. Although I currently feel it is the right thing to do > even if we never had any new dtypes. > > - Sebastian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Feb 14 16:44:08 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 14 Feb 2020 13:44:08 -0800 Subject: [Numpy-discussion] Py-API: Deprecate `np.dtype(np.floating)` and similar dtype creation In-Reply-To: References: Message-ID: <39cb4177ff1c61fe35fe6c8efb107c4c8d940ec0.camel@sipsolutions.net> On Fri, 2020-02-14 at 14:39 -0700, Nathan wrote: > For what it's worth, github search only finds two instances of this > usage: > > https://github.com/search?q=%22np.dtype%28np.floating%29%22&type=Code > In most common thing I would expect to be `dtype=np.integer` (possibly without the `dtype` as a positional argument). The call your search finds is nice because it must delete `np.dtype` call. As is, it is doing the incorrect thing so the deprecation would flush out a bug. - Sebastian > On Fri, Feb 14, 2020 at 2:28 PM Sebastian Berg < > sebastian at sipsolutions.net> wrote: > > Hi all, > > > > In https://github.com/numpy/numpy/pull/15534 I would like to start > > deprecating creating dtypes from "abstract" scalar classes, such > > as: > > > > np.dtype(np.floating) is np.dtype(np.float64) > > > > While, at the same time, `isinstance(np.float32, np.floating)` is > > true. 
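To make the behaviour described in this message easier to skim, here is a small editorial sketch of what the proposal is about. The expected results follow Sebastian's description and the NumPy of this era; the default integer size in the last line is platform dependent, and the snippet is an illustration rather than part of the original mail.

```python
import numpy as np

np.dtype(np.floating) is np.dtype(np.float64)   # True: the abstract class maps to float64
issubclass(np.float32, np.floating)             # also True, which is the surprising part

x = np.zeros(3, dtype=np.float32)
x.astype(np.floating).dtype                     # dtype('float64'), i.e. a silent upcast

np.zeros(3, dtype=np.integer).dtype             # platform default integer (e.g. int64 on
                                                # most Linux builds) - the kind of silent
                                                # choice the deprecation would flush out
```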
> > > > Right now `arr.astype(np.floating, copy=False)` and, more > > obviously, > > `arr.astype(np.dtype(np.floating), copy=False)` will cast a float32 > > array to float64. > > > > I think we should deprecate this, to consistently enable that in > > the > > future `dtype=np.floating` may choose to not cast a float32 array. > > Of > > course for the `astype` call the DeprecationWarning would be > > changed to > > a FutureWarning before we change the result value. > > > > A slight (but hopefully rare) annoyance is that `np.integer` might > > be > > used since it reads fairly well compared to `np.int_`. The large > > upstream packages such as SciPy or astropy seem to be clean in this > > regard, though (at least almost clean). > > > > Does anyone think this is a bad idea? To me these deprecations seem > > fairly straight forward, possibly flush out bugs/unintended > > behaviour, > > and necessary for consistent future behaviour. (More similar ones > > may > > have to follow). > > > > If there is some, but not much, hesitation, I can also add this to > > the > > NEP 41 draft. Although I currently feel it is the right thing to do > > even if we never had any new dtypes. > > > > - Sebastian > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From nathan.goldbaum at gmail.com Fri Feb 14 17:07:07 2020 From: nathan.goldbaum at gmail.com (Nathan) Date: Fri, 14 Feb 2020 15:07:07 -0700 Subject: [Numpy-discussion] Py-API: Deprecate `np.dtype(np.floating)` and similar dtype creation In-Reply-To: <39cb4177ff1c61fe35fe6c8efb107c4c8d940ec0.camel@sipsolutions.net> References: <39cb4177ff1c61fe35fe6c8efb107c4c8d940ec0.camel@sipsolutions.net> Message-ID: Yeah, that seems to be more popular: https://github.com/search?q=%22dtype%3Dnp.integer%22&type=Code On Fri, Feb 14, 2020 at 2:45 PM Sebastian Berg wrote: > On Fri, 2020-02-14 at 14:39 -0700, Nathan wrote: > > For what it's worth, github search only finds two instances of this > > usage: > > > > https://github.com/search?q=%22np.dtype%28np.floating%29%22&type=Code > > > > In most common thing I would expect to be `dtype=np.integer` (possibly > without the `dtype` as a positional argument). > The call your search finds is nice because it must delete `np.dtype` > call. > As is, it is doing the incorrect thing so the deprecation would flush > out a bug. > > - Sebastian > > > > On Fri, Feb 14, 2020 at 2:28 PM Sebastian Berg < > > sebastian at sipsolutions.net> wrote: > > > Hi all, > > > > > > In https://github.com/numpy/numpy/pull/15534 I would like to start > > > deprecating creating dtypes from "abstract" scalar classes, such > > > as: > > > > > > np.dtype(np.floating) is np.dtype(np.float64) > > > > > > While, at the same time, `isinstance(np.float32, np.floating)` is > > > true. > > > > > > Right now `arr.astype(np.floating, copy=False)` and, more > > > obviously, > > > `arr.astype(np.dtype(np.floating), copy=False)` will cast a float32 > > > array to float64. 
> > > > > > I think we should deprecate this, to consistently enable that in > > > the > > > future `dtype=np.floating` may choose to not cast a float32 array. > > > Of > > > course for the `astype` call the DeprecationWarning would be > > > changed to > > > a FutureWarning before we change the result value. > > > > > > A slight (but hopefully rare) annoyance is that `np.integer` might > > > be > > > used since it reads fairly well compared to `np.int_`. The large > > > upstream packages such as SciPy or astropy seem to be clean in this > > > regard, though (at least almost clean). > > > > > > Does anyone think this is a bad idea? To me these deprecations seem > > > fairly straight forward, possibly flush out bugs/unintended > > > behaviour, > > > and necessary for consistent future behaviour. (More similar ones > > > may > > > have to follow). > > > > > > If there is some, but not much, hesitation, I can also add this to > > > the > > > NEP 41 draft. Although I currently feel it is the right thing to do > > > even if we never had any new dtypes. > > > > > > - Sebastian > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From albuscode at gmail.com Fri Feb 14 17:46:45 2020 From: albuscode at gmail.com (Inessa Pawson) Date: Sat, 15 Feb 2020 08:46:45 +1000 Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 161, Issue 14 In-Reply-To: References: Message-ID: Documentation is imperative for project sustainability, yet often overlooked. Millions of NumPy stakeholders will benefit from this initiative. Melissa, Mars and Ralf, thank you for taking a lead on this! On Thu, Feb 13, 2020 at 3:05 AM wrote: > Send NumPy-Discussion mailing list submissions to > numpy-discussion at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at python.org > > You can reach the person managing the list at > numpy-discussion-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > Today's Topics: > > 1. NEP 44 - Restructuring the NumPy Documentation (Melissa Mendon?a) > > > > ---------- Forwarded message ---------- > From: "Melissa Mendon?a" > To: numpy-discussion at python.org > Cc: > Bcc: > Date: Wed, 12 Feb 2020 10:55:09 -0300 > Subject: [Numpy-discussion] NEP 44 - Restructuring the NumPy Documentation > Hi all, > > Please see the NEP below for a proposal to restructure the documentation > of NumPy. The main goal here is to make the documentation more visible and > organized, and also make contributions easier. > > Comments and feedback are welcome! > > > See https://github.com/numpy/numpy/pull/15554 for details. > > Best, > > Melissa > > ---- > > NEP 44 ? 
Restructuring the NumPy Documentation > > Authors: Ralf Gommers, Melissa Mendon?a, Mars Lee > Status: Draft > Type: Process > Created: 2020-02-11 > > Abstract > ====== > > This document proposes a restructuring of the NumPy Documentation, both in > form and content, with the goal of making it more organized and > discoverable for beginners and experienced users. > > Motivation and Scope > ================= > > See [here](numpy.org/devdocs) for the front page of the latest docs. The > organization is quite confusing and illogical (e.g. user and developer docs > are mixed). We propose the following: > > - Reorganizing the docs into the four categories mentioned in [1]; > - Creating dedicated sections for Tutorials and How-Tos, including > orientation on how to create new content; > - Adding an Explanations section for key concepts and techniques that > require deeper descriptions, some of which will be rearranged from the > Reference Guide. > > Usage and Impact > ============== > > The documentation is a fundamental part of any software project, > especially open source projects. In the case of NumPy, many beginners might > feel demotivated by the current structure of the documentation, since it is > difficult to discover what to learn (unless the user has a clear view of > what to look for in the Reference docs, which is not always the case). > > Looking at the results of a ?NumPy Tutorial? search on any search engine > also gives an idea of the demand for this kind of content. Having official > high-level documentation written using up-to-date content and techniques > will certainly mean more users (and developers/contributors) are involved > in the NumPy community. > > Backward compatibility > ================== > > The restructuring will effectively demand a complete rewrite of links and > some of the current content. Input from the community will be useful for > identifying key links and pages that should not be broken. > > Detailed description > =============== > > As discussed in the article [1], there are four categories of doc content: > - Tutorials > - How-to guides > - Explanations > - Reference guide > > We propose to use those categories as the ones we use (for writing and > reviewing) whenever we add a new documentation section. > > The reasoning for this is that it is clearer both for > developers/documentation writers and to users where each information should > go, and the scope and tone of each document. For example, if explanations > are mixed with basic tutorials, beginners might be overwhelmed and > alienated. On the other hand, if the reference guide contains basic > how-tos, it might be difficult for experienced users to find the > information they need, quickly. > > Currently, there are many blogs and tutorials on the internet about NumPy > or using NumPy. One of the issues with this is that if users search for > this information and end up in an outdated (unofficial) tutorial before > they find the current official documentation, they end up creating content > that is confusing, especially for beginners. Having a better infrastructure > for the documentation also aims to solve this problem by giving users > high-level, up-to-date official documentation that can be easily updated. > > Status and ideas of each type of doc content > ------------------------------------------------------------ > > * Reference guide > > NumPy has a quite complete reference guide. All functions are documented, > most have examples, and most are cross-linked well with See Also sections. 
> Further improving the reference guide is incremental work that can be done > (and is being done) by many people. There are, however, many explanations > in the reference guide. These can be moved to a more dedicated Explanations > section on the docs. > > * How-to guides > > NumPy does not have many how-to's. The subclassing and array ducktyping > section may be an example of a how-to. Others that could be added are: > - Parallelization (controlling BLAS multithreading with threadpoolctl, > using multiprocessing, random number generation, etc.) > - Storing and loading data (.npy/.npz format, text formats, Zarr, HDF5, > Bloscpack, etc.) > - Performance (memory layout, profiling, use with Numba, Cython, or > Pythran) > - Writing generic code that works with NumPy, Dask, CuPy, pydata/sparse, > etc. > > * Explanations > > There is a reasonable amount of content on fundamental NumPy concepts such > as indexing, vectorization, broadcasting, (g)ufuncs, and dtypes. This could > be organized better and clarified to ensure it's really about explaining > the concepts and not mixed with tutorial or how-to like content. > > There are few explanations about anything other than those fundamental > NumPy concepts. > > Some examples of concepts that could be expanded: > - Copies vs. Views; > - BLAS and other linear algebra libraries; > - Fancy indexing. > > In addition, there are many explanations in the Reference Guide, which > should be moved to this new dedicated Explanations section. > > * Tutorials > > There's a lot of scope for writing better tutorials. We have a new NumPy > for absolute beginners tutorial [3] (GSoD project of Anne Bonner). In > addition we need a number of tutorials addressing different levels of > experience with Python and NumPy. This could be done using engaging data > sets, ideas or stories. For example, curve fitting with polynomials and > functions in numpy.linalg could be done with the Keeling curve (decades > worth of CO2 concentration in air measurements) rather than with synthetic > random data. > > Ideas for tutorials (these capture the types of things that make sense, > they're not necessarily the exact topics we propose to implement): > - Conway's game of life with only NumPy (note: already in Nicolas > Rougier's book) > - Using masked arrays to deal with missing data in time series measurements > - Using Fourier transforms to analyze the Keeling curve data, and > extrapolate it. > - Geospatial data (e.g. lat/lon/time to create maps for every year via a > stacked array, like gridMet data) > - Using text data and dtypes (e.g. use speeches from different people, > shape (n_speech, n_sentences, n_words)) > > The Preparing to Teach document [2] from the Software Carpentry Instructor > Training materials is a nice summary of how to write effective lesson plans > (and tutorials would be very similar). In addition to adding new tutorials, > we also propose a How to write a tutorial document, which would help users > contribute new high-quality content to the documentation. > > Data sets > ------------- > > Using interesting data in the NumPy docs requires giving all users access > to that data, either inside NumPy or in a separate package. The former is > not the best idea, since it's hard to do without increasing the size of > NumPy significantly. Even for SciPy there has so far been no consensus on > this (see scipy PR 8707 on adding a new scipy.datasets subpackage). 
> > So we?ll aim for a new (pure Python) package, named numpy-datasets or > scipy-datasets or something similar. That package can take some lessons > from how, e.g., scikit-learn ships data sets. Small data sets can be > included in the repo, large data sets can be accessed via a downloader > class or function. > > Related Work > =========== > > Some examples of documentation organization in other projects: > - Documentation for Jupyter: https://jupyter.org/documentation > - Documentation for Python: https://docs.python.org/3/ > - Documentation for TensorFlow: https://www.tensorflow.org/learn > > These projects make the intended audience for each part of the > documentation more explicit, as well as previewing some of the content in > each section. > > Implementation > ============ > > Besides rewriting the current documentation to some extent, it would be > ideal to have a technical infrastructure that would allow more > contributions from the community. For example, if Jupyter Notebooks could > be submitted as-is as tutorials or How-Tos, this might create more > contributors and broaden the NumPy community. > > Similarly, if people could download some of the documentation in Notebook > format, this would certainly mean people would use less outdated material > for learning NumPy. > > It would also be interesting if the new structure for the documentation > makes translations easier. > > Currently, the documentation for NumPy can be confusing, especially for > beginners. Our proposal is to reorganize the docs in the following > structure: > > * For users: > - Absolute Beginners Tutorial > - main Tutorials section > - How To?s for common tasks with NumPy > - Reference Guide > - Explanations > - F2Py Guide > - Glossary > > * For developers/contributors: > - Contributor?s Guide > - Building and extending the documentation > - Benchmarking > - NumPy Enhancement Proposals > > * Meta information > - Reporting bugs > - Release Notes > - About NumPy > - License > > References and Footnotes > ==================== > > [1] What nobody tells you about documentation. > https://www.divio.com/blog/documentation/ > [2] Preparing to Teach (from the Software Carpentry Instructor Training > materials). > https://carpentries.github.io/instructor-training/15-lesson-study/index.html > [3] NumPy for absolute beginners Tutorial by Anne Bonner. > https://numpy.org/devdocs/user/absolute_beginners.html > > Copyright > ======== > > This document has been placed in the public domain. > > -- > Melissa Weber Mendon?a > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Every good wish, *Inessa Pawson* Executive Director Albus Code inessa at albuscode.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Feb 18 10:14:59 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 18 Feb 2020 10:14:59 -0500 Subject: [Numpy-discussion] recent changes in np.maximum.accumulate ? Message-ID: I'm trying to track down test failures of statsmodels against recent master dev versions of numpy and scipy. 
The core computation is the following in one set of tests that fail pvals_corrected_raw = pvals * np.arange(ntests, 0, -1) pvals_corrected = np.maximum.accumulate(pvals_corrected_raw) this numpy version numpy-1.19.0.dev0%2B20200214184618_1f9ab28-cp38-cp38-manylinux2010_x86_64.whl is in the test run with failures (the first time statsmodel master failed) the previous version in the test runs didn't have these failures numpy-1.19.0.dev0%2B20200212232857_af0dfce-cp38-cp38-manylinux1_x86_64.whl I'm right now just fishing for candidates for the failures. And I'm not running any dev versions on my computer. Were there any recent changes that affect np.maximum.accumulate? Josef -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Feb 18 10:41:58 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 18 Feb 2020 07:41:58 -0800 Subject: [Numpy-discussion] recent changes in np.maximum.accumulate ? In-Reply-To: References: Message-ID: <553d467fc241e1b44b9881ada555dad90400d6a7.camel@sipsolutions.net> On Tue, 2020-02-18 at 10:14 -0500, josef.pktd at gmail.com wrote: > I'm trying to track down test failures of statsmodels against recent > master dev versions of numpy and scipy. > > The core computation is the following in one set of tests that fail > > pvals_corrected_raw = pvals * np.arange(ntests, 0, -1) > pvals_corrected = np.maximum.accumulate(pvals_corrected_raw) > Hmmm, the two git hashes indicate few changes between the two versions (mainly unicode related). However, recently there was also the addition of AVX-512F loops to maximum, so that seems like the most reasonable candidate (although I am unsure it changed exactly between those versions, it is also more complex maybe due to needing a machine that supports the instructions). Some details about the input could be nice. But if this is all that is as input, it sounds like it should be a contiguous array? I guess it might include subnormal numbers or NaN? Can you open an issue with some of those details if you have them? - Sebastian > this numpy version > numpy-1.19.0.dev0%2B20200214184618_1f9ab28-cp38-cp38- > manylinux2010_x86_64.whl > is in the test run with failures (the first time statsmodel master > failed) > > the previous version in the test runs didn't have these failures > numpy-1.19.0.dev0%2B20200212232857_af0dfce-cp38-cp38- > manylinux1_x86_64.whl > > > I'm right now just fishing for candidates for the failures. And I'm > not running any dev versions on my computer. > > Were there any recent changes that affect np.maximum.accumulate? > > Josef > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From kevin.k.sheppard at gmail.com Tue Feb 18 11:24:16 2020 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Tue, 18 Feb 2020 16:24:16 +0000 Subject: [Numpy-discussion] recent changes in np.maximum.accumulate ? Message-ID: <5e4c0fb3.1c69fb81.14035.1b9f@mx.google.com> An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Feb 18 11:29:25 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 18 Feb 2020 11:29:25 -0500 Subject: [Numpy-discussion] recent changes in np.maximum.accumulate ? 
In-Reply-To: <553d467fc241e1b44b9881ada555dad90400d6a7.camel@sipsolutions.net> References: <553d467fc241e1b44b9881ada555dad90400d6a7.camel@sipsolutions.net> Message-ID: On Tue, Feb 18, 2020 at 10:43 AM Sebastian Berg wrote: > On Tue, 2020-02-18 at 10:14 -0500, josef.pktd at gmail.com wrote: > > I'm trying to track down test failures of statsmodels against recent > > master dev versions of numpy and scipy. > > > > The core computation is the following in one set of tests that fail > > > > pvals_corrected_raw = pvals * np.arange(ntests, 0, -1) > > pvals_corrected = np.maximum.accumulate(pvals_corrected_raw) > > > > Hmmm, the two git hashes indicate few changes between the two versions > (mainly unicode related). > > However, recently there was also the addition of AVX-512F loops to > maximum, so that seems like the most reasonable candidate (although I > am unsure it changed exactly between those versions, it is also more > complex maybe due to needing a machine that supports the instructions). > > Some details about the input could be nice. But if this is all that is > as input, it sounds like it should be a contiguous array? I guess it > might include subnormal numbers or NaN? > The test failures are on a Travis machine https://travis-ci.org/statsmodels/statsmodels/jobs/650430129 I can extract the numbers and examples from the unit test on my Windows computer. But, if it's machine dependent, then that might not be enough. The main reason why maximum.accumulate might be the problem is that in some tests we don't get monotonically increasing values, e.g. E x: array([0.012, 0.02 , 0.024, 0.024, 0.02 , 0.012]) E y: array([0.012, 0.02 , 0.024, 0.024, 0.024, 0.024]) first row is computed, second row is expected Josef > > Can you open an issue with some of those details if you have them? > > - Sebastian > > > > > this numpy version > > numpy-1.19.0.dev0%2B20200214184618_1f9ab28-cp38-cp38- > > manylinux2010_x86_64.whl > > is in the test run with failures (the first time statsmodel master > > failed) > > > > the previous version in the test runs didn't have these failures > > numpy-1.19.0.dev0%2B20200212232857_af0dfce-cp38-cp38- > > manylinux1_x86_64.whl > > > > > > I'm right now just fishing for candidates for the failures. And I'm > > not running any dev versions on my computer. > > > > Were there any recent changes that affect np.maximum.accumulate? > > > > Josef > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Feb 18 13:47:52 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 18 Feb 2020 10:47:52 -0800 Subject: [Numpy-discussion] NumPy Community Meeting Wednesday, Feb. 19 Message-ID: <993781f29624a5dc47907afdcb85b5db49222138.camel@sipsolutions.net> Hi all, There will be a NumPy Community meeting Wednesday February 19 at 11 am Pacific Time. Everyone is invited to join in and edit the work-in- progress meeting topics and notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both Best wishes Sebastian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From jordan at jordanmartel.com Tue Feb 18 16:32:52 2020 From: jordan at jordanmartel.com (Jordan) Date: Tue, 18 Feb 2020 21:32:52 +0000 Subject: [Numpy-discussion] numpy_financial functions Message-ID: I teach finance at IU Bloomington and use the numpy_financial module pretty heavily. I've written a couple of functions for bond math for my own use (duration, convexity, forward rates, etc.). Is there any appetite for expanding numpy_financial beyond the core Excel functions? Is there a point-person for numpy_financial with whom I could correspond about contributing? Thanks, Jordan -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Feb 18 16:54:36 2020 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Feb 2020 14:54:36 -0700 Subject: [Numpy-discussion] numpy_financial functions In-Reply-To: References: Message-ID: On Tue, Feb 18, 2020 at 2:33 PM Jordan wrote: > I teach finance at IU Bloomington and use the numpy_financial module > pretty heavily. I've written a couple of functions for bond math for my own > use (duration, convexity, forward rates, etc.). Is there any appetite for > expanding numpy_financial beyond the core Excel functions? Is there a > point-person for numpy_financial with whom I could correspond about > contributing? > Thanks, > Jordan > The financial package is separate from NumPy at this point. I don't see a problem with it being extended as long as it is maintained. Your best bet might be to make a PR at https://github.com/numpy/numpy-financial and initiate a conversation. I suspect the current maintainer(s) would be happy for help. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Tue Feb 18 20:06:00 2020 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Tue, 18 Feb 2020 17:06:00 -0800 Subject: [Numpy-discussion] Tensor Developer Summit Message-ID: <167a1c2b-4288-4bc5-bae8-64f2d800e11d@www.fastmail.com> Hi all, This has been mentioned on the community calls, but not on the mailing list, so a reminder about the Tensor Developer Summit happening at March in Berkeley: https://xd-con.org/tensor-2020/ We would love to have developers and advanced users of NumPy (or other array libraries with Python interfaces) attend. Registration closes 20 February. Best regards, St?fan From melissawm at gmail.com Wed Feb 19 06:58:52 2020 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Wed, 19 Feb 2020 08:58:52 -0300 Subject: [Numpy-discussion] Proposal to accept NEP #44: Restructuring the NumPy Documentation Message-ID: Hi all, I am proposing the acceptance of NEP 44 - Restructuring the NumPy Documentation. https://numpy.org/neps/nep-0044-restructuring-numpy-docs.html There were some comments about reorganizing the text to make it clearer, and some rewording regarding Reference Guides, developer "under-the-hoods" documentation and the possible mechanism for community contributions in the form of new tutorials or how-tos. Overall I believe we have answered all comments and there were no other points of concern. If there are no substantive objections within 7 days from this email, then the NEP will be accepted; see NEP 0 for more details: see http://www.numpy.org/neps/nep-0000.html Thanks for the comments! 
Looking forward to starting work on this :) -- Melissa Weber Mendonça -------------- next part -------------- An HTML attachment was scrubbed... URL:
From ndbecker2 at gmail.com Wed Feb 19 14:43:38 2020 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 19 Feb 2020 14:43:38 -0500 Subject: [Numpy-discussion] Tensor Developer Summit References: <167a1c2b-4288-4bc5-bae8-64f2d800e11d@www.fastmail.com> Message-ID: Stefan van der Walt wrote: > Hi all, > > This has been mentioned on the community calls, but not on the mailing > list, so a reminder about the Tensor Developer Summit happening at March > in Berkeley: > > https://xd-con.org/tensor-2020/ > > We would love to have developers and advanced users of NumPy (or other > array libraries with Python interfaces) attend. Registration closes 20 > February. > > Best regards, > Stéfan Sounds like an exciting group! Will it be streamed? Thanks, Neal
From stefanv at berkeley.edu Wed Feb 19 15:36:51 2020 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 19 Feb 2020 12:36:51 -0800 Subject: [Numpy-discussion] Tensor Developer Summit In-Reply-To: References: <167a1c2b-4288-4bc5-bae8-64f2d800e11d@www.fastmail.com> Message-ID: Hi Neal, On Wed, Feb 19, 2020, at 11:43, Neal Becker wrote: > Sounds like an exciting group! Will it be streamed? Due to the highly interactive nature of this event, we will not be streaming. Best regards, Stéfan
From sebastian at sipsolutions.net Wed Feb 19 18:00:25 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 19 Feb 2020 15:00:25 -0800 Subject: [Numpy-discussion] Proposal to accept NEP #44: Restructuring the NumPy Documentation In-Reply-To: References: Message-ID: <335487fc63226509f5686a2eab2e8cda65919d4c.camel@sipsolutions.net> On Wed, 2020-02-19 at 08:58 -0300, Melissa Mendonça wrote: > Hi all, > > I am proposing the acceptance of NEP 44 - Restructuring the NumPy > Documentation. > > https://numpy.org/neps/nep-0044-restructuring-numpy-docs.html > > There were some comments about reorganizing the text to make it > clearer, and some rewording regarding Reference Guides, developer > "under-the-hoods" documentation and the possible mechanism for > community contributions in the form of new tutorials or how-tos. > Overall I believe we have answered all comments and there were no > other points of concern. > If there are no substantive objections within 7 days from this email, > then the NEP will be accepted; see NEP 0 for more details: > see http://www.numpy.org/neps/nep-0000.html > > Thanks for the comments! Looking forward to starting work on this :) Looks like a good structure to me, I think go ahead and start! - Sebastian > -- > Melissa Weber Mendonça > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From stefanv at berkeley.edu Fri Feb 21 13:29:16 2020 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 21 Feb 2020 10:29:16 -0800 Subject: [Numpy-discussion] =?utf-8?q?Proposal_to_accept_NEP_=2344=3A_Res?= =?utf-8?q?tructuring_the_NumPy_Documentation?= In-Reply-To: References: Message-ID: <004c39a1-49dd-4a77-a6c1-3bdce2fbfad3@www.fastmail.com> On Wed, Feb 19, 2020, at 03:58, Melissa Mendon?a wrote: > I am proposing the acceptance of NEP 44 - Restructuring the NumPy Documentation. > > https://numpy.org/neps/nep-0044-restructuring-numpy-docs.html Thanks, Melissa, for developing this NEP! The plan makes sense to me. Best regards, St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Feb 21 14:05:05 2020 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 21 Feb 2020 12:05:05 -0700 Subject: [Numpy-discussion] Scikit-learn in the news. Message-ID: Hi All, Just thought I mention a new paper where scikit-learn was used: A Deep Learning Approach to Antibiotic Discovery . Congratulations to the scikit-learn team. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Feb 21 20:37:01 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 21 Feb 2020 17:37:01 -0800 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? Message-ID: Hi all, When we create new datatypes, we have the option to make new choices for the new datatypes [0] (not the existing ones). The question is: Should every NumPy datatype have a scalar associated and should operations like indexing return a scalar or a 0-D array? This is in my opinion a complex, almost philosophical, question, and we do not have to settle anything for a long time. But, if we do not decide a direction before we have many new datatypes the decision will make itself... So happy about any ideas, even if its just a gut feeling :). There are various points. I would like to mostly ignore the technical ones, but I am listing them anyway here: * Scalars are faster (although that can be optimized likely) * Scalars have a lower memory footprint * The current implementation incurs a technical debt in NumPy. (I do not think that is a general issue, though. We could automatically create scalars for each new datatype probably.) Advantages of having no scalars: * No need to keep track of scalars to preserve them in ufuncs, or libraries using `np.asarray`, do they need `np.asarray_or_scalar`? (or decide they return always arrays, although ufuncs may not) * Seems simpler in many ways, you always know the output will be an array if it has to do with NumPy. Advantages of having scalars: * Scalars are immutable and we are used to them from Python. A 0-D array cannot be used as a dictionary key consistently [1]. I.e. without scalars as first class citizen `dict[arr1d[0]]` cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined, and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2] * Object arrays as we have them now make sense, `arr1d[0]` can reasonably return a Python object. I.e. arrays feel more like container if you can take elements out easily. Could go both ways: * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array without scalars. With scalars `arr1d[0, ...]` clarifies the meaning. 
(In principle it is good to never use `arr2d[0]` to get a 1D slice, probably more-so if scalars exist.) Note: array-scalars (the current NumPy scalars) are not useful in my opinion [3]. A scalar should not be indexed or have a shape. I do not believe in scalars pretending to be arrays. I personally tend towards liking scalars. If Python was a language where the array (array-programming) concept was ingrained into the language itself, I would lean the other way. But users are used to scalars, and they "put" scalars into arrays. Array objects are in some ways strange in Python, and I feel not having scalars detaches them further. Having scalars, however also means we should preserve them. I feel in principle that is actually fairly straight forward. E.g. for ufuncs: * np.add(scalar, scalar) -> scalar * np.add.reduce(arr, axis=None) -> scalar * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) * np.add.reduce(scalar, axis=()) -> array Of course libraries that do `np.asarray` would/could basically chose to not preserve scalars: Their signature is defined as taking strictly array input. Cheers, Sebastian [0] At best this can be a vision to decide which way they may evolve. [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably strange. E.g. Quantity defines hash correctly, but does not fully ensure immutability for 0-D Quantities. Ensuring immutability in a world where "views" are a central concept requires a write-only copy. [2] Arguably `.item()` would always return a scalar, but it would be a second class citizen. (Although if it returns a scalar, at least we already have a scalar implementation.) [3] They are necessary due to technical debt for NumPy datatypes though. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From jni at fastmail.com Fri Feb 21 21:15:07 2020 From: jni at fastmail.com (Juan Nunez-Iglesias) Date: Fri, 21 Feb 2020 20:15:07 -0600 Subject: [Numpy-discussion] =?utf-8?q?New_DTypes=3A_Are_scalars_a_central?= =?utf-8?q?_concept_in_NumPy_or_not=3F?= In-Reply-To: References: Message-ID: I personally have always found it weird and annoying to deal with 0-D arrays, so +1 for scalars!* Juan *: admittedly, I have almost no grasp of the underlying NumPy implementation complexities, but I will happily take Sebastian's word that scalars can be consistent with the library. On Fri, 21 Feb 2020, at 7:37 PM, Sebastian Berg wrote: > Hi all, > > When we create new datatypes, we have the option to make new choices > for the new datatypes [0] (not the existing ones). > > The question is: Should every NumPy datatype have a scalar associated > and should operations like indexing return a scalar or a 0-D array? > > This is in my opinion a complex, almost philosophical, question, and we > do not have to settle anything for a long time. But, if we do not > decide a direction before we have many new datatypes the decision will > make itself... > So happy about any ideas, even if its just a gut feeling :). > > There are various points. I would like to mostly ignore the technical > ones, but I am listing them anyway here: > > * Scalars are faster (although that can be optimized likely) > > * Scalars have a lower memory footprint > > * The current implementation incurs a technical debt in NumPy. > (I do not think that is a general issue, though. We could > automatically create scalars for each new datatype probably.) 
> > Advantages of having no scalars: > > * No need to keep track of scalars to preserve them in ufuncs, or > libraries using `np.asarray`, do they need `np.asarray_or_scalar`? > (or decide they return always arrays, although ufuncs may not) > > * Seems simpler in many ways, you always know the output will be an > array if it has to do with NumPy. > > Advantages of having scalars: > > * Scalars are immutable and we are used to them from Python. > A 0-D array cannot be used as a dictionary key consistently [1]. > > I.e. without scalars as first class citizen `dict[arr1d[0]]` > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined, > and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2] > > * Object arrays as we have them now make sense, `arr1d[0]` can > reasonably return a Python object. I.e. arrays feel more like > container if you can take elements out easily. > > Could go both ways: > > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array > without scalars. With scalars `arr1d[0, ...]` clarifies the > meaning. (In principle it is good to never use `arr2d[0]` to > get a 1D slice, probably more-so if scalars exist.) > > Note: array-scalars (the current NumPy scalars) are not useful in my > opinion [3]. A scalar should not be indexed or have a shape. I do not > believe in scalars pretending to be arrays. > > I personally tend towards liking scalars. If Python was a language > where the array (array-programming) concept was ingrained into the > language itself, I would lean the other way. But users are used to > scalars, and they "put" scalars into arrays. Array objects are in some > ways strange in Python, and I feel not having scalars detaches them > further. > > Having scalars, however also means we should preserve them. I feel in > principle that is actually fairly straight forward. E.g. for ufuncs: > > * np.add(scalar, scalar) -> scalar > * np.add.reduce(arr, axis=None) -> scalar > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) > * np.add.reduce(scalar, axis=()) -> array > > Of course libraries that do `np.asarray` would/could basically chose to > not preserve scalars: Their signature is defined as taking strictly > array input. > > Cheers, > > Sebastian > > > [0] At best this can be a vision to decide which way they may evolve. > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably > strange. E.g. Quantity defines hash correctly, but does not fully > ensure immutability for 0-D Quantities. Ensuring immutability in a > world where "views" are a central concept requires a write-only copy. > > [2] Arguably `.item()` would always return a scalar, but it would be a > second class citizen. (Although if it returns a scalar, at least we > already have a scalar implementation.) > > [3] They are necessary due to technical debt for NumPy datatypes > though. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > *Attachments:* > * signature.asc -------------- next part -------------- An HTML attachment was scrubbed... URL: From evgeny.burovskiy at gmail.com Sat Feb 22 09:27:01 2020 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Sat, 22 Feb 2020 17:27:01 +0300 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? 
In-Reply-To: References: Message-ID: Hi Sebastian, Just to clarify the difference: >>> x = np.float64(42) >>> y = np.array(42, dtype=float) Here `x` is a scalar and `y` is a 0D array, correct? If that's the case, not having the former would be very confusing for users (at least, that would be very confusing to me, FWIW). If anything, I think it'd be cleaner to not have the latter, and only have either scalars or 1D arrays (i.e., N-D arrays with N>=1), but it is probably way too late to even think about it anyway. Cheers, Evgeni On Sat, Feb 22, 2020 at 4:37 AM Sebastian Berg wrote: > > Hi all, > > When we create new datatypes, we have the option to make new choices > for the new datatypes [0] (not the existing ones). > > The question is: Should every NumPy datatype have a scalar associated > and should operations like indexing return a scalar or a 0-D array? > > This is in my opinion a complex, almost philosophical, question, and we > do not have to settle anything for a long time. But, if we do not > decide a direction before we have many new datatypes the decision will > make itself... > So happy about any ideas, even if its just a gut feeling :). > > There are various points. I would like to mostly ignore the technical > ones, but I am listing them anyway here: > > * Scalars are faster (although that can be optimized likely) > > * Scalars have a lower memory footprint > > * The current implementation incurs a technical debt in NumPy. > (I do not think that is a general issue, though. We could > automatically create scalars for each new datatype probably.) > > Advantages of having no scalars: > > * No need to keep track of scalars to preserve them in ufuncs, or > libraries using `np.asarray`, do they need `np.asarray_or_scalar`? > (or decide they return always arrays, although ufuncs may not) > > * Seems simpler in many ways, you always know the output will be an > array if it has to do with NumPy. > > Advantages of having scalars: > > * Scalars are immutable and we are used to them from Python. > A 0-D array cannot be used as a dictionary key consistently [1]. > > I.e. without scalars as first class citizen `dict[arr1d[0]]` > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined, > and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2] > > * Object arrays as we have them now make sense, `arr1d[0]` can > reasonably return a Python object. I.e. arrays feel more like > container if you can take elements out easily. > > Could go both ways: > > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array > without scalars. With scalars `arr1d[0, ...]` clarifies the > meaning. (In principle it is good to never use `arr2d[0]` to > get a 1D slice, probably more-so if scalars exist.) > > Note: array-scalars (the current NumPy scalars) are not useful in my > opinion [3]. A scalar should not be indexed or have a shape. I do not > believe in scalars pretending to be arrays. > > I personally tend towards liking scalars. If Python was a language > where the array (array-programming) concept was ingrained into the > language itself, I would lean the other way. But users are used to > scalars, and they "put" scalars into arrays. Array objects are in some > ways strange in Python, and I feel not having scalars detaches them > further. > > Having scalars, however also means we should preserve them. I feel in > principle that is actually fairly straight forward. E.g. 
for ufuncs: > > * np.add(scalar, scalar) -> scalar > * np.add.reduce(arr, axis=None) -> scalar > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) > * np.add.reduce(scalar, axis=()) -> array > > Of course libraries that do `np.asarray` would/could basically chose to > not preserve scalars: Their signature is defined as taking strictly > array input. > > Cheers, > > Sebastian > > > [0] At best this can be a vision to decide which way they may evolve. > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably > strange. E.g. Quantity defines hash correctly, but does not fully > ensure immutability for 0-D Quantities. Ensuring immutability in a > world where "views" are a central concept requires a write-only copy. > > [2] Arguably `.item()` would always return a scalar, but it would be a > second class citizen. (Although if it returns a scalar, at least we > already have a scalar implementation.) > > [3] They are necessary due to technical debt for NumPy datatypes > though. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Sat Feb 22 09:34:55 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 22 Feb 2020 09:34:55 -0500 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? In-Reply-To: References: Message-ID: not having a hashable tuple conversion would be a strong limitation a = tuple(np.arange(5)) versus a = tuple([np.array(i) for i in range(5)]) {a:5} Josef On Sat, Feb 22, 2020 at 9:28 AM Evgeni Burovski wrote: > Hi Sebastian, > > Just to clarify the difference: > > >>> x = np.float64(42) > >>> y = np.array(42, dtype=float) > > Here `x` is a scalar and `y` is a 0D array, correct? > If that's the case, not having the former would be very confusing for > users (at least, that would be very confusing to me, FWIW). > > If anything, I think it'd be cleaner to not have the latter, and only > have either scalars or 1D arrays (i.e., N-D arrays with N>=1), but it > is probably way too late to even think about it anyway. > > Cheers, > > Evgeni > > On Sat, Feb 22, 2020 at 4:37 AM Sebastian Berg > wrote: > > > > Hi all, > > > > When we create new datatypes, we have the option to make new choices > > for the new datatypes [0] (not the existing ones). > > > > The question is: Should every NumPy datatype have a scalar associated > > and should operations like indexing return a scalar or a 0-D array? > > > > This is in my opinion a complex, almost philosophical, question, and we > > do not have to settle anything for a long time. But, if we do not > > decide a direction before we have many new datatypes the decision will > > make itself... > > So happy about any ideas, even if its just a gut feeling :). > > > > There are various points. I would like to mostly ignore the technical > > ones, but I am listing them anyway here: > > > > * Scalars are faster (although that can be optimized likely) > > > > * Scalars have a lower memory footprint > > > > * The current implementation incurs a technical debt in NumPy. > > (I do not think that is a general issue, though. We could > > automatically create scalars for each new datatype probably.) > > > > Advantages of having no scalars: > > > > * No need to keep track of scalars to preserve them in ufuncs, or > > libraries using `np.asarray`, do they need `np.asarray_or_scalar`? 
> > (or decide they return always arrays, although ufuncs may not) > > > > * Seems simpler in many ways, you always know the output will be an > > array if it has to do with NumPy. > > > > Advantages of having scalars: > > > > * Scalars are immutable and we are used to them from Python. > > A 0-D array cannot be used as a dictionary key consistently [1]. > > > > I.e. without scalars as first class citizen `dict[arr1d[0]]` > > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined, > > and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2] > > > > * Object arrays as we have them now make sense, `arr1d[0]` can > > reasonably return a Python object. I.e. arrays feel more like > > container if you can take elements out easily. > > > > Could go both ways: > > > > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array > > without scalars. With scalars `arr1d[0, ...]` clarifies the > > meaning. (In principle it is good to never use `arr2d[0]` to > > get a 1D slice, probably more-so if scalars exist.) > > > > Note: array-scalars (the current NumPy scalars) are not useful in my > > opinion [3]. A scalar should not be indexed or have a shape. I do not > > believe in scalars pretending to be arrays. > > > > I personally tend towards liking scalars. If Python was a language > > where the array (array-programming) concept was ingrained into the > > language itself, I would lean the other way. But users are used to > > scalars, and they "put" scalars into arrays. Array objects are in some > > ways strange in Python, and I feel not having scalars detaches them > > further. > > > > Having scalars, however also means we should preserve them. I feel in > > principle that is actually fairly straight forward. E.g. for ufuncs: > > > > * np.add(scalar, scalar) -> scalar > > * np.add.reduce(arr, axis=None) -> scalar > > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) > > * np.add.reduce(scalar, axis=()) -> array > > > > Of course libraries that do `np.asarray` would/could basically chose to > > not preserve scalars: Their signature is defined as taking strictly > > array input. > > > > Cheers, > > > > Sebastian > > > > > > [0] At best this can be a vision to decide which way they may evolve. > > > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably > > strange. E.g. Quantity defines hash correctly, but does not fully > > ensure immutability for 0-D Quantities. Ensuring immutability in a > > world where "views" are a central concept requires a write-only copy. > > > > [2] Arguably `.item()` would always return a scalar, but it would be a > > second class citizen. (Although if it returns a scalar, at least we > > already have a scalar implementation.) > > > > [3] They are necessary due to technical debt for NumPy datatypes > > though. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Feb 22 09:41:10 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 22 Feb 2020 09:41:10 -0500 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? 
In-Reply-To: References: Message-ID: On Sat, Feb 22, 2020 at 9:34 AM wrote: > not having a hashable tuple conversion would be a strong limitation > > a = tuple(np.arange(5)) > versus > a = tuple([np.array(i) for i in range(5)]) > {a:5} > also there is the question of which scalar .item() versus [()] This was used in the old times in scipy.stats, and I just saw https://github.com/scipy/scipy/pull/11165#issuecomment-589952838 aside: AFAIR, I use 0-dim arrays also to ensure that I have a numpy dtype and not, e.g. some equivalent python type Josef > > Josef > > On Sat, Feb 22, 2020 at 9:28 AM Evgeni Burovski < > evgeny.burovskiy at gmail.com> wrote: > >> Hi Sebastian, >> >> Just to clarify the difference: >> >> >>> x = np.float64(42) >> >>> y = np.array(42, dtype=float) >> >> Here `x` is a scalar and `y` is a 0D array, correct? >> If that's the case, not having the former would be very confusing for >> users (at least, that would be very confusing to me, FWIW). >> >> If anything, I think it'd be cleaner to not have the latter, and only >> have either scalars or 1D arrays (i.e., N-D arrays with N>=1), but it >> is probably way too late to even think about it anyway. >> >> Cheers, >> >> Evgeni >> >> On Sat, Feb 22, 2020 at 4:37 AM Sebastian Berg >> wrote: >> > >> > Hi all, >> > >> > When we create new datatypes, we have the option to make new choices >> > for the new datatypes [0] (not the existing ones). >> > >> > The question is: Should every NumPy datatype have a scalar associated >> > and should operations like indexing return a scalar or a 0-D array? >> > >> > This is in my opinion a complex, almost philosophical, question, and we >> > do not have to settle anything for a long time. But, if we do not >> > decide a direction before we have many new datatypes the decision will >> > make itself... >> > So happy about any ideas, even if its just a gut feeling :). >> > >> > There are various points. I would like to mostly ignore the technical >> > ones, but I am listing them anyway here: >> > >> > * Scalars are faster (although that can be optimized likely) >> > >> > * Scalars have a lower memory footprint >> > >> > * The current implementation incurs a technical debt in NumPy. >> > (I do not think that is a general issue, though. We could >> > automatically create scalars for each new datatype probably.) >> > >> > Advantages of having no scalars: >> > >> > * No need to keep track of scalars to preserve them in ufuncs, or >> > libraries using `np.asarray`, do they need `np.asarray_or_scalar`? >> > (or decide they return always arrays, although ufuncs may not) >> > >> > * Seems simpler in many ways, you always know the output will be an >> > array if it has to do with NumPy. >> > >> > Advantages of having scalars: >> > >> > * Scalars are immutable and we are used to them from Python. >> > A 0-D array cannot be used as a dictionary key consistently [1]. >> > >> > I.e. without scalars as first class citizen `dict[arr1d[0]]` >> > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined, >> > and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2] >> > >> > * Object arrays as we have them now make sense, `arr1d[0]` can >> > reasonably return a Python object. I.e. arrays feel more like >> > container if you can take elements out easily. >> > >> > Could go both ways: >> > >> > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array >> > without scalars. With scalars `arr1d[0, ...]` clarifies the >> > meaning. 
(In principle it is good to never use `arr2d[0]` to >> > get a 1D slice, probably more-so if scalars exist.) >> > >> > Note: array-scalars (the current NumPy scalars) are not useful in my >> > opinion [3]. A scalar should not be indexed or have a shape. I do not >> > believe in scalars pretending to be arrays. >> > >> > I personally tend towards liking scalars. If Python was a language >> > where the array (array-programming) concept was ingrained into the >> > language itself, I would lean the other way. But users are used to >> > scalars, and they "put" scalars into arrays. Array objects are in some >> > ways strange in Python, and I feel not having scalars detaches them >> > further. >> > >> > Having scalars, however also means we should preserve them. I feel in >> > principle that is actually fairly straight forward. E.g. for ufuncs: >> > >> > * np.add(scalar, scalar) -> scalar >> > * np.add.reduce(arr, axis=None) -> scalar >> > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) >> > * np.add.reduce(scalar, axis=()) -> array >> > >> > Of course libraries that do `np.asarray` would/could basically chose to >> > not preserve scalars: Their signature is defined as taking strictly >> > array input. >> > >> > Cheers, >> > >> > Sebastian >> > >> > >> > [0] At best this can be a vision to decide which way they may evolve. >> > >> > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably >> > strange. E.g. Quantity defines hash correctly, but does not fully >> > ensure immutability for 0-D Quantities. Ensuring immutability in a >> > world where "views" are a central concept requires a write-only copy. >> > >> > [2] Arguably `.item()` would always return a scalar, but it would be a >> > second class citizen. (Although if it returns a scalar, at least we >> > already have a scalar implementation.) >> > >> > [3] They are necessary due to technical debt for NumPy datatypes >> > though. >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Feb 22 09:53:29 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 22 Feb 2020 09:53:29 -0500 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? In-Reply-To: References: Message-ID: On Sat, Feb 22, 2020 at 9:41 AM wrote: > > > On Sat, Feb 22, 2020 at 9:34 AM wrote: > >> not having a hashable tuple conversion would be a strong limitation >> >> a = tuple(np.arange(5)) >> versus >> a = tuple([np.array(i) for i in range(5)]) >> {a:5} >> > > also there is the question of which scalar > > .item() versus [()] > > This was used in the old times in scipy.stats, and I just saw > https://github.com/scipy/scipy/pull/11165#issuecomment-589952838 > > aside: > AFAIR, I use 0-dim arrays also to ensure that I have a numpy dtype and > not, e.g. 
some equivalent python type > 0-dim as mutable pseudo-scalar a = np.asarray(5) a, id(a) (array(5), 844574884528) a[()] = 1 a, id(a) (array(1), 844574884528) maybe I never used that, In a recent similar case, I could use just a 1-d list or array to work around python's muting or mutability behavior > Josef > > >> >> Josef >> >> On Sat, Feb 22, 2020 at 9:28 AM Evgeni Burovski < >> evgeny.burovskiy at gmail.com> wrote: >> >>> Hi Sebastian, >>> >>> Just to clarify the difference: >>> >>> >>> x = np.float64(42) >>> >>> y = np.array(42, dtype=float) >>> >>> Here `x` is a scalar and `y` is a 0D array, correct? >>> If that's the case, not having the former would be very confusing for >>> users (at least, that would be very confusing to me, FWIW). >>> >>> If anything, I think it'd be cleaner to not have the latter, and only >>> have either scalars or 1D arrays (i.e., N-D arrays with N>=1), but it >>> is probably way too late to even think about it anyway. >>> >>> Cheers, >>> >>> Evgeni >>> >>> On Sat, Feb 22, 2020 at 4:37 AM Sebastian Berg >>> wrote: >>> > >>> > Hi all, >>> > >>> > When we create new datatypes, we have the option to make new choices >>> > for the new datatypes [0] (not the existing ones). >>> > >>> > The question is: Should every NumPy datatype have a scalar associated >>> > and should operations like indexing return a scalar or a 0-D array? >>> > >>> > This is in my opinion a complex, almost philosophical, question, and we >>> > do not have to settle anything for a long time. But, if we do not >>> > decide a direction before we have many new datatypes the decision will >>> > make itself... >>> > So happy about any ideas, even if its just a gut feeling :). >>> > >>> > There are various points. I would like to mostly ignore the technical >>> > ones, but I am listing them anyway here: >>> > >>> > * Scalars are faster (although that can be optimized likely) >>> > >>> > * Scalars have a lower memory footprint >>> > >>> > * The current implementation incurs a technical debt in NumPy. >>> > (I do not think that is a general issue, though. We could >>> > automatically create scalars for each new datatype probably.) >>> > >>> > Advantages of having no scalars: >>> > >>> > * No need to keep track of scalars to preserve them in ufuncs, or >>> > libraries using `np.asarray`, do they need `np.asarray_or_scalar`? >>> > (or decide they return always arrays, although ufuncs may not) >>> > >>> > * Seems simpler in many ways, you always know the output will be an >>> > array if it has to do with NumPy. >>> > >>> > Advantages of having scalars: >>> > >>> > * Scalars are immutable and we are used to them from Python. >>> > A 0-D array cannot be used as a dictionary key consistently [1]. >>> > >>> > I.e. without scalars as first class citizen `dict[arr1d[0]]` >>> > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined, >>> > and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2] >>> > >>> > * Object arrays as we have them now make sense, `arr1d[0]` can >>> > reasonably return a Python object. I.e. arrays feel more like >>> > container if you can take elements out easily. >>> > >>> > Could go both ways: >>> > >>> > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array >>> > without scalars. With scalars `arr1d[0, ...]` clarifies the >>> > meaning. (In principle it is good to never use `arr2d[0]` to >>> > get a 1D slice, probably more-so if scalars exist.) >>> > >>> > Note: array-scalars (the current NumPy scalars) are not useful in my >>> > opinion [3]. 
A scalar should not be indexed or have a shape. I do not >>> > believe in scalars pretending to be arrays. >>> > >>> > I personally tend towards liking scalars. If Python was a language >>> > where the array (array-programming) concept was ingrained into the >>> > language itself, I would lean the other way. But users are used to >>> > scalars, and they "put" scalars into arrays. Array objects are in some >>> > ways strange in Python, and I feel not having scalars detaches them >>> > further. >>> > >>> > Having scalars, however also means we should preserve them. I feel in >>> > principle that is actually fairly straight forward. E.g. for ufuncs: >>> > >>> > * np.add(scalar, scalar) -> scalar >>> > * np.add.reduce(arr, axis=None) -> scalar >>> > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) >>> > * np.add.reduce(scalar, axis=()) -> array >>> > >>> > Of course libraries that do `np.asarray` would/could basically chose to >>> > not preserve scalars: Their signature is defined as taking strictly >>> > array input. >>> > >>> > Cheers, >>> > >>> > Sebastian >>> > >>> > >>> > [0] At best this can be a vision to decide which way they may evolve. >>> > >>> > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably >>> > strange. E.g. Quantity defines hash correctly, but does not fully >>> > ensure immutability for 0-D Quantities. Ensuring immutability in a >>> > world where "views" are a central concept requires a write-only copy. >>> > >>> > [2] Arguably `.item()` would always return a scalar, but it would be a >>> > second class citizen. (Although if it returns a scalar, at least we >>> > already have a scalar implementation.) >>> > >>> > [3] They are necessary due to technical debt for NumPy datatypes >>> > though. >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Feb 22 16:28:16 2020 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 22 Feb 2020 13:28:16 -0800 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? In-Reply-To: References: Message-ID: Off the cuff, my intuition is that dtypes will want to be able to define how scalar indexing works, and let it return objects other than arrays. So e.g.: - some dtypes might just return a zero-d array - some dtypes might want to return some arbitrary domain-appropriate type, like a datetime dtype might want to return datetime.datetime objects (like how dtype(object) works now) - some dtypes might want to go to all the trouble to define immutable duck-array "scalar" types (like how dtype(float) and friends work now) But I don't think we need to give that last case any special privileges in the dtype system. For example, I don't think we need to mandate that everyone who defines their own dtype MUST also implement a custom duck-array type to act as the scalars, or build a whole complex system to auto-generate such types given an arbitrary user-defined dtype. -n On Fri, Feb 21, 2020 at 5:37 PM Sebastian Berg wrote: > > Hi all, > > When we create new datatypes, we have the option to make new choices > for the new datatypes [0] (not the existing ones). 
> > The question is: Should every NumPy datatype have a scalar associated > and should operations like indexing return a scalar or a 0-D array? > > This is in my opinion a complex, almost philosophical, question, and we > do not have to settle anything for a long time. But, if we do not > decide a direction before we have many new datatypes the decision will > make itself... > So happy about any ideas, even if its just a gut feeling :). > > There are various points. I would like to mostly ignore the technical > ones, but I am listing them anyway here: > > * Scalars are faster (although that can be optimized likely) > > * Scalars have a lower memory footprint > > * The current implementation incurs a technical debt in NumPy. > (I do not think that is a general issue, though. We could > automatically create scalars for each new datatype probably.) > > Advantages of having no scalars: > > * No need to keep track of scalars to preserve them in ufuncs, or > libraries using `np.asarray`, do they need `np.asarray_or_scalar`? > (or decide they return always arrays, although ufuncs may not) > > * Seems simpler in many ways, you always know the output will be an > array if it has to do with NumPy. > > Advantages of having scalars: > > * Scalars are immutable and we are used to them from Python. > A 0-D array cannot be used as a dictionary key consistently [1]. > > I.e. without scalars as first class citizen `dict[arr1d[0]]` > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined, > and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2] > > * Object arrays as we have them now make sense, `arr1d[0]` can > reasonably return a Python object. I.e. arrays feel more like > container if you can take elements out easily. > > Could go both ways: > > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array > without scalars. With scalars `arr1d[0, ...]` clarifies the > meaning. (In principle it is good to never use `arr2d[0]` to > get a 1D slice, probably more-so if scalars exist.) > > Note: array-scalars (the current NumPy scalars) are not useful in my > opinion [3]. A scalar should not be indexed or have a shape. I do not > believe in scalars pretending to be arrays. > > I personally tend towards liking scalars. If Python was a language > where the array (array-programming) concept was ingrained into the > language itself, I would lean the other way. But users are used to > scalars, and they "put" scalars into arrays. Array objects are in some > ways strange in Python, and I feel not having scalars detaches them > further. > > Having scalars, however also means we should preserve them. I feel in > principle that is actually fairly straight forward. E.g. for ufuncs: > > * np.add(scalar, scalar) -> scalar > * np.add.reduce(arr, axis=None) -> scalar > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) > * np.add.reduce(scalar, axis=()) -> array > > Of course libraries that do `np.asarray` would/could basically chose to > not preserve scalars: Their signature is defined as taking strictly > array input. > > Cheers, > > Sebastian > > > [0] At best this can be a vision to decide which way they may evolve. > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably > strange. E.g. Quantity defines hash correctly, but does not fully > ensure immutability for 0-D Quantities. Ensuring immutability in a > world where "views" are a central concept requires a write-only copy. 
> > [2] Arguably `.item()` would always return a scalar, but it would be a > second class citizen. (Although if it returns a scalar, at least we > already have a scalar implementation.) > > [3] They are necessary due to technical debt for NumPy datatypes > though. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -- Nathaniel J. Smith -- https://vorpus.org From einstein.edison at gmail.com Sun Feb 23 05:04:10 2020 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Sun, 23 Feb 2020 10:04:10 +0000 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? References: Message-ID: Hi, Sebastian, ?On 22.02.20, 02:37, "NumPy-Discussion on behalf of Sebastian Berg" wrote: Hi all, When we create new datatypes, we have the option to make new choices for the new datatypes [0] (not the existing ones). The question is: Should every NumPy datatype have a scalar associated and should operations like indexing return a scalar or a 0-D array? This is in my opinion a complex, almost philosophical, question, and we do not have to settle anything for a long time. But, if we do not decide a direction before we have many new datatypes the decision will make itself... So happy about any ideas, even if its just a gut feeling :). There are various points. I would like to mostly ignore the technical ones, but I am listing them anyway here: * Scalars are faster (although that can be optimized likely) * Scalars have a lower memory footprint * The current implementation incurs a technical debt in NumPy. (I do not think that is a general issue, though. We could automatically create scalars for each new datatype probably.) Advantages of having no scalars: * No need to keep track of scalars to preserve them in ufuncs, or libraries using `np.asarray`, do they need `np.asarray_or_scalar`? (or decide they return always arrays, although ufuncs may not) * Seems simpler in many ways, you always know the output will be an array if it has to do with NumPy. Advantages of having scalars: * Scalars are immutable and we are used to them from Python. A 0-D array cannot be used as a dictionary key consistently [1]. I.e. without scalars as first class citizen `dict[arr1d[0]]` cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined, and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2] * Object arrays as we have them now make sense, `arr1d[0]` can reasonably return a Python object. I.e. arrays feel more like container if you can take elements out easily. Could go both ways: * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array without scalars. With scalars `arr1d[0, ...]` clarifies the meaning. (In principle it is good to never use `arr2d[0]` to get a 1D slice, probably more-so if scalars exist.) From a usability perspective, one could argue that if the dimension of the array one is indexing into is known and the user isn't advanced, then the behavior expected is one of scalars and not 0D arrays. If, however, the input dimension is unknown, then the behavior switch at 0D and the need for an extra ellipsis to ensure array-ness makes things confusing to regular users. I am file with the current behavior of indexing, as anything else would likely be a large backwards-compat break. Note: array-scalars (the current NumPy scalars) are not useful in my opinion [3]. A scalar should not be indexed or have a shape. 
I do not believe in scalars pretending to be arrays. I personally tend towards liking scalars. If Python was a language where the array (array-programming) concept was ingrained into the language itself, I would lean the other way. But users are used to scalars, and they "put" scalars into arrays. Array objects are in some ways strange in Python, and I feel not having scalars detaches them further. Having scalars, however also means we should preserve them. I feel in principle that is actually fairly straight forward. E.g. for ufuncs: * np.add(scalar, scalar) -> scalar * np.add.reduce(arr, axis=None) -> scalar * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) * np.add.reduce(scalar, axis=()) -> array I love this idea. Of course libraries that do `np.asarray` would/could basically chose to not preserve scalars: Their signature is defined as taking strictly array input. Cheers, Sebastian [0] At best this can be a vision to decide which way they may evolve. [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is arguably strange. E.g. Quantity defines hash correctly, but does not fully ensure immutability for 0-D Quantities. Ensuring immutability in a world where "views" are a central concept requires a write-only copy. [2] Arguably `.item()` would always return a scalar, but it would be a second class citizen. (Although if it returns a scalar, at least we already have a scalar implementation.) [3] They are necessary due to technical debt for NumPy datatypes though. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Sun Feb 23 16:56:55 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 23 Feb 2020 13:56:55 -0800 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? In-Reply-To: References: Message-ID: <4fecc19c9dcc40d935214dd3b997b2cf769cb7d7.camel@sipsolutions.net> On Sat, 2020-02-22 at 13:28 -0800, Nathaniel Smith wrote: > Off the cuff, my intuition is that dtypes will want to be able to > define how scalar indexing works, and let it return objects other > than > arrays. So e.g.: > > - some dtypes might just return a zero-d array > - some dtypes might want to return some arbitrary domain-appropriate > type, like a datetime dtype might want to return datetime.datetime > objects (like how dtype(object) works now) > - some dtypes might want to go to all the trouble to define immutable > duck-array "scalar" types (like how dtype(float) and friends work > now) Right, my assumption is that whatever we suggest is going to be what most will choose, so we have the chance to move in a certain direction and set a standard. This is to make code which may or may not deal with 0-D arrays more reliable (more below). > > But I don't think we need to give that last case any special > privileges in the dtype system. For example, I don't think we need to > mandate that everyone who defines their own dtype MUST also implement > a custom duck-array type to act as the scalars, or build a whole > complex system to auto-generate such types given an arbitrary > user-defined dtype. (Note that "autogenerating" would be nothing more than a write-only 0-D array, which does not implement indexing.) There are also categoricals, for which the type may just be "object" in practice (you could define it closer, but it seems unlikely to be useful). 
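To make the object-dtype case concrete, here is how element extraction behaves in current NumPy (plain illustration of existing behaviour, nothing hypothetical):

    import numpy as np

    a = np.empty(2, dtype=object)
    a[0] = [1, 2, 3]                  # a list stored as a single element
    a[1] = None
    elem = a[0]                       # comes back as a plain Python list
    print(type(elem))                 # <class 'list'>
    print(np.asarray(elem).shape)     # (3,) -- one element became three, no round-trip

    b = np.array([1.0, 2.0])
    print(type(b[0]))                 # <class 'numpy.float64'> -- an array-scalar
    print(np.asarray(b[0]).shape)     # ()   -- this one does round-trip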
And for simple numerical types, if we go the `.item()` path, it is arguably fine if the type is just a python type. Maybe the crux of the problem is actuall that in general `np.asarray(arr1d[0])` does not roundtrip for the current object dtype, and only partially for a categorical above. As such that is fine, but right now it is hard to tell when you will have a scalar and when a 0D array. Maybe it is better to talk about a potentially new `np.pyobject[type]` datatype (i.e. an object datatype with all elements having the same python type). Currently writing generic code with the object dtype is tricky, because we randomly return the object instead of arrays. What would be the preference for such a specific dtype? * arr1d[0] -> scalar or array? * np.add(scalar, scalar) -> scalar or array * np.add.reduce(arr) -> scalar or array? I think the `np.add` case we can decide fairly independently. The main thing is the indexing. Would we want to force a `.item()` call or not? Forcing `.item()` is in many ways simpler, I am unsure whether it would be inconvenient often. And, maybe the answer is just that for datatypes that do not round-trip easily, `.item()` is probably preferable, and for datatypes that do round-trip scalars are fine. - Sebastian > > On Fri, Feb 21, 2020 at 5:37 PM Sebastian Berg > wrote: > > Hi all, > > > > When we create new datatypes, we have the option to make new > > choices > > for the new datatypes [0] (not the existing ones). > > > > The question is: Should every NumPy datatype have a scalar > > associated > > and should operations like indexing return a scalar or a 0-D array? > > > > This is in my opinion a complex, almost philosophical, question, > > and we > > do not have to settle anything for a long time. But, if we do not > > decide a direction before we have many new datatypes the decision > > will > > make itself... > > So happy about any ideas, even if its just a gut feeling :). > > > > There are various points. I would like to mostly ignore the > > technical > > ones, but I am listing them anyway here: > > > > * Scalars are faster (although that can be optimized likely) > > > > * Scalars have a lower memory footprint > > > > * The current implementation incurs a technical debt in NumPy. > > (I do not think that is a general issue, though. We could > > automatically create scalars for each new datatype probably.) > > > > Advantages of having no scalars: > > > > * No need to keep track of scalars to preserve them in ufuncs, or > > libraries using `np.asarray`, do they need > > `np.asarray_or_scalar`? > > (or decide they return always arrays, although ufuncs may not) > > > > * Seems simpler in many ways, you always know the output will be > > an > > array if it has to do with NumPy. > > > > Advantages of having scalars: > > > > * Scalars are immutable and we are used to them from Python. > > A 0-D array cannot be used as a dictionary key consistently > > [1]. > > > > I.e. without scalars as first class citizen `dict[arr1d[0]]` > > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is > > defined, > > and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. > > [2] > > > > * Object arrays as we have them now make sense, `arr1d[0]` can > > reasonably return a Python object. I.e. arrays feel more like > > container if you can take elements out easily. > > > > Could go both ways: > > > > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array > > without scalars. With scalars `arr1d[0, ...]` clarifies the > > meaning. 
(In principle it is good to never use `arr2d[0]` to > > get a 1D slice, probably more-so if scalars exist.) > > > > Note: array-scalars (the current NumPy scalars) are not useful in > > my > > opinion [3]. A scalar should not be indexed or have a shape. I do > > not > > believe in scalars pretending to be arrays. > > > > I personally tend towards liking scalars. If Python was a language > > where the array (array-programming) concept was ingrained into the > > language itself, I would lean the other way. But users are used to > > scalars, and they "put" scalars into arrays. Array objects are in > > some > > ways strange in Python, and I feel not having scalars detaches them > > further. > > > > Having scalars, however also means we should preserve them. I feel > > in > > principle that is actually fairly straight forward. E.g. for > > ufuncs: > > > > * np.add(scalar, scalar) -> scalar > > * np.add.reduce(arr, axis=None) -> scalar > > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) > > * np.add.reduce(scalar, axis=()) -> array > > > > Of course libraries that do `np.asarray` would/could basically > > chose to > > not preserve scalars: Their signature is defined as taking strictly > > array input. > > > > Cheers, > > > > Sebastian > > > > > > [0] At best this can be a vision to decide which way they may > > evolve. > > > > [1] E.g. PyTorch uses `hash(tensor) == id(tensor)` which is > > arguably > > strange. E.g. Quantity defines hash correctly, but does not fully > > ensure immutability for 0-D Quantities. Ensuring immutability in a > > world where "views" are a central concept requires a write-only > > copy. > > > > [2] Arguably `.item()` would always return a scalar, but it would > > be a > > second class citizen. (Although if it returns a scalar, at least we > > already have a scalar implementation.) > > > > [3] They are necessary due to technical debt for NumPy datatypes > > though. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From shoyer at gmail.com Sun Feb 23 18:30:54 2020 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 23 Feb 2020 15:30:54 -0800 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> Message-ID: On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg wrote: > > It is less clear how this could work for __array_module__, because > > __array_module__ and get_array_module() are not generic -- they > > refers explicitly to a NumPy like module. If we want to extend it to > > SciPy (for which I agree there are good use-cases), what should that > > look __array_module__` > > I suppose the question is here, where should the code reside? For > SciPy, I agree there is a good reason why you may want to "reverse" the > implementation. The code to support JAX arrays, should live inside JAX. 
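(For readers new to the draft NEP, the consumer-side pattern under discussion looks roughly like the sketch below; `get_array_module` is the draft NEP 37 spelling and is not part of any released NumPy, and the duck-array class and module names are purely hypothetical.)

    import numpy as np

    def normalize(x):
        # Dispatches on the arguments' __array_module__ methods; plain
        # ndarrays (and scalars) fall back to the numpy namespace itself.
        xp = np.get_array_module(x)
        return (x - xp.mean(x)) / xp.std(x)

    # A duck-array library would opt in by returning its own NumPy-like namespace:
    class MyDuckArray:
        def __array_module__(self, types):
            import my_duck_library    # hypothetical module with a NumPy-compatible API
            if all(issubclass(t, (MyDuckArray, np.ndarray)) for t in types):
                return my_duck_library
            return NotImplemented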
> > One, probably silly, option is to return a "global" namespace, so that: > > np = get_array_module(*arrays).numpy` > > My main concern with a "global namespace" is that it adds boilerplate to the typical usage of fetching a duck-array version of NumPy. I think the simplest proposal is to add a "module" argument to both get_array_module and __array_module__, with a default value of "numpy". This adds flexibility with minimal additional complexity. The main question is what the type of arguments for "module" should be: 1. Modules could be specified as strings, e.g., "numpy" 2. Module could be specified as actual namespace, e.g., numpy from import numpy. The advantage of (1) is that in theory you could write np.get_array_module(*arrays, module='scipy.linalg') without the overhead of actually importing scipy.linalg or without even needing scipy to be installed, if all the arrays use a different scipy.linalg implementation. But in practice, this seems a little far-fetched. All alternative implementations of scipy that I know of (e.g., in JAX or conceivably in Dask) import the original library. The main downside of (1) is that it would would mean that NumPy's ndarray.__array_module__ would need to use importlib.import_module() to dynamically import modules. It also adds a potentially awkward asymmetry between the "module" and "default" arguments, unless we also switched default to specify modules with strings. Either way, the "default" argument will probably need to be adjusted so that by default it matches whatever value is passed into "module", instead of always defaulting to "numpy". Any thoughts on which of these options makes most sense? We could also put off making any changes to the protocol now, but this change seems pretty safe and appear to have real use-cases (e.g., for sklearn) so I am inclined to go ahead with it now before finalizing the NEP. > We have to distinct issues: Where should e.g. SciPy put a generic > implementation (assuming they to provide implementations that only > require NumPy-API support to not require overriding)? > And, also if a library provides generic support, should we define a > standard of how the context/namespace may be passed in/provided? > > sklearn's main namespace is expected to support many array > objects/types, but it could be nice to pass in an already known > context/namespace (say scikit-image already found it, and then calls > scikit-learn internally). A "generic" namespace may even require this > to infer the correct output array object. > > > Another thing about backward compatibility: What is our vision there > actually? > This NEP will *not* give the *end user* the option to opt-in! Here, > opt-in is really reserved to the *library user* (e.g. sklearn). (I did > not realize this clearly before) > > Thinking about that for a bit now, that seems like the right choice. > But it also means that the library requires an easy way of giving a > FutureWarning, to notify the end-user of the upcoming change. The end- > user will easily be able to convert to a NumPy array to keep the old > behaviour. > Once this warning is given (maybe during `get_array_module()`, the > array module object/context would preferably be passed around, > hopefully even between libraries. That provides a reasonable way to > opt-in to the new behaviour without a warning (mainly for library > users, end-users can silence the warning if they wish so). > I don't think NumPy needs to do anything about warnings. 
It is straightforward for libraries that want to use use get_array_module() to issue their own warnings before calling get_array_module(), if desired. Or alternatively, if a library is about to add a new __array_module__ method, it is straightforward to issue a warning inside the new __array_module__ method before returning the NumPy functions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Feb 23 18:59:42 2020 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 23 Feb 2020 15:59:42 -0800 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> Message-ID: On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer wrote: > On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg > wrote: > >> >> Another thing about backward compatibility: What is our vision there >> actually? >> This NEP will *not* give the *end user* the option to opt-in! Here, >> opt-in is really reserved to the *library user* (e.g. sklearn). (I did >> not realize this clearly before) >> >> Thinking about that for a bit now, that seems like the right choice. >> But it also means that the library requires an easy way of giving a >> FutureWarning, to notify the end-user of the upcoming change. The end- >> user will easily be able to convert to a NumPy array to keep the old >> behaviour. >> Once this warning is given (maybe during `get_array_module()`, the >> array module object/context would preferably be passed around, >> hopefully even between libraries. That provides a reasonable way to >> opt-in to the new behaviour without a warning (mainly for library >> users, end-users can silence the warning if they wish so). >> > > I don't think NumPy needs to do anything about warnings. It is > straightforward for libraries that want to use use get_array_module() to > issue their own warnings before calling get_array_module(), if desired. > > Or alternatively, if a library is about to add a new __array_module__ > method, it is straightforward to issue a warning inside the new > __array_module__ method before returning the NumPy functions. > I don't think this is quite enough. Sebastian points out a fairly important issue. One of the main rationales for the whole NEP, and the argument in multiple places ( https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users) is that it's now opt-in while __array_function__ was opt-out. This isn't really true - the problem is simply *moved*, from the duck array libraries to the array-consuming libraries. The end user will still see the backwards incompatible change, with no way to turn it off. It will be easier with __array_module__ to warn users, but this should be expanded on in the NEP. Also, I'm still not sure I agree with the tone of the discussion on this topic. It's very heavily inspired by what the JAX devs are telling you (the NEP still says PyTorch and scipy.sparse as well, but that's not true in both cases). If you ask Dask and CuPy for example, they're quite happy with __array_function__ and there haven't been many complaints about backwards compat breakage. Cheers, Ralf _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
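(A minimal sketch of the kind of library-side warning being discussed here; the function, the message, and the `get_array_module` call are purely illustrative of the draft NEP, not an agreed API.)

    import warnings
    import numpy as np

    def fit(X):
        # An array-consuming library could warn end users before switching
        # from unconditional np.asarray(X) coercion to the duck-array path.
        if not isinstance(X, np.ndarray):
            warnings.warn(
                "non-ndarray inputs will no longer be coerced with np.asarray; "
                "pass np.asarray(X) explicitly to keep the old behaviour",
                FutureWarning, stacklevel=2)
        xp = np.get_array_module(X)   # draft NEP 37 API, see the sketch earlier in the thread
        return xp.mean(X, axis=0)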
URL: From shoyer at gmail.com Mon Feb 24 01:44:29 2020 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 23 Feb 2020 22:44:29 -0800 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> Message-ID: On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers wrote: > > > On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer wrote: > >> On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg < >> sebastian at sipsolutions.net> wrote: >> >>> >>> Another thing about backward compatibility: What is our vision there >>> actually? >>> This NEP will *not* give the *end user* the option to opt-in! Here, >>> opt-in is really reserved to the *library user* (e.g. sklearn). (I did >>> not realize this clearly before) >>> >>> Thinking about that for a bit now, that seems like the right choice. >>> But it also means that the library requires an easy way of giving a >>> FutureWarning, to notify the end-user of the upcoming change. The end- >>> user will easily be able to convert to a NumPy array to keep the old >>> behaviour. >>> Once this warning is given (maybe during `get_array_module()`, the >>> array module object/context would preferably be passed around, >>> hopefully even between libraries. That provides a reasonable way to >>> opt-in to the new behaviour without a warning (mainly for library >>> users, end-users can silence the warning if they wish so). >>> >> >> I don't think NumPy needs to do anything about warnings. It is >> straightforward for libraries that want to use use get_array_module() to >> issue their own warnings before calling get_array_module(), if desired. >> > >> Or alternatively, if a library is about to add a new __array_module__ >> method, it is straightforward to issue a warning inside the new >> __array_module__ method before returning the NumPy functions. >> > > I don't think this is quite enough. Sebastian points out a fairly > important issue. One of the main rationales for the whole NEP, and the > argument in multiple places ( > https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users) > is that it's now opt-in while __array_function__ was opt-out. This isn't > really true - the problem is simply *moved*, from the duck array libraries > to the array-consuming libraries. The end user will still see the backwards > incompatible change, with no way to turn it off. It will be easier with > __array_module__ to warn users, but this should be expanded on in the NEP. > Ralf, thanks for sharing your thoughts. I'm not quite I understand the concerns about backwards incompatibility: 1. The intention is that implementing a __array_module__ method should be backwards compatible with all current uses of NumPy. This satisfies backwards compatibility concerns for an array-implementing library like JAX. 2. In contrast, calling get_array_module() offers no guarantees about backwards compatibility. This seems nearly impossible, because the entire point of the protocol is to make it possible to opt-in to new behavior. So backwards compatibility isn't solved for Scikit-Learn switching to use get_array_module(), and after Scikit-Learn does so, adding __array_module__ to new types of arrays could potentially have backwards incompatible consequences for Scikit-Learn (unless sklearn uses default=None). Are you suggesting just adding something like what I'm writing here into the NEP? 
Perhaps along with advice to consider issuing warnings inside __array_module__ and falling back to legacy behavior when first implementing it on a new type? We could also potentially make a few changes to make backwards compatibility even easier, by making the protocol less aggressive about assuming that NumPy is a safe fallback. Some non-exclusive options: a. We could switch the default value of "default" on get_array_module() to None, so an exception is raised if nothing implements __array_module__. b. We could includes *all* argument types in "types", not just types that implement __array_module__. NumPy's ndarray.__array_module__ could then recognize and refuse to return an implementation if there are other arguments that might implement __array_module__ in the future (e.g., anything outside the standard library?). The downside of making either of these choices is that it would potentially make get_array_function() a bit less usable, because it is more likely to fail, e.g., if called on a float, or some custom type that should be treated as a scalar. Also, I'm still not sure I agree with the tone of the discussion on this > topic. It's very heavily inspired by what the JAX devs are telling you (the > NEP still says PyTorch and scipy.sparse as well, but that's not true in > both cases). If you ask Dask and CuPy for example, they're quite happy with > __array_function__ and there haven't been many complaints about backwards > compat breakage. > I'm linking to comments you wrote in reference to PyTorch and scipy.sparse in the current draft of the NEP, so I certainly want to make sure that you agree my characterization :). Would it be fair to say: - JAX is reluctant to implement __array_function__ because of concerns about breaking existing code. JAX developers think that when users use NumPy functions on JAX arrays, they are explicitly choosing to convert from JAX to NumPy. This model is fundamentally incompatible __array_function__, which we chose to override the existing numpy namespace. - PyTorch and scipy.sparse are not yet in position to implement __array_function__ (due to a lack of a direct implementation of NumPy's API), but these projects take backwards compatibility seriously. Does "take backwards compatibility seriously" sound about right to you? I'm very open to specific suggestions here. (TensorFlow could probably also be safely added to this second list.) Best, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From alt.tblt1 at gmail.com Mon Feb 24 15:00:47 2020 From: alt.tblt1 at gmail.com (A T) Date: Mon, 24 Feb 2020 21:00:47 +0100 Subject: [Numpy-discussion] Suggestion: prevent silent downcast in np.full_like Message-ID: Hello, This is my first activity on this mailing list, please let me know if I am doing anything improperly. I recently opened issue #15635 and it was suggested to me that it could be worth discussing here. Here is a summarized code example of how unsafe downcasting in np.full_like() resulted in issues in our scientific toolbox: t0 = 20.5 # We're trying to make a constant-valued "ufunc" temperature = lambda x: np.full_like(x, t0) print(temperature([0.1, 0.7, 2.3])) # [20.5 20.5 20.5] print(temperature(0)) # 20 This is consistent with the documentation (which even gives an example of this unsafe casting), and was obvious to fix once identified. But what seems especially problematic to me is the fact that the code /looks safe, but isn't/. 
There is no sketchy `dtype=...` to make you think twice about the possibility of downcasting, but it happens all the same. What are your thoughts on this topic? Should a special warning be given? Should the casting rule be made more strict? Alexis THIBAULT -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Mon Feb 24 15:29:00 2020 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 24 Feb 2020 15:29:00 -0500 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? In-Reply-To: References: Message-ID: <9716a1b7-871b-1776-11fe-59e095b89cf0@gmail.com> I have some thoughts on scalars from playing with ndarray ducktypes (__array_function__), eg a MaskedArray ndarray-ducktype, for which I wanted an associated "MaskedScalar" type. In summary, the ways scalars currently work makes ducktyping (duck-scalars) difficult: * numpy scalar types are not subclassable, so my duck-scalars aren't subclasses of numpy scalars and aren't in the type hierarchy * even if scalars were subclassable, I would have to subclass each scalar datatype individually to make masked versions * lots of code checks `np.isinstance(var, np.float64)` which breaks for my duck-scalars * it was difficult to distinguish between a duck-scalar and a duck-0d array. The method I used in the end seems hacky. This has led to some daydreams about how scalars should work, and also led me last to read through your NEPs 40/41 with specific focus on what you said about scalars, and was about to post there until I saw this discussion. I agree with what you said in the NEPs about not making scalars be dtype instances. Here is what ducktypes led me to: If we are able to do something like define a `np.numpy_scalar` type covering all numpy scalars, which has a `.dtype` attribute like you describe in the NEPs, then that would seem to solve the ducktype problems above. Ducktype implementors would need to make a "duck-scalar" type in parallel to their "duck-ndarray" type, but I found that to be pretty easy using an abstract class in my MaskedArray ducktype, since the MaskedArray and MaskedScalar share a lot of behavior. A numpy_scalar type would also help solve some object-array problems if the object scalars are wrapped in the np_scalar type. A long time ago I started to try to fix up various funny/strange behaviors of object datatypes, but there are lots of special cases, and the main problem was that the returned objects (eg from indexing) were not numpy types and did not support numpy attributes or indexing. Wrapping the returned object in `np.numpy_scalar` might add an extra slight annoyance to people who want to unwrap the object, but I think it would make object arrays less buggy and make code using object arrays easier to reason about and debug. Finally, a few random votes/comments based on the other emails on the list: I think scalars have a place in numpy (rather than just reusing 0d arrays), since there is a clear use in having hashable, immutable scalars. Structured scalars should probably be immutable. I agree with your suggestion that scalars should not be indexable. Thus, my duck-scalars (and proposed numpy_scalar) would not be indexable. However, I think they should encode their datatype though a .dtype attribute like ndarrays, rather than by inheritance. Also, something to think about is that currently numpy scalars satisfy the property `isinstance(np.float64(1), float)`, i.e they are within the python numerical type hierarchy. 
0d arrays do not have this property. My proposal above would break this. I'm not sure what to think about whether this is a good property to maintain or not. Cheers, Allan On 2/21/20 8:37 PM, Sebastian Berg wrote: > Hi all, > > When we create new datatypes, we have the option to make new choices > for the new datatypes [0] (not the existing ones). > > The question is: Should every NumPy datatype have a scalar associated > and should operations like indexing return a scalar or a 0-D array? > > This is in my opinion a complex, almost philosophical, question, and we > do not have to settle anything for a long time. But, if we do not > decide a direction before we have many new datatypes the decision will > make itself... > So happy about any ideas, even if its just a gut feeling :). > > There are various points. I would like to mostly ignore the technical > ones, but I am listing them anyway here: > > * Scalars are faster (although that can be optimized likely) > > * Scalars have a lower memory footprint > > * The current implementation incurs a technical debt in NumPy. > (I do not think that is a general issue, though. We could > automatically create scalars for each new datatype probably.) > > Advantages of having no scalars: > > * No need to keep track of scalars to preserve them in ufuncs, or > libraries using `np.asarray`, do they need `np.asarray_or_scalar`? > (or decide they return always arrays, although ufuncs may not) > > * Seems simpler in many ways, you always know the output will be an > array if it has to do with NumPy. > > Advantages of having scalars: > > * Scalars are immutable and we are used to them from Python. > A 0-D array cannot be used as a dictionary key consistently [1]. > > I.e. without scalars as first class citizen `dict[arr1d[0]]` > cannot work, `dict[arr1d[0].item()]` may (if `.item()` is defined, > and e.g. `dict[arr1d[0].frozen()]` could make a copy to work. [2] > > * Object arrays as we have them now make sense, `arr1d[0]` can > reasonably return a Python object. I.e. arrays feel more like > container if you can take elements out easily. > > Could go both ways: > > * Scalar math `scalar = arr1d[0]; scalar += 1` modifies the array > without scalars. With scalars `arr1d[0, ...]` clarifies the > meaning. (In principle it is good to never use `arr2d[0]` to > get a 1D slice, probably more-so if scalars exist.) > > Note: array-scalars (the current NumPy scalars) are not useful in my > opinion [3]. A scalar should not be indexed or have a shape. I do not > believe in scalars pretending to be arrays. > > I personally tend towards liking scalars. If Python was a language > where the array (array-programming) concept was ingrained into the > language itself, I would lean the other way. But users are used to > scalars, and they "put" scalars into arrays. Array objects are in some > ways strange in Python, and I feel not having scalars detaches them > further. > > Having scalars, however also means we should preserve them. I feel in > principle that is actually fairly straight forward. E.g. for ufuncs: > > * np.add(scalar, scalar) -> scalar > * np.add.reduce(arr, axis=None) -> scalar > * np.add.reduce(arr, axis=1) -> array (even if arr is 1d) > * np.add.reduce(scalar, axis=()) -> array > > Of course libraries that do `np.asarray` would/could basically chose to > not preserve scalars: Their signature is defined as taking strictly > array input. > > Cheers, > > Sebastian > > > [0] At best this can be a vision to decide which way they may evolve. > > [1] E.g. 
PyTorch uses `hash(tensor) == id(tensor)` which is arguably > strange. E.g. Quantity defines hash correctly, but does not fully > ensure immutability for 0-D Quantities. Ensuring immutability in a > world where "views" are a central concept requires a write-only copy. > > [2] Arguably `.item()` would always return a scalar, but it would be a > second class citizen. (Although if it returns a scalar, at least we > already have a scalar implementation.) > > [3] They are necessary due to technical debt for NumPy datatypes > though. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From stefanv at berkeley.edu Mon Feb 24 18:03:34 2020 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Mon, 24 Feb 2020 15:03:34 -0800 Subject: [Numpy-discussion] Suggestion: prevent silent downcast in np.full_like In-Reply-To: References: Message-ID: <208093a9-dff0-4fb5-866d-f717fbd8acf0@www.fastmail.com> Hi Alexis, On Mon, Feb 24, 2020, at 12:00, A T wrote: > Here is a summarized code example of how unsafe downcasting in np.full_like() resulted in issues in our scientific toolbox: > > t0 = 20.5 > # We're trying to make a constant-valued "ufunc" > temperature = lambda x: np.full_like(x, t0) > print(temperature([0.1, 0.7, 2.3])) > # [20.5 20.5 20.5] > print(temperature(0)) > # 20 I agree that this behavior is counter-intuitive. When t0 is not of the same type as x, the user intent is likely to store t0 as-is. If a check is introduced, there is the question of how strict that check should be. Checking that dtypes are identical may be too strict (there are objects that cast to one another without loss of information). Perhaps, for now, a warning is the best way to flag that something is going on. Users who do not wish to see such a warning can first cast t0 to the correct dtype: np.full_like(x, x.dtype.type(t0)) or np.full_like(x, int(t0)) I imagine we'd want to show the warning even if dtype is specified. Best regards, Stéfan -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.miccoli at polimi.it Tue Feb 25 04:00:59 2020 From: stefano.miccoli at polimi.it (Stefano Miccoli) Date: Tue, 25 Feb 2020 09:00:59 +0000 Subject: [Numpy-discussion] New DTypes: Are scalars a central concept in NumPy or not? In-Reply-To: References: Message-ID: The fact that `isinstance(np.float64(1), float)` holds raises the problem that the current implementation of np.float64 scalars breaks the Liskov substitution principle: `sequence_or_array[round(x)]` works if `x` is a float, but breaks down if `x` is an np.float64. See https://github.com/numpy/numpy/issues/11810, where the issue is discussed in the broader setting of the semantics of `np.round` vs. python3 `round`. I do not have a strong opinion here, except that if np.float64's are within the python number hierarchy they should be PEP 3141 compliant (which currently they are not.) Stefano > On 25 Feb 2020, at 00:03, numpy-discussion-request at python.org wrote: > > Also, something to think about is that currently numpy scalars satisfy > the property `isinstance(np.float64(1), float)`, i.e. they are within the > python numerical type hierarchy. 0d arrays do not have this property. My > proposal above would break this. I'm not sure what to think about > whether this is a good property to maintain or not.
> > Cheers, > Allan From sebastian at sipsolutions.net Tue Feb 25 16:05:06 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 25 Feb 2020 13:05:06 -0800 Subject: [Numpy-discussion] NumPy Development Meeting - Triage Focus Message-ID: Hi all, Our bi-weekly triage-focused NumPy development meeting is tomorrow (Wednesday, February 26) at 11 am Pacific Time. Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg I encourage everyone to notify us of issues or PRs that you feel should be prioritized or simply discussed briefly. Just comment on it so we can label it, or add your PR/issue to this week's topics for discussion. Best regards, Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From einstein.edison at gmail.com Wed Feb 26 15:17:39 2020 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Wed, 26 Feb 2020 20:17:39 +0000 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in Message-ID: Hello, Currently, the built-in Python round (which is different from np.round), when called on a np.float64, returns a np.float64, due to its __round__ method. A congruous statement is true for np.float32. However, since Python 3, the default behavior of round is to return a Python int when it operates on a Python float. This is a mismatch according to the Liskov Substitution Principle, as both these types subclass Python's float. This has been brought up in gh-15297. Here is the problem summed up in code: >>> type(round(np.float64(5))) <class 'numpy.float64'> >>> type(round(np.float32(5))) <class 'numpy.float32'> >>> type(round(float(5))) <class 'int'> This problem manifests itself most prominently when trying to index into collections: >>> np.arange(6)[round(float(5))] 5 >>> np.arange(6)[round(np.float64(5))] Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices There still remains the question, do we return Python ints or np.int64s? * Python ints have the advantage of not overflowing. * If we decide to add __round__ to arrays in the future, Python ints may become inconsistent with our design, as such a method will return an int64 array. This issue was discussed in the weekly triage meeting today, and the following plan of action was proposed: * change scalar floats to return integers for __round__ (which integer type was not discussed, I propose np.int64) * not change anything else: not 0d arrays and not other numpy functionality Does anyone have any thoughts on the proposal? Best regards, Hameer Abbasi -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Feb 26 16:36:31 2020 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 Feb 2020 16:36:31 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi wrote: > > There still remains the question, do we return Python ints or np.int64s? > > - Python ints have the advantage of not overflowing.
> > > > This was issue was discussed in the weekly triage meeting today, and the > following plan of action was proposed: > > - change scalar floats to return integers for __round__ (which integer > type was not discussed, I propose np.int64) > - not change anything else: not 0d arrays and not other numpy > functionality > > The only reason that float.__round__() was allowed to change to returning ints was because ints became unbounded. If we also change to returning an integer type, it should be a Python int. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 26 17:26:10 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 26 Feb 2020 17:26:10 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: great another object array np.asarray([round(x_i.item()) for x_i in np.array([1, 2.5, 2e20, 2e200])]) array([1, 2, 200000000000000000000, 199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896], dtype=object) I would rather have numpy consistent with numpy than with python On Wed, Feb 26, 2020 at 4:38 PM Robert Kern wrote: > On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi > wrote: > >> >> There still remains the question, do we return Python ints or np.int64s? >> >> - Python ints have the advantage of not overflowing. >> - If we decide to add __round__ to arrays in the future, Python ints >> may become inconsistent with our design, as such a method will return an >> int64 array. >> >> >> >> This was issue was discussed in the weekly triage meeting today, and the >> following plan of action was proposed: >> >> - change scalar floats to return integers for __round__ (which >> integer type was not discussed, I propose np.int64) >> - not change anything else: not 0d arrays and not other numpy >> functionality >> >> The only reason that float.__round__() was allowed to change to returning > ints was because ints became unbounded. If we also change to returning an > integer type, it should be a Python int. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Wed Feb 26 17:28:39 2020 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Wed, 26 Feb 2020 22:28:39 +0000 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: Does this mean that np.round(np.float32(5)) return a 64 bit upcasted int? That would be really awkward for many reasons pandas frame size being bloated just by rounding for an example. Or numpy array size growing for no apparent reason I am not really sure if I understand why LSP should hold in this case to be honest. Rounding is an operation specific for the number instance and not for the generic class. On Wed, Feb 26, 2020, 21:38 Robert Kern wrote: > On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi > wrote: > >> >> There still remains the question, do we return Python ints or np.int64s? >> >> - Python ints have the advantage of not overflowing. 
>> - If we decide to add __round__ to arrays in the future, Python ints >> may become inconsistent with our design, as such a method will return an >> int64 array. >> >> >> >> This was issue was discussed in the weekly triage meeting today, and the >> following plan of action was proposed: >> >> - change scalar floats to return integers for __round__ (which >> integer type was not discussed, I propose np.int64) >> - not change anything else: not 0d arrays and not other numpy >> functionality >> >> The only reason that float.__round__() was allowed to change to returning > ints was because ints became unbounded. If we also change to returning an > integer type, it should be a Python int. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL:
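(To summarise the behaviour under discussion in a few lines: only the second line below would change under the proposal, to return an integer of a still-to-be-decided type; np.round/np.around are not affected.)

    import numpy as np

    round(2.5)                             # 2, a Python int (built-in round on a Python float)
    round(np.float64(2.5))                 # currently np.float64(2.0); the proposal makes this an integer
    np.round(np.float64(2.5))              # 2.0, stays a float
    np.around(np.array([0.5, 1.5, 2.5]))   # array([0., 2., 2.])

    # The motivating problem: a rounded NumPy scalar cannot be used as an index today.
    seq = [10, 20, 30, 40]
    seq[round(2.5)]                        # 30
    # seq[round(np.float64(2.5))]          # TypeError: list indices must be integers or slices, not numpy.float64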
Rounding is an operation specific for the number instance and > not for the generic class. > > > > > On Wed, Feb 26, 2020, 21:38 Robert Kern wrote: > >> On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi >> wrote: >> >>> >>> There still remains the question, do we return Python ints or np.int64s? >>> >>> - Python ints have the advantage of not overflowing. >>> - If we decide to add __round__ to arrays in the future, Python ints >>> may become inconsistent with our design, as such a method will return an >>> int64 array. >>> >>> >>> >>> This was issue was discussed in the weekly triage meeting today, and the >>> following plan of action was proposed: >>> >>> - change scalar floats to return integers for __round__ (which >>> integer type was not discussed, I propose np.int64) >>> - not change anything else: not 0d arrays and not other numpy >>> functionality >>> >>> I think making numerical behavior different between arrays and numpy scalars with the same dtype, will create many happy debugging hours. (although I don't remember having been careful about the distinction between python scalars and numpy scalars in some time. I had some fun with integers in the scipy.stats discrete distributions, until they became floats) Josef > The only reason that float.__round__() was allowed to change to returning >> ints was because ints became unbounded. If we also change to returning an >> integer type, it should be a Python int. >> >> -- >> Robert Kern >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Feb 26 18:02:53 2020 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 Feb 2020 18:02:53 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: On Wed, Feb 26, 2020 at 5:41 PM wrote: > > > On Wed, Feb 26, 2020 at 5:30 PM Ilhan Polat wrote: > >> Does this mean that np.round(np.float32(5)) return a 64 bit upcasted int? >> >> That would be really awkward for many reasons pandas frame size being >> bloated just by rounding for an example. Or numpy array size growing for no >> apparent reason >> >> I am not really sure if I understand why LSP should hold in this case to >> be honest. Rounding is an operation specific for the number instance and >> not for the generic class. >> >> >> >> >> On Wed, Feb 26, 2020, 21:38 Robert Kern wrote: >> >>> On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi >>> wrote: >>> >>>> >>>> There still remains the question, do we return Python ints or np.int64 >>>> s? >>>> >>>> - Python ints have the advantage of not overflowing. >>>> - If we decide to add __round__ to arrays in the future, Python ints >>>> may become inconsistent with our design, as such a method will return an >>>> int64 array. 
>>>> >>>> >>>> >>>> This was issue was discussed in the weekly triage meeting today, and >>>> the following plan of action was proposed: >>>> >>>> - change scalar floats to return integers for __round__ (which >>>> integer type was not discussed, I propose np.int64) >>>> - not change anything else: not 0d arrays and not other numpy >>>> functionality >>>> >>>> > I think making numerical behavior different between arrays and numpy > scalars with the same dtype, will create many happy debugging hours. > round(some_ndarray) isn't implemented, so there is no difference to worry about. If you want the float->float rounding, use np.around(). That function should continue to behave like it currently does for both arrays and scalars. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Feb 26 18:05:30 2020 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 Feb 2020 18:05:30 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: On Wed, Feb 26, 2020 at 5:30 PM Ilhan Polat wrote: > Does this mean that np.round(np.float32(5)) return a 64 bit upcasted int? > No. np.round() is an alias (which would be good to deprecate) for np.around(). No one has proposed changing np.around(). > That would be really awkward for many reasons pandas frame size being > bloated just by rounding for an example. Or numpy array size growing for no > apparent reason > > I am not really sure if I understand why LSP should hold in this case to > be honest. Rounding is an operation specific for the number instance and > not for the generic class. > The type of the return value is part of the type's interface, not the specific instance. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Feb 26 18:07:57 2020 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 Feb 2020 18:07:57 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: On Wed, Feb 26, 2020 at 5:27 PM wrote: > great another object array > > np.asarray([round(x_i.item()) for x_i in np.array([1, 2.5, 2e20, 2e200])]) > array([1, 2, 200000000000000000000, > > 199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896], > dtype=object) > > > I would rather have numpy consistent with numpy than with python > Since round() (and the __round__() interface) is part of Python and not numpy, there is nothing in numpy to be consistent with. We only implement __round__() for the scalar types. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Wed Feb 26 18:57:50 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 26 Feb 2020 18:57:50 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: On Wed, Feb 26, 2020 at 6:09 PM Robert Kern wrote: > On Wed, Feb 26, 2020 at 5:27 PM wrote: > >> great another object array >> >> np.asarray([round(x_i.item()) for x_i in np.array([1, 2.5, 2e20, 2e200])]) >> array([1, 2, 200000000000000000000, >> >> 199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896], >> dtype=object) >> >> >> I would rather have numpy consistent with numpy than with python >> > > Since round() (and the __round__() interface) is part of Python and not > numpy, there is nothing in numpy to be consistent with. We only implement > __round__() for the scalar types. > Maybe I misunderstand I'm using np.round a lot. So maybe it's a question whether and how it will affect np.round. Does the following change with the proposal? np.round(np.array([1, 2.5, 2e20, 2e200])) array([1.e+000, 2.e+000, 2.e+020, 2.e+200]) np.round(np.array([1, 2.5, 2e20, 2e200])).astype(int) array([ 1, 2, -2147483648, -2147483648]) np.round(np.array([2e200])[0]) 2e+200 np.round(2e200) 2e+200 round(2e200) 199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896 Josef "around 100" sounds like "something all_close(100)" > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 26 19:03:39 2020 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 26 Feb 2020 19:03:39 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: On Wed, Feb 26, 2020 at 6:57 PM wrote: > > > On Wed, Feb 26, 2020 at 6:09 PM Robert Kern wrote: > >> On Wed, Feb 26, 2020 at 5:27 PM wrote: >> >>> great another object array >>> >>> np.asarray([round(x_i.item()) for x_i in np.array([1, 2.5, 2e20, >>> 2e200])]) >>> array([1, 2, 200000000000000000000, >>> >>> 199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896], >>> dtype=object) >>> >>> >>> I would rather have numpy consistent with numpy than with python >>> >> >> Since round() (and the __round__() interface) is part of Python and not >> numpy, there is nothing in numpy to be consistent with. We only implement >> __round__() for the scalar types. >> > > > Maybe I misunderstand > > I'm using np.round a lot. So maybe it's a question whether and how it will > affect np.round. > > Does the following change with the proposal? 
> > np.round(np.array([1, 2.5, 2e20, 2e200])) > array([1.e+000, 2.e+000, 2.e+020, 2.e+200]) > > np.round(np.array([1, 2.5, 2e20, 2e200])).astype(int) > array([ 1, 2, -2147483648, -2147483648]) > > np.round(np.array([2e200])[0]) > 2e+200 > > np.round(2e200) > 2e+200 > > round(2e200) > > 199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896 > > Josef > "around 100" sounds like "something all_close(100)" > I guess I'm slow It only affects this case, as long as we don't have __round__ in arrays round(np.float64(2e200)) 2e+200 round(np.array([1, 2.5, 2e20, 2e200])) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) in ----> 1 round(np.array([1, 2.5, 2e20, 2e200])) TypeError: type numpy.ndarray doesn't define __round__ method Josef > > >> >> -- >> Robert Kern >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Feb 26 19:06:36 2020 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 26 Feb 2020 19:06:36 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: On Wed, Feb 26, 2020 at 6:59 PM wrote: > > > On Wed, Feb 26, 2020 at 6:09 PM Robert Kern wrote: > >> On Wed, Feb 26, 2020 at 5:27 PM wrote: >> >>> great another object array >>> >>> np.asarray([round(x_i.item()) for x_i in np.array([1, 2.5, 2e20, >>> 2e200])]) >>> array([1, 2, 200000000000000000000, >>> >>> 199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896], >>> dtype=object) >>> >>> >>> I would rather have numpy consistent with numpy than with python >>> >> >> Since round() (and the __round__() interface) is part of Python and not >> numpy, there is nothing in numpy to be consistent with. We only implement >> __round__() for the scalar types. >> > > > Maybe I misunderstand > > I'm using np.round a lot. So maybe it's a question whether and how it will > affect np.round. > Nope, not changing. > Does the following change with the proposal? > > np.round(np.array([1, 2.5, 2e20, 2e200])) > array([1.e+000, 2.e+000, 2.e+020, 2.e+200]) > > np.round(np.array([1, 2.5, 2e20, 2e200])).astype(int) > array([ 1, 2, -2147483648, -2147483648]) > > np.round(np.array([2e200])[0]) > 2e+200 > > np.round(2e200) > 2e+200 > No change. > round(2e200) > > 199999999999999993946624442502072331894900655091004725296483501900693696871108151068392676809412503736055024831947764816364271468736556969278770082094479755742047182133579963622363626612334257709776896 > Obviously, not under out control, but no, that's not changing. This is the only result that will change: round(np.float64(2e200)) 2e+200 > Josef > "around 100" sounds like "something all_close(100)" > I know. It's meant to be read as "array-round". We prefer the `around()` spelling to avoid shadowing the built-in. Early mistake that we're still living with. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ilhanpolat at gmail.com Thu Feb 27 00:03:39 2020 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Thu, 27 Feb 2020 05:03:39 +0000 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: It's not about what I want but this changes the output of round. In my example I didn't use any arrays but a scalar type which looks like will upcasted. On Wed, Feb 26, 2020, 23:04 Robert Kern wrote: > On Wed, Feb 26, 2020 at 5:41 PM wrote: > >> >> >> On Wed, Feb 26, 2020 at 5:30 PM Ilhan Polat wrote: >> >>> Does this mean that np.round(np.float32(5)) return a 64 bit upcasted int? >>> >>> That would be really awkward for many reasons pandas frame size being >>> bloated just by rounding for an example. Or numpy array size growing for no >>> apparent reason >>> >>> I am not really sure if I understand why LSP should hold in this case to >>> be honest. Rounding is an operation specific for the number instance and >>> not for the generic class. >>> >>> >>> >>> >>> On Wed, Feb 26, 2020, 21:38 Robert Kern wrote: >>> >>>> On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi < >>>> einstein.edison at gmail.com> wrote: >>>> >>>>> >>>>> There still remains the question, do we return Python ints or np.int64 >>>>> s? >>>>> >>>>> - Python ints have the advantage of not overflowing. >>>>> - If we decide to add __round__ to arrays in the future, Python ints >>>>> may become inconsistent with our design, as such a method will return an >>>>> int64 array. >>>>> >>>>> >>>>> >>>>> This was issue was discussed in the weekly triage meeting today, and >>>>> the following plan of action was proposed: >>>>> >>>>> - change scalar floats to return integers for __round__ (which >>>>> integer type was not discussed, I propose np.int64) >>>>> - not change anything else: not 0d arrays and not other numpy >>>>> functionality >>>>> >>>>> >> I think making numerical behavior different between arrays and numpy >> scalars with the same dtype, will create many happy debugging hours. >> > > round(some_ndarray) isn't implemented, so there is no difference to worry > about. > > If you want the float->float rounding, use np.around(). That function > should continue to behave like it currently does for both arrays and > scalars. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Feb 27 00:10:40 2020 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 27 Feb 2020 00:10:40 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: Your example used np.round(), not the builtin round(). np.round() is not changing. If you want the dtype of the output to be the dtype of the input, you can certainly keep using np.round() (or its canonical spelling, np.around()). On Thu, Feb 27, 2020, 12:05 AM Ilhan Polat wrote: > It's not about what I want but this changes the output of round. In my > example I didn't use any arrays but a scalar type which looks like will > upcasted. > > On Wed, Feb 26, 2020, 23:04 Robert Kern wrote: > >> On Wed, Feb 26, 2020 at 5:41 PM wrote: >> >>> >>> >>> On Wed, Feb 26, 2020 at 5:30 PM Ilhan Polat >>> wrote: >>> >>>> Does this mean that np.round(np.float32(5)) return a 64 bit upcasted >>>> int? 
>>>> >>>> That would be really awkward for many reasons pandas frame size being >>>> bloated just by rounding for an example. Or numpy array size growing for no >>>> apparent reason >>>> >>>> I am not really sure if I understand why LSP should hold in this case >>>> to be honest. Rounding is an operation specific for the number instance and >>>> not for the generic class. >>>> >>>> >>>> >>>> >>>> On Wed, Feb 26, 2020, 21:38 Robert Kern wrote: >>>> >>>>> On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi < >>>>> einstein.edison at gmail.com> wrote: >>>>> >>>>>> >>>>>> There still remains the question, do we return Python ints or >>>>>> np.int64s? >>>>>> >>>>>> - Python ints have the advantage of not overflowing. >>>>>> - If we decide to add __round__ to arrays in the future, Python >>>>>> ints may become inconsistent with our design, as such a method >>>>>> will return an int64 array. >>>>>> >>>>>> >>>>>> >>>>>> This was issue was discussed in the weekly triage meeting today, and >>>>>> the following plan of action was proposed: >>>>>> >>>>>> - change scalar floats to return integers for __round__ (which >>>>>> integer type was not discussed, I propose np.int64) >>>>>> - not change anything else: not 0d arrays and not other numpy >>>>>> functionality >>>>>> >>>>>> >>> I think making numerical behavior different between arrays and numpy >>> scalars with the same dtype, will create many happy debugging hours. >>> >> >> round(some_ndarray) isn't implemented, so there is no difference to worry >> about. >> >> If you want the float->float rounding, use np.around(). That function >> should continue to behave like it currently does for both arrays and >> scalars. >> >> -- >> Robert Kern >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From albuscode at gmail.com Thu Feb 27 02:30:33 2020 From: albuscode at gmail.com (Inessa Pawson) Date: Thu, 27 Feb 2020 17:30:33 +1000 Subject: [Numpy-discussion] help translating into Hindi Message-ID: Our collaboration with the students and faculty from the Master?s program in Survey Methodology at the University of Michigan and the University of Maryland at College Park is underway. We are looking for a volunteer to translate the survey questionnaire into Hindi. If you are available, or you know someone who would be interested to help, please leave a comment here: https://github.com/numpy/numpy-surveys/issues/1. -- Every good wish, *Inessa Pawson * Executive Director Albus Code -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Thu Feb 27 02:41:29 2020 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Thu, 27 Feb 2020 07:41:29 +0000 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: Oh sorry. That's trigger finger np-dotting. What i mean is if someone was using the round method on float32 or other small bit datatypes they would have a silent upcasting. Maybe not a big problem but can have significant impact. On Thu, Feb 27, 2020, 05:12 Robert Kern wrote: > Your example used np.round(), not the builtin round(). np.round() is not > changing. 
If you want the dtype of the output to be the dtype of the input, > you can certainly keep using np.round() (or its canonical spelling, > np.around()). > > On Thu, Feb 27, 2020, 12:05 AM Ilhan Polat wrote: > >> It's not about what I want but this changes the output of round. In my >> example I didn't use any arrays but a scalar type which looks like will >> upcasted. >> >> On Wed, Feb 26, 2020, 23:04 Robert Kern wrote: >> >>> On Wed, Feb 26, 2020 at 5:41 PM wrote: >>> >>>> >>>> >>>> On Wed, Feb 26, 2020 at 5:30 PM Ilhan Polat >>>> wrote: >>>> >>>>> Does this mean that np.round(np.float32(5)) return a 64 bit upcasted >>>>> int? >>>>> >>>>> That would be really awkward for many reasons pandas frame size being >>>>> bloated just by rounding for an example. Or numpy array size growing for no >>>>> apparent reason >>>>> >>>>> I am not really sure if I understand why LSP should hold in this case >>>>> to be honest. Rounding is an operation specific for the number instance and >>>>> not for the generic class. >>>>> >>>>> >>>>> >>>>> >>>>> On Wed, Feb 26, 2020, 21:38 Robert Kern wrote: >>>>> >>>>>> On Wed, Feb 26, 2020 at 3:19 PM Hameer Abbasi < >>>>>> einstein.edison at gmail.com> wrote: >>>>>> >>>>>>> >>>>>>> There still remains the question, do we return Python ints or >>>>>>> np.int64s? >>>>>>> >>>>>>> - Python ints have the advantage of not overflowing. >>>>>>> - If we decide to add __round__ to arrays in the future, Python >>>>>>> ints may become inconsistent with our design, as such a method >>>>>>> will return an int64 array. >>>>>>> >>>>>>> >>>>>>> >>>>>>> This was issue was discussed in the weekly triage meeting today, and >>>>>>> the following plan of action was proposed: >>>>>>> >>>>>>> - change scalar floats to return integers for __round__ (which >>>>>>> integer type was not discussed, I propose np.int64) >>>>>>> - not change anything else: not 0d arrays and not other numpy >>>>>>> functionality >>>>>>> >>>>>>> >>>> I think making numerical behavior different between arrays and numpy >>>> scalars with the same dtype, will create many happy debugging hours. >>>> >>> >>> round(some_ndarray) isn't implemented, so there is no difference to >>> worry about. >>> >>> If you want the float->float rounding, use np.around(). That function >>> should continue to behave like it currently does for both arrays and >>> scalars. >>> >>> -- >>> Robert Kern >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikofski at berkeley.edu Thu Feb 27 03:21:03 2020 From: mikofski at berkeley.edu (Dr. Mark Alexander Mikofski PhD) Date: Thu, 27 Feb 2020 00:21:03 -0800 Subject: [Numpy-discussion] [ANN] pvlib participating in GSoC Message-ID: Excited to announce that pvlib python is participating in its first ever Google Summer of Code GSoC under the NumFOCUS umbrella. 
If you are a student interested in modeling renewable solar energy please apply: https://summerofcode.withgoogle.com/organizations/4727917315096576/ For project ideas, take a look at the pvlib github wiki: https://summerofcode.withgoogle.com/organizations/4727917315096576/ and reach out on our Google Groups forum: https://groups.google.com/forum/#!forum/pvlib-python We look forward to hearing from you! -- Mark Mikofski, PhD (2005) *Fiat Lux* -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.miccoli at polimi.it Thu Feb 27 04:07:19 2020 From: stefano.miccoli at polimi.it (Stefano Miccoli) Date: Thu, 27 Feb 2020 09:07:19 +0000 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: <4571E446-E241-401B-A589-0F15F2767F75@polimi.it> There several mixed issues here. 1. PEP 3141 compliance. Numpy scalars are `numbers.Real` instances, and have to respect the `__round__` semantics defined by PEP 3141: def __round__(self, ndigits:Integral=None): """Rounds self to ndigits decimal places, defaulting to 0. If ndigits is omitted or None, returns an Integral, otherwise returns a Real, preferably of the same type as self. Types may choose which direction to round half. For example, float rounds half toward even. """ This means that if Real -> Real rounding is desired one should call `round(x, 0)` or `np.around(x)`. This semantics only dictates that the return type should be Integral, so for `round(x)` and `round(x, None)` np.float32 -> np.int32 np.float32 -> np.int64 np.float64 -> np.int64 np.floatXX -> int are all OK. I think also that it is perfectly OK to raise an overflow on `round(x)` 2. Liskov substitution principle `np.float64` floats are also `float` instances (but `np.float32` are not.) This means that strictly respecting LSP means that `np.float64` should round to python `int`, since `round(x)` never overflows for python `float`. Here we have several options. - round `np.float64` -> `int` and respect LSP. - relax LSP, and round `np.float64` -> `np.int64`. Who cares about `round(1e300)`? - decide that there is no reason for having `np.float64` a subclass of `float`, so that LSP does not apply. This all said, I think that these are the two most sensible choices for `round(x)`: np.float32 -> np.int32 np.float64 -> np.int64 drop np.float64 subclassing python float or np.float32 -> int np.float64 -> int keep np.float64 subclassing python float The second one seems to me the less disruptive one. Bye Stefano From einstein.edison at gmail.com Thu Feb 27 04:48:12 2020 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 27 Feb 2020 09:48:12 +0000 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: Hello, Ilhan, From: NumPy-Discussion on behalf of Ilhan Polat Reply to: Discussion of Numerical Python Date: Thursday, 27. February 2020 at 08:41 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Output type of round is inconsistent with python built-in Oh sorry. That's trigger finger np-dotting. What i mean is if someone was using the round method on float32 or other small bit datatypes they would have a silent upcasting. No they won?t. The only affected types would be scalars, and that too only with the built-in Python round. Arrays don?t define the __round__ method, and so won?t be affected. np.ndarray.round won?t be affected either. 
Only np_scalar_types.__round__ will be affected, which is what the Python round checks for. For illustration, in code:

>>> type(round(np_float))
>>> type(round(np_array_0d))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type numpy.ndarray doesn't define __round__ method
>>> type(round(np_array_nd))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type numpy.ndarray doesn't define __round__ method

The second and third cases would remain unaffected. Only the first case would return a builtin Python int with what Robert Kern is suggesting, and a np.int64 with what I'm suggesting. I do agree with something posted elsewhere on this thread that we should warn on overflow, but I prefer to be self-consistent and return a np.int64; it doesn't matter too much to me. Furthermore, the behavior of np.[a]round and np_arr.round(...) will not change. The only upcasting problem here is if someone does this in a loop, in which case they're probably using Python objects and don't care about memory anyway.

Maybe not a big problem but can have significant impact.

Best regards,
Hameer Abbasi
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From robert.kern at gmail.com Thu Feb 27 09:46:23 2020 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 27 Feb 2020 09:46:23 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: On Thu, Feb 27, 2020 at 4:49 AM Hameer Abbasi wrote:
> Hello, Ilhan,
>
> *From: *NumPy-Discussion on behalf of Ilhan Polat
> *Reply to: *Discussion of Numerical Python
> *Date: *Thursday, 27. February 2020 at 08:41
> *To: *Discussion of Numerical Python
> *Subject: *Re: [Numpy-discussion] Output type of round is inconsistent
> with python built-in
>
> Oh sorry. That's trigger finger np-dotting.
>
> What I mean is if someone was using the round method on float32 or other
> small bit datatypes they would have a silent upcasting.
>
> No they won't. The only affected types would be scalars, and that too only
> with the built-in Python round.
>
Just to be clear, his example _did_ use numpy scalars.

-- 
Robert Kern
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From robert.kern at gmail.com Thu Feb 27 09:47:58 2020 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 27 Feb 2020 09:47:58 -0500 Subject: [Numpy-discussion] Output type of round is inconsistent with python built-in In-Reply-To: References: Message-ID: On Thu, Feb 27, 2020 at 2:43 AM Ilhan Polat wrote:
> Oh sorry. That's trigger finger np-dotting.
>
> What I mean is if someone was using the round method on float32 or other
> small bit datatypes they would have a silent upcasting.
>
> Maybe not a big problem but can have significant impact.
>
np.round()/np.around() will still exist and behave as you would want it to in such cases (float32->float32, float64->float64).

-- 
Robert Kern
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From mikofski at berkeley.edu Thu Feb 27 16:27:16 2020 From: mikofski at berkeley.edu (Dr. Mark Alexander Mikofski PhD) Date: Thu, 27 Feb 2020 13:27:16 -0800 Subject: [Numpy-discussion] [ANN] pvlib participating in GSoC In-Reply-To: References: Message-ID: Sorry, the correct link to the wiki list of project ideas is here: https://github.com/pvlib/pvlib-python/wiki/GSoC-2020-Project On Thu, Feb 27, 2020 at 12:21 AM Dr.
Mark Alexander Mikofski PhD < mikofski at berkeley.edu> wrote: > Excited to announce that pvlib python > is participating in its > first ever Google Summer of Code GSoC under the NumFOCUS umbrella. If you > are a student interested in modeling renewable solar energy please apply: > > https://summerofcode.withgoogle.com/organizations/4727917315096576/ > > > For project ideas, take a look at the pvlib github wiki: > > https://summerofcode.withgoogle.com/organizations/4727917315096576/ > > > and reach out on our Google Groups forum: > > https://groups.google.com/forum/#!forum/pvlib-python > > > We look forward to hearing from you! > > -- > Mark Mikofski, PhD (2005) > *Fiat Lux* > -- Mark Mikofski, PhD (2005) *Fiat Lux* -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Feb 27 17:41:15 2020 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 27 Feb 2020 17:41:15 -0500 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> Message-ID: On Sun, 2020-02-23 at 22:44 -0800, Stephan Hoyer wrote: > > > On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers > wrote: > > > Also, I'm still not sure I agree with the tone of the discussion on > > this topic. It's very heavily inspired by what the JAX devs are > > telling you (the NEP still says PyTorch and scipy.sparse as well, > > but that's not true in both cases). If you ask Dask and CuPy for > > example, they're quite happy with __array_function__ and there > > haven't been many complaints about backwards compat breakage. > > > > I'm linking to comments you wrote in reference to PyTorch and > scipy.sparse in the current draft of the NEP, so I certainly want to > make sure that you agree my characterization :). > > Would it be fair to say: > - JAX is reluctant to implement __array_function__ because of > concerns about breaking existing code. JAX developers think that when > users use NumPy functions on JAX arrays, they are explicitly choosing > to convert from JAX to NumPy. This model is fundamentally > incompatible __array_function__, which we chose to override the > existing numpy namespace. > - PyTorch and scipy.sparse are not yet in position to implement > __array_function__ (due to a lack of a direct implementation of > NumPy's API), but these projects take backwards compatibility > seriously. > > Does "take backwards compatibility seriously" sound about right to > you? I'm very open to specific suggestions here. (TensorFlow could > probably also be safely added to this second list.) > Just to be clear, the way scikit-learn would probably be handling backward compatibility concerns is by adding it to their configuration context manager, see: https://github.com/scikit-learn/scikit-learn/pull/16574 So the backward compat is in a sense solved (but there are project specific context managers involved ? which is not perfect maybe, but OK). I am willing to consider pushing this off into its own namespace (and package, preferably in the NumPy org though) if necessary, the idea being that we keep it super minimal, and expand it as we go to keep up with scikit-learn needs. Possibly even with a function registration approach, so that you could have import time checks on function availability and signature mismatch easier. 
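To make the "function registration" idea slightly more concrete, a very rough sketch could look like the following (all names here are made up for illustration; nothing like this exists in NumPy today, and it is not what the NEP itself specifies):

    import inspect
    import numpy as np

    _REGISTRY = {}

    def register(numpy_func):
        # Register a backend's implementation of one NumPy function and
        # check, when the backend is imported, that its parameter names
        # match the NumPy function it claims to implement.
        def decorator(impl):
            expected = list(inspect.signature(numpy_func).parameters)
            got = list(inspect.signature(impl).parameters)
            if expected != got:
                raise TypeError(
                    "%s.%s does not match the signature of numpy.%s"
                    % (impl.__module__, impl.__name__, numpy_func.__name__))
            _REGISTRY.setdefault(numpy_func.__name__, {})[impl.__module__] = impl
            return impl
        return decorator

    # A backend would register its functions at import time:
    @register(np.sum)
    def sum(a, axis=None, dtype=None, out=None, keepdims=False, initial=0,
            where=True):
        ...  # the backend's own implementation goes here

A mismatched or missing function would then fail as soon as the backend is imported, rather than at call time deep inside user code.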
I still do not like the idea of context managers much, though; I think I prefer the returned (bound) namespace a lot. Also I think we should *not* do implicit dispatching. Consider this case:

    def numpy_only(x):
        x = np.asarray(x)
        return x + _helper(len(x))

    def generic(x):
        module = np.get_array_module(x)
        x = module.asarray(x)
        return x + _helper(len(x))

    def _helper(n, module=np):
        return module.random.uniform(size=n)

If you try to make the above work with context managers, you _still_ need to pass in the module to _helper [1], because otherwise you would have to change the `numpy_only` function to ensure an outside context does not change its behaviour.

- Sebastian

[1] If "module" had a `module.set_backend()` and was a global instead, `_helper` using the global module would do the wrong thing for `numpy_only`. This is of course also a bit of an issue with the sklearn context manager, but it seems to me _much_ less so, and probably not if most libraries slowly switch over and currently use `np.asarray`.

> Best,
> Stephan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From allanhaldane at gmail.com Fri Feb 28 11:28:28 2020 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 28 Feb 2020 11:28:28 -0500 Subject: [Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules In-Reply-To: References: <21692339-9f4b-029c-d422-ea549acbe6c3@gmail.com> <1cfce715d48b847e91739c2a56b9750f15b1958f.camel@sipsolutions.net> Message-ID: <39784d9f-17c0-d5b4-4575-3ad2826ea3ce@gmail.com> On 2/23/20 6:59 PM, Ralf Gommers wrote:
> One of the main rationales for the whole NEP, and the argument in
> multiple places
> (https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users)
> is that it's now opt-in while __array_function__ was opt-out. This isn't
> really true - the problem is simply *moved*, from the duck array
> libraries to the array-consuming libraries. The end user will still see
> the backwards incompatible change, with no way to turn it off. It will
> be easier with __array_module__ to warn users, but this should be
> expanded on in the NEP.

Might it be possible to flip this NEP back to opt-out while keeping the nice simplifications and configurable array-creation routines, relative to __array_function__?

That is, what if we define two modules, "numpy" and "numpy_strict". "numpy_strict" would raise an exception on duck-arrays defining __array_module__ (as numpy currently does). "numpy" would be a wrapper around "numpy_strict" that decorates all numpy methods with a call to "get_array_module(inputs).func(inputs)".

Then end-user code that did "import numpy as np" would accept ducktypes by default, while library developers who want to signal they don't support ducktypes can opt out by doing "import numpy_strict as np". Issues with `np.asarray` seem mitigated compared to __array_function__ since that method would now be ducktype-aware.
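Roughly, I imagine the wrapping working something like this (just a sketch; "numpy_strict" does not exist, and the helper below only approximates NEP 37's get_array_module for illustration):

    import functools
    import numpy as np  # standing in for the hypothetical "numpy_strict"

    def _get_array_module(*args, default=np):
        # Simplified stand-in for NEP 37's get_array_module(): defer to the
        # first positional argument whose type defines __array_module__,
        # otherwise fall back to plain (strict) NumPy.
        for arg in args:
            handler = getattr(type(arg), "__array_module__", None)
            if handler is not None:
                return handler(arg, tuple(type(a) for a in args))
        return default

    def _dispatching(strict_func):
        # Wrap one strict function so the permissive "numpy" namespace
        # re-dispatches it based on the argument types.
        @functools.wraps(strict_func)
        def wrapper(*args, **kwargs):
            module = _get_array_module(*args, default=np)
            return getattr(module, strict_func.__name__)(*args, **kwargs)
        return wrapper

    # The permissive module would simply re-export wrapped versions:
    concatenate = _dispatching(np.concatenate)
    mean = _dispatching(np.mean)

With something like that, mean(duck_array) goes through the duck type's own module, while mean(np.ones(3)) falls through to the strict implementation unchanged.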
Cheers,
-Allan

From albuscode at gmail.com Fri Feb 28 23:16:23 2020 From: albuscode at gmail.com (Inessa Pawson) Date: Sat, 29 Feb 2020 14:16:23 +1000 Subject: [Numpy-discussion] help translating into Russian Message-ID: Our collaboration with the students and faculty from the Master's program in Survey Methodology at the University of Michigan and the University of Maryland is underway. We are looking for a volunteer to translate the survey questionnaire into Russian. If you are available, or you know someone who would be interested to help, please leave a comment here: https://github.com/numpy/numpy-surveys/issues/1.
-- 
Every good wish,
*Inessa Pawson *
Executive Director
Albus Code
-------------- next part -------------- An HTML attachment was scrubbed...
URL: