From ralf.gommers at gmail.com  Sat Jun  1 04:17:30 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 1 Jun 2019 10:17:30 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
Message-ID: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>

Hi all,

I have an idea that I've discussed with a few people in person, and the
feedback has generally been positive. So I'd like to bring it up here, to
get a sense of if this is going to fly. Note that this is NOT a proposal at
this point.

Idea in five words: define a NumPy API standard

Observations
------------
- Many libraries, both in Python and other languages, have APIs copied from
or inspired by NumPy.
- All of those APIs are incomplete, and many deviate from NumPy either by
accident or on purpose.
- The NumPy API is very large and ill-defined.

Libraries with a NumPy-like API
-------------------------------
In Python:
- GPU: Tensorflow, PyTorch, CuPy, MXNet
- distributed: Dask
- sparse: pydata/sparse
- other: tensorly, uarray/unumpy, ...

In other languages:
- JavaScript: numjs
- Go: Gonum
- Rust: rust-ndarray, rust-numpy
- C++: xtensor
- C: XND
- Java: ND4J
- C#: NumSharp, numpy.net
- Ruby: Narray, xnd-ruby
- R: Rray

This is an incomplete list. Xtensor and XND aim for multi-language support.
These libraries are of varying completeness, size and quality - everything
from one-person efforts that have just started, to large code bases that go
beyond NumPy in features or performance.

Idea
----
Define a standard for "the NumPy API" (or "NumPy core API", or .... - it's
just a name for now), that
other libraries can use as a guide on what to implement and when to say
they are NumPy compatible.

In scope:
- Define a NumPy API standard, containing an N-dimensional array object and
a set of functions.
- List of functions and ndarray methods to include.
- Recommendations about where to deviate from NumPy (e.g. leave out array
scalars)

Out of scope, or to be treated separately:
- dtypes and casting
- (g)ufuncs
- function behavior (e.g. returning views vs. copies, which keyword
arguments to include)
- indexing behavior
- submodules (fft, random, linalg)

Who cares and why?
- Library authors: this saves them work and helps them make decisions.
- End users: consistency between libraries/languages, helps transfer
knowledge and understand code
- NumPy developers: gives them a vocabulary for "the NumPy API",
"compatible with NumPy", etc.

Risks:
- If not done well, we just add to the confusion rather than make things
better.
- Opportunity for endless amount of bikeshedding
- ?

Some more rationale:
We (NumPy devs) mostly have a shared understanding of what is "core NumPy
functionality", what we'd like to remove but are stuck with, what's not
used a whole lot, etc. Examples: financial functions don't belong, array
creation methods with weird names like np.r_ were a mistake. We are not
communicating this in any way though. Doing so would be helpful. Perhaps
this API standard could even have layers, to indicate what's really core,
what are secondary sets of functionality to include in other libraries, etc.

Discussion and next steps
-------------------------
What I'd like to get a sense of is:
- Is this a good idea to begin with?
- What should the scope be?
- What should the format be (a NEP, some other doc, defining in code)?

If this idea is well-received, I can try to draft a proposal during the
next month (help/volunteers welcome!). It can then be discussed at SciPy'19
- high-bandwidth communication may help to get a set of people on the same
page and hash out a lot of details.

Thoughts?

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/0641464f/attachment.html>

From njs at pobox.com  Sat Jun  1 05:32:10 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 1 Jun 2019 02:32:10 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
Message-ID: <CAPJVwBnC7Ob8ergJwNb-+V2+24KAnraDfc+S+5+b7rOCyXKKmA@mail.gmail.com>

It's possible I'm not getting what you're thinking, but from what you
describe in your email I think it's a bad idea.

Standards take a tremendous amount of work (no really, an absurdly
massively huge amount of work, more than you can imagine if you
haven't done it). And they don't do what people usually hope they do.
Many many standards are written all the time that have zero effect on
reality, and the effort is wasted. They're really only useful when you
have to solve a coordination problem: lots of people want to do the
same thing as each other, whatever that is, but no-one knows what the
thing should be. That's not a problem at all for us, because numpy
already exists.

If you want to improve compatibility between Python libraries, then I
don't think it will be relevant. Users aren't writing code against
"the numpy standard", they're not testing their libraries against "the
numpy standard", they're using/testing against numpy. If library
authors want to be compatible with numpy, they need to match what
numpy does, not what some document says. OTOH if they think they have
a better idea and its worth breaking compatibility, they're going to
do it regardless of what some document somewhere says.

If you want to share the lessons learned from numpy in the hopes of
improving future libraries that don't care about numpy compatibility
per se, in python or other languages, then that seems like a great
idea! But that's not a standard, that's a journal article called
something like "NumPy: A retrospective". Other languages aren't going
to match numpy one-to-one anyway, because they'll be adapting things
to their language's idioms; they certainly don't care about whether
you decided 'newaxis MUST be defined to None' or merely 'SHOULD' be
defined to None.

IMO if you try the most likely outcome will be that it will suck up a
lot of energy writing it, and then the only effect is that everyone
will keep doing what they would have done anyway but now with extra
self-righteousness and yelling depending on whether that turns out to
match the standard or not.

-n

On Sat, Jun 1, 2019 at 1:18 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
> Hi all,
>
> I have an idea that I've discussed with a few people in person, and the feedback has generally been positive. So I'd like to bring it up here, to get a sense of if this is going to fly. Note that this is NOT a proposal at this point.
>
> Idea in five words: define a NumPy API standard
>
> Observations
> ------------
> - Many libraries, both in Python and other languages, have APIs copied from or inspired by NumPy.
> - All of those APIs are incomplete, and many deviate from NumPy either by accident or on purpose.
> - The NumPy API is very large and ill-defined.
>
> Libraries with a NumPy-like API
> -------------------------------
> In Python:
> - GPU: Tensorflow, PyTorch, CuPy, MXNet
> - distributed: Dask
> - sparse: pydata/sparse
> - other: tensorly, uarray/unumpy, ...
>
> In other languages:
> - JavaScript: numjs
> - Go: Gonum
> - Rust: rust-ndarray, rust-numpy
> - C++: xtensor
> - C: XND
> - Java: ND4J
> - C#: NumSharp, numpy.net
> - Ruby: Narray, xnd-ruby
> - R: Rray
>
> This is an incomplete list. Xtensor and XND aim for multi-language support. These libraries are of varying completeness, size and quality - everything from one-person efforts that have just started, to large code bases that go beyond NumPy in features or performance.
>
> Idea
> ----
> Define a standard for "the NumPy API" (or "NumPy core API", or .... - it's just a name for now), that
> other libraries can use as a guide on what to implement and when to say they are NumPy compatible.
>
> In scope:
> - Define a NumPy API standard, containing an N-dimensional array object and a set of functions.
> - List of functions and ndarray methods to include.
> - Recommendations about where to deviate from NumPy (e.g. leave out array scalars)
>
> Out of scope, or to be treated separately:
> - dtypes and casting
> - (g)ufuncs
> - function behavior (e.g. returning views vs. copies, which keyword arguments to include)
> - indexing behavior
> - submodules (fft, random, linalg)
>
> Who cares and why?
> - Library authors: this saves them work and helps them make decisions.
> - End users: consistency between libraries/languages, helps transfer knowledge and understand code
> - NumPy developers: gives them a vocabulary for "the NumPy API", "compatible with NumPy", etc.
>
> Risks:
> - If not done well, we just add to the confusion rather than make things better.
> - Opportunity for endless amount of bikeshedding
> - ?
>
> Some more rationale:
> We (NumPy devs) mostly have a shared understanding of what is "core NumPy functionality", what we'd like to remove but are stuck with, what's not used a whole lot, etc. Examples: financial functions don't belong, array creation methods with weird names like np.r_ were a mistake. We are not communicating this in any way though. Doing so would be helpful. Perhaps this API standard could even have layers, to indicate what's really core, what are secondary sets of functionality to include in other libraries, etc.
>
> Discussion and next steps
> -------------------------
> What I'd like to get a sense of is:
> - Is this a good idea to begin with?
> - What should the scope be?
> - What should the format be (a NEP, some other doc, defining in code)?
>
> If this idea is well-received, I can try to draft a proposal during the next month (help/volunteers welcome!). It can then be discussed at SciPy'19 - high-bandwidth communication may help to get a set of people on the same page and hash out a lot of details.
>
> Thoughts?
>
> Ralf
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


-- 
Nathaniel J. Smith -- https://vorpus.org

From einstein.edison at gmail.com  Sat Jun  1 08:23:06 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Sat, 1 Jun 2019 14:23:06 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <7c8e7851-1b17-419c-b453-d2ac08f3e569@Canary>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <CAPJVwBnC7Ob8ergJwNb-+V2+24KAnraDfc+S+5+b7rOCyXKKmA@mail.gmail.com>
 <7c8e7851-1b17-419c-b453-d2ac08f3e569@Canary>
Message-ID: <5814a804-d96d-4c89-8154-84b2bb438b67@Canary>

I think this hits the crux of the issue... There is a huge coordination problem. Users want to move their code from NumPy to Sparse or Dask all the time, but it?s not trivial to do. And libraries like sparse and Dask want to follow a standard (or at least hoped there was one) before they existed.

Maybe I think the issue is bigger than it really is, but there?s definitely a coordination problem.

See the section in the original email on ?who cares and why?...

Best Regards,
Hameer Abbasi

> On Saturday, Jun 01, 2019 at 11:32 AM, Nathaniel Smith <njs at pobox.com (mailto:njs at pobox.com)> wrote:
> [snip]
>
> That's not a problem at all for us, because numpy
> already exists.
>
> [snip]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/23246df2/attachment.html>

From wrw at mac.com  Sat Jun  1 10:24:17 2019
From: wrw at mac.com (William Ray Wing)
Date: Sat, 1 Jun 2019 10:24:17 -0400
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
Message-ID: <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>


> On Jun 1, 2019, at 4:17 AM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
> 
> Hi all,
> 
> I have an idea that I've discussed with a few people in person, and the feedback has generally been positive. So I'd like to bring it up here, to get a sense of if this is going to fly. Note that this is NOT a proposal at this point.
> 
> Idea in five words: define a NumPy API standard
> 

As an amateur user of Numpy (hobby programming), and at the opposite end of the spectrum from the Numpy development team, I?d like to raise my hand and applaud this idea.  I think it would make my use of Numpy significantly easier if an API not only specified the basic API structure, but also regularized it to the extent possible.

Thanks,
Bill Wing

> Observations
> ------------
> - Many libraries, both in Python and other languages, have APIs copied from or inspired by NumPy.
> - All of those APIs are incomplete, and many deviate from NumPy either by accident or on purpose.
> - The NumPy API is very large and ill-defined.
> 
> Libraries with a NumPy-like API
> -------------------------------
> In Python:
> - GPU: Tensorflow, PyTorch, CuPy, MXNet
> - distributed: Dask
> - sparse: pydata/sparse
> - other: tensorly, uarray/unumpy, ...
> 
> In other languages:
> - JavaScript: numjs
> - Go: Gonum 
> - Rust: rust-ndarray, rust-numpy
> - C++: xtensor
> - C: XND
> - Java: ND4J
> - C#: NumSharp, numpy.net <http://numpy.net/>
> - Ruby: Narray, xnd-ruby
> - R: Rray
> 
> This is an incomplete list. Xtensor and XND aim for multi-language support. These libraries are of varying completeness, size and quality - everything from one-person efforts that have just started, to large code bases that go beyond NumPy in features or performance.
> 
> Idea
> ----
> Define a standard for "the NumPy API" (or "NumPy core API", or .... - it's just a name for now), that
> other libraries can use as a guide on what to implement and when to say they are NumPy compatible.
> 
> In scope:
> - Define a NumPy API standard, containing an N-dimensional array object and a set of functions.
> - List of functions and ndarray methods to include.
> - Recommendations about where to deviate from NumPy (e.g. leave out array scalars)
> 
> Out of scope, or to be treated separately:
> - dtypes and casting
> - (g)ufuncs
> - function behavior (e.g. returning views vs. copies, which keyword arguments to include)
> - indexing behavior
> - submodules (fft, random, linalg)
> 
> Who cares and why?
> - Library authors: this saves them work and helps them make decisions.
> - End users: consistency between libraries/languages, helps transfer knowledge and understand code
> - NumPy developers: gives them a vocabulary for "the NumPy API", "compatible with NumPy", etc.
> 
> Risks:
> - If not done well, we just add to the confusion rather than make things better.
> - Opportunity for endless amount of bikeshedding
> - ?
> 
> Some more rationale:
> We (NumPy devs) mostly have a shared understanding of what is "core NumPy functionality", what we'd like to remove but are stuck with, what's not used a whole lot, etc. Examples: financial functions don't belong, array creation methods with weird names like np.r_ were a mistake. We are not communicating this in any way though. Doing so would be helpful. Perhaps this API standard could even have layers, to indicate what's really core, what are secondary sets of functionality to include in other libraries, etc.
> 
> Discussion and next steps
> -------------------------
> What I'd like to get a sense of is:
> - Is this a good idea to begin with?
> - What should the scope be?
> - What should the format be (a NEP, some other doc, defining in code)?
> 
> If this idea is well-received, I can try to draft a proposal during the next month (help/volunteers welcome!). It can then be discussed at SciPy'19 - high-bandwidth communication may help to get a set of people on the same page and hash out a lot of details. 
> 
> Thoughts?
> 
> Ralf
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/e7255264/attachment.html>

From m.h.vankerkwijk at gmail.com  Sat Jun  1 12:11:38 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sat, 1 Jun 2019 12:11:38 -0400
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
Message-ID: <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>

Hi Ralf,

Despite sharing Nathaniel's doubts about the ease of defining the numpy API
and the likelihood of people actually sticking to a limited subset of what
numpy exposes, I quite like the actual things you propose to do!

But my liking it is for reasons that are different from your stated ones: I
think the proposed actions are likely to benefit greatly  both for users
(like Bill above) and current and prospective developers.  To me, it seems
almost as a side benefit (if a very nice one) that it might help other
projects to share an API; a larger benefit may come from tapping into the
experience of other projects in thinking about what are the true  basic
functions/method that one should have.

More concretely, to address Nathaniel's (very reasonable) worry about
ending up wasting a lot of time, I think it may be good to identify smaller
parts, each of which are useful on their own.

In this respect, I think an excellent place to start might be something you
are planning already anyway: update the user documentation. Doing this will
necessarily require thinking about, e.g., what `ndarray` methods and
properties are actually fundamental, as you only want to focus on a few.
With that in place, one could then, as you suggest, reorganize the
reference documentation to put those most important properties up front,
and ones that we really think are mistakes at the bottom, with explanations
of why we think so and what the alternative is. Also for the reference
documentation, it would help to group functions more logically.

The above could lead to three next steps, all of which I think would be
useful. First, for (prospective) developers as well as for future
maintenance, I think it would be quite a large benefit if we (slowly but
surely) rewrote code that implements the less basic functionality in terms
of more basic functions (e.g., replace use of `array.fill(...)` or
`np.copyto(array, ...)` with `array[...] =`).

Second, we could update Nathaniel's NEP about distinct properties duck
arrays might want to mimic/implement.

Third, we could actual implementing the logical groupings identified in the
code base (and describing them!). Currently, it is a mess: for the C files,
I typically have to grep to even find where things are done, and while for
the functions defined in python files that is not necessary, many have
historical rather than logical groupings (looking at you, `from_numeric`!),
and even more descriptive ones like `shape_base` are split over `lib` and
`core`. I think it would help everybody if we went to a python-like layout,
with a true core and libraries such as polynomial, fft, ma, etc.

Anyway, re-reading your message, I realize the above is not really what you
wrote about, so perhaps this is irrelevant...

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/663959b1/attachment.html>

From ralf.gommers at gmail.com  Sat Jun  1 12:12:26 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 1 Jun 2019 18:12:26 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAPJVwBnC7Ob8ergJwNb-+V2+24KAnraDfc+S+5+b7rOCyXKKmA@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <CAPJVwBnC7Ob8ergJwNb-+V2+24KAnraDfc+S+5+b7rOCyXKKmA@mail.gmail.com>
Message-ID: <CABL7CQgb0sbmpn9KKgy9DA+tBQR_P_hMRsZOTVGDmckP9oeeKw@mail.gmail.com>

On Sat, Jun 1, 2019 at 11:32 AM Nathaniel Smith <njs at pobox.com> wrote:

> It's possible I'm not getting what you're thinking, but from what you
> describe in your email I think it's a bad idea.
>

Hi Nathaniel, I think you are indeed not getting what I meant and are just
responding to the word "standard".

I'll give a concrete example. Here is the xtensor to numpy comparison:
https://xtensor.readthedocs.io/en/latest/numpy.html. The xtensor authors
clearly have made sane choices, but they did have to spend a lot of effort
making those choices - what to include and what not.

Now, the XND team is just starting to build out their Python API. Hameer is
building out unumpy. There's all the other arrays libraries I mentioned. We
can say "sort it out yourself, make your own choices", or we can provide
some guidance. So far the authors of those libaries I have asked say they
would appreciate the guidance.

Cheers,
Ralf


> Standards take a tremendous amount of work (no really, an absurdly
> massively huge amount of work, more than you can imagine if you
> haven't done it). And they don't do what people usually hope they do.
> Many many standards are written all the time that have zero effect on
> reality, and the effort is wasted. They're really only useful when you
> have to solve a coordination problem: lots of people want to do the
> same thing as each other, whatever that is, but no-one knows what the
> thing should be. That's not a problem at all for us, because numpy
> already exists.
>
> If you want to improve compatibility between Python libraries, then I
> don't think it will be relevant. Users aren't writing code against
> "the numpy standard", they're not testing their libraries against "the
> numpy standard", they're using/testing against numpy. If library
> authors want to be compatible with numpy, they need to match what
> numpy does, not what some document says. OTOH if they think they have
> a better idea and its worth breaking compatibility, they're going to
> do it regardless of what some document somewhere says.
>
> If you want to share the lessons learned from numpy in the hopes of
> improving future libraries that don't care about numpy compatibility
> per se, in python or other languages, then that seems like a great
> idea! But that's not a standard, that's a journal article called
> something like "NumPy: A retrospective". Other languages aren't going
> to match numpy one-to-one anyway, because they'll be adapting things
> to their language's idioms; they certainly don't care about whether
> you decided 'newaxis MUST be defined to None' or merely 'SHOULD' be
> defined to None.
>
> IMO if you try the most likely outcome will be that it will suck up a
> lot of energy writing it, and then the only effect is that everyone
> will keep doing what they would have done anyway but now with extra
> self-righteousness and yelling depending on whether that turns out to
> match the standard or not.
>
> -n
>
> On Sat, Jun 1, 2019 at 1:18 AM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> >
> > Hi all,
> >
> > I have an idea that I've discussed with a few people in person, and the
> feedback has generally been positive. So I'd like to bring it up here, to
> get a sense of if this is going to fly. Note that this is NOT a proposal at
> this point.
> >
> > Idea in five words: define a NumPy API standard
> >
> > Observations
> > ------------
> > - Many libraries, both in Python and other languages, have APIs copied
> from or inspired by NumPy.
> > - All of those APIs are incomplete, and many deviate from NumPy either
> by accident or on purpose.
> > - The NumPy API is very large and ill-defined.
> >
> > Libraries with a NumPy-like API
> > -------------------------------
> > In Python:
> > - GPU: Tensorflow, PyTorch, CuPy, MXNet
> > - distributed: Dask
> > - sparse: pydata/sparse
> > - other: tensorly, uarray/unumpy, ...
> >
> > In other languages:
> > - JavaScript: numjs
> > - Go: Gonum
> > - Rust: rust-ndarray, rust-numpy
> > - C++: xtensor
> > - C: XND
> > - Java: ND4J
> > - C#: NumSharp, numpy.net
> > - Ruby: Narray, xnd-ruby
> > - R: Rray
> >
> > This is an incomplete list. Xtensor and XND aim for multi-language
> support. These libraries are of varying completeness, size and quality -
> everything from one-person efforts that have just started, to large code
> bases that go beyond NumPy in features or performance.
> >
> > Idea
> > ----
> > Define a standard for "the NumPy API" (or "NumPy core API", or .... -
> it's just a name for now), that
> > other libraries can use as a guide on what to implement and when to say
> they are NumPy compatible.
> >
> > In scope:
> > - Define a NumPy API standard, containing an N-dimensional array object
> and a set of functions.
> > - List of functions and ndarray methods to include.
> > - Recommendations about where to deviate from NumPy (e.g. leave out
> array scalars)
> >
> > Out of scope, or to be treated separately:
> > - dtypes and casting
> > - (g)ufuncs
> > - function behavior (e.g. returning views vs. copies, which keyword
> arguments to include)
> > - indexing behavior
> > - submodules (fft, random, linalg)
> >
> > Who cares and why?
> > - Library authors: this saves them work and helps them make decisions.
> > - End users: consistency between libraries/languages, helps transfer
> knowledge and understand code
> > - NumPy developers: gives them a vocabulary for "the NumPy API",
> "compatible with NumPy", etc.
> >
> > Risks:
> > - If not done well, we just add to the confusion rather than make things
> better.
> > - Opportunity for endless amount of bikeshedding
> > - ?
> >
> > Some more rationale:
> > We (NumPy devs) mostly have a shared understanding of what is "core
> NumPy functionality", what we'd like to remove but are stuck with, what's
> not used a whole lot, etc. Examples: financial functions don't belong,
> array creation methods with weird names like np.r_ were a mistake. We are
> not communicating this in any way though. Doing so would be helpful.
> Perhaps this API standard could even have layers, to indicate what's really
> core, what are secondary sets of functionality to include in other
> libraries, etc.
> >
> > Discussion and next steps
> > -------------------------
> > What I'd like to get a sense of is:
> > - Is this a good idea to begin with?
> > - What should the scope be?
> > - What should the format be (a NEP, some other doc, defining in code)?
> >
> > If this idea is well-received, I can try to draft a proposal during the
> next month (help/volunteers welcome!). It can then be discussed at SciPy'19
> - high-bandwidth communication may help to get a set of people on the same
> page and hash out a lot of details.
> >
> > Thoughts?
> >
> > Ralf
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/1fe756d0/attachment-0001.html>

From ralf.gommers at gmail.com  Sat Jun  1 12:27:10 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 1 Jun 2019 18:27:10 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
Message-ID: <CABL7CQh5UrtN_fzhfFJuQec1dAiRZpaz6FbXcJiCA6qsV37Ypw@mail.gmail.com>

On Sat, Jun 1, 2019 at 6:12 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Ralf,
>
> Despite sharing Nathaniel's doubts about the ease of defining the numpy
> API and the likelihood of people actually sticking to a limited subset of
> what numpy exposes, I quite like the actual things you propose to do!
>
> But my liking it is for reasons that are different from your stated ones:
> I think the proposed actions are likely to benefit greatly  both for users
> (like Bill above) and current and prospective developers.  To me, it seems
> almost as a side benefit (if a very nice one) that it might help other
> projects to share an API; a larger benefit may come from tapping into the
> experience of other projects in thinking about what are the true  basic
> functions/method that one should have.
>

Agreed, there is some reverse learning there as well. Projects like Dask
and Xtensor already went through making these choices, which can teach us
as NumPy developers some lessons.


> More concretely, to address Nathaniel's (very reasonable) worry about
> ending up wasting a lot of time, I think it may be good to identify smaller
> parts, each of which are useful on their own.
>
> In this respect, I think an excellent place to start might be something
> you are planning already anyway: update the user documentation. Doing this
> will necessarily require thinking about, e.g., what `ndarray` methods and
> properties are actually fundamental, as you only want to focus on a few.
> With that in place, one could then, as you suggest, reorganize the
> reference documentation to put those most important properties up front,
> and ones that we really think are mistakes at the bottom, with explanations
> of why we think so and what the alternative is. Also for the reference
> documentation, it would help to group functions more logically.
>

That perhaps another rationale for doing this. The docs are likely to get a
fairly major overhaul this year. If we don't write down a coherent plan
then we're just going to make very similar decisions as when we'd write up
a "standard", just ad hoc and with much less review.


> The above could lead to three next steps, all of which I think would be
> useful. First, for (prospective) developers as well as for future
> maintenance, I think it would be quite a large benefit if we (slowly but
> surely) rewrote code that implements the less basic functionality in terms
> of more basic functions (e.g., replace use of `array.fill(...)` or
> `np.copyto(array, ...)` with `array[...] =`).
>

That could indeed be nice. I think Travis referred to this as defining an
"RNumPy" (similar to RPython as a subset of Python).


> Second, we could update Nathaniel's NEP about distinct properties duck
> arrays might want to mimic/implement.
>

I wasn't thinking about that indeed, but agreed that it could be helpful.


> Third, we could actual implementing the logical groupings identified in
> the code base (and describing them!). Currently, it is a mess: for the C
> files, I typically have to grep to even find where things are done, and
> while for the functions defined in python files that is not necessary, many
> have historical rather than logical groupings (looking at you,
> `from_numeric`!), and even more descriptive ones like `shape_base` are
> split over `lib` and `core`. I think it would help everybody if we went to
> a python-like layout, with a true core and libraries such as polynomial,
> fft, ma, etc.
>

I'd really like this. Also to have sane namespace in numpy, and a basis for
putting something in numpy.lib vs the main namespace vs some other
namespace (there are a couple of semi-public ones).


> Anyway, re-reading your message, I realize the above is not really what
> you wrote about, so perhaps this is irrelevant...
>

Not irrelevant, I think you're making some good points.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/50b3a732/attachment.html>

From charlesr.harris at gmail.com  Sat Jun  1 12:31:13 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 1 Jun 2019 10:31:13 -0600
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
Message-ID: <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>

On Sat, Jun 1, 2019 at 10:12 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Ralf,
>
> Despite sharing Nathaniel's doubts about the ease of defining the numpy
> API and the likelihood of people actually sticking to a limited subset of
> what numpy exposes, I quite like the actual things you propose to do!
>
> But my liking it is for reasons that are different from your stated ones:
> I think the proposed actions are likely to benefit greatly  both for users
> (like Bill above) and current and prospective developers.  To me, it seems
> almost as a side benefit (if a very nice one) that it might help other
> projects to share an API; a larger benefit may come from tapping into the
> experience of other projects in thinking about what are the true  basic
> functions/method that one should have.
>

I generally agree with this. The most useful aspect of this exercise is
likely to be clarifying NumPy for its own developers, and maybe offering a
guide to future simplification. Trying to put something together that
everyone agrees to as an official standard would be a big project and, as
Nathaniel points out, would involve an enormous amount of work, much time,
and doubtless many arguments.  What might be a less ambitious exercise
would be identifying commonalities in the current numpy-like languages.
That would have the advantage of feedback from actual user experience, and
would be more like a lessons learned document that would be helpful to
others.


> More concretely, to address Nathaniel's (very reasonable) worry about
> ending up wasting a lot of time, I think it may be good to identify smaller
> parts, each of which are useful on their own.
>
> In this respect, I think an excellent place to start might be something
> you are planning already anyway: update the user documentation. Doing this
> will necessarily require thinking about, e.g., what `ndarray` methods and
> properties are actually fundamental, as you only want to focus on a few.
> With that in place, one could then, as you suggest, reorganize the
> reference documentation to put those most important properties up front,
> and ones that we really think are mistakes at the bottom, with explanations
> of why we think so and what the alternative is. Also for the reference
> documentation, it would help to group functions more logically.
>

I keep thinking duck type. Or in this case, duck type lite.


> The above could lead to three next steps, all of which I think would be
> useful. First, for (prospective) developers as well as for future
> maintenance, I think it would be quite a large benefit if we (slowly but
> surely) rewrote code that implements the less basic functionality in terms
> of more basic functions (e.g., replace use of `array.fill(...)` or
> `np.copyto(array, ...)` with `array[...] =`).
>
>
I've had similar thoughts.


> Second, we could update Nathaniel's NEP about distinct properties duck
> arrays might want to mimic/implement.
>
>
Yes.


> Third, we could actual implementing the logical groupings identified in
> the code base (and describing them!). Currently, it is a mess: for the C
> files, I typically have to grep to even find where things are done, and
> while for the functions defined in python files that is not necessary, many
> have historical rather than logical groupings (looking at you,
> `from_numeric`!), and even more descriptive ones like `shape_base` are
> split over `lib` and `core`. I think it would help everybody if we went to
> a python-like layout, with a true core and libraries such as polynomial,
> fft, ma, etc.
>
> Anyway, re-reading your message, I realize the above is not really what
> you wrote about, so perhaps this is irrelevant...
>
>
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/c4eca86f/attachment-0001.html>

From matti.picus at gmail.com  Sat Jun  1 14:45:38 2019
From: matti.picus at gmail.com (Matti Picus)
Date: Sat, 1 Jun 2019 21:45:38 +0300
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
Message-ID: <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>

On 1/6/19 7:31 pm, Charles R Harris wrote:
> I generally agree with this. The most useful aspect of this exercise 
> is likely to be clarifying NumPy for its own developers, and maybe 
> offering a guide to future simplification. Trying to put something 
> together that everyone agrees to as an official standard would be a 
> big project and, as Nathaniel points out, would involve an enormous 
> amount of work, much time, and doubtless many arguments.? What might 
> be a less ambitious exercise would be identifying commonalities in the 
> current numpy-like languages. That would have the advantage of 
> feedback from actual user experience, and would be more like a lessons 
> learned document that would be helpful to others.
>
>
>     More concretely, to address Nathaniel's (very reasonable) worry
>     about ending up wasting a lot of time, I think it may be good to
>     identify smaller parts, each of which are useful on their own.
>
>     In this respect, I think an excellent place to start might be
>     something you are planning already anyway: update the user
>     documentation
>

I would include tests as well. Rather than hammer out a full standard 
based on extensive discussions and negotiations, I would suggest NumPy 
might be able set a de-facto "standard" based on pieces of the the 
current numpy user documentation and test suite. Then other projects 
could use "passing the tests" as an indication that they implement the 
NumPy API, and could refer to the documentation where appropriate. Once 
we have a base repo under numpy with tests and documentations for the 
generally accepted baseline interfaces. we can discuss on a case-by-case 
basis via pull requests and issues whether other interfaces should be 
included. If we find general classes of similarity that can be concisely 
described but not all duckarray packages support (structured arrays, for 
instance), these could become test-specifiers `@pytest.skipif(not 
HAVE_STRUCTURED_ARRAYS)`, the tests and documentation would only apply 
if that specifier exists.


Matti


From njs at pobox.com  Sat Jun  1 15:43:19 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 1 Jun 2019 12:43:19 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <5814a804-d96d-4c89-8154-84b2bb438b67@Canary>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <CAPJVwBnC7Ob8ergJwNb-+V2+24KAnraDfc+S+5+b7rOCyXKKmA@mail.gmail.com>
 <7c8e7851-1b17-419c-b453-d2ac08f3e569@Canary>
 <5814a804-d96d-4c89-8154-84b2bb438b67@Canary>
Message-ID: <CAPJVwB=49Fef1a6u0OTL0h==3WptE+3XvLEQQ+pPcbVnPyYL3A@mail.gmail.com>

On Sat, Jun 1, 2019, 05:23 Hameer Abbasi <einstein.edison at gmail.com> wrote:

> I think this hits the crux of the issue... There *is* a huge coordination
> problem. Users want to move their code from NumPy to Sparse or Dask all the
> time, but it?s not trivial to do. And libraries like sparse and Dask want
> to follow a standard (or at least hoped there was one) before they existed.
>

Those are big problems, but they aren't coordination problems :-)

https://en.m.wikipedia.org/wiki/Coordination_game

If you and I both each have our own unrelated code that we want to move to
Sparse, then we don't have to talk to each other and agree on how to do it.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/d899fc0b/attachment.html>

From ralf.gommers at gmail.com  Sat Jun  1 16:04:32 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 1 Jun 2019 22:04:32 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
Message-ID: <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>

On Sat, Jun 1, 2019 at 8:46 PM Matti Picus <matti.picus at gmail.com> wrote:

> On 1/6/19 7:31 pm, Charles R Harris wrote:
> > I generally agree with this. The most useful aspect of this exercise
> > is likely to be clarifying NumPy for its own developers, and maybe
> > offering a guide to future simplification. Trying to put something
> > together that everyone agrees to as an official standard would be a
> > big project and, as Nathaniel points out, would involve an enormous
> > amount of work, much time, and doubtless many arguments.  What might
> > be a less ambitious exercise would be identifying commonalities in the
> > current numpy-like languages. That would have the advantage of
> > feedback from actual user experience, and would be more like a lessons
> > learned document that would be helpful to others.
> >
> >
> >     More concretely, to address Nathaniel's (very reasonable) worry
> >     about ending up wasting a lot of time, I think it may be good to
> >     identify smaller parts, each of which are useful on their own.
> >
> >     In this respect, I think an excellent place to start might be
> >     something you are planning already anyway: update the user
> >     documentation
> >
>
> I would include tests as well. Rather than hammer out a full standard
> based on extensive discussions and negotiations, I would suggest NumPy
> might be able set a de-facto "standard" based on pieces of the the
> current numpy user documentation and test suite.


I think this is potentially useful, but *far* more prescriptive and
detailed than I had in mind. Both you and Nathaniel seem to have not
understood what I mean by "out of scope", so I think that's my fault in not
being explicit enough. I *do not* want to prescribe behavior. Instead, a
simple yes/no for each function in numpy and method on ndarray.

Our API is huge. A simple count:
main namespace: 600
fft: 30
linalg: 30
random: 60
ndarray: 70
lib: 20
lib.npyio: 35
etc. (many more ill-thought out but not clearly private submodules)

Just the main namespace plus ndarray methods is close to 700 objects. If
you want to build a NumPy-like thing, that's 700 decisions to make. I'm
suggesting something as simple as a list of functions that constitute a
sensible "core of NumPy". That list would not include anything in
fft/linalg/random, since those can easily be separated out (indeed, if we
could disappear fft and linalg and just rely on scipy, pyfftw etc., that
would be great). It would not include financial functions. And so on. I
guess we'd end up with most ndarray methods plus <150 functions.

That list could be used for many purposes: improve the docs, serve as the
set of functions to implement in xnd.array, unumpy & co, Marten's
suggestion of implementing other functions in terms of basic functions, etc.

Two other thoughts:
1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly
muddled. When new functions are proposed, it's decided on on a case-by-case
basis, usually without a guiding principle. We need to improve that. A
"core of NumPy" list could be a part of that puzzle.
2. We often argue about deprecations. Deprecations are costly, but so is
keeping around functions that are not very useful or have a poor design.
This may offer a middle ground. Don't let others repeat our mistakes,
signal to users that a function is of questionable value, without breaking
already written code.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/a8e81a50/attachment.html>

From njs at pobox.com  Sat Jun  1 16:30:53 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 1 Jun 2019 13:30:53 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQgb0sbmpn9KKgy9DA+tBQR_P_hMRsZOTVGDmckP9oeeKw@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <CAPJVwBnC7Ob8ergJwNb-+V2+24KAnraDfc+S+5+b7rOCyXKKmA@mail.gmail.com>
 <CABL7CQgb0sbmpn9KKgy9DA+tBQR_P_hMRsZOTVGDmckP9oeeKw@mail.gmail.com>
Message-ID: <CAPJVwB=YnWdtEnwO0zCAUJNZrK2UWxnEwcLM5kpqv4vXQgkCzw@mail.gmail.com>

On Sat, Jun 1, 2019, 09:13 Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Sat, Jun 1, 2019 at 11:32 AM Nathaniel Smith <njs at pobox.com> wrote:
>
>> It's possible I'm not getting what you're thinking, but from what you
>> describe in your email I think it's a bad idea.
>>
>
> Hi Nathaniel, I think you are indeed not getting what I meant and are just
> responding to the word "standard".
>

Well, that's the word you chose :-)

I think it's very possible that what you're thinking is a good idea, but
it's actually something else, like better high-level documentation, or a
NEP documenting things we wish we did differently but are stuck with, or a
generic duck array test suite to improve compatibility and make it easier
to bootstrap new libraries, etc.

The word "standard" is tricky:

- it has a pretty precise technical meaning that is different from all of
those things, so if those are what you want then it's a bad word to use.

- it's a somewhat arcane niche of engineering practice that a lot of people
don't have direct experience with, so there are a ton of people with vague
and magical ideas about how standards work, and if you use the word then
they'll start expecting all kinds of things. (See the response up thread
where someone thinks that you just proposed to make a bunch of incompatible
changes to numpy.) This makes it difficult to have a productive discussion,
because everyone is misinterpreting each other.

I bet if we can articulate more precisely what exactly you're hoping to
accomplish, then we'll also be able to figure out specific concrete actions
that will help, and they won't involve the word "standard". For example:


> I'll give a concrete example. Here is the xtensor to numpy comparison:
> https://xtensor.readthedocs.io/en/latest/numpy.html. The xtensor authors
> clearly have made sane choices, but they did have to spend a lot of effort
> making those choices - what to include and what not.
>
> Now, the XND team is just starting to build out their Python API. Hameer
> is building out unumpy. There's all the other arrays libraries I mentioned.
> We can say "sort it out yourself, make your own choices", or we can provide
> some guidance. So far the authors of those libaries I have asked say they
> would appreciate the guidance.
>

That sounds great. Maybe you want... a mailing list or a forum for array
library implementors to compare notes? ("So we ran into this unexpected
problem implementing einsum, how did you handle it? And btw numpy devs, why
is it like that in the first place?") Maybe you want someone to write up a
review of existing APIs like xtensor, dask, xarray, sparse, ... to see
where they deviated from numpy and if there are any commonalities? Or
someone could do an analysis of existing code and publish tables of how
often different features are used, so array implementors can make better
choices about what to implement first? Or maybe just encouraging Hameer to
be really proactive about sharing drafts and gathering feedback here?

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/3eee2bb1/attachment-0001.html>

From ralf.gommers at gmail.com  Sat Jun  1 17:30:30 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 1 Jun 2019 23:30:30 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAPJVwB=YnWdtEnwO0zCAUJNZrK2UWxnEwcLM5kpqv4vXQgkCzw@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <CAPJVwBnC7Ob8ergJwNb-+V2+24KAnraDfc+S+5+b7rOCyXKKmA@mail.gmail.com>
 <CABL7CQgb0sbmpn9KKgy9DA+tBQR_P_hMRsZOTVGDmckP9oeeKw@mail.gmail.com>
 <CAPJVwB=YnWdtEnwO0zCAUJNZrK2UWxnEwcLM5kpqv4vXQgkCzw@mail.gmail.com>
Message-ID: <CABL7CQifD0jg_+m_bMPpBEWsPYX930Vte-7d9d3EqNdGN-HM5Q@mail.gmail.com>

On Sat, Jun 1, 2019 at 10:32 PM Nathaniel Smith <njs at pobox.com> wrote:

> On Sat, Jun 1, 2019, 09:13 Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>>
>>
>> On Sat, Jun 1, 2019 at 11:32 AM Nathaniel Smith <njs at pobox.com> wrote:
>>
>>> It's possible I'm not getting what you're thinking, but from what you
>>> describe in your email I think it's a bad idea.
>>>
>>
>> Hi Nathaniel, I think you are indeed not getting what I meant and are
>> just responding to the word "standard".
>>
>
> Well, that's the word you chose :-)
>

It's just one word out of 100 line email. I'm happy to retract it. Please
pretend it wasn't there and re-read the rest. Replace it with the list of
functions that I propose in my previous email.


> I think it's very possible that what you're thinking is a good idea, but
> it's actually something else, like better high-level documentation, or a
> NEP documenting things we wish we did differently but are stuck with, or a
> generic duck array test suite to improve compatibility and make it easier
> to bootstrap new libraries, etc.
>
> The word "standard" is tricky:
>
> - it has a pretty precise technical meaning that is different from all of
> those things, so if those are what you want then it's a bad word to use.
>
> - it's a somewhat arcane niche of engineering practice that a lot of
> people don't have direct experience with, so there are a ton of people with
> vague and magical ideas about how standards work, and if you use the word
> then they'll start expecting all kinds of things. (See the response up
> thread where someone thinks that you just proposed to make a bunch of
> incompatible changes to numpy.) This makes it difficult to have a
> productive discussion, because everyone is misinterpreting each other.
>
> I bet if we can articulate more precisely what exactly you're hoping to
> accomplish,
>

Please see my email of 1 hour ago.

>
>
>> I'll give a concrete example. Here is the xtensor to numpy comparison:
>> https://xtensor.readthedocs.io/en/latest/numpy.html. The xtensor authors
>> clearly have made sane choices, but they did have to spend a lot of effort
>> making those choices - what to include and what not.
>>
>> Now, the XND team is just starting to build out their Python API. Hameer
>> is building out unumpy. There's all the other arrays libraries I mentioned.
>> We can say "sort it out yourself, make your own choices", or we can provide
>> some guidance. So far the authors of those libaries I have asked say they
>> would appreciate the guidance.
>>
>
> That sounds great. Maybe you want... a mailing list or a forum for array
> library implementors to compare notes?
>

No.

("So we ran into this unexpected problem implementing einsum, how did you
> handle it? And btw numpy devs, why is it like that in the first place?")
>

can be done on this list.

Maybe you want someone to write up a review of existing APIs like xtensor,
> dask, xarray, sparse, ... to see where they deviated from numpy and if
> there are any commonalities?
>

That will be useful in verifying that the list of functions for "core of
NumPy" I proposed is sensible. We're not going to make up things out of
thin air.


> Or someone could do an analysis of existing code and publish tables of how
> often different features are used, so array implementors can make better
> choices about what to implement first?
>

That's done:)
NumPy table:
https://github.com/Quansight-Labs/python-api-inspect/blob/master/data/csv/numpy-summary.csv
Blog post with explanation:
https://labs.quansight.org/blog/2019/05/python-package-function-usage/

And yes, it's another useful data point in verifying our choices.

Or maybe just encouraging Hameer to be really proactive about sharing
> drafts and gathering feedback here?
>

No. (well, it's always good to be proactive, but besides the point here)

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/27be46db/attachment.html>

From dashohoxha at gmail.com  Sat Jun  1 18:32:41 2019
From: dashohoxha at gmail.com (Dashamir Hoxha)
Date: Sun, 2 Jun 2019 00:32:41 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
Message-ID: <CAMucfLzBC1YSjmbOOe7hWFoRWkX-pqitP6m23W1BikGH1HdjZg@mail.gmail.com>

On Sat, Jun 1, 2019 at 10:05 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
> I think this is potentially useful, but *far* more prescriptive and
> detailed than I had in mind. Both you and Nathaniel seem to have not
> understood what I mean by "out of scope", so I think that's my fault in not
> being explicit enough. I *do not* want to prescribe behavior. Instead, a
> simple yes/no for each function in numpy and method on ndarray.
>
> Our API is huge. A simple count:
> main namespace: 600
> fft: 30
> linalg: 30
> random: 60
> ndarray: 70
> lib: 20
> lib.npyio: 35
> etc. (many more ill-thought out but not clearly private submodules)
>
> Just the main namespace plus ndarray methods is close to 700 objects. If
> you want to build a NumPy-like thing, that's 700 decisions to make. I'm
> suggesting something as simple as a list of functions that constitute a
> sensible "core of NumPy". That list would not include anything in
> fft/linalg/random, since those can easily be separated out (indeed, if we
> could disappear fft and linalg and just rely on scipy, pyfftw etc., that
> would be great). It would not include financial functions. And so on. I
> guess we'd end up with most ndarray methods plus <150 functions.
>
> That list could be used for many purposes: improve the docs, serve as the
> set of functions to implement in xnd.array, unumpy & co, Marten's
> suggestion of implementing other functions in terms of basic functions, etc.
>
> Two other thoughts:
> 1. NumPy is not done. Our thinking on how to evolve the NumPy API is
> fairly muddled. When new functions are proposed, it's decided on on a
> case-by-case basis, usually without a guiding principle. We need to improve
> that. A "core of NumPy" list could be a part of that puzzle.
> 2. We often argue about deprecations. Deprecations are costly, but so is
> keeping around functions that are not very useful or have a poor design.
> This may offer a middle ground. Don't let others repeat our mistakes,
> signal to users that a function is of questionable value, without breaking
> already written code.
>

This sounds like a restructuring or factorization of the API, in order to
make it smaller, and thus easier to learn and use.
It may start with the docs, by paying more attention to the "core" or
important functions and methods, and noting the deprecated, or not
frequently used, or not important functions. This could also help the
satellite projects, which use NumPy API as an example, and may also be
influenced by them and their decisions.

Regards,
Dashamir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/67391fc3/attachment.html>

From njs at pobox.com  Sat Jun  1 18:34:38 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 1 Jun 2019 15:34:38 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
Message-ID: <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>

On Sat, Jun 1, 2019 at 1:05 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
> I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.

So yes/no are the answers. But what's the question?

"If we were redesigning numpy in a fantasy world without external
constraints or compatibility issues, would we include this function?"
"Is this function well designed?"
"Do we think that supporting this function is necessary to achieve
practical duck-array compatibility?"
"If someone implements this function, should we give them a 'numpy
core compliant!' logo to put on their website?"
"Do we recommend that people use this function in new code?"
"If we were trying to design a minimal set of primitives and implement
the rest of numpy in terms of them, then is this function a good
candidate for a primitive?"

These are all really different things, and useful for solving
different problems... I feel like you might be lumping them together
some?

Also, I'm guessing there are a bunch of functions where you think part
of the interface is fine and part of the interface is broken. (E.g.
dot's behavior on high-dimensional arrays.) Do you think this "one
bool per function" structure will be fine-grained enough for what you
want to do?

> Two other thoughts:
> 1. NumPy is not done. Our thinking on how to evolve the NumPy API is fairly muddled. When new functions are proposed, it's decided on on a case-by-case basis, usually without a guiding principle. We need to improve that. A "core of NumPy" list could be a part of that puzzle.

I think we do have some rough consensus principles on what's in scope
and what isn't in scope for numpy, but yeah, articulating them more
clearly could be useful. Stuff like "output types and shape should be
predictable from input types and shape", "numpy's core
responsibilities are the array/dtype/ufunc interfaces, and providing a
lingua franca for python numerical libraries to interoperate" (and
therefore: "if it can live outside numpy it probably should"), etc.
I'm seeing this as a living document (a NEP?) that tries to capture
some rules of thumb and that we update as we go. That seems pretty
different to me than a long list of yes/no checkboxes though?

> 2. We often argue about deprecations. Deprecations are costly, but so is keeping around functions that are not very useful or have a poor design. This may offer a middle ground. Don't let others repeat our mistakes, signal to users that a function is of questionable value, without breaking already written code.

The idea has come up a few times of having a "soft deprecation" level,
where we put a warning in the docs but not in the code. It seems like
a reasonable idea to me. It's inherently a kind of case-by-case thing
that can be done incrementally. But, if someone wants to
systematically work through all the docs and do the case-by-case
analysis, that also seems like a reasonable idea to me. I'm not sure
if that's the same as your proposal or not.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From m.h.vankerkwijk at gmail.com  Sat Jun  1 20:53:32 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sat, 1 Jun 2019 20:53:32 -0400
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
Message-ID: <CAJNV+9tiLzJMqw7z5q50xkQLmdrjbqKK=rbe-y=wQM=MbdDcwQ@mail.gmail.com>

> >     In this respect, I think an excellent place to start might be
>> >     something you are planning already anyway: update the user
>> >     documentation
>> >
>>
>> I would include tests as well. Rather than hammer out a full standard
>> based on extensive discussions and negotiations, I would suggest NumPy
>> might be able set a de-facto "standard" based on pieces of the the
>> current numpy user documentation and test suite.
>
>
> I think this is potentially useful, but *far* more prescriptive and
> detailed than I had in mind. Both you and Nathaniel seem to have not
> understood what I mean by "out of scope", so I think that's my fault in not
> being explicit enough. I *do not* want to prescribe behavior. Instead, a
> simple yes/no for each function in numpy and method on ndarray.
>

I quite like the idea of trying to be better at defining the API through
tests - the substitution principle in action! Systematically applying tests
to both ndarray and MaskedArray might be a start internally (just a pytest
fixture away...). But definitely start with more of an overview.

-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/f2831b93/attachment.html>

From m.h.vankerkwijk at gmail.com  Sat Jun  1 21:17:28 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sat, 1 Jun 2019 21:17:28 -0400
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
Message-ID: <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>

> Our API is huge. A simple count:
> main namespace: 600
> fft: 30
> linalg: 30
> random: 60
> ndarray: 70
> lib: 20
> lib.npyio: 35
> etc. (many more ill-thought out but not clearly private submodules)
>
>
I would perhaps start with ndarray itself. Quite a lot seems superfluous

Shapes:
- need: shape, strides, reshape, transpose;
- probably: ndim, size, T
- less so: nbytes, ravel, flatten, squeeze, and swapaxes.

Getting/setting:
- need __getitem__, __setitem__;
- less so: fill, put, take, item, itemset, repeat, compress, diagonal;

Datatype/Copies/views/conversion
- need: dtype, copy, view, astype, flags
- less so: ctypes, dump, dumps, getfield, setfield, itemsize, byteswap,
newbyteorder, resize, setflags, tobytes, tofile, tolist, tostring,

Iteration
- need __iter__
- less so: flat

Numerics
- need: conj, real, imag
- maybe also: min, max, mean, sum, std, var, prod, partition, sort, tracet;
- less so: the arg* ones, cumsum, cumprod, clip, round, dot, all, any,
nonzero, ptp, searchsorted,
choose.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/0a016a86/attachment.html>

From ralf.gommers at gmail.com  Sun Jun  2 02:38:05 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 2 Jun 2019 08:38:05 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>
Message-ID: <CABL7CQiCW6U0wKn7Nw3=xw8Uv0ygWUi-i7oRpopvvcBnJ8wubg@mail.gmail.com>

On Sun, Jun 2, 2019 at 3:18 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

>
> Our API is huge. A simple count:
>> main namespace: 600
>> fft: 30
>> linalg: 30
>> random: 60
>> ndarray: 70
>> lib: 20
>> lib.npyio: 35
>> etc. (many more ill-thought out but not clearly private submodules)
>>
>>
> I would perhaps start with ndarray itself. Quite a lot seems superfluous
>
> Shapes:
> - need: shape, strides, reshape, transpose;
> - probably: ndim, size, T
> - less so: nbytes, ravel, flatten, squeeze, and swapaxes.
>
> Getting/setting:
> - need __getitem__, __setitem__;
> - less so: fill, put, take, item, itemset, repeat, compress, diagonal;
>
> Datatype/Copies/views/conversion
> - need: dtype, copy, view, astype, flags
> - less so: ctypes, dump, dumps, getfield, setfield, itemsize, byteswap,
> newbyteorder, resize, setflags, tobytes, tofile, tolist, tostring,
>
> Iteration
> - need __iter__
> - less so: flat
>
> Numerics
> - need: conj, real, imag
> - maybe also: min, max, mean, sum, std, var, prod, partition, sort, tracet;
> - less so: the arg* ones, cumsum, cumprod, clip, round, dot, all, any,
> nonzero, ptp, searchsorted,
> choose.
>

Exactly. This is great, thanks Marten. I agree with pretty much everything
in this list.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/3961f874/attachment.html>

From ralf.gommers at gmail.com  Sun Jun  2 02:42:34 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 2 Jun 2019 08:42:34 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAMucfLzBC1YSjmbOOe7hWFoRWkX-pqitP6m23W1BikGH1HdjZg@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAMucfLzBC1YSjmbOOe7hWFoRWkX-pqitP6m23W1BikGH1HdjZg@mail.gmail.com>
Message-ID: <CABL7CQj62dZLCrc33D6ME35KsDnT-rCCD=9dcJP1C-Sg9e8_=g@mail.gmail.com>

On Sun, Jun 2, 2019 at 12:33 AM Dashamir Hoxha <dashohoxha at gmail.com> wrote:

> On Sat, Jun 1, 2019 at 10:05 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>> I think this is potentially useful, but *far* more prescriptive and
>> detailed than I had in mind. Both you and Nathaniel seem to have not
>> understood what I mean by "out of scope", so I think that's my fault in not
>> being explicit enough. I *do not* want to prescribe behavior. Instead, a
>> simple yes/no for each function in numpy and method on ndarray.
>>
>> Our API is huge. A simple count:
>> main namespace: 600
>> fft: 30
>> linalg: 30
>> random: 60
>> ndarray: 70
>> lib: 20
>> lib.npyio: 35
>> etc. (many more ill-thought out but not clearly private submodules)
>>
>> Just the main namespace plus ndarray methods is close to 700 objects. If
>> you want to build a NumPy-like thing, that's 700 decisions to make. I'm
>> suggesting something as simple as a list of functions that constitute a
>> sensible "core of NumPy". That list would not include anything in
>> fft/linalg/random, since those can easily be separated out (indeed, if we
>> could disappear fft and linalg and just rely on scipy, pyfftw etc., that
>> would be great). It would not include financial functions. And so on. I
>> guess we'd end up with most ndarray methods plus <150 functions.
>>
>> That list could be used for many purposes: improve the docs, serve as the
>> set of functions to implement in xnd.array, unumpy & co, Marten's
>> suggestion of implementing other functions in terms of basic functions, etc.
>>
>> Two other thoughts:
>> 1. NumPy is not done. Our thinking on how to evolve the NumPy API is
>> fairly muddled. When new functions are proposed, it's decided on on a
>> case-by-case basis, usually without a guiding principle. We need to improve
>> that. A "core of NumPy" list could be a part of that puzzle.
>> 2. We often argue about deprecations. Deprecations are costly, but so is
>> keeping around functions that are not very useful or have a poor design.
>> This may offer a middle ground. Don't let others repeat our mistakes,
>> signal to users that a function is of questionable value, without breaking
>> already written code.
>>
>
> This sounds like a restructuring or factorization of the API, in order to
> make it smaller, and thus easier to learn and use.
> It may start with the docs, by paying more attention to the "core" or
> important functions and methods, and noting the deprecated, or not
> frequently used, or not important functions. This could also help the
> satellite projects, which use NumPy API as an example, and may also be
> influenced by them and their decisions.
>

 Indeed. It will help restructure our docs. Perhaps not the reference guide
(not sure yet), but definitely the user guide and other high-level docs we
(or third parties) may want to create.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/96274e34/attachment-0001.html>

From ralf.gommers at gmail.com  Sun Jun  2 02:59:25 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 2 Jun 2019 08:59:25 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
Message-ID: <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>

On Sun, Jun 2, 2019 at 12:35 AM Nathaniel Smith <njs at pobox.com> wrote:

> On Sat, Jun 1, 2019 at 1:05 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> > I think this is potentially useful, but *far* more prescriptive and
> detailed than I had in mind. Both you and Nathaniel seem to have not
> understood what I mean by "out of scope", so I think that's my fault in not
> being explicit enough. I *do not* want to prescribe behavior. Instead, a
> simple yes/no for each function in numpy and method on ndarray.
>
> So yes/no are the answers. But what's the question?
>
> "If we were redesigning numpy in a fantasy world without external
> constraints or compatibility issues, would we include this function?"
> "Is this function well designed?"
> "Do we think that supporting this function is necessary to achieve
> practical duck-array compatibility?"
> "If someone implements this function, should we give them a 'numpy
> core compliant!' logo to put on their website?"
> "Do we recommend that people use this function in new code?"
> "If we were trying to design a minimal set of primitives and implement
> the rest of numpy in terms of them, then is this function a good
> candidate for a primitive?"
>
> These are all really different things, and useful for solving
> different problems... I feel like you might be lumping them together
> some?
>

No, I feel like you just want to see a real proposal. At this point I've
gotten some really useful feedback, in particular from Marten (thanks!),
and I have a better idea of what to do. So I'll answer a few of your
questions, and propose to leave the rest till I actually have some more
solid to discuss. That will likely answer many of your questions.


> Also, I'm guessing there are a bunch of functions where you think part
> of the interface is fine and part of the interface is broken. (E.g.
> dot's behavior on high-dimensional arrays.)


Indeed, but that's a much harder problem to tackle. Again, there's a reason
I put function behavior explicitly out of scope.

Do you think this "one
> bool per function" structure will be fine-grained enough for what you
> want to do?
>

yes


>
> > Two other thoughts:
> > 1. NumPy is not done. Our thinking on how to evolve the NumPy API is
> fairly muddled. When new functions are proposed, it's decided on on a
> case-by-case basis, usually without a guiding principle. We need to improve
> that. A "core of NumPy" list could be a part of that puzzle.
>
> I think we do have some rough consensus principles on what's in scope
> and what isn't in scope for numpy,


Very rough perhaps. I don't think we are on the same wavelength at all
about the cost of adding new functions, the cost of deprecations, the use
of submodules and even what's public or private right now.

That can't be solved all at once, but I think what my idea will help with
some of these.

but yeah, articulating them more
> clearly could be useful. Stuff like "output types and shape should be
> predictable from input types and shape", "numpy's core
> responsibilities are the array/dtype/ufunc interfaces, and providing a
> lingua franca for python numerical libraries to interoperate" (and
> therefore: "if it can live outside numpy it probably should"), etc.
>

All of these are valid questions. Most of that propably needs to be in the
scope document (https://www.numpy.org/neps/scope.html). Which also needs to
be improved.

I'm seeing this as a living document (a NEP?)


NEP would work. Although I'd prefer a way to be able to reference some
fixed version of it rather than it being always in flux.

that tries to capture
> some rules of thumb and that we update as we go. That seems pretty
> different to me than a long list of yes/no checkboxes though?
>
> > 2. We often argue about deprecations. Deprecations are costly, but so is
> keeping around functions that are not very useful or have a poor design.
> This may offer a middle ground. Don't let others repeat our mistakes,
> signal to users that a function is of questionable value, without breaking
> already written code.
>
> The idea has come up a few times of having a "soft deprecation" level,
> where we put a warning in the docs but not in the code. It seems like
> a reasonable idea to me. It's inherently a kind of case-by-case thing
> that can be done incrementally. But, if someone wants to
> systematically work through all the docs and do the case-by-case
> analysis, that also seems like a reasonable idea to me. I'm not sure
> if that's the same as your proposal or not.
>

not the same, but related.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/44ab761f/attachment.html>

From njs at pobox.com  Sun Jun  2 03:45:43 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 2 Jun 2019 00:45:43 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
 <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
Message-ID: <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>

On Sat, Jun 1, 2019 at 11:59 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
> On Sun, Jun 2, 2019 at 12:35 AM Nathaniel Smith <njs at pobox.com> wrote:
>>
>> On Sat, Jun 1, 2019 at 1:05 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>> > I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.
>>
>> So yes/no are the answers. But what's the question?
>>
>> "If we were redesigning numpy in a fantasy world without external
>> constraints or compatibility issues, would we include this function?"
>> "Is this function well designed?"
>> "Do we think that supporting this function is necessary to achieve
>> practical duck-array compatibility?"
>> "If someone implements this function, should we give them a 'numpy
>> core compliant!' logo to put on their website?"
>> "Do we recommend that people use this function in new code?"
>> "If we were trying to design a minimal set of primitives and implement
>> the rest of numpy in terms of them, then is this function a good
>> candidate for a primitive?"
>>
>> These are all really different things, and useful for solving
>> different problems... I feel like you might be lumping them together
>> some?
>
>
> No, I feel like you just want to see a real proposal. At this point I've gotten some really useful feedback, in particular from Marten (thanks!), and I have a better idea of what to do. So I'll answer a few of your questions, and propose to leave the rest till I actually have some more solid to discuss. That will likely answer many of your questions.

Okay, that's fine. You scared me a bit with the initial email, but I
really am trying to be helpful :-). I'm not looking for a detailed
proposal; I'm just super confused right now about what you're trying
to accomplish or how this table of yes/no values will help do it.  I
look forward to hearing more!

>> I'm seeing this as a living document (a NEP?)
>
> NEP would work. Although I'd prefer a way to be able to reference some fixed version of it rather than it being always in flux.

When I say "living" I mean: it would be seen as documenting our
consensus and necessarily fuzzy rather than normative and precise like
most NEPs. Maybe this is obvious and not worth mentioning. But I
wouldn't expect it to change rapidly. Unless our collective opinions
change rapidly I guess, but that seems unlikely.

(And of course NEPs are in git so we always have the ability to link
to a point-in-time snapshot if we need to reference something.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From ralf.gommers at gmail.com  Sun Jun  2 04:01:37 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 2 Jun 2019 10:01:37 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
 <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
 <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
Message-ID: <CABL7CQhCjJ1nG5gfGxfumzPd4reMhoUBtWRN+-Or+XaN9Hbp6w@mail.gmail.com>

On Sun, Jun 2, 2019 at 9:46 AM Nathaniel Smith <njs at pobox.com> wrote:

> On Sat, Jun 1, 2019 at 11:59 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> > On Sun, Jun 2, 2019 at 12:35 AM Nathaniel Smith <njs at pobox.com> wrote:
> >>
> >> On Sat, Jun 1, 2019 at 1:05 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> >> > I think this is potentially useful, but *far* more prescriptive and
> detailed than I had in mind. Both you and Nathaniel seem to have not
> understood what I mean by "out of scope", so I think that's my fault in not
> being explicit enough. I *do not* want to prescribe behavior. Instead, a
> simple yes/no for each function in numpy and method on ndarray.
> >>
> >> So yes/no are the answers. But what's the question?
> >>
> >> "If we were redesigning numpy in a fantasy world without external
> >> constraints or compatibility issues, would we include this function?"
> >> "Is this function well designed?"
> >> "Do we think that supporting this function is necessary to achieve
> >> practical duck-array compatibility?"
> >> "If someone implements this function, should we give them a 'numpy
> >> core compliant!' logo to put on their website?"
> >> "Do we recommend that people use this function in new code?"
> >> "If we were trying to design a minimal set of primitives and implement
> >> the rest of numpy in terms of them, then is this function a good
> >> candidate for a primitive?"
> >>
> >> These are all really different things, and useful for solving
> >> different problems... I feel like you might be lumping them together
> >> some?
> >
> >
> > No, I feel like you just want to see a real proposal. At this point I've
> gotten some really useful feedback, in particular from Marten (thanks!),
> and I have a better idea of what to do. So I'll answer a few of your
> questions, and propose to leave the rest till I actually have some more
> solid to discuss. That will likely answer many of your questions.
>
> Okay, that's fine. You scared me a bit with the initial email, but I
> really am trying to be helpful :-). I'm not looking for a detailed
> proposal; I'm just super confused right now about what you're trying
> to accomplish or how this table of yes/no values will help do it.  I
> look forward to hearing more!
>

Thanks! I know this is going to be a little complicated to get everyone on
the same page. That's why I'm aiming to get a draft out before SciPy'19 so
there's a chance to discuss it in person with everyone who is there.
Mailing lists are a poor interface. Will you be at SciPy?


> >> I'm seeing this as a living document (a NEP?)
> >
> > NEP would work. Although I'd prefer a way to be able to reference some
> fixed version of it rather than it being always in flux.
>
> When I say "living" I mean: it would be seen as documenting our
> consensus and necessarily fuzzy rather than normative and precise like
> most NEPs.


Yeah, I'm going for useful rather than normative:)

Maybe this is obvious and not worth mentioning. But I
> wouldn't expect it to change rapidly. Unless our collective opinions
> change rapidly I guess, but that seems unlikely.
>
> (And of course NEPs are in git so we always have the ability to link
> to a point-in-time snapshot if we need to reference something.)
>

Agreed. One perhaps unintended side effect of separating out the NEPs doc
build from the full doc build is that we stopped shipping NEPs with our
releases. It would be nicer to say "NEP as of 1.16" rather than "NEP as of
commit 1324adf59". Ah well, that's for another time.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/c27e2e91/attachment-0001.html>

From einstein.edison at gmail.com  Sat Jun  1 08:23:06 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Sat, 1 Jun 2019 14:23:06 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <7c8e7851-1b17-419c-b453-d2ac08f3e569@Canary>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <CAPJVwBnC7Ob8ergJwNb-+V2+24KAnraDfc+S+5+b7rOCyXKKmA@mail.gmail.com>
 <7c8e7851-1b17-419c-b453-d2ac08f3e569@Canary>
Message-ID: <5814a804-d96d-4c89-8154-84b2bb438b67@Canary>

I think this hits the crux of the issue... There is a huge coordination problem. Users want to move their code from NumPy to Sparse or Dask all the time, but it?s not trivial to do. And libraries like sparse and Dask want to follow a standard (or at least hoped there was one) before they existed.

Maybe I think the issue is bigger than it really is, but there?s definitely a coordination problem.

See the section in the original email on ?who cares and why?...

Best Regards,
Hameer Abbasi

> On Saturday, Jun 01, 2019 at 11:32 AM, Nathaniel Smith <njs at pobox.com (mailto:njs at pobox.com)> wrote:
> [snip]
>
> That's not a problem at all for us, because numpy
> already exists.
>
> [snip]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190601/23246df2/attachment-0002.html>

From dashohoxha at gmail.com  Sun Jun  2 04:53:17 2019
From: dashohoxha at gmail.com (Dashamir Hoxha)
Date: Sun, 2 Jun 2019 10:53:17 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQhCjJ1nG5gfGxfumzPd4reMhoUBtWRN+-Or+XaN9Hbp6w@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
 <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
 <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
 <CABL7CQhCjJ1nG5gfGxfumzPd4reMhoUBtWRN+-Or+XaN9Hbp6w@mail.gmail.com>
Message-ID: <CAMucfLzo2Akx1Am0jpPmD_pOwgtdMfooJLQ1kpQtn19O7eaqBg@mail.gmail.com>

On Sun, Jun 2, 2019 at 10:01 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
> Thanks! I know this is going to be a little complicated to get everyone on
> the same page. That's why I'm aiming to get a draft out before SciPy'19 so
> there's a chance to discuss it in person with everyone who is there.
> Mailing lists are a poor interface. Will you be at SciPy?
>

Would it be useful if we could integrate the documentation system with a
discussion forum (like Discourse.org)? Each function can be linked to its
own discussion topic, where users and developers can discuss about the
function, upvote or downvote it etc. This kind of discussion seems to be a
bit more structured than a mailing list discussion.

Dashamir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/993a22f1/attachment.html>

From ralf.gommers at gmail.com  Sun Jun  2 06:07:52 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 2 Jun 2019 12:07:52 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAMucfLzo2Akx1Am0jpPmD_pOwgtdMfooJLQ1kpQtn19O7eaqBg@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
 <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
 <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
 <CABL7CQhCjJ1nG5gfGxfumzPd4reMhoUBtWRN+-Or+XaN9Hbp6w@mail.gmail.com>
 <CAMucfLzo2Akx1Am0jpPmD_pOwgtdMfooJLQ1kpQtn19O7eaqBg@mail.gmail.com>
Message-ID: <CABL7CQiZ=tMhte-x4dp9tFAbuD7f3jnKFHEWjMJawz_-6=vjRQ@mail.gmail.com>

On Sun, Jun 2, 2019 at 10:53 AM Dashamir Hoxha <dashohoxha at gmail.com> wrote:

>
>
> On Sun, Jun 2, 2019 at 10:01 AM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>> Thanks! I know this is going to be a little complicated to get everyone
>> on the same page. That's why I'm aiming to get a draft out before SciPy'19
>> so there's a chance to discuss it in person with everyone who is there.
>> Mailing lists are a poor interface. Will you be at SciPy?
>>
>
> Would it be useful if we could integrate the documentation system with a
> discussion forum (like Discourse.org)? Each function can be linked to its
> own discussion topic, where users and developers can discuss about the
> function, upvote or downvote it etc. This kind of discussion seems to be a
> bit more structured than a mailing list discussion.
>

A more modern forum is nice indeed. It is not strictly better than mailing
lists though. So what I would like is a Discourse like interface on top of
the mailing list, so we get the features you're talking about without a
painful migration and breaking all links to threads in the archives.
Mailman 3 does provide this (example:
https://lists.fedoraproject.org/archives/list/devel at lists.fedoraproject.org/).
I'm keeping an eye on what's going on with Mailman 3 migration of the
python.org provided infrastructure. I think we can do this in the near to
medium future. I don't want us to be the guinea pig though:)

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/f836d8dc/attachment.html>

From ralf.gommers at gmail.com  Sun Jun  2 06:12:21 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 2 Jun 2019 12:12:21 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQiZ=tMhte-x4dp9tFAbuD7f3jnKFHEWjMJawz_-6=vjRQ@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
 <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
 <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
 <CABL7CQhCjJ1nG5gfGxfumzPd4reMhoUBtWRN+-Or+XaN9Hbp6w@mail.gmail.com>
 <CAMucfLzo2Akx1Am0jpPmD_pOwgtdMfooJLQ1kpQtn19O7eaqBg@mail.gmail.com>
 <CABL7CQiZ=tMhte-x4dp9tFAbuD7f3jnKFHEWjMJawz_-6=vjRQ@mail.gmail.com>
Message-ID: <CABL7CQh1DwQbAoNF_BWVvCVQsPWLTd2wP2P=j+uxKvHPvcoF+A@mail.gmail.com>

On Sun, Jun 2, 2019 at 12:07 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Sun, Jun 2, 2019 at 10:53 AM Dashamir Hoxha <dashohoxha at gmail.com>
> wrote:
>
>>
>>
>> On Sun, Jun 2, 2019 at 10:01 AM Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>>
>>>
>>> Thanks! I know this is going to be a little complicated to get everyone
>>> on the same page. That's why I'm aiming to get a draft out before SciPy'19
>>> so there's a chance to discuss it in person with everyone who is there.
>>> Mailing lists are a poor interface. Will you be at SciPy?
>>>
>>
>> Would it be useful if we could integrate the documentation system with a
>> discussion forum (like Discourse.org)? Each function can be linked to its
>> own discussion topic, where users and developers can discuss about the
>> function, upvote or downvote it etc. This kind of discussion seems to be a
>> bit more structured than a mailing list discussion.
>>
>
> A more modern forum is nice indeed. It is not strictly better than mailing
> lists though. So what I would like is a Discourse like interface on top of
> the mailing list, so we get the features you're talking about without a
> painful migration and breaking all links to threads in the archives.
> Mailman 3 does provide this (example:
> https://lists.fedoraproject.org/archives/list/devel at lists.fedoraproject.org/).
> I'm keeping an eye on what's going on with Mailman 3 migration of the
> python.org provided infrastructure. I think we can do this in the near to
> medium future. I don't want us to be the guinea pig though:)
>

To save anyone else the trouble of posting this link, here's Guido's thumbs
down on Discourse (and he's not the only one) as a replacement for Python
mailing lists:
https://discuss.python.org/t/disappointed-and-overwhelmed-by-discourse/982.
Tastes vary:)

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/9142787d/attachment-0001.html>

From dashohoxha at gmail.com  Sun Jun  2 06:43:48 2019
From: dashohoxha at gmail.com (Dashamir Hoxha)
Date: Sun, 2 Jun 2019 12:43:48 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQh1DwQbAoNF_BWVvCVQsPWLTd2wP2P=j+uxKvHPvcoF+A@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
 <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
 <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
 <CABL7CQhCjJ1nG5gfGxfumzPd4reMhoUBtWRN+-Or+XaN9Hbp6w@mail.gmail.com>
 <CAMucfLzo2Akx1Am0jpPmD_pOwgtdMfooJLQ1kpQtn19O7eaqBg@mail.gmail.com>
 <CABL7CQiZ=tMhte-x4dp9tFAbuD7f3jnKFHEWjMJawz_-6=vjRQ@mail.gmail.com>
 <CABL7CQh1DwQbAoNF_BWVvCVQsPWLTd2wP2P=j+uxKvHPvcoF+A@mail.gmail.com>
Message-ID: <CAMucfLxOfMkvNtRJYdqwPORHWknUeDv9bq6TjwroenP_LU27Sg@mail.gmail.com>

On Sun, Jun 2, 2019 at 12:12 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>>> Would it be useful if we could integrate the documentation system with a
>>> discussion forum (like Discourse.org)? Each function can be linked to its
>>> own discussion topic, where users and developers can discuss about the
>>> function, upvote or downvote it etc. This kind of discussion seems to be a
>>> bit more structured than a mailing list discussion.
>>>
>>
>> A more modern forum is nice indeed. It is not strictly better than
>> mailing lists though. So what I would like is a Discourse like interface on
>> top of the mailing list, so we get the features you're talking about
>> without a painful migration and breaking all links to threads in the
>> archives. Mailman 3 does provide this (example:
>> https://lists.fedoraproject.org/archives/list/devel at lists.fedoraproject.org/).
>> I'm keeping an eye on what's going on with Mailman 3 migration of the
>> python.org provided infrastructure. I think we can do this in the near
>> to medium future. I don't want us to be the guinea pig though:)
>>
>
> To save anyone else the trouble of posting this link, here's Guido's
> thumbs down on Discourse (and he's not the only one) as a replacement for
> Python mailing lists:
> https://discuss.python.org/t/disappointed-and-overwhelmed-by-discourse/982.
> Tastes vary:)
>

I did not suggest replacing the mailing lists with Discourse.

I suggested integrating documentation with Discourse, so that for each
function there is a separate discussion topic for this function. For each
function on the documentation page there can be a "Feedback" or "Comment"
link that goes to the corresponding discussion topic for that function.
This way Discourse can be used like a commenting system (similar to
Disqus). In the discussion page of the function people can upvote the
function (using the "like" feature of Discourse) and can also explain why
they think it is important.

This may help building a consensus about which are the important or "core"
functions of NumPy. Or maybe it doesn't have to be so complex after all,
and mailing list discussions, combined with face-to-face discussions on
conferences or online meetings can do it better.

Dashamir
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/a75ee6cb/attachment.html>

From ralf.gommers at gmail.com  Sun Jun  2 07:57:50 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 2 Jun 2019 13:57:50 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAMucfLxOfMkvNtRJYdqwPORHWknUeDv9bq6TjwroenP_LU27Sg@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
 <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
 <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
 <CABL7CQhCjJ1nG5gfGxfumzPd4reMhoUBtWRN+-Or+XaN9Hbp6w@mail.gmail.com>
 <CAMucfLzo2Akx1Am0jpPmD_pOwgtdMfooJLQ1kpQtn19O7eaqBg@mail.gmail.com>
 <CABL7CQiZ=tMhte-x4dp9tFAbuD7f3jnKFHEWjMJawz_-6=vjRQ@mail.gmail.com>
 <CABL7CQh1DwQbAoNF_BWVvCVQsPWLTd2wP2P=j+uxKvHPvcoF+A@mail.gmail.com>
 <CAMucfLxOfMkvNtRJYdqwPORHWknUeDv9bq6TjwroenP_LU27Sg@mail.gmail.com>
Message-ID: <CABL7CQhCgLD9HrifKp5=wRm-Tsr0oU8vuR+DPeey0n21F3NkbQ@mail.gmail.com>

On Sun, Jun 2, 2019 at 12:44 PM Dashamir Hoxha <dashohoxha at gmail.com> wrote:

>
>
> On Sun, Jun 2, 2019 at 12:12 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>>>> Would it be useful if we could integrate the documentation system with
>>>> a discussion forum (like Discourse.org)? Each function can be linked to its
>>>> own discussion topic, where users and developers can discuss about the
>>>> function, upvote or downvote it etc. This kind of discussion seems to be a
>>>> bit more structured than a mailing list discussion.
>>>>
>>>
>>> A more modern forum is nice indeed. It is not strictly better than
>>> mailing lists though. So what I would like is a Discourse like interface on
>>> top of the mailing list, so we get the features you're talking about
>>> without a painful migration and breaking all links to threads in the
>>> archives. Mailman 3 does provide this (example:
>>> https://lists.fedoraproject.org/archives/list/devel at lists.fedoraproject.org/).
>>> I'm keeping an eye on what's going on with Mailman 3 migration of the
>>> python.org provided infrastructure. I think we can do this in the near
>>> to medium future. I don't want us to be the guinea pig though:)
>>>
>>
>> To save anyone else the trouble of posting this link, here's Guido's
>> thumbs down on Discourse (and he's not the only one) as a replacement for
>> Python mailing lists:
>> https://discuss.python.org/t/disappointed-and-overwhelmed-by-discourse/982.
>> Tastes vary:)
>>
>
> I did not suggest replacing the mailing lists with Discourse.
>
> I suggested integrating documentation with Discourse, so that for each
> function there is a separate discussion topic for this function. For each
> function on the documentation page there can be a "Feedback" or "Comment"
> link that goes to the corresponding discussion topic for that function.
> This way Discourse can be used like a commenting system (similar to
> Disqus). In the discussion page of the function people can upvote the
> function (using the "like" feature of Discourse) and can also explain why
> they think it is important.
>

Oh okay, I misunderstood you. I don't think that's desirable; it's too
complicated and has too much overhead in setting up and maintaining.
Between looking at libraries like Dask and Xtensor, tooling to measure
actual API usage (
https://labs.quansight.org/blog/2019/05/python-package-function-usage/),
and just using our own knowledge, we have enough information to make
choices.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/64e2b845/attachment.html>

From ralf.gommers at gmail.com  Sun Jun  2 10:09:05 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 2 Jun 2019 16:09:05 +0200
Subject: [Numpy-discussion] scientific Python featured in GitHub keynote
In-Reply-To: <CABL7CQjt1UmNSpyoB-0=DDfbxsm1vvytpjWyeTgs2cEU4w5XMQ@mail.gmail.com>
References: <CABL7CQgwdhVw+Nk5hfyticRdgXPR=uZA6VR=qy9N+x-OB6aoYQ@mail.gmail.com>
 <CAB6mnx+xLFsQVSFLsnjD6paYH1NE825CHthS+CtHZU0rU-dGrA@mail.gmail.com>
 <CABL7CQjt1UmNSpyoB-0=DDfbxsm1vvytpjWyeTgs2cEU4w5XMQ@mail.gmail.com>
Message-ID: <CABL7CQib8AqK46O1WGK2DXByX3--3z_uwYodWt74iRzNZqa-Wg@mail.gmail.com>

On Sun, May 26, 2019 at 11:58 AM Ralf Gommers <ralf.gommers at gmail.com>
wrote:

>
>
> On Sun, May 26, 2019 at 2:19 AM Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>>
>>
>> On Sat, May 25, 2019 at 4:09 PM Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> On Thursday I had the pleasure to be at GitHub Satellite, together with
>>> quite a few other maintainers from projects throughout our ecosystem, and
>>> see NumPy, Matplotlib, AstroPy and other projects highlighted prominently
>>> in Nat Friedman's keynote. It included the story of the black hole image,
>>> and the open source software that enabled that image. It's the first 21
>>> minutes of https://www.youtube.com/watch?v=xAbJkn4uRL4.
>>>
>>> Also, we now have "used by" for each repo and the dependency graph (
>>> https://github.com/numpy/numpy/network/dependents): right now there are
>>> 205,240 repos and 13,877 packages on GitHub that depend on NumPy. Those
>>> numbers were not easy to get before, so very useful to have them in the UI
>>> now.
>>>
>>>
>> Thanks for the link. That was a lot of material to digest, do you have
>> thoughts about which things we should be interested in?
>>
>
> The triage role will be very useful (not yet available except as beta,
> being rolled out over the next couple of weeks). It nicely fills the gap
> between "nothing" and "full write access".
>
> The "used by" and the dependency graph features will be very useful when,
> e.g., writing proposals. It's not 100% complete (no OpenBLAS link for us
> for example) but it's better than anything we had before.
>
> I'm still wrapping my head around "sponsors". It's aimed at individuals
> and in general not the best for for NumPy and similar size projects I
> think, but there's a lot to like as well and there may be more coming in
> that direction. For those who are interested in funding/sponsoring, this is
> a nice reflection on the sponsors feature:
> https://nadiaeghbal.com/github-sponsors
>

Okay not entirely accurate. We can and did just add a "<heart> sponsor"
button:) It links to our donate section on numpy.org and to our Tidelift
page.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/c1737d7f/attachment-0001.html>

From sebastian at sipsolutions.net  Sun Jun  2 12:32:37 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 02 Jun 2019 09:32:37 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQiCW6U0wKn7Nw3=xw8Uv0ygWUi-i7oRpopvvcBnJ8wubg@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>
 <CABL7CQiCW6U0wKn7Nw3=xw8Uv0ygWUi-i7oRpopvvcBnJ8wubg@mail.gmail.com>
Message-ID: <dec55758551abcab1d4796cafbfabed3a3fdf517.camel@sipsolutions.net>

On Sun, 2019-06-02 at 08:38 +0200, Ralf Gommers wrote:
> 
> 
> On Sun, Jun 2, 2019 at 3:18 AM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
> > > Our API is huge. A simple count:
> > > main namespace: 600
> > > fft: 30
> > > linalg: 30
> > > random: 60
> > > ndarray: 70
> > > lib: 20
> > > lib.npyio: 35
> > > etc. (many more ill-thought out but not clearly private
> > > submodules)
> > > 
> > 
> > I would perhaps start with ndarray itself. Quite a lot seems
> > superfluous
> > 
> > Shapes:
> > - need: shape, strides, reshape, transpose;
> > - probably: ndim, size, T
> > - less so: nbytes, ravel, flatten, squeeze, and swapaxes.
> > 
> > Getting/setting:
> > - need __getitem__, __setitem__;
> > - less so: fill, put, take, item, itemset, repeat, compress,
> > diagonal;
> > 
> > Datatype/Copies/views/conversion
> > - need: dtype, copy, view, astype, flags
> > - less so: ctypes, dump, dumps, getfield, setfield, itemsize,
> > byteswap, newbyteorder, resize, setflags, tobytes, tofile, tolist,
> > tostring, 
> > 
> > Iteration
> > - need __iter__
> > - less so: flat
> > 
> > Numerics
> > - need: conj, real, imag
> > - maybe also: min, max, mean, sum, std, var, prod, partition, sort,
> > tracet;
> > - less so: the arg* ones, cumsum, cumprod, clip, round, dot, all,
> > any, nonzero, ptp, searchsorted, 
> > choose.
> > 
> 
> Exactly. This is great, thanks Marten. I agree with pretty much
> everything in this list.
> 


It is a bit tricky. I dislike flat for example, but it does have
occasional use cases. min/max, etc. are interesting, in that they are
just aliases to minimum.reduce, and could be argued to be covered by
the ufunc.

For other projects, the list of actual usage statistics may almost be
more interesting then what we can come up with. Although it depends a
bit where the long haul goes (but it seems right now that is not the
proposal).
For example, we could actually mark all our functions, and then you
could test SciPy for "numpy-core" compatible (i.e. test the users). We
could also want to work towards reference tests at some point. It would
be a huge amount of work, but if other projects would want to help
maintain it, maybe it can safe work in the long run?

One thing that I think may be interesting, would be to attempt to make
a graph of what functions can be implemented using other functions.
Example:
   transpose <-> swapaxes <-> moveaxis
   Indexing: delete, insert (+empty_like)
   reshape: atleast_Nd, ravel (+ensure/copy)

(and then find a minimal set, etc.)

Many of these have multiple possible implementations, though. But if we
could create even an indea of such a dependency graph, it could be very
cool to find "what is important".
Much of this is trivial, but maybe it could help to get a picture of
where things might go.

Anyway, it seems like a good idea to do something, but what that
something is and how difficult it would be seems hard to judge. But I
guess that should not stop us from moving. Maybe information of usage
and groupings/opinions on importance is already a lot.

Best,

Sebastian


> Ralf
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/ad5b0f88/attachment.sig>

From einstein.edison at gmail.com  Sun Jun  2 14:03:45 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Sun, 2 Jun 2019 20:03:45 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
 <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
 <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
Message-ID: <fe9b6c01-c5cd-4d84-8ba6-36ccf1a0eff9@Canary>

Re: Successful specifications (I?ll avoid using the word standard):

Moving: HTML5/CSS3, C++, Rust, Python, Java.

Static: C

I?d really like this to be a moving spec... A static one is never much use, and is doomed to miss use cases, either today or some from the future.

Best Regards,
Hameer Abbasi

> On Sunday, Jun 02, 2019 at 9:46 AM, Nathaniel Smith <njs at pobox.com (mailto:njs at pobox.com)> wrote:
> On Sat, Jun 1, 2019 at 11:59 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
> > On Sun, Jun 2, 2019 at 12:35 AM Nathaniel Smith <njs at pobox.com> wrote:
> > >
> > > On Sat, Jun 1, 2019 at 1:05 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
> > > > I think this is potentially useful, but *far* more prescriptive and detailed than I had in mind. Both you and Nathaniel seem to have not understood what I mean by "out of scope", so I think that's my fault in not being explicit enough. I *do not* want to prescribe behavior. Instead, a simple yes/no for each function in numpy and method on ndarray.
> > >
> > > So yes/no are the answers. But what's the question?
> > >
> > > "If we were redesigning numpy in a fantasy world without external
> > > constraints or compatibility issues, would we include this function?"
> > > "Is this function well designed?"
> > > "Do we think that supporting this function is necessary to achieve
> > > practical duck-array compatibility?"
> > > "If someone implements this function, should we give them a 'numpy
> > > core compliant!' logo to put on their website?"
> > > "Do we recommend that people use this function in new code?"
> > > "If we were trying to design a minimal set of primitives and implement
> > > the rest of numpy in terms of them, then is this function a good
> > > candidate for a primitive?"
> > >
> > > These are all really different things, and useful for solving
> > > different problems... I feel like you might be lumping them together
> > > some?
> >
> >
> > No, I feel like you just want to see a real proposal. At this point I've gotten some really useful feedback, in particular from Marten (thanks!), and I have a better idea of what to do. So I'll answer a few of your questions, and propose to leave the rest till I actually have some more solid to discuss. That will likely answer many of your questions.
>
> Okay, that's fine. You scared me a bit with the initial email, but I
> really am trying to be helpful :-). I'm not looking for a detailed
> proposal; I'm just super confused right now about what you're trying
> to accomplish or how this table of yes/no values will help do it. I
> look forward to hearing more!
>
> > > I'm seeing this as a living document (a NEP?)
> >
> > NEP would work. Although I'd prefer a way to be able to reference some fixed version of it rather than it being always in flux.
>
> When I say "living" I mean: it would be seen as documenting our
> consensus and necessarily fuzzy rather than normative and precise like
> most NEPs. Maybe this is obvious and not worth mentioning. But I
> wouldn't expect it to change rapidly. Unless our collective opinions
> change rapidly I guess, but that seems unlikely.
>
> (And of course NEPs are in git so we always have the ability to link
> to a point-in-time snapshot if we need to reference something.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/88e9a141/attachment.html>

From wieser.eric+numpy at gmail.com  Sun Jun  2 14:20:57 2019
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Sun, 2 Jun 2019 11:20:57 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>
Message-ID: <CAL1kJvBRX4aYFTdxgckSAN6_o8utvxjLmmL3X=Ep3o+Lke-pbQ@mail.gmail.com>

Some of your categories here sound like they might be suitable for ABCs
that provide mixin methods, which is something I think Hameer suggested in
the past. Perhaps it's worth re-exploring that avenue.

Eric

On Sat, Jun 1, 2019, 18:18 Marten van Kerkwijk <m.h.vankerkwijk at gmail.com>
wrote:

>
> Our API is huge. A simple count:
>> main namespace: 600
>> fft: 30
>> linalg: 30
>> random: 60
>> ndarray: 70
>> lib: 20
>> lib.npyio: 35
>> etc. (many more ill-thought out but not clearly private submodules)
>>
>>
> I would perhaps start with ndarray itself. Quite a lot seems superfluous
>
> Shapes:
> - need: shape, strides, reshape, transpose;
> - probably: ndim, size, T
> - less so: nbytes, ravel, flatten, squeeze, and swapaxes.
>
> Getting/setting:
> - need __getitem__, __setitem__;
> - less so: fill, put, take, item, itemset, repeat, compress, diagonal;
>
> Datatype/Copies/views/conversion
> - need: dtype, copy, view, astype, flags
> - less so: ctypes, dump, dumps, getfield, setfield, itemsize, byteswap,
> newbyteorder, resize, setflags, tobytes, tofile, tolist, tostring,
>
> Iteration
> - need __iter__
> - less so: flat
>
> Numerics
> - need: conj, real, imag
> - maybe also: min, max, mean, sum, std, var, prod, partition, sort, tracet;
> - less so: the arg* ones, cumsum, cumprod, clip, round, dot, all, any,
> nonzero, ptp, searchsorted,
> choose.
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/d08a218d/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Sun Jun  2 16:07:53 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 2 Jun 2019 16:07:53 -0400
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAL1kJvBRX4aYFTdxgckSAN6_o8utvxjLmmL3X=Ep3o+Lke-pbQ@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>
 <CAL1kJvBRX4aYFTdxgckSAN6_o8utvxjLmmL3X=Ep3o+Lke-pbQ@mail.gmail.com>
Message-ID: <CAJNV+9t8sJNxdepp_dF-OoWZej8ysjAmEwVw1GgJeuSEQxwYCg@mail.gmail.com>

On Sun, Jun 2, 2019 at 2:21 PM Eric Wieser <wieser.eric+numpy at gmail.com>
wrote:

> Some of your categories here sound like they might be suitable for ABCs
> that provide mixin methods, which is something I think Hameer suggested in
> the past. Perhaps it's worth re-exploring that avenue.
>
> Eric
>
>
Indeed, and of course for __array_ufunc__ we moved there a bit already,
with `NDArrayOperatorsMixin` [1].
One could certainly similarly have NDShapingMixin that, e.g., relied on
`shape`, `reshape`, and `transpose` to implement `ravel`, `swapaxes`, etc.
And indeed use those mixins in `ndarray` itself.

For this also having a summary of base functions/methods would be very
helpful.
-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/6b6264af/attachment.html>

From shoyer at gmail.com  Sun Jun  2 16:15:31 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sun, 2 Jun 2019 13:15:31 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAJNV+9t8sJNxdepp_dF-OoWZej8ysjAmEwVw1GgJeuSEQxwYCg@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>
 <CAL1kJvBRX4aYFTdxgckSAN6_o8utvxjLmmL3X=Ep3o+Lke-pbQ@mail.gmail.com>
 <CAJNV+9t8sJNxdepp_dF-OoWZej8ysjAmEwVw1GgJeuSEQxwYCg@mail.gmail.com>
Message-ID: <CAEQ_Tvd5MuNLi4N2kyzsBYy=joMvJZvdeoaOU3evfhAi0h3vzg@mail.gmail.com>

On Sun, Jun 2, 2019 at 1:08 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

>
>
> On Sun, Jun 2, 2019 at 2:21 PM Eric Wieser <wieser.eric+numpy at gmail.com>
> wrote:
>
>> Some of your categories here sound like they might be suitable for ABCs
>> that provide mixin methods, which is something I think Hameer suggested in
>> the past. Perhaps it's worth re-exploring that avenue.
>>
>> Eric
>>
>>
> Indeed, and of course for __array_ufunc__ we moved there a bit already,
> with `NDArrayOperatorsMixin` [1].
> One could certainly similarly have NDShapingMixin that, e.g., relied on
> `shape`, `reshape`, and `transpose` to implement `ravel`, `swapaxes`, etc.
> And indeed use those mixins in `ndarray` itself.
>
> For this also having a summary of base functions/methods would be very
> helpful.
> -- Marten
>


I would definitely support writing more mixins and helper functions (either
in NumPy, or externally) to make it easier to re-implement NumPy's public
API. Certainly there is plenty of room to make it easier to leverage
__array_ufunc__ and __array_function__.

For some recent examples of what these helpers functions could look like,
see JAX's implementation of NumPy, which is written in terms of a much
smaller array library called LAX:
https://github.com/google/jax/blob/9dfe27880517d5583048e7a3384b504681968fb4/jax/numpy/lax_numpy.py

Hypothetically, JAX could be written on top of a "restricted NumPy"
instead, which in turn could have an implementation written in LAX. This
would facilitate reusing JAX's higher level functions for automatic
differentiation and vectorization on top of different array backends.

I would also be happy to see guidance for NumPy API re-implementers, both
for those scratching from scratch (e.g., in a new language) or who plan to
copy NumPy's Python API (e.g., with __array_function__).

I would focus on:
1. Describing the tradeoffs of challenging design decisions that NumPy may
have gotten wrong, e.g., scalars and indexing.
2. Describing common "gotchas" where it's easy to deviate from NumPy's
semantics unintentionally, e.g., with scalar arithmetic dtypes or indexing
edge cases.

I would *not* try to identify a "core" list of methods/functionality to
implement. Everyone uses their own slice of NumPy's API, so the rational
approach for anyone trying to reimplement exactly (i.e., with
__array_function__) is to start with a minimal subset and add functionality
on demand to meet user's needs. Also, many of the choices involved in
making an array library don't really have objectively right or wrong
answers, and authors are going to make intentional deviations from NumPy's
semantics when it makes sense for them.

Cheers,
Stephan


>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/e2a0166f/attachment.html>

From madicken.munk at gmail.com  Sun Jun  2 16:46:14 2019
From: madicken.munk at gmail.com (Madicken Munk)
Date: Sun, 2 Jun 2019 15:46:14 -0500
Subject: [Numpy-discussion] Only a few days left to submit! -- 2019 John
 Hunter Excellence in Plotting Contest
Message-ID: <CACZp+DFGNQa9irEGk-f4DTFc9G_79iv7WxcdJOdJ84Vt2eTGdw@mail.gmail.com>

Hi everybody, There are only a few days left to submit to the 2019 John
Hunter Excellence in Plotting Contest! If you're interested in
participating, note that you have until June 8th to prepare your
submission.


In memory of John Hunter, we are pleased to be reviving the SciPy John
Hunter Excellence in Plotting Competition for 2019. This open competition
aims to highlight the importance of data visualization to scientific
progress and showcase the capabilities of open source software.

Participants are invited to submit scientific plots to be judged by a
panel. The winning entries will be announced and displayed at the
conference.

John Hunter?s family and NumFocus are graciously sponsoring cash prizes for
the winners in the following amounts:


   -

   1st prize: $1000
   -

   2nd prize: $750
   -

   3rd prize: $500


   -

   Entries must be submitted by June, 8th to the form at
   https://goo.gl/forms/cFTB3FUBrMPfQ7Vz1
   -

   Winners will be announced at Scipy 2019 in Austin, TX.
   -

   Participants do not need to attend the Scipy conference.
   -

   Entries may take the definition of ?visualization? rather broadly.
   Entries may be, for example, a traditional printed plot, an interactive
   visualization for the web, or an animation.
   -

   Source code for the plot must be provided, in the form of Python code
   and/or a Jupyter notebook, along with a rendering of the plot in a widely
   used format.  This may be, for example, PDF for print, standalone HTML and
   Javascript for an interactive plot, or MPEG-4 for a video. If the original
   data can not be shared for reasons of size or licensing, "fake" data may be
   substituted, along with an image of the plot using real data.
   -

   Each entry must include a 300-500 word abstract describing the plot and
   its importance for a general scientific audience.
   -

   Entries will be judged on their clarity, innovation and aesthetics, but
   most importantly for their effectiveness in communicating a real-world
   problem. Entrants are encouraged to submit plots that were used during the
   course of research or work, rather than merely being hypothetical.
   -

   SciPy reserves the right to display any and all entries, whether
   prize-winning or not, at the conference, use in any materials or on its
   website, with attribution to the original author(s).


SciPy John Hunter Excellence in Plotting Competition Co-Chairs

Hannah Aizenman

Thomas Caswell

Madicken Munk

Nelle Varoquaux
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/52941139/attachment-0001.html>

From chris.barker at noaa.gov  Sun Jun  2 17:47:22 2019
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Sun, 2 Jun 2019 14:47:22 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQiCW6U0wKn7Nw3=xw8Uv0ygWUi-i7oRpopvvcBnJ8wubg@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>
 <CABL7CQiCW6U0wKn7Nw3=xw8Uv0ygWUi-i7oRpopvvcBnJ8wubg@mail.gmail.com>
Message-ID: <CALGmxE+pWo6+=QLuY4ggPbjiWDajSbtUu4JMyrg+pbQnsc_b8g@mail.gmail.com>

> Exactly. This is great, thanks Marten. I agree with pretty much everything in this list.

For my part, a few things immediately popped out at my that I disagree with. ;-)

Which does not mean it isn?t a useful exercise, but it does mean we
should expect a fair bit of debate.

But I do think we should be clear as to what the point is:

I think it could be helpful for clarifying for new and long standing
users of numpy what the ?numpythonic? way to use numpy is.

I think this is very closely tied to the duck typing discussion.

But for guiding implementations of ?numpy-like? libraries, not so
much: they are going to implement the features their users need ?
whether it?s ?officially? part of the numpy API is a minor concern.
Unless there is an official ?Standard?, but it doesn?t sound like
anyone has that in mind.

I?m also a bit confused as to the scope: is this effort about the
python API only? In which case, I?m not sure how it relates to
libraries in/for other languages. Or only about those that provide a
Python binding?

When I first read the topic of this thread, I expected it to be about
the C API ? it would be nice to clearly define what parts of the C API
are considered public and stable. (Though maybe that?s already done ?
I do get numpy API deprecation warnings at times..)

-CHB

From chris.barker at noaa.gov  Sun Jun  2 18:07:07 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Sun, 2 Jun 2019 15:07:07 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAMucfLxOfMkvNtRJYdqwPORHWknUeDv9bq6TjwroenP_LU27Sg@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAPJVwBmoXOTdrtrKVkKDtxnsZsEkGnWBRfbMGE5OXxgqe5JSXQ@mail.gmail.com>
 <CABL7CQiK-mY1obQ+R3qY+21zdhs8jYWpF19q_5DWpmBwQe6ETw@mail.gmail.com>
 <CAPJVwBkX+UuzTmR9ibUjhMTnisqX1Q7uJr+-s4G3eHuJgNwA0g@mail.gmail.com>
 <CABL7CQhCjJ1nG5gfGxfumzPd4reMhoUBtWRN+-Or+XaN9Hbp6w@mail.gmail.com>
 <CAMucfLzo2Akx1Am0jpPmD_pOwgtdMfooJLQ1kpQtn19O7eaqBg@mail.gmail.com>
 <CABL7CQiZ=tMhte-x4dp9tFAbuD7f3jnKFHEWjMJawz_-6=vjRQ@mail.gmail.com>
 <CABL7CQh1DwQbAoNF_BWVvCVQsPWLTd2wP2P=j+uxKvHPvcoF+A@mail.gmail.com>
 <CAMucfLxOfMkvNtRJYdqwPORHWknUeDv9bq6TjwroenP_LU27Sg@mail.gmail.com>
Message-ID: <CALGmxEJjZf_NPN1Yx4E2SPB_=Gt3S8nNPMA2QV6rTtvBGhaBLg@mail.gmail.com>

On Sun, Jun 2, 2019 at 3:45 AM Dashamir Hoxha <dashohoxha at gmail.com> wrote:

>
> Would it be useful if we could integrate the documentation system with a
> discussion forum (like Discourse.org)? Each function can be linked to its
> own discussion topic, where users and developers can discuss about the
> function, upvote or downvote it etc. This kind of discussion seems to be a
> bit more structured than a mailing list discussion.
>

We could make a giHub repo for a document, and use issues to separately
discuss each topic.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190602/c85d9108/attachment.html>

From einstein.edison at gmail.com  Sun Jun  2 20:54:45 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Mon, 3 Jun 2019 02:54:45 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAEQ_Tvd5MuNLi4N2kyzsBYy=joMvJZvdeoaOU3evfhAi0h3vzg@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAJNV+9ukoz7Z9NbWDb+hXAC8bkx3Ok2RrNYU5fYQav9DVMTU+w@mail.gmail.com>
 <CAL1kJvBRX4aYFTdxgckSAN6_o8utvxjLmmL3X=Ep3o+Lke-pbQ@mail.gmail.com>
 <CAJNV+9t8sJNxdepp_dF-OoWZej8ysjAmEwVw1GgJeuSEQxwYCg@mail.gmail.com>
 <CAEQ_Tvd5MuNLi4N2kyzsBYy=joMvJZvdeoaOU3evfhAi0h3vzg@mail.gmail.com>
Message-ID: <d3d1f8e1-c9ec-4a70-b421-48bd67b18d10@Canary>

I would agree that the set should be minimal at first, but would comment that we should still have a better taxonomy of functions that should be supported, in terms of the functionality they provide and functionality that is required for them to work. E.g. __setitem__ needs immutability.

Best Regards,
Hameer Abbasi

> On Sunday, Jun 02, 2019 at 10:16 PM, Stephan Hoyer <shoyer at gmail.com (mailto:shoyer at gmail.com)> wrote:
> On Sun, Jun 2, 2019 at 1:08 PM Marten van Kerkwijk <m.h.vankerkwijk at gmail.com (mailto:m.h.vankerkwijk at gmail.com)> wrote:
> >
> >
> > On Sun, Jun 2, 2019 at 2:21 PM Eric Wieser <wieser.eric+numpy at gmail.com (mailto:wieser.eric%2Bnumpy at gmail.com)> wrote:
> > > Some of your categories here sound like they might be suitable for ABCs that provide mixin methods, which is something I think Hameer suggested in the past. Perhaps it's worth re-exploring that avenue.
> > >
> > > Eric
> > >
> >
> > Indeed, and of course for __array_ufunc__ we moved there a bit already, with `NDArrayOperatorsMixin` [1].
> > One could certainly similarly have NDShapingMixin that, e.g., relied on `shape`, `reshape`, and `transpose` to implement `ravel`, `swapaxes`, etc. And indeed use those mixins in `ndarray` itself.
> >
> > For this also having a summary of base functions/methods would be very helpful.
> > -- Marten
>
>
> I would definitely support writing more mixins and helper functions (either in NumPy, or externally) to make it easier to re-implement NumPy's public API. Certainly there is plenty of room to make it easier to leverage __array_ufunc__ and __array_function__.
>
> For some recent examples of what these helpers functions could look like, see JAX's implementation of NumPy, which is written in terms of a much smaller array library called LAX:
> https://github.com/google/jax/blob/9dfe27880517d5583048e7a3384b504681968fb4/jax/numpy/lax_numpy.py
>
> Hypothetically, JAX could be written on top of a "restricted NumPy" instead, which in turn could have an implementation written in LAX. This would facilitate reusing JAX's higher level functions for automatic differentiation and vectorization on top of different array backends.
>
> I would also be happy to see guidance for NumPy API re-implementers, both for those scratching from scratch (e.g., in a new language) or who plan to copy NumPy's Python API (e.g., with __array_function__).
>
> I would focus on:
> 1. Describing the tradeoffs of challenging design decisions that NumPy may have gotten wrong, e.g., scalars and indexing.
> 2. Describing common "gotchas" where it's easy to deviate from NumPy's semantics unintentionally, e.g., with scalar arithmetic dtypes or indexing edge cases.
>
> I would *not* try to identify a "core" list of methods/functionality to implement. Everyone uses their own slice of NumPy's API, so the rational approach for anyone trying to reimplement exactly (i.e., with __array_function__) is to start with a minimal subset and add functionality on demand to meet user's needs. Also, many of the choices involved in making an array library don't really have objectively right or wrong answers, and authors are going to make intentional deviations from NumPy's semantics when it makes sense for them.
>
> Cheers,
> Stephan
>
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org (mailto:NumPy-Discussion at python.org)
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190603/0b8d8944/attachment-0001.html>

From sebastian at sipsolutions.net  Mon Jun  3 13:55:38 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 03 Jun 2019 12:55:38 -0500
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQj62dZLCrc33D6ME35KsDnT-rCCD=9dcJP1C-Sg9e8_=g@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAMucfLzBC1YSjmbOOe7hWFoRWkX-pqitP6m23W1BikGH1HdjZg@mail.gmail.com>
 <CABL7CQj62dZLCrc33D6ME35KsDnT-rCCD=9dcJP1C-Sg9e8_=g@mail.gmail.com>
Message-ID: <5b80b21127a82dade2539bf3ecdffb011b9695d5.camel@sipsolutions.net>

On Sun, 2019-06-02 at 08:42 +0200, Ralf Gommers wrote:
> 
> 
<snip>
> > > 
> > 
> > This sounds like a restructuring or factorization of the API, in
> > order to make it smaller, and thus easier to learn and use.
> > It may start with the docs, by paying more attention to the "core"
> > or important functions and methods, and noting the deprecated, or
> > not frequently used, or not important functions. This could also
> > help the satellite projects, which use NumPy API as an example, and
> > may also be influenced by them and their decisions.
> > 
> 
>  Indeed. It will help restructure our docs. Perhaps not the reference
> guide (not sure yet), but definitely the user guide and other high-
> level docs we (or third parties) may want to create.
> 

Trying to follow the discussion, there seems to be various ideas? Do I
understand it right that the original proposal was much like doing a
list of:

  * np.ndarray.cumprod: low importance -> prefer np.multiply.accumulate
  * np.ravel_multi_index: low importance, but distinct feature

Maybe with added groups such as "transpose-like" and "reshape-like"
functions?
This would be based on 1. "Experience" and 2. usage statistics. This
seems mostly a task for 2-3 people to then throw out there for
discussion.
There will be some very difficult/impossible calls, since in the end
Nathaniel is right, we do not quite know the question we want to
answer. But for a huge part of the API it may not be problematic?


Then there is an idea of providing better mixins (and tests).
This could be made easier by the first idea, for prioritization.
Although, the first idea is probably not really necessary to kick this
off at all. The interesting parts to me seem likely how to best solve
testing of the mixins and numpy-api-duplicators in general.

Implementing a growing set of mixin seems likely fairly straight
forwrad (although maybe much easier to approach if there is a list from
the first project)? And, once we have a start, maybe we can rely on the
array-like implementors to be the main developers (limiting us mostly
to review).


The last part would be probably for users and consumers of array-likes. 
This largely overlaps, but comes closer to the problem of "standard".
If we have a list of functions that we tend to see as more or less
important, it may be interesting for downstream projects to restrict
themselves to simplify interoperability e.g. with dask.

Maybe we do not have to draw a strict line though? How plausible would
it be to set up a list (best auto-updating) saying nothing but:

`np.concatenate` supported by: dask, jax, cupy


I am not sure if this is helpful, but it feels to me that the first
part is what Ralf was thinking of? Just to kick of such a a "living
document". I could maybe help with providing the second pair of eyes
for a first iteration there, Ralf. The last list I would actually find
interesting myself, but not sure how easy it would be to approach it?

Best,

Sebastian


> Ralf
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190603/187eb5eb/attachment.sig>

From stefanv at berkeley.edu  Mon Jun  3 16:25:44 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Mon, 3 Jun 2019 13:25:44 -0700
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
Message-ID: <20190603202544.bocyliaqqfm6dn3m@carbo>

Hi Marten,

On Sat, 01 Jun 2019 12:11:38 -0400, Marten van Kerkwijk wrote:
> Third, we could actual implementing the logical groupings identified in the
> code base (and describing them!). Currently, it is a mess: for the C files,
> I typically have to grep to even find where things are done, and while for
> the functions defined in python files that is not necessary, many have
> historical rather than logical groupings (looking at you, `from_numeric`!),
> and even more descriptive ones like `shape_base` are split over `lib` and
> `core`. I think it would help everybody if we went to a python-like layout,
> with a true core and libraries such as polynomial, fft, ma, etc.

How hard do you think it would be to address this issue?  You seem to
have some notion of which pain points should be prioritized, and it
might be useful to jot those down somewhere (tracking issue on GitHub?).

St?fan

From m.h.vankerkwijk at gmail.com  Mon Jun  3 21:00:42 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 3 Jun 2019 21:00:42 -0400
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <20190603202544.bocyliaqqfm6dn3m@carbo>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <20190603202544.bocyliaqqfm6dn3m@carbo>
Message-ID: <CAJNV+9snKBUj8PeOE7N8vhfS+srp34Gbx9j_RXuHExfBwbb2bg@mail.gmail.com>

Hi Stefan,

On Mon, Jun 3, 2019 at 4:26 PM Stefan van der Walt <stefanv at berkeley.edu>
wrote:

> Hi Marten,
>
> On Sat, 01 Jun 2019 12:11:38 -0400, Marten van Kerkwijk wrote:
> > Third, we could actual implementing the logical groupings identified in
> the
> > code base (and describing them!). Currently, it is a mess: for the C
> files,
> > I typically have to grep to even find where things are done, and while
> for
> > the functions defined in python files that is not necessary, many have
> > historical rather than logical groupings (looking at you,
> `from_numeric`!),
> > and even more descriptive ones like `shape_base` are split over `lib` and
> > `core`. I think it would help everybody if we went to a python-like
> layout,
> > with a true core and libraries such as polynomial, fft, ma, etc.
>
> How hard do you think it would be to address this issue?  You seem to
> have some notion of which pain points should be prioritized, and it
> might be useful to jot those down somewhere (tracking issue on GitHub?).
>

The python side would, I think, not be too hard. But I don't really have
that much of a notion - it would very much be informed by making a list
first. For the C parts, I feel even more at a loss: one really would have
to start with a summary of what is actually there (and I think the
organization may well be quite logical already; I've not so felt it was
wrong as in need of an overview).

Somewhat of an aside, but relevant for the general discussion:
updating/rewriting the user documentation may well be the best *first*
step. It certainly doesn't hurt to try to make some list now, but my guess
that the best one will emerge only when one tries to summarize what a new
user should know/understand.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190603/05985c87/attachment.html>

From ralf.gommers at gmail.com  Tue Jun  4 03:04:54 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 4 Jun 2019 09:04:54 +0200
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <5b80b21127a82dade2539bf3ecdffb011b9695d5.camel@sipsolutions.net>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAMucfLzBC1YSjmbOOe7hWFoRWkX-pqitP6m23W1BikGH1HdjZg@mail.gmail.com>
 <CABL7CQj62dZLCrc33D6ME35KsDnT-rCCD=9dcJP1C-Sg9e8_=g@mail.gmail.com>
 <5b80b21127a82dade2539bf3ecdffb011b9695d5.camel@sipsolutions.net>
Message-ID: <CABL7CQgEC5-sPNxew196=c7WqspDLB0yR0P7U2Yh4J4BkDhBNw@mail.gmail.com>

On Mon, Jun 3, 2019 at 7:56 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Sun, 2019-06-02 at 08:42 +0200, Ralf Gommers wrote:
> >
> >
> <snip>
> > > >
> > >
> > > This sounds like a restructuring or factorization of the API, in
> > > order to make it smaller, and thus easier to learn and use.
> > > It may start with the docs, by paying more attention to the "core"
> > > or important functions and methods, and noting the deprecated, or
> > > not frequently used, or not important functions. This could also
> > > help the satellite projects, which use NumPy API as an example, and
> > > may also be influenced by them and their decisions.
> > >
> >
> >  Indeed. It will help restructure our docs. Perhaps not the reference
> > guide (not sure yet), but definitely the user guide and other high-
> > level docs we (or third parties) may want to create.
> >
>
> Trying to follow the discussion, there seems to be various ideas? Do I
> understand it right that the original proposal was much like doing a
> list of:
>
>   * np.ndarray.cumprod: low importance -> prefer np.multiply.accumulate
>   * np.ravel_multi_index: low importance, but distinct feature
>

Indeed. Certainly no more than that was my idea.


> Maybe with added groups such as "transpose-like" and "reshape-like"
> functions?
> This would be based on 1. "Experience" and 2. usage statistics. This
> seems mostly a task for 2-3 people to then throw out there for
> discussion.
> There will be some very difficult/impossible calls, since in the end
> Nathaniel is right, we do not quite know the question we want to
> answer. But for a huge part of the API it may not be problematic?
>

Agreed, won't be problematic.


>
> Then there is an idea of providing better mixins (and tests).
> This could be made easier by the first idea, for prioritization.
> Although, the first idea is probably not really necessary to kick this
> off at all. The interesting parts to me seem likely how to best solve
> testing of the mixins and numpy-api-duplicators in general.
>
> Implementing a growing set of mixin seems likely fairly straight
> forwrad (although maybe much easier to approach if there is a list from
> the first project)?


Indeed. I think there's actually 3 levels here (at least):
1. function name: high/low importance or some such simple classification
2. function signature and behavior: is the behavior optimal, what would be
change, etc.
3. making duck arrays and subclasses that rely on all those functions and
their behavior easier to implemement/use

Mixins are a specific answer to (3). And it's unclear if they're the best
answer (could be, I don't know - please don't start a discussion on that
here). Either way, working on (3) will be helped by having a better sense
of (1) and (2).

Also think about effort: (2) is at least an order of magnitude more work
than (1), and (3) likely even more work than (2).


> And, once we have a start, maybe we can rely on the
> array-like implementors to be the main developers (limiting us mostly
> to review).
>
>
> The last part would be probably for users and consumers of array-likes.
> This largely overlaps, but comes closer to the problem of "standard".
> If we have a list of functions that we tend to see as more or less
> important, it may be interesting for downstream projects to restrict
> themselves to simplify interoperability e.g. with dask.
>
> Maybe we do not have to draw a strict line though? How plausible would
> it be to set up a list (best auto-updating) saying nothing but:
>
> `np.concatenate` supported by: dask, jax, cupy
>

That's probably not that hard, and I agree it would be quite useful. The
namespaces of each of those libraries is probably not the same, but with
dir() and some strings and lists you'll get a long way here I think.


>
> I am not sure if this is helpful, but it feels to me that the first
> part is what Ralf was thinking of? Just to kick of such a a "living
> document".


Indeed.

I could maybe help with providing the second pair of eyes
> for a first iteration there, Ralf.


Awesome, thanks Sebastian.

Cheers,
Ralf


The last list I would actually find
> interesting myself, but not sure how easy it would be to approach it?
>
> Best,
>
> Sebastian
>
>
> > Ralf
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190604/eaeebadc/attachment-0001.html>

From tyler.je.reddy at gmail.com  Tue Jun  4 19:14:44 2019
From: tyler.je.reddy at gmail.com (Tyler Reddy)
Date: Tue, 4 Jun 2019 16:14:44 -0700
Subject: [Numpy-discussion] Weekly Community Meeting -- June 5/ 2019
Message-ID: <CAHPuU_bxyqVNk9dDaG0TNLS+eB2C+iv8bFpt4=Qzg0At=HUewQ@mail.gmail.com>

Hi,

There will be a weekly community call on June 5/ 2019--anyone may join and
edit the work-in-progress meeting notes:
https://hackmd.io/5fKOqla6SIqKJMtB7w5Law?view

The conference call link should be in that document.

Best wishes,
Tyler
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190604/80be3681/attachment.html>

From tyler.je.reddy at gmail.com  Tue Jun  4 20:28:51 2019
From: tyler.je.reddy at gmail.com (Tyler Reddy)
Date: Tue, 4 Jun 2019 17:28:51 -0700
Subject: [Numpy-discussion] Weekly Community Meeting -- June 5/ 2019
In-Reply-To: <CAHPuU_bxyqVNk9dDaG0TNLS+eB2C+iv8bFpt4=Qzg0At=HUewQ@mail.gmail.com>
References: <CAHPuU_bxyqVNk9dDaG0TNLS+eB2C+iv8bFpt4=Qzg0At=HUewQ@mail.gmail.com>
Message-ID: <CAHPuU_ZeKdFgxWj4nM1crVwz365igvPg14keRbAhisCEXxDz+g@mail.gmail.com>

11 am Pacific Time

On Tue, 4 Jun 2019 at 16:14, Tyler Reddy <tyler.je.reddy at gmail.com> wrote:

> Hi,
>
> There will be a weekly community call on June 5/ 2019--anyone may join and
> edit the work-in-progress meeting notes:
> https://hackmd.io/5fKOqla6SIqKJMtB7w5Law?view
>
> The conference call link should be in that document.
>
> Best wishes,
> Tyler
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190604/55af27c5/attachment.html>

From chris.barker at noaa.gov  Tue Jun  4 22:50:41 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 4 Jun 2019 23:50:41 -0300
Subject: [Numpy-discussion] defining a NumPy API standard?
In-Reply-To: <CABL7CQgEC5-sPNxew196=c7WqspDLB0yR0P7U2Yh4J4BkDhBNw@mail.gmail.com>
References: <CABL7CQiEUcym-4y_F-Yjn_m0mdREmvKhO=BAWEmyxWRK_wUR0w@mail.gmail.com>
 <DE2557AE-3865-40B1-AA80-C95FF07BD3E4@mac.com>
 <CAJNV+9uUSZjJxitiVbDXe2pg-c9NN61dYQtxn6zUQQD12GPM7A@mail.gmail.com>
 <CAB6mnx+=u5xkFhAoon0AiDOs9uoiS8ar+Er-eLkDh20d0i+G_Q@mail.gmail.com>
 <631ce03d-0b01-c491-1b23-a1469a87d47e@gmail.com>
 <CABL7CQi1sJmk6ZHWvF3EiA_acFG9NNFhwbujKp=Q1aAB9AwGaw@mail.gmail.com>
 <CAMucfLzBC1YSjmbOOe7hWFoRWkX-pqitP6m23W1BikGH1HdjZg@mail.gmail.com>
 <CABL7CQj62dZLCrc33D6ME35KsDnT-rCCD=9dcJP1C-Sg9e8_=g@mail.gmail.com>
 <5b80b21127a82dade2539bf3ecdffb011b9695d5.camel@sipsolutions.net>
 <CABL7CQgEC5-sPNxew196=c7WqspDLB0yR0P7U2Yh4J4BkDhBNw@mail.gmail.com>
Message-ID: <CALGmxEK=QV6bZoKKeYkbRn6weUhAVxi2OObEq2CHA2O-R8ijJw@mail.gmail.com>

One little point here:


>   * np.ndarray.cumprod: low importance -> prefer np.multiply.accumulate
>>
>
I think that's and example of something that *should* be part of the numpy
API, but should be implemented as a mixin, based on np.multiply.accumulate.

As I'm a still confused about the goal here, that means that:

Users should still use `.cumprod`, but implementers of numpy-like packages
should implement `.multiply.accumulate`, and not directly `cumprod`, but
rather use the numpy ABC, or however it is implemented.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190604/146bc356/attachment.html>

From sebastian at sipsolutions.net  Wed Jun  5 16:41:55 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 05 Jun 2019 15:41:55 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
Message-ID: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>

Hi all,

TL;DR:

Value based promotion seems complex both for users and ufunc-
dispatching/promotion logic. Is there any way we can move forward here,
and if we do, could we just risk some possible (maybe not-existing)
corner cases to break early to get on the way?

-----------

Currently when you write code such as:

arr = np.array([1, 43, 23], dtype=np.uint16)
res = arr + 1

Numpy uses fairly sophisticated logic to decide that `1` can be
represented as a uint16, and thus for all unary functions (and most
others as well), the output will have a `res.dtype` of uint16.

Similar logic also exists for floating point types, where a lower
precision floating point can be used:

arr = np.array([1, 43, 23], dtype=np.float32)
(arr + np.float64(2.)).dtype  # will be float32

Currently, this value based logic is enforced by checking whether the
cast is possible: "4" can be cast to int8, uint8. So the first call
above will at some point check if "uint16 + uint16 -> uint16" is a
valid operation, find that it is, and thus stop searching. (There is
the additional logic, that when both/all operands are scalars, it is
not applied).

Note that while it is defined in terms of casting "1" to uint8 safely
being possible even though 1 may be typed as int64. This logic thus
affects all promotion rules as well (i.e. what should the output dtype
be).


There 2 main discussion points/issues about it:

1. Should value based casting/promotion logic exist at all?

Arguably an `np.int32(3)` has type information attached to it, so why
should we ignore it. It can also be tricky for users, because a small
change in values can change the result data type.
Because 0-D arrays and scalars are too close inside numpy (you will
often not know which one you get). There is not much option but to
handle them identically. However, it seems pretty odd that:
 * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
 * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)

give a different result.

This is a bit different for python scalars, which do not have a type
attached already.


2. Promotion and type resolution in Ufuncs:

What is currently bothering me is that the decision what the output
dtypes should be currently depends on the values in complicated ways.
It would be nice if we can decide which type signature to use without
actually looking at values (or at least only very early on).

One reason here is caching and simplicity. I would like to be able to
cache which loop should be used for what input. Having value based
casting in there bloats up the problem.
Of course it currently works OK, but especially when user dtypes come
into play, caching would seem like a nice optimization option.

Because `uint8(127)` can also be a `int8`, but uint8(128) it is not as
simple as finding the "minimal" dtype once and working with that." 
Of course Eric and I discussed this a bit before, and you could create
an internal "uint7" dtype which has the only purpose of flagging that a
cast to int8 is safe.

I suppose it is possible I am barking up the wrong tree here, and this
caching/predictability is not vital (or can be solved with such an
internal dtype easily, although I am not sure it seems elegant).


Possible options to move forward
--------------------------------

I have to still see a bit how trick things are. But there are a few
possible options. I would like to move the scalar logic to the
beginning of ufunc calls:
  * The uint7 idea would be one solution
  * Simply implement something that works for numpy and all except
    strange external ufuncs (I can only think of numba as a plausible
    candidate for creating such).

My current plan is to see where the second thing leaves me.

We also should see if we cannot move the whole thing forward, in which
case the main decision would have to be forward to where. My opinion is
currently that when a type has a dtype associated with it clearly, we
should always use that dtype in the future. This mostly means that
numpy dtypes such as `np.int64` will always be treated like an int64,
and never like a `uint8` because they happen to be castable to that.

For values without a dtype attached (read python integers, floats), I
see three options, from more complex to simpler:

1. Keep the current logic in place as much as possible
2. Only support value based promotion for operators, e.g.:
   `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
   The upside is that it limits the complexity to a much simpler
   problem, the downside is that the ufunc call and operator match
   less clearly.
3. Just associate python float with float64 and python integers with
   long/int64 and force users to always type them explicitly if they
   need to.

The downside of 1. is that it doesn't help with simplifying the current
situation all that much, because we still have the special casting
around...


I have realized that this got much too long, so I hope it makes sense.
I will continue to dabble along on these things a bit, so if nothing
else maybe writing it helps me to get a bit clearer on things...

Best,

Sebastian


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190605/8bb5aabb/attachment.sig>

From shoyer at gmail.com  Wed Jun  5 17:14:40 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Wed, 5 Jun 2019 14:14:40 -0700
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
Message-ID: <CAEQ_TvfkLR2Yo0OUL+MdokbQv6V+qC6Xk00tSzr-FkpTkZetDg@mail.gmail.com>

On Wed, Jun 5, 2019 at 1:43 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> TL;DR:
>
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
>
> -----------
>
> Currently when you write code such as:
>
> arr = np.array([1, 43, 23], dtype=np.uint16)
> res = arr + 1
>
> Numpy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all unary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
>
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
>
> arr = np.array([1, 43, 23], dtype=np.float32)
> (arr + np.float64(2.)).dtype  # will be float32
>
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8, uint8. So the first call
> above will at some point check if "uint16 + uint16 -> uint16" is a
> valid operation, find that it is, and thus stop searching. (There is
> the additional logic, that when both/all operands are scalars, it is
> not applied).
>
> Note that while it is defined in terms of casting "1" to uint8 safely
> being possible even though 1 may be typed as int64. This logic thus
> affects all promotion rules as well (i.e. what should the output dtype
> be).
>
>
> There 2 main discussion points/issues about it:
>
> 1. Should value based casting/promotion logic exist at all?
>
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it. It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get). There is not much option but to
> handle them identically. However, it seems pretty odd that:
>  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
>  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
>
> give a different result.
>
> This is a bit different for python scalars, which do not have a type
> attached already.
>
>
> 2. Promotion and type resolution in Ufuncs:
>
> What is currently bothering me is that the decision what the output
> dtypes should be currently depends on the values in complicated ways.
> It would be nice if we can decide which type signature to use without
> actually looking at values (or at least only very early on).
>
> One reason here is caching and simplicity. I would like to be able to
> cache which loop should be used for what input. Having value based
> casting in there bloats up the problem.
> Of course it currently works OK, but especially when user dtypes come
> into play, caching would seem like a nice optimization option.
>
> Because `uint8(127)` can also be a `int8`, but uint8(128) it is not as
> simple as finding the "minimal" dtype once and working with that."
> Of course Eric and I discussed this a bit before, and you could create
> an internal "uint7" dtype which has the only purpose of flagging that a
> cast to int8 is safe.
>

Does NumPy actually have an logic that does these sort of checks currently?
If so, it would be interesting to see what it is.

My experiments suggest that we currently have this logic of finding the
"minimal" dtype that can hold the scalar value:

>>> np.array([127], dtype=np.int8) + 127 # silent overflow!
array([-2], dtype=int8)

>>> np.array([127], dtype=np.int8) + 128 # correct result
array([255], dtype=int16)


I suppose it is possible I am barking up the wrong tree here, and this
> caching/predictability is not vital (or can be solved with such an
> internal dtype easily, although I am not sure it seems elegant).
>
>
> Possible options to move forward
> --------------------------------
>
> I have to still see a bit how trick things are. But there are a few
> possible options. I would like to move the scalar logic to the
> beginning of ufunc calls:
>   * The uint7 idea would be one solution
>   * Simply implement something that works for numpy and all except
>     strange external ufuncs (I can only think of numba as a plausible
>     candidate for creating such).
>
> My current plan is to see where the second thing leaves me.
>
> We also should see if we cannot move the whole thing forward, in which
> case the main decision would have to be forward to where. My opinion is
> currently that when a type has a dtype associated with it clearly, we
> should always use that dtype in the future. This mostly means that
> numpy dtypes such as `np.int64` will always be treated like an int64,
> and never like a `uint8` because they happen to be castable to that.
>
> For values without a dtype attached (read python integers, floats), I
> see three options, from more complex to simpler:
>
> 1. Keep the current logic in place as much as possible
> 2. Only support value based promotion for operators, e.g.:
>    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
>    The upside is that it limits the complexity to a much simpler
>    problem, the downside is that the ufunc call and operator match
>    less clearly.
> 3. Just associate python float with float64 and python integers with
>    long/int64 and force users to always type them explicitly if they
>    need to.
>
> The downside of 1. is that it doesn't help with simplifying the current
> situation all that much, because we still have the special casting
> around...
>

I think it would be fine to special case operators, but NEP-13 means that
the ufuncs corresponding to operators really do need to work exactly the
same way. So we should also special-case those ufuncs.

I don't think Option (3) is viable. Too many users rely upon arithmetic
like "x + 1" having a predictable dtype.


> I have realized that this got much too long, so I hope it makes sense.
> I will continue to dabble along on these things a bit, so if nothing
> else maybe writing it helps me to get a bit clearer on things...
>
> Best,
>
> Sebastian
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190605/64d119d4/attachment-0001.html>

From sebastian at sipsolutions.net  Wed Jun  5 18:02:40 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 05 Jun 2019 17:02:40 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <CAEQ_TvfkLR2Yo0OUL+MdokbQv6V+qC6Xk00tSzr-FkpTkZetDg@mail.gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <CAEQ_TvfkLR2Yo0OUL+MdokbQv6V+qC6Xk00tSzr-FkpTkZetDg@mail.gmail.com>
Message-ID: <0a0d821358629a11c9446121e715b1a15f5fce9a.camel@sipsolutions.net>

On Wed, 2019-06-05 at 14:14 -0700, Stephan Hoyer wrote:
> On Wed, Jun 5, 2019 at 1:43 PM Sebastian Berg <
> sebastian at sipsolutions.net> wrote:
> > Hi all,
> > 

<snip>

> > 
> > Because `uint8(127)` can also be a `int8`, but uint8(128) it is not
> > as
> > simple as finding the "minimal" dtype once and working with that." 
> > Of course Eric and I discussed this a bit before, and you could
> > create
> > an internal "uint7" dtype which has the only purpose of flagging
> > that a
> > cast to int8 is safe.
> 
> Does NumPy actually have an logic that does these sort of checks
> currently? If so, it would be interesting to see what it is.
> 
> My experiments suggest that we currently have this logic of finding
> the "minimal" dtype that can hold the scalar value:
> 
> >>> np.array([127], dtype=np.int8) + 127  # silent overflow!
> array([-2], dtype=int8)
> 
> >>> np.array([127], dtype=np.int8) + 128  # correct result
> array([255], dtype=int16)
> 

The current checks all come down to `np.can_cast` (on the C side this
is `PyArray_CanCastArray()`), answering True. The actual result value
is not taken into account of course. So 127 can be represented as int8
and since the "int8,int8->int8" loop is checked first (and "can cast"
correctly) it is used.
Alternatively, you can think of it as using `np.result_type()` which
will, for all practical purposes, give the same dtype (but result type
may or may not be actually used, and there are some subtle differences
in principle).

Effectively, in your example you could reduce it to a minimal dtype of
uint7 for 127, since a uint7 can be cast safely to an int8 and also to
a uint8. (If you would just say the minimal dtype is uint8, you could
not distinguish the two examples).

Does that answer the question?

Best,

Sebastian

> 
> > I suppose it is possible I am barking up the wrong tree here, and
> > this
> > caching/predictability is not vital (or can be solved with such an
> > internal dtype easily, although I am not sure it seems elegant).
> > 
> > 
> > Possible options to move forward
> > --------------------------------
> > 
> > I have to still see a bit how trick things are. But there are a few
> > possible options. I would like to move the scalar logic to the
> > beginning of ufunc calls:
> >   * The uint7 idea would be one solution
> >   * Simply implement something that works for numpy and all except
> >     strange external ufuncs (I can only think of numba as a
> > plausible
> >     candidate for creating such).
> > 
> > My current plan is to see where the second thing leaves me.
> > 
> > We also should see if we cannot move the whole thing forward, in
> > which
> > case the main decision would have to be forward to where. My
> > opinion is
> > currently that when a type has a dtype associated with it clearly,
> > we
> > should always use that dtype in the future. This mostly means that
> > numpy dtypes such as `np.int64` will always be treated like an
> > int64,
> > and never like a `uint8` because they happen to be castable to
> > that.
> > 
> > For values without a dtype attached (read python integers, floats),
> > I
> > see three options, from more complex to simpler:
> > 
> > 1. Keep the current logic in place as much as possible
> > 2. Only support value based promotion for operators, e.g.:
> >    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
> >    The upside is that it limits the complexity to a much simpler
> >    problem, the downside is that the ufunc call and operator match
> >    less clearly.
> > 3. Just associate python float with float64 and python integers
> > with
> >    long/int64 and force users to always type them explicitly if
> > they
> >    need to.
> > 
> > The downside of 1. is that it doesn't help with simplifying the
> > current
> > situation all that much, because we still have the special casting
> > around...
> 
> I think it would be fine to special case operators, but NEP-13 means
> that the ufuncs corresponding to operators really do need to work
> exactly the same way. So we should also special-case those ufuncs.
> 
> I don't think Option (3) is viable. Too many users rely upon
> arithmetic like "x + 1" having a predictable dtype.
>  
> > I have realized that this got much too long, so I hope it makes
> > sense.
> > I will continue to dabble along on these things a bit, so if
> > nothing
> > else maybe writing it helps me to get a bit clearer on things...
> > 
> > Best,
> > 
> > Sebastian
> > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190605/92f13a56/attachment.sig>

From sebastian at sipsolutions.net  Wed Jun  5 18:36:48 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 05 Jun 2019 17:36:48 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
Message-ID: <7efa604d071e796e275e89ccab8a0f04bae6eb04.camel@sipsolutions.net>

Hi all,

Maybe to clarify this at least a little, here are some examples for
what currently happen and what I could imagine we can go to (all in
terms of output dtype).

float32_arr = np.ones(10, dtype=np.float32)
int8_arr = np.ones(10, dtype=np.int8)
uint8_arr = np.ones(10, dtype=np.uint8)


Current behaviour:
------------------

float32_arr + 12.  # float32
float32_arr + 2**200  # float64 (because np.float32(2**200) == np.inf)

int8_arr + 127     # int8
int8_arr + 128     # int16
int8_arr + 2**20   # int32
uint8_arr + -1     # uint16

# But only for arrays that are not 0d:
int8_arr + np.array(1, dtype=np.int32)  # int8
int8_arr + np.array([1], dtype=np.int32)  # int32

# When the actual typing is given, this does not change:

float32_arr + np.float64(12.)                  # float32
float32_arr + np.array(12., dtype=np.float64)  # float32

# Except for inexact types, or complex:
int8_arr + np.float16(3)  # float16  (same as array behaviour)

# The exact same happens with all ufuncs:
np.add(float32_arr, 1)                               # float32
np.add(float32_arr, np.array(12., dtype=np.float64)  # float32


Keeping Value based casting only for python types
-------------------------------------------------

In this case, most examples above stay unchanged, because they use
plain python integers or floats, such as 2, 127, 12., 3, ... without
any type information attached, such as `np.float64(12.)`.

These change for example:

float32_arr + np.float64(12.)                        # float64
float32_arr + np.array(12., dtype=np.float64)        # float64
np.add(float32_arr, np.array(12., dtype=np.float64)  # float64

# so if you use `np.int32` it will be the same as np.uint64(10000)

int8_arr + np.int32(1)      # int32
int8_arr + np.int32(2**20)  # int32


Remove Value based casting completely
-------------------------------------

We could simply abolish it completely, a python `1` would always behave
the same as `np.int_(1)`. The downside of this is that:

int8_arr + 1  # int64 (or int32)

uses much more memory suddenly. Or, we remove it from ufuncs, but not
from operators:

int8_arr + 1  # int8 dtype

but:

np.add(int8_arr, 1)  # int64
# same as:
np.add(int8_arr, np.array(1))  # int16

The main reason why I was wondering about that is that for operators
the logic seems fairly simple, but for general ufuncs it seems more
complex.

Best,

Sebastian


On Wed, 2019-06-05 at 15:41 -0500, Sebastian Berg wrote:
> Hi all,
> 
> TL;DR:
> 
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward
> here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
> 
> -----------
> 
> Currently when you write code such as:
> 
> arr = np.array([1, 43, 23], dtype=np.uint16)
> res = arr + 1
> 
> Numpy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all unary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
> 
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
> 
> arr = np.array([1, 43, 23], dtype=np.float32)
> (arr + np.float64(2.)).dtype  # will be float32
> 
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8, uint8. So the first call
> above will at some point check if "uint16 + uint16 -> uint16" is a
> valid operation, find that it is, and thus stop searching. (There is
> the additional logic, that when both/all operands are scalars, it is
> not applied).
> 
> Note that while it is defined in terms of casting "1" to uint8 safely
> being possible even though 1 may be typed as int64. This logic thus
> affects all promotion rules as well (i.e. what should the output
> dtype
> be).
> 
> 
> There 2 main discussion points/issues about it:
> 
> 1. Should value based casting/promotion logic exist at all?
> 
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it. It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get). There is not much option but to
> handle them identically. However, it seems pretty odd that:
>  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
>  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> 
> give a different result.
> 
> This is a bit different for python scalars, which do not have a type
> attached already.
> 
> 
> 2. Promotion and type resolution in Ufuncs:
> 
> What is currently bothering me is that the decision what the output
> dtypes should be currently depends on the values in complicated ways.
> It would be nice if we can decide which type signature to use without
> actually looking at values (or at least only very early on).
> 
> One reason here is caching and simplicity. I would like to be able to
> cache which loop should be used for what input. Having value based
> casting in there bloats up the problem.
> Of course it currently works OK, but especially when user dtypes come
> into play, caching would seem like a nice optimization option.
> 
> Because `uint8(127)` can also be a `int8`, but uint8(128) it is not
> as
> simple as finding the "minimal" dtype once and working with that." 
> Of course Eric and I discussed this a bit before, and you could
> create
> an internal "uint7" dtype which has the only purpose of flagging that
> a
> cast to int8 is safe.
> 
> I suppose it is possible I am barking up the wrong tree here, and
> this
> caching/predictability is not vital (or can be solved with such an
> internal dtype easily, although I am not sure it seems elegant).
> 
> 
> Possible options to move forward
> --------------------------------
> 
> I have to still see a bit how trick things are. But there are a few
> possible options. I would like to move the scalar logic to the
> beginning of ufunc calls:
>   * The uint7 idea would be one solution
>   * Simply implement something that works for numpy and all except
>     strange external ufuncs (I can only think of numba as a plausible
>     candidate for creating such).
> 
> My current plan is to see where the second thing leaves me.
> 
> We also should see if we cannot move the whole thing forward, in
> which
> case the main decision would have to be forward to where. My opinion
> is
> currently that when a type has a dtype associated with it clearly, we
> should always use that dtype in the future. This mostly means that
> numpy dtypes such as `np.int64` will always be treated like an int64,
> and never like a `uint8` because they happen to be castable to that.
> 
> For values without a dtype attached (read python integers, floats), I
> see three options, from more complex to simpler:
> 
> 1. Keep the current logic in place as much as possible
> 2. Only support value based promotion for operators, e.g.:
>    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
>    The upside is that it limits the complexity to a much simpler
>    problem, the downside is that the ufunc call and operator match
>    less clearly.
> 3. Just associate python float with float64 and python integers with
>    long/int64 and force users to always type them explicitly if they
>    need to.
> 
> The downside of 1. is that it doesn't help with simplifying the
> current
> situation all that much, because we still have the special casting
> around...
> 
> 
> I have realized that this got much too long, so I hope it makes
> sense.
> I will continue to dabble along on these things a bit, so if nothing
> else maybe writing it helps me to get a bit clearer on things...
> 
> Best,
> 
> Sebastian
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190605/449d16a6/attachment.sig>

From tyler.je.reddy at gmail.com  Wed Jun  5 20:14:35 2019
From: tyler.je.reddy at gmail.com (Tyler Reddy)
Date: Wed, 5 Jun 2019 17:14:35 -0700
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <7efa604d071e796e275e89ccab8a0f04bae6eb04.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <7efa604d071e796e275e89ccab8a0f04bae6eb04.camel@sipsolutions.net>
Message-ID: <CAHPuU_bhoUQQZicM7i7NQrSSQF1q5mzq1JxGwA1dwLxCvZuosA@mail.gmail.com>

A few thoughts:

- We're not trying to achieve systematic guards against integer overflow /
wrapping in ufunc inner loops, right? The performance tradeoffs for a
"result-based" casting / exception handling addition would presumably be
controversial? I know there was some discussion about having an "overflow
detection mode"  (toggle) of some sort that could be activated for ufunc
loops, but don't think that gained much traction/ priority. I think for
floats we have an awkward way to propagate something back to the user if
there's an issue.
- It sounds like the objective is instead primarily to achieve pure
dtype-based promotion, which is then effectively just a casting table,
which is what I think you mean by "cache?"
- Is it a safe assumption that for a cache (dtype-only casting table), the
main tradeoff is that we'd likely tend towards conservative upcasting and
using more memory in output types in many cases vs. NumPy at the moment?
Stephan seems concerned about that, presumably because x + 1 suddenly
changes output dtype in an overwhelming number of current code lines and
future simple examples for end users.
- If np.array + 1 absolutely has to stay the same output dtype moving
forward, then "Keeping Value based casting only for python types" is the
one that looks most promising to me initially, with a few further concerns:

1) Would that give you enough refactoring "wiggle room" to achieve the
simplifications you need? If value-based promotion still happens for a
non-NumPy operand, can you abstract that logic cleanly from the "pure dtype
cache / table" that is planned for NumPy operands?
2) Is the "out" argument to ufuncs a satisfactory alternative to the "power
users" who want to "override" default output casting type? We suggest that
they pre-allocate an output array of the desired type if they want to save
memory and if they overflow or wrap integers that is their problem. Can we
reasonably ask people who currently depend on the memory-conservation they
might get from value-based behavior to adjust in this way?
3) Presumably "out" does / will circumvent the "cache / dtype casting
table?"

Tyler

On Wed, 5 Jun 2019 at 15:37, Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> Maybe to clarify this at least a little, here are some examples for
> what currently happen and what I could imagine we can go to (all in
> terms of output dtype).
>
> float32_arr = np.ones(10, dtype=np.float32)
> int8_arr = np.ones(10, dtype=np.int8)
> uint8_arr = np.ones(10, dtype=np.uint8)
>
>
> Current behaviour:
> ------------------
>
> float32_arr + 12.  # float32
> float32_arr + 2**200  # float64 (because np.float32(2**200) == np.inf)
>
> int8_arr + 127     # int8
> int8_arr + 128     # int16
> int8_arr + 2**20   # int32
> uint8_arr + -1     # uint16
>
> # But only for arrays that are not 0d:
> int8_arr + np.array(1, dtype=np.int32)  # int8
> int8_arr + np.array([1], dtype=np.int32)  # int32
>
> # When the actual typing is given, this does not change:
>
> float32_arr + np.float64(12.)                  # float32
> float32_arr + np.array(12., dtype=np.float64)  # float32
>
> # Except for inexact types, or complex:
> int8_arr + np.float16(3)  # float16  (same as array behaviour)
>
> # The exact same happens with all ufuncs:
> np.add(float32_arr, 1)                               # float32
> np.add(float32_arr, np.array(12., dtype=np.float64)  # float32
>
>
> Keeping Value based casting only for python types
> -------------------------------------------------
>
> In this case, most examples above stay unchanged, because they use
> plain python integers or floats, such as 2, 127, 12., 3, ... without
> any type information attached, such as `np.float64(12.)`.
>
> These change for example:
>
> float32_arr + np.float64(12.)                        # float64
> float32_arr + np.array(12., dtype=np.float64)        # float64
> np.add(float32_arr, np.array(12., dtype=np.float64)  # float64
>
> # so if you use `np.int32` it will be the same as np.uint64(10000)
>
> int8_arr + np.int32(1)      # int32
> int8_arr + np.int32(2**20)  # int32
>
>
> Remove Value based casting completely
> -------------------------------------
>
> We could simply abolish it completely, a python `1` would always behave
> the same as `np.int_(1)`. The downside of this is that:
>
> int8_arr + 1  # int64 (or int32)
>
> uses much more memory suddenly. Or, we remove it from ufuncs, but not
> from operators:
>
> int8_arr + 1  # int8 dtype
>
> but:
>
> np.add(int8_arr, 1)  # int64
> # same as:
> np.add(int8_arr, np.array(1))  # int16
>
> The main reason why I was wondering about that is that for operators
> the logic seems fairly simple, but for general ufuncs it seems more
> complex.
>
> Best,
>
> Sebastian
>
>
>
> On Wed, 2019-06-05 at 15:41 -0500, Sebastian Berg wrote:
> > Hi all,
> >
> > TL;DR:
> >
> > Value based promotion seems complex both for users and ufunc-
> > dispatching/promotion logic. Is there any way we can move forward
> > here,
> > and if we do, could we just risk some possible (maybe not-existing)
> > corner cases to break early to get on the way?
> >
> > -----------
> >
> > Currently when you write code such as:
> >
> > arr = np.array([1, 43, 23], dtype=np.uint16)
> > res = arr + 1
> >
> > Numpy uses fairly sophisticated logic to decide that `1` can be
> > represented as a uint16, and thus for all unary functions (and most
> > others as well), the output will have a `res.dtype` of uint16.
> >
> > Similar logic also exists for floating point types, where a lower
> > precision floating point can be used:
> >
> > arr = np.array([1, 43, 23], dtype=np.float32)
> > (arr + np.float64(2.)).dtype  # will be float32
> >
> > Currently, this value based logic is enforced by checking whether the
> > cast is possible: "4" can be cast to int8, uint8. So the first call
> > above will at some point check if "uint16 + uint16 -> uint16" is a
> > valid operation, find that it is, and thus stop searching. (There is
> > the additional logic, that when both/all operands are scalars, it is
> > not applied).
> >
> > Note that while it is defined in terms of casting "1" to uint8 safely
> > being possible even though 1 may be typed as int64. This logic thus
> > affects all promotion rules as well (i.e. what should the output
> > dtype
> > be).
> >
> >
> > There 2 main discussion points/issues about it:
> >
> > 1. Should value based casting/promotion logic exist at all?
> >
> > Arguably an `np.int32(3)` has type information attached to it, so why
> > should we ignore it. It can also be tricky for users, because a small
> > change in values can change the result data type.
> > Because 0-D arrays and scalars are too close inside numpy (you will
> > often not know which one you get). There is not much option but to
> > handle them identically. However, it seems pretty odd that:
> >  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
> >  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> >
> > give a different result.
> >
> > This is a bit different for python scalars, which do not have a type
> > attached already.
> >
> >
> > 2. Promotion and type resolution in Ufuncs:
> >
> > What is currently bothering me is that the decision what the output
> > dtypes should be currently depends on the values in complicated ways.
> > It would be nice if we can decide which type signature to use without
> > actually looking at values (or at least only very early on).
> >
> > One reason here is caching and simplicity. I would like to be able to
> > cache which loop should be used for what input. Having value based
> > casting in there bloats up the problem.
> > Of course it currently works OK, but especially when user dtypes come
> > into play, caching would seem like a nice optimization option.
> >
> > Because `uint8(127)` can also be a `int8`, but uint8(128) it is not
> > as
> > simple as finding the "minimal" dtype once and working with that."
> > Of course Eric and I discussed this a bit before, and you could
> > create
> > an internal "uint7" dtype which has the only purpose of flagging that
> > a
> > cast to int8 is safe.
> >
> > I suppose it is possible I am barking up the wrong tree here, and
> > this
> > caching/predictability is not vital (or can be solved with such an
> > internal dtype easily, although I am not sure it seems elegant).
> >
> >
> > Possible options to move forward
> > --------------------------------
> >
> > I have to still see a bit how trick things are. But there are a few
> > possible options. I would like to move the scalar logic to the
> > beginning of ufunc calls:
> >   * The uint7 idea would be one solution
> >   * Simply implement something that works for numpy and all except
> >     strange external ufuncs (I can only think of numba as a plausible
> >     candidate for creating such).
> >
> > My current plan is to see where the second thing leaves me.
> >
> > We also should see if we cannot move the whole thing forward, in
> > which
> > case the main decision would have to be forward to where. My opinion
> > is
> > currently that when a type has a dtype associated with it clearly, we
> > should always use that dtype in the future. This mostly means that
> > numpy dtypes such as `np.int64` will always be treated like an int64,
> > and never like a `uint8` because they happen to be castable to that.
> >
> > For values without a dtype attached (read python integers, floats), I
> > see three options, from more complex to simpler:
> >
> > 1. Keep the current logic in place as much as possible
> > 2. Only support value based promotion for operators, e.g.:
> >    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
> >    The upside is that it limits the complexity to a much simpler
> >    problem, the downside is that the ufunc call and operator match
> >    less clearly.
> > 3. Just associate python float with float64 and python integers with
> >    long/int64 and force users to always type them explicitly if they
> >    need to.
> >
> > The downside of 1. is that it doesn't help with simplifying the
> > current
> > situation all that much, because we still have the special casting
> > around...
> >
> >
> > I have realized that this got much too long, so I hope it makes
> > sense.
> > I will continue to dabble along on these things a bit, so if nothing
> > else maybe writing it helps me to get a bit clearer on things...
> >
> > Best,
> >
> > Sebastian
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190605/e0823e8b/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Wed Jun  5 21:35:26 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 5 Jun 2019 21:35:26 -0400
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <CAHPuU_bhoUQQZicM7i7NQrSSQF1q5mzq1JxGwA1dwLxCvZuosA@mail.gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <7efa604d071e796e275e89ccab8a0f04bae6eb04.camel@sipsolutions.net>
 <CAHPuU_bhoUQQZicM7i7NQrSSQF1q5mzq1JxGwA1dwLxCvZuosA@mail.gmail.com>
Message-ID: <CAJNV+9s5-hW_6uLtB+xQuwNKTAFsZNhFEkrK7MFgyYr0iNtXsw@mail.gmail.com>

Hi Sebastian,

Tricky! It seems a balance between unexpected memory blow-up and unexpected
wrapping (the latter mostly for integers).

Some comments specifically on your message first, then some more general
related ones.

1. I'm very much against letting `a + b` do anything else than `np.add(a,
b)`.
2. For python values, an argument for casting by value is that a python int
can be arbitrarily long; the only reasonable course of action for those
seems to make them float, and once you do that one might as well cast to
whatever type can hold the value (at least approximately).
3. Not necessarily preferred, but for casting of scalars, one can get more
consistent behaviour also by extending the casting by value to any array
that has size=1.

Overall, just on the narrow question, I'd be quite happy with your
suggestion of using type information if available, i.e., only cast python
values to a minimal dtype.If one uses numpy types, those mostly will have
come from previous calculations with the same arrays, so things will work
as expected. And in most memory-limited applications, one would do
calculations in-place anyway (or, as Tyler noted, for power users one can
assume awareness of memory and thus the incentive to tell explicitly what
dtype is wanted - just `np.add(a, b, dtype=...)`, no need to create `out`).

More generally, I guess what I don't like about the casting rules generally
is that there is a presumption that if the value can be cast, the operation
will generally succeed. For `np.add` and `np.subtract`, this perhaps is
somewhat reasonable (though for unsigned a bit more dubious), but for
`np.multiply` or `np.power` it is much less so. (Indeed, we had a long
discussion about what to do with `int ** power` - now special-casing
negative integer powers.) Changing this, however, probably really is a
bridge too far!

Finally, somewhat related: I think the largest confusing actually results
from the `uint64+in64 -> float64` casting.  Should this cast to int64
instead?

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190605/3005adf9/attachment.html>

From ralf.gommers at gmail.com  Thu Jun  6 04:57:22 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 6 Jun 2019 10:57:22 +0200
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
Message-ID: <CABL7CQh0WTEVEZqC9KimuK=wC4rnR0P3N_D91etWg06Ot0iRvQ@mail.gmail.com>

On Wed, Jun 5, 2019 at 10:42 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> TL;DR:
>
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
> ...
> I have realized that this got much too long, so I hope it makes sense.
> I will continue to dabble along on these things a bit, so if nothing
> else maybe writing it helps me to get a bit clearer on things...
>

Your email was long but very clear. The part I'm missing is "why are things
the way they are?". Before diving into casting rules and all other wishes
people may have, can you please try to answer that? Because there's more to
it than "(maybe not-existing) corner cases".

Marten's first sentence ("a balance between unexpected memory blow-up and
unexpected wrapping") is in the right direction. As is Stephan's "Too many
users rely upon arithmetic like "x + 1" having a predictable dtype."

The problem is clear, however you need to figure out the constraints first,
then decide within the wiggle room you have what the options are.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190606/32040618/attachment.html>

From sebastian at sipsolutions.net  Thu Jun  6 11:33:18 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 06 Jun 2019 10:33:18 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <CAHPuU_bhoUQQZicM7i7NQrSSQF1q5mzq1JxGwA1dwLxCvZuosA@mail.gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <7efa604d071e796e275e89ccab8a0f04bae6eb04.camel@sipsolutions.net>
 <CAHPuU_bhoUQQZicM7i7NQrSSQF1q5mzq1JxGwA1dwLxCvZuosA@mail.gmail.com>
Message-ID: <9003a8c1cc24b7560c500832c3de3d8e369859dd.camel@sipsolutions.net>

On Wed, 2019-06-05 at 17:14 -0700, Tyler Reddy wrote:
> A few thoughts:
> 
> - We're not trying to achieve systematic guards against integer
> overflow / wrapping in ufunc inner loops, right? The performance
> tradeoffs for a "result-based" casting / exception handling addition
> would presumably be controversial? I know there was some discussion
> about having an "overflow detection mode"  (toggle) of some sort that
> could be activated for ufunc loops, but don't think that gained much
> traction/ priority. I think for floats we have an awkward way to
> propagate something back to the user if there's an issue.

No, that is indeed a different issue. It would be nice to provide the
option of integer overflow warnings/errors, but it is different since
it should not affect the dtypes in use (i.e. we would never upcast to
avoid the error).

> - It sounds like the objective is instead primarily to achieve pure
> dtype-based promotion, which is then effectively just a casting
> table, which is what I think you mean by "cache?"

Yes, the cache was a bad word, I used it thinking of user types where a
large table would probably not be created on the fly.

> - Is it a safe assumption that for a cache (dtype-only casting
> table), the main tradeoff is that we'd likely tend towards
> conservative upcasting and using more memory in output types in many
> cases vs. NumPy at the moment? Stephan seems concerned about that,
> presumably because x + 1 suddenly changes output dtype in an
> overwhelming number of current code lines and future simple examples
> for end users.

Yes. That is at least what we currently have. For x + 1 there is a good
point with sudden memory blow up. Maybe an even nicer example is
`float32_arr + 1`, which would have to go to float64 if 1 is
interpreted as `int32(1)`.

> - If np.array + 1 absolutely has to stay the same output dtype moving
> forward, then "Keeping Value based casting only for python types" is
> the one that looks most promising to me initially, with a few further
> concerns:

Well, while it is annoying me. I think we should base that decision of
what we want the user API to be only. And because of that, it seems
like the most likely option.
At least my gut feeling is, if it is typed, we should honor the type
(also for scalars), but code like x + 1 suddenly blowing up memory is
not a good idea.
I just realized that one (anti?)-pattern that is common is the:

arr + 0.  # make sure its "inexact/float"

is exactly an example of where you do not want to upcast unnecessarily.


> 1) Would that give you enough refactoring "wiggle room" to achieve
> the simplifications you need? If value-based promotion still happens
> for a non-NumPy operand, can you abstract that logic cleanly from the
> "pure dtype cache / table" that is planned for NumPy operands?

It is tricky. There is always the slightly strange solution of making
dtypes such as uint7, which "fixes" the type hierarchy as a minimal
dtype for promotion purpose, but would never be exposed to users.
(You probably need more strange dtypes for float and int combinations.)

To give me some wiggle room, what I was now doing is to simply decide
on the correct dtype before lookup. I am pretty sure that works for
all, except possibly one ufunc within numpy. The reason that this works
is that almost all of our ufuncs are typed as "ii->i" (identical
types).
Maybe that is OK to start working, and the strange dtype hierarchy can
be thought of later.


> 2) Is the "out" argument to ufuncs a satisfactory alternative to the
> "power users" who want to "override" default output casting type? We
> suggest that they pre-allocate an output array of the desired type if
> they want to save memory and if they overflow or wrap integers that
> is their problem. Can we reasonably ask people who currently depend
> on the memory-conservation they might get from value-based behavior
> to adjust in this way?

The can also use `dtype=...` (or at least we can fix that part to be
reliable). Or they can cast type the input. Especially if we want to
use it only for python integers/floats, adding the `np.int8(3)` is not
much effort.

> 3) Presumably "out" does / will circumvent the "cache / dtype casting
> table?"

Well, out fixes one of the types, if we look at the general machinery,
it would be possible to have:

ff->d
df->d
dd->d

loops. So if such loops are defined we cannot quite circumvent the
whole lookup. If we know that all loops are of the `ff->f` all same
dtype kind (which is true for almost all functions inside numpy),
lookup could be simplified.
For those loops with all the same dtype, the issue is fairly straight
forward anyway, because I can just decide how to handle the scalar
before hand.

Best,

Sebastian


> 
> Tyler
> 
> On Wed, 5 Jun 2019 at 15:37, Sebastian Berg <
> sebastian at sipsolutions.net> wrote:
> > Hi all,
> > 
> > Maybe to clarify this at least a little, here are some examples for
> > what currently happen and what I could imagine we can go to (all in
> > terms of output dtype).
> > 
> > float32_arr = np.ones(10, dtype=np.float32)
> > int8_arr = np.ones(10, dtype=np.int8)
> > uint8_arr = np.ones(10, dtype=np.uint8)
> > 
> > 
> > Current behaviour:
> > ------------------
> > 
> > float32_arr + 12.  # float32
> > float32_arr + 2**200  # float64 (because np.float32(2**200) ==
> > np.inf)
> > 
> > int8_arr + 127     # int8
> > int8_arr + 128     # int16
> > int8_arr + 2**20   # int32
> > uint8_arr + -1     # uint16
> > 
> > # But only for arrays that are not 0d:
> > int8_arr + np.array(1, dtype=np.int32)  # int8
> > int8_arr + np.array([1], dtype=np.int32)  # int32
> > 
> > # When the actual typing is given, this does not change:
> > 
> > float32_arr + np.float64(12.)                  # float32
> > float32_arr + np.array(12., dtype=np.float64)  # float32
> > 
> > # Except for inexact types, or complex:
> > int8_arr + np.float16(3)  # float16  (same as array behaviour)
> > 
> > # The exact same happens with all ufuncs:
> > np.add(float32_arr, 1)                               # float32
> > np.add(float32_arr, np.array(12., dtype=np.float64)  # float32
> > 
> > 
> > Keeping Value based casting only for python types
> > -------------------------------------------------
> > 
> > In this case, most examples above stay unchanged, because they use
> > plain python integers or floats, such as 2, 127, 12., 3, ...
> > without
> > any type information attached, such as `np.float64(12.)`.
> > 
> > These change for example:
> > 
> > float32_arr + np.float64(12.)                        # float64
> > float32_arr + np.array(12., dtype=np.float64)        # float64
> > np.add(float32_arr, np.array(12., dtype=np.float64)  # float64
> > 
> > # so if you use `np.int32` it will be the same as np.uint64(10000)
> > 
> > int8_arr + np.int32(1)      # int32
> > int8_arr + np.int32(2**20)  # int32
> > 
> > 
> > Remove Value based casting completely
> > -------------------------------------
> > 
> > We could simply abolish it completely, a python `1` would always
> > behave
> > the same as `np.int_(1)`. The downside of this is that:
> > 
> > int8_arr + 1  # int64 (or int32)
> > 
> > uses much more memory suddenly. Or, we remove it from ufuncs, but
> > not
> > from operators:
> > 
> > int8_arr + 1  # int8 dtype
> > 
> > but:
> > 
> > np.add(int8_arr, 1)  # int64
> > # same as:
> > np.add(int8_arr, np.array(1))  # int16
> > 
> > The main reason why I was wondering about that is that for
> > operators
> > the logic seems fairly simple, but for general ufuncs it seems more
> > complex.
> > 
> > Best,
> > 
> > Sebastian
> > 
> > 
> > 
> > On Wed, 2019-06-05 at 15:41 -0500, Sebastian Berg wrote:
> > > Hi all,
> > > 
> > > TL;DR:
> > > 
> > > Value based promotion seems complex both for users and ufunc-
> > > dispatching/promotion logic. Is there any way we can move forward
> > > here,
> > > and if we do, could we just risk some possible (maybe not-
> > existing)
> > > corner cases to break early to get on the way?
> > > 
> > > -----------
> > > 
> > > Currently when you write code such as:
> > > 
> > > arr = np.array([1, 43, 23], dtype=np.uint16)
> > > res = arr + 1
> > > 
> > > Numpy uses fairly sophisticated logic to decide that `1` can be
> > > represented as a uint16, and thus for all unary functions (and
> > most
> > > others as well), the output will have a `res.dtype` of uint16.
> > > 
> > > Similar logic also exists for floating point types, where a lower
> > > precision floating point can be used:
> > > 
> > > arr = np.array([1, 43, 23], dtype=np.float32)
> > > (arr + np.float64(2.)).dtype  # will be float32
> > > 
> > > Currently, this value based logic is enforced by checking whether
> > the
> > > cast is possible: "4" can be cast to int8, uint8. So the first
> > call
> > > above will at some point check if "uint16 + uint16 -> uint16" is
> > a
> > > valid operation, find that it is, and thus stop searching. (There
> > is
> > > the additional logic, that when both/all operands are scalars, it
> > is
> > > not applied).
> > > 
> > > Note that while it is defined in terms of casting "1" to uint8
> > safely
> > > being possible even though 1 may be typed as int64. This logic
> > thus
> > > affects all promotion rules as well (i.e. what should the output
> > > dtype
> > > be).
> > > 
> > > 
> > > There 2 main discussion points/issues about it:
> > > 
> > > 1. Should value based casting/promotion logic exist at all?
> > > 
> > > Arguably an `np.int32(3)` has type information attached to it, so
> > why
> > > should we ignore it. It can also be tricky for users, because a
> > small
> > > change in values can change the result data type.
> > > Because 0-D arrays and scalars are too close inside numpy (you
> > will
> > > often not know which one you get). There is not much option but
> > to
> > > handle them identically. However, it seems pretty odd that:
> > >  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
> > >  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> > > 
> > > give a different result.
> > > 
> > > This is a bit different for python scalars, which do not have a
> > type
> > > attached already.
> > > 
> > > 
> > > 2. Promotion and type resolution in Ufuncs:
> > > 
> > > What is currently bothering me is that the decision what the
> > output
> > > dtypes should be currently depends on the values in complicated
> > ways.
> > > It would be nice if we can decide which type signature to use
> > without
> > > actually looking at values (or at least only very early on).
> > > 
> > > One reason here is caching and simplicity. I would like to be
> > able to
> > > cache which loop should be used for what input. Having value
> > based
> > > casting in there bloats up the problem.
> > > Of course it currently works OK, but especially when user dtypes
> > come
> > > into play, caching would seem like a nice optimization option.
> > > 
> > > Because `uint8(127)` can also be a `int8`, but uint8(128) it is
> > not
> > > as
> > > simple as finding the "minimal" dtype once and working with
> > that." 
> > > Of course Eric and I discussed this a bit before, and you could
> > > create
> > > an internal "uint7" dtype which has the only purpose of flagging
> > that
> > > a
> > > cast to int8 is safe.
> > > 
> > > I suppose it is possible I am barking up the wrong tree here, and
> > > this
> > > caching/predictability is not vital (or can be solved with such
> > an
> > > internal dtype easily, although I am not sure it seems elegant).
> > > 
> > > 
> > > Possible options to move forward
> > > --------------------------------
> > > 
> > > I have to still see a bit how trick things are. But there are a
> > few
> > > possible options. I would like to move the scalar logic to the
> > > beginning of ufunc calls:
> > >   * The uint7 idea would be one solution
> > >   * Simply implement something that works for numpy and all
> > except
> > >     strange external ufuncs (I can only think of numba as a
> > plausible
> > >     candidate for creating such).
> > > 
> > > My current plan is to see where the second thing leaves me.
> > > 
> > > We also should see if we cannot move the whole thing forward, in
> > > which
> > > case the main decision would have to be forward to where. My
> > opinion
> > > is
> > > currently that when a type has a dtype associated with it
> > clearly, we
> > > should always use that dtype in the future. This mostly means
> > that
> > > numpy dtypes such as `np.int64` will always be treated like an
> > int64,
> > > and never like a `uint8` because they happen to be castable to
> > that.
> > > 
> > > For values without a dtype attached (read python integers,
> > floats), I
> > > see three options, from more complex to simpler:
> > > 
> > > 1. Keep the current logic in place as much as possible
> > > 2. Only support value based promotion for operators, e.g.:
> > >    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
> > >    The upside is that it limits the complexity to a much simpler
> > >    problem, the downside is that the ufunc call and operator
> > match
> > >    less clearly.
> > > 3. Just associate python float with float64 and python integers
> > with
> > >    long/int64 and force users to always type them explicitly if
> > they
> > >    need to.
> > > 
> > > The downside of 1. is that it doesn't help with simplifying the
> > > current
> > > situation all that much, because we still have the special
> > casting
> > > around...
> > > 
> > > 
> > > I have realized that this got much too long, so I hope it makes
> > > sense.
> > > I will continue to dabble along on these things a bit, so if
> > nothing
> > > else maybe writing it helps me to get a bit clearer on things...
> > > 
> > > Best,
> > > 
> > > Sebastian
> > > 
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190606/d7d955df/attachment-0001.sig>

From sebastian at sipsolutions.net  Thu Jun  6 11:43:44 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 06 Jun 2019 10:43:44 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <CAJNV+9s5-hW_6uLtB+xQuwNKTAFsZNhFEkrK7MFgyYr0iNtXsw@mail.gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <7efa604d071e796e275e89ccab8a0f04bae6eb04.camel@sipsolutions.net>
 <CAHPuU_bhoUQQZicM7i7NQrSSQF1q5mzq1JxGwA1dwLxCvZuosA@mail.gmail.com>
 <CAJNV+9s5-hW_6uLtB+xQuwNKTAFsZNhFEkrK7MFgyYr0iNtXsw@mail.gmail.com>
Message-ID: <23ca0b2cb24a1b39331543700743f1b90501df6b.camel@sipsolutions.net>

On Wed, 2019-06-05 at 21:35 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> Tricky! It seems a balance between unexpected memory blow-up and
> unexpected wrapping (the latter mostly for integers). 
> 
> Some comments specifically on your message first, then some more
> general related ones. 
> 
> 1. I'm very much against letting `a + b` do anything else than
> `np.add(a, b)`.

Well, I tend to agree. But just to put it out there:

[1] + [2]  == [1, 2]
np.add([1], [2]) == 3

So that is already far from true, since coercion has to occur. Of
course it is true that:

arr + something_else

will at some point force coercion of `something_else`, so that point is
only half valid if either `a` or `b` is already a numpy array/scalar.


> 2. For python values, an argument for casting by value is that a
> python int can be arbitrarily long; the only reasonable course of
> action for those seems to make them float, and once you do that one
> might as well cast to whatever type can hold the value (at least
> approximately).

To be honest, the "arbitrary long" thing is another issue, which is the
silent conversion to "object" dtype. Something that is also on the not
done list of: Maybe we should deprecate it.

In other words, we would freeze python int to one clear type, if you
have an arbitrarily large int, you would need to use `object` dtype (or
preferably a new `pyint/arbitrary_precision_int` dtype) explicitly.

> 3. Not necessarily preferred, but for casting of scalars, one can get
> more consistent behaviour also by extending the casting by value to
> any array that has size=1.
> 

That sounds just as horrible as the current mismatch to me, to be
honest.

> Overall, just on the narrow question, I'd be quite happy with your
> suggestion of using type information if available, i.e., only cast
> python values to a minimal dtype.If one uses numpy types, those
> mostly will have come from previous calculations with the same
> arrays, so things will work as expected. And in most memory-limited
> applications, one would do calculations in-place anyway (or, as Tyler
> noted, for power users one can assume awareness of memory and thus
> the incentive to tell explicitly what dtype is wanted - just
> `np.add(a, b, dtype=...)`, no need to create `out`).
> 
> More generally, I guess what I don't like about the casting rules
> generally is that there is a presumption that if the value can be
> cast, the operation will generally succeed. For `np.add` and
> `np.subtract`, this perhaps is somewhat reasonable (though for
> unsigned a bit more dubious), but for `np.multiply` or `np.power` it
> is much less so. (Indeed, we had a long discussion about what to do
> with `int ** power` - now special-casing negative integer powers.)
> Changing this, however, probably really is a bridge too far!

Indeed that is right. But that is a different point. E.g. there is
nothing wrong for example that `np.power` shouldn't decide that
`int**power` should always _promote_ (not cast) `int` to some larger
integer type if available.
The only point where we seriously have such logic right now is for
np.add.reduce (sum) and np.multiply.reduce (prod), which always use at
least `long` precision (and actually upcast bool->int, although
np.add(True, True) does not. Another difference to True + True...)

> 
> Finally, somewhat related: I think the largest confusing actually
> results from the `uint64+in64 -> float64` casting.  Should this cast
> to int64 instead?

Not sure, but yes, it is the other quirk in our casting that should be
discussed?.

- Sebastian

> 
> All the best,
> 
> Marten
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190606/8e00c9cb/attachment.sig>

From allanhaldane at gmail.com  Thu Jun  6 11:57:31 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Thu, 6 Jun 2019 11:57:31 -0400
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
Message-ID: <058827b2-5d3e-8597-d6f6-c1c9b2cb42e7@gmail.com>


I think dtype-based casting makes a lot of sense, the problem is
backward compatibility.

Numpy casting is weird in a number of ways: The array + array casting is
unexpected to many users (eg, uint64 + int64 -> float64), and the
casting of array + scalar is different from that, and value based.
Personally I wouldn't want to try change it unless we make a
backward-incompatible release (numpy 2.0), based on my experience trying
to change much more minor things. We already put "casting" on the list
of desired backward-incompatible changes on the list here:
https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release

Relatedly, I've previously dreamed about a different "C-style" way
casting might behave:
https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76

The proposal there is that array + array casting, array + scalar, and
array + python casting would all work in the same dtype-based way, which
mimics the familiar "C" casting rules.

See also:
https://github.com/numpy/numpy/issues/12525

Allan


On 6/5/19 4:41 PM, Sebastian Berg wrote:
> Hi all,
> 
> TL;DR:
> 
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
> 
> -----------
> 
> Currently when you write code such as:
> 
> arr = np.array([1, 43, 23], dtype=np.uint16)
> res = arr + 1
> 
> Numpy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all unary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
> 
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
> 
> arr = np.array([1, 43, 23], dtype=np.float32)
> (arr + np.float64(2.)).dtype  # will be float32
> 
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8, uint8. So the first call
> above will at some point check if "uint16 + uint16 -> uint16" is a
> valid operation, find that it is, and thus stop searching. (There is
> the additional logic, that when both/all operands are scalars, it is
> not applied).
> 
> Note that while it is defined in terms of casting "1" to uint8 safely
> being possible even though 1 may be typed as int64. This logic thus
> affects all promotion rules as well (i.e. what should the output dtype
> be).
> 
> 
> There 2 main discussion points/issues about it:
> 
> 1. Should value based casting/promotion logic exist at all?
> 
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it. It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get). There is not much option but to
> handle them identically. However, it seems pretty odd that:
>  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
>  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> 
> give a different result.
> 
> This is a bit different for python scalars, which do not have a type
> attached already.
> 
> 
> 2. Promotion and type resolution in Ufuncs:
> 
> What is currently bothering me is that the decision what the output
> dtypes should be currently depends on the values in complicated ways.
> It would be nice if we can decide which type signature to use without
> actually looking at values (or at least only very early on).
> 
> One reason here is caching and simplicity. I would like to be able to
> cache which loop should be used for what input. Having value based
> casting in there bloats up the problem.
> Of course it currently works OK, but especially when user dtypes come
> into play, caching would seem like a nice optimization option.
> 
> Because `uint8(127)` can also be a `int8`, but uint8(128) it is not as
> simple as finding the "minimal" dtype once and working with that." 
> Of course Eric and I discussed this a bit before, and you could create
> an internal "uint7" dtype which has the only purpose of flagging that a
> cast to int8 is safe.
> 
> I suppose it is possible I am barking up the wrong tree here, and this
> caching/predictability is not vital (or can be solved with such an
> internal dtype easily, although I am not sure it seems elegant).
> 
> 
> Possible options to move forward
> --------------------------------
> 
> I have to still see a bit how trick things are. But there are a few
> possible options. I would like to move the scalar logic to the
> beginning of ufunc calls:
>   * The uint7 idea would be one solution
>   * Simply implement something that works for numpy and all except
>     strange external ufuncs (I can only think of numba as a plausible
>     candidate for creating such).
> 
> My current plan is to see where the second thing leaves me.
> 
> We also should see if we cannot move the whole thing forward, in which
> case the main decision would have to be forward to where. My opinion is
> currently that when a type has a dtype associated with it clearly, we
> should always use that dtype in the future. This mostly means that
> numpy dtypes such as `np.int64` will always be treated like an int64,
> and never like a `uint8` because they happen to be castable to that.
> 
> For values without a dtype attached (read python integers, floats), I
> see three options, from more complex to simpler:
> 
> 1. Keep the current logic in place as much as possible
> 2. Only support value based promotion for operators, e.g.:
>    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
>    The upside is that it limits the complexity to a much simpler
>    problem, the downside is that the ufunc call and operator match
>    less clearly.
> 3. Just associate python float with float64 and python integers with
>    long/int64 and force users to always type them explicitly if they
>    need to.
> 
> The downside of 1. is that it doesn't help with simplifying the current
> situation all that much, because we still have the special casting
> around...
> 
> 
> I have realized that this got much too long, so I hope it makes sense.
> I will continue to dabble along on these things a bit, so if nothing
> else maybe writing it helps me to get a bit clearer on things...
> 
> Best,
> 
> Sebastian
> 
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


From sebastian at sipsolutions.net  Thu Jun  6 12:46:34 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 06 Jun 2019 11:46:34 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <058827b2-5d3e-8597-d6f6-c1c9b2cb42e7@gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <058827b2-5d3e-8597-d6f6-c1c9b2cb42e7@gmail.com>
Message-ID: <7cdbfe82028f8ad330ef310be27caa01993d3176.camel@sipsolutions.net>

On Thu, 2019-06-06 at 11:57 -0400, Allan Haldane wrote:
> I think dtype-based casting makes a lot of sense, the problem is
> backward compatibility.
> 
> Numpy casting is weird in a number of ways: The array + array casting
> is
> unexpected to many users (eg, uint64 + int64 -> float64), and the
> casting of array + scalar is different from that, and value based.
> Personally I wouldn't want to try change it unless we make a
> backward-incompatible release (numpy 2.0), based on my experience
> trying
> to change much more minor things. We already put "casting" on the
> list
> of desired backward-incompatible changes on the list here:
> https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release
> 
> Relatedly, I've previously dreamed about a different "C-style" way
> casting might behave:
> https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
> 
> The proposal there is that array + array casting, array + scalar, and
> array + python casting would all work in the same dtype-based way,
> which
> mimics the familiar "C" casting rules.

If I read it right, you do propose that array + python would cast in a
"minimal type" way for python.

In your write up, you describe that if you mix array + scalar, the
scalar uses a minimal dtype compared to the array's dtype. What we
instead have is that in principle you could have loops such as:

"ifi->f"
"idi->d"

and I think we should chose the first for a scalar, because it "fits"
into f just fine. (if the input is) `ufunc(int_arr, 12., int_arr)`.

I do not mind keeping the "simple" two (or even more) operand "lets
assume we have uniform types" logic around. For those it is easy to
find a "minimum type" even before actual loop lookup.
For the above example it would work in any case well, but it would get
complicating, if for example the last integer is an unsigned integer,
that happens to be small enough to fit also into an integer.

That might give some wiggle room, possibly also to attach warnings to
it, or at least make things easier. But I would also like to figure out
as well if we shouldn't try to move in any case. Sure, attach a major
version to it, but hopefully not a "big step type".

One thing that I had not thought about is, that if we create
FutureWarnings, we will need to provide a way to opt-in to the new/old
behaviour.
The old behaviour can be achieved by just using the python types (which
probably is what most code that wants this behaviour does already), but
the behaviour is tricky. Users can pass `dtype` explicitly, but that is
a huge kludge...
Will think about if there is a solution to that, because if there is
not, you are right. It has to be a "big step" kind of release.
Although, even then it would be nice to have warnings that can be
enabled to ease the transition!

- Sebastian


> 
> See also:
> https://github.com/numpy/numpy/issues/12525
> 
> Allan
> 
> 
> On 6/5/19 4:41 PM, Sebastian Berg wrote:
> > Hi all,
> > 
> > TL;DR:
> > 
> > Value based promotion seems complex both for users and ufunc-
> > dispatching/promotion logic. Is there any way we can move forward
> > here,
> > and if we do, could we just risk some possible (maybe not-existing)
> > corner cases to break early to get on the way?
> > 
> > -----------
> > 
> > Currently when you write code such as:
> > 
> > arr = np.array([1, 43, 23], dtype=np.uint16)
> > res = arr + 1
> > 
> > Numpy uses fairly sophisticated logic to decide that `1` can be
> > represented as a uint16, and thus for all unary functions (and most
> > others as well), the output will have a `res.dtype` of uint16.
> > 
> > Similar logic also exists for floating point types, where a lower
> > precision floating point can be used:
> > 
> > arr = np.array([1, 43, 23], dtype=np.float32)
> > (arr + np.float64(2.)).dtype  # will be float32
> > 
> > Currently, this value based logic is enforced by checking whether
> > the
> > cast is possible: "4" can be cast to int8, uint8. So the first call
> > above will at some point check if "uint16 + uint16 -> uint16" is a
> > valid operation, find that it is, and thus stop searching. (There
> > is
> > the additional logic, that when both/all operands are scalars, it
> > is
> > not applied).
> > 
> > Note that while it is defined in terms of casting "1" to uint8
> > safely
> > being possible even though 1 may be typed as int64. This logic thus
> > affects all promotion rules as well (i.e. what should the output
> > dtype
> > be).
> > 
> > 
> > There 2 main discussion points/issues about it:
> > 
> > 1. Should value based casting/promotion logic exist at all?
> > 
> > Arguably an `np.int32(3)` has type information attached to it, so
> > why
> > should we ignore it. It can also be tricky for users, because a
> > small
> > change in values can change the result data type.
> > Because 0-D arrays and scalars are too close inside numpy (you will
> > often not know which one you get). There is not much option but to
> > handle them identically. However, it seems pretty odd that:
> >  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
> >  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> > 
> > give a different result.
> > 
> > This is a bit different for python scalars, which do not have a
> > type
> > attached already.
> > 
> > 
> > 2. Promotion and type resolution in Ufuncs:
> > 
> > What is currently bothering me is that the decision what the output
> > dtypes should be currently depends on the values in complicated
> > ways.
> > It would be nice if we can decide which type signature to use
> > without
> > actually looking at values (or at least only very early on).
> > 
> > One reason here is caching and simplicity. I would like to be able
> > to
> > cache which loop should be used for what input. Having value based
> > casting in there bloats up the problem.
> > Of course it currently works OK, but especially when user dtypes
> > come
> > into play, caching would seem like a nice optimization option.
> > 
> > Because `uint8(127)` can also be a `int8`, but uint8(128) it is not
> > as
> > simple as finding the "minimal" dtype once and working with that." 
> > Of course Eric and I discussed this a bit before, and you could
> > create
> > an internal "uint7" dtype which has the only purpose of flagging
> > that a
> > cast to int8 is safe.
> > 
> > I suppose it is possible I am barking up the wrong tree here, and
> > this
> > caching/predictability is not vital (or can be solved with such an
> > internal dtype easily, although I am not sure it seems elegant).
> > 
> > 
> > Possible options to move forward
> > --------------------------------
> > 
> > I have to still see a bit how trick things are. But there are a few
> > possible options. I would like to move the scalar logic to the
> > beginning of ufunc calls:
> >   * The uint7 idea would be one solution
> >   * Simply implement something that works for numpy and all except
> >     strange external ufuncs (I can only think of numba as a
> > plausible
> >     candidate for creating such).
> > 
> > My current plan is to see where the second thing leaves me.
> > 
> > We also should see if we cannot move the whole thing forward, in
> > which
> > case the main decision would have to be forward to where. My
> > opinion is
> > currently that when a type has a dtype associated with it clearly,
> > we
> > should always use that dtype in the future. This mostly means that
> > numpy dtypes such as `np.int64` will always be treated like an
> > int64,
> > and never like a `uint8` because they happen to be castable to
> > that.
> > 
> > For values without a dtype attached (read python integers, floats),
> > I
> > see three options, from more complex to simpler:
> > 
> > 1. Keep the current logic in place as much as possible
> > 2. Only support value based promotion for operators, e.g.:
> >    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
> >    The upside is that it limits the complexity to a much simpler
> >    problem, the downside is that the ufunc call and operator match
> >    less clearly.
> > 3. Just associate python float with float64 and python integers
> > with
> >    long/int64 and force users to always type them explicitly if
> > they
> >    need to.
> > 
> > The downside of 1. is that it doesn't help with simplifying the
> > current
> > situation all that much, because we still have the special casting
> > around...
> > 
> > 
> > I have realized that this got much too long, so I hope it makes
> > sense.
> > I will continue to dabble along on these things a bit, so if
> > nothing
> > else maybe writing it helps me to get a bit clearer on things...
> > 
> > Best,
> > 
> > Sebastian
> > 
> > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190606/8a9c99fc/attachment.sig>

From allanhaldane at gmail.com  Thu Jun  6 19:34:44 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Thu, 6 Jun 2019 19:34:44 -0400
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <7cdbfe82028f8ad330ef310be27caa01993d3176.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <058827b2-5d3e-8597-d6f6-c1c9b2cb42e7@gmail.com>
 <7cdbfe82028f8ad330ef310be27caa01993d3176.camel@sipsolutions.net>
Message-ID: <74ecb7ce-7e22-fdd7-a2b2-2bc50e5dd8be@gmail.com>

On 6/6/19 12:46 PM, Sebastian Berg wrote:
> On Thu, 2019-06-06 at 11:57 -0400, Allan Haldane wrote:
>> I think dtype-based casting makes a lot of sense, the problem is
>> backward compatibility.
>>
>> Numpy casting is weird in a number of ways: The array + array casting
>> is
>> unexpected to many users (eg, uint64 + int64 -> float64), and the
>> casting of array + scalar is different from that, and value based.
>> Personally I wouldn't want to try change it unless we make a
>> backward-incompatible release (numpy 2.0), based on my experience
>> trying
>> to change much more minor things. We already put "casting" on the
>> list
>> of desired backward-incompatible changes on the list here:
>> https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release
>>
>> Relatedly, I've previously dreamed about a different "C-style" way
>> casting might behave:
>> https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
>>
>> The proposal there is that array + array casting, array + scalar, and
>> array + python casting would all work in the same dtype-based way,
>> which
>> mimics the familiar "C" casting rules.
> 
> If I read it right, you do propose that array + python would cast in a
> "minimal type" way for python.

I'm a little unclear what you mean by "minimal type" way. By "minimal
type", I thought you and others are talking about the rule numpy
currently uses that "the output dtype is the minimal dtype capable of
representing the value of both input dtypes", right? But in that gist I
am instead proposing that output-dtype is determined by C-like rules.

For array+py_scalar I was less certain what to do than for array+array
and array+npy_scalar. But I proposed the three "ranks" of 1. bool, 2.
int, and 3. float/complex. My rule for array+py_scalar is that if the
python scalar's rank is less than the numpy operand dtype's rank, use
the numpy dtype. If the python-scalar's rank is greater, use the
"default" types of bool_, int64, float64 respectively. Eg:

np.bool_(1) + 1        -> int64   (default int wins)
np.int8(1) + 1         -> int8    (numpy wins)
np.uint8(1) + (-1)     -> uint8   (numpy wins)
np.int64(1) + 1        -> int64   (numpy wins)
np.int64(1) + 1.0      -> float64 (default float wins)
np.float32(1.0) + 1.0  -> float32 (numpy wins)

Note it does not depend on the numerical value of the scalar, only its type.

> In your write up, you describe that if you mix array + scalar, the
> scalar uses a minimal dtype compared to the array's dtype. 

Sorry if I'm nitpicking/misunderstanding, but in my rules np.uint64(1) +
1 -> uint64 but in numpy's "minimal dtype" rules it is  -> float64. So I
don't think I am using the minimal rule.

> What we
> instead have is that in principle you could have loops such as:
> 
> "ifi->f"
> "idi->d"
> 
> and I think we should chose the first for a scalar, because it "fits"
> into f just fine. (if the input is) `ufunc(int_arr, 12., int_arr)`.

I feel I'm not understanding you, but the casting rules in my gist
follow those two rules if i, f are the numpy types int32 and float32.

If instead you mean (np.int64, py_float, np.int64) my rules would cast
to float64, since py_float has the highest rank and so is converted to
the default numpy-type for that rank, float64.

I would also add that unlike current numpy, my C-casting rules are
associative (if all operands are numpy types, see note below), so it
does not matter in which order you promote the types: (if)i  and i(fi)
give the same result. In current numpy this is not always the case:

    p = np.promote_types
    p(p('u2',   'i1'), 'f4')    # ->  f8
    p(  'u2', p('i1',  'f4'))   # ->  f4

(However, my casting rules are not associative if you include python
scalars.. eg  np.float32(1) + 1.0 + np.int64(1) . Maybe I should try to
fix that...)

Best,
Allan

> I do not mind keeping the "simple" two (or even more) operand "lets
> assume we have uniform types" logic around. For those it is easy to
> find a "minimum type" even before actual loop lookup.
> For the above example it would work in any case well, but it would get
> complicating, if for example the last integer is an unsigned integer,
> that happens to be small enough to fit also into an integer.
> 
> That might give some wiggle room, possibly also to attach warnings to
> it, or at least make things easier. But I would also like to figure out
> as well if we shouldn't try to move in any case. Sure, attach a major
> version to it, but hopefully not a "big step type".
> 
> One thing that I had not thought about is, that if we create
> FutureWarnings, we will need to provide a way to opt-in to the new/old
> behaviour.
> The old behaviour can be achieved by just using the python types (which
> probably is what most code that wants this behaviour does already), but
> the behaviour is tricky. Users can pass `dtype` explicitly, but that is
> a huge kludge...
> Will think about if there is a solution to that, because if there is
> not, you are right. It has to be a "big step" kind of release.
> Although, even then it would be nice to have warnings that can be
> enabled to ease the transition!
> 
> - Sebastian
> 
> 
>>
>> See also:
>> https://github.com/numpy/numpy/issues/12525
>>
>> Allan
>>
>>
>> On 6/5/19 4:41 PM, Sebastian Berg wrote:
>>> Hi all,
>>>
>>> TL;DR:
>>>
>>> Value based promotion seems complex both for users and ufunc-
>>> dispatching/promotion logic. Is there any way we can move forward
>>> here,
>>> and if we do, could we just risk some possible (maybe not-existing)
>>> corner cases to break early to get on the way?
>>>
>>> -----------
>>>
>>> Currently when you write code such as:
>>>
>>> arr = np.array([1, 43, 23], dtype=np.uint16)
>>> res = arr + 1
>>>
>>> Numpy uses fairly sophisticated logic to decide that `1` can be
>>> represented as a uint16, and thus for all unary functions (and most
>>> others as well), the output will have a `res.dtype` of uint16.
>>>
>>> Similar logic also exists for floating point types, where a lower
>>> precision floating point can be used:
>>>
>>> arr = np.array([1, 43, 23], dtype=np.float32)
>>> (arr + np.float64(2.)).dtype  # will be float32
>>>
>>> Currently, this value based logic is enforced by checking whether
>>> the
>>> cast is possible: "4" can be cast to int8, uint8. So the first call
>>> above will at some point check if "uint16 + uint16 -> uint16" is a
>>> valid operation, find that it is, and thus stop searching. (There
>>> is
>>> the additional logic, that when both/all operands are scalars, it
>>> is
>>> not applied).
>>>
>>> Note that while it is defined in terms of casting "1" to uint8
>>> safely
>>> being possible even though 1 may be typed as int64. This logic thus
>>> affects all promotion rules as well (i.e. what should the output
>>> dtype
>>> be).
>>>
>>>
>>> There 2 main discussion points/issues about it:
>>>
>>> 1. Should value based casting/promotion logic exist at all?
>>>
>>> Arguably an `np.int32(3)` has type information attached to it, so
>>> why
>>> should we ignore it. It can also be tricky for users, because a
>>> small
>>> change in values can change the result data type.
>>> Because 0-D arrays and scalars are too close inside numpy (you will
>>> often not know which one you get). There is not much option but to
>>> handle them identically. However, it seems pretty odd that:
>>>  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
>>>  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
>>>
>>> give a different result.
>>>
>>> This is a bit different for python scalars, which do not have a
>>> type
>>> attached already.
>>>
>>>
>>> 2. Promotion and type resolution in Ufuncs:
>>>
>>> What is currently bothering me is that the decision what the output
>>> dtypes should be currently depends on the values in complicated
>>> ways.
>>> It would be nice if we can decide which type signature to use
>>> without
>>> actually looking at values (or at least only very early on).
>>>
>>> One reason here is caching and simplicity. I would like to be able
>>> to
>>> cache which loop should be used for what input. Having value based
>>> casting in there bloats up the problem.
>>> Of course it currently works OK, but especially when user dtypes
>>> come
>>> into play, caching would seem like a nice optimization option.
>>>
>>> Because `uint8(127)` can also be a `int8`, but uint8(128) it is not
>>> as
>>> simple as finding the "minimal" dtype once and working with that." 
>>> Of course Eric and I discussed this a bit before, and you could
>>> create
>>> an internal "uint7" dtype which has the only purpose of flagging
>>> that a
>>> cast to int8 is safe.
>>>
>>> I suppose it is possible I am barking up the wrong tree here, and
>>> this
>>> caching/predictability is not vital (or can be solved with such an
>>> internal dtype easily, although I am not sure it seems elegant).
>>>
>>>
>>> Possible options to move forward
>>> --------------------------------
>>>
>>> I have to still see a bit how trick things are. But there are a few
>>> possible options. I would like to move the scalar logic to the
>>> beginning of ufunc calls:
>>>   * The uint7 idea would be one solution
>>>   * Simply implement something that works for numpy and all except
>>>     strange external ufuncs (I can only think of numba as a
>>> plausible
>>>     candidate for creating such).
>>>
>>> My current plan is to see where the second thing leaves me.
>>>
>>> We also should see if we cannot move the whole thing forward, in
>>> which
>>> case the main decision would have to be forward to where. My
>>> opinion is
>>> currently that when a type has a dtype associated with it clearly,
>>> we
>>> should always use that dtype in the future. This mostly means that
>>> numpy dtypes such as `np.int64` will always be treated like an
>>> int64,
>>> and never like a `uint8` because they happen to be castable to
>>> that.
>>>
>>> For values without a dtype attached (read python integers, floats),
>>> I
>>> see three options, from more complex to simpler:
>>>
>>> 1. Keep the current logic in place as much as possible
>>> 2. Only support value based promotion for operators, e.g.:
>>>    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
>>>    The upside is that it limits the complexity to a much simpler
>>>    problem, the downside is that the ufunc call and operator match
>>>    less clearly.
>>> 3. Just associate python float with float64 and python integers
>>> with
>>>    long/int64 and force users to always type them explicitly if
>>> they
>>>    need to.
>>>
>>> The downside of 1. is that it doesn't help with simplifying the
>>> current
>>> situation all that much, because we still have the special casting
>>> around...
>>>
>>>
>>> I have realized that this got much too long, so I hope it makes
>>> sense.
>>> I will continue to dabble along on these things a bit, so if
>>> nothing
>>> else maybe writing it helps me to get a bit clearer on things...
>>>
>>> Best,
>>>
>>> Sebastian
>>>
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion


From njs at pobox.com  Thu Jun  6 19:36:37 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 6 Jun 2019 16:36:37 -0700
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
Message-ID: <CAPJVwBkS1BFSOUxm=L3NG6HDk7ZR=Df2a3SDMws8nTCmDiSPFw@mail.gmail.com>

I haven't read all the thread super carefully, so I might have missed
something, but I think we might want to look at this together with the
special rule for scalar casting.

IIUC, the basic end-user problem that motivates all thi sis: when you
have a simple Python constant whose exact dtype is unspecified, people
don't want numpy to first automatically pick a dtype for it, and then
use that automatically chosen dtype to override the explicit dtypes
that the user specified. That's that "x + 1" problem. (This also comes
up a ton for languages trying to figure out how to type manifest
constants.)

Numpy's original solution for this was the special casting rule for
scalars. I don't understand the exact semantics, but it's something
like: in any operation involving a mix of non-zero-dim arrays and
zero-dim arrays, we throw out the exact dtype information for the
scalar ("float64", "int32") and replace it with just the "kind"
("float", "int").

This has several surprising consequences:

- The output dtype depends on not just the input dtypes, but also the
input shapes:

In [19]: (np.array([1, 2], dtype=np.int8) + 1).dtype
Out[19]: dtype('int8')

In [20]: (np.array([1, 2], dtype=np.int8) + [1]).dtype
Out[20]: dtype('int64')

- It doesn't just affect Python scalars with vague dtypes, but also
scalars where the user has specifically set the dtype:

In [21]: (np.array([1, 2], dtype=np.int8) + np.int64(1)).dtype
Out[21]: dtype('int8')

- I'm not sure the "kind" rule even does the right thing, especially
for mixed-kind operations. float16-array + int8-scalar has to do the
same thing as float16-array + int64-scalar, but that feels weird? I
think this is why value-based casting got added (at around the same
time as float16, in fact).

(Kinds are kinda problematic in general... the SAME_KIND casting rule
is very weird ? casting int32->int64 is radically different from
casting float64->float32, which is radically different than casting
int64->int32, but SAME_KIND treats them all the same. And it's really
unclear how to generalize the 'kind' concept to new dtypes.)

My intuition is that what users actually want is for *native Python
types* to be treated as having 'underspecified' dtypes, e.g. int is
happy to coerce to int8/int32/int64/whatever, float is happy to coerce
to float32/float64/whatever, but once you have a fully-specified numpy
dtype, it should stay.

Some cases to think about:

np.array([1, 2], dtype=int8) + [1, 1]
 -> maybe this should have dtype int8, because there's no type info on
the right side to contradict that?

np.array([1, 2], dtype=int8) + 2**40
 -> maybe this should be an error, because you can't cast 2**40 to
int8 (under default casting safety rules)? That would introduce some
value-dependence, but it would only affect whether you get an error or
not, and there's precedent for that (e.g. division by zero).

In any case, it would probably be helpful to start by just writing
down the whole set of rules we have now, because I'm not sure anyone
understands all the details...

-n

On Wed, Jun 5, 2019 at 1:42 PM Sebastian Berg
<sebastian at sipsolutions.net> wrote:
>
> Hi all,
>
> TL;DR:
>
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
>
> -----------
>
> Currently when you write code such as:
>
> arr = np.array([1, 43, 23], dtype=np.uint16)
> res = arr + 1
>
> Numpy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all unary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
>
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
>
> arr = np.array([1, 43, 23], dtype=np.float32)
> (arr + np.float64(2.)).dtype  # will be float32
>
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8, uint8. So the first call
> above will at some point check if "uint16 + uint16 -> uint16" is a
> valid operation, find that it is, and thus stop searching. (There is
> the additional logic, that when both/all operands are scalars, it is
> not applied).
>
> Note that while it is defined in terms of casting "1" to uint8 safely
> being possible even though 1 may be typed as int64. This logic thus
> affects all promotion rules as well (i.e. what should the output dtype
> be).
>
>
> There 2 main discussion points/issues about it:
>
> 1. Should value based casting/promotion logic exist at all?
>
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it. It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get). There is not much option but to
> handle them identically. However, it seems pretty odd that:
>  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
>  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
>
> give a different result.
>
> This is a bit different for python scalars, which do not have a type
> attached already.
>
>
> 2. Promotion and type resolution in Ufuncs:
>
> What is currently bothering me is that the decision what the output
> dtypes should be currently depends on the values in complicated ways.
> It would be nice if we can decide which type signature to use without
> actually looking at values (or at least only very early on).
>
> One reason here is caching and simplicity. I would like to be able to
> cache which loop should be used for what input. Having value based
> casting in there bloats up the problem.
> Of course it currently works OK, but especially when user dtypes come
> into play, caching would seem like a nice optimization option.
>
> Because `uint8(127)` can also be a `int8`, but uint8(128) it is not as
> simple as finding the "minimal" dtype once and working with that."
> Of course Eric and I discussed this a bit before, and you could create
> an internal "uint7" dtype which has the only purpose of flagging that a
> cast to int8 is safe.
>
> I suppose it is possible I am barking up the wrong tree here, and this
> caching/predictability is not vital (or can be solved with such an
> internal dtype easily, although I am not sure it seems elegant).
>
>
> Possible options to move forward
> --------------------------------
>
> I have to still see a bit how trick things are. But there are a few
> possible options. I would like to move the scalar logic to the
> beginning of ufunc calls:
>   * The uint7 idea would be one solution
>   * Simply implement something that works for numpy and all except
>     strange external ufuncs (I can only think of numba as a plausible
>     candidate for creating such).
>
> My current plan is to see where the second thing leaves me.
>
> We also should see if we cannot move the whole thing forward, in which
> case the main decision would have to be forward to where. My opinion is
> currently that when a type has a dtype associated with it clearly, we
> should always use that dtype in the future. This mostly means that
> numpy dtypes such as `np.int64` will always be treated like an int64,
> and never like a `uint8` because they happen to be castable to that.
>
> For values without a dtype attached (read python integers, floats), I
> see three options, from more complex to simpler:
>
> 1. Keep the current logic in place as much as possible
> 2. Only support value based promotion for operators, e.g.:
>    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
>    The upside is that it limits the complexity to a much simpler
>    problem, the downside is that the ufunc call and operator match
>    less clearly.
> 3. Just associate python float with float64 and python integers with
>    long/int64 and force users to always type them explicitly if they
>    need to.
>
> The downside of 1. is that it doesn't help with simplifying the current
> situation all that much, because we still have the special casting
> around...
>
>
> I have realized that this got much too long, so I hope it makes sense.
> I will continue to dabble along on these things a bit, so if nothing
> else maybe writing it helps me to get a bit clearer on things...
>
> Best,
>
> Sebastian
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


-- 
Nathaniel J. Smith -- https://vorpus.org

From tyler.je.reddy at gmail.com  Thu Jun  6 22:04:38 2019
From: tyler.je.reddy at gmail.com (Tyler Reddy)
Date: Thu, 6 Jun 2019 19:04:38 -0700
Subject: [Numpy-discussion] ANN: SciPy 1.2.2 (LTS)
Message-ID: <CAHPuU_aEJeuA0cBs2+vgbUrdYqq3WkUCuT8B5=qH4M67VYKZ3g@mail.gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi all,

On behalf of the SciPy development team I'm pleased to announce
the release of SciPy 1.2.2, which is a bug fix release. This is part
of the long-term support (LTS) branch that includes Python 2.7.

Sources and binary wheels can be found at:
https://pypi.org/project/scipy/
and at: https://github.com/scipy/scipy/releases/tag/v1.2.2

One of a few ways to install this release with pip:

pip install scipy==1.2.2

=====================
SciPy 1.2.2 Release Notes
=====================

SciPy 1.2.2 is a bug-fix release with no new features compared to 1.2.1.
Importantly, the SciPy 1.2.2 wheels are built with OpenBLAS 0.3.7.dev to
alleviate issues with SkylakeX AVX512 kernels.

Authors
======

* CJ Carey
* Tyler Dawson +
* Ralf Gommers
* Kai Striega
* Andrew Nelson
* Tyler Reddy
* Kevin Sheppard +

A total of 7 people contributed to this release.
People with a "+" by their names contributed a patch for the first time.
This list of names is automatically generated, and may not be fully
complete.

Issues closed for 1.2.2
------------------------------
* `#9611 <https://github.com/scipy/scipy/issues/9611>`__: Overflow error
with new way of p-value calculation in kendall tau correlation for
perfectly monotonic vectors
* `#9964 <https://github.com/scipy/scipy/issues/9964>`__: optimize.newton :
overwrites x0 argument when it is a numpy array
* `#9784 <https://github.com/scipy/scipy/issues/9784>`__: TST: Minimum
NumPy version is not being CI tested
* `#10132 <https://github.com/scipy/scipy/issues/10132>`__: Docs:
Description of nnz attribute of sparse.csc_matrix misleading

Pull requests for 1.2.2
-----------------------------
* `#10056 <https://github.com/scipy/scipy/pull/10056>`__: BUG: Ensure
factorial is not too large in kendaltau
* `#9991 <https://github.com/scipy/scipy/pull/9991>`__: BUG: Avoid inplace
modification of input array in newton
* `#9788 <https://github.com/scipy/scipy/pull/9788>`__: TST, BUG:
f2py-related issues with NumPy < 1.14.0
* `#9749 <https://github.com/scipy/scipy/pull/9749>`__: BUG:
MapWrapper.__exit__ should terminate
* `#10141 <https://github.com/scipy/scipy/pull/10141>`__: Update
description for nnz on csc.py

Checksums
=========

MD5
~~~

f5d23361e78f230f70fd117be20930e1
 scipy-1.2.2-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
44387030d96a2495e5576800b2a567d6  scipy-1.2.2-cp27-cp27m-manylinux1_i686.whl
bc56bf862deadc96f6be1f67dc8eaf89
 scipy-1.2.2-cp27-cp27m-manylinux1_x86_64.whl
a45382978ff7d032041847f66e2f7351  scipy-1.2.2-cp27-cp27m-win32.whl
1140063ad53c44414f9feaae3c4fbf8c  scipy-1.2.2-cp27-cp27m-win_amd64.whl
3407230bae0c36210c5d3fee717a3579
 scipy-1.2.2-cp27-cp27mu-manylinux1_i686.whl
fbb9867ea3ba38cc0c979c38b8c77871
 scipy-1.2.2-cp27-cp27mu-manylinux1_x86_64.whl
8b4497e964c17135b6b2e8f691bed49e
 scipy-1.2.2-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
9139c344bc6ef05f7f22191af0810ef6  scipy-1.2.2-cp34-cp34m-manylinux1_i686.whl
a62c1f316c33af02007da3374ebf02c3
 scipy-1.2.2-cp34-cp34m-manylinux1_x86_64.whl
780ce592f99ade01a9b0883ac767f798  scipy-1.2.2-cp34-cp34m-win32.whl
498e740b099182df30c16144a109acdf  scipy-1.2.2-cp34-cp34m-win_amd64.whl
8b157f5433846d8798ff6941d0f9671f
 scipy-1.2.2-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
e1692a9e3e9a9b2764bccd0c9575bfef  scipy-1.2.2-cp35-cp35m-manylinux1_i686.whl
70863fc59dc034c07b73de765eb693f9
 scipy-1.2.2-cp35-cp35m-manylinux1_x86_64.whl
ce676f1adc72f8180b2eacec7e44c802  scipy-1.2.2-cp35-cp35m-win32.whl
21a9fac5e289682abe35ce6d54f5805f  scipy-1.2.2-cp35-cp35m-win_amd64.whl
470fa57418223df8fc27e9ec45bc7a94
 scipy-1.2.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
4001f322a2967de0aa0b8148e0116def  scipy-1.2.2-cp36-cp36m-manylinux1_i686.whl
4e0d727cbbfe8410bd1229d197fb11d8
 scipy-1.2.2-cp36-cp36m-manylinux1_x86_64.whl
352608fa1f48877fc76a55217e689240  scipy-1.2.2-cp36-cp36m-win32.whl
559ca5cda1935a9992436bb1398dbcd0  scipy-1.2.2-cp36-cp36m-win_amd64.whl
92b9356944c239520f5b2897ba531c16
 scipy-1.2.2-cp37-cp37m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
d9b427be8fc3bfd5b2a8330e1215b0ee  scipy-1.2.2-cp37-cp37m-manylinux1_i686.whl
4f2d513b1950ab7c147ddf3e4acb2542
 scipy-1.2.2-cp37-cp37m-manylinux1_x86_64.whl
1598ffe78061854f7bed87290250c33f  scipy-1.2.2-cp37-cp37m-win32.whl
9dad5d71152b714694e073d1c0c54288  scipy-1.2.2-cp37-cp37m-win_amd64.whl
d94de858fba4f24de7d6dd16f1caeb5d  scipy-1.2.2.tar.gz
136c5ee1bc4b259a12a7efe331b15d64  scipy-1.2.2.tar.xz
b9a5b4cbdf54cf681eda3b4d94a73c18  scipy-1.2.2.zip

SHA256
~~~~~~

271c6e56c8f9a3d6c3f0bc857d7a6e7cf7a8415c879a3915701cd011e82a83a3
 scipy-1.2.2-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
2eb255b30dac7516c6f3c5237f2e0ad1f1213b5364de409d932249c9a8c5bffb
 scipy-1.2.2-cp27-cp27m-manylinux1_i686.whl
7f58faa422aa493d7b70dd56d6e8783223e84dd6e7f4b4161bd776b39ecbac92
 scipy-1.2.2-cp27-cp27m-manylinux1_x86_64.whl
d0d41a9ee3264f95820138170b447f5d3e453e5ebd10b411bca37c99237aac69
 scipy-1.2.2-cp27-cp27m-win32.whl
b074a83299a82eae617dc46a830cfa7aaa588d07523990507848ee1ded3c52ce
 scipy-1.2.2-cp27-cp27m-win_amd64.whl
49dcebc6f57bce0bd23cb55dbc6144f4990e5cbce9aab3128af03d6b1b4eab6a
 scipy-1.2.2-cp27-cp27mu-manylinux1_i686.whl
67d2210c7f6f585e1055bee3dc9f15610b5ebb04e80bfaa757868937ee744fec
 scipy-1.2.2-cp27-cp27mu-manylinux1_x86_64.whl
0bcababa06ff83138a7f30a68f334dee034ce1cc7604f9278b96f62265fe7fd7
 scipy-1.2.2-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
a9fc1fcaa560edf771d4545d7e6dd865a213fc5b485bb127de5dfd32f40094e1
 scipy-1.2.2-cp34-cp34m-manylinux1_i686.whl
7fb4efff9895116428ad65564d2232fb1cac4b9d84398512a858b09dd4a7fd59
 scipy-1.2.2-cp34-cp34m-manylinux1_x86_64.whl
fbdff021643c2dfa35efd29218e0318c4b4987f48ea432be7e8c02bdb1b0c314
 scipy-1.2.2-cp34-cp34m-win32.whl
f4e355afa8fdda11010de308c2376edda29e064cec699974097364115f71e16f
 scipy-1.2.2-cp34-cp34m-win_amd64.whl
e99cd49daffe7384fd35046c3b14bee98ce87d97c95865469227001905534e13
 scipy-1.2.2-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
447c40d33ec5e0020750fadbb8599220b9eb9fd8798030efe9b308247800f364
 scipy-1.2.2-cp35-cp35m-manylinux1_i686.whl
9a21d64d002cb3a9239a55c0aa100b48d58b5e38382c0fdfcdfc68cf417d8142
 scipy-1.2.2-cp35-cp35m-manylinux1_x86_64.whl
5fa84b467b5f77c243c5701628ed7a4238e53bc4120db87be7dafa416e842fb9
 scipy-1.2.2-cp35-cp35m-win32.whl
682b210ff7a65f6f5245fdf73d26a348b57e42d2059bc5fcf7ed25d063f35c45
 scipy-1.2.2-cp35-cp35m-win_amd64.whl
bcd0d4b2de5cb3fab69007214a39737e917267f56f887ce9c7732ba3278fc33d
 scipy-1.2.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
4686d699f76068757a81269f1a111c0db689bf048a56b131a339803121534fa8
 scipy-1.2.2-cp36-cp36m-manylinux1_i686.whl
97f26b4b5d4456f44849fd35cad8801f7cae4e64b75fc4e522d26a54aef17391
 scipy-1.2.2-cp36-cp36m-manylinux1_x86_64.whl
922e2370674c82dd1367fc13a08c8765f4e5281a584d871e7cb454828d84600f
 scipy-1.2.2-cp36-cp36m-win32.whl
c390f1721757ec983616149f00e1bd0432aa32d2c1d9398930d7e7cc9542c922
 scipy-1.2.2-cp36-cp36m-win_amd64.whl
47d4623efa71948dc4a92f978fbf6b9fb69dac5b0f0fae4c1a1f3d955ac8aea9
 scipy-1.2.2-cp37-cp37m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
162b803984ebb76927990d7233cab825d146be8e2a3f6a0efb1b3a61ebacae73
 scipy-1.2.2-cp37-cp37m-manylinux1_i686.whl
d18d1575d4a54f128c0f34422bd73ce0f177e462d6124f074388e211d8dc2616
 scipy-1.2.2-cp37-cp37m-manylinux1_x86_64.whl
c5b9db9e3f6537bf7b308de12c185b27f22fb9a66fd12efc7aefbcfa0adb4d82
 scipy-1.2.2-cp37-cp37m-win32.whl
f64e29a8b32d672fb6078f456bfff3cae8f36b6c8b64c337ad0942f29404b03f
 scipy-1.2.2-cp37-cp37m-win_amd64.whl
a4331e0b8dab1ff75d2c67b5158a8bb9a83c799d7140094dda936d876c7cfbb1
 scipy-1.2.2.tar.gz
8006216f7e99dbf639b9293c73c197f36c34389ea4a223547e31f1772d2626f7
 scipy-1.2.2.tar.xz
578030ddb33eb093cdd2ebfb71ce052bb8b2d5cd33d14bcc452069b05ffa9dff
 scipy-1.2.2.zip
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJc+bQAAAoJELD/41ZX0J714DIP/1lo7lRql213ElrtmrXSetvc
wagnMTnwKW6/zAWXrDaX5aQ3XoCbUyDyva427vCoAIacw7HDsdgCkw6hb8WCI2fM
SxG7iBDuDnJM3iVBtM23qUH4aI4CvRoVZmG2oF4fwwMpjvx80bMzHmm1xkw5OVBz
9JaYYplT1PCcTUD4CnwX2jG66NlzLYomQgdg67I6NIubelKhVUEMRx1j9s5Ed76q
bwPZbV6i52kzuG441VXhUq1Rhn/+j55/hgnlRpbQFAkwbz664OkqBZ1FPIH7/Wpq
vGFxYPROECxjrpiPYiWtXZpJfRJySiQ5oltBHec3MdX1b/S7cAm7BCI0hW4NsPmU
i36Ho0f6WhCTMGowl+V4uylE3hvWEW0zHp9MJwe2mUNWc9YKPu0pCZ3hif/YH+rh
oQD8sI3IUZuyJ0ntPWN/SCXdE5kmE5zRLIBFoap15uRComuypuZWmMrWuX4oC1Qb
0pKuCa3UcDXmTVVc+ZypnbRfePUbocxeP9lrsQlT43nfhp9jXkPw1fWxXII9ChNL
ORoyAvHzgfUxc4x5vka8Evs1OhSzMg6SYkH0SN1qNiOHq9RLBwdLqk4dgyNmLCPP
fdr6HvLEK6rYNNDEq6IxY7h8zSFtOQRjtW348W6T611sAHAa8u+51nVMI5KljPlF
x9AB2Fj0Q4rFQpVQtlz/
=jT1x
-----END PGP SIGNATURE-----
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190606/4a0ed832/attachment-0001.html>

From ralf.gommers at gmail.com  Fri Jun  7 01:18:38 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 7 Jun 2019 07:18:38 +0200
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <CAPJVwBkS1BFSOUxm=L3NG6HDk7ZR=Df2a3SDMws8nTCmDiSPFw@mail.gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <CAPJVwBkS1BFSOUxm=L3NG6HDk7ZR=Df2a3SDMws8nTCmDiSPFw@mail.gmail.com>
Message-ID: <CABL7CQhVKyVxpHPZa6-sZZJjeKGohakcMe_b098hWh3j6VAEUg@mail.gmail.com>

On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith <njs at pobox.com> wrote:

>
> My intuition is that what users actually want is for *native Python
> types* to be treated as having 'underspecified' dtypes, e.g. int is
> happy to coerce to int8/int32/int64/whatever, float is happy to coerce
> to float32/float64/whatever, but once you have a fully-specified numpy
> dtype, it should stay.
>

Thanks Nathaniel, I think this expresses a possible solution better than
anything I've seen on this list before. An explicit "underspecified types"
concept could make casting understandable.


> In any case, it would probably be helpful to start by just writing
> down the whole set of rules we have now, because I'm not sure anyone
> understands all the details...
>

+1

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190607/bb5ca837/attachment.html>

From m.h.vankerkwijk at gmail.com  Fri Jun  7 09:22:53 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Fri, 7 Jun 2019 09:22:53 -0400
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <CABL7CQhVKyVxpHPZa6-sZZJjeKGohakcMe_b098hWh3j6VAEUg@mail.gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <CAPJVwBkS1BFSOUxm=L3NG6HDk7ZR=Df2a3SDMws8nTCmDiSPFw@mail.gmail.com>
 <CABL7CQhVKyVxpHPZa6-sZZJjeKGohakcMe_b098hWh3j6VAEUg@mail.gmail.com>
Message-ID: <CAJNV+9tn7nLDNftLe3wt8iC1J-AeoTjxYsNWECVa8mrkftSjqw@mail.gmail.com>

On Fri, Jun 7, 2019 at 1:19 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith <njs at pobox.com> wrote:
>
>>
>> My intuition is that what users actually want is for *native Python
>> types* to be treated as having 'underspecified' dtypes, e.g. int is
>> happy to coerce to int8/int32/int64/whatever, float is happy to coerce
>> to float32/float64/whatever, but once you have a fully-specified numpy
>> dtype, it should stay.
>>
>
> Thanks Nathaniel, I think this expresses a possible solution better than
> anything I've seen on this list before. An explicit "underspecified types"
> concept could make casting understandable.
>

I think the current model is that this holds for all scalars, but changing
that to be just for not already explicitly typed types makes sense.

In the context of a mental picture, one could think in terms of coercion,
of numpy having not just a `numpy.array` but also a `numpy.scalar`
function, which takes some input and tries to make a numpy scalar of it.
For python int, float, complex, etc., it uses the minimal numpy type.

Of course, this is slightly inconsistent with the `np.array` function which
converts things to `ndarray` using a default type for int, float, complex,
etc., but my sense is that that is explainable, e.g.,imagining both
`np.scalar` and `np.array` to have dtype attributes, one could say that the
default for one would be `'minimal'` and the other `'64bit'` (well, that
doesn't work for complex, but anyway).

-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190607/cd39a162/attachment.html>

From ndbecker2 at gmail.com  Fri Jun  7 09:37:25 2019
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 7 Jun 2019 09:37:25 -0400
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
Message-ID: <CAG3t+pGzSG3-5bFz2tH0TYnh_HUA+LsjjCyoNGiXaNvNecZfoQ@mail.gmail.com>

On Wed, Jun 5, 2019 at 4:42 PM Sebastian Berg
<sebastian at sipsolutions.net> wrote:

I think the best approach is that if the user gave unambiguous types
as inputs to operators then the output should be the same dtype, or
type corresponding to the common promotion type of the inputs.

If the input type is not specified, I agree with the suggestion here:

> 3. Just associate python float with float64 and python integers with
>    long/int64 and force users to always type them explicitly if they
>    need to.

Explicit is better than implicit

-- 
Those who don't understand recursion are doomed to repeat it

From sebastian at sipsolutions.net  Fri Jun  7 14:19:37 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 07 Jun 2019 13:19:37 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <CABL7CQhVKyVxpHPZa6-sZZJjeKGohakcMe_b098hWh3j6VAEUg@mail.gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <CAPJVwBkS1BFSOUxm=L3NG6HDk7ZR=Df2a3SDMws8nTCmDiSPFw@mail.gmail.com>
 <CABL7CQhVKyVxpHPZa6-sZZJjeKGohakcMe_b098hWh3j6VAEUg@mail.gmail.com>
Message-ID: <3612b7ed300c825f5f728415004ce06c5d4280b1.camel@sipsolutions.net>

On Fri, 2019-06-07 at 07:18 +0200, Ralf Gommers wrote:
> 
> 
> On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith <njs at pobox.com> wrote:
> > My intuition is that what users actually want is for *native Python
> > types* to be treated as having 'underspecified' dtypes, e.g. int is
> > happy to coerce to int8/int32/int64/whatever, float is happy to
> > coerce
> > to float32/float64/whatever, but once you have a fully-specified
> > numpy
> > dtype, it should stay.
> 
> Thanks Nathaniel, I think this expresses a possible solution better
> than anything I've seen on this list before. An explicit
> "underspecified types" concept could make casting understandable.

Yes, there is one small additional annoyance (but maybe it is just
that). In that 127 is the 'underspecified' dtype `uint7` (it can be
safely cast both to uint8 and int8).

> 
> > In any case, it would probably be helpful to start by just writing
> > down the whole set of rules we have now, because I'm not sure
> > anyone
> > understands all the details...
> 
> +1

OK, let me try to sketch the details below:

0. "Scalars" means scalars or 0-D arrays here.

1. The logic below will only be used if we have a mix of arrays and
scalars. If all are scalars, the logic is never used. (Plus one
additional tricky case within ufuncs, which is more hypothetical [0])

2. Scalars will only be demoted within their category. The categories
and casting rules within the category are as follows:

Boolean:
    Casts safely to all (nothing surprising).

Integers:
    Casting is possible if output can hold the value.
    This includes uint8(127) casting to an int8.
    (unsigned and signed integers are the same "category")

Floats:
    Scalars can be demoted based on value, roughly this
    avoids overflows:
        float16:     -65000 < value < 65000
        float32:    -3.4e38 < value < 3.4e38
        float64:   -1.7e308 < value < 1.7e308
        float128 (largest type, does not apply).

Complex: Same logic as floats (applied to .real and .imag).

Others: Anything else.

---

Ufunc, as well as `result_type` will use this liberally, which
basically means finding the smallest type for each category and using
that. Of course for floats we cannot do the actual cast until later,
since initially we do not know if the cast will actually be performed.

This is only tricky for uint vs. int, because uint8(127) is a "small
unsigned". I.e. with our current dtypes there is no strict type
hierarchy uint8(x) may or may not cast to int8. 

---

We could think of doing:

arr, min_dtype = np.asarray_and_min_dtype(pyobject)

which could even fix the list example Nathaniel had. Which would work
if you would do the dtype hierarchy.

This is where the `uint7` came from a hypothetical `uint7` would fix
the integer dtype hierarchy, by representing the numbers `0-127` which
can be cast to uint8 and int8.

Best,

Sebastian


[0] Amendment for point 1:

There is one detail (bug?) here in the logic though, that I missed
before. If a ufunc (or result_type) sees a mix of scalars and arrays,
it will try to decide whether or not to use value based logic. Value
based logic will be skipped if the scalars are in a higher category
(based on the ones above) then the highest array ? for optimization I
assume.
Plausibly, this could cause incorrect logic when the dtype signature of
a ufunc is mixed:
  float32, int8 -> float32
  float32, int64 -> float64

May choose the second loop unnecessarily. Or for example if we have a
datetime64 in the inputs, there would be no way for value based casting
to be used.


> 
> Ralf
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190607/4489771f/attachment.sig>

From sebastian at sipsolutions.net  Fri Jun  7 14:50:37 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 07 Jun 2019 13:50:37 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <74ecb7ce-7e22-fdd7-a2b2-2bc50e5dd8be@gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <058827b2-5d3e-8597-d6f6-c1c9b2cb42e7@gmail.com>
 <7cdbfe82028f8ad330ef310be27caa01993d3176.camel@sipsolutions.net>
 <74ecb7ce-7e22-fdd7-a2b2-2bc50e5dd8be@gmail.com>
Message-ID: <cc23b91ed11556af7cae880bb51f7116e026f2d6.camel@sipsolutions.net>

On Thu, 2019-06-06 at 19:34 -0400, Allan Haldane wrote:
> On 6/6/19 12:46 PM, Sebastian Berg wrote:
> > On Thu, 2019-06-06 at 11:57 -0400, Allan Haldane wrote:
> > > I think dtype-based casting makes a lot of sense, the problem is
> > > backward compatibility.
> > > 
> > > Numpy casting is weird in a number of ways: The array + array
> > > casting
> > > is
> > > unexpected to many users (eg, uint64 + int64 -> float64), and the
> > > casting of array + scalar is different from that, and value
> > > based.
> > > Personally I wouldn't want to try change it unless we make a
> > > backward-incompatible release (numpy 2.0), based on my experience
> > > trying
> > > to change much more minor things. We already put "casting" on the
> > > list
> > > of desired backward-incompatible changes on the list here:
> > > https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release
> > > 
> > > Relatedly, I've previously dreamed about a different "C-style"
> > > way
> > > casting might behave:
> > > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
> > > 
> > > The proposal there is that array + array casting, array + scalar,
> > > and
> > > array + python casting would all work in the same dtype-based
> > > way,
> > > which
> > > mimics the familiar "C" casting rules.
> > 
> > If I read it right, you do propose that array + python would cast
> > in a
> > "minimal type" way for python.
> 
> I'm a little unclear what you mean by "minimal type" way. By "minimal
> type", I thought you and others are talking about the rule numpy
> currently uses that "the output dtype is the minimal dtype capable of
> representing the value of both input dtypes", right? But in that gist
> I
> am instead proposing that output-dtype is determined by C-like rules.
> 
> For array+py_scalar I was less certain what to do than for
> array+array
> and array+npy_scalar. But I proposed the three "ranks" of 1. bool, 2.
> int, and 3. float/complex. My rule for array+py_scalar is that if the
> python scalar's rank is less than the numpy operand dtype's rank, use
> the numpy dtype. If the python-scalar's rank is greater, use the
> "default" types of bool_, int64, float64 respectively. Eg:
> 
> np.bool_(1) + 1        -> int64   (default int wins)
> np.int8(1) + 1         -> int8    (numpy wins)
> np.uint8(1) + (-1)     -> uint8   (numpy wins)
> np.int64(1) + 1        -> int64   (numpy wins)
> np.int64(1) + 1.0      -> float64 (default float wins)
> np.float32(1.0) + 1.0  -> float32 (numpy wins)
> 
> Note it does not depend on the numerical value of the scalar, only
> its type.
> 
> > In your write up, you describe that if you mix array + scalar, the
> > scalar uses a minimal dtype compared to the array's dtype. 
> 
> Sorry if I'm nitpicking/misunderstanding, but in my rules
> np.uint64(1) +
> 1 -> uint64 but in numpy's "minimal dtype" rules it is  -> float64.
> So I
> don't think I am using the minimal rule.
> 
> > What we
> > instead have is that in principle you could have loops such as:
> > 
> > "ifi->f"
> > "idi->d"
> > 
> > and I think we should chose the first for a scalar, because it
> > "fits"
> > into f just fine. (if the input is) `ufunc(int_arr, 12., int_arr)`.
> 
> I feel I'm not understanding you, but the casting rules in my gist
> follow those two rules if i, f are the numpy types int32 and float32.
> 
> If instead you mean (np.int64, py_float, np.int64) my rules would
> cast
> to float64, since py_float has the highest rank and so is converted
> to
> the default numpy-type for that rank, float64.

Yes, you are right. I should look at them a bit more carefully in any
case. Actually, numpy would also choose the second one, because it
python float has the higher "category". The example should rather have
been:

int8, float32 -> float32
int64, float32 -> float64

With `python_int(12) + np.array([1., 2.], dtype=float64)`. Numpy would
currently choose the int8 loop here, because the scalar is of a lower
or equal "category" and thus it is OK to demote it even further.

This is fairly irrelevant for most users. But for ufunc dispatching, I
think it is where it gets ugly. In non-uniform ufunc dtype signatures,
and no, I doubt that this is very relevant in practice or that numpy is
even very consistent here.

I have a branch now which basically moves the "ResultType" logic before
choosing the loop (it thus is unable to capture some of the stranger,
probably non-existing corner cases).


On a different note: The ranking you are suggesting for python types
seems very much the same as what we have, with the exception that it
would not look at the value (I suppose what we would do instead is to
simply raise a casting error):

int8_arr + 87345  # ouput should always be int8, so crash on cast?

Which may be a viable approach. Although, signed/unsigned may be
tricky:

uint8_arr + py_int  # do we look at the py_int's sign?


- Sebastian


> 
> I would also add that unlike current numpy, my C-casting rules are
> associative (if all operands are numpy types, see note below), so it
> does not matter in which order you promote the types: (if)i  and
> i(fi)
> give the same result. In current numpy this is not always the case:
> 
>     p = np.promote_types
>     p(p('u2',   'i1'), 'f4')    # ->  f8
>     p(  'u2', p('i1',  'f4'))   # ->  f4
> 
> (However, my casting rules are not associative if you include python
> scalars.. eg  np.float32(1) + 1.0 + np.int64(1) . Maybe I should try
> to
> fix that...)
> 
> Best,
> Allan
> 
> > I do not mind keeping the "simple" two (or even more) operand "lets
> > assume we have uniform types" logic around. For those it is easy to
> > find a "minimum type" even before actual loop lookup.
> > For the above example it would work in any case well, but it would
> > get
> > complicating, if for example the last integer is an unsigned
> > integer,
> > that happens to be small enough to fit also into an integer.
> > 
> > That might give some wiggle room, possibly also to attach warnings
> > to
> > it, or at least make things easier. But I would also like to figure
> > out
> > as well if we shouldn't try to move in any case. Sure, attach a
> > major
> > version to it, but hopefully not a "big step type".
> > 
> > One thing that I had not thought about is, that if we create
> > FutureWarnings, we will need to provide a way to opt-in to the
> > new/old
> > behaviour.
> > The old behaviour can be achieved by just using the python types
> > (which
> > probably is what most code that wants this behaviour does already),
> > but
> > the behaviour is tricky. Users can pass `dtype` explicitly, but
> > that is
> > a huge kludge...
> > Will think about if there is a solution to that, because if there
> > is
> > not, you are right. It has to be a "big step" kind of release.
> > Although, even then it would be nice to have warnings that can be
> > enabled to ease the transition!
> > 
> > - Sebastian
> > 
> > 
> > > See also:
> > > https://github.com/numpy/numpy/issues/12525
> > > 
> > > Allan
> > > 
> > > 
> > > On 6/5/19 4:41 PM, Sebastian Berg wrote:
> > > > Hi all,
> > > > 
> > > > TL;DR:
> > > > 
> > > > Value based promotion seems complex both for users and ufunc-
> > > > dispatching/promotion logic. Is there any way we can move
> > > > forward
> > > > here,
> > > > and if we do, could we just risk some possible (maybe not-
> > > > existing)
> > > > corner cases to break early to get on the way?
> > > > 
> > > > -----------
> > > > 
> > > > Currently when you write code such as:
> > > > 
> > > > arr = np.array([1, 43, 23], dtype=np.uint16)
> > > > res = arr + 1
> > > > 
> > > > Numpy uses fairly sophisticated logic to decide that `1` can be
> > > > represented as a uint16, and thus for all unary functions (and
> > > > most
> > > > others as well), the output will have a `res.dtype` of uint16.
> > > > 
> > > > Similar logic also exists for floating point types, where a
> > > > lower
> > > > precision floating point can be used:
> > > > 
> > > > arr = np.array([1, 43, 23], dtype=np.float32)
> > > > (arr + np.float64(2.)).dtype  # will be float32
> > > > 
> > > > Currently, this value based logic is enforced by checking
> > > > whether
> > > > the
> > > > cast is possible: "4" can be cast to int8, uint8. So the first
> > > > call
> > > > above will at some point check if "uint16 + uint16 -> uint16"
> > > > is a
> > > > valid operation, find that it is, and thus stop searching.
> > > > (There
> > > > is
> > > > the additional logic, that when both/all operands are scalars,
> > > > it
> > > > is
> > > > not applied).
> > > > 
> > > > Note that while it is defined in terms of casting "1" to uint8
> > > > safely
> > > > being possible even though 1 may be typed as int64. This logic
> > > > thus
> > > > affects all promotion rules as well (i.e. what should the
> > > > output
> > > > dtype
> > > > be).
> > > > 
> > > > 
> > > > There 2 main discussion points/issues about it:
> > > > 
> > > > 1. Should value based casting/promotion logic exist at all?
> > > > 
> > > > Arguably an `np.int32(3)` has type information attached to it,
> > > > so
> > > > why
> > > > should we ignore it. It can also be tricky for users, because a
> > > > small
> > > > change in values can change the result data type.
> > > > Because 0-D arrays and scalars are too close inside numpy (you
> > > > will
> > > > often not know which one you get). There is not much option but
> > > > to
> > > > handle them identically. However, it seems pretty odd that:
> > > >  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
> > > >  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> > > > 
> > > > give a different result.
> > > > 
> > > > This is a bit different for python scalars, which do not have a
> > > > type
> > > > attached already.
> > > > 
> > > > 
> > > > 2. Promotion and type resolution in Ufuncs:
> > > > 
> > > > What is currently bothering me is that the decision what the
> > > > output
> > > > dtypes should be currently depends on the values in complicated
> > > > ways.
> > > > It would be nice if we can decide which type signature to use
> > > > without
> > > > actually looking at values (or at least only very early on).
> > > > 
> > > > One reason here is caching and simplicity. I would like to be
> > > > able
> > > > to
> > > > cache which loop should be used for what input. Having value
> > > > based
> > > > casting in there bloats up the problem.
> > > > Of course it currently works OK, but especially when user
> > > > dtypes
> > > > come
> > > > into play, caching would seem like a nice optimization option.
> > > > 
> > > > Because `uint8(127)` can also be a `int8`, but uint8(128) it is
> > > > not
> > > > as
> > > > simple as finding the "minimal" dtype once and working with
> > > > that." 
> > > > Of course Eric and I discussed this a bit before, and you could
> > > > create
> > > > an internal "uint7" dtype which has the only purpose of
> > > > flagging
> > > > that a
> > > > cast to int8 is safe.
> > > > 
> > > > I suppose it is possible I am barking up the wrong tree here,
> > > > and
> > > > this
> > > > caching/predictability is not vital (or can be solved with such
> > > > an
> > > > internal dtype easily, although I am not sure it seems
> > > > elegant).
> > > > 
> > > > 
> > > > Possible options to move forward
> > > > --------------------------------
> > > > 
> > > > I have to still see a bit how trick things are. But there are a
> > > > few
> > > > possible options. I would like to move the scalar logic to the
> > > > beginning of ufunc calls:
> > > >   * The uint7 idea would be one solution
> > > >   * Simply implement something that works for numpy and all
> > > > except
> > > >     strange external ufuncs (I can only think of numba as a
> > > > plausible
> > > >     candidate for creating such).
> > > > 
> > > > My current plan is to see where the second thing leaves me.
> > > > 
> > > > We also should see if we cannot move the whole thing forward,
> > > > in
> > > > which
> > > > case the main decision would have to be forward to where. My
> > > > opinion is
> > > > currently that when a type has a dtype associated with it
> > > > clearly,
> > > > we
> > > > should always use that dtype in the future. This mostly means
> > > > that
> > > > numpy dtypes such as `np.int64` will always be treated like an
> > > > int64,
> > > > and never like a `uint8` because they happen to be castable to
> > > > that.
> > > > 
> > > > For values without a dtype attached (read python integers,
> > > > floats),
> > > > I
> > > > see three options, from more complex to simpler:
> > > > 
> > > > 1. Keep the current logic in place as much as possible
> > > > 2. Only support value based promotion for operators, e.g.:
> > > >    `arr + scalar` may do it, but `np.add(arr, scalar)` will
> > > > not.
> > > >    The upside is that it limits the complexity to a much
> > > > simpler
> > > >    problem, the downside is that the ufunc call and operator
> > > > match
> > > >    less clearly.
> > > > 3. Just associate python float with float64 and python integers
> > > > with
> > > >    long/int64 and force users to always type them explicitly if
> > > > they
> > > >    need to.
> > > > 
> > > > The downside of 1. is that it doesn't help with simplifying the
> > > > current
> > > > situation all that much, because we still have the special
> > > > casting
> > > > around...
> > > > 
> > > > 
> > > > I have realized that this got much too long, so I hope it makes
> > > > sense.
> > > > I will continue to dabble along on these things a bit, so if
> > > > nothing
> > > > else maybe writing it helps me to get a bit clearer on
> > > > things...
> > > > 
> > > > Best,
> > > > 
> > > > Sebastian
> > > > 
> > > > 
> > > > 
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > 
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190607/0893c09a/attachment-0001.sig>

From sebastian at sipsolutions.net  Fri Jun  7 17:41:36 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 07 Jun 2019 16:41:36 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <3612b7ed300c825f5f728415004ce06c5d4280b1.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <CAPJVwBkS1BFSOUxm=L3NG6HDk7ZR=Df2a3SDMws8nTCmDiSPFw@mail.gmail.com>
 <CABL7CQhVKyVxpHPZa6-sZZJjeKGohakcMe_b098hWh3j6VAEUg@mail.gmail.com>
 <3612b7ed300c825f5f728415004ce06c5d4280b1.camel@sipsolutions.net>
Message-ID: <48b6585921480f0eca0d988c53b2fbe8cafffa2c.camel@sipsolutions.net>

On Fri, 2019-06-07 at 13:19 -0500, Sebastian Berg wrote:
> On Fri, 2019-06-07 at 07:18 +0200, Ralf Gommers wrote:
> > 
> > On Fri, Jun 7, 2019 at 1:37 AM Nathaniel Smith <njs at pobox.com>
> > wrote:
> > > My intuition is that what users actually want is for *native
> > > Python
> > > types* to be treated as having 'underspecified' dtypes, e.g. int
> > > is
> > > happy to coerce to int8/int32/int64/whatever, float is happy to
> > > coerce
> > > to float32/float64/whatever, but once you have a fully-specified
> > > numpy
> > > dtype, it should stay.
> > 
> > Thanks Nathaniel, I think this expresses a possible solution better
> > than anything I've seen on this list before. An explicit
> > "underspecified types" concept could make casting understandable.
> 
> Yes, there is one small additional annoyance (but maybe it is just
> that). In that 127 is the 'underspecified' dtype `uint7` (it can be
> safely cast both to uint8 and int8).
> 
> > > In any case, it would probably be helpful to start by just
> > > writing
> > > down the whole set of rules we have now, because I'm not sure
> > > anyone
> > > understands all the details...
> > 
> > +1
> 
> OK, let me try to sketch the details below:
> 
> 0. "Scalars" means scalars or 0-D arrays here.
> 
> 1. The logic below will only be used if we have a mix of arrays and
> scalars. If all are scalars, the logic is never used. (Plus one
> additional tricky case within ufuncs, which is more hypothetical [0])
> 

And of course I just realized that, trying to be simple, I forgot an
important point there:

The logic in 2. is only used when there is a mix of scalars and arrays,
and the arrays are in the same or higher category. As an example:

np.array([1, 2, 3], dtype=np.uint8) + np.float64(12.)

will not demote the float64, because the scalars "float" is a higher
category than the arrays "integer".


- Sebastian


> 2. Scalars will only be demoted within their category. The categories
> and casting rules within the category are as follows:
> 
> Boolean:
>     Casts safely to all (nothing surprising).
> 
> Integers:
>     Casting is possible if output can hold the value.
>     This includes uint8(127) casting to an int8.
>     (unsigned and signed integers are the same "category")
> 
> Floats:
>     Scalars can be demoted based on value, roughly this
>     avoids overflows:
>         float16:     -65000 < value < 65000
>         float32:    -3.4e38 < value < 3.4e38
>         float64:   -1.7e308 < value < 1.7e308
>         float128 (largest type, does not apply).
> 
> Complex: Same logic as floats (applied to .real and .imag).
> 
> Others: Anything else.
> 
> ---
> 
> Ufunc, as well as `result_type` will use this liberally, which
> basically means finding the smallest type for each category and using
> that. Of course for floats we cannot do the actual cast until later,
> since initially we do not know if the cast will actually be
> performed.
> 
> This is only tricky for uint vs. int, because uint8(127) is a "small
> unsigned". I.e. with our current dtypes there is no strict type
> hierarchy uint8(x) may or may not cast to int8. 
> 
> ---
> 
> We could think of doing:
> 
> arr, min_dtype = np.asarray_and_min_dtype(pyobject)
> 
> which could even fix the list example Nathaniel had. Which would work
> if you would do the dtype hierarchy.
> 
> This is where the `uint7` came from a hypothetical `uint7` would fix
> the integer dtype hierarchy, by representing the numbers `0-127`
> which
> can be cast to uint8 and int8.
> 
> Best,
> 
> Sebastian
> 
> 
> [0] Amendment for point 1:
> 
> There is one detail (bug?) here in the logic though, that I missed
> before. If a ufunc (or result_type) sees a mix of scalars and arrays,
> it will try to decide whether or not to use value based logic. Value
> based logic will be skipped if the scalars are in a higher category
> (based on the ones above) then the highest array ? for optimization I
> assume.
> Plausibly, this could cause incorrect logic when the dtype signature
> of
> a ufunc is mixed:
>   float32, int8 -> float32
>   float32, int64 -> float64
> 
> May choose the second loop unnecessarily. Or for example if we have a
> datetime64 in the inputs, there would be no way for value based
> casting
> to be used.
> 
> 
> 
> > Ralf
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190607/ae84ee1d/attachment.sig>

From m.h.vankerkwijk at gmail.com  Mon Jun 10 13:47:07 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 10 Jun 2019 13:47:07 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and other
 duck/subclass limitations
Message-ID: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>

Hi All,

In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use
the new `where` argument for reductions to implement `nansum`, etc., using
simplifications that boil down to `np.sum(..., where=~isnan(...))`.

A problem that occurs is that `np.sum` will use a `.sum` method if that is
present, and for matrix, the `.sum` method does not take a `where`
argument. Since the `where` argument has been introduced only recently,
similar problems may happen for array mimics that implement their own
`.sum` method.

The question now is what to do, with options being:
1. Let's stick with the existing implementation; the speed-up is not that
great anyway.
2. Use try/except around the new implementation and use the old one if it
fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but
not matrix (unless someone makes a PR; would we even accept it?);
4. Use the new implementation. `matrix` should be gone anyway and array
mimics can either update their `.sum()` method or override `np.nansum` with
`__array_function__`.

Personally, I'd prefer (4), but realize that (3) is probably the more safer
approach, even if it is really annoying to still be taking into account
matrix deficiencies.

All the best,

Marten

p.s. One could also avoid the `.sum()` override altogether by doing
`np.add.reduce(..., where=...)`, but this would probably break just as much.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190610/dfef4861/attachment.html>

From einstein.edison at gmail.com  Mon Jun 10 15:50:07 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Mon, 10 Jun 2019 21:50:07 +0200
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
Message-ID: <5ad131d428b3fc8473a99a7688e6b23861a148f7.camel@gmail.com>

On Mon, 2019-06-10 at 13:47 -0400, Marten van Kerkwijk wrote:
> Hi All,
> 
> In https://github.com/numpy/numpy/pull/12801, Tyler has been trying
> to use the new `where` argument for reductions to implement `nansum`,
> etc., using simplifications that boil down to `np.sum(...,
> where=~isnan(...))`.
> 
> A problem that occurs is that `np.sum` will use a `.sum` method if
> that is present, and for matrix, the `.sum` method does not take a
> `where` argument. Since the `where` argument has been introduced only
> recently, similar problems may happen for array mimics that implement
> their own `.sum` method.

Hi Marten! I ran into a similar issue with the initial kwarg when I
implemented it, except at that time I just used the np._NoValue.
> The question now is what to do, with options being:
> 1. Let's stick with the existing implementation; the speed-up is not
> that great anyway.
> 2. Use try/except around the new implementation and use the old one
> if it fails.
> 3. As (2), but emit a deprecation warning. This will help array
> mimics, but not matrix (unless someone makes a PR; would we even
> accept it?);
> 4. Use the new implementation. `matrix` should be gone anyway and
> array mimics can either update their `.sum()` method or override
> `np.nansum` with `__array_function__`.
> 
> Personally, I'd prefer (4), but realize that (3) is probably the more
> safer approach, even if it is really annoying to still be taking into
> account matrix deficiencies.

If nansum does any other kind of dispatch that should be kept around in
any case. Otherwise it failed then and it would fail now. We can catch
and raise the right type of exception for backwards compatibility if
needed.
> All the best,
> 
> Marten
> 
> p.s. One could also avoid the `.sum()` override altogether by doing
> `np.add.reduce(..., where=...)`, but this would probably break just
> as much.
> 
> 
> _______________________________________________NumPy-Discussion
> mailing listNumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190610/6fb266b9/attachment.html>

From ralf.gommers at gmail.com  Tue Jun 11 04:56:11 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 11 Jun 2019 10:56:11 +0200
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
Message-ID: <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>

On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi All,
>
> In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to
> use the new `where` argument for reductions to implement `nansum`, etc.,
> using simplifications that boil down to `np.sum(..., where=~isnan(...))`.
>
> A problem that occurs is that `np.sum` will use a `.sum` method if that is
> present, and for matrix, the `.sum` method does not take a `where`
> argument. Since the `where` argument has been introduced only recently,
> similar problems may happen for array mimics that implement their own
> `.sum` method.
>
> The question now is what to do, with options being:
> 1. Let's stick with the existing implementation; the speed-up is not that
> great anyway.
> 2. Use try/except around the new implementation and use the old one if it
> fails.
> 3. As (2), but emit a deprecation warning. This will help array mimics,
> but not matrix (unless someone makes a PR; would we even accept it?);
> 4. Use the new implementation. `matrix` should be gone anyway and array
> mimics can either update their `.sum()` method or override `np.nansum` with
> `__array_function__`.
>
> Personally, I'd prefer (4), but realize that (3) is probably the more
> safer approach, even if it is really annoying to still be taking into
> account matrix deficiencies.
>

Honestly, I agree with Tyler's assessment when he closed the PR:
"there's not much evidence of remarkable performance improvement, and many
of the other nan functions would have more complicated implementation
details that are unlikely to make the relative performance much better. It
doesn't seem to be a priority either."

The code also got more rather than less complex. So why do this? Yes you
give a reason why it may help duck arrays, but it also breaks things
apparently.

Re matrix: it's not deprecated yet and is still fairly widely used, so yes
we'd accept a PR and no we should not break it gratuitously.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/6b7e2c37/attachment.html>

From sebastian at sipsolutions.net  Tue Jun 11 11:52:25 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 11 Jun 2019 10:52:25 -0500
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
Message-ID: <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>

On Tue, 2019-06-11 at 10:56 +0200, Ralf Gommers wrote:
> 
> 
> On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
> > Hi All,
> > 
> > In https://github.com/numpy/numpy/pull/12801, Tyler has been trying
> > to use the new `where` argument for reductions to implement
> > `nansum`, etc., using simplifications that boil down to
> > `np.sum(..., where=~isnan(...))`.
> > 
> > A problem that occurs is that `np.sum` will use a `.sum` method if
> > that is present, and for matrix, the `.sum` method does not take a
> > `where` argument. Since the `where` argument has been introduced
> > only recently, similar problems may happen for array mimics that
> > implement their own `.sum` method.
> > 
> > The question now is what to do, with options being:
> > 1. Let's stick with the existing implementation; the speed-up is
> > not that great anyway.
> > 2. Use try/except around the new implementation and use the old one
> > if it fails.
> > 3. As (2), but emit a deprecation warning. This will help array
> > mimics, but not matrix (unless someone makes a PR; would we even
> > accept it?);
> > 4. Use the new implementation. `matrix` should be gone anyway and
> > array mimics can either update their `.sum()` method or override
> > `np.nansum` with `__array_function__`.
> > 
> > Personally, I'd prefer (4), but realize that (3) is probably the
> > more safer approach, even if it is really annoying to still be
> > taking into account matrix deficiencies.
> > 
> 
> Honestly, I agree with Tyler's assessment when he closed the PR: 
> "there's not much evidence of remarkable performance improvement, and
> many of the other nan functions would have more complicated
> implementation details that are unlikely to make the relative
> performance much better. It doesn't seem to be a priority either."
> 
> The code also got more rather than less complex. So why do this? Yes
> you give a reason why it may help duck arrays, but it also breaks
> things apparently.

There could probably be reasons for this, since replacing with 0 may
not work always for object arrays. But that is rather hypothetical
(e.g. using add as string concatenation).

Calling `np.add.reduce` instead of the sum method could be slightly
more compatible, but could have its own set of issue.
But I suppose it is true, unless we plan on adding specialized `where`
loops or a nanadd ufunc itself the gain is likely so small that it it
arguably may make more sense to defer this until there is an an actual
benefit (and less pain due to adoption of the new protocols).

Best,

Sebastian

> 
> Re matrix: it's not deprecated yet and is still fairly widely used,
> so yes we'd accept a PR and no we should not break it gratuitously.
> 
> Cheers,
> Ralf
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/f0c61cec/attachment-0001.sig>

From tyler.je.reddy at gmail.com  Tue Jun 11 13:47:32 2019
From: tyler.je.reddy at gmail.com (Tyler Reddy)
Date: Tue, 11 Jun 2019 10:47:32 -0700
Subject: [Numpy-discussion] NumPy Community Meeting June 12
Message-ID: <CAHPuU_bfrkedf3xvytTnB-fe6Jt1+MtmY_x=OBZL8-n-EUP=7Q@mail.gmail.com>

Hi,

There will be a NumPy Community meeting on June 12 at 11 am Pacific Time.
Anyone is free to join and edit the work-in-progress meeting notes:
https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?view

Best wishes,
Tyler
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/a6139b61/attachment.html>

From m.h.vankerkwijk at gmail.com  Tue Jun 11 15:10:16 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 11 Jun 2019 15:10:16 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
Message-ID: <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>

OK, fair enough to just drop this!  In this particular case, I do not care
that much about duck-typing, just about making the implementation cleaner
(which it is if one doesn't have to worry about diversion via a `.sum()`
method).

In a way, I brought it up mostly as a concrete example of an internal
implementation which we cannot change to an objectively cleaner one because
other packages rely on an out-of-date numpy API.

-- Marten

On Tue, Jun 11, 2019 at 11:53 AM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Tue, 2019-06-11 at 10:56 +0200, Ralf Gommers wrote:
> >
> >
> > On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk <
> > m.h.vankerkwijk at gmail.com> wrote:
> > > Hi All,
> > >
> > > In https://github.com/numpy/numpy/pull/12801, Tyler has been trying
> > > to use the new `where` argument for reductions to implement
> > > `nansum`, etc., using simplifications that boil down to
> > > `np.sum(..., where=~isnan(...))`.
> > >
> > > A problem that occurs is that `np.sum` will use a `.sum` method if
> > > that is present, and for matrix, the `.sum` method does not take a
> > > `where` argument. Since the `where` argument has been introduced
> > > only recently, similar problems may happen for array mimics that
> > > implement their own `.sum` method.
> > >
> > > The question now is what to do, with options being:
> > > 1. Let's stick with the existing implementation; the speed-up is
> > > not that great anyway.
> > > 2. Use try/except around the new implementation and use the old one
> > > if it fails.
> > > 3. As (2), but emit a deprecation warning. This will help array
> > > mimics, but not matrix (unless someone makes a PR; would we even
> > > accept it?);
> > > 4. Use the new implementation. `matrix` should be gone anyway and
> > > array mimics can either update their `.sum()` method or override
> > > `np.nansum` with `__array_function__`.
> > >
> > > Personally, I'd prefer (4), but realize that (3) is probably the
> > > more safer approach, even if it is really annoying to still be
> > > taking into account matrix deficiencies.
> > >
> >
> > Honestly, I agree with Tyler's assessment when he closed the PR:
> > "there's not much evidence of remarkable performance improvement, and
> > many of the other nan functions would have more complicated
> > implementation details that are unlikely to make the relative
> > performance much better. It doesn't seem to be a priority either."
> >
> > The code also got more rather than less complex. So why do this? Yes
> > you give a reason why it may help duck arrays, but it also breaks
> > things apparently.
>
> There could probably be reasons for this, since replacing with 0 may
> not work always for object arrays. But that is rather hypothetical
> (e.g. using add as string concatenation).
>
> Calling `np.add.reduce` instead of the sum method could be slightly
> more compatible, but could have its own set of issue.
> But I suppose it is true, unless we plan on adding specialized `where`
> loops or a nanadd ufunc itself the gain is likely so small that it it
> arguably may make more sense to defer this until there is an an actual
> benefit (and less pain due to adoption of the new protocols).
>
> Best,
>
> Sebastian
>
> >
> > Re matrix: it's not deprecated yet and is still fairly widely used,
> > so yes we'd accept a PR and no we should not break it gratuitously.
> >
> > Cheers,
> > Ralf
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/75e62110/attachment.html>

From m.h.vankerkwijk at gmail.com  Tue Jun 11 15:13:11 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 11 Jun 2019 15:13:11 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
Message-ID: <CAJNV+9uLPkn1-FetzbBW9Xp9ibPz2QHAKYxqqUYtetBm6=5ggg@mail.gmail.com>

> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.
>
> Should have added: rely on an out-of-date numpy API where we have multiple
ways for packages to provide their own overrides. But I guess in this case
it is really matrix which is the problem. If so, maybe just adding kwargs
to it is something we should consider.

-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/8a0c1289/attachment.html>

From stefanv at berkeley.edu  Tue Jun 11 18:01:38 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Tue, 11 Jun 2019 15:01:38 -0700
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
Message-ID: <20190611220138.ldkqy4ik3igsb6ik@carbo>

On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> In a way, I brought it up mostly as a concrete example of an internal
> implementation which we cannot change to an objectively cleaner one because
> other packages rely on an out-of-date numpy API.

This, and the comments Nathaniel made on the array function thread, are
important to take note of.  Would it be worth updating NEP 18 with a
list of pitfalls?  Or should this be a new informational NEP that
discusses?on a higher level?the benefits, risks, and design
considerations of providing protocols?

St?fan

From sebastian at sipsolutions.net  Tue Jun 11 18:13:37 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 11 Jun 2019 17:13:37 -0500
Subject: [Numpy-discussion] (Value Based Promotion) Current Behaviour
Message-ID: <f330a2656aab221b9b96afeb24908b6a1ce1a246.camel@sipsolutions.net>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/fe4f0339/attachment.sig>

From sebastian at sipsolutions.net  Tue Jun 11 20:45:56 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 11 Jun 2019 19:45:56 -0500
Subject: [Numpy-discussion] (Value Based Promotion) Current Behaviour
In-Reply-To: <f330a2656aab221b9b96afeb24908b6a1ce1a246.camel@sipsolutions.net>
References: <f330a2656aab221b9b96afeb24908b6a1ce1a246.camel@sipsolutions.net>
Message-ID: <296297cac0ffc34034512499b1b628240356034a.camel@sipsolutions.net>

Hi all,

strange, something went wrong sending that email, but in any case...

I tried to "summarize" the current behaviour of promotion and value
based promotion in numpy (correcting a small error in what I wrote
earlier). Since it got a bit long, you can find it here (also copy
pasted at the end):

https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA

Allan's document which I link in there is also very interesting. One
thing I had not really thought about before was the problem of
commutativity.

I do not have any specific points I want to discuss based on it (but
those are likely to come up again later).

All the Best,

Sebastian


-----------------------------

PS: Below a copy of what I wrote:

---
title: Numpy Value Based Promotion Rules
author: Sebastian Berg
---


NumPy Value Based Scalar Casting and Promotion
==============================================

This document reviews some of the behaviours of the promotion rules
within numpy. This is especially with respect to the promotion of
scalars and 0D arrays which inspect the value to decide casting and
promotion.

Other documents discussing these things:
  
  * `from numpy.testing import print_coercion_tables` prints the
current promotion tables including value based promotion for small
positive/negative scalars.
  * Allan Haldane's thoughts on changing casting/promotion to be more
C-like and discussing things such as here:
    https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
  * Discussion around the problem of uint64 and int64 being promoted to
float64: https://github.com/numpy/numpy/issues/12525 (lists many
related issues).


Nomenclature and Defintions
---------------------------

* **dtype/type**: The data type of an array or scalar: `float32`,
`float64`, `int8`, ?

* **Category**: A category to which the data type belongs, in this
context these are:
  1. boolean
  2. integer (unsigned and signed are not split up here, but are
different "kinds")
  3. floating point and complex (not split up here but are different
"kinds")
  5. All others

* **Casting**: converting from one dtype to another. There are four
different rules of casting:
  1. *"safe"* casting: All values are representable in the new data
type. I.e. no information is lost during the conversion.
  2. *"same kind"* casting: data loss may occur, but only within the
same "kind". For example a float64 can be converted to float32 using
"same kind" rules, an int64 can be converted to int16. This is although
both lose precision or even produce incorrect values. Note that "kind"
is different from "category" in that it distinguishes between signed
and unsigned integers.
  4. *"unsafe"* casting: Any conversion which can be defined, e.g.
floating point to integer. For promotion this is fairly unimportant.
(Some conversions such as string to integer, which not even work fall
in this category, but could also be called coercions or conversions.)
 
* **Promotion**: The general process of finding a new dtype for
multiple input dtypes. Will be used here to also denote any kind of
casting/promotion done before a specific function is called. This can
be more complex, because in rare cases a functions can for example take
floating point numbers and integers as input at the same time (i.e.
`np.ldexp`).

* **Common dtype**: A dtype which can represent all input data. In
general this means that all inputs can be safely cast to this dtype.
Within numpy this is the normal and simplest form of promotion.
  
* **`type1, type2 -> type3`**: Defines a promotion or signature. For
example adding two integers: `np.int32(5) + np.int32(3)` gives
`np.int32(8)`. The dtype signature for that example would be: `int32,
int32 -> int32`. A short form for this is also `ii->i` using C-like
type codes, this can be found for example in `np.ldexp.types` (and any
numpy ufunc).

* **Scalar**: A numpy or python scalar or a **0-D array**. It is
important to remember that zero dimensional arrays are treated just
like scalars with respect to casting and promotion.


Current Situation in Numpy
--------------------------

The current situation can be understand mostly in terms of safe casting
which is defined based on the type hierarchy and is sensitive to values
for scalars.

This safe casting based approach is in contrast for example to
promotion within C or Julia, which work based on category first. For
example `int32` cannot be safely cast to `float32`, but C or Julia will
use `int32, float32 -> float32` as the common type/promotion rule for
example to decide on the output dtype for addition.


### Python Integers and Floats

Note that python integers are handled exactly like numpy ones. They
are, however, special in that they do not have a dtype associated with
them explicitly. Value based logic, as described here, seems useful for
python integers and floats to allow:
```
arr = np.arange(10, dtype=np.int8)
arr += 1
# or:
res = arr + 1
res.dtype == np.int8
```
which ensures that no upcast (for example with higher memory usage)
occurs.


### Safe Casting

Most safe casting is clearly defined based on whether or not any
possible value is representable in the ouput dtype. Within numpy there
is currently a single exception to this rule: `np.can_cast(np.int64,
np.float64, casting="safe")` is considered to be true although float64
cannot represent some large integer values exactly. In contrast,
`np.can_cast(np.int32, np.float32, casting="safe")` is `False` and
`np.float64` would have to be used if a "safe" cast is desired.

This exception may be one thing that should be changed, however,
concurrently the promotion rules have to be adapted to keep doing the
same thing, or a larger behaviour change decided.


#### Scalar based rules

Unlike arrays, where inspection of all values is not feasable, for
scalars (and 0-D arrays) the value is inspected. The casting becomes a
two step process:
  1. The minimal dtype capable of holding the value is found.
  2. The normal casting rules are applied to the new dtype.

The first step uses the following rules by finding the minimal dtype
within its category:

 * Boolean: Dtype is already minimal

 * Integers:
    Casting is possible if output can hold the value. This includes
uint8(127) casting to an int8.

 * Floats and Complex
    Scalars can be demoted based on value, roughly this avoids
overflows:
    ```
    float16:     -65000 < value < 65000
    float32:    -3.4e38 < value < 3.4e38
    float64:   -1.7e308 < value < 1.7e308
    float128 (largest type, does not apply).
    ```
    For complex, the logic is simply applied to both real and imaginary
part. Complex numbers cannot be downcast to floating point.

 * Others: Dtype is not modified.


This two step process means that `np.can_cast(np.int16(1024),
np.float16)` is `False` even though float16 is capable of exactly
representing the value 1024, since value based "demotion" to a lower
dtype is used only within each category.


### Common Type Promotion

For most operations in numpy the output type is just the common type of
the inputs, this holds for example for concatenation, as well as almost
all math funcions (e.g. addition and multiplication have two identical
inputs and need one ouput dtype). This operation is exposed as
`np.result_type` which includes value based logic, and
`np.promote_types` which only accepts dtypes as input.

Normal type promotion without value based/scalar logic finds the
smallest type which both inputs can cast to safely. This will be the
largest "kind" (bool < unsigned < integer < float < complex < other).

Note that type promotion is handled in a "reduce" manner from left to
right. In rare cases this means it is not associatetive: `float32,
uint16, int16 -> float32`, but `float32, (uint16, int16) -> float64`.

#### Scalar based rule

When there is a mix of scalars and arrays, numpy will usually allow the
scalars to be handled in the same fashion as for "safe" casting rules.

The rules are as follows:

1. Value based logic is only applied if the "category" of any array is
larger or equal to the category of all scalars. If this is not the
case, the typical rules are used.
    * Specifically, this means: `np.array([1, 2, 3], dtype=np.uint8) +
np.float64(12.)` gives a `float64` result, because the
`np.float64(12.)` is not considered for being demoted.

2. Promotion is applied as normally, however, instead of the original
dtype, the minimal dtype is used. In the case where the minimal data
type is unsigned (say uint8) but the value is small enough, the minimal
type may in fact be either `uint8` or `int8` (127 can be both). This
promotion is also applied in pairs (reduction-like) from left to right.


### General Promotion during Function Execution

General functions (read "ufuncs" such as `np.add`) may have a specific
dtype signature which is (for most dtypes) stored e.g. as
`np.add.types`. For many of these functions the common type promotion
is used unchanged.

However, some functions will employ a slightly different method (which
should be equivalent in most cases). They will loop through all loops
listed in `np.add.types` in order and find the first one to which all
inputs can be safely cast:
```
np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...]
```
Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the first
`float16, float16 -> float16` (`'ee->e'`) loop because `int16` cannot
be cast safely, and then pick the float32 (`'ff->f'`) one.

For simple functions, which commonly have two identical inputs, this
should be identical, since normally a clear order exists for the dtypes
(it does require checking int8 before uint8, etc.).

#### Scalar based rule

When scalars are involved, the "safe" cast logic based on values is
applied *if and only if* rule 1. applies as before: That is there must
be an array with a higher or equal category as all of the scalars.

In the above `np.divide` example, this means that
`np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will* use
the `'ee->e'` loop, because the scalar `4` is of a lower or equal
category than the array (integer <= float or complex). While checking,
4 is found to be safely castable to float16, since `(u)int8` is
sufficient to hold 4 and that can be safely cast to `float16`.
However, `np.divide(np.int16(4), np.int16(3))` would use `float32`
because both are scalars and thus value based logic is not used (Note
that in reality numpy forces double output for an all integer input in
divide).

In it is possible for ufuncs to have mixed type signatures (this is
very rare within numy) and arbitrary inputs. In this case, in
principle, the question is whether or not a clear ordering exists and
if the rule of using value based logic is always clear. This is rather
academical (I could not find any such function in numpy or
`scipy.special` [^scipy-ufuncs]). But consider:
```
imaginary_ufunc.types:
    int32, float32 -> int32, float32
    int64, float32 -> int64, float32
    ...
```
it is not clear that `np.int64(5) + np.float32(3.)` should be able to
demote the `5`. This is very theoretical of course


Footnotes
---------

[^scipy-ufuncs]: See for example these functions:
    ```python
    import scipy.special
    for n, func in scipy.special.__dict__.items():
        if not isinstance(func, np.ufunc):
            continue

        if func.nin == 1:
            # a single input is not interesting
            continue

        # check if the signature is not uniform
        for types in func.types:
            if len(set(types[:func.nin])) != 1:
                break
        else:
            continue
        print(func, func.types)
    ```
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/fa46530f/attachment.sig>

From m.h.vankerkwijk at gmail.com  Tue Jun 11 22:08:40 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 11 Jun 2019 22:08:40 -0400
Subject: [Numpy-discussion] (Value Based Promotion) Current Behaviour
In-Reply-To: <296297cac0ffc34034512499b1b628240356034a.camel@sipsolutions.net>
References: <f330a2656aab221b9b96afeb24908b6a1ce1a246.camel@sipsolutions.net>
 <296297cac0ffc34034512499b1b628240356034a.camel@sipsolutions.net>
Message-ID: <CAJNV+9vkJtnPZ83KbXhExHd-93vk3+H-aAa38pqPtHqcNbdHzw@mail.gmail.com>

HI Sebastian,

Thanks for the overview! In the value-based casting, what perhaps surprises
me most is that it is done within a kind; it would seem an improvement to
check whether a given integer scalar is exactly representable in a given
float (your example of 1024 in `float16`). If we switch to the python-only
scalar values idea, I would suggest to abandon this. That might make
dealing with things like `Decimal` or `Fraction` easier as well.

All the best,

Marten

On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> strange, something went wrong sending that email, but in any case...
>
> I tried to "summarize" the current behaviour of promotion and value
> based promotion in numpy (correcting a small error in what I wrote
> earlier). Since it got a bit long, you can find it here (also copy
> pasted at the end):
>
> https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA
>
> Allan's document which I link in there is also very interesting. One
> thing I had not really thought about before was the problem of
> commutativity.
>
> I do not have any specific points I want to discuss based on it (but
> those are likely to come up again later).
>
> All the Best,
>
> Sebastian
>
>
> -----------------------------
>
> PS: Below a copy of what I wrote:
>
> ---
> title: Numpy Value Based Promotion Rules
> author: Sebastian Berg
> ---
>
>
>
> NumPy Value Based Scalar Casting and Promotion
> ==============================================
>
> This document reviews some of the behaviours of the promotion rules
> within numpy. This is especially with respect to the promotion of
> scalars and 0D arrays which inspect the value to decide casting and
> promotion.
>
> Other documents discussing these things:
>
>   * `from numpy.testing import print_coercion_tables` prints the
> current promotion tables including value based promotion for small
> positive/negative scalars.
>   * Allan Haldane's thoughts on changing casting/promotion to be more
> C-like and discussing things such as here:
>     https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
>   * Discussion around the problem of uint64 and int64 being promoted to
> float64: https://github.com/numpy/numpy/issues/12525 (lists many
> related issues).
>
>
> Nomenclature and Defintions
> ---------------------------
>
> * **dtype/type**: The data type of an array or scalar: `float32`,
> `float64`, `int8`, ?
>
> * **Category**: A category to which the data type belongs, in this
> context these are:
>   1. boolean
>   2. integer (unsigned and signed are not split up here, but are
> different "kinds")
>   3. floating point and complex (not split up here but are different
> "kinds")
>   5. All others
>
> * **Casting**: converting from one dtype to another. There are four
> different rules of casting:
>   1. *"safe"* casting: All values are representable in the new data
> type. I.e. no information is lost during the conversion.
>   2. *"same kind"* casting: data loss may occur, but only within the
> same "kind". For example a float64 can be converted to float32 using
> "same kind" rules, an int64 can be converted to int16. This is although
> both lose precision or even produce incorrect values. Note that "kind"
> is different from "category" in that it distinguishes between signed
> and unsigned integers.
>   4. *"unsafe"* casting: Any conversion which can be defined, e.g.
> floating point to integer. For promotion this is fairly unimportant.
> (Some conversions such as string to integer, which not even work fall
> in this category, but could also be called coercions or conversions.)
>
> * **Promotion**: The general process of finding a new dtype for
> multiple input dtypes. Will be used here to also denote any kind of
> casting/promotion done before a specific function is called. This can
> be more complex, because in rare cases a functions can for example take
> floating point numbers and integers as input at the same time (i.e.
> `np.ldexp`).
>
> * **Common dtype**: A dtype which can represent all input data. In
> general this means that all inputs can be safely cast to this dtype.
> Within numpy this is the normal and simplest form of promotion.
>
> * **`type1, type2 -> type3`**: Defines a promotion or signature. For
> example adding two integers: `np.int32(5) + np.int32(3)` gives
> `np.int32(8)`. The dtype signature for that example would be: `int32,
> int32 -> int32`. A short form for this is also `ii->i` using C-like
> type codes, this can be found for example in `np.ldexp.types` (and any
> numpy ufunc).
>
> * **Scalar**: A numpy or python scalar or a **0-D array**. It is
> important to remember that zero dimensional arrays are treated just
> like scalars with respect to casting and promotion.
>
>
> Current Situation in Numpy
> --------------------------
>
> The current situation can be understand mostly in terms of safe casting
> which is defined based on the type hierarchy and is sensitive to values
> for scalars.
>
> This safe casting based approach is in contrast for example to
> promotion within C or Julia, which work based on category first. For
> example `int32` cannot be safely cast to `float32`, but C or Julia will
> use `int32, float32 -> float32` as the common type/promotion rule for
> example to decide on the output dtype for addition.
>
>
> ### Python Integers and Floats
>
> Note that python integers are handled exactly like numpy ones. They
> are, however, special in that they do not have a dtype associated with
> them explicitly. Value based logic, as described here, seems useful for
> python integers and floats to allow:
> ```
> arr = np.arange(10, dtype=np.int8)
> arr += 1
> # or:
> res = arr + 1
> res.dtype == np.int8
> ```
> which ensures that no upcast (for example with higher memory usage)
> occurs.
>
>
> ### Safe Casting
>
> Most safe casting is clearly defined based on whether or not any
> possible value is representable in the ouput dtype. Within numpy there
> is currently a single exception to this rule: `np.can_cast(np.int64,
> np.float64, casting="safe")` is considered to be true although float64
> cannot represent some large integer values exactly. In contrast,
> `np.can_cast(np.int32, np.float32, casting="safe")` is `False` and
> `np.float64` would have to be used if a "safe" cast is desired.
>
> This exception may be one thing that should be changed, however,
> concurrently the promotion rules have to be adapted to keep doing the
> same thing, or a larger behaviour change decided.
>
>
> #### Scalar based rules
>
> Unlike arrays, where inspection of all values is not feasable, for
> scalars (and 0-D arrays) the value is inspected. The casting becomes a
> two step process:
>   1. The minimal dtype capable of holding the value is found.
>   2. The normal casting rules are applied to the new dtype.
>
> The first step uses the following rules by finding the minimal dtype
> within its category:
>
>  * Boolean: Dtype is already minimal
>
>  * Integers:
>     Casting is possible if output can hold the value. This includes
> uint8(127) casting to an int8.
>
>  * Floats and Complex
>     Scalars can be demoted based on value, roughly this avoids
> overflows:
>     ```
>     float16:     -65000 < value < 65000
>     float32:    -3.4e38 < value < 3.4e38
>     float64:   -1.7e308 < value < 1.7e308
>     float128 (largest type, does not apply).
>     ```
>     For complex, the logic is simply applied to both real and imaginary
> part. Complex numbers cannot be downcast to floating point.
>
>  * Others: Dtype is not modified.
>
>
> This two step process means that `np.can_cast(np.int16(1024),
> np.float16)` is `False` even though float16 is capable of exactly
> representing the value 1024, since value based "demotion" to a lower
> dtype is used only within each category.
>
>
>
> ### Common Type Promotion
>
> For most operations in numpy the output type is just the common type of
> the inputs, this holds for example for concatenation, as well as almost
> all math funcions (e.g. addition and multiplication have two identical
> inputs and need one ouput dtype). This operation is exposed as
> `np.result_type` which includes value based logic, and
> `np.promote_types` which only accepts dtypes as input.
>
> Normal type promotion without value based/scalar logic finds the
> smallest type which both inputs can cast to safely. This will be the
> largest "kind" (bool < unsigned < integer < float < complex < other).
>
> Note that type promotion is handled in a "reduce" manner from left to
> right. In rare cases this means it is not associatetive: `float32,
> uint16, int16 -> float32`, but `float32, (uint16, int16) -> float64`.
>
> #### Scalar based rule
>
> When there is a mix of scalars and arrays, numpy will usually allow the
> scalars to be handled in the same fashion as for "safe" casting rules.
>
> The rules are as follows:
>
> 1. Value based logic is only applied if the "category" of any array is
> larger or equal to the category of all scalars. If this is not the
> case, the typical rules are used.
>     * Specifically, this means: `np.array([1, 2, 3], dtype=np.uint8) +
> np.float64(12.)` gives a `float64` result, because the
> `np.float64(12.)` is not considered for being demoted.
>
> 2. Promotion is applied as normally, however, instead of the original
> dtype, the minimal dtype is used. In the case where the minimal data
> type is unsigned (say uint8) but the value is small enough, the minimal
> type may in fact be either `uint8` or `int8` (127 can be both). This
> promotion is also applied in pairs (reduction-like) from left to right.
>
>
> ### General Promotion during Function Execution
>
> General functions (read "ufuncs" such as `np.add`) may have a specific
> dtype signature which is (for most dtypes) stored e.g. as
> `np.add.types`. For many of these functions the common type promotion
> is used unchanged.
>
> However, some functions will employ a slightly different method (which
> should be equivalent in most cases). They will loop through all loops
> listed in `np.add.types` in order and find the first one to which all
> inputs can be safely cast:
> ```
> np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...]
> ```
> Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the first
> `float16, float16 -> float16` (`'ee->e'`) loop because `int16` cannot
> be cast safely, and then pick the float32 (`'ff->f'`) one.
>
> For simple functions, which commonly have two identical inputs, this
> should be identical, since normally a clear order exists for the dtypes
> (it does require checking int8 before uint8, etc.).
>
> #### Scalar based rule
>
> When scalars are involved, the "safe" cast logic based on values is
> applied *if and only if* rule 1. applies as before: That is there must
> be an array with a higher or equal category as all of the scalars.
>
> In the above `np.divide` example, this means that
> `np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will* use
> the `'ee->e'` loop, because the scalar `4` is of a lower or equal
> category than the array (integer <= float or complex). While checking,
> 4 is found to be safely castable to float16, since `(u)int8` is
> sufficient to hold 4 and that can be safely cast to `float16`.
> However, `np.divide(np.int16(4), np.int16(3))` would use `float32`
> because both are scalars and thus value based logic is not used (Note
> that in reality numpy forces double output for an all integer input in
> divide).
>
> In it is possible for ufuncs to have mixed type signatures (this is
> very rare within numy) and arbitrary inputs. In this case, in
> principle, the question is whether or not a clear ordering exists and
> if the rule of using value based logic is always clear. This is rather
> academical (I could not find any such function in numpy or
> `scipy.special` [^scipy-ufuncs]). But consider:
> ```
> imaginary_ufunc.types:
>     int32, float32 -> int32, float32
>     int64, float32 -> int64, float32
>     ...
> ```
> it is not clear that `np.int64(5) + np.float32(3.)` should be able to
> demote the `5`. This is very theoretical of course
>
>
>
>
> Footnotes
> ---------
>
> [^scipy-ufuncs]: See for example these functions:
>     ```python
>     import scipy.special
>     for n, func in scipy.special.__dict__.items():
>         if not isinstance(func, np.ufunc):
>             continue
>
>         if func.nin == 1:
>             # a single input is not interesting
>             continue
>
>         # check if the signature is not uniform
>         for types in func.types:
>             if len(set(types[:func.nin])) != 1:
>                 break
>         else:
>             continue
>         print(func, func.types)
>     ```
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/626aab65/attachment-0001.html>

From charlesr.harris at gmail.com  Wed Jun 12 00:03:26 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 11 Jun 2019 22:03:26 -0600
Subject: [Numpy-discussion] branch 1.17
Message-ID: <CAB6mnxKggS9V2n41M7S0sDbK7dtb=0CuXbQr2HqshgtvhUtbtQ@mail.gmail.com>

Hi All,

I'm thinking of branching 1.17 this coming (June 15) weekend. The major
components are in: pocketfft, randomgen, and the new sorting routines,
along with default `__array_function__`. They may need a bit of polish, but
I think little remains to be done. If there are other pending PRs that you
think need to be in the next release, please yell. This would also be a
good time to review the release notes and go through the 1.17 issues to
determine if any are release blockers.

Cheers,

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190611/2e075c8a/attachment.html>

From sebastian at sipsolutions.net  Wed Jun 12 10:58:58 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 12 Jun 2019 09:58:58 -0500
Subject: [Numpy-discussion] New release note strategy after branching 1.17.
Message-ID: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>

Hi all,

we had discussed trying a new strategy to gather release notes on the
last community call, but not followed up on it on the list yet.

For the next release, we decided to try a strategy of using a wiki page
to gather release notes. The main purpose for this is to avoid merge
conflicts in the release notes file. It may also make things slightly
easier for new contributors even without merge conflicts.
Any comments/opinions about other alternatives are welcome.

We probably still need to fix some details, but I this will probably
mean:

1. We tag issues with "Needs Release Notes"
2. We ask contributors/maintainers to edit the initial PR post/comment
with a release note snippet. (I expect maintainers may typically put in
a placeholder as a start for the contributor.)
3. After merging, the release notes are copied into the wiki by the
user or a contributor. After the copy happened, the label could/should
be removed?

SciPy uses a similar strategy, so they may already have some experience
to do it slightly different that I am missing.

Best Regards,

Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/696127c9/attachment.sig>

From ralf.gommers at gmail.com  Wed Jun 12 11:57:08 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 12 Jun 2019 17:57:08 +0200
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
Message-ID: <CABL7CQgf=Rs2pKDKZ38k5rsAAf0J_GTotgS-AxmUvebG1yj9fQ@mail.gmail.com>

On Wed, Jun 12, 2019 at 4:59 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> we had discussed trying a new strategy to gather release notes on the
> last community call, but not followed up on it on the list yet.
>
> For the next release, we decided to try a strategy of using a wiki page
> to gather release notes. The main purpose for this is to avoid merge
> conflicts in the release notes file. It may also make things slightly
> easier for new contributors even without merge conflicts.
> Any comments/opinions about other alternatives are welcome.
>
> We probably still need to fix some details, but I this will probably
> mean:
>
> 1. We tag issues with "Needs Release Notes"
> 2. We ask contributors/maintainers to edit the initial PR post/comment
> with a release note snippet. (I expect maintainers may typically put in
> a placeholder as a start for the contributor.)
> 3. After merging, the release notes are copied into the wiki by the
> user or a contributor. After the copy happened, the label could/should
> be removed?
>
> SciPy uses a similar strategy, so they may already have some experience
> to do it slightly different that I am missing.
>

Sounds great to me. It's actually a little more formalized already then
what we do for SciPy. I like the "needs release notes" label ideae.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/1e511403/attachment.html>

From charlesr.harris at gmail.com  Wed Jun 12 12:18:09 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 12 Jun 2019 10:18:09 -0600
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
Message-ID: <CAB6mnx+5y8trfXONDY8=E1VaVrshdhzRJPpeGdQa4-hT7W2paQ@mail.gmail.com>

On Wed, Jun 12, 2019 at 8:59 AM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> we had discussed trying a new strategy to gather release notes on the
> last community call, but not followed up on it on the list yet.
>
> For the next release, we decided to try a strategy of using a wiki page
> to gather release notes. The main purpose for this is to avoid merge
> conflicts in the release notes file. It may also make things slightly
> easier for new contributors even without merge conflicts.
> Any comments/opinions about other alternatives are welcome.
>
> We probably still need to fix some details, but I this will probably
> mean:
>
> 1. We tag issues with "Needs Release Notes"
> 2. We ask contributors/maintainers to edit the initial PR post/comment
> with a release note snippet. (I expect maintainers may typically put in
> a placeholder as a start for the contributor.)
> 3. After merging, the release notes are copied into the wiki by the
> user or a contributor. After the copy happened, the label could/should
> be removed?
>
> SciPy uses a similar strategy, so they may already have some experience
> to do it slightly different that I am missing.
>
>
As an aid to future automation once the note is in the PR summary, we might
want to add a selection of  `notes: section` labels. The notes are also in
RST which is not completely compatible with MD, so we might want to be
careful about some things. Maybe put everything in a codeblock?

Chuck

>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/b129fdd5/attachment.html>

From sebastian at sipsolutions.net  Wed Jun 12 13:03:57 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 12 Jun 2019 12:03:57 -0500
Subject: [Numpy-discussion] (Value Based Promotion) Current Behaviour
In-Reply-To: <CAJNV+9vkJtnPZ83KbXhExHd-93vk3+H-aAa38pqPtHqcNbdHzw@mail.gmail.com>
References: <f330a2656aab221b9b96afeb24908b6a1ce1a246.camel@sipsolutions.net>
 <296297cac0ffc34034512499b1b628240356034a.camel@sipsolutions.net>
 <CAJNV+9vkJtnPZ83KbXhExHd-93vk3+H-aAa38pqPtHqcNbdHzw@mail.gmail.com>
Message-ID: <b1bb5d74e179c891cefff04bdd18f2b48dafc4ba.camel@sipsolutions.net>

On Tue, 2019-06-11 at 22:08 -0400, Marten van Kerkwijk wrote:
> HI Sebastian,
> 
> Thanks for the overview! In the value-based casting, what perhaps
> surprises me most is that it is done within a kind; it would seem an
> improvement to check whether a given integer scalar is exactly
> representable in a given float (your example of 1024 in `float16`).
> If we switch to the python-only scalar values idea, I would suggest
> to abandon this. That might make dealing with things like `Decimal`
> or `Fraction` easier as well.
> 

Yeah, one can argue that since we have this "safe casting" based
approach, we should go all the way for the value based logic. I think I
tend to agree, but I am not quite sure right now to be honest.

Fractions and Decimals are very interesting in that they raise the
question what happens to user dtypes [0]. Although, you would still
need a "no lower category" rule, since you do not want 1024. or 12/3 be
demoted to an integer.

For me right now, what is most interesting is what we should do with
ufunc calls, and if we can simplify them. I feel right now we have to
types of ufuncs:

1. Ufuncs which use a "common type", where we can find the minimal type
before dispatching.

2. More complex ufuncs, for which finding the minimal type is trickier
[1]. And while I could not find any weird enough ufunc, I am not sure
that blind promotion is a good idea for general ufuncs.

Best,

Sebastian


[0] A python fraction could be converted to int64/int64 or int32/int32,
etc. depending on the value, in principle. If we want such things to
work in principle, we need machinery (although I expect one could tag
that on later).
[1] It is not impossible, but we need to insert non-existing types into
the type hierarchy.


PS: Another interesting issue is that if we try to move away from value
based casting for numpy scalars, that initial `np.asarray(...)` call
may lose the information that a python integer was passed in. So to
support such things, we might need a whole new machinery.


> All the best,
> 
> Marten
> 
> On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg <
> sebastian at sipsolutions.net> wrote:
> > Hi all,
> > 
> > strange, something went wrong sending that email, but in any
> > case...
> > 
> > I tried to "summarize" the current behaviour of promotion and value
> > based promotion in numpy (correcting a small error in what I wrote
> > earlier). Since it got a bit long, you can find it here (also copy
> > pasted at the end):
> > 
> > https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA
> > 
> > Allan's document which I link in there is also very interesting.
> > One
> > thing I had not really thought about before was the problem of
> > commutativity.
> > 
> > I do not have any specific points I want to discuss based on it
> > (but
> > those are likely to come up again later).
> > 
> > All the Best,
> > 
> > Sebastian
> > 
> > 
> > -----------------------------
> > 
> > PS: Below a copy of what I wrote:
> > 
> > ---
> > title: Numpy Value Based Promotion Rules
> > author: Sebastian Berg
> > ---
> > 
> > 
> > 
> > NumPy Value Based Scalar Casting and Promotion
> > ==============================================
> > 
> > This document reviews some of the behaviours of the promotion rules
> > within numpy. This is especially with respect to the promotion of
> > scalars and 0D arrays which inspect the value to decide casting and
> > promotion.
> > 
> > Other documents discussing these things:
> > 
> >   * `from numpy.testing import print_coercion_tables` prints the
> > current promotion tables including value based promotion for small
> > positive/negative scalars.
> >   * Allan Haldane's thoughts on changing casting/promotion to be
> > more
> > C-like and discussing things such as here:
> >     
> > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
> >   * Discussion around the problem of uint64 and int64 being
> > promoted to
> > float64: https://github.com/numpy/numpy/issues/12525 (lists many
> > related issues).
> > 
> > 
> > Nomenclature and Defintions
> > ---------------------------
> > 
> > * **dtype/type**: The data type of an array or scalar: `float32`,
> > `float64`, `int8`, ?
> > 
> > * **Category**: A category to which the data type belongs, in this
> > context these are:
> >   1. boolean
> >   2. integer (unsigned and signed are not split up here, but are
> > different "kinds")
> >   3. floating point and complex (not split up here but are
> > different
> > "kinds")
> >   5. All others
> > 
> > * **Casting**: converting from one dtype to another. There are four
> > different rules of casting:
> >   1. *"safe"* casting: All values are representable in the new data
> > type. I.e. no information is lost during the conversion.
> >   2. *"same kind"* casting: data loss may occur, but only within
> > the
> > same "kind". For example a float64 can be converted to float32
> > using
> > "same kind" rules, an int64 can be converted to int16. This is
> > although
> > both lose precision or even produce incorrect values. Note that
> > "kind"
> > is different from "category" in that it distinguishes between
> > signed
> > and unsigned integers.
> >   4. *"unsafe"* casting: Any conversion which can be defined, e.g.
> > floating point to integer. For promotion this is fairly
> > unimportant.
> > (Some conversions such as string to integer, which not even work
> > fall
> > in this category, but could also be called coercions or
> > conversions.)
> > 
> > * **Promotion**: The general process of finding a new dtype for
> > multiple input dtypes. Will be used here to also denote any kind of
> > casting/promotion done before a specific function is called. This
> > can
> > be more complex, because in rare cases a functions can for example
> > take
> > floating point numbers and integers as input at the same time (i.e.
> > `np.ldexp`).
> > 
> > * **Common dtype**: A dtype which can represent all input data. In
> > general this means that all inputs can be safely cast to this
> > dtype.
> > Within numpy this is the normal and simplest form of promotion.
> > 
> > * **`type1, type2 -> type3`**: Defines a promotion or signature.
> > For
> > example adding two integers: `np.int32(5) + np.int32(3)` gives
> > `np.int32(8)`. The dtype signature for that example would be:
> > `int32,
> > int32 -> int32`. A short form for this is also `ii->i` using C-like
> > type codes, this can be found for example in `np.ldexp.types` (and
> > any
> > numpy ufunc).
> > 
> > * **Scalar**: A numpy or python scalar or a **0-D array**. It is
> > important to remember that zero dimensional arrays are treated just
> > like scalars with respect to casting and promotion.
> > 
> > 
> > Current Situation in Numpy
> > --------------------------
> > 
> > The current situation can be understand mostly in terms of safe
> > casting
> > which is defined based on the type hierarchy and is sensitive to
> > values
> > for scalars.
> > 
> > This safe casting based approach is in contrast for example to
> > promotion within C or Julia, which work based on category first.
> > For
> > example `int32` cannot be safely cast to `float32`, but C or Julia
> > will
> > use `int32, float32 -> float32` as the common type/promotion rule
> > for
> > example to decide on the output dtype for addition.
> > 
> > 
> > ### Python Integers and Floats
> > 
> > Note that python integers are handled exactly like numpy ones. They
> > are, however, special in that they do not have a dtype associated
> > with
> > them explicitly. Value based logic, as described here, seems useful
> > for
> > python integers and floats to allow:
> > ```
> > arr = np.arange(10, dtype=np.int8)
> > arr += 1
> > # or:
> > res = arr + 1
> > res.dtype == np.int8
> > ```
> > which ensures that no upcast (for example with higher memory usage)
> > occurs.
> > 
> > 
> > ### Safe Casting
> > 
> > Most safe casting is clearly defined based on whether or not any
> > possible value is representable in the ouput dtype. Within numpy
> > there
> > is currently a single exception to this rule:
> > `np.can_cast(np.int64,
> > np.float64, casting="safe")` is considered to be true although
> > float64
> > cannot represent some large integer values exactly. In contrast,
> > `np.can_cast(np.int32, np.float32, casting="safe")` is `False` and
> > `np.float64` would have to be used if a "safe" cast is desired.
> > 
> > This exception may be one thing that should be changed, however,
> > concurrently the promotion rules have to be adapted to keep doing
> > the
> > same thing, or a larger behaviour change decided.
> > 
> > 
> > #### Scalar based rules
> > 
> > Unlike arrays, where inspection of all values is not feasable, for
> > scalars (and 0-D arrays) the value is inspected. The casting
> > becomes a
> > two step process:
> >   1. The minimal dtype capable of holding the value is found.
> >   2. The normal casting rules are applied to the new dtype.
> > 
> > The first step uses the following rules by finding the minimal
> > dtype
> > within its category:
> > 
> >  * Boolean: Dtype is already minimal
> > 
> >  * Integers:
> >     Casting is possible if output can hold the value. This includes
> > uint8(127) casting to an int8.
> > 
> >  * Floats and Complex
> >     Scalars can be demoted based on value, roughly this avoids
> > overflows:
> >     ```
> >     float16:     -65000 < value < 65000
> >     float32:    -3.4e38 < value < 3.4e38
> >     float64:   -1.7e308 < value < 1.7e308
> >     float128 (largest type, does not apply).
> >     ```
> >     For complex, the logic is simply applied to both real and
> > imaginary
> > part. Complex numbers cannot be downcast to floating point.
> > 
> >  * Others: Dtype is not modified.
> > 
> > 
> > This two step process means that `np.can_cast(np.int16(1024),
> > np.float16)` is `False` even though float16 is capable of exactly
> > representing the value 1024, since value based "demotion" to a
> > lower
> > dtype is used only within each category.
> > 
> > 
> > 
> > ### Common Type Promotion
> > 
> > For most operations in numpy the output type is just the common
> > type of
> > the inputs, this holds for example for concatenation, as well as
> > almost
> > all math funcions (e.g. addition and multiplication have two
> > identical
> > inputs and need one ouput dtype). This operation is exposed as
> > `np.result_type` which includes value based logic, and
> > `np.promote_types` which only accepts dtypes as input.
> > 
> > Normal type promotion without value based/scalar logic finds the
> > smallest type which both inputs can cast to safely. This will be
> > the
> > largest "kind" (bool < unsigned < integer < float < complex <
> > other).
> > 
> > Note that type promotion is handled in a "reduce" manner from left
> > to
> > right. In rare cases this means it is not associatetive: `float32,
> > uint16, int16 -> float32`, but `float32, (uint16, int16) ->
> > float64`.
> > 
> > #### Scalar based rule
> > 
> > When there is a mix of scalars and arrays, numpy will usually allow
> > the
> > scalars to be handled in the same fashion as for "safe" casting
> > rules.
> > 
> > The rules are as follows:
> > 
> > 1. Value based logic is only applied if the "category" of any array
> > is
> > larger or equal to the category of all scalars. If this is not the
> > case, the typical rules are used.
> >     * Specifically, this means: `np.array([1, 2, 3],
> > dtype=np.uint8) +
> > np.float64(12.)` gives a `float64` result, because the
> > `np.float64(12.)` is not considered for being demoted.
> > 
> > 2. Promotion is applied as normally, however, instead of the
> > original
> > dtype, the minimal dtype is used. In the case where the minimal
> > data
> > type is unsigned (say uint8) but the value is small enough, the
> > minimal
> > type may in fact be either `uint8` or `int8` (127 can be both).
> > This
> > promotion is also applied in pairs (reduction-like) from left to
> > right.
> > 
> > 
> > ### General Promotion during Function Execution
> > 
> > General functions (read "ufuncs" such as `np.add`) may have a
> > specific
> > dtype signature which is (for most dtypes) stored e.g. as
> > `np.add.types`. For many of these functions the common type
> > promotion
> > is used unchanged.
> > 
> > However, some functions will employ a slightly different method
> > (which
> > should be equivalent in most cases). They will loop through all
> > loops
> > listed in `np.add.types` in order and find the first one to which
> > all
> > inputs can be safely cast:
> > ```
> > np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...]
> > ```
> > Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the first
> > `float16, float16 -> float16` (`'ee->e'`) loop because `int16`
> > cannot
> > be cast safely, and then pick the float32 (`'ff->f'`) one.
> > 
> > For simple functions, which commonly have two identical inputs,
> > this
> > should be identical, since normally a clear order exists for the
> > dtypes
> > (it does require checking int8 before uint8, etc.).
> > 
> > #### Scalar based rule
> > 
> > When scalars are involved, the "safe" cast logic based on values is
> > applied *if and only if* rule 1. applies as before: That is there
> > must
> > be an array with a higher or equal category as all of the scalars.
> > 
> > In the above `np.divide` example, this means that
> > `np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will*
> > use
> > the `'ee->e'` loop, because the scalar `4` is of a lower or equal
> > category than the array (integer <= float or complex). While
> > checking,
> > 4 is found to be safely castable to float16, since `(u)int8` is
> > sufficient to hold 4 and that can be safely cast to `float16`.
> > However, `np.divide(np.int16(4), np.int16(3))` would use `float32`
> > because both are scalars and thus value based logic is not used
> > (Note
> > that in reality numpy forces double output for an all integer input
> > in
> > divide).
> > 
> > In it is possible for ufuncs to have mixed type signatures (this is
> > very rare within numy) and arbitrary inputs. In this case, in
> > principle, the question is whether or not a clear ordering exists
> > and
> > if the rule of using value based logic is always clear. This is
> > rather
> > academical (I could not find any such function in numpy or
> > `scipy.special` [^scipy-ufuncs]). But consider:
> > ```
> > imaginary_ufunc.types:
> >     int32, float32 -> int32, float32
> >     int64, float32 -> int64, float32
> >     ...
> > ```
> > it is not clear that `np.int64(5) + np.float32(3.)` should be able
> > to
> > demote the `5`. This is very theoretical of course
> > 
> > 
> > 
> > 
> > Footnotes
> > ---------
> > 
> > [^scipy-ufuncs]: See for example these functions:
> >     ```python
> >     import scipy.special
> >     for n, func in scipy.special.__dict__.items():
> >         if not isinstance(func, np.ufunc):
> >             continue
> > 
> >         if func.nin == 1:
> >             # a single input is not interesting
> >             continue
> > 
> >         # check if the signature is not uniform
> >         for types in func.types:
> >             if len(set(types[:func.nin])) != 1:
> >                 break
> >         else:
> >             continue
> >         print(func, func.types)
> >     ```
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/e2240b27/attachment-0001.sig>

From b.sipocz+numpylist at gmail.com  Wed Jun 12 13:28:38 2019
From: b.sipocz+numpylist at gmail.com (Brigitta Sipocz)
Date: Wed, 12 Jun 2019 10:28:38 -0700
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
Message-ID: <CAEjJ3fgSdnnvGzihjW-1Q3D+V0=hX8=XPjEo=XU2Pyp43oye4g@mail.gmail.com>

> 3. After merging, the release notes are copied into the wiki by the
> user or a contributor. After the copy happened, the label could/should
> be removed?


In astropy we keep our "whats-new-needed" label around, so people
doing the release can double check that everything is indeed ended up
in the document, and it's still a good balance of topics (some things
seemed to be what's new worthy at the beginning, but a few weeks/month
later at the time of the actual release they might not be exciting any
more).

Brigitta

From sebastian at sipsolutions.net  Wed Jun 12 13:55:15 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 12 Jun 2019 12:55:15 -0500
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
Message-ID: <bf3844ac47c2e0591f2e04bcb447e59b9899eac7.camel@sipsolutions.net>

On Wed, 2019-06-05 at 15:41 -0500, Sebastian Berg wrote:
> Hi all,
> 
> TL;DR:
> 
> Value based promotion seems complex both for users and ufunc-
> dispatching/promotion logic. Is there any way we can move forward
> here,
> and if we do, could we just risk some possible (maybe not-existing)
> corner cases to break early to get on the way?
> 

Hi all,

just to note. I think I will go forward trying to fill the hole in the
hierarchy with a non-existing uint7 dtype. That seemed like it may be
ugly, but if it does not escalate too much, it is probably fairly
straight forward. And it would allow to simplify dispatching without
any logic change at all. After that we could still decide to change the
logic.

Best,

Sebastian


> -----------
> 
> Currently when you write code such as:
> 
> arr = np.array([1, 43, 23], dtype=np.uint16)
> res = arr + 1
> 
> Numpy uses fairly sophisticated logic to decide that `1` can be
> represented as a uint16, and thus for all unary functions (and most
> others as well), the output will have a `res.dtype` of uint16.
> 
> Similar logic also exists for floating point types, where a lower
> precision floating point can be used:
> 
> arr = np.array([1, 43, 23], dtype=np.float32)
> (arr + np.float64(2.)).dtype  # will be float32
> 
> Currently, this value based logic is enforced by checking whether the
> cast is possible: "4" can be cast to int8, uint8. So the first call
> above will at some point check if "uint16 + uint16 -> uint16" is a
> valid operation, find that it is, and thus stop searching. (There is
> the additional logic, that when both/all operands are scalars, it is
> not applied).
> 
> Note that while it is defined in terms of casting "1" to uint8 safely
> being possible even though 1 may be typed as int64. This logic thus
> affects all promotion rules as well (i.e. what should the output
> dtype
> be).
> 
> 
> There 2 main discussion points/issues about it:
> 
> 1. Should value based casting/promotion logic exist at all?
> 
> Arguably an `np.int32(3)` has type information attached to it, so why
> should we ignore it. It can also be tricky for users, because a small
> change in values can change the result data type.
> Because 0-D arrays and scalars are too close inside numpy (you will
> often not know which one you get). There is not much option but to
> handle them identically. However, it seems pretty odd that:
>  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
>  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> 
> give a different result.
> 
> This is a bit different for python scalars, which do not have a type
> attached already.
> 
> 
> 2. Promotion and type resolution in Ufuncs:
> 
> What is currently bothering me is that the decision what the output
> dtypes should be currently depends on the values in complicated ways.
> It would be nice if we can decide which type signature to use without
> actually looking at values (or at least only very early on).
> 
> One reason here is caching and simplicity. I would like to be able to
> cache which loop should be used for what input. Having value based
> casting in there bloats up the problem.
> Of course it currently works OK, but especially when user dtypes come
> into play, caching would seem like a nice optimization option.
> 
> Because `uint8(127)` can also be a `int8`, but uint8(128) it is not
> as
> simple as finding the "minimal" dtype once and working with that." 
> Of course Eric and I discussed this a bit before, and you could
> create
> an internal "uint7" dtype which has the only purpose of flagging that
> a
> cast to int8 is safe.
> 
> I suppose it is possible I am barking up the wrong tree here, and
> this
> caching/predictability is not vital (or can be solved with such an
> internal dtype easily, although I am not sure it seems elegant).
> 
> 
> Possible options to move forward
> --------------------------------
> 
> I have to still see a bit how trick things are. But there are a few
> possible options. I would like to move the scalar logic to the
> beginning of ufunc calls:
>   * The uint7 idea would be one solution
>   * Simply implement something that works for numpy and all except
>     strange external ufuncs (I can only think of numba as a plausible
>     candidate for creating such).
> 
> My current plan is to see where the second thing leaves me.
> 
> We also should see if we cannot move the whole thing forward, in
> which
> case the main decision would have to be forward to where. My opinion
> is
> currently that when a type has a dtype associated with it clearly, we
> should always use that dtype in the future. This mostly means that
> numpy dtypes such as `np.int64` will always be treated like an int64,
> and never like a `uint8` because they happen to be castable to that.
> 
> For values without a dtype attached (read python integers, floats), I
> see three options, from more complex to simpler:
> 
> 1. Keep the current logic in place as much as possible
> 2. Only support value based promotion for operators, e.g.:
>    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
>    The upside is that it limits the complexity to a much simpler
>    problem, the downside is that the ufunc call and operator match
>    less clearly.
> 3. Just associate python float with float64 and python integers with
>    long/int64 and force users to always type them explicitly if they
>    need to.
> 
> The downside of 1. is that it doesn't help with simplifying the
> current
> situation all that much, because we still have the special casting
> around...
> 
> 
> I have realized that this got much too long, so I hope it makes
> sense.
> I will continue to dabble along on these things a bit, so if nothing
> else maybe writing it helps me to get a bit clearer on things...
> 
> Best,
> 
> Sebastian
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/e7b269cb/attachment.sig>

From njs at pobox.com  Wed Jun 12 14:45:55 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 12 Jun 2019 11:45:55 -0700
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
Message-ID: <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>

It might be worth considering a tool like 'towncrier'. It's automation to
support the workflow where PRs that make changes also include their release
notes, so when the release comes you've already done all the work and just
have to hit the button.

On Wed, Jun 12, 2019, 07:59 Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> we had discussed trying a new strategy to gather release notes on the
> last community call, but not followed up on it on the list yet.
>
> For the next release, we decided to try a strategy of using a wiki page
> to gather release notes. The main purpose for this is to avoid merge
> conflicts in the release notes file. It may also make things slightly
> easier for new contributors even without merge conflicts.
> Any comments/opinions about other alternatives are welcome.
>
> We probably still need to fix some details, but I this will probably
> mean:
>
> 1. We tag issues with "Needs Release Notes"
> 2. We ask contributors/maintainers to edit the initial PR post/comment
> with a release note snippet. (I expect maintainers may typically put in
> a placeholder as a start for the contributor.)
> 3. After merging, the release notes are copied into the wiki by the
> user or a contributor. After the copy happened, the label could/should
> be removed?
>
> SciPy uses a similar strategy, so they may already have some experience
> to do it slightly different that I am missing.
>
> Best Regards,
>
> Sebastian
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/5fcd36fc/attachment.html>

From wieser.eric+numpy at gmail.com  Wed Jun 12 15:18:42 2019
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Wed, 12 Jun 2019 12:18:42 -0700
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
 <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>
Message-ID: <CAL1kJvCss-5V7Dw859oftSisn31NgbY5WiuqwLs1LfYf9p875w@mail.gmail.com>

It's worth linking to the issue where this discussion started, so we avoid
repeating ourselves -
 https://github.com/numpy/numpy/issues/13707.

Eric

On Wed, Jun 12, 2019, 11:51 Nathaniel Smith <njs at pobox.com> wrote:

> It might be worth considering a tool like 'towncrier'. It's automation to
> support the workflow where PRs that make changes also include their release
> notes, so when the release comes you've already done all the work and just
> have to hit the button.
>
> On Wed, Jun 12, 2019, 07:59 Sebastian Berg <sebastian at sipsolutions.net>
> wrote:
>
>> Hi all,
>>
>> we had discussed trying a new strategy to gather release notes on the
>> last community call, but not followed up on it on the list yet.
>>
>> For the next release, we decided to try a strategy of using a wiki page
>> to gather release notes. The main purpose for this is to avoid merge
>> conflicts in the release notes file. It may also make things slightly
>> easier for new contributors even without merge conflicts.
>> Any comments/opinions about other alternatives are welcome.
>>
>> We probably still need to fix some details, but I this will probably
>> mean:
>>
>> 1. We tag issues with "Needs Release Notes"
>> 2. We ask contributors/maintainers to edit the initial PR post/comment
>> with a release note snippet. (I expect maintainers may typically put in
>> a placeholder as a start for the contributor.)
>> 3. After merging, the release notes are copied into the wiki by the
>> user or a contributor. After the copy happened, the label could/should
>> be removed?
>>
>> SciPy uses a similar strategy, so they may already have some experience
>> to do it slightly different that I am missing.
>>
>> Best Regards,
>>
>> Sebastian
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/e5a2c305/attachment-0001.html>

From sebastian at sipsolutions.net  Wed Jun 12 15:19:12 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 12 Jun 2019 14:19:12 -0500
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
 <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>
Message-ID: <e3887ff20081daef6ea17f5dfffc1346d9501e74.camel@sipsolutions.net>

On Wed, 2019-06-12 at 11:45 -0700, Nathaniel Smith wrote:
> It might be worth considering a tool like 'towncrier'. It's
> automation to support the workflow where PRs that make changes also
> include their release notes, so when the release comes you've already
> done all the work and just have to hit the button.
> 

We discussed that a bit. I think one issue is that none of us had much
experience with it. There was a notion that towncrier might be a
steeper learning curve for new/one-time contributors compared to a
wiki/summary editing approach (which probably could be automated or
semi automated at some point).

But to be honest, if you suggest that we should give it a better look
or even a try, I do not think anyone had strong feelings about it. I
cannot say I looked at those two options (towncrier, and I think what
cpython uses) close enough to have an opinion.

- Sebastian


> On Wed, Jun 12, 2019, 07:59 Sebastian Berg <
> sebastian at sipsolutions.net> wrote:
> > Hi all,
> > 
> > we had discussed trying a new strategy to gather release notes on
> > the
> > last community call, but not followed up on it on the list yet.
> > 
> > For the next release, we decided to try a strategy of using a wiki
> > page
> > to gather release notes. The main purpose for this is to avoid
> > merge
> > conflicts in the release notes file. It may also make things
> > slightly
> > easier for new contributors even without merge conflicts.
> > Any comments/opinions about other alternatives are welcome.
> > 
> > We probably still need to fix some details, but I this will
> > probably
> > mean:
> > 
> > 1. We tag issues with "Needs Release Notes"
> > 2. We ask contributors/maintainers to edit the initial PR
> > post/comment
> > with a release note snippet. (I expect maintainers may typically
> > put in
> > a placeholder as a start for the contributor.)
> > 3. After merging, the release notes are copied into the wiki by the
> > user or a contributor. After the copy happened, the label
> > could/should
> > be removed?
> > 
> > SciPy uses a similar strategy, so they may already have some
> > experience
> > to do it slightly different that I am missing.
> > 
> > Best Regards,
> > 
> > Sebastian
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/b3c0de5d/attachment.sig>

From sebastian at sipsolutions.net  Wed Jun 12 15:49:55 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 12 Jun 2019 14:49:55 -0500
Subject: [Numpy-discussion] (Value Based Promotion) Current Behaviour
In-Reply-To: <b1bb5d74e179c891cefff04bdd18f2b48dafc4ba.camel@sipsolutions.net>
References: <f330a2656aab221b9b96afeb24908b6a1ce1a246.camel@sipsolutions.net>
 <296297cac0ffc34034512499b1b628240356034a.camel@sipsolutions.net>
 <CAJNV+9vkJtnPZ83KbXhExHd-93vk3+H-aAa38pqPtHqcNbdHzw@mail.gmail.com>
 <b1bb5d74e179c891cefff04bdd18f2b48dafc4ba.camel@sipsolutions.net>
Message-ID: <4ac1b140e57cb0c6a49cbef7976e74fb3f55d76a.camel@sipsolutions.net>

On Wed, 2019-06-12 at 12:03 -0500, Sebastian Berg wrote:
> On Tue, 2019-06-11 at 22:08 -0400, Marten van Kerkwijk wrote:
> > HI Sebastian,
> > 
> > Thanks for the overview! In the value-based casting, what perhaps
> > surprises me most is that it is done within a kind; it would seem
> > an
> > improvement to check whether a given integer scalar is exactly
> > representable in a given float (your example of 1024 in `float16`).
> > If we switch to the python-only scalar values idea, I would suggest
> > to abandon this. That might make dealing with things like `Decimal`
> > or `Fraction` easier as well.
> > 
> 
> Yeah, one can argue that since we have this "safe casting" based
> approach, we should go all the way for the value based logic. I think
> I
> tend to agree, but I am not quite sure right now to be honest.

Just realized, one issue with this is that you get much more "special
cases" if you think of it in terms of "minimal dtype". Because
suddenly, not just the unsigned/signed integers such as "< 128" are
special, but even more values require special handling. An int16
"minimal dtype" may or may not be castable to float16.

For `can_cast` that does not matter much, but if we use the same logic
for promotion things may get uglier. Although, maybe it just gets
uglier implementation wise and is fairly logic on the user side...

- Sebastian


> 
> Fractions and Decimals are very interesting in that they raise the
> question what happens to user dtypes [0]. Although, you would still
> need a "no lower category" rule, since you do not want 1024. or 12/3
> be
> demoted to an integer.
> 
> For me right now, what is most interesting is what we should do with
> ufunc calls, and if we can simplify them. I feel right now we have to
> types of ufuncs:
> 
> 1. Ufuncs which use a "common type", where we can find the minimal
> type
> before dispatching.
> 
> 2. More complex ufuncs, for which finding the minimal type is
> trickier
> [1]. And while I could not find any weird enough ufunc, I am not sure
> that blind promotion is a good idea for general ufuncs.
> 
> Best,
> 
> Sebastian
> 
> 
> [0] A python fraction could be converted to int64/int64 or
> int32/int32,
> etc. depending on the value, in principle. If we want such things to
> work in principle, we need machinery (although I expect one could tag
> that on later).
> [1] It is not impossible, but we need to insert non-existing types
> into
> the type hierarchy.
> 
> 
> 
> PS: Another interesting issue is that if we try to move away from
> value
> based casting for numpy scalars, that initial `np.asarray(...)` call
> may lose the information that a python integer was passed in. So to
> support such things, we might need a whole new machinery.
> 
> 
>  
> 
> > All the best,
> > 
> > Marten
> > 
> > On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg <
> > sebastian at sipsolutions.net> wrote:
> > > Hi all,
> > > 
> > > strange, something went wrong sending that email, but in any
> > > case...
> > > 
> > > I tried to "summarize" the current behaviour of promotion and
> > > value
> > > based promotion in numpy (correcting a small error in what I
> > > wrote
> > > earlier). Since it got a bit long, you can find it here (also
> > > copy
> > > pasted at the end):
> > > 
> > > https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA
> > > 
> > > Allan's document which I link in there is also very interesting.
> > > One
> > > thing I had not really thought about before was the problem of
> > > commutativity.
> > > 
> > > I do not have any specific points I want to discuss based on it
> > > (but
> > > those are likely to come up again later).
> > > 
> > > All the Best,
> > > 
> > > Sebastian
> > > 
> > > 
> > > -----------------------------
> > > 
> > > PS: Below a copy of what I wrote:
> > > 
> > > ---
> > > title: Numpy Value Based Promotion Rules
> > > author: Sebastian Berg
> > > ---
> > > 
> > > 
> > > 
> > > NumPy Value Based Scalar Casting and Promotion
> > > ==============================================
> > > 
> > > This document reviews some of the behaviours of the promotion
> > > rules
> > > within numpy. This is especially with respect to the promotion of
> > > scalars and 0D arrays which inspect the value to decide casting
> > > and
> > > promotion.
> > > 
> > > Other documents discussing these things:
> > > 
> > >   * `from numpy.testing import print_coercion_tables` prints the
> > > current promotion tables including value based promotion for
> > > small
> > > positive/negative scalars.
> > >   * Allan Haldane's thoughts on changing casting/promotion to be
> > > more
> > > C-like and discussing things such as here:
> > >     
> > > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
> > >   * Discussion around the problem of uint64 and int64 being
> > > promoted to
> > > float64: https://github.com/numpy/numpy/issues/12525 (lists many
> > > related issues).
> > > 
> > > 
> > > Nomenclature and Defintions
> > > ---------------------------
> > > 
> > > * **dtype/type**: The data type of an array or scalar: `float32`,
> > > `float64`, `int8`, ?
> > > 
> > > * **Category**: A category to which the data type belongs, in
> > > this
> > > context these are:
> > >   1. boolean
> > >   2. integer (unsigned and signed are not split up here, but are
> > > different "kinds")
> > >   3. floating point and complex (not split up here but are
> > > different
> > > "kinds")
> > >   5. All others
> > > 
> > > * **Casting**: converting from one dtype to another. There are
> > > four
> > > different rules of casting:
> > >   1. *"safe"* casting: All values are representable in the new
> > > data
> > > type. I.e. no information is lost during the conversion.
> > >   2. *"same kind"* casting: data loss may occur, but only within
> > > the
> > > same "kind". For example a float64 can be converted to float32
> > > using
> > > "same kind" rules, an int64 can be converted to int16. This is
> > > although
> > > both lose precision or even produce incorrect values. Note that
> > > "kind"
> > > is different from "category" in that it distinguishes between
> > > signed
> > > and unsigned integers.
> > >   4. *"unsafe"* casting: Any conversion which can be defined,
> > > e.g.
> > > floating point to integer. For promotion this is fairly
> > > unimportant.
> > > (Some conversions such as string to integer, which not even work
> > > fall
> > > in this category, but could also be called coercions or
> > > conversions.)
> > > 
> > > * **Promotion**: The general process of finding a new dtype for
> > > multiple input dtypes. Will be used here to also denote any kind
> > > of
> > > casting/promotion done before a specific function is called. This
> > > can
> > > be more complex, because in rare cases a functions can for
> > > example
> > > take
> > > floating point numbers and integers as input at the same time
> > > (i.e.
> > > `np.ldexp`).
> > > 
> > > * **Common dtype**: A dtype which can represent all input data.
> > > In
> > > general this means that all inputs can be safely cast to this
> > > dtype.
> > > Within numpy this is the normal and simplest form of promotion.
> > > 
> > > * **`type1, type2 -> type3`**: Defines a promotion or signature.
> > > For
> > > example adding two integers: `np.int32(5) + np.int32(3)` gives
> > > `np.int32(8)`. The dtype signature for that example would be:
> > > `int32,
> > > int32 -> int32`. A short form for this is also `ii->i` using C-
> > > like
> > > type codes, this can be found for example in `np.ldexp.types`
> > > (and
> > > any
> > > numpy ufunc).
> > > 
> > > * **Scalar**: A numpy or python scalar or a **0-D array**. It is
> > > important to remember that zero dimensional arrays are treated
> > > just
> > > like scalars with respect to casting and promotion.
> > > 
> > > 
> > > Current Situation in Numpy
> > > --------------------------
> > > 
> > > The current situation can be understand mostly in terms of safe
> > > casting
> > > which is defined based on the type hierarchy and is sensitive to
> > > values
> > > for scalars.
> > > 
> > > This safe casting based approach is in contrast for example to
> > > promotion within C or Julia, which work based on category first.
> > > For
> > > example `int32` cannot be safely cast to `float32`, but C or
> > > Julia
> > > will
> > > use `int32, float32 -> float32` as the common type/promotion rule
> > > for
> > > example to decide on the output dtype for addition.
> > > 
> > > 
> > > ### Python Integers and Floats
> > > 
> > > Note that python integers are handled exactly like numpy ones.
> > > They
> > > are, however, special in that they do not have a dtype associated
> > > with
> > > them explicitly. Value based logic, as described here, seems
> > > useful
> > > for
> > > python integers and floats to allow:
> > > ```
> > > arr = np.arange(10, dtype=np.int8)
> > > arr += 1
> > > # or:
> > > res = arr + 1
> > > res.dtype == np.int8
> > > ```
> > > which ensures that no upcast (for example with higher memory
> > > usage)
> > > occurs.
> > > 
> > > 
> > > ### Safe Casting
> > > 
> > > Most safe casting is clearly defined based on whether or not any
> > > possible value is representable in the ouput dtype. Within numpy
> > > there
> > > is currently a single exception to this rule:
> > > `np.can_cast(np.int64,
> > > np.float64, casting="safe")` is considered to be true although
> > > float64
> > > cannot represent some large integer values exactly. In contrast,
> > > `np.can_cast(np.int32, np.float32, casting="safe")` is `False`
> > > and
> > > `np.float64` would have to be used if a "safe" cast is desired.
> > > 
> > > This exception may be one thing that should be changed, however,
> > > concurrently the promotion rules have to be adapted to keep doing
> > > the
> > > same thing, or a larger behaviour change decided.
> > > 
> > > 
> > > #### Scalar based rules
> > > 
> > > Unlike arrays, where inspection of all values is not feasable,
> > > for
> > > scalars (and 0-D arrays) the value is inspected. The casting
> > > becomes a
> > > two step process:
> > >   1. The minimal dtype capable of holding the value is found.
> > >   2. The normal casting rules are applied to the new dtype.
> > > 
> > > The first step uses the following rules by finding the minimal
> > > dtype
> > > within its category:
> > > 
> > >  * Boolean: Dtype is already minimal
> > > 
> > >  * Integers:
> > >     Casting is possible if output can hold the value. This
> > > includes
> > > uint8(127) casting to an int8.
> > > 
> > >  * Floats and Complex
> > >     Scalars can be demoted based on value, roughly this avoids
> > > overflows:
> > >     ```
> > >     float16:     -65000 < value < 65000
> > >     float32:    -3.4e38 < value < 3.4e38
> > >     float64:   -1.7e308 < value < 1.7e308
> > >     float128 (largest type, does not apply).
> > >     ```
> > >     For complex, the logic is simply applied to both real and
> > > imaginary
> > > part. Complex numbers cannot be downcast to floating point.
> > > 
> > >  * Others: Dtype is not modified.
> > > 
> > > 
> > > This two step process means that `np.can_cast(np.int16(1024),
> > > np.float16)` is `False` even though float16 is capable of exactly
> > > representing the value 1024, since value based "demotion" to a
> > > lower
> > > dtype is used only within each category.
> > > 
> > > 
> > > 
> > > ### Common Type Promotion
> > > 
> > > For most operations in numpy the output type is just the common
> > > type of
> > > the inputs, this holds for example for concatenation, as well as
> > > almost
> > > all math funcions (e.g. addition and multiplication have two
> > > identical
> > > inputs and need one ouput dtype). This operation is exposed as
> > > `np.result_type` which includes value based logic, and
> > > `np.promote_types` which only accepts dtypes as input.
> > > 
> > > Normal type promotion without value based/scalar logic finds the
> > > smallest type which both inputs can cast to safely. This will be
> > > the
> > > largest "kind" (bool < unsigned < integer < float < complex <
> > > other).
> > > 
> > > Note that type promotion is handled in a "reduce" manner from
> > > left
> > > to
> > > right. In rare cases this means it is not associatetive:
> > > `float32,
> > > uint16, int16 -> float32`, but `float32, (uint16, int16) ->
> > > float64`.
> > > 
> > > #### Scalar based rule
> > > 
> > > When there is a mix of scalars and arrays, numpy will usually
> > > allow
> > > the
> > > scalars to be handled in the same fashion as for "safe" casting
> > > rules.
> > > 
> > > The rules are as follows:
> > > 
> > > 1. Value based logic is only applied if the "category" of any
> > > array
> > > is
> > > larger or equal to the category of all scalars. If this is not
> > > the
> > > case, the typical rules are used.
> > >     * Specifically, this means: `np.array([1, 2, 3],
> > > dtype=np.uint8) +
> > > np.float64(12.)` gives a `float64` result, because the
> > > `np.float64(12.)` is not considered for being demoted.
> > > 
> > > 2. Promotion is applied as normally, however, instead of the
> > > original
> > > dtype, the minimal dtype is used. In the case where the minimal
> > > data
> > > type is unsigned (say uint8) but the value is small enough, the
> > > minimal
> > > type may in fact be either `uint8` or `int8` (127 can be both).
> > > This
> > > promotion is also applied in pairs (reduction-like) from left to
> > > right.
> > > 
> > > 
> > > ### General Promotion during Function Execution
> > > 
> > > General functions (read "ufuncs" such as `np.add`) may have a
> > > specific
> > > dtype signature which is (for most dtypes) stored e.g. as
> > > `np.add.types`. For many of these functions the common type
> > > promotion
> > > is used unchanged.
> > > 
> > > However, some functions will employ a slightly different method
> > > (which
> > > should be equivalent in most cases). They will loop through all
> > > loops
> > > listed in `np.add.types` in order and find the first one to which
> > > all
> > > inputs can be safely cast:
> > > ```
> > > np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...]
> > > ```
> > > Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the
> > > first
> > > `float16, float16 -> float16` (`'ee->e'`) loop because `int16`
> > > cannot
> > > be cast safely, and then pick the float32 (`'ff->f'`) one.
> > > 
> > > For simple functions, which commonly have two identical inputs,
> > > this
> > > should be identical, since normally a clear order exists for the
> > > dtypes
> > > (it does require checking int8 before uint8, etc.).
> > > 
> > > #### Scalar based rule
> > > 
> > > When scalars are involved, the "safe" cast logic based on values
> > > is
> > > applied *if and only if* rule 1. applies as before: That is there
> > > must
> > > be an array with a higher or equal category as all of the
> > > scalars.
> > > 
> > > In the above `np.divide` example, this means that
> > > `np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will*
> > > use
> > > the `'ee->e'` loop, because the scalar `4` is of a lower or equal
> > > category than the array (integer <= float or complex). While
> > > checking,
> > > 4 is found to be safely castable to float16, since `(u)int8` is
> > > sufficient to hold 4 and that can be safely cast to `float16`.
> > > However, `np.divide(np.int16(4), np.int16(3))` would use
> > > `float32`
> > > because both are scalars and thus value based logic is not used
> > > (Note
> > > that in reality numpy forces double output for an all integer
> > > input
> > > in
> > > divide).
> > > 
> > > In it is possible for ufuncs to have mixed type signatures (this
> > > is
> > > very rare within numy) and arbitrary inputs. In this case, in
> > > principle, the question is whether or not a clear ordering exists
> > > and
> > > if the rule of using value based logic is always clear. This is
> > > rather
> > > academical (I could not find any such function in numpy or
> > > `scipy.special` [^scipy-ufuncs]). But consider:
> > > ```
> > > imaginary_ufunc.types:
> > >     int32, float32 -> int32, float32
> > >     int64, float32 -> int64, float32
> > >     ...
> > > ```
> > > it is not clear that `np.int64(5) + np.float32(3.)` should be
> > > able
> > > to
> > > demote the `5`. This is very theoretical of course
> > > 
> > > 
> > > 
> > > 
> > > Footnotes
> > > ---------
> > > 
> > > [^scipy-ufuncs]: See for example these functions:
> > >     ```python
> > >     import scipy.special
> > >     for n, func in scipy.special.__dict__.items():
> > >         if not isinstance(func, np.ufunc):
> > >             continue
> > > 
> > >         if func.nin == 1:
> > >             # a single input is not interesting
> > >             continue
> > > 
> > >         # check if the signature is not uniform
> > >         for types in func.types:
> > >             if len(set(types[:func.nin])) != 1:
> > >                 break
> > >         else:
> > >             continue
> > >         print(func, func.types)
> > >     ```
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/53598f89/attachment-0001.sig>

From m.h.vankerkwijk at gmail.com  Wed Jun 12 15:57:48 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 12 Jun 2019 15:57:48 -0400
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <e3887ff20081daef6ea17f5dfffc1346d9501e74.camel@sipsolutions.net>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
 <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>
 <e3887ff20081daef6ea17f5dfffc1346d9501e74.camel@sipsolutions.net>
Message-ID: <CAJNV+9tDD_7SDfStP2xzKpTL7x6du1nwPhDe2AHWX0Bc6MA-kw@mail.gmail.com>

Overall, in favour of splitting the large files, but I don't like that the
notes stop being under version control (e.g., a follow-up PR slightly
changes things, how does the note gets edited/reverted?).

Has there been any discussion of having, e.g., a directory
`docs/1.17.0-notes/`, and everyone storing their notes as individual files?
(A bit like maildir vs a single inbox file.) At release time, one would
then simply merge/order the files. Beyond staying within git, an advantage
of this may be that automation is easier (e.g., if the file is always
called <issue-number>.rst, then checks for it can be very easily
automated.).
-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/28d542f4/attachment.html>

From ralf.gommers at gmail.com  Wed Jun 12 16:31:54 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 12 Jun 2019 22:31:54 +0200
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <20190611220138.ldkqy4ik3igsb6ik@carbo>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
Message-ID: <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>

On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <stefanv at berkeley.edu>
wrote:

> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> > In a way, I brought it up mostly as a concrete example of an internal
> > implementation which we cannot change to an objectively cleaner one
> because
> > other packages rely on an out-of-date numpy API.
>

I think this is not the right way to describe the problem (see below).


> This, and the comments Nathaniel made on the array function thread, are
> important to take note of.  Would it be worth updating NEP 18 with a
> list of pitfalls?  Or should this be a new informational NEP that
> discusses?on a higher level?the benefits, risks, and design
> considerations of providing protocols?
>

That would be a nice thing to do (the higher level one), but in this case I
think the issue has little to do with NEP 18. The summary of the issue in
this thread is a little brief, so let me try to clarify.

1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect a `x.sum` method if it's present and try to
use that
3. the `_wrapreduction` utility that forwards the function to the method
will compare signatures of np.sum and x.sum, and throw an error if there's
a mismatch for any keywords that have a value other than the default
np._NoValue

Code to check this:
>>> x1 = np.arange(5)
>>> x2 = np.asmatrix(x1)
>>> np.sum(x1)  # works
>>> np.sum(x2)  # works
>>> np.sum(x1, where=x1>3)  # works
>>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

Note that this is not specific to np.matrix. Using pandas.Series you also
get a TypeError:
>>> y = pd.Series(x1)
>>> np.sum(y)  # works
>>> np.sum(y, where=y>3)  # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

The issue is that when we have this kind of forwarding logic, irrespective
of how it's implemented, new keywords cannot be used until the array-like
objects with the methods that get forwarded to gain the same keyword.

tl;dr this is simply a cost we have to be aware of when either proposing to
add new keywords, or when proposing any kind of dispatching logic (in this
case `_wrapreduction`).

Regarding internal use of  `np.sum(..., where=)`: this should not be done
until at least 4-5 versions from now, and preferably way longer than that.
Because doing so will break already released versions of Pandas, Dask, and
other libraries with array-like objects.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/d911754a/attachment.html>

From stefanv at berkeley.edu  Wed Jun 12 19:53:58 2019
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Wed, 12 Jun 2019 16:53:58 -0700
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <CAJNV+9tDD_7SDfStP2xzKpTL7x6du1nwPhDe2AHWX0Bc6MA-kw@mail.gmail.com>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
 <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>
 <e3887ff20081daef6ea17f5dfffc1346d9501e74.camel@sipsolutions.net>
 <CAJNV+9tDD_7SDfStP2xzKpTL7x6du1nwPhDe2AHWX0Bc6MA-kw@mail.gmail.com>
Message-ID: <20190612235358.sf352mmxoolsspdj@carbo>

On Wed, 12 Jun 2019 15:57:48 -0400, Marten van Kerkwijk wrote:
> Overall, in favour of splitting the large files, but I don't like that the
> notes stop being under version control (e.g., a follow-up PR slightly
> changes things, how does the note gets edited/reverted?).
> 
> Has there been any discussion of having, e.g., a directory
> `docs/1.17.0-notes/`, and everyone storing their notes as individual files?
> (A bit like maildir vs a single inbox file.) At release time, one would
> then simply merge/order the files. Beyond staying within git, an advantage
> of this may be that automation is easier (e.g., if the file is always
> called <issue-number>.rst, then checks for it can be very easily
> automated.).

IPython does something very much like this: they put .rst files inside
whatsnew/pr/x.rst and then have a script to merge all these into the
release notes:

https://ipython.readthedocs.io/en/stable/coredev/index.html?highlight=whatsnew#create-github-stats-and-finish-release-note

It looks like Jupyter Notebook is not using that system, so not sure how
much mileage they got out of it.

Best regards,
St?fan

From njs at pobox.com  Wed Jun 12 20:34:33 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 12 Jun 2019 17:34:33 -0700
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <CAJNV+9tDD_7SDfStP2xzKpTL7x6du1nwPhDe2AHWX0Bc6MA-kw@mail.gmail.com>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
 <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>
 <e3887ff20081daef6ea17f5dfffc1346d9501e74.camel@sipsolutions.net>
 <CAJNV+9tDD_7SDfStP2xzKpTL7x6du1nwPhDe2AHWX0Bc6MA-kw@mail.gmail.com>
Message-ID: <CAPJVwBkyP+7YhKAHpz+d1rghka2GJOhA1yG_HxGGj6RFfRgseA@mail.gmail.com>

On Wed, Jun 12, 2019 at 12:58 PM Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:
>
> Overall, in favour of splitting the large files, but I don't like that the notes stop being under version control (e.g., a follow-up PR slightly changes things, how does the note gets edited/reverted?).
>
> Has there been any discussion of having, e.g., a directory `docs/1.17.0-notes/`, and everyone storing their notes as individual files? (A bit like maildir vs a single inbox file.) At release time, one would then simply merge/order the files. Beyond staying within git, an advantage of this may be that automation is easier (e.g., if the file is always called <issue-number>.rst, then checks for it can be very easily automated.).

That's exactly how towncrier works, except the filenames also have a
category like "bugfix" or "feature" so they can be sorted into the
right section of the final release notes. Here's how some projects
describe it in their contributing docs:

https://pip.pypa.io/en/stable/development/contributing/#news-entries
https://www.attrs.org/en/latest/contributing.html#changelog
https://trio.readthedocs.io/en/latest/contributing.html#release-notes

Oh, and it uses a single fixed directory, like 'docs/release-notes',
without the version in the directory name, and as part of preparing
the release you delete all the files in that directory after moving
them into the final release notes. This way if a PR is originally
targeted at 1.17 but slips to 1.18, you can't accidentally put the
note in the wrong directory. It's also nice for backports, where the
same bugfix might appear in 1.17.0 and 1.16.1 ? the backport
automatically carries the note along with it and it just works.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From m.h.vankerkwijk at gmail.com  Wed Jun 12 20:55:42 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 12 Jun 2019 20:55:42 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
Message-ID: <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>

Hi Ralf,

You're right, the problem is with the added keyword argument (which would
appear also if we did not still have to support the old .sum method
override but just dispatched to __array_ufunc__ with `np.add.reduce` -
maybe worse given that it seems likely the reduce method has seen much less
testing in  __array_ufunc__ implementations).

Still, I do think the question stands: we implement a `nansum` for our
ndarray class a certain way, and provide ways to override it (three now, in
fact). Is it really reasonable to expect that we wait 4 versions for other
packages to keep up with this, and thus get stuck with given internal
implementations?

Aside: note that the present version of the nanfunctions relies on turning
the arguments into arrays and copying 0s into it - that suggests that
currently they do not work for duck arrays like Dask.

All the best,

Marten

On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <stefanv at berkeley.edu>
> wrote:
>
>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>> > In a way, I brought it up mostly as a concrete example of an internal
>> > implementation which we cannot change to an objectively cleaner one
>> because
>> > other packages rely on an out-of-date numpy API.
>>
>
> I think this is not the right way to describe the problem (see below).
>
>
>> This, and the comments Nathaniel made on the array function thread, are
>> important to take note of.  Would it be worth updating NEP 18 with a
>> list of pitfalls?  Or should this be a new informational NEP that
>> discusses?on a higher level?the benefits, risks, and design
>> considerations of providing protocols?
>>
>
> That would be a nice thing to do (the higher level one), but in this case
> I think the issue has little to do with NEP 18. The summary of the issue in
> this thread is a little brief, so let me try to clarify.
>
> 1. np.sum gained a new `where=` keyword in 1.17.0
> 2. using np.sum(x) will detect a `x.sum` method if it's present and try to
> use that
> 3. the `_wrapreduction` utility that forwards the function to the method
> will compare signatures of np.sum and x.sum, and throw an error if there's
> a mismatch for any keywords that have a value other than the default
> np._NoValue
>
> Code to check this:
> >>> x1 = np.arange(5)
> >>> x2 = np.asmatrix(x1)
> >>> np.sum(x1)  # works
> >>> np.sum(x2)  # works
> >>> np.sum(x1, where=x1>3)  # works
> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> Note that this is not specific to np.matrix. Using pandas.Series you also
> get a TypeError:
> >>> y = pd.Series(x1)
> >>> np.sum(y)  # works
> >>> np.sum(y, where=y>3)  # pandas throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> The issue is that when we have this kind of forwarding logic, irrespective
> of how it's implemented, new keywords cannot be used until the array-like
> objects with the methods that get forwarded to gain the same keyword.
>
> tl;dr this is simply a cost we have to be aware of when either proposing
> to add new keywords, or when proposing any kind of dispatching logic (in
> this case `_wrapreduction`).
>
> Regarding internal use of  `np.sum(..., where=)`: this should not be done
> until at least 4-5 versions from now, and preferably way longer than that.
> Because doing so will break already released versions of Pandas, Dask, and
> other libraries with array-like objects.
>
> Cheers,
> Ralf
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/c13c81c9/attachment.html>

From m.h.vankerkwijk at gmail.com  Wed Jun 12 20:59:55 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 12 Jun 2019 20:59:55 -0400
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <CAPJVwBkyP+7YhKAHpz+d1rghka2GJOhA1yG_HxGGj6RFfRgseA@mail.gmail.com>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
 <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>
 <e3887ff20081daef6ea17f5dfffc1346d9501e74.camel@sipsolutions.net>
 <CAJNV+9tDD_7SDfStP2xzKpTL7x6du1nwPhDe2AHWX0Bc6MA-kw@mail.gmail.com>
 <CAPJVwBkyP+7YhKAHpz+d1rghka2GJOhA1yG_HxGGj6RFfRgseA@mail.gmail.com>
Message-ID: <CAJNV+9uuZ8AM5D1kFaOt+QPbUUPyf-GmO1OsD6W7DAAEdMpEHQ@mail.gmail.com>

The attrs like you sent definitely sounded like it would translate to numpy
nearly trivially. I'm very much in favour!
-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/b8df083b/attachment.html>

From shoyer at gmail.com  Wed Jun 12 21:16:11 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Wed, 12 Jun 2019 18:16:11 -0700
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
Message-ID: <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>

On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Ralf,
>
> You're right, the problem is with the added keyword argument (which would
> appear also if we did not still have to support the old .sum method
> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
> maybe worse given that it seems likely the reduce method has seen much less
> testing in  __array_ufunc__ implementations).
>
> Still, I do think the question stands: we implement a `nansum` for our
> ndarray class a certain way, and provide ways to override it (three now, in
> fact). Is it really reasonable to expect that we wait 4 versions for other
> packages to keep up with this, and thus get stuck with given internal
> implementations?
>
> Aside: note that the present version of the nanfunctions relies on turning
> the arguments into arrays and copying 0s into it - that suggests that
> currently they do not work for duck arrays like Dask.
>

Agreed. We could safely rewrite things to use np.asarray(), without any
need to worry about backends compatibility. From an API perspective,
nothing would change -- we already cast inputs into base numpy arrays
inside the _replace_nan() routine.


>
> All the best,
>
> Marten
>
> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>>
>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>> stefanv at berkeley.edu> wrote:
>>
>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>> > In a way, I brought it up mostly as a concrete example of an internal
>>> > implementation which we cannot change to an objectively cleaner one
>>> because
>>> > other packages rely on an out-of-date numpy API.
>>>
>>
>> I think this is not the right way to describe the problem (see below).
>>
>>
>>> This, and the comments Nathaniel made on the array function thread, are
>>> important to take note of.  Would it be worth updating NEP 18 with a
>>> list of pitfalls?  Or should this be a new informational NEP that
>>> discusses?on a higher level?the benefits, risks, and design
>>> considerations of providing protocols?
>>>
>>
>> That would be a nice thing to do (the higher level one), but in this case
>> I think the issue has little to do with NEP 18. The summary of the issue in
>> this thread is a little brief, so let me try to clarify.
>>
>> 1. np.sum gained a new `where=` keyword in 1.17.0
>> 2. using np.sum(x) will detect a `x.sum` method if it's present and try
>> to use that
>> 3. the `_wrapreduction` utility that forwards the function to the method
>> will compare signatures of np.sum and x.sum, and throw an error if there's
>> a mismatch for any keywords that have a value other than the default
>> np._NoValue
>>
>> Code to check this:
>> >>> x1 = np.arange(5)
>> >>> x2 = np.asmatrix(x1)
>> >>> np.sum(x1)  # works
>> >>> np.sum(x2)  # works
>> >>> np.sum(x1, where=x1>3)  # works
>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>> ...
>> TypeError: sum() got an unexpected keyword argument 'where'
>>
>> Note that this is not specific to np.matrix. Using pandas.Series you also
>> get a TypeError:
>> >>> y = pd.Series(x1)
>> >>> np.sum(y)  # works
>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>> ...
>> TypeError: sum() got an unexpected keyword argument 'where'
>>
>> The issue is that when we have this kind of forwarding logic,
>> irrespective of how it's implemented, new keywords cannot be used until the
>> array-like objects with the methods that get forwarded to gain the same
>> keyword.
>>
>> tl;dr this is simply a cost we have to be aware of when either proposing
>> to add new keywords, or when proposing any kind of dispatching logic (in
>> this case `_wrapreduction`).
>>
>> Regarding internal use of  `np.sum(..., where=)`: this should not be done
>> until at least 4-5 versions from now, and preferably way longer than that.
>> Because doing so will break already released versions of Pandas, Dask, and
>> other libraries with array-like objects.
>>
>> Cheers,
>> Ralf
>>
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/d4deb0cf/attachment-0001.html>

From sebastian at sipsolutions.net  Wed Jun 12 22:48:03 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 12 Jun 2019 21:48:03 -0500
Subject: [Numpy-discussion] New release note strategy after branching
 1.17.
In-Reply-To: <CAPJVwBkyP+7YhKAHpz+d1rghka2GJOhA1yG_HxGGj6RFfRgseA@mail.gmail.com>
References: <aeeed7f8c50877a8fdc9784e9019bfab6b39d613.camel@sipsolutions.net>
 <CAPJVwB=3AgG9NOsb+BDAtBsgKd9CrE7orBNdA9L7hZUsp7F09g@mail.gmail.com>
 <e3887ff20081daef6ea17f5dfffc1346d9501e74.camel@sipsolutions.net>
 <CAJNV+9tDD_7SDfStP2xzKpTL7x6du1nwPhDe2AHWX0Bc6MA-kw@mail.gmail.com>
 <CAPJVwBkyP+7YhKAHpz+d1rghka2GJOhA1yG_HxGGj6RFfRgseA@mail.gmail.com>
Message-ID: <096f38082dca8cd1e41231b3d8dc7ccb688cacaa.camel@sipsolutions.net>

On Wed, 2019-06-12 at 17:34 -0700, Nathaniel Smith wrote:
> On Wed, Jun 12, 2019 at 12:58 PM Marten van Kerkwijk
> <m.h.vankerkwijk at gmail.com> wrote:
> > Overall, in favour of splitting the large files, but I don't like
> > that the notes stop being under version control (e.g., a follow-up
> > PR slightly changes things, how does the note gets
> > edited/reverted?).
> > 
> > Has there been any discussion of having, e.g., a directory
> > `docs/1.17.0-notes/`, and everyone storing their notes as
> > individual files? (A bit like maildir vs a single inbox file.) At
> > release time, one would then simply merge/order the files. Beyond
> > staying within git, an advantage of this may be that automation is
> > easier (e.g., if the file is always called <issue-number>.rst, then
> > checks for it can be very easily automated.).
> 
> That's exactly how towncrier works, except the filenames also have a
> category like "bugfix" or "feature" so they can be sorted into the
> right section of the final release notes. Here's how some projects
> describe it in their contributing docs:
> 
> https://pip.pypa.io/en/stable/development/contributing/#news-entries
> https://www.attrs.org/en/latest/contributing.html#changelog
> https://trio.readthedocs.io/en/latest/contributing.html#release-notes
> 
> Oh, and it uses a single fixed directory, like 'docs/release-notes',
> without the version in the directory name, and as part of preparing
> the release you delete all the files in that directory after moving
> them into the final release notes. This way if a PR is originally
> targeted at 1.17 but slips to 1.18, you can't accidentally put the
> note in the wrong directory. It's also nice for backports, where the
> same bugfix might appear in 1.17.0 and 1.16.1 ? the backport
> automatically carries the note along with it and it just works.
> 

Those are nice features. Do you have experience that this is not very
difficult for new users, numpy possibly gets more one-time contributers
than others? I think that was the only reason we had for not being sure
about it. I am not sure we gave the fact that it is not inside version
control much thought (e.g. you could automate scraping PRs as well, we
already do that to some degree).

Maybe it is a point anyway to try it. Scipy currently uses the other
method. And right now we have the manpower to have maintainers push a
release notes commit if that should proof a bit of an issue (and it is
likely not much harder than the current "please edit this file",
especially for users making a change that needs a release note).

- Sebastian


> -n
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/20c2d050/attachment.sig>

From einstein.edison at gmail.com  Thu Jun 13 01:30:24 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Thu, 13 Jun 2019 07:30:24 +0200
Subject: [Numpy-discussion] (Value Based Promotion) Current Behaviour
In-Reply-To: <4ac1b140e57cb0c6a49cbef7976e74fb3f55d76a.camel@sipsolutions.net>
References: <f330a2656aab221b9b96afeb24908b6a1ce1a246.camel@sipsolutions.net>
 <296297cac0ffc34034512499b1b628240356034a.camel@sipsolutions.net>
 <CAJNV+9vkJtnPZ83KbXhExHd-93vk3+H-aAa38pqPtHqcNbdHzw@mail.gmail.com>
 <b1bb5d74e179c891cefff04bdd18f2b48dafc4ba.camel@sipsolutions.net>
 <4ac1b140e57cb0c6a49cbef7976e74fb3f55d76a.camel@sipsolutions.net>
Message-ID: <1e039ecb-10bb-4a30-a4d6-eceb37cce76b@Canary>

Hi Sebastian,

One way to avoid an ugly lookup table and special cases is to store the amount of sign bits, the amount of integer/mantissa bits and the amount of exponent bits for each numeric style. A safe cast can only happen if all three are exceeded or equal. Just a thought.

Best Regards,
Hameer Abbasi

> On Wednesday, Jun 12, 2019 at 9:50 PM, Sebastian Berg <sebastian at sipsolutions.net (mailto:sebastian at sipsolutions.net)> wrote:
> On Wed, 2019-06-12 at 12:03 -0500, Sebastian Berg wrote:
> > On Tue, 2019-06-11 at 22:08 -0400, Marten van Kerkwijk wrote:
> > > HI Sebastian,
> > >
> > > Thanks for the overview! In the value-based casting, what perhaps
> > > surprises me most is that it is done within a kind; it would seem
> > > an
> > > improvement to check whether a given integer scalar is exactly
> > > representable in a given float (your example of 1024 in `float16`).
> > > If we switch to the python-only scalar values idea, I would suggest
> > > to abandon this. That might make dealing with things like `Decimal`
> > > or `Fraction` easier as well.
> > >
> >
> > Yeah, one can argue that since we have this "safe casting" based
> > approach, we should go all the way for the value based logic. I think
> > I
> > tend to agree, but I am not quite sure right now to be honest.
>
> Just realized, one issue with this is that you get much more "special
> cases" if you think of it in terms of "minimal dtype". Because
> suddenly, not just the unsigned/signed integers such as "< 128" are
> special, but even more values require special handling. An int16
> "minimal dtype" may or may not be castable to float16.
>
> For `can_cast` that does not matter much, but if we use the same logic
> for promotion things may get uglier. Although, maybe it just gets
> uglier implementation wise and is fairly logic on the user side...
>
> - Sebastian
>
>
> >
> > Fractions and Decimals are very interesting in that they raise the
> > question what happens to user dtypes [0]. Although, you would still
> > need a "no lower category" rule, since you do not want 1024. or 12/3
> > be
> > demoted to an integer.
> >
> > For me right now, what is most interesting is what we should do with
> > ufunc calls, and if we can simplify them. I feel right now we have to
> > types of ufuncs:
> >
> > 1. Ufuncs which use a "common type", where we can find the minimal
> > type
> > before dispatching.
> >
> > 2. More complex ufuncs, for which finding the minimal type is
> > trickier
> > [1]. And while I could not find any weird enough ufunc, I am not sure
> > that blind promotion is a good idea for general ufuncs.
> >
> > Best,
> >
> > Sebastian
> >
> >
> > [0] A python fraction could be converted to int64/int64 or
> > int32/int32,
> > etc. depending on the value, in principle. If we want such things to
> > work in principle, we need machinery (although I expect one could tag
> > that on later).
> > [1] It is not impossible, but we need to insert non-existing types
> > into
> > the type hierarchy.
> >
> >
> >
> > PS: Another interesting issue is that if we try to move away from
> > value
> > based casting for numpy scalars, that initial `np.asarray(...)` call
> > may lose the information that a python integer was passed in. So to
> > support such things, we might need a whole new machinery.
> >
> >
> >
> >
> > > All the best,
> > >
> > > Marten
> > >
> > > On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg <
> > > sebastian at sipsolutions.net> wrote:
> > > > Hi all,
> > > >
> > > > strange, something went wrong sending that email, but in any
> > > > case...
> > > >
> > > > I tried to "summarize" the current behaviour of promotion and
> > > > value
> > > > based promotion in numpy (correcting a small error in what I
> > > > wrote
> > > > earlier). Since it got a bit long, you can find it here (also
> > > > copy
> > > > pasted at the end):
> > > >
> > > > https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA
> > > >
> > > > Allan's document which I link in there is also very interesting.
> > > > One
> > > > thing I had not really thought about before was the problem of
> > > > commutativity.
> > > >
> > > > I do not have any specific points I want to discuss based on it
> > > > (but
> > > > those are likely to come up again later).
> > > >
> > > > All the Best,
> > > >
> > > > Sebastian
> > > >
> > > >
> > > > -----------------------------
> > > >
> > > > PS: Below a copy of what I wrote:
> > > >
> > > > ---
> > > > title: Numpy Value Based Promotion Rules
> > > > author: Sebastian Berg
> > > > ---
> > > >
> > > >
> > > >
> > > > NumPy Value Based Scalar Casting and Promotion
> > > > ==============================================
> > > >
> > > > This document reviews some of the behaviours of the promotion
> > > > rules
> > > > within numpy. This is especially with respect to the promotion of
> > > > scalars and 0D arrays which inspect the value to decide casting
> > > > and
> > > > promotion.
> > > >
> > > > Other documents discussing these things:
> > > >
> > > > * `from numpy.testing import print_coercion_tables` prints the
> > > > current promotion tables including value based promotion for
> > > > small
> > > > positive/negative scalars.
> > > > * Allan Haldane's thoughts on changing casting/promotion to be
> > > > more
> > > > C-like and discussing things such as here:
> > > >
> > > > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76
> > > > * Discussion around the problem of uint64 and int64 being
> > > > promoted to
> > > > float64: https://github.com/numpy/numpy/issues/12525 (lists many
> > > > related issues).
> > > >
> > > >
> > > > Nomenclature and Defintions
> > > > ---------------------------
> > > >
> > > > * **dtype/type**: The data type of an array or scalar: `float32`,
> > > > `float64`, `int8`, ?
> > > >
> > > > * **Category**: A category to which the data type belongs, in
> > > > this
> > > > context these are:
> > > > 1. boolean
> > > > 2. integer (unsigned and signed are not split up here, but are
> > > > different "kinds")
> > > > 3. floating point and complex (not split up here but are
> > > > different
> > > > "kinds")
> > > > 5. All others
> > > >
> > > > * **Casting**: converting from one dtype to another. There are
> > > > four
> > > > different rules of casting:
> > > > 1. *"safe"* casting: All values are representable in the new
> > > > data
> > > > type. I.e. no information is lost during the conversion.
> > > > 2. *"same kind"* casting: data loss may occur, but only within
> > > > the
> > > > same "kind". For example a float64 can be converted to float32
> > > > using
> > > > "same kind" rules, an int64 can be converted to int16. This is
> > > > although
> > > > both lose precision or even produce incorrect values. Note that
> > > > "kind"
> > > > is different from "category" in that it distinguishes between
> > > > signed
> > > > and unsigned integers.
> > > > 4. *"unsafe"* casting: Any conversion which can be defined,
> > > > e.g.
> > > > floating point to integer. For promotion this is fairly
> > > > unimportant.
> > > > (Some conversions such as string to integer, which not even work
> > > > fall
> > > > in this category, but could also be called coercions or
> > > > conversions.)
> > > >
> > > > * **Promotion**: The general process of finding a new dtype for
> > > > multiple input dtypes. Will be used here to also denote any kind
> > > > of
> > > > casting/promotion done before a specific function is called. This
> > > > can
> > > > be more complex, because in rare cases a functions can for
> > > > example
> > > > take
> > > > floating point numbers and integers as input at the same time
> > > > (i.e.
> > > > `np.ldexp`).
> > > >
> > > > * **Common dtype**: A dtype which can represent all input data.
> > > > In
> > > > general this means that all inputs can be safely cast to this
> > > > dtype.
> > > > Within numpy this is the normal and simplest form of promotion.
> > > >
> > > > * **`type1, type2 -> type3`**: Defines a promotion or signature.
> > > > For
> > > > example adding two integers: `np.int32(5) + np.int32(3)` gives
> > > > `np.int32(8)`. The dtype signature for that example would be:
> > > > `int32,
> > > > int32 -> int32`. A short form for this is also `ii->i` using C-
> > > > like
> > > > type codes, this can be found for example in `np.ldexp.types`
> > > > (and
> > > > any
> > > > numpy ufunc).
> > > >
> > > > * **Scalar**: A numpy or python scalar or a **0-D array**. It is
> > > > important to remember that zero dimensional arrays are treated
> > > > just
> > > > like scalars with respect to casting and promotion.
> > > >
> > > >
> > > > Current Situation in Numpy
> > > > --------------------------
> > > >
> > > > The current situation can be understand mostly in terms of safe
> > > > casting
> > > > which is defined based on the type hierarchy and is sensitive to
> > > > values
> > > > for scalars.
> > > >
> > > > This safe casting based approach is in contrast for example to
> > > > promotion within C or Julia, which work based on category first.
> > > > For
> > > > example `int32` cannot be safely cast to `float32`, but C or
> > > > Julia
> > > > will
> > > > use `int32, float32 -> float32` as the common type/promotion rule
> > > > for
> > > > example to decide on the output dtype for addition.
> > > >
> > > >
> > > > ### Python Integers and Floats
> > > >
> > > > Note that python integers are handled exactly like numpy ones.
> > > > They
> > > > are, however, special in that they do not have a dtype associated
> > > > with
> > > > them explicitly. Value based logic, as described here, seems
> > > > useful
> > > > for
> > > > python integers and floats to allow:
> > > > ```
> > > > arr = np.arange(10, dtype=np.int8)
> > > > arr += 1
> > > > # or:
> > > > res = arr + 1
> > > > res.dtype == np.int8
> > > > ```
> > > > which ensures that no upcast (for example with higher memory
> > > > usage)
> > > > occurs.
> > > >
> > > >
> > > > ### Safe Casting
> > > >
> > > > Most safe casting is clearly defined based on whether or not any
> > > > possible value is representable in the ouput dtype. Within numpy
> > > > there
> > > > is currently a single exception to this rule:
> > > > `np.can_cast(np.int64,
> > > > np.float64, casting="safe")` is considered to be true although
> > > > float64
> > > > cannot represent some large integer values exactly. In contrast,
> > > > `np.can_cast(np.int32, np.float32, casting="safe")` is `False`
> > > > and
> > > > `np.float64` would have to be used if a "safe" cast is desired.
> > > >
> > > > This exception may be one thing that should be changed, however,
> > > > concurrently the promotion rules have to be adapted to keep doing
> > > > the
> > > > same thing, or a larger behaviour change decided.
> > > >
> > > >
> > > > #### Scalar based rules
> > > >
> > > > Unlike arrays, where inspection of all values is not feasable,
> > > > for
> > > > scalars (and 0-D arrays) the value is inspected. The casting
> > > > becomes a
> > > > two step process:
> > > > 1. The minimal dtype capable of holding the value is found.
> > > > 2. The normal casting rules are applied to the new dtype.
> > > >
> > > > The first step uses the following rules by finding the minimal
> > > > dtype
> > > > within its category:
> > > >
> > > > * Boolean: Dtype is already minimal
> > > >
> > > > * Integers:
> > > > Casting is possible if output can hold the value. This
> > > > includes
> > > > uint8(127) casting to an int8.
> > > >
> > > > * Floats and Complex
> > > > Scalars can be demoted based on value, roughly this avoids
> > > > overflows:
> > > > ```
> > > > float16: -65000 < value < 65000
> > > > float32: -3.4e38 < value < 3.4e38
> > > > float64: -1.7e308 < value < 1.7e308
> > > > float128 (largest type, does not apply).
> > > > ```
> > > > For complex, the logic is simply applied to both real and
> > > > imaginary
> > > > part. Complex numbers cannot be downcast to floating point.
> > > >
> > > > * Others: Dtype is not modified.
> > > >
> > > >
> > > > This two step process means that `np.can_cast(np.int16(1024),
> > > > np.float16)` is `False` even though float16 is capable of exactly
> > > > representing the value 1024, since value based "demotion" to a
> > > > lower
> > > > dtype is used only within each category.
> > > >
> > > >
> > > >
> > > > ### Common Type Promotion
> > > >
> > > > For most operations in numpy the output type is just the common
> > > > type of
> > > > the inputs, this holds for example for concatenation, as well as
> > > > almost
> > > > all math funcions (e.g. addition and multiplication have two
> > > > identical
> > > > inputs and need one ouput dtype). This operation is exposed as
> > > > `np.result_type` which includes value based logic, and
> > > > `np.promote_types` which only accepts dtypes as input.
> > > >
> > > > Normal type promotion without value based/scalar logic finds the
> > > > smallest type which both inputs can cast to safely. This will be
> > > > the
> > > > largest "kind" (bool < unsigned < integer < float < complex <
> > > > other).
> > > >
> > > > Note that type promotion is handled in a "reduce" manner from
> > > > left
> > > > to
> > > > right. In rare cases this means it is not associatetive:
> > > > `float32,
> > > > uint16, int16 -> float32`, but `float32, (uint16, int16) ->
> > > > float64`.
> > > >
> > > > #### Scalar based rule
> > > >
> > > > When there is a mix of scalars and arrays, numpy will usually
> > > > allow
> > > > the
> > > > scalars to be handled in the same fashion as for "safe" casting
> > > > rules.
> > > >
> > > > The rules are as follows:
> > > >
> > > > 1. Value based logic is only applied if the "category" of any
> > > > array
> > > > is
> > > > larger or equal to the category of all scalars. If this is not
> > > > the
> > > > case, the typical rules are used.
> > > > * Specifically, this means: `np.array([1, 2, 3],
> > > > dtype=np.uint8) +
> > > > np.float64(12.)` gives a `float64` result, because the
> > > > `np.float64(12.)` is not considered for being demoted.
> > > >
> > > > 2. Promotion is applied as normally, however, instead of the
> > > > original
> > > > dtype, the minimal dtype is used. In the case where the minimal
> > > > data
> > > > type is unsigned (say uint8) but the value is small enough, the
> > > > minimal
> > > > type may in fact be either `uint8` or `int8` (127 can be both).
> > > > This
> > > > promotion is also applied in pairs (reduction-like) from left to
> > > > right.
> > > >
> > > >
> > > > ### General Promotion during Function Execution
> > > >
> > > > General functions (read "ufuncs" such as `np.add`) may have a
> > > > specific
> > > > dtype signature which is (for most dtypes) stored e.g. as
> > > > `np.add.types`. For many of these functions the common type
> > > > promotion
> > > > is used unchanged.
> > > >
> > > > However, some functions will employ a slightly different method
> > > > (which
> > > > should be equivalent in most cases). They will loop through all
> > > > loops
> > > > listed in `np.add.types` in order and find the first one to which
> > > > all
> > > > inputs can be safely cast:
> > > > ```
> > > > np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...]
> > > > ```
> > > > Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the
> > > > first
> > > > `float16, float16 -> float16` (`'ee->e'`) loop because `int16`
> > > > cannot
> > > > be cast safely, and then pick the float32 (`'ff->f'`) one.
> > > >
> > > > For simple functions, which commonly have two identical inputs,
> > > > this
> > > > should be identical, since normally a clear order exists for the
> > > > dtypes
> > > > (it does require checking int8 before uint8, etc.).
> > > >
> > > > #### Scalar based rule
> > > >
> > > > When scalars are involved, the "safe" cast logic based on values
> > > > is
> > > > applied *if and only if* rule 1. applies as before: That is there
> > > > must
> > > > be an array with a higher or equal category as all of the
> > > > scalars.
> > > >
> > > > In the above `np.divide` example, this means that
> > > > `np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will*
> > > > use
> > > > the `'ee->e'` loop, because the scalar `4` is of a lower or equal
> > > > category than the array (integer <= float or complex). While
> > > > checking,
> > > > 4 is found to be safely castable to float16, since `(u)int8` is
> > > > sufficient to hold 4 and that can be safely cast to `float16`.
> > > > However, `np.divide(np.int16(4), np.int16(3))` would use
> > > > `float32`
> > > > because both are scalars and thus value based logic is not used
> > > > (Note
> > > > that in reality numpy forces double output for an all integer
> > > > input
> > > > in
> > > > divide).
> > > >
> > > > In it is possible for ufuncs to have mixed type signatures (this
> > > > is
> > > > very rare within numy) and arbitrary inputs. In this case, in
> > > > principle, the question is whether or not a clear ordering exists
> > > > and
> > > > if the rule of using value based logic is always clear. This is
> > > > rather
> > > > academical (I could not find any such function in numpy or
> > > > `scipy.special` [^scipy-ufuncs]). But consider:
> > > > ```
> > > > imaginary_ufunc.types:
> > > > int32, float32 -> int32, float32
> > > > int64, float32 -> int64, float32
> > > > ...
> > > > ```
> > > > it is not clear that `np.int64(5) + np.float32(3.)` should be
> > > > able
> > > > to
> > > > demote the `5`. This is very theoretical of course
> > > >
> > > >
> > > >
> > > >
> > > > Footnotes
> > > > ---------
> > > >
> > > > [^scipy-ufuncs]: See for example these functions:
> > > > ```python
> > > > import scipy.special
> > > > for n, func in scipy.special.__dict__.items():
> > > > if not isinstance(func, np.ufunc):
> > > > continue
> > > >
> > > > if func.nin == 1:
> > > > # a single input is not interesting
> > > > continue
> > > >
> > > > # check if the signature is not uniform
> > > > for types in func.types:
> > > > if len(set(types[:func.nin])) != 1:
> > > > break
> > > > else:
> > > > continue
> > > > print(func, func.types)
> > > > ```
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/4813dae2/attachment-0001.html>

From ralf.gommers at gmail.com  Thu Jun 13 04:17:31 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 13 Jun 2019 10:17:31 +0200
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
Message-ID: <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>

On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <shoyer at gmail.com> wrote:

> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>> Hi Ralf,
>>
>> You're right, the problem is with the added keyword argument (which would
>> appear also if we did not still have to support the old .sum method
>> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
>> maybe worse given that it seems likely the reduce method has seen much less
>> testing in  __array_ufunc__ implementations).
>>
>> Still, I do think the question stands: we implement a `nansum` for our
>> ndarray class a certain way, and provide ways to override it (three now, in
>> fact). Is it really reasonable to expect that we wait 4 versions for other
>> packages to keep up with this, and thus get stuck with given internal
>> implementations?
>>
>
In general, I'd say yes. This is a problem of our own making. If an
implementation uses np.sum on an argument that can be a subclass or duck
array (i.e. we didn't coerce with asarray), then the code is calling out to
outside of numpy. At that point we have effectively made something public.
We can still change it, but only if we're sure that we don't break things
in the process (i.e. we then insert a new asarray, something you're not
happy about in general ....).

I don't get the "three ways to override nansum" by the way. There's only
one I think: __array_function__. There may be more for the sum() method.

Another way to say the same thing: if a subclass does everything right, and
we decide to add a new keyword to some function in 1.17 without considering
those subclasses, then turning around straight after and saying "oh those
subclasses rely on our old API, it may be okay to either break them or send
them a deprecation warning" (which is I think what you were advocating for
- your 4 and 3 options) is not reasonable.


>> Aside: note that the present version of the nanfunctions relies on
>> turning the arguments into arrays and copying 0s into it - that suggests
>> that currently they do not work for duck arrays like Dask.
>>
>
There being an issue with matrix in the PR suggests there is an issue for
subclasses at least?


> Agreed. We could safely rewrite things to use np.asarray(), without any
> need to worry about backends compatibility. From an API perspective,
> nothing would change -- we already cast inputs into base numpy arrays
> inside the _replace_nan() routine.
>

In that case yes, nansum can be rewritten. However, it's only because of
the use of (as)array - if it were asanyarray that would already scupper the
plan.

Cheers,
Ralf


>
>>
>> All the best,
>>
>> Marten
>>
>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>>> stefanv at berkeley.edu> wrote:
>>>
>>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>>> > In a way, I brought it up mostly as a concrete example of an internal
>>>> > implementation which we cannot change to an objectively cleaner one
>>>> because
>>>> > other packages rely on an out-of-date numpy API.
>>>>
>>>
>>> I think this is not the right way to describe the problem (see below).
>>>
>>>
>>>> This, and the comments Nathaniel made on the array function thread, are
>>>> important to take note of.  Would it be worth updating NEP 18 with a
>>>> list of pitfalls?  Or should this be a new informational NEP that
>>>> discusses?on a higher level?the benefits, risks, and design
>>>> considerations of providing protocols?
>>>>
>>>
>>> That would be a nice thing to do (the higher level one), but in this
>>> case I think the issue has little to do with NEP 18. The summary of the
>>> issue in this thread is a little brief, so let me try to clarify.
>>>
>>> 1. np.sum gained a new `where=` keyword in 1.17.0
>>> 2. using np.sum(x) will detect a `x.sum` method if it's present and try
>>> to use that
>>> 3. the `_wrapreduction` utility that forwards the function to the method
>>> will compare signatures of np.sum and x.sum, and throw an error if there's
>>> a mismatch for any keywords that have a value other than the default
>>> np._NoValue
>>>
>>> Code to check this:
>>> >>> x1 = np.arange(5)
>>> >>> x2 = np.asmatrix(x1)
>>> >>> np.sum(x1)  # works
>>> >>> np.sum(x2)  # works
>>> >>> np.sum(x1, where=x1>3)  # works
>>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>>> ...
>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>
>>> Note that this is not specific to np.matrix. Using pandas.Series you
>>> also get a TypeError:
>>> >>> y = pd.Series(x1)
>>> >>> np.sum(y)  # works
>>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>>> ...
>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>
>>> The issue is that when we have this kind of forwarding logic,
>>> irrespective of how it's implemented, new keywords cannot be used until the
>>> array-like objects with the methods that get forwarded to gain the same
>>> keyword.
>>>
>>> tl;dr this is simply a cost we have to be aware of when either proposing
>>> to add new keywords, or when proposing any kind of dispatching logic (in
>>> this case `_wrapreduction`).
>>>
>>> Regarding internal use of  `np.sum(..., where=)`: this should not be
>>> done until at least 4-5 versions from now, and preferably way longer than
>>> that. Because doing so will break already released versions of Pandas,
>>> Dask, and other libraries with array-like objects.
>>>
>>> Cheers,
>>> Ralf
>>>
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/498e0de0/attachment.html>

From m.h.vankerkwijk at gmail.com  Thu Jun 13 08:50:38 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 13 Jun 2019 08:50:38 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
Message-ID: <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>

Hi All,

`nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers the
plan to use `asarray` - sorry to have been sloppy with stating they
"convert to array".

And, yes, I'll admit to forgetting that we are introducing `where` only in
1.17 - clearly we cannot rely on other classes to have already implemented
it!! (This is the problem with having done it myself - it seems ages ago!)

For the old implementation, there was indeed only one way to override
(`__array_function__`) for non-subclasses, but for the new implementation
there are three ways, though the first two overlap:
- support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
- support `__array_ufunc__` including reductions (with all their arguments)
- support `__array_function__`

I think that eventually one would like to move to the case where there is
only one (`__array_ufunc__` in this case, since really it is just a
combination of two ufuncs, isnan and add.reduce in this case).

 Anyway, I guess this is still a good example to consider for how we should
go about getting to a new implementation, ideally with just a single-way to
override?

Indeed, how do we actually envisage deprecating the use of
`__array_function__` for a given part of the numpy API? Are we allowed to
go cold-turkey if the new implementation is covered by `__array_ufunc__`?

All the best,

Marten


On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
>> m.h.vankerkwijk at gmail.com> wrote:
>>
>>> Hi Ralf,
>>>
>>> You're right, the problem is with the added keyword argument (which
>>> would appear also if we did not still have to support the old .sum method
>>> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
>>> maybe worse given that it seems likely the reduce method has seen much less
>>> testing in  __array_ufunc__ implementations).
>>>
>>> Still, I do think the question stands: we implement a `nansum` for our
>>> ndarray class a certain way, and provide ways to override it (three now, in
>>> fact). Is it really reasonable to expect that we wait 4 versions for other
>>> packages to keep up with this, and thus get stuck with given internal
>>> implementations?
>>>
>>
> In general, I'd say yes. This is a problem of our own making. If an
> implementation uses np.sum on an argument that can be a subclass or duck
> array (i.e. we didn't coerce with asarray), then the code is calling out to
> outside of numpy. At that point we have effectively made something public.
> We can still change it, but only if we're sure that we don't break things
> in the process (i.e. we then insert a new asarray, something you're not
> happy about in general ....).
>
> I don't get the "three ways to override nansum" by the way. There's only
> one I think: __array_function__. There may be more for the sum() method.
>
> Another way to say the same thing: if a subclass does everything right,
> and we decide to add a new keyword to some function in 1.17 without
> considering those subclasses, then turning around straight after and saying
> "oh those subclasses rely on our old API, it may be okay to either break
> them or send them a deprecation warning" (which is I think what you were
> advocating for - your 4 and 3 options) is not reasonable.
>
>
>>> Aside: note that the present version of the nanfunctions relies on
>>> turning the arguments into arrays and copying 0s into it - that suggests
>>> that currently they do not work for duck arrays like Dask.
>>>
>>
> There being an issue with matrix in the PR suggests there is an issue for
> subclasses at least?
>
>
>> Agreed. We could safely rewrite things to use np.asarray(), without any
>> need to worry about backends compatibility. From an API perspective,
>> nothing would change -- we already cast inputs into base numpy arrays
>> inside the _replace_nan() routine.
>>
>
> In that case yes, nansum can be rewritten. However, it's only because of
> the use of (as)array - if it were asanyarray that would already scupper the
> plan.
>
> Cheers,
> Ralf
>
>
>>
>>>
>>> All the best,
>>>
>>> Marten
>>>
>>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>>>> stefanv at berkeley.edu> wrote:
>>>>
>>>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>>>> > In a way, I brought it up mostly as a concrete example of an internal
>>>>> > implementation which we cannot change to an objectively cleaner one
>>>>> because
>>>>> > other packages rely on an out-of-date numpy API.
>>>>>
>>>>
>>>> I think this is not the right way to describe the problem (see below).
>>>>
>>>>
>>>>> This, and the comments Nathaniel made on the array function thread, are
>>>>> important to take note of.  Would it be worth updating NEP 18 with a
>>>>> list of pitfalls?  Or should this be a new informational NEP that
>>>>> discusses?on a higher level?the benefits, risks, and design
>>>>> considerations of providing protocols?
>>>>>
>>>>
>>>> That would be a nice thing to do (the higher level one), but in this
>>>> case I think the issue has little to do with NEP 18. The summary of the
>>>> issue in this thread is a little brief, so let me try to clarify.
>>>>
>>>> 1. np.sum gained a new `where=` keyword in 1.17.0
>>>> 2. using np.sum(x) will detect a `x.sum` method if it's present and try
>>>> to use that
>>>> 3. the `_wrapreduction` utility that forwards the function to the
>>>> method will compare signatures of np.sum and x.sum, and throw an error if
>>>> there's a mismatch for any keywords that have a value other than the
>>>> default np._NoValue
>>>>
>>>> Code to check this:
>>>> >>> x1 = np.arange(5)
>>>> >>> x2 = np.asmatrix(x1)
>>>> >>> np.sum(x1)  # works
>>>> >>> np.sum(x2)  # works
>>>> >>> np.sum(x1, where=x1>3)  # works
>>>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>>>> ...
>>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>>
>>>> Note that this is not specific to np.matrix. Using pandas.Series you
>>>> also get a TypeError:
>>>> >>> y = pd.Series(x1)
>>>> >>> np.sum(y)  # works
>>>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>>>> ...
>>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>>
>>>> The issue is that when we have this kind of forwarding logic,
>>>> irrespective of how it's implemented, new keywords cannot be used until the
>>>> array-like objects with the methods that get forwarded to gain the same
>>>> keyword.
>>>>
>>>> tl;dr this is simply a cost we have to be aware of when either
>>>> proposing to add new keywords, or when proposing any kind of dispatching
>>>> logic (in this case `_wrapreduction`).
>>>>
>>>> Regarding internal use of  `np.sum(..., where=)`: this should not be
>>>> done until at least 4-5 versions from now, and preferably way longer than
>>>> that. Because doing so will break already released versions of Pandas,
>>>> Dask, and other libraries with array-like objects.
>>>>
>>>> Cheers,
>>>> Ralf
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/e31edc12/attachment-0001.html>

From ralf.gommers at gmail.com  Thu Jun 13 11:15:09 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 13 Jun 2019 17:15:09 +0200
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
Message-ID: <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>

On Thu, Jun 13, 2019 at 2:51 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi All,
>
> `nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers
> the plan to use `asarray` - sorry to have been sloppy with stating they
> "convert to array".
>
> And, yes, I'll admit to forgetting that we are introducing `where` only in
> 1.17 - clearly we cannot rely on other classes to have already implemented
> it!! (This is the problem with having done it myself - it seems ages ago!)
>
> For the old implementation, there was indeed only one way to override
> (`__array_function__`) for non-subclasses, but for the new implementation
> there are three ways, though the first two overlap:
> - support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
> - support `__array_ufunc__` including reductions (with all their arguments)
> - support `__array_function__`
>
> I think that eventually one would like to move to the case where there is
> only one (`__array_ufunc__` in this case, since really it is just a
> combination of two ufuncs, isnan and add.reduce in this case).
>

I don't think the other ones are really ways of "overriding nansum". They
happen to work, but rely on internal implementation details and then
overriding functions inside those details.

It sounds like you're suggestion a new way of declaring "this is a
canonical implementation that duck arrays can rely on". I can see that
happening, but we probably want some standard way to mark those functions,
have a note in the docs and some tests.


>  Anyway, I guess this is still a good example to consider for how we
> should go about getting to a new implementation, ideally with just a
> single-way to override?
>
> Indeed, how do we actually envisage deprecating the use of
> `__array_function__` for a given part of the numpy API? Are we allowed to
> go cold-turkey if the new implementation is covered by `__array_ufunc__`?
>

I think __array_function__ is still the best way to do this (that's the
only actual override, so most robust and performant likely), so I don't see
any reason for a deprecation.

Cheers,
Ralf


> All the best,
>
> Marten
>
>
>
> On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>>
>> On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>>
>>> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
>>> m.h.vankerkwijk at gmail.com> wrote:
>>>
>>>> Hi Ralf,
>>>>
>>>> You're right, the problem is with the added keyword argument (which
>>>> would appear also if we did not still have to support the old .sum method
>>>> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
>>>> maybe worse given that it seems likely the reduce method has seen much less
>>>> testing in  __array_ufunc__ implementations).
>>>>
>>>> Still, I do think the question stands: we implement a `nansum` for our
>>>> ndarray class a certain way, and provide ways to override it (three now, in
>>>> fact). Is it really reasonable to expect that we wait 4 versions for other
>>>> packages to keep up with this, and thus get stuck with given internal
>>>> implementations?
>>>>
>>>
>> In general, I'd say yes. This is a problem of our own making. If an
>> implementation uses np.sum on an argument that can be a subclass or duck
>> array (i.e. we didn't coerce with asarray), then the code is calling out to
>> outside of numpy. At that point we have effectively made something public.
>> We can still change it, but only if we're sure that we don't break things
>> in the process (i.e. we then insert a new asarray, something you're not
>> happy about in general ....).
>>
>> I don't get the "three ways to override nansum" by the way. There's only
>> one I think: __array_function__. There may be more for the sum() method.
>>
>> Another way to say the same thing: if a subclass does everything right,
>> and we decide to add a new keyword to some function in 1.17 without
>> considering those subclasses, then turning around straight after and saying
>> "oh those subclasses rely on our old API, it may be okay to either break
>> them or send them a deprecation warning" (which is I think what you were
>> advocating for - your 4 and 3 options) is not reasonable.
>>
>>
>>>> Aside: note that the present version of the nanfunctions relies on
>>>> turning the arguments into arrays and copying 0s into it - that suggests
>>>> that currently they do not work for duck arrays like Dask.
>>>>
>>>
>> There being an issue with matrix in the PR suggests there is an issue for
>> subclasses at least?
>>
>>
>>> Agreed. We could safely rewrite things to use np.asarray(), without any
>>> need to worry about backends compatibility. From an API perspective,
>>> nothing would change -- we already cast inputs into base numpy arrays
>>> inside the _replace_nan() routine.
>>>
>>
>> In that case yes, nansum can be rewritten. However, it's only because of
>> the use of (as)array - if it were asanyarray that would already scupper the
>> plan.
>>
>> Cheers,
>> Ralf
>>
>>
>>>
>>>>
>>>> All the best,
>>>>
>>>> Marten
>>>>
>>>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>>>>> stefanv at berkeley.edu> wrote:
>>>>>
>>>>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>>>>> > In a way, I brought it up mostly as a concrete example of an
>>>>>> internal
>>>>>> > implementation which we cannot change to an objectively cleaner one
>>>>>> because
>>>>>> > other packages rely on an out-of-date numpy API.
>>>>>>
>>>>>
>>>>> I think this is not the right way to describe the problem (see below).
>>>>>
>>>>>
>>>>>> This, and the comments Nathaniel made on the array function thread,
>>>>>> are
>>>>>> important to take note of.  Would it be worth updating NEP 18 with a
>>>>>> list of pitfalls?  Or should this be a new informational NEP that
>>>>>> discusses?on a higher level?the benefits, risks, and design
>>>>>> considerations of providing protocols?
>>>>>>
>>>>>
>>>>> That would be a nice thing to do (the higher level one), but in this
>>>>> case I think the issue has little to do with NEP 18. The summary of the
>>>>> issue in this thread is a little brief, so let me try to clarify.
>>>>>
>>>>> 1. np.sum gained a new `where=` keyword in 1.17.0
>>>>> 2. using np.sum(x) will detect a `x.sum` method if it's present and
>>>>> try to use that
>>>>> 3. the `_wrapreduction` utility that forwards the function to the
>>>>> method will compare signatures of np.sum and x.sum, and throw an error if
>>>>> there's a mismatch for any keywords that have a value other than the
>>>>> default np._NoValue
>>>>>
>>>>> Code to check this:
>>>>> >>> x1 = np.arange(5)
>>>>> >>> x2 = np.asmatrix(x1)
>>>>> >>> np.sum(x1)  # works
>>>>> >>> np.sum(x2)  # works
>>>>> >>> np.sum(x1, where=x1>3)  # works
>>>>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>>>>> ...
>>>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>>>
>>>>> Note that this is not specific to np.matrix. Using pandas.Series you
>>>>> also get a TypeError:
>>>>> >>> y = pd.Series(x1)
>>>>> >>> np.sum(y)  # works
>>>>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>>>>> ...
>>>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>>>
>>>>> The issue is that when we have this kind of forwarding logic,
>>>>> irrespective of how it's implemented, new keywords cannot be used until the
>>>>> array-like objects with the methods that get forwarded to gain the same
>>>>> keyword.
>>>>>
>>>>> tl;dr this is simply a cost we have to be aware of when either
>>>>> proposing to add new keywords, or when proposing any kind of dispatching
>>>>> logic (in this case `_wrapreduction`).
>>>>>
>>>>> Regarding internal use of  `np.sum(..., where=)`: this should not be
>>>>> done until at least 4-5 versions from now, and preferably way longer than
>>>>> that. Because doing so will break already released versions of Pandas,
>>>>> Dask, and other libraries with array-like objects.
>>>>>
>>>>> Cheers,
>>>>> Ralf
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion at python.org
>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/fe853d72/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Thu Jun 13 12:34:01 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 13 Jun 2019 12:34:01 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
Message-ID: <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>

Hi Ralf, others,


>>  Anyway, I guess this is still a good example to consider for how we
>> should go about getting to a new implementation, ideally with just a
>> single-way to override?
>>
>> Indeed, how do we actually envisage deprecating the use of
>> `__array_function__` for a given part of the numpy API? Are we allowed to
>> go cold-turkey if the new implementation is covered by `__array_ufunc__`?
>>
>
> I think __array_function__ is still the best way to do this (that's the
> only actual override, so most robust and performant likely), so I don't see
> any reason for a deprecation.
>
> Yes, I fear I have to agree for the nan-functions, at least for now...

But how about `np.sum` itself? Right now, it is overridden by
__array_function__ but classes without __array_function__ support can also
override it through the method lookup and through __array_ufunc__.

Would/should there be a point where we just have `sum = np.add.reduce` and
drop other overrides? If so, how do we get there?

One option might be start reversing the order in `_wrapreduction` - try
`__array_ufunc__` if it is defined and only if that fails try the `.sum`
method.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/747cbab3/attachment.html>

From shoyer at gmail.com  Thu Jun 13 12:45:44 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 13 Jun 2019 09:45:44 -0700
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
 <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
Message-ID: <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>

On Thu, Jun 13, 2019 at 9:35 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Ralf, others,
>
>
>>>  Anyway, I guess this is still a good example to consider for how we
>>> should go about getting to a new implementation, ideally with just a
>>> single-way to override?
>>>
>>> Indeed, how do we actually envisage deprecating the use of
>>> `__array_function__` for a given part of the numpy API? Are we allowed to
>>> go cold-turkey if the new implementation is covered by `__array_ufunc__`?
>>>
>>
>> I think __array_function__ is still the best way to do this (that's the
>> only actual override, so most robust and performant likely), so I don't see
>> any reason for a deprecation.
>>
>> Yes, I fear I have to agree for the nan-functions, at least for now...
>
> But how about `np.sum` itself? Right now, it is overridden by
> __array_function__ but classes without __array_function__ support can also
> override it through the method lookup and through __array_ufunc__.
>
> Would/should there be a point where we just have `sum = np.add.reduce` and
> drop other overrides? If so, how do we get there?
>
> One option might be start reversing the order in `_wrapreduction` - try
> `__array_ufunc__` if it is defined and only if that fails try the `.sum`
> method.
>

Yes, I think we would need to do this sort of thing. It's a bit of trouble,
but probably doable with some decorator magic. It would indeed be nice for
sum() to eventually just be np.add.reduce, though to be honest I'm not
entirely sure it's worth the trouble of a long deprecation cycle -- people
have been relying on the fall-back calling of methods for a long time.


> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/93b606f3/attachment.html>

From m.h.vankerkwijk at gmail.com  Thu Jun 13 13:06:35 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 13 Jun 2019 13:06:35 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
 <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
 <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>
Message-ID: <CAJNV+9u7aqgKPh=K1BcENm9gmUY4pf4WOwpB-FVqK=_qtGMC0Q@mail.gmail.com>

On Thu, Jun 13, 2019 at 12:46 PM Stephan Hoyer <shoyer at gmail.com> wrote:
<snip>

>
>
>> But how about `np.sum` itself? Right now, it is overridden by
>> __array_function__ but classes without __array_function__ support can also
>> override it through the method lookup and through __array_ufunc__.
>>
>> Would/should there be a point where we just have `sum = np.add.reduce`
>> and drop other overrides? If so, how do we get there?
>>
>> One option might be start reversing the order in `_wrapreduction` - try
>> `__array_ufunc__` if it is defined and only if that fails try the `.sum`
>> method.
>>
>
> Yes, I think we would need to do this sort of thing. It's a bit of
> trouble, but probably doable with some decorator magic. It would indeed be
> nice for sum() to eventually just be np.add.reduce, though to be honest I'm
> not entirely sure it's worth the trouble of a long deprecation cycle --
> people have been relying on the fall-back calling of methods for a long
> time.
>
>
I guess the one immediate question is whether `np.sum` and the like should
be overridden by `__array_function__` at all, given that what should be the
future recommended override already works.

(And, yes, arguably too late to change it now!)

-- Marten


>
>
>> All the best,
>>
>> Marten
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/922faf7b/attachment-0001.html>

From ralf.gommers at gmail.com  Thu Jun 13 13:42:24 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 13 Jun 2019 19:42:24 +0200
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9u7aqgKPh=K1BcENm9gmUY4pf4WOwpB-FVqK=_qtGMC0Q@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
 <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
 <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>
 <CAJNV+9u7aqgKPh=K1BcENm9gmUY4pf4WOwpB-FVqK=_qtGMC0Q@mail.gmail.com>
Message-ID: <CABL7CQiOd+fXF+F-r2gr4-GgnU_22WFF2x_ys_OTSYOwi2ob7w@mail.gmail.com>

On Thu, Jun 13, 2019 at 7:07 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

>
>
> On Thu, Jun 13, 2019 at 12:46 PM Stephan Hoyer <shoyer at gmail.com> wrote:
> <snip>
>
>>
>>
>>> But how about `np.sum` itself? Right now, it is overridden by
>>> __array_function__ but classes without __array_function__ support can also
>>> override it through the method lookup and through __array_ufunc__.
>>>
>>> Would/should there be a point where we just have `sum = np.add.reduce`
>>> and drop other overrides? If so, how do we get there?
>>>
>>> One option might be start reversing the order in `_wrapreduction` - try
>>> `__array_ufunc__` if it is defined and only if that fails try the `.sum`
>>> method.
>>>
>>
>> Yes, I think we would need to do this sort of thing. It's a bit of
>> trouble, but probably doable with some decorator magic. It would indeed be
>> nice for sum() to eventually just be np.add.reduce, though to be honest I'm
>> not entirely sure it's worth the trouble of a long deprecation cycle --
>> people have been relying on the fall-back calling of methods for a long
>> time.
>>
>>
> I guess the one immediate question is whether `np.sum` and the like should
> be overridden by `__array_function__` at all, given that what should be the
> future recommended override already works.
>

I'm not sure I understand the rationale for this. Design consistency
matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and
other functions __array_function__. Why keep on tweaking things for little
benefit?

Ralf


> (And, yes, arguably too late to change it now!)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/41bcccd0/attachment.html>

From m.h.vankerkwijk at gmail.com  Thu Jun 13 15:43:22 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 13 Jun 2019 15:43:22 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CABL7CQiOd+fXF+F-r2gr4-GgnU_22WFF2x_ys_OTSYOwi2ob7w@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
 <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
 <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>
 <CAJNV+9u7aqgKPh=K1BcENm9gmUY4pf4WOwpB-FVqK=_qtGMC0Q@mail.gmail.com>
 <CABL7CQiOd+fXF+F-r2gr4-GgnU_22WFF2x_ys_OTSYOwi2ob7w@mail.gmail.com>
Message-ID: <CAJNV+9tnZ0PR+OvXPxWi7_j7i4xyy65TUx=56Cygcf34Lktg2w@mail.gmail.com>

Hi Ralf,


>> I guess the one immediate question is whether `np.sum` and the like
>> should be overridden by `__array_function__` at all, given that what should
>> be the future recommended override already works.
>>
>
> I'm not sure I understand the rationale for this. Design consistency
> matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and
> other functions __array_function__. Why keep on tweaking things for little
> benefit?
>

I'm mostly trying to understand how we would actually change things. I
guess your quite logical argument for consistency is that it requires
`np.sum is np.add.reduce`. But to do that, one would have to get rid of the
`.sum()` method override, and then deprecate using `__array_function__` on
`np.sum`.

A perhaps clearer example is `np.isposinf`, where the implementation truly
consists of ufunc calls only (indeed, it is in `lib/ufunc_like`). I guess
following your consistency logic, one would not remove the
`__array_function__` override until it actually became a ufunc itself.

Anyway, I think this discussion has been useful, if only in making it yet
more clear how difficult it is to deprecate anything!

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/a90e3f2c/attachment.html>

From ralf.gommers at gmail.com  Thu Jun 13 18:34:38 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 14 Jun 2019 00:34:38 +0200
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9tnZ0PR+OvXPxWi7_j7i4xyy65TUx=56Cygcf34Lktg2w@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
 <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
 <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>
 <CAJNV+9u7aqgKPh=K1BcENm9gmUY4pf4WOwpB-FVqK=_qtGMC0Q@mail.gmail.com>
 <CABL7CQiOd+fXF+F-r2gr4-GgnU_22WFF2x_ys_OTSYOwi2ob7w@mail.gmail.com>
 <CAJNV+9tnZ0PR+OvXPxWi7_j7i4xyy65TUx=56Cygcf34Lktg2w@mail.gmail.com>
Message-ID: <CABL7CQi0wjqgmjaa8Z8qVTvuPNNsXFC8Wki+r0Wacpd30gisqA@mail.gmail.com>

On Thu, Jun 13, 2019 at 9:43 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Ralf,
>
>
>>> I guess the one immediate question is whether `np.sum` and the like
>>> should be overridden by `__array_function__` at all, given that what should
>>> be the future recommended override already works.
>>>
>>
>> I'm not sure I understand the rationale for this. Design consistency
>> matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and
>> other functions __array_function__. Why keep on tweaking things for little
>> benefit?
>>
>
> I'm mostly trying to understand how we would actually change things.
>

I think we do have ways to deprecate things and reason about how to make
the trade-offs to do so. Function overrides are not a whole lot different I
think, we can apply the same method (plus there's a special bit in NEP 18
that we reserve the right to change functions into ufuncs and use
`__array_ufunc__`).

I guess your quite logical argument for consistency is that it requires
> `np.sum is np.add.reduce`. But to do that, one would have to get rid of the
> `.sum()` method override, and then deprecate using `__array_function__` on
> `np.sum`.
>
> A perhaps clearer example is `np.isposinf`, where the implementation truly
> consists of ufunc calls only (indeed, it is in `lib/ufunc_like`). I guess
> following your consistency logic, one would not remove the
> `__array_function__` override until it actually became a ufunc itself.
>

Correct.

More importantly, I think we should not even consider *discussing*
removing` __array_function__` from np.isposinf (or any similar one off
situation) before there's a new bigger picture design. This is not about
"deprecation is hard", this is about doing things with purpose rather than
ad-hoc, as well as recognizing that lots of small changes are a major drain
on our limited maintainer resources. About the latter issue I wrote a blog
post recently, perhaps that clarifies the previous sentence a bit and gives
some more insight in my perspective:
https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/

Cheers,
Ralf


> Anyway, I think this discussion has been useful, if only in making it yet
> more clear how difficult it is to deprecate anything!
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190614/92253184/attachment-0001.html>

From sebastian at sipsolutions.net  Thu Jun 13 18:37:03 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 13 Jun 2019 17:37:03 -0500
Subject: [Numpy-discussion] (Value Based Promotion) Current Behaviour
In-Reply-To: <1e039ecb-10bb-4a30-a4d6-eceb37cce76b@Canary>
References: <f330a2656aab221b9b96afeb24908b6a1ce1a246.camel@sipsolutions.net>
 <296297cac0ffc34034512499b1b628240356034a.camel@sipsolutions.net>
 <CAJNV+9vkJtnPZ83KbXhExHd-93vk3+H-aAa38pqPtHqcNbdHzw@mail.gmail.com>
 <b1bb5d74e179c891cefff04bdd18f2b48dafc4ba.camel@sipsolutions.net>
 <4ac1b140e57cb0c6a49cbef7976e74fb3f55d76a.camel@sipsolutions.net>
 <1e039ecb-10bb-4a30-a4d6-eceb37cce76b@Canary>
Message-ID: <e97b4dc6904cd1c6d1d5e2e5ee839206a3e2d12d.camel@sipsolutions.net>

(this may be a bit thinking out loudly...)


On Thu, 2019-06-13 at 07:30 +0200, Hameer Abbasi wrote:
> Hi Sebastian,
> 
> One way to avoid an ugly lookup table and special cases is to store
> the amount of sign bits, the amount of integer/mantissa bits and the
> amount of exponent bits for each numeric style. A safe cast can only
> happen if all three are exceeded or equal. Just a thought.
> 

True, although I am not sure I like it as a general solution. Within
numpy, we do not have much problems in any case, and I do not mind the
table as such.

What I am worrying more about right now is what happens with user
dtypes and the "minimal dtype" logic for them. For example, if a user
creates an int24, an python integer could probably not get cast to it
automatically (since it is a user defined type). But for the sake of
"minimal dtype" should we try to support it?
I.e. should: `user_library.int24(1) - 2**20` have machinery to convert
2**20 to an int24 instead of an int32 as "minimal type"? But if a
second user dtype does not know about int24 (say rational), it may be
an invalid "minimal dtype" for that (at least unless numpy tries to
automagically fill holes in the casting table, and that seems like too
much magic to me)?
(Another example is a masked_int8 which uses -128 to mean NA)

Similarly, fixing the hierarchy by representing 0-127 with a uint7 is
problematic since current dtypes cannot register with it or have not
done so (instead it really means "uint8 or int8"). Of course you could
do that for them, but its just another complexity hoop to jump through.

A similar thing is Marten's thought about intermediate values of int16
casting safely to float16, requiring ever more fine grained value based
logic.


Just to be clear, I think we _can_ very much live with any hypothetical
inconsistencies when it comes to "minimal dtypes" for the time being;
they seem very much irrelevant!
My issue is that I would like to figure out a good final solution that
covers them (since I doubt we will get rid of them completely).
And if that means that we cannot cache them very well (because any
value _could_ behave differently) then maybe so be it.
Even if that means that `arr + python_int` is slower then `arr +
int32(python_int)` then that is maybe fine, right now for all we know
the difference, may even be small.

For such a scalar object instead what would seem necessary is to call a
`dtype.__coerce_pyvalue__(scalar, casting="safe")`, or a
`__can_coerce_pyvalue__` method/slot. It would replace the current
`PyArray_CanCastArrayTo`, which can only handle the current hardcoded
special "minimum value" rules. And of course it also means that
resolvers need to handle such scalar(?) objects, but in many cases they
do not need more than "can cast" anyway.

Best,

Sebastian


> Best Regards,
> Hameer Abbasi
> 
> > On Wednesday, Jun 12, 2019 at 9:50 PM, Sebastian Berg <
> > sebastian at sipsolutions.net> wrote:
> > On Wed, 2019-06-12 at 12:03 -0500, Sebastian Berg wrote: 
> > > On Tue, 2019-06-11 at 22:08 -0400, Marten van Kerkwijk wrote: 
> > > > HI Sebastian, 
> > > > 
> > > > Thanks for the overview! In the value-based casting, what
> > > > perhaps 
> > > > surprises me most is that it is done within a kind; it would
> > > > seem 
> > > > an 
> > > > improvement to check whether a given integer scalar is exactly 
> > > > representable in a given float (your example of 1024 in
> > > > `float16`). 
> > > > If we switch to the python-only scalar values idea, I would
> > > > suggest 
> > > > to abandon this. That might make dealing with things like
> > > > `Decimal` 
> > > > or `Fraction` easier as well. 
> > > > 
> > >  
> > > Yeah, one can argue that since we have this "safe casting" based 
> > > approach, we should go all the way for the value based logic. I
> > > think 
> > > I 
> > > tend to agree, but I am not quite sure right now to be honest. 
> >  
> > Just realized, one issue with this is that you get much more
> > "special 
> > cases" if you think of it in terms of "minimal dtype". Because 
> > suddenly, not just the unsigned/signed integers such as "< 128"
> > are 
> > special, but even more values require special handling. An int16 
> > "minimal dtype" may or may not be castable to float16. 
> > 
> > For `can_cast` that does not matter much, but if we use the same
> > logic 
> > for promotion things may get uglier. Although, maybe it just gets 
> > uglier implementation wise and is fairly logic on the user side... 
> > 
> > - Sebastian 
> > 
> > 
> > > Fractions and Decimals are very interesting in that they raise
> > > the 
> > > question what happens to user dtypes [0]. Although, you would
> > > still 
> > > need a "no lower category" rule, since you do not want 1024. or
> > > 12/3 
> > > be 
> > > demoted to an integer. 
> > > 
> > > For me right now, what is most interesting is what we should do
> > > with 
> > > ufunc calls, and if we can simplify them. I feel right now we
> > > have to 
> > > types of ufuncs: 
> > > 
> > > 1. Ufuncs which use a "common type", where we can find the
> > > minimal 
> > > type 
> > > before dispatching. 
> > > 
> > > 2. More complex ufuncs, for which finding the minimal type is 
> > > trickier 
> > > [1]. And while I could not find any weird enough ufunc, I am not
> > > sure 
> > > that blind promotion is a good idea for general ufuncs. 
> > > 
> > > Best, 
> > > 
> > > Sebastian 
> > > 
> > > 
> > > [0] A python fraction could be converted to int64/int64 or 
> > > int32/int32, 
> > > etc. depending on the value, in principle. If we want such things
> > > to 
> > > work in principle, we need machinery (although I expect one could
> > > tag 
> > > that on later). 
> > > [1] It is not impossible, but we need to insert non-existing
> > > types 
> > > into 
> > > the type hierarchy. 
> > > 
> > > 
> > > 
> > > PS: Another interesting issue is that if we try to move away
> > > from 
> > > value 
> > > based casting for numpy scalars, that initial `np.asarray(...)`
> > > call 
> > > may lose the information that a python integer was passed in. So
> > > to 
> > > support such things, we might need a whole new machinery. 
> > > 
> > > 
> > > 
> > > 
> > > > All the best, 
> > > > 
> > > > Marten 
> > > > 
> > > > On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg < 
> > > > sebastian at sipsolutions.net> wrote: 
> > > > > Hi all, 
> > > > > 
> > > > > strange, something went wrong sending that email, but in any 
> > > > > case... 
> > > > > 
> > > > > I tried to "summarize" the current behaviour of promotion
> > > > > and 
> > > > > value 
> > > > > based promotion in numpy (correcting a small error in what I 
> > > > > wrote 
> > > > > earlier). Since it got a bit long, you can find it here
> > > > > (also 
> > > > > copy 
> > > > > pasted at the end): 
> > > > > 
> > > > > https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA 
> > > > > 
> > > > > Allan's document which I link in there is also very
> > > > > interesting. 
> > > > > One 
> > > > > thing I had not really thought about before was the problem
> > > > > of 
> > > > > commutativity. 
> > > > > 
> > > > > I do not have any specific points I want to discuss based on
> > > > > it 
> > > > > (but 
> > > > > those are likely to come up again later). 
> > > > > 
> > > > > All the Best, 
> > > > > 
> > > > > Sebastian 
> > > > > 
> > > > > 
> > > > > ----------------------------- 
> > > > > 
> > > > > PS: Below a copy of what I wrote: 
> > > > > 
> > > > > --- 
> > > > > title: Numpy Value Based Promotion Rules 
> > > > > author: Sebastian Berg 
> > > > > --- 
> > > > > 
> > > > > 
> > > > > 
> > > > > NumPy Value Based Scalar Casting and Promotion 
> > > > > ============================================== 
> > > > > 
> > > > > This document reviews some of the behaviours of the
> > > > > promotion 
> > > > > rules 
> > > > > within numpy. This is especially with respect to the
> > > > > promotion of 
> > > > > scalars and 0D arrays which inspect the value to decide
> > > > > casting 
> > > > > and 
> > > > > promotion. 
> > > > > 
> > > > > Other documents discussing these things: 
> > > > > 
> > > > > * `from numpy.testing import print_coercion_tables` prints
> > > > > the 
> > > > > current promotion tables including value based promotion for 
> > > > > small 
> > > > > positive/negative scalars. 
> > > > > * Allan Haldane's thoughts on changing casting/promotion to
> > > > > be 
> > > > > more 
> > > > > C-like and discussing things such as here: 
> > > > > 
> > > > > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76 
> > > > > * Discussion around the problem of uint64 and int64 being 
> > > > > promoted to 
> > > > > float64: https://github.com/numpy/numpy/issues/12525 (lists
> > > > > many 
> > > > > related issues). 
> > > > > 
> > > > > 
> > > > > Nomenclature and Defintions 
> > > > > --------------------------- 
> > > > > 
> > > > > * **dtype/type**: The data type of an array or scalar:
> > > > > `float32`, 
> > > > > `float64`, `int8`, ? 
> > > > > 
> > > > > * **Category**: A category to which the data type belongs,
> > > > > in 
> > > > > this 
> > > > > context these are: 
> > > > > 1. boolean 
> > > > > 2. integer (unsigned and signed are not split up here, but
> > > > > are 
> > > > > different "kinds") 
> > > > > 3. floating point and complex (not split up here but are 
> > > > > different 
> > > > > "kinds") 
> > > > > 5. All others 
> > > > > 
> > > > > * **Casting**: converting from one dtype to another. There
> > > > > are 
> > > > > four 
> > > > > different rules of casting: 
> > > > > 1. *"safe"* casting: All values are representable in the new 
> > > > > data 
> > > > > type. I.e. no information is lost during the conversion. 
> > > > > 2. *"same kind"* casting: data loss may occur, but only
> > > > > within 
> > > > > the 
> > > > > same "kind". For example a float64 can be converted to
> > > > > float32 
> > > > > using 
> > > > > "same kind" rules, an int64 can be converted to int16. This
> > > > > is 
> > > > > although 
> > > > > both lose precision or even produce incorrect values. Note
> > > > > that 
> > > > > "kind" 
> > > > > is different from "category" in that it distinguishes
> > > > > between 
> > > > > signed 
> > > > > and unsigned integers. 
> > > > > 4. *"unsafe"* casting: Any conversion which can be defined, 
> > > > > e.g. 
> > > > > floating point to integer. For promotion this is fairly 
> > > > > unimportant. 
> > > > > (Some conversions such as string to integer, which not even
> > > > > work 
> > > > > fall 
> > > > > in this category, but could also be called coercions or 
> > > > > conversions.) 
> > > > > 
> > > > > * **Promotion**: The general process of finding a new dtype
> > > > > for 
> > > > > multiple input dtypes. Will be used here to also denote any
> > > > > kind 
> > > > > of 
> > > > > casting/promotion done before a specific function is called.
> > > > > This 
> > > > > can 
> > > > > be more complex, because in rare cases a functions can for 
> > > > > example 
> > > > > take 
> > > > > floating point numbers and integers as input at the same
> > > > > time 
> > > > > (i.e. 
> > > > > `np.ldexp`). 
> > > > > 
> > > > > * **Common dtype**: A dtype which can represent all input
> > > > > data. 
> > > > > In 
> > > > > general this means that all inputs can be safely cast to
> > > > > this 
> > > > > dtype. 
> > > > > Within numpy this is the normal and simplest form of
> > > > > promotion. 
> > > > > 
> > > > > * **`type1, type2 -> type3`**: Defines a promotion or
> > > > > signature. 
> > > > > For 
> > > > > example adding two integers: `np.int32(5) + np.int32(3)`
> > > > > gives 
> > > > > `np.int32(8)`. The dtype signature for that example would
> > > > > be: 
> > > > > `int32, 
> > > > > int32 -> int32`. A short form for this is also `ii->i` using
> > > > > C- 
> > > > > like 
> > > > > type codes, this can be found for example in
> > > > > `np.ldexp.types` 
> > > > > (and 
> > > > > any 
> > > > > numpy ufunc). 
> > > > > 
> > > > > * **Scalar**: A numpy or python scalar or a **0-D array**. It
> > > > > is 
> > > > > important to remember that zero dimensional arrays are
> > > > > treated 
> > > > > just 
> > > > > like scalars with respect to casting and promotion. 
> > > > > 
> > > > > 
> > > > > Current Situation in Numpy 
> > > > > -------------------------- 
> > > > > 
> > > > > The current situation can be understand mostly in terms of
> > > > > safe 
> > > > > casting 
> > > > > which is defined based on the type hierarchy and is sensitive
> > > > > to 
> > > > > values 
> > > > > for scalars. 
> > > > > 
> > > > > This safe casting based approach is in contrast for example
> > > > > to 
> > > > > promotion within C or Julia, which work based on category
> > > > > first. 
> > > > > For 
> > > > > example `int32` cannot be safely cast to `float32`, but C or 
> > > > > Julia 
> > > > > will 
> > > > > use `int32, float32 -> float32` as the common type/promotion
> > > > > rule 
> > > > > for 
> > > > > example to decide on the output dtype for addition. 
> > > > > 
> > > > > 
> > > > > ### Python Integers and Floats 
> > > > > 
> > > > > Note that python integers are handled exactly like numpy
> > > > > ones. 
> > > > > They 
> > > > > are, however, special in that they do not have a dtype
> > > > > associated 
> > > > > with 
> > > > > them explicitly. Value based logic, as described here, seems 
> > > > > useful 
> > > > > for 
> > > > > python integers and floats to allow: 
> > > > > ``` 
> > > > > arr = np.arange(10, dtype=np.int8) 
> > > > > arr += 1 
> > > > > # or: 
> > > > > res = arr + 1 
> > > > > res.dtype == np.int8 
> > > > > ``` 
> > > > > which ensures that no upcast (for example with higher memory 
> > > > > usage) 
> > > > > occurs. 
> > > > > 
> > > > > 
> > > > > ### Safe Casting 
> > > > > 
> > > > > Most safe casting is clearly defined based on whether or not
> > > > > any 
> > > > > possible value is representable in the ouput dtype. Within
> > > > > numpy 
> > > > > there 
> > > > > is currently a single exception to this rule: 
> > > > > `np.can_cast(np.int64, 
> > > > > np.float64, casting="safe")` is considered to be true
> > > > > although 
> > > > > float64 
> > > > > cannot represent some large integer values exactly. In
> > > > > contrast, 
> > > > > `np.can_cast(np.int32, np.float32, casting="safe")` is
> > > > > `False` 
> > > > > and 
> > > > > `np.float64` would have to be used if a "safe" cast is
> > > > > desired. 
> > > > > 
> > > > > This exception may be one thing that should be changed,
> > > > > however, 
> > > > > concurrently the promotion rules have to be adapted to keep
> > > > > doing 
> > > > > the 
> > > > > same thing, or a larger behaviour change decided. 
> > > > > 
> > > > > 
> > > > > #### Scalar based rules 
> > > > > 
> > > > > Unlike arrays, where inspection of all values is not
> > > > > feasable, 
> > > > > for 
> > > > > scalars (and 0-D arrays) the value is inspected. The casting 
> > > > > becomes a 
> > > > > two step process: 
> > > > > 1. The minimal dtype capable of holding the value is found. 
> > > > > 2. The normal casting rules are applied to the new dtype. 
> > > > > 
> > > > > The first step uses the following rules by finding the
> > > > > minimal 
> > > > > dtype 
> > > > > within its category: 
> > > > > 
> > > > > * Boolean: Dtype is already minimal 
> > > > > 
> > > > > * Integers: 
> > > > > Casting is possible if output can hold the value. This 
> > > > > includes 
> > > > > uint8(127) casting to an int8. 
> > > > > 
> > > > > * Floats and Complex 
> > > > > Scalars can be demoted based on value, roughly this avoids 
> > > > > overflows: 
> > > > > ``` 
> > > > > float16: -65000 < value < 65000 
> > > > > float32: -3.4e38 < value < 3.4e38 
> > > > > float64: -1.7e308 < value < 1.7e308 
> > > > > float128 (largest type, does not apply). 
> > > > > ``` 
> > > > > For complex, the logic is simply applied to both real and 
> > > > > imaginary 
> > > > > part. Complex numbers cannot be downcast to floating point. 
> > > > > 
> > > > > * Others: Dtype is not modified. 
> > > > > 
> > > > > 
> > > > > This two step process means that
> > > > > `np.can_cast(np.int16(1024), 
> > > > > np.float16)` is `False` even though float16 is capable of
> > > > > exactly 
> > > > > representing the value 1024, since value based "demotion" to
> > > > > a 
> > > > > lower 
> > > > > dtype is used only within each category. 
> > > > > 
> > > > > 
> > > > > 
> > > > > ### Common Type Promotion 
> > > > > 
> > > > > For most operations in numpy the output type is just the
> > > > > common 
> > > > > type of 
> > > > > the inputs, this holds for example for concatenation, as well
> > > > > as 
> > > > > almost 
> > > > > all math funcions (e.g. addition and multiplication have two 
> > > > > identical 
> > > > > inputs and need one ouput dtype). This operation is exposed
> > > > > as 
> > > > > `np.result_type` which includes value based logic, and 
> > > > > `np.promote_types` which only accepts dtypes as input. 
> > > > > 
> > > > > Normal type promotion without value based/scalar logic finds
> > > > > the 
> > > > > smallest type which both inputs can cast to safely. This will
> > > > > be 
> > > > > the 
> > > > > largest "kind" (bool < unsigned < integer < float < complex
> > > > > < 
> > > > > other). 
> > > > > 
> > > > > Note that type promotion is handled in a "reduce" manner
> > > > > from 
> > > > > left 
> > > > > to 
> > > > > right. In rare cases this means it is not associatetive: 
> > > > > `float32, 
> > > > > uint16, int16 -> float32`, but `float32, (uint16, int16) -> 
> > > > > float64`. 
> > > > > 
> > > > > #### Scalar based rule 
> > > > > 
> > > > > When there is a mix of scalars and arrays, numpy will
> > > > > usually 
> > > > > allow 
> > > > > the 
> > > > > scalars to be handled in the same fashion as for "safe"
> > > > > casting 
> > > > > rules. 
> > > > > 
> > > > > The rules are as follows: 
> > > > > 
> > > > > 1. Value based logic is only applied if the "category" of
> > > > > any 
> > > > > array 
> > > > > is 
> > > > > larger or equal to the category of all scalars. If this is
> > > > > not 
> > > > > the 
> > > > > case, the typical rules are used. 
> > > > > * Specifically, this means: `np.array([1, 2, 3], 
> > > > > dtype=np.uint8) + 
> > > > > np.float64(12.)` gives a `float64` result, because the 
> > > > > `np.float64(12.)` is not considered for being demoted. 
> > > > > 
> > > > > 2. Promotion is applied as normally, however, instead of the 
> > > > > original 
> > > > > dtype, the minimal dtype is used. In the case where the
> > > > > minimal 
> > > > > data 
> > > > > type is unsigned (say uint8) but the value is small enough,
> > > > > the 
> > > > > minimal 
> > > > > type may in fact be either `uint8` or `int8` (127 can be
> > > > > both). 
> > > > > This 
> > > > > promotion is also applied in pairs (reduction-like) from left
> > > > > to 
> > > > > right. 
> > > > > 
> > > > > 
> > > > > ### General Promotion during Function Execution 
> > > > > 
> > > > > General functions (read "ufuncs" such as `np.add`) may have
> > > > > a 
> > > > > specific 
> > > > > dtype signature which is (for most dtypes) stored e.g. as 
> > > > > `np.add.types`. For many of these functions the common type 
> > > > > promotion 
> > > > > is used unchanged. 
> > > > > 
> > > > > However, some functions will employ a slightly different
> > > > > method 
> > > > > (which 
> > > > > should be equivalent in most cases). They will loop through
> > > > > all 
> > > > > loops 
> > > > > listed in `np.add.types` in order and find the first one to
> > > > > which 
> > > > > all 
> > > > > inputs can be safely cast: 
> > > > > ``` 
> > > > > np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...] 
> > > > > ``` 
> > > > > Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the 
> > > > > first 
> > > > > `float16, float16 -> float16` (`'ee->e'`) loop because
> > > > > `int16` 
> > > > > cannot 
> > > > > be cast safely, and then pick the float32 (`'ff->f'`) one. 
> > > > > 
> > > > > For simple functions, which commonly have two identical
> > > > > inputs, 
> > > > > this 
> > > > > should be identical, since normally a clear order exists for
> > > > > the 
> > > > > dtypes 
> > > > > (it does require checking int8 before uint8, etc.). 
> > > > > 
> > > > > #### Scalar based rule 
> > > > > 
> > > > > When scalars are involved, the "safe" cast logic based on
> > > > > values 
> > > > > is 
> > > > > applied *if and only if* rule 1. applies as before: That is
> > > > > there 
> > > > > must 
> > > > > be an array with a higher or equal category as all of the 
> > > > > scalars. 
> > > > > 
> > > > > In the above `np.divide` example, this means that 
> > > > > `np.divide(np.int16(4), np.array([3], dtype=np.float16))`
> > > > > *will* 
> > > > > use 
> > > > > the `'ee->e'` loop, because the scalar `4` is of a lower or
> > > > > equal 
> > > > > category than the array (integer <= float or complex). While 
> > > > > checking, 
> > > > > 4 is found to be safely castable to float16, since `(u)int8`
> > > > > is 
> > > > > sufficient to hold 4 and that can be safely cast to
> > > > > `float16`. 
> > > > > However, `np.divide(np.int16(4), np.int16(3))` would use 
> > > > > `float32` 
> > > > > because both are scalars and thus value based logic is not
> > > > > used 
> > > > > (Note 
> > > > > that in reality numpy forces double output for an all
> > > > > integer 
> > > > > input 
> > > > > in 
> > > > > divide). 
> > > > > 
> > > > > In it is possible for ufuncs to have mixed type signatures
> > > > > (this 
> > > > > is 
> > > > > very rare within numy) and arbitrary inputs. In this case,
> > > > > in 
> > > > > principle, the question is whether or not a clear ordering
> > > > > exists 
> > > > > and 
> > > > > if the rule of using value based logic is always clear. This
> > > > > is 
> > > > > rather 
> > > > > academical (I could not find any such function in numpy or 
> > > > > `scipy.special` [^scipy-ufuncs]). But consider: 
> > > > > ``` 
> > > > > imaginary_ufunc.types: 
> > > > > int32, float32 -> int32, float32 
> > > > > int64, float32 -> int64, float32 
> > > > > ... 
> > > > > ``` 
> > > > > it is not clear that `np.int64(5) + np.float32(3.)` should
> > > > > be 
> > > > > able 
> > > > > to 
> > > > > demote the `5`. This is very theoretical of course 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > Footnotes 
> > > > > --------- 
> > > > > 
> > > > > [^scipy-ufuncs]: See for example these functions: 
> > > > > ```python 
> > > > > import scipy.special 
> > > > > for n, func in scipy.special.__dict__.items(): 
> > > > > if not isinstance(func, np.ufunc): 
> > > > > continue 
> > > > > 
> > > > > if func.nin == 1: 
> > > > > # a single input is not interesting 
> > > > > continue 
> > > > > 
> > > > > # check if the signature is not uniform 
> > > > > for types in func.types: 
> > > > > if len(set(types[:func.nin])) != 1: 
> > > > > break 
> > > > > else: 
> > > > > continue 
> > > > > print(func, func.types) 
> > > > > ``` 
> > > > > _______________________________________________ 
> > > > > NumPy-Discussion mailing list 
> > > > > NumPy-Discussion at python.org 
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion 
> > > >  
> > > > _______________________________________________ 
> > > > NumPy-Discussion mailing list 
> > > > NumPy-Discussion at python.org 
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion 
> > >  
> > > _______________________________________________ 
> > > NumPy-Discussion mailing list 
> > > NumPy-Discussion at python.org 
> > > https://mail.python.org/mailman/listinfo/numpy-discussion 
> > _______________________________________________ 
> > NumPy-Discussion mailing list 
> > NumPy-Discussion at python.org 
> > https://mail.python.org/mailman/listinfo/numpy-discussion 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/2a256968/attachment-0001.sig>

From m.h.vankerkwijk at gmail.com  Thu Jun 13 20:21:04 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Thu, 13 Jun 2019 20:21:04 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CABL7CQi0wjqgmjaa8Z8qVTvuPNNsXFC8Wki+r0Wacpd30gisqA@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
 <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
 <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>
 <CAJNV+9u7aqgKPh=K1BcENm9gmUY4pf4WOwpB-FVqK=_qtGMC0Q@mail.gmail.com>
 <CABL7CQiOd+fXF+F-r2gr4-GgnU_22WFF2x_ys_OTSYOwi2ob7w@mail.gmail.com>
 <CAJNV+9tnZ0PR+OvXPxWi7_j7i4xyy65TUx=56Cygcf34Lktg2w@mail.gmail.com>
 <CABL7CQi0wjqgmjaa8Z8qVTvuPNNsXFC8Wki+r0Wacpd30gisqA@mail.gmail.com>
Message-ID: <CAJNV+9v5X63-s2jWqo+n+Rk8JNYQjxeV=YGXJ_H_WeWh1wqH2g@mail.gmail.com>

Hi Ralf,

Thanks both for the reply and sharing the link. I recognize much (from both
sides!).

<snip>

>
> More importantly, I think we should not even consider *discussing*
> removing` __array_function__` from np.isposinf (or any similar one off
> situation) before there's a new bigger picture design. This is not about
> "deprecation is hard", this is about doing things with purpose rather than
> ad-hoc, as well as recognizing that lots of small changes are a major drain
> on our limited maintainer resources. About the latter issue I wrote a blog
> post recently, perhaps that clarifies the previous sentence a bit and gives
> some more insight in my perspective:
> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>
> Yes, I definitely did not mean to imply that a goal was to change just
`isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
was to clean up the whole `nanfunctions` module). Rather, to use them as
examples to see what policy there actually is or should be; and I do worry
that with __array_function__, however happy I am that it now exists
(finally, Quantity can be concatenated!), we're going the Microsoft route
of just layering on top of old code if even for the simplest cases there is
no obvious path for how to remove it.

Anyway, this discussion is probably gotten beyond the point where much is
added. I'll close the `nansum` PR.

All the best,

Marten


p.s. I would say that deprecations within numpy currently *are* hard. E.g.,
the rate at which we add `DEPRECATED` to the C code is substantially larger
than that with which we actually remove any long-deprecated behaviour.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/87bd5548/attachment.html>

From charlesr.harris at gmail.com  Thu Jun 13 21:00:22 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 13 Jun 2019 19:00:22 -0600
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9v5X63-s2jWqo+n+Rk8JNYQjxeV=YGXJ_H_WeWh1wqH2g@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
 <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
 <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>
 <CAJNV+9u7aqgKPh=K1BcENm9gmUY4pf4WOwpB-FVqK=_qtGMC0Q@mail.gmail.com>
 <CABL7CQiOd+fXF+F-r2gr4-GgnU_22WFF2x_ys_OTSYOwi2ob7w@mail.gmail.com>
 <CAJNV+9tnZ0PR+OvXPxWi7_j7i4xyy65TUx=56Cygcf34Lktg2w@mail.gmail.com>
 <CABL7CQi0wjqgmjaa8Z8qVTvuPNNsXFC8Wki+r0Wacpd30gisqA@mail.gmail.com>
 <CAJNV+9v5X63-s2jWqo+n+Rk8JNYQjxeV=YGXJ_H_WeWh1wqH2g@mail.gmail.com>
Message-ID: <CAB6mnxKL+_DeBB=FCihz7H7e4wp7+pnb3+qGD=1G7zvWZ638hQ@mail.gmail.com>

On Thu, Jun 13, 2019 at 6:21 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Ralf,
>
> Thanks both for the reply and sharing the link. I recognize much (from
> both sides!).
>
> <snip>
>
>>
>> More importantly, I think we should not even consider *discussing*
>> removing` __array_function__` from np.isposinf (or any similar one off
>> situation) before there's a new bigger picture design. This is not about
>> "deprecation is hard", this is about doing things with purpose rather than
>> ad-hoc, as well as recognizing that lots of small changes are a major drain
>> on our limited maintainer resources. About the latter issue I wrote a blog
>> post recently, perhaps that clarifies the previous sentence a bit and gives
>> some more insight in my perspective:
>> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>>
>> Yes, I definitely did not mean to imply that a goal was to change just
> `isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
> was to clean up the whole `nanfunctions` module). Rather, to use them as
> examples to see what policy there actually is or should be; and I do worry
> that with __array_function__, however happy I am that it now exists
> (finally, Quantity can be concatenated!), we're going the Microsoft route
> of just layering on top of old code if even for the simplest cases there is
> no obvious path for how to remove it.
>
> Anyway, this discussion is probably gotten beyond the point where much is
> added. I'll close the `nansum` PR.
>
> All the best,
>
> Marten
>
>
> p.s. I would say that deprecations within numpy currently *are* hard.
> E.g., the rate at which we add `DEPRECATED` to the C code is substantially
> larger than that with which we actually remove any long-deprecated
> behaviour.
>
>

That's been true for the last couple of releases, but not historically. The
main problem was lack of time compounded by a  FutureWarning that got stuck
because it broke code in unexpected ways when brought to fruition. For 1.16
I decided that expiring spent deprecations wasn't worth the churn. At this
point I see 1.18 as the release that starts the cleanup, and possibly the
start of removing the Python 2.7 compatibility code.

Speaking of 1.18, the other thing I'd like to see is discussion of a masked
array replacement. I think the basic infrastructure is now in place for
that. Hmm... should probably make a separate post eliciting ideas for 1.18.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/0f408523/attachment.html>

From charlesr.harris at gmail.com  Thu Jun 13 21:09:21 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 13 Jun 2019 19:09:21 -0600
Subject: [Numpy-discussion] Planning for 1.18
Message-ID: <CAB6mnxLnTc9rT-GsNSLuvizU5FCY3Gn30ziz82Y+-2hCV_W6vQ@mail.gmail.com>

Hi All,

With the 1.17 branch coming soon, this might be a good time to make plans
about 1.18 development. A couple of possibilities are:


   - Expiring old deprecations,
   - Removing Python 2.7 compatibility code,
   - Design of a masked array replacement.

Those proposals are not earth shaking, I see 1.18 as a chance to let the
big changes coming in 1.17 settle down. If you have more ideas, please add
them to the list.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/a07578b6/attachment.html>

From ralf.gommers at gmail.com  Fri Jun 14 03:45:51 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 14 Jun 2019 09:45:51 +0200
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CAJNV+9v5X63-s2jWqo+n+Rk8JNYQjxeV=YGXJ_H_WeWh1wqH2g@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
 <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
 <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>
 <CAJNV+9u7aqgKPh=K1BcENm9gmUY4pf4WOwpB-FVqK=_qtGMC0Q@mail.gmail.com>
 <CABL7CQiOd+fXF+F-r2gr4-GgnU_22WFF2x_ys_OTSYOwi2ob7w@mail.gmail.com>
 <CAJNV+9tnZ0PR+OvXPxWi7_j7i4xyy65TUx=56Cygcf34Lktg2w@mail.gmail.com>
 <CABL7CQi0wjqgmjaa8Z8qVTvuPNNsXFC8Wki+r0Wacpd30gisqA@mail.gmail.com>
 <CAJNV+9v5X63-s2jWqo+n+Rk8JNYQjxeV=YGXJ_H_WeWh1wqH2g@mail.gmail.com>
Message-ID: <CABL7CQhpr+mT7n0F64H0X2ZgzzhAMtJSzZw=pNpxz2TfOpKnCw@mail.gmail.com>

On Fri, Jun 14, 2019 at 2:21 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Ralf,
>
> Thanks both for the reply and sharing the link. I recognize much (from
> both sides!).
>
> <snip>
>
>>
>> More importantly, I think we should not even consider *discussing*
>> removing` __array_function__` from np.isposinf (or any similar one off
>> situation) before there's a new bigger picture design. This is not about
>> "deprecation is hard", this is about doing things with purpose rather than
>> ad-hoc, as well as recognizing that lots of small changes are a major drain
>> on our limited maintainer resources. About the latter issue I wrote a blog
>> post recently, perhaps that clarifies the previous sentence a bit and gives
>> some more insight in my perspective:
>> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>>
>> Yes, I definitely did not mean to imply that a goal was to change just
> `isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
> was to clean up the whole `nanfunctions` module). Rather, to use them as
> examples to see what policy there actually is or should be; and I do worry
> that with __array_function__, however happy I am that it now exists
> (finally, Quantity can be concatenated!), we're going the Microsoft route
> of just layering on top of old code if even for the simplest cases there is
> no obvious path for how to remove it.
>

I share that worry to some extent (an ever-growing collection of
protocols). To be fair, we knew that __array_function__ wasn't perfect, but
I think Stephan did a really good job making the case for why it was
necessary, and that we needed to get something in place in 6-12 months
rather than spending years on a more polished/comprehensive design. Given
those constraints, we seem to have made a good choice that has largely
achieved its goals.

I'm still not sure you got my main objection. So I'll try once more.
We now have a design, imperfect but it exists and is documented (to some
extent). Let's call it design A. Now you're wanting clarity on a policy on
how to move from design A to design B. However, what B even is isn't
spelled out, although we can derive rough outlines from mailing list
threads like these (it involves better handling of subclasses, allowing
reuse of implementation of numpy functions in terms of other numpy
functions, etc.). The right way forward is:

1. describe what design B is
2. decide if that design is a good idea
3. then worry about implementation and a migration policy

Something that's specific to all nan-functions is still way too specific,
and doesn't justify skipping 1-2 and jumping straight to 3.

I don't know how to express it any clearer than the above in email format.
If it doesn't make sense to you, it'd be great to have a call to discuss in
person.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190614/1ee25178/attachment-0001.html>

From sebastian at sipsolutions.net  Fri Jun 14 11:51:21 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 14 Jun 2019 10:51:21 -0500
Subject: [Numpy-discussion] Planning for 1.18
In-Reply-To: <CAB6mnxLnTc9rT-GsNSLuvizU5FCY3Gn30ziz82Y+-2hCV_W6vQ@mail.gmail.com>
References: <CAB6mnxLnTc9rT-GsNSLuvizU5FCY3Gn30ziz82Y+-2hCV_W6vQ@mail.gmail.com>
Message-ID: <5d2ebb94defb83da394b7d6ed9bae7eb795e95ad.camel@sipsolutions.net>

On Thu, 2019-06-13 at 19:09 -0600, Charles R Harris wrote:
> Hi All,
> 
> With the 1.17 branch coming soon, this might be a good time to make
> plans about 1.18 development. A couple of possibilities are:
> 
> Expiring old deprecations,

Good plan.

> Removing Python 2.7 compatibility code,

Sounds good, are we confident enough that Backports should become few
enough, I guess?

> Design of a masked array replacement.

It is a great time for that if anyone wants to pick that up!

> Those proposals are not earth shaking, I see 1.18 as a chance to let
> the big changes coming in 1.17 settle down. If you have more ideas,
> please add them to the list.

The dtypes will move forward. Maybe we have to see how 1.17 goes. I
expect that most users should not notice that much, but it will mean
touching quite a lot of code. I am slightly worried that it collides
with the "settle down" of 1.17 changes.

I may revive oindex/vindex, but that is nothing specific to 1.18 and
should be a safe addition with respect to compatibility/breaking
things.

- Sebastian


> 
> Chuck
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190614/bbfe3c35/attachment.sig>

From m.h.vankerkwijk at gmail.com  Fri Jun 14 22:10:53 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Fri, 14 Jun 2019 22:10:53 -0400
Subject: [Numpy-discussion] Extent to which to work around matrix and
 other duck/subclass limitations
In-Reply-To: <CABL7CQhpr+mT7n0F64H0X2ZgzzhAMtJSzZw=pNpxz2TfOpKnCw@mail.gmail.com>
References: <CAJNV+9vFCfKownjOKwk7Hsx2ayCSLbjLhy_Ybdv7OVtMo0x3AQ@mail.gmail.com>
 <CABL7CQjgpNAaPgBJU5BSOLzzE3KwEvP1cAEQFDaK4BNZ__PsAg@mail.gmail.com>
 <6f28a539c09eec7543a618ac9a6deabb93edb99c.camel@sipsolutions.net>
 <CAJNV+9vCxnxtBCck1eJ9BaZDmFi-N_6Gy__hNQ4QUvtO2LLn5g@mail.gmail.com>
 <20190611220138.ldkqy4ik3igsb6ik@carbo>
 <CABL7CQiUTSS2TmVwX5n9VyOYVyJdtjcGsf66RkO+=GackG4RhQ@mail.gmail.com>
 <CAJNV+9vy-whOaYv2=po96i_yP1As53tWh84DHvPhtnQ510qPDg@mail.gmail.com>
 <CAEQ_TveSs-uyYUWQLwp-0i+MiTaE0iHRRHSnZ_Jm_vzCf25wYA@mail.gmail.com>
 <CABL7CQh7DhnR19_jXOFwtx=3c+qqJb5TRsyeA61gQPNH_iNsBg@mail.gmail.com>
 <CAJNV+9v6C-r2VcgKMybApwPd5ANXu-td2zmDwcWBZXTaTWJ=pw@mail.gmail.com>
 <CABL7CQgQW2zJRpkMjZS3XYZvzh6gTFDEYux0vp0-SrHRR_dPGw@mail.gmail.com>
 <CAJNV+9uUjhc2paTAVbPksMfWmyvCBO38eb2xbJfPO9Z38WYcoA@mail.gmail.com>
 <CAEQ_TveYmjkHKzL-+_FZNgE+3uhSB73rU+BM2pZ_VHgw1i5k0w@mail.gmail.com>
 <CAJNV+9u7aqgKPh=K1BcENm9gmUY4pf4WOwpB-FVqK=_qtGMC0Q@mail.gmail.com>
 <CABL7CQiOd+fXF+F-r2gr4-GgnU_22WFF2x_ys_OTSYOwi2ob7w@mail.gmail.com>
 <CAJNV+9tnZ0PR+OvXPxWi7_j7i4xyy65TUx=56Cygcf34Lktg2w@mail.gmail.com>
 <CABL7CQi0wjqgmjaa8Z8qVTvuPNNsXFC8Wki+r0Wacpd30gisqA@mail.gmail.com>
 <CAJNV+9v5X63-s2jWqo+n+Rk8JNYQjxeV=YGXJ_H_WeWh1wqH2g@mail.gmail.com>
 <CABL7CQhpr+mT7n0F64H0X2ZgzzhAMtJSzZw=pNpxz2TfOpKnCw@mail.gmail.com>
Message-ID: <CAJNV+9sjkpK-uAWVu2CO6TfmsOPssCkOeomhAK_KOHpATLysXg@mail.gmail.com>

Hi Ralf,

Thanks for the clarification. I think in your terms the bottom line was
that I thought we had a design B for the case where a function was really
"just a ufunc". But the nanfunctions show that even if logically they are a
ufunc (which admittedly uses another ufunc or two for `where`), it is
tricky, certainly trickier than I thought. And this discussion has served
to clarify that even for other "simpler" functions it can get similarly
tricky.

Anyway, bottom line is that I think you are right in actually needed a more
proper discussion/design about how to move forward.

All the best,

Marten

p.s. And, yes, `__array_function__` is quite wonderful!

On Fri, Jun 14, 2019 at 3:46 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Fri, Jun 14, 2019 at 2:21 AM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>> Hi Ralf,
>>
>> Thanks both for the reply and sharing the link. I recognize much (from
>> both sides!).
>>
>> <snip>
>>
>>>
>>> More importantly, I think we should not even consider *discussing*
>>> removing` __array_function__` from np.isposinf (or any similar one off
>>> situation) before there's a new bigger picture design. This is not about
>>> "deprecation is hard", this is about doing things with purpose rather than
>>> ad-hoc, as well as recognizing that lots of small changes are a major drain
>>> on our limited maintainer resources. About the latter issue I wrote a blog
>>> post recently, perhaps that clarifies the previous sentence a bit and gives
>>> some more insight in my perspective:
>>> https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
>>>
>>> Yes, I definitely did not mean to imply that a goal was to change just
>> `isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread
>> was to clean up the whole `nanfunctions` module). Rather, to use them as
>> examples to see what policy there actually is or should be; and I do worry
>> that with __array_function__, however happy I am that it now exists
>> (finally, Quantity can be concatenated!), we're going the Microsoft route
>> of just layering on top of old code if even for the simplest cases there is
>> no obvious path for how to remove it.
>>
>
> I share that worry to some extent (an ever-growing collection of
> protocols). To be fair, we knew that __array_function__ wasn't perfect, but
> I think Stephan did a really good job making the case for why it was
> necessary, and that we needed to get something in place in 6-12 months
> rather than spending years on a more polished/comprehensive design. Given
> those constraints, we seem to have made a good choice that has largely
> achieved its goals.
>
> I'm still not sure you got my main objection. So I'll try once more.
> We now have a design, imperfect but it exists and is documented (to some
> extent). Let's call it design A. Now you're wanting clarity on a policy on
> how to move from design A to design B. However, what B even is isn't
> spelled out, although we can derive rough outlines from mailing list
> threads like these (it involves better handling of subclasses, allowing
> reuse of implementation of numpy functions in terms of other numpy
> functions, etc.). The right way forward is:
>
> 1. describe what design B is
> 2. decide if that design is a good idea
> 3. then worry about implementation and a migration policy
>
> Something that's specific to all nan-functions is still way too specific,
> and doesn't justify skipping 1-2 and jumping straight to 3.
>
> I don't know how to express it any clearer than the above in email format.
> If it doesn't make sense to you, it'd be great to have a call to discuss in
> person.
>
> Cheers,
> Ralf
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190614/06b98872/attachment.html>

From matti.picus at gmail.com  Sat Jun 15 13:58:13 2019
From: matti.picus at gmail.com (Matti Picus)
Date: Sat, 15 Jun 2019 20:58:13 +0300
Subject: [Numpy-discussion] (Value Based Promotion) Current Behaviour
In-Reply-To: <e97b4dc6904cd1c6d1d5e2e5ee839206a3e2d12d.camel@sipsolutions.net>
References: <f330a2656aab221b9b96afeb24908b6a1ce1a246.camel@sipsolutions.net>
 <296297cac0ffc34034512499b1b628240356034a.camel@sipsolutions.net>
 <CAJNV+9vkJtnPZ83KbXhExHd-93vk3+H-aAa38pqPtHqcNbdHzw@mail.gmail.com>
 <b1bb5d74e179c891cefff04bdd18f2b48dafc4ba.camel@sipsolutions.net>
 <4ac1b140e57cb0c6a49cbef7976e74fb3f55d76a.camel@sipsolutions.net>
 <1e039ecb-10bb-4a30-a4d6-eceb37cce76b@Canary>
 <e97b4dc6904cd1c6d1d5e2e5ee839206a3e2d12d.camel@sipsolutions.net>
Message-ID: <e6e18f85-7a8d-092a-fa2c-877d37ea9397@gmail.com>

On 14/6/19 1:37 am, Sebastian Berg wrote:
> For such a scalar object instead what would seem necessary is to call a
> `dtype.__coerce_pyvalue__(scalar, casting="safe")`, or a
> `__can_coerce_pyvalue__` method/slot. It would replace the current
> `PyArray_CanCastArrayTo`, which can only handle the current hardcoded
> special "minimum value" rules.


This makes sense to me since it makes the problem explicit, rather than 
trying to generalize for some properties.

I would suggest changing the first argument from "scalar" to "obj" to 
indicate it is not necessarily a np.scalar but could be any non-ndarray 
object, although I would exclude sequences.


Matti


From allanhaldane at gmail.com  Mon Jun 17 18:43:05 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Mon, 17 Jun 2019 18:43:05 -0400
Subject: [Numpy-discussion] new MaskedArray class
Message-ID: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>

Hi all,

Chuck suggested we think about a MaskedArray replacement for 1.18.

A few months ago I did some work on a MaskedArray replacement using
`__array_function__`, which I got mostly working. It seems like a good
time to bring it up for discussion now. See it at:

https://github.com/ahaldane/ndarray_ducktypes

It should be very usable, it has docs you can read, and it passes a
pytest-suite with 1024 tests adapted from numpy's MaskedArray tests.
What is missing? It needs even more tests for new functionality, and a
couple numpy-API functions are missing, in particular `np.median`,
`np.percentile`, `np.cov`, and `np.corrcoef`. I'm sure other devs could
also find many things to improve too.

Besides fixing many little annoyances from MaskedArray, and simplifying
the logic by always storing the mask in full, it also has new features.
For instance it allows the use of a "X" variable to mark masked
locations during array construction, and I solve the issue of how to
mask individual fields of a structured array differently.

At this point I would by happy to get some feedback on the design and
what seems good or bad. If it seems like a good start, I'd be happy to
move it into a numpy repo of some sort for further collaboration &
discussion, and maybe into 1.18. At the least I hope it can serve as a
design study of what we could do.


Let me also drop here two more interesting detailed issues:

First, the issue of what to do about .real and .imag of complex arrays,
and similarly about field-assignment of structured arrays. The problem
is that we have a single mask bool per element of a complex array, but
what if someone does `arr.imag = MaskedArray([1,X,1])`? How should the
mask of the original array change? Should we make .real and .imag readonly?

Second, a more general issue of how to ducktype scalars when using
`__array_function__` which I think many ducktype implementors will have
to face. For MaskedArray, I created an associated "MaskedScalar" type.
However, MaskedScalar has to behave differently from normal numpy
scalars in a number of ways: It is not part of the numpy scalar
hierarchy, it fails checks `isinstance(var, np.floating)`, and
np.isscalar returns false. Numpy scalar types cannot be subclassed. We
have discussed before the need to have distinction between 0d-arrays and
scalars, so we shouldn't just use a 0d (in fact, this makes printing
very difficult). This leads me to think that in future dtype-overhaul
plans, we should consider creating a subclassable `np.scalar` base type
to wrap all numpy scalar variables, and code like `isinstance(var,
np.floating)` should be replaced by `isinstance(var.dtype.type,
np.floating)` or similar. That is, the numeric dtype of the scalar is no
longer encoded in `type(var)` but in `var.dtype`: The fact that the
variable is a numpy scalar is decoupled from its numeric dtype.

This is useful because there are many "associated" properties of scalars
in common with arrays which have nothing to do with the dtype, which
ducktype implementors want to touch. I imagine this will come up a lot:
In that repo I also have an "ArrayCollection" ducktype which required a
"CollectionScalar" scalar, and similarly I imagine people implementing
units want the units attached to the scalar, independently of the dtype.

Cheers,
Allan


From sebastian at sipsolutions.net  Mon Jun 17 21:30:04 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 17 Jun 2019 18:30:04 -0700
Subject: [Numpy-discussion] NumPy Community Meeting June 19
Message-ID: <24b20c26f38e05d88030fa886939b4f8d6564670.camel@sipsolutions.net>

Hi all,

There will be a NumPy Community meeting on June 12 at 11 am Pacific
Time. Everyone is invited to join in and edit the work-in-progress
meeting notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190617/28f35750/attachment.sig>

From einstein.edison at gmail.com  Mon Jun 17 22:28:54 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Tue, 18 Jun 2019 04:28:54 +0200
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <bf3844ac47c2e0591f2e04bcb447e59b9899eac7.camel@sipsolutions.net>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <bf3844ac47c2e0591f2e04bcb447e59b9899eac7.camel@sipsolutions.net>
Message-ID: <ec3f16e3bb558ac6f852b3baa5a8ad3e4cc125d5.camel@gmail.com>

On Wed, 2019-06-12 at 12:55 -0500, Sebastian Berg wrote:
> On Wed, 2019-06-05 at 15:41 -0500, Sebastian Berg wrote:
> > Hi all,
> > 
> > TL;DR:
> > 
> > Value based promotion seems complex both for users and ufunc-
> > dispatching/promotion logic. Is there any way we can move forward
> > here,
> > and if we do, could we just risk some possible (maybe not-existing)
> > corner cases to break early to get on the way?
> > 
> 
> Hi all,
> 
> just to note. I think I will go forward trying to fill the hole in
> the
> hierarchy with a non-existing uint7 dtype. That seemed like it may be
> ugly, but if it does not escalate too much, it is probably fairly
> straight forward. And it would allow to simplify dispatching without
> any logic change at all. After that we could still decide to change
> the
> logic.

Hi Sebastian!

This seems like the right approach to me as well, I would just add one
additional comment. Earlier on, you mentioned that a lot of "strange"
dtypes will pop up when dealing with floats/ints. E.g. int15, int31,
int63, int52 (for checking double-compat), int23 (single compat), int10
(half compat) and so on and so forth. The lookup table would get tricky
to populate by hand --- It might be worth it to use the logic I
suggested to autogenerate it in some way, or to "determine" the
temporary underspecified type, as Nathaniel proposed in his email to
the list. That is, we store the number of:

* flag (0 for numeric, 1 for non-numeric)
* sign bits (0 for unsigned ints, 1 else)
* integer/fraction bits (self-explanatory)
* exponent bits (self-explanatory)
* Log-Number of items (0 for real, 1 for complex, 2 for quarternion,
etc.) (I propose log because the Cayley-Dickson algebras [1] require a
power of two)

A type is safely castable to another if all of these numbers are
exceeded or met.

This would give us a clean way for registering new numeric types, while
also cleanly hooking into the type system, and solving the casting
scenario. Of course, I'm not proposing we generate the loops for or
provide all these types ourselves, but simply that we allow people to
define dtypes using such a schema. I do worry that we're special-casing 
numbers here, but it is "Num"Py, so I'm also not too worried.

This flexibility would, for example, allow us to easily define a
bfloat16/bcomplex32 type with all the "can_cast" logic in place, even
if people have to register their own casts or loops (and just to be
clear, we error if they are not). It also makes it easy to define loops
for int128 and so on if they come along.

The only open question left here is: What to do with a case like int64
+ uint64. And what I propose is we abandon purity for pragmatism here
and tell ourselves that losing one sign bit is tolerable 90% of the
time, and going to floating-point is probably worse. It's more of a
range-versus-accuracy question, and I would argue that people using
integers expect exactness. Of course, I doubt anyone is actually
relying on the fact that adding two integers produces floating-point
results, and it has been the cause of at least one bug, which
highlights that integers can be used in places where floats cannot. [0]

Hameer Abbasi

[0] https://github.com/numpy/numpy/issues/9982
[1] https://en.wikipedia.org/wiki/Cayley%E2%80%93Dickson_construction

> 
> Best,
> 
> Sebastian
> 
> 
> > -----------
> > 
> > Currently when you write code such as:
> > 
> > arr = np.array([1, 43, 23], dtype=np.uint16)
> > res = arr + 1
> > 
> > Numpy uses fairly sophisticated logic to decide that `1` can be
> > represented as a uint16, and thus for all unary functions (and most
> > others as well), the output will have a `res.dtype` of uint16.
> > 
> > Similar logic also exists for floating point types, where a lower
> > precision floating point can be used:
> > 
> > arr = np.array([1, 43, 23], dtype=np.float32)
> > (arr + np.float64(2.)).dtype  # will be float32
> > 
> > Currently, this value based logic is enforced by checking whether
> > the
> > cast is possible: "4" can be cast to int8, uint8. So the first call
> > above will at some point check if "uint16 + uint16 -> uint16" is a
> > valid operation, find that it is, and thus stop searching. (There
> > is
> > the additional logic, that when both/all operands are scalars, it
> > is
> > not applied).
> > 
> > Note that while it is defined in terms of casting "1" to uint8
> > safely
> > being possible even though 1 may be typed as int64. This logic thus
> > affects all promotion rules as well (i.e. what should the output
> > dtype
> > be).
> > 
> > 
> > There 2 main discussion points/issues about it:
> > 
> > 1. Should value based casting/promotion logic exist at all?
> > 
> > Arguably an `np.int32(3)` has type information attached to it, so
> > why
> > should we ignore it. It can also be tricky for users, because a
> > small
> > change in values can change the result data type.
> > Because 0-D arrays and scalars are too close inside numpy (you will
> > often not know which one you get). There is not much option but to
> > handle them identically. However, it seems pretty odd that:
> >  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
> >  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> > 
> > give a different result.
> > 
> > This is a bit different for python scalars, which do not have a
> > type
> > attached already.
> > 
> > 
> > 2. Promotion and type resolution in Ufuncs:
> > 
> > What is currently bothering me is that the decision what the output
> > dtypes should be currently depends on the values in complicated
> > ways.
> > It would be nice if we can decide which type signature to use
> > without
> > actually looking at values (or at least only very early on).
> > 
> > One reason here is caching and simplicity. I would like to be able
> > to
> > cache which loop should be used for what input. Having value based
> > casting in there bloats up the problem.
> > Of course it currently works OK, but especially when user dtypes
> > come
> > into play, caching would seem like a nice optimization option.
> > 
> > Because `uint8(127)` can also be a `int8`, but uint8(128) it is not
> > as
> > simple as finding the "minimal" dtype once and working with that." 
> > Of course Eric and I discussed this a bit before, and you could
> > create
> > an internal "uint7" dtype which has the only purpose of flagging
> > that
> > a
> > cast to int8 is safe.
> > 
> > I suppose it is possible I am barking up the wrong tree here, and
> > this
> > caching/predictability is not vital (or can be solved with such an
> > internal dtype easily, although I am not sure it seems elegant).
> > 
> > 
> > Possible options to move forward
> > --------------------------------
> > 
> > I have to still see a bit how trick things are. But there are a few
> > possible options. I would like to move the scalar logic to the
> > beginning of ufunc calls:
> >   * The uint7 idea would be one solution
> >   * Simply implement something that works for numpy and all except
> >     strange external ufuncs (I can only think of numba as a
> > plausible
> >     candidate for creating such).
> > 
> > My current plan is to see where the second thing leaves me.
> > 
> > We also should see if we cannot move the whole thing forward, in
> > which
> > case the main decision would have to be forward to where. My
> > opinion
> > is
> > currently that when a type has a dtype associated with it clearly,
> > we
> > should always use that dtype in the future. This mostly means that
> > numpy dtypes such as `np.int64` will always be treated like an
> > int64,
> > and never like a `uint8` because they happen to be castable to
> > that.
> > 
> > For values without a dtype attached (read python integers, floats),
> > I
> > see three options, from more complex to simpler:
> > 
> > 1. Keep the current logic in place as much as possible
> > 2. Only support value based promotion for operators, e.g.:
> >    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
> >    The upside is that it limits the complexity to a much simpler
> >    problem, the downside is that the ufunc call and operator match
> >    less clearly.
> > 3. Just associate python float with float64 and python integers
> > with
> >    long/int64 and force users to always type them explicitly if
> > they
> >    need to.
> > 
> > The downside of 1. is that it doesn't help with simplifying the
> > current
> > situation all that much, because we still have the special casting
> > around...
> > 
> > 
> > I have realized that this got much too long, so I hope it makes
> > sense.
> > I will continue to dabble along on these things a bit, so if
> > nothing
> > else maybe writing it helps me to get a bit clearer on things...
> > 
> > Best,
> > 
> > Sebastian
> > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


From einstein.edison at gmail.com  Mon Jun 17 22:32:59 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Tue, 18 Jun 2019 04:32:59 +0200
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <ec3f16e3bb558ac6f852b3baa5a8ad3e4cc125d5.camel@gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <bf3844ac47c2e0591f2e04bcb447e59b9899eac7.camel@sipsolutions.net>
 <ec3f16e3bb558ac6f852b3baa5a8ad3e4cc125d5.camel@gmail.com>
Message-ID: <181ca56bb5120050b6ac81e34a46424eef124078.camel@gmail.com>

On Tue, 2019-06-18 at 04:28 +0200, Hameer Abbasi wrote:
> On Wed, 2019-06-12 at 12:55 -0500, Sebastian Berg wrote:
> > On Wed, 2019-06-05 at 15:41 -0500, Sebastian Berg wrote:
> > > Hi all,
> > > 
> > > TL;DR:
> > > 
> > > Value based promotion seems complex both for users and ufunc-
> > > dispatching/promotion logic. Is there any way we can move forward
> > > here,
> > > and if we do, could we just risk some possible (maybe not-
> > > existing)
> > > corner cases to break early to get on the way?
> > > 
> > 
> > Hi all,
> > 
> > just to note. I think I will go forward trying to fill the hole in
> > the
> > hierarchy with a non-existing uint7 dtype. That seemed like it may
> > be
> > ugly, but if it does not escalate too much, it is probably fairly
> > straight forward. And it would allow to simplify dispatching
> > without
> > any logic change at all. After that we could still decide to change
> > the
> > logic.
> 
> Hi Sebastian!
> 
> This seems like the right approach to me as well, I would just add
> one
> additional comment. Earlier on, you mentioned that a lot of "strange"
> dtypes will pop up when dealing with floats/ints. E.g. int15, int31,
> int63, int52 (for checking double-compat), int23 (single compat),
> int10
> (half compat) and so on and so forth. The lookup table would get
> tricky
> to populate by hand --- It might be worth it to use the logic I
> suggested to autogenerate it in some way, or to "determine" the
> temporary underspecified type, as Nathaniel proposed in his email to
> the list. That is, we store the number of:
> 
> * flag (0 for numeric, 1 for non-numeric)
> * sign bits (0 for unsigned ints, 1 else)
> * integer/fraction bits (self-explanatory)
> * exponent bits (self-explanatory)
> * Log-Number of items (0 for real, 1 for complex, 2 for quarternion,
> etc.) (I propose log because the Cayley-Dickson algebras [1] require
> a
> power of two)
> 
> A type is safely castable to another if all of these numbers are
> exceeded or met.
> 
> This would give us a clean way for registering new numeric types,
> while
> also cleanly hooking into the type system, and solving the casting
> scenario. Of course, I'm not proposing we generate the loops for or
> provide all these types ourselves, but simply that we allow people to
> define dtypes using such a schema. I do worry that we're special-
> casing 
> numbers here, but it is "Num"Py, so I'm also not too worried.
> 
> This flexibility would, for example, allow us to easily define a
> bfloat16/bcomplex32 type with all the "can_cast" logic in place, even
> if people have to register their own casts or loops (and just to be
> clear, we error if they are not). It also makes it easy to define
> loops
> for int128 and so on if they come along.
> 
> The only open question left here is: What to do with a case like
> int64
> + uint64. And what I propose is we abandon purity for pragmatism here
> and tell ourselves that losing one sign bit is tolerable 90% of the
> time, and going to floating-point is probably worse. It's more of a
> range-versus-accuracy question, and I would argue that people using
> integers expect exactness. Of course, I doubt anyone is actually
> relying on the fact that adding two integers produces floating-point
> results, and it has been the cause of at least one bug, which
> highlights that integers can be used in places where floats cannot.
> [0]

P.S. Someone collected a list of issues where the automatic float-
conversion breaks things, it's old but it does highlight the importance
of the issue: [0]

https://github.com/numpy/numpy/issues/12525#issuecomment-457727726

Hameer Abbasi

> 
> Hameer Abbasi
> 
> [0] https://github.com/numpy/numpy/issues/9982
> [1] https://en.wikipedia.org/wiki/Cayley%E2%80%93Dickson_construction
> 
> > Best,
> > 
> > Sebastian
> > 
> > 
> > > -----------
> > > 
> > > Currently when you write code such as:
> > > 
> > > arr = np.array([1, 43, 23], dtype=np.uint16)
> > > res = arr + 1
> > > 
> > > Numpy uses fairly sophisticated logic to decide that `1` can be
> > > represented as a uint16, and thus for all unary functions (and
> > > most
> > > others as well), the output will have a `res.dtype` of uint16.
> > > 
> > > Similar logic also exists for floating point types, where a lower
> > > precision floating point can be used:
> > > 
> > > arr = np.array([1, 43, 23], dtype=np.float32)
> > > (arr + np.float64(2.)).dtype  # will be float32
> > > 
> > > Currently, this value based logic is enforced by checking whether
> > > the
> > > cast is possible: "4" can be cast to int8, uint8. So the first
> > > call
> > > above will at some point check if "uint16 + uint16 -> uint16" is
> > > a
> > > valid operation, find that it is, and thus stop searching. (There
> > > is
> > > the additional logic, that when both/all operands are scalars, it
> > > is
> > > not applied).
> > > 
> > > Note that while it is defined in terms of casting "1" to uint8
> > > safely
> > > being possible even though 1 may be typed as int64. This logic
> > > thus
> > > affects all promotion rules as well (i.e. what should the output
> > > dtype
> > > be).
> > > 
> > > 
> > > There 2 main discussion points/issues about it:
> > > 
> > > 1. Should value based casting/promotion logic exist at all?
> > > 
> > > Arguably an `np.int32(3)` has type information attached to it, so
> > > why
> > > should we ignore it. It can also be tricky for users, because a
> > > small
> > > change in values can change the result data type.
> > > Because 0-D arrays and scalars are too close inside numpy (you
> > > will
> > > often not know which one you get). There is not much option but
> > > to
> > > handle them identically. However, it seems pretty odd that:
> > >  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
> > >  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> > > 
> > > give a different result.
> > > 
> > > This is a bit different for python scalars, which do not have a
> > > type
> > > attached already.
> > > 
> > > 
> > > 2. Promotion and type resolution in Ufuncs:
> > > 
> > > What is currently bothering me is that the decision what the
> > > output
> > > dtypes should be currently depends on the values in complicated
> > > ways.
> > > It would be nice if we can decide which type signature to use
> > > without
> > > actually looking at values (or at least only very early on).
> > > 
> > > One reason here is caching and simplicity. I would like to be
> > > able
> > > to
> > > cache which loop should be used for what input. Having value
> > > based
> > > casting in there bloats up the problem.
> > > Of course it currently works OK, but especially when user dtypes
> > > come
> > > into play, caching would seem like a nice optimization option.
> > > 
> > > Because `uint8(127)` can also be a `int8`, but uint8(128) it is
> > > not
> > > as
> > > simple as finding the "minimal" dtype once and working with
> > > that." 
> > > Of course Eric and I discussed this a bit before, and you could
> > > create
> > > an internal "uint7" dtype which has the only purpose of flagging
> > > that
> > > a
> > > cast to int8 is safe.
> > > 
> > > I suppose it is possible I am barking up the wrong tree here, and
> > > this
> > > caching/predictability is not vital (or can be solved with such
> > > an
> > > internal dtype easily, although I am not sure it seems elegant).
> > > 
> > > 
> > > Possible options to move forward
> > > --------------------------------
> > > 
> > > I have to still see a bit how trick things are. But there are a
> > > few
> > > possible options. I would like to move the scalar logic to the
> > > beginning of ufunc calls:
> > >   * The uint7 idea would be one solution
> > >   * Simply implement something that works for numpy and all
> > > except
> > >     strange external ufuncs (I can only think of numba as a
> > > plausible
> > >     candidate for creating such).
> > > 
> > > My current plan is to see where the second thing leaves me.
> > > 
> > > We also should see if we cannot move the whole thing forward, in
> > > which
> > > case the main decision would have to be forward to where. My
> > > opinion
> > > is
> > > currently that when a type has a dtype associated with it
> > > clearly,
> > > we
> > > should always use that dtype in the future. This mostly means
> > > that
> > > numpy dtypes such as `np.int64` will always be treated like an
> > > int64,
> > > and never like a `uint8` because they happen to be castable to
> > > that.
> > > 
> > > For values without a dtype attached (read python integers,
> > > floats),
> > > I
> > > see three options, from more complex to simpler:
> > > 
> > > 1. Keep the current logic in place as much as possible
> > > 2. Only support value based promotion for operators, e.g.:
> > >    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
> > >    The upside is that it limits the complexity to a much simpler
> > >    problem, the downside is that the ufunc call and operator
> > > match
> > >    less clearly.
> > > 3. Just associate python float with float64 and python integers
> > > with
> > >    long/int64 and force users to always type them explicitly if
> > > they
> > >    need to.
> > > 
> > > The downside of 1. is that it doesn't help with simplifying the
> > > current
> > > situation all that much, because we still have the special
> > > casting
> > > around...
> > > 
> > > 
> > > I have realized that this got much too long, so I hope it makes
> > > sense.
> > > I will continue to dabble along on these things a bit, so if
> > > nothing
> > > else maybe writing it helps me to get a bit clearer on things...
> > > 
> > > Best,
> > > 
> > > Sebastian
> > > 
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion


From m.h.vankerkwijk at gmail.com  Tue Jun 18 10:06:17 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 18 Jun 2019 10:06:17 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
Message-ID: <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>

Hi Allen,

Thanks for the message and link! In astropy, we've been struggling with
masking a lot, and one of the main conclusions I have reached is that
ideally one has a more abstract `Masked` class that can take any type of
data (including `ndarray`, of course), and behaves like that data as much
as possible, to the extent that if, e.g., I create a `Masked(Quantity(...,
unit), mask=...)`, the instance will have a `.unit` attribute and perhaps
even `isinstance(..., Quantity)` will hold. And similarly for
`Masked(Time(...), mask=...)`, `Masked(SkyCoord(...), mask=...)`, etc. In a
way, `Masked` would be a kind of Mixin-class that just tracks a mask
attribute.

This may be too much to ask from the initializer, but, if so, it still
seems most useful if it is made as easy as possible to do, say, `class
MaskedQuantity(Masked, Quantity): <very few overrides>`.

Even if this impossible, I think it is conceptually useful to think about
what the masking class should do. My sense is that, e.g., it should not
attempt to decide when an operation succeeds or not, but just "or together"
input masks for regular, multiple-input functions, and let the underlying
arrays skip elements for reductions by using `where` (hey, I did implement
that for a reason... ;-). In particular, it suggests one should not have
things like domains and all that (I never understood why `MaskedArray` did
that). If one wants more, the class should provide a method that updates
the mask (a sensible default might be `mask |= ~np.isfinite(result)` -
here, the class being masked should logically support ufuncs and functions,
so it can decide what "isfinite" means).

In any case, I would think that a basic truth should be that everything has
a mask with a shape consistent with the data, so
1. Each complex numbers has just one mask, and setting `a.imag` with a
masked array should definitely propagate the mask.
2. For a masked array with structured dtype, I'd similarly say that the
default is for a mask to have the same shape as the array. But that
something like your collection makes sense for the case where one wants to
mask items in a structure.

All the best,

Marten

p.s. I started trying to implement the above "Mixin" class; will try to
clean that up a bit so that at least it uses `where` and push it up.


On Mon, Jun 17, 2019 at 6:43 PM Allan Haldane <allanhaldane at gmail.com>
wrote:

> Hi all,
>
> Chuck suggested we think about a MaskedArray replacement for 1.18.
>
> A few months ago I did some work on a MaskedArray replacement using
> `__array_function__`, which I got mostly working. It seems like a good
> time to bring it up for discussion now. See it at:
>
> https://github.com/ahaldane/ndarray_ducktypes
>
> It should be very usable, it has docs you can read, and it passes a
> pytest-suite with 1024 tests adapted from numpy's MaskedArray tests.
> What is missing? It needs even more tests for new functionality, and a
> couple numpy-API functions are missing, in particular `np.median`,
> `np.percentile`, `np.cov`, and `np.corrcoef`. I'm sure other devs could
> also find many things to improve too.
>
> Besides fixing many little annoyances from MaskedArray, and simplifying
> the logic by always storing the mask in full, it also has new features.
> For instance it allows the use of a "X" variable to mark masked
> locations during array construction, and I solve the issue of how to
> mask individual fields of a structured array differently.
>
> At this point I would by happy to get some feedback on the design and
> what seems good or bad. If it seems like a good start, I'd be happy to
> move it into a numpy repo of some sort for further collaboration &
> discussion, and maybe into 1.18. At the least I hope it can serve as a
> design study of what we could do.
>
>
>
>
>
> Let me also drop here two more interesting detailed issues:
>
> First, the issue of what to do about .real and .imag of complex arrays,
> and similarly about field-assignment of structured arrays. The problem
> is that we have a single mask bool per element of a complex array, but
> what if someone does `arr.imag = MaskedArray([1,X,1])`? How should the
> mask of the original array change? Should we make .real and .imag readonly?
>
> Second, a more general issue of how to ducktype scalars when using
> `__array_function__` which I think many ducktype implementors will have
> to face. For MaskedArray, I created an associated "MaskedScalar" type.
> However, MaskedScalar has to behave differently from normal numpy
> scalars in a number of ways: It is not part of the numpy scalar
> hierarchy, it fails checks `isinstance(var, np.floating)`, and
> np.isscalar returns false. Numpy scalar types cannot be subclassed. We
> have discussed before the need to have distinction between 0d-arrays and
> scalars, so we shouldn't just use a 0d (in fact, this makes printing
> very difficult). This leads me to think that in future dtype-overhaul
> plans, we should consider creating a subclassable `np.scalar` base type
> to wrap all numpy scalar variables, and code like `isinstance(var,
> np.floating)` should be replaced by `isinstance(var.dtype.type,
> np.floating)` or similar. That is, the numeric dtype of the scalar is no
> longer encoded in `type(var)` but in `var.dtype`: The fact that the
> variable is a numpy scalar is decoupled from its numeric dtype.
>
> This is useful because there are many "associated" properties of scalars
> in common with arrays which have nothing to do with the dtype, which
> ducktype implementors want to touch. I imagine this will come up a lot:
> In that repo I also have an "ArrayCollection" ducktype which required a
> "CollectionScalar" scalar, and similarly I imagine people implementing
> units want the units attached to the scalar, independently of the dtype.
>
> Cheers,
> Allan
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190618/be963059/attachment.html>

From allanhaldane at gmail.com  Tue Jun 18 12:54:33 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Tue, 18 Jun 2019 12:54:33 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
Message-ID: <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>

On 6/18/19 10:06 AM, Marten van Kerkwijk wrote:
> Hi Allen,
> 
> Thanks for the message and link! In astropy, we've been struggling with
> masking a lot, and one of the main conclusions I have reached is that
> ideally one has a more abstract `Masked` class that can take any type of
> data (including `ndarray`, of course), and behaves like that data as
> much as possible, to the extent that if, e.g., I create a
> `Masked(Quantity(..., unit), mask=...)`, the instance will have a
> `.unit` attribute and perhaps even `isinstance(..., Quantity)` will
> hold. And similarly for `Masked(Time(...), mask=...)`,
> `Masked(SkyCoord(...), mask=...)`, etc. In a way, `Masked` would be a
> kind of Mixin-class that just tracks a mask attribute.
> 
> This may be too much to ask from the initializer, but, if so, it still
> seems most useful if it is made as easy as possible to do, say, `class
> MaskedQuantity(Masked, Quantity): <very few overrides>`.

Currently MaskedArray does not accept ducktypes as underlying arrays,
but I think it shouldn't be too hard to modify it to do so. Good idea!

I already partly navigated this mixin-issue in the
"MaskedArrayCollection" class, which essentially does
ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
boilerplate. That's the backwards encapsulation order from what you want
though.

> Even if this impossible, I think it is conceptually useful to think
> about what the masking class should do. My sense is that, e.g., it
> should not attempt to decide when an operation succeeds or not, but just
> "or together" input masks for regular, multiple-input functions, and let
> the underlying arrays skip elements for reductions by using `where`
> (hey, I did implement that for a reason... ;-). In particular, it
> suggests one should not have things like domains and all that (I never
> understood why `MaskedArray` did that). If one wants more, the class
> should provide a method that updates the mask (a sensible default might
> be `mask |= ~np.isfinite(result)` - here, the class being masked should
> logically support ufuncs and functions, so it can decide what "isfinite"
> means).

I agree it would be nice to remove domains. It would make life easier,
and I could remove a lot of twiddly code! I kept it in for now to
minimize the behavior changes from the old MaskedArray.

> In any case, I would think that a basic truth should be that everything
> has a mask with a shape consistent with the data, so
> 1. Each complex numbers has just one mask, and setting `a.imag` with a
> masked array should definitely propagate the mask.
> 2. For a masked array with structured dtype, I'd similarly say that the
> default is for a mask to have the same shape as the array. But that
> something like your collection makes sense for the case where one wants
> to mask items in a structure.

Agreed that we should have a single bool per complex or structured
element, and the mask shape is the same as the array shape. That's how I
implemented it. But there is still a problem with complex.imag assignment:

    >>> a = MaskedArray([1j, 2, X])
    >>> i = a.imag
    >>> i[:] = MaskedArray([1, X, 1])

If we make the last line copy the mask to the original array, what
should the real part of a[2] be? Conversely, if we don't copy the mask,
what should the imag part of a[1] be? It seems like we might "want" the
masks to be OR'd instead, but then should i[2] be masked after we just
set it to 1?


> All the best,
> 
> Marten
> 
> p.s. I started trying to implement the above "Mixin" class; will try to
> clean that up a bit so that at least it uses `where` and push it up.

I played with "where", but didn't include it since 1.17 is not released.
To avoid duplication of effort, I've attached a diff of what I tried. I
actually get a slight slowdown of about 10% by using where...

If you make progress with the mixin, a push is welcome. I imagine a
problem is going to be that np.isscalar doesn't work to detect duck scalars.

Cheers,
Allan


> On Mon, Jun 17, 2019 at 6:43 PM Allan Haldane <allanhaldane at gmail.com
> <mailto:allanhaldane at gmail.com>> wrote:
> 
>     Hi all,
> 
>     Chuck suggested we think about a MaskedArray replacement for 1.18.
> 
>     A few months ago I did some work on a MaskedArray replacement using
>     `__array_function__`, which I got mostly working. It seems like a good
>     time to bring it up for discussion now. See it at:
> 
>     https://github.com/ahaldane/ndarray_ducktypes
> 
>     It should be very usable, it has docs you can read, and it passes a
>     pytest-suite with 1024 tests adapted from numpy's MaskedArray tests.
>     What is missing? It needs even more tests for new functionality, and a
>     couple numpy-API functions are missing, in particular `np.median`,
>     `np.percentile`, `np.cov`, and `np.corrcoef`. I'm sure other devs could
>     also find many things to improve too.
> 
>     Besides fixing many little annoyances from MaskedArray, and simplifying
>     the logic by always storing the mask in full, it also has new features.
>     For instance it allows the use of a "X" variable to mark masked
>     locations during array construction, and I solve the issue of how to
>     mask individual fields of a structured array differently.
> 
>     At this point I would by happy to get some feedback on the design and
>     what seems good or bad. If it seems like a good start, I'd be happy to
>     move it into a numpy repo of some sort for further collaboration &
>     discussion, and maybe into 1.18. At the least I hope it can serve as a
>     design study of what we could do.
> 
> 
> 
> 
> 
>     Let me also drop here two more interesting detailed issues:
> 
>     First, the issue of what to do about .real and .imag of complex arrays,
>     and similarly about field-assignment of structured arrays. The problem
>     is that we have a single mask bool per element of a complex array, but
>     what if someone does `arr.imag = MaskedArray([1,X,1])`? How should the
>     mask of the original array change? Should we make .real and .imag
>     readonly?
> 
>     Second, a more general issue of how to ducktype scalars when using
>     `__array_function__` which I think many ducktype implementors will have
>     to face. For MaskedArray, I created an associated "MaskedScalar" type.
>     However, MaskedScalar has to behave differently from normal numpy
>     scalars in a number of ways: It is not part of the numpy scalar
>     hierarchy, it fails checks `isinstance(var, np.floating)`, and
>     np.isscalar returns false. Numpy scalar types cannot be subclassed. We
>     have discussed before the need to have distinction between 0d-arrays and
>     scalars, so we shouldn't just use a 0d (in fact, this makes printing
>     very difficult). This leads me to think that in future dtype-overhaul
>     plans, we should consider creating a subclassable `np.scalar` base type
>     to wrap all numpy scalar variables, and code like `isinstance(var,
>     np.floating)` should be replaced by `isinstance(var.dtype.type,
>     np.floating)` or similar. That is, the numeric dtype of the scalar is no
>     longer encoded in `type(var)` but in `var.dtype`: The fact that the
>     variable is a numpy scalar is decoupled from its numeric dtype.
> 
>     This is useful because there are many "associated" properties of scalars
>     in common with arrays which have nothing to do with the dtype, which
>     ducktype implementors want to touch. I imagine this will come up a lot:
>     In that repo I also have an "ArrayCollection" ducktype which required a
>     "CollectionScalar" scalar, and similarly I imagine people implementing
>     units want the units attached to the scalar, independently of the dtype.
> 
>     Cheers,
>     Allan
> 
> 
> 
> 
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 

-------------- next part --------------
diff --git a/MaskedArray.py b/MaskedArray.py
index 2df12b4..272699f 100755
--- a/MaskedArray.py
+++ b/MaskedArray.py
@@ -904,12 +904,23 @@ class _Masked_BinOp(_Masked_UFunc):
         if isinstance(initial, (MaskedScalar, MaskedX)):
             raise ValueError("initial should not be masked")
 
-        if not np.isscalar(da):
-            da[ma] = self.reduce_fill(da.dtype)
-            # if da is a scalar, we get correct result no matter fill
+        if 1: # two different implementations, investigate performance
+            wheremask = ~ma
+            if 'where' in kwargs:
+                wheremask |= kwargs['where']
+            kwargs['where'] = wheremask
+            if 'initial' not in kwargs:
+                kwargs['initial'] = self.reduce_fill(da.dtype)
+
+            result = self.f.reduce(da, **kwargs)
+            m = np.logical_and.reduce(ma, **mkwargs)
+        else:
+            if not np.isscalar(da):
+                da[ma] = self.reduce_fill(da.dtype)
+                # if da is a scalar, we get correct result no matter fill
 
-        result = self.f.reduce(da, **kwargs)
-        m = np.logical_and.reduce(ma, **mkwargs)
+            result = self.f.reduce(da, **kwargs)
+            m = np.logical_and.reduce(ma, **mkwargs)
 
         ## Code that might be used to support domained ufuncs. WIP
         #with np.errstate(divide='ignore', invalid='ignore'):

From m.h.vankerkwijk at gmail.com  Tue Jun 18 14:04:12 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 18 Jun 2019 14:04:12 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
Message-ID: <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>

On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane <allanhaldane at gmail.com>
wrote:
<snip>

> > This may be too much to ask from the initializer, but, if so, it still
> > seems most useful if it is made as easy as possible to do, say, `class
> > MaskedQuantity(Masked, Quantity): <very few overrides>`.
>
> Currently MaskedArray does not accept ducktypes as underlying arrays,
> but I think it shouldn't be too hard to modify it to do so. Good idea!
>

Looking back at my trial, I see that I also never got to duck arrays - only
ndarray subclasses - though I tried to make the code as agnostic as
possible.

(Trial at
https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
)

I already partly navigated this mixin-issue in the
> "MaskedArrayCollection" class, which essentially does
> ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> boilerplate. That's the backwards encapsulation order from what you want
> though.
>

Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
mask=[True, False, False])` does indeed not have a `.unit` attribute (and
cannot represent itself...); I'm not at all sure that my method of just
creating a mixed class is anything but a recipe for disaster, though!


> > Even if this impossible, I think it is conceptually useful to think
> > about what the masking class should do. My sense is that, e.g., it
> > should not attempt to decide when an operation succeeds or not, but just
> > "or together" input masks for regular, multiple-input functions, and let
> > the underlying arrays skip elements for reductions by using `where`
> > (hey, I did implement that for a reason... ;-). In particular, it
> > suggests one should not have things like domains and all that (I never
> > understood why `MaskedArray` did that). If one wants more, the class
> > should provide a method that updates the mask (a sensible default might
> > be `mask |= ~np.isfinite(result)` - here, the class being masked should
> > logically support ufuncs and functions, so it can decide what "isfinite"
> > means).
>
> I agree it would be nice to remove domains. It would make life easier,
> and I could remove a lot of twiddly code! I kept it in for now to
> minimize the behavior changes from the old MaskedArray.
>

That makes sense. Could be separated out to a backwards-compatibility class
later.


> > In any case, I would think that a basic truth should be that everything
> > has a mask with a shape consistent with the data, so
> > 1. Each complex numbers has just one mask, and setting `a.imag` with a
> > masked array should definitely propagate the mask.
> > 2. For a masked array with structured dtype, I'd similarly say that the
> > default is for a mask to have the same shape as the array. But that
> > something like your collection makes sense for the case where one wants
> > to mask items in a structure.
>
> Agreed that we should have a single bool per complex or structured
> element, and the mask shape is the same as the array shape. That's how I
> implemented it. But there is still a problem with complex.imag assignment:
>
>     >>> a = MaskedArray([1j, 2, X])
>     >>> i = a.imag
>     >>> i[:] = MaskedArray([1, X, 1])
>
> If we make the last line copy the mask to the original array, what
> should the real part of a[2] be? Conversely, if we don't copy the mask,
> what should the imag part of a[1] be? It seems like we might "want" the
> masks to be OR'd instead, but then should i[2] be masked after we just
> set it to 1?
>
> Ah, I see the issue now... Easiest to implement and closest in analogy to
a regular view would be to just let it unmask a[2] (with whatever is in
real; user beware!).

Perhaps better would be to special-case such that `imag` returns a
read-only view of the mask. Making `imag` itself read-only would prevent
possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but there is no
reason this should update the mask.

Still, neither is really satisfactory...


>
> > p.s. I started trying to implement the above "Mixin" class; will try to
> > clean that up a bit so that at least it uses `where` and push it up.
>
> I played with "where", but didn't include it since 1.17 is not released.
> To avoid duplication of effort, I've attached a diff of what I tried. I
> actually get a slight slowdown of about 10% by using where...
>

Your implementation is indeed quite similar to what I got in
__array_ufunc__ (though one should "&" the where with ~mask).

I think the main benefit is not to presume that whatever is underneath
understands 0 or 1, i.e., avoid filling.


> If you make progress with the mixin, a push is welcome. I imagine a
> problem is going to be that np.isscalar doesn't work to detect duck
> scalars.
>
> I fear that in my attempts I've simply decided that only array scalars
exist...

-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190618/ae585113/attachment-0001.html>

From sebastian at sipsolutions.net  Tue Jun 18 16:47:15 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 18 Jun 2019 13:47:15 -0700
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <ec3f16e3bb558ac6f852b3baa5a8ad3e4cc125d5.camel@gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <bf3844ac47c2e0591f2e04bcb447e59b9899eac7.camel@sipsolutions.net>
 <ec3f16e3bb558ac6f852b3baa5a8ad3e4cc125d5.camel@gmail.com>
Message-ID: <42275e3335ac735688912b70f202cd7b5af2851f.camel@sipsolutions.net>

Hi Hameer,

On Tue, 2019-06-18 at 04:28 +0200, Hameer Abbasi wrote:
> On Wed, 2019-06-12 at 12:55 -0500, Sebastian Berg wrote:
> > On Wed, 2019-06-05 at 15:41 -0500, Sebastian Berg wrote:
> > > Hi all,
> > > 
<snip>
> A type is safely castable to another if all of these numbers are
> exceeded or met.
> 
> This would give us a clean way for registering new numeric types,
> while
> also cleanly hooking into the type system, and solving the casting
> scenario. Of course, I'm not proposing we generate the loops for or
> provide all these types ourselves, but simply that we allow people to
> define dtypes using such a schema. I do worry that we're special-
> casing 
> numbers here, but it is "Num"Py, so I'm also not too worried.
> 
> This flexibility would, for example, allow us to easily define a
> bfloat16/bcomplex32 type with all the "can_cast" logic in place, even
> if people have to register their own casts or loops (and just to be
> clear, we error if they are not). It also makes it easy to define
> loops
> for int128 and so on if they come along.
> 
> The only open question left here is: What to do with a case like
> int64
> + uint64. And what I propose is we abandon purity for pragmatism here
> and tell ourselves that losing one sign bit is tolerable 90% of the
> time, and going to floating-point is probably worse. It's more of a
> range-versus-accuracy question, and I would argue that people using
> integers expect exactness. Of course, I doubt anyone is actually
> relying on the fact that adding two integers produces floating-point
> results, and it has been the cause of at least one bug, which
> highlights that integers can be used in places where floats cannot.
> [0]

TL;DR: I started a prototype for a possible approach on
casting/promotion and ufunc dispatching, any comments appreciated,
especially if I took a wrong turn! I will look into writing a bit more
about it more in an NEP style and not in code.

(About uint64 + int64 see below [0])

Thanks for the input, Hameer! Sorry for not following up here with some
of the thoughts that I had after writing that. You are right that the
bit counting is a way to handle this (and maybe keep it limited enough
that caching is possible). I had started prototyping a bit with the
thought that maybe caching for scalars is just not so important (just
type if you need the speed). But using this method, it might be fine.
Although, e.g. an int8NA type using -128 to signal an NA value would be
yet another issue with "binning" the values.

I have followed a bit of an older thought of mine right now and started
to do a python mock-up prototyping for that. It is probably still a lot
in flux, but the basic idea is to start of with using slots on a dtype
objects (I realize that during the meeting we had a slight tendency
towards a registration approach for casting, this seemed a bit simpler
to me though).

For the sake of discussion, I posted the start at: 
https://github.com/seberg/numpy_dtype_prototype

On the ufunc side, it just mocks up a simple UfuncImpl model (for
add/multiply, but only add semi supports units right now).

There is a mock up array object using a numpy array + new dtype (it
does support `array.astype(new_dtype)`.

I have basically considered dtypes to be instances of "DType". The
classes of dtypes could be thought of dtype categories (if you so will,
right now these categories are not clean with respect to ufunc
dispatching).

There are basically two different kinds of dtypes in this mock up:

1. Descriptor instances which are "finalized", i.e. they are proper
dtypes with an itemsize and can be attached to arrays.
2. Descriptor instances which are not "finalized". These could be just
abstract dtype. The simplest notion here are flexible dtypes such as
"S" which cannot be attached to an array, but we still call it a dtype.

This way 1. are basically more specific versions of 2. So both groups
have all the machinery for type casting/promotion/discovery, but only
the first group can actually describe real data.

So how to handle casting? In this mock-up, I opted for slots on the
DType object (the C-side would look similar, not sure how exactly), one
reason is that casting in general needs function calls in any case,
because we cannot decide if one string can be cast to another without
looking at its length, etc (or meters cast to seconds). For casting
purposes these are:

methods:

* get_cast_func_to(...), get_cast_func_from(...)
* (optionally: can_cast_to, can_cast_from)
* common_type(self, other)  # allow to override common type operation.
* default_type()  # for non-finalized types to ask them for a real type

classmethods:

* discover_type(python_scalar)  # falls back to discovering from class:
* discover_type_from_class(python_type)

Of course some other information needs to be attached/registered, e.g.
which python classes to associate with a certain dtype category.

To handle python integers, what I did now is that I discover them as
their own dtype (a class 2. one, not a real one). Which remembers the
value and knows how to cast to normal dtypes. It will do so very slowly
by trying all possibilities, but user provided dtypes can handle it for
better speed.

One small nice thing about this is that it should support dtypes such
as `np.blasable` to mean "float32" or "float64", they may not be super
fast, but it should work.

One side effect of this approach for the common type operation is that,
there is no way for a user int24 to say that uint16 + int16 -> int24
(or similar).

On the side of ufunc dispatching it is probably a bit hacky, but
basically the idea is that it is based on the "dtypes category"
(usually class). It could be extended later, but for starters makes the
dispatching very simple, since we ignore type hierarchy.

Best,

Sebastian


[0] One of the additional things with uint64+int64 is that it jumps
kinds/category (whatever you want to call it) between ints and floats
right now.
There is a more general problem with that. Our casting logic is not
strictly ordered according to these categories (int < float), i.e.  a
large int will cause a float16 to upcast to float64.
This is problematic because it is the reason why our common
type/promotion is not associative, while C and Julia's is associative.

The only thing I can think of would be to find common types within the
same categories kinds first, but I bet that just opens a huge can of
worms (and makes dtype discovery much more complex).


> 
> Hameer Abbasi
> 
> [0] https://github.com/numpy/numpy/issues/9982
> [1] https://en.wikipedia.org/wiki/Cayley%E2%80%93Dickson_construction
> 
> > Best,
> > 
> > Sebastian
> > 
> > 
> > > -----------
> > > 
> > > Currently when you write code such as:
> > > 
> > > arr = np.array([1, 43, 23], dtype=np.uint16)
> > > res = arr + 1
> > > 
> > > Numpy uses fairly sophisticated logic to decide that `1` can be
> > > represented as a uint16, and thus for all unary functions (and
> > > most
> > > others as well), the output will have a `res.dtype` of uint16.
> > > 
> > > Similar logic also exists for floating point types, where a lower
> > > precision floating point can be used:
> > > 
> > > arr = np.array([1, 43, 23], dtype=np.float32)
> > > (arr + np.float64(2.)).dtype  # will be float32
> > > 
> > > Currently, this value based logic is enforced by checking whether
> > > the
> > > cast is possible: "4" can be cast to int8, uint8. So the first
> > > call
> > > above will at some point check if "uint16 + uint16 -> uint16" is
> > > a
> > > valid operation, find that it is, and thus stop searching. (There
> > > is
> > > the additional logic, that when both/all operands are scalars, it
> > > is
> > > not applied).
> > > 
> > > Note that while it is defined in terms of casting "1" to uint8
> > > safely
> > > being possible even though 1 may be typed as int64. This logic
> > > thus
> > > affects all promotion rules as well (i.e. what should the output
> > > dtype
> > > be).
> > > 
> > > 
> > > There 2 main discussion points/issues about it:
> > > 
> > > 1. Should value based casting/promotion logic exist at all?
> > > 
> > > Arguably an `np.int32(3)` has type information attached to it, so
> > > why
> > > should we ignore it. It can also be tricky for users, because a
> > > small
> > > change in values can change the result data type.
> > > Because 0-D arrays and scalars are too close inside numpy (you
> > > will
> > > often not know which one you get). There is not much option but
> > > to
> > > handle them identically. However, it seems pretty odd that:
> > >  * `np.array(3, dtype=np.int32)` + np.arange(10, dtype=int8)
> > >  * `np.array([3], dtype=np.int32)` + np.arange(10, dtype=int8)
> > > 
> > > give a different result.
> > > 
> > > This is a bit different for python scalars, which do not have a
> > > type
> > > attached already.
> > > 
> > > 
> > > 2. Promotion and type resolution in Ufuncs:
> > > 
> > > What is currently bothering me is that the decision what the
> > > output
> > > dtypes should be currently depends on the values in complicated
> > > ways.
> > > It would be nice if we can decide which type signature to use
> > > without
> > > actually looking at values (or at least only very early on).
> > > 
> > > One reason here is caching and simplicity. I would like to be
> > > able
> > > to
> > > cache which loop should be used for what input. Having value
> > > based
> > > casting in there bloats up the problem.
> > > Of course it currently works OK, but especially when user dtypes
> > > come
> > > into play, caching would seem like a nice optimization option.
> > > 
> > > Because `uint8(127)` can also be a `int8`, but uint8(128) it is
> > > not
> > > as
> > > simple as finding the "minimal" dtype once and working with
> > > that." 
> > > Of course Eric and I discussed this a bit before, and you could
> > > create
> > > an internal "uint7" dtype which has the only purpose of flagging
> > > that
> > > a
> > > cast to int8 is safe.
> > > 
> > > I suppose it is possible I am barking up the wrong tree here, and
> > > this
> > > caching/predictability is not vital (or can be solved with such
> > > an
> > > internal dtype easily, although I am not sure it seems elegant).
> > > 
> > > 
> > > Possible options to move forward
> > > --------------------------------
> > > 
> > > I have to still see a bit how trick things are. But there are a
> > > few
> > > possible options. I would like to move the scalar logic to the
> > > beginning of ufunc calls:
> > >   * The uint7 idea would be one solution
> > >   * Simply implement something that works for numpy and all
> > > except
> > >     strange external ufuncs (I can only think of numba as a
> > > plausible
> > >     candidate for creating such).
> > > 
> > > My current plan is to see where the second thing leaves me.
> > > 
> > > We also should see if we cannot move the whole thing forward, in
> > > which
> > > case the main decision would have to be forward to where. My
> > > opinion
> > > is
> > > currently that when a type has a dtype associated with it
> > > clearly,
> > > we
> > > should always use that dtype in the future. This mostly means
> > > that
> > > numpy dtypes such as `np.int64` will always be treated like an
> > > int64,
> > > and never like a `uint8` because they happen to be castable to
> > > that.
> > > 
> > > For values without a dtype attached (read python integers,
> > > floats),
> > > I
> > > see three options, from more complex to simpler:
> > > 
> > > 1. Keep the current logic in place as much as possible
> > > 2. Only support value based promotion for operators, e.g.:
> > >    `arr + scalar` may do it, but `np.add(arr, scalar)` will not.
> > >    The upside is that it limits the complexity to a much simpler
> > >    problem, the downside is that the ufunc call and operator
> > > match
> > >    less clearly.
> > > 3. Just associate python float with float64 and python integers
> > > with
> > >    long/int64 and force users to always type them explicitly if
> > > they
> > >    need to.
> > > 
> > > The downside of 1. is that it doesn't help with simplifying the
> > > current
> > > situation all that much, because we still have the special
> > > casting
> > > around...
> > > 
> > > 
> > > I have realized that this got much too long, so I hope it makes
> > > sense.
> > > I will continue to dabble along on these things a bit, so if
> > > nothing
> > > else maybe writing it helps me to get a bit clearer on things...
> > > 
> > > Best,
> > > 
> > > Sebastian
> > > 
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190618/39ff1bb8/attachment.sig>

From allanhaldane at gmail.com  Wed Jun 19 17:44:38 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Wed, 19 Jun 2019 17:44:38 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
Message-ID: <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>

On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> 
> 
> On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane <allanhaldane at gmail.com
> <mailto:allanhaldane at gmail.com>> wrote:
> <snip>
> 
>     > This may be too much to ask from the initializer, but, if so, it still
>     > seems most useful if it is made as easy as possible to do, say, `class
>     > MaskedQuantity(Masked, Quantity): <very few overrides>`.
> 
>     Currently MaskedArray does not accept ducktypes as underlying arrays,
>     but I think it shouldn't be too hard to modify it to do so. Good idea!
> 
> 
> Looking back at my trial, I see that I also never got to duck arrays -
> only ndarray subclasses - though I tried to make the code as agnostic as
> possible.
> 
> (Trial at
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1)
> 
>     I already partly navigated this mixin-issue in the
>     "MaskedArrayCollection" class, which essentially does
>     ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
>     boilerplate. That's the backwards encapsulation order from what you want
>     though.
> 
> 
> Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> mask=[True, False, False])` does indeed not have a `.unit` attribute
> (and cannot represent itself...); I'm not at all sure that my method of
> just creating a mixed class is anything but a recipe for disaster, though!

Based on your suggestion I worked on this a little today, and now my
MaskedArray more easily encapsulates both ducktypes and ndarray
subclasses (pushed to repo). Here's an example I got working with masked
units using unyt:

[1]: from MaskedArray import X, MaskedArray, MaskedScalar

[2]: from unyt import m, km

[3]: import numpy as np

[4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])

[5]: uarr

MaskedArray([1., X , 3.])
[6]: uarr + 1*m

MaskedArray([1.001, X    , 3.001])
[7]: uarr.filled()

unyt_array([1., 0., 3.], 'km')
[8]: np.concatenate([uarr, 2*uarr]).filled()
unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')

The catch is the ducktype/subclass has to rigorously follow numpy's
indexing rules, including distinguishing 0d arrays from scalars. For now
only I used unyt in the example above since it happens to be less strict
 about dimensionless operations than astropy.units which trips up my
repr code. (see below for example with astropy.units). Note in the last
line I lost the dimensions, but that is because unyt does not handle
np.concatenate. To get that to work we need a true ducktype for units.

The example above doesn't expose the ".units" attribute outside the
MaskedArray, and it doesn't print the units in the repr. But you can
access them using "filled".

While I could make MaskedArray forward unknown attribute accesses to the
encapsulated array, that seems a bit dangerous/bug-prone at first
glance, so probably I want to require the user to make a MaskedArray
subclass to do so. I've just started playing with that (probably buggy),
and Ive attached subclass examples for astropy.unit and unyt, with some
example output below.

Cheers,
Allan


Example using the attached astropy unit subclass:

    >>> from astropy.units import m, km, s
    >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
    >>> uarr
    MaskedQ([1., X , 1.], units=km)
    >>> uarr.units
    km
    >>> uarr + (1*m)
    MaskedQ([1.001, X    , 1.001], units=km)
    >>> uarr/(1*s)
    MaskedQ([1., X , 1.], units=km / s)
    >>> (uarr*(1*m))[1:]
    MaskedQ([X , 1.], units=km m)
    >>> np.add.outer(uarr, uarr)
    MaskedQ([[2., X , 2.],
             [X , X , X ],
             [2., X , 2.]], units=km)
    >>> print(uarr)
    [1. X  1.] km m

Cheers,
Allan


>     > Even if this impossible, I think it is conceptually useful to think
>     > about what the masking class should do. My sense is that, e.g., it
>     > should not attempt to decide when an operation succeeds or not,
>     but just
>     > "or together" input masks for regular, multiple-input functions,
>     and let
>     > the underlying arrays skip elements for reductions by using `where`
>     > (hey, I did implement that for a reason... ;-). In particular, it
>     > suggests one should not have things like domains and all that (I never
>     > understood why `MaskedArray` did that). If one wants more, the class
>     > should provide a method that updates the mask (a sensible default
>     might
>     > be `mask |= ~np.isfinite(result)` - here, the class being masked
>     should
>     > logically support ufuncs and functions, so it can decide what
>     "isfinite"
>     > means).
> 
>     I agree it would be nice to remove domains. It would make life easier,
>     and I could remove a lot of twiddly code! I kept it in for now to
>     minimize the behavior changes from the old MaskedArray.
> 
> 
> That makes sense. Could be separated out to a backwards-compatibility
> class later.
> 
> 
>     > In any case, I would think that a basic truth should be that
>     everything
>     > has a mask with a shape consistent with the data, so
>     > 1. Each complex numbers has just one mask, and setting `a.imag` with a
>     > masked array should definitely propagate the mask.
>     > 2. For a masked array with structured dtype, I'd similarly say
>     that the
>     > default is for a mask to have the same shape as the array. But that
>     > something like your collection makes sense for the case where one
>     wants
>     > to mask items in a structure.
> 
>     Agreed that we should have a single bool per complex or structured
>     element, and the mask shape is the same as the array shape. That's how I
>     implemented it. But there is still a problem with complex.imag
>     assignment:
> 
>     ? ? >>> a = MaskedArray([1j, 2, X])
>     ? ? >>> i = a.imag
>     ? ? >>> i[:] = MaskedArray([1, X, 1])
> 
>     If we make the last line copy the mask to the original array, what
>     should the real part of a[2] be? Conversely, if we don't copy the mask,
>     what should the imag part of a[1] be? It seems like we might "want" the
>     masks to be OR'd instead, but then should i[2] be masked after we just
>     set it to 1?
> 
> Ah, I see the issue now... Easiest to implement and closest in analogy
> to a regular view would be to just let it unmask a[2] (with whatever is
> in real; user beware!).
> 
> Perhaps better would be to special-case such that `imag` returns a
> read-only view of the mask. Making `imag` itself read-only would prevent
> possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but there is
> no reason this should update the mask.
> 
> Still, neither is really satisfactory...
> ?
> 
> 
>     > p.s. I started trying to implement the above "Mixin" class; will
>     try to
>     > clean that up a bit so that at least it uses `where` and push it up.
> 
>     I played with "where", but didn't include it since 1.17 is not released.
>     To avoid duplication of effort, I've attached a diff of what I tried. I
>     actually get a slight slowdown of about 10% by using where...
> 
> 
> Your implementation is indeed quite similar to what I got in
> __array_ufunc__ (though one should "&" the where with ~mask).
> 
> I think the main benefit is not to presume that whatever is underneath
> understands 0 or 1, i.e., avoid filling.
> 
> 
>     If you make progress with the mixin, a push is welcome. I imagine a
>     problem is going to be that np.isscalar doesn't work to detect duck
>     scalars.
> 
> I fear that in my attempts I've simply decided that only array scalars
> exist...
> 
> -- Marten
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_astrounit.py
Type: text/x-python
Size: 1197 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190619/69153800/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_maskunyt.py
Type: text/x-python
Size: 1045 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190619/69153800/attachment-0001.py>

From m.h.vankerkwijk at gmail.com  Wed Jun 19 22:19:49 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 19 Jun 2019 22:19:49 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
Message-ID: <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>

Hi Allan,

This is very impressive! I could get the tests that I wrote for my class
pass with yours using Quantity with what I would consider very minimal
changes. I only could not find a good way to unmask data (I like the idea
of setting the mask on some elements via `ma[item] = X`); is this on
purpose?

Anyway, it would seem easily at the point where I should comment on your
repository rather than in the mailing list!

All the best,

Marten


On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane <allanhaldane at gmail.com>
wrote:

> On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> >
> >
> > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane <allanhaldane at gmail.com
> > <mailto:allanhaldane at gmail.com>> wrote:
> > <snip>
> >
> >     > This may be too much to ask from the initializer, but, if so, it
> still
> >     > seems most useful if it is made as easy as possible to do, say,
> `class
> >     > MaskedQuantity(Masked, Quantity): <very few overrides>`.
> >
> >     Currently MaskedArray does not accept ducktypes as underlying arrays,
> >     but I think it shouldn't be too hard to modify it to do so. Good
> idea!
> >
> >
> > Looking back at my trial, I see that I also never got to duck arrays -
> > only ndarray subclasses - though I tried to make the code as agnostic as
> > possible.
> >
> > (Trial at
> >
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
> )
> >
> >     I already partly navigated this mixin-issue in the
> >     "MaskedArrayCollection" class, which essentially does
> >     ArrayCollection(MaskedArray(array)), and only takes about 30 lines of
> >     boilerplate. That's the backwards encapsulation order from what you
> want
> >     though.
> >
> >
> > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> > mask=[True, False, False])` does indeed not have a `.unit` attribute
> > (and cannot represent itself...); I'm not at all sure that my method of
> > just creating a mixed class is anything but a recipe for disaster,
> though!
>
> Based on your suggestion I worked on this a little today, and now my
> MaskedArray more easily encapsulates both ducktypes and ndarray
> subclasses (pushed to repo). Here's an example I got working with masked
> units using unyt:
>
> [1]: from MaskedArray import X, MaskedArray, MaskedScalar
>
> [2]: from unyt import m, km
>
> [3]: import numpy as np
>
> [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>
> [5]: uarr
>
> MaskedArray([1., X , 3.])
> [6]: uarr + 1*m
>
> MaskedArray([1.001, X    , 3.001])
> [7]: uarr.filled()
>
> unyt_array([1., 0., 3.], 'km')
> [8]: np.concatenate([uarr, 2*uarr]).filled()
> unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>
> The catch is the ducktype/subclass has to rigorously follow numpy's
> indexing rules, including distinguishing 0d arrays from scalars. For now
> only I used unyt in the example above since it happens to be less strict
>  about dimensionless operations than astropy.units which trips up my
> repr code. (see below for example with astropy.units). Note in the last
> line I lost the dimensions, but that is because unyt does not handle
> np.concatenate. To get that to work we need a true ducktype for units.
>
> The example above doesn't expose the ".units" attribute outside the
> MaskedArray, and it doesn't print the units in the repr. But you can
> access them using "filled".
>
> While I could make MaskedArray forward unknown attribute accesses to the
> encapsulated array, that seems a bit dangerous/bug-prone at first
> glance, so probably I want to require the user to make a MaskedArray
> subclass to do so. I've just started playing with that (probably buggy),
> and Ive attached subclass examples for astropy.unit and unyt, with some
> example output below.
>
> Cheers,
> Allan
>
>
>
> Example using the attached astropy unit subclass:
>
>     >>> from astropy.units import m, km, s
>     >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>     >>> uarr
>     MaskedQ([1., X , 1.], units=km)
>     >>> uarr.units
>     km
>     >>> uarr + (1*m)
>     MaskedQ([1.001, X    , 1.001], units=km)
>     >>> uarr/(1*s)
>     MaskedQ([1., X , 1.], units=km / s)
>     >>> (uarr*(1*m))[1:]
>     MaskedQ([X , 1.], units=km m)
>     >>> np.add.outer(uarr, uarr)
>     MaskedQ([[2., X , 2.],
>              [X , X , X ],
>              [2., X , 2.]], units=km)
>     >>> print(uarr)
>     [1. X  1.] km m
>
> Cheers,
> Allan
>
>
> >     > Even if this impossible, I think it is conceptually useful to think
> >     > about what the masking class should do. My sense is that, e.g., it
> >     > should not attempt to decide when an operation succeeds or not,
> >     but just
> >     > "or together" input masks for regular, multiple-input functions,
> >     and let
> >     > the underlying arrays skip elements for reductions by using `where`
> >     > (hey, I did implement that for a reason... ;-). In particular, it
> >     > suggests one should not have things like domains and all that (I
> never
> >     > understood why `MaskedArray` did that). If one wants more, the
> class
> >     > should provide a method that updates the mask (a sensible default
> >     might
> >     > be `mask |= ~np.isfinite(result)` - here, the class being masked
> >     should
> >     > logically support ufuncs and functions, so it can decide what
> >     "isfinite"
> >     > means).
> >
> >     I agree it would be nice to remove domains. It would make life
> easier,
> >     and I could remove a lot of twiddly code! I kept it in for now to
> >     minimize the behavior changes from the old MaskedArray.
> >
> >
> > That makes sense. Could be separated out to a backwards-compatibility
> > class later.
> >
> >
> >     > In any case, I would think that a basic truth should be that
> >     everything
> >     > has a mask with a shape consistent with the data, so
> >     > 1. Each complex numbers has just one mask, and setting `a.imag`
> with a
> >     > masked array should definitely propagate the mask.
> >     > 2. For a masked array with structured dtype, I'd similarly say
> >     that the
> >     > default is for a mask to have the same shape as the array. But that
> >     > something like your collection makes sense for the case where one
> >     wants
> >     > to mask items in a structure.
> >
> >     Agreed that we should have a single bool per complex or structured
> >     element, and the mask shape is the same as the array shape. That's
> how I
> >     implemented it. But there is still a problem with complex.imag
> >     assignment:
> >
> >         >>> a = MaskedArray([1j, 2, X])
> >         >>> i = a.imag
> >         >>> i[:] = MaskedArray([1, X, 1])
> >
> >     If we make the last line copy the mask to the original array, what
> >     should the real part of a[2] be? Conversely, if we don't copy the
> mask,
> >     what should the imag part of a[1] be? It seems like we might "want"
> the
> >     masks to be OR'd instead, but then should i[2] be masked after we
> just
> >     set it to 1?
> >
> > Ah, I see the issue now... Easiest to implement and closest in analogy
> > to a regular view would be to just let it unmask a[2] (with whatever is
> > in real; user beware!).
> >
> > Perhaps better would be to special-case such that `imag` returns a
> > read-only view of the mask. Making `imag` itself read-only would prevent
> > possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but there is
> > no reason this should update the mask.
> >
> > Still, neither is really satisfactory...
> >
> >
> >
> >     > p.s. I started trying to implement the above "Mixin" class; will
> >     try to
> >     > clean that up a bit so that at least it uses `where` and push it
> up.
> >
> >     I played with "where", but didn't include it since 1.17 is not
> released.
> >     To avoid duplication of effort, I've attached a diff of what I
> tried. I
> >     actually get a slight slowdown of about 10% by using where...
> >
> >
> > Your implementation is indeed quite similar to what I got in
> > __array_ufunc__ (though one should "&" the where with ~mask).
> >
> > I think the main benefit is not to presume that whatever is underneath
> > understands 0 or 1, i.e., avoid filling.
> >
> >
> >     If you make progress with the mixin, a push is welcome. I imagine a
> >     problem is going to be that np.isscalar doesn't work to detect duck
> >     scalars.
> >
> > I fear that in my attempts I've simply decided that only array scalars
> > exist...
> >
> > -- Marten
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190619/2ba2f2a6/attachment-0001.html>

From allanhaldane at gmail.com  Thu Jun 20 12:44:21 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Thu, 20 Jun 2019 12:44:21 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
Message-ID: <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>

On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
> Hi Allan,
> 
> This is very impressive! I could get the tests that I wrote for my class
> pass with yours using Quantity with what I would consider very minimal
> changes. I only could not find a good way to unmask data (I like the
> idea of setting the mask on some elements via `ma[item] = X`); is this
> on purpose?

Yes, I want to make it difficult for the user to access the garbage
values under the mask, which are often clobbered values. The only way to
"remove" a masked value is by replacing it with a new non-masked value.


> Anyway, it would seem easily at the point where I should comment on your
> repository rather than in the mailing list!

To make further progress on this encapsulation idea I need a more
complete ducktype to pass into MaskedArray to test, so that's what I'll
work on next, when I have time. I'll either try to finish my
ArrayCollection type, or try making a simple NDunit ducktype
piggybacking on astropy's Unit.

Best,
Allan


> 
> All the best,
> 
> Marten
> 
> 
> On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane <allanhaldane at gmail.com
> <mailto:allanhaldane at gmail.com>> wrote:
> 
>     On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
>     >
>     >
>     > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
>     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>     > <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>>
>     wrote:
>     > <snip>
>     >
>     >? ? ?> This may be too much to ask from the initializer, but, if
>     so, it still
>     >? ? ?> seems most useful if it is made as easy as possible to do,
>     say, `class
>     >? ? ?> MaskedQuantity(Masked, Quantity): <very few overrides>`.
>     >
>     >? ? ?Currently MaskedArray does not accept ducktypes as underlying
>     arrays,
>     >? ? ?but I think it shouldn't be too hard to modify it to do so.
>     Good idea!
>     >
>     >
>     > Looking back at my trial, I see that I also never got to duck arrays -
>     > only ndarray subclasses - though I tried to make the code as
>     agnostic as
>     > possible.
>     >
>     > (Trial at
>     >
>     https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1)
>     >
>     >? ? ?I already partly navigated this mixin-issue in the
>     >? ? ?"MaskedArrayCollection" class, which essentially does
>     >? ? ?ArrayCollection(MaskedArray(array)), and only takes about 30
>     lines of
>     >? ? ?boilerplate. That's the backwards encapsulation order from
>     what you want
>     >? ? ?though.
>     >
>     >
>     > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
>     > mask=[True, False, False])` does indeed not have a `.unit` attribute
>     > (and cannot represent itself...); I'm not at all sure that my
>     method of
>     > just creating a mixed class is anything but a recipe for disaster,
>     though!
> 
>     Based on your suggestion I worked on this a little today, and now my
>     MaskedArray more easily encapsulates both ducktypes and ndarray
>     subclasses (pushed to repo). Here's an example I got working with masked
>     units using unyt:
> 
>     [1]: from MaskedArray import X, MaskedArray, MaskedScalar
> 
>     [2]: from unyt import m, km
> 
>     [3]: import numpy as np
> 
>     [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
> 
>     [5]: uarr
> 
>     MaskedArray([1., X , 3.])
>     [6]: uarr + 1*m
> 
>     MaskedArray([1.001, X? ? , 3.001])
>     [7]: uarr.filled()
> 
>     unyt_array([1., 0., 3.], 'km')
>     [8]: np.concatenate([uarr, 2*uarr]).filled()
>     unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
> 
>     The catch is the ducktype/subclass has to rigorously follow numpy's
>     indexing rules, including distinguishing 0d arrays from scalars. For now
>     only I used unyt in the example above since it happens to be less strict
>     ?about dimensionless operations than astropy.units which trips up my
>     repr code. (see below for example with astropy.units). Note in the last
>     line I lost the dimensions, but that is because unyt does not handle
>     np.concatenate. To get that to work we need a true ducktype for units.
> 
>     The example above doesn't expose the ".units" attribute outside the
>     MaskedArray, and it doesn't print the units in the repr. But you can
>     access them using "filled".
> 
>     While I could make MaskedArray forward unknown attribute accesses to the
>     encapsulated array, that seems a bit dangerous/bug-prone at first
>     glance, so probably I want to require the user to make a MaskedArray
>     subclass to do so. I've just started playing with that (probably buggy),
>     and Ive attached subclass examples for astropy.unit and unyt, with some
>     example output below.
> 
>     Cheers,
>     Allan
> 
> 
> 
>     Example using the attached astropy unit subclass:
> 
>     ? ? >>> from astropy.units import m, km, s
>     ? ? >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>     ? ? >>> uarr
>     ? ? MaskedQ([1., X , 1.], units=km)
>     ? ? >>> uarr.units
>     ? ? km
>     ? ? >>> uarr + (1*m)
>     ? ? MaskedQ([1.001, X? ? , 1.001], units=km)
>     ? ? >>> uarr/(1*s)
>     ? ? MaskedQ([1., X , 1.], units=km / s)
>     ? ? >>> (uarr*(1*m))[1:]
>     ? ? MaskedQ([X , 1.], units=km m)
>     ? ? >>> np.add.outer(uarr, uarr)
>     ? ? MaskedQ([[2., X , 2.],
>     ? ? ? ? ? ? ?[X , X , X ],
>     ? ? ? ? ? ? ?[2., X , 2.]], units=km)
>     ? ? >>> print(uarr)
>     ? ? [1. X? 1.] km m
> 
>     Cheers,
>     Allan
> 
> 
>     >? ? ?> Even if this impossible, I think it is conceptually useful
>     to think
>     >? ? ?> about what the masking class should do. My sense is that,
>     e.g., it
>     >? ? ?> should not attempt to decide when an operation succeeds or not,
>     >? ? ?but just
>     >? ? ?> "or together" input masks for regular, multiple-input functions,
>     >? ? ?and let
>     >? ? ?> the underlying arrays skip elements for reductions by using
>     `where`
>     >? ? ?> (hey, I did implement that for a reason... ;-). In
>     particular, it
>     >? ? ?> suggests one should not have things like domains and all
>     that (I never
>     >? ? ?> understood why `MaskedArray` did that). If one wants more,
>     the class
>     >? ? ?> should provide a method that updates the mask (a sensible
>     default
>     >? ? ?might
>     >? ? ?> be `mask |= ~np.isfinite(result)` - here, the class being masked
>     >? ? ?should
>     >? ? ?> logically support ufuncs and functions, so it can decide what
>     >? ? ?"isfinite"
>     >? ? ?> means).
>     >
>     >? ? ?I agree it would be nice to remove domains. It would make life
>     easier,
>     >? ? ?and I could remove a lot of twiddly code! I kept it in for now to
>     >? ? ?minimize the behavior changes from the old MaskedArray.
>     >
>     >
>     > That makes sense. Could be separated out to a backwards-compatibility
>     > class later.
>     >
>     >
>     >? ? ?> In any case, I would think that a basic truth should be that
>     >? ? ?everything
>     >? ? ?> has a mask with a shape consistent with the data, so
>     >? ? ?> 1. Each complex numbers has just one mask, and setting
>     `a.imag` with a
>     >? ? ?> masked array should definitely propagate the mask.
>     >? ? ?> 2. For a masked array with structured dtype, I'd similarly say
>     >? ? ?that the
>     >? ? ?> default is for a mask to have the same shape as the array.
>     But that
>     >? ? ?> something like your collection makes sense for the case
>     where one
>     >? ? ?wants
>     >? ? ?> to mask items in a structure.
>     >
>     >? ? ?Agreed that we should have a single bool per complex or structured
>     >? ? ?element, and the mask shape is the same as the array shape.
>     That's how I
>     >? ? ?implemented it. But there is still a problem with complex.imag
>     >? ? ?assignment:
>     >
>     >? ? ?? ? >>> a = MaskedArray([1j, 2, X])
>     >? ? ?? ? >>> i = a.imag
>     >? ? ?? ? >>> i[:] = MaskedArray([1, X, 1])
>     >
>     >? ? ?If we make the last line copy the mask to the original array, what
>     >? ? ?should the real part of a[2] be? Conversely, if we don't copy
>     the mask,
>     >? ? ?what should the imag part of a[1] be? It seems like we might
>     "want" the
>     >? ? ?masks to be OR'd instead, but then should i[2] be masked after
>     we just
>     >? ? ?set it to 1?
>     >
>     > Ah, I see the issue now... Easiest to implement and closest in analogy
>     > to a regular view would be to just let it unmask a[2] (with
>     whatever is
>     > in real; user beware!).
>     >
>     > Perhaps better would be to special-case such that `imag` returns a
>     > read-only view of the mask. Making `imag` itself read-only would
>     prevent
>     > possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but
>     there is
>     > no reason this should update the mask.
>     >
>     > Still, neither is really satisfactory...
>     > ?
>     >
>     >
>     >? ? ?> p.s. I started trying to implement the above "Mixin" class; will
>     >? ? ?try to
>     >? ? ?> clean that up a bit so that at least it uses `where` and
>     push it up.
>     >
>     >? ? ?I played with "where", but didn't include it since 1.17 is not
>     released.
>     >? ? ?To avoid duplication of effort, I've attached a diff of what I
>     tried. I
>     >? ? ?actually get a slight slowdown of about 10% by using where...
>     >
>     >
>     > Your implementation is indeed quite similar to what I got in
>     > __array_ufunc__ (though one should "&" the where with ~mask).
>     >
>     > I think the main benefit is not to presume that whatever is underneath
>     > understands 0 or 1, i.e., avoid filling.
>     >
>     >
>     >? ? ?If you make progress with the mixin, a push is welcome. I
>     imagine a
>     >? ? ?problem is going to be that np.isscalar doesn't work to detect
>     duck
>     >? ? ?scalars.
>     >
>     > I fear that in my attempts I've simply decided that only array scalars
>     > exist...
>     >
>     > -- Marten
>     >
>     > _______________________________________________
>     > NumPy-Discussion mailing list
>     > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     > https://mail.python.org/mailman/listinfo/numpy-discussion
>     >
> 
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


From sebastian at sipsolutions.net  Fri Jun 21 13:55:16 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 21 Jun 2019 10:55:16 -0700
Subject: [Numpy-discussion] Moving forward with value based casting
In-Reply-To: <CAJNV+9s5-hW_6uLtB+xQuwNKTAFsZNhFEkrK7MFgyYr0iNtXsw@mail.gmail.com>
References: <e3704a10539e54a885c279a8bf71ec9c17b5467c.camel@sipsolutions.net>
 <7efa604d071e796e275e89ccab8a0f04bae6eb04.camel@sipsolutions.net>
 <CAHPuU_bhoUQQZicM7i7NQrSSQF1q5mzq1JxGwA1dwLxCvZuosA@mail.gmail.com>
 <CAJNV+9s5-hW_6uLtB+xQuwNKTAFsZNhFEkrK7MFgyYr0iNtXsw@mail.gmail.com>
Message-ID: <1755bced786462f385b006372ca22f9b3fe6cf0f.camel@sipsolutions.net>

On Wed, 2019-06-05 at 21:35 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
> 
> Tricky! It seems a balance between unexpected memory blow-up and
> unexpected wrapping (the latter mostly for integers). 
> 
> Some comments specifically on your message first, then some more
> general related ones. 
> 
> 1. I'm very much against letting `a + b` do anything else than
> `np.add(a, b)`.
> 2. For python values, an argument for casting by value is that a
> python int can be arbitrarily long; the only reasonable course of
> action for those seems to make them float, and once you do that one
> might as well cast to whatever type can hold the value (at least
> approximately).

Just to throw it in, in the long run, instead of trying to find a
minimal dtype (which is a bit random), simply ignoring the value of the
scalar may actually be the better option.

The reason for this would be code like:
```
arr = np.zeros(5, dtype=np.int8)

for i in range(200):
    res = arr + i
    print(res.dtype)  # switches from int8 to int16!
```
Instead, try `np.int8(i)` in the loop, and if it fails raise an error.
Or, if that is a bit nasty ? especially for interactive usage ? we
would go with a warning.

This is nothing we need to decide soon, since I think some of the
complexity will remain (i.e. you still need to know that the scalar is
a floating point number or an integer and change the logic).

Best,

Sebastian


> 3. Not necessarily preferred, but for casting of scalars, one can get
> more consistent behaviour also by extending the casting by value to
> any array that has size=1.
> 
> Overall, just on the narrow question, I'd be quite happy with your
> suggestion of using type information if available, i.e., only cast
> python values to a minimal dtype.If one uses numpy types, those
> mostly will have come from previous calculations with the same
> arrays, so things will work as expected. And in most memory-limited
> applications, one would do calculations in-place anyway (or, as Tyler
> noted, for power users one can assume awareness of memory and thus
> the incentive to tell explicitly what dtype is wanted - just
> `np.add(a, b, dtype=...)`, no need to create `out`).
> 
> More generally, I guess what I don't like about the casting rules
> generally is that there is a presumption that if the value can be
> cast, the operation will generally succeed. For `np.add` and
> `np.subtract`, this perhaps is somewhat reasonable (though for
> unsigned a bit more dubious), but for `np.multiply` or `np.power` it
> is much less so. (Indeed, we had a long discussion about what to do
> with `int ** power` - now special-casing negative integer powers.)
> Changing this, however, probably really is a bridge too far!
> 
> Finally, somewhat related: I think the largest confusing actually
> results from the `uint64+in64 -> float64` casting.  Should this cast
> to int64 instead?
> 
> All the best,
> 
> Marten
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190621/c4811844/attachment.sig>

From ben.v.root at gmail.com  Fri Jun 21 14:37:00 2019
From: ben.v.root at gmail.com (Benjamin Root)
Date: Fri, 21 Jun 2019 14:37:00 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
Message-ID: <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>

Just to note, data that is masked isn't always garbage. There are plenty of
use-cases where one may want to temporarily apply a mask for a set of
computation, or possibly want to apply a series of different masks to the
data. I haven't read through this discussion deeply enough, but is this new
class going to destroy underlying masked data? and will it be possible to
swap out masks?

Cheers!
Ben Root


On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane <allanhaldane at gmail.com>
wrote:

> On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
> > Hi Allan,
> >
> > This is very impressive! I could get the tests that I wrote for my class
> > pass with yours using Quantity with what I would consider very minimal
> > changes. I only could not find a good way to unmask data (I like the
> > idea of setting the mask on some elements via `ma[item] = X`); is this
> > on purpose?
>
> Yes, I want to make it difficult for the user to access the garbage
> values under the mask, which are often clobbered values. The only way to
> "remove" a masked value is by replacing it with a new non-masked value.
>
>
> > Anyway, it would seem easily at the point where I should comment on your
> > repository rather than in the mailing list!
>
> To make further progress on this encapsulation idea I need a more
> complete ducktype to pass into MaskedArray to test, so that's what I'll
> work on next, when I have time. I'll either try to finish my
> ArrayCollection type, or try making a simple NDunit ducktype
> piggybacking on astropy's Unit.
>
> Best,
> Allan
>
>
> >
> > All the best,
> >
> > Marten
> >
> >
> > On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane <allanhaldane at gmail.com
> > <mailto:allanhaldane at gmail.com>> wrote:
> >
> >     On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> >     >
> >     >
> >     > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
> >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
> >     > <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>>
> >     wrote:
> >     > <snip>
> >     >
> >     >     > This may be too much to ask from the initializer, but, if
> >     so, it still
> >     >     > seems most useful if it is made as easy as possible to do,
> >     say, `class
> >     >     > MaskedQuantity(Masked, Quantity): <very few overrides>`.
> >     >
> >     >     Currently MaskedArray does not accept ducktypes as underlying
> >     arrays,
> >     >     but I think it shouldn't be too hard to modify it to do so.
> >     Good idea!
> >     >
> >     >
> >     > Looking back at my trial, I see that I also never got to duck
> arrays -
> >     > only ndarray subclasses - though I tried to make the code as
> >     agnostic as
> >     > possible.
> >     >
> >     > (Trial at
> >     >
> >
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
> )
> >     >
> >     >     I already partly navigated this mixin-issue in the
> >     >     "MaskedArrayCollection" class, which essentially does
> >     >     ArrayCollection(MaskedArray(array)), and only takes about 30
> >     lines of
> >     >     boilerplate. That's the backwards encapsulation order from
> >     what you want
> >     >     though.
> >     >
> >     >
> >     > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> >     > mask=[True, False, False])` does indeed not have a `.unit`
> attribute
> >     > (and cannot represent itself...); I'm not at all sure that my
> >     method of
> >     > just creating a mixed class is anything but a recipe for disaster,
> >     though!
> >
> >     Based on your suggestion I worked on this a little today, and now my
> >     MaskedArray more easily encapsulates both ducktypes and ndarray
> >     subclasses (pushed to repo). Here's an example I got working with
> masked
> >     units using unyt:
> >
> >     [1]: from MaskedArray import X, MaskedArray, MaskedScalar
> >
> >     [2]: from unyt import m, km
> >
> >     [3]: import numpy as np
> >
> >     [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
> >
> >     [5]: uarr
> >
> >     MaskedArray([1., X , 3.])
> >     [6]: uarr + 1*m
> >
> >     MaskedArray([1.001, X    , 3.001])
> >     [7]: uarr.filled()
> >
> >     unyt_array([1., 0., 3.], 'km')
> >     [8]: np.concatenate([uarr, 2*uarr]).filled()
> >     unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
> >
> >     The catch is the ducktype/subclass has to rigorously follow numpy's
> >     indexing rules, including distinguishing 0d arrays from scalars. For
> now
> >     only I used unyt in the example above since it happens to be less
> strict
> >      about dimensionless operations than astropy.units which trips up my
> >     repr code. (see below for example with astropy.units). Note in the
> last
> >     line I lost the dimensions, but that is because unyt does not handle
> >     np.concatenate. To get that to work we need a true ducktype for
> units.
> >
> >     The example above doesn't expose the ".units" attribute outside the
> >     MaskedArray, and it doesn't print the units in the repr. But you can
> >     access them using "filled".
> >
> >     While I could make MaskedArray forward unknown attribute accesses to
> the
> >     encapsulated array, that seems a bit dangerous/bug-prone at first
> >     glance, so probably I want to require the user to make a MaskedArray
> >     subclass to do so. I've just started playing with that (probably
> buggy),
> >     and Ive attached subclass examples for astropy.unit and unyt, with
> some
> >     example output below.
> >
> >     Cheers,
> >     Allan
> >
> >
> >
> >     Example using the attached astropy unit subclass:
> >
> >         >>> from astropy.units import m, km, s
> >         >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
> >         >>> uarr
> >         MaskedQ([1., X , 1.], units=km)
> >         >>> uarr.units
> >         km
> >         >>> uarr + (1*m)
> >         MaskedQ([1.001, X    , 1.001], units=km)
> >         >>> uarr/(1*s)
> >         MaskedQ([1., X , 1.], units=km / s)
> >         >>> (uarr*(1*m))[1:]
> >         MaskedQ([X , 1.], units=km m)
> >         >>> np.add.outer(uarr, uarr)
> >         MaskedQ([[2., X , 2.],
> >                  [X , X , X ],
> >                  [2., X , 2.]], units=km)
> >         >>> print(uarr)
> >         [1. X  1.] km m
> >
> >     Cheers,
> >     Allan
> >
> >
> >     >     > Even if this impossible, I think it is conceptually useful
> >     to think
> >     >     > about what the masking class should do. My sense is that,
> >     e.g., it
> >     >     > should not attempt to decide when an operation succeeds or
> not,
> >     >     but just
> >     >     > "or together" input masks for regular, multiple-input
> functions,
> >     >     and let
> >     >     > the underlying arrays skip elements for reductions by using
> >     `where`
> >     >     > (hey, I did implement that for a reason... ;-). In
> >     particular, it
> >     >     > suggests one should not have things like domains and all
> >     that (I never
> >     >     > understood why `MaskedArray` did that). If one wants more,
> >     the class
> >     >     > should provide a method that updates the mask (a sensible
> >     default
> >     >     might
> >     >     > be `mask |= ~np.isfinite(result)` - here, the class being
> masked
> >     >     should
> >     >     > logically support ufuncs and functions, so it can decide what
> >     >     "isfinite"
> >     >     > means).
> >     >
> >     >     I agree it would be nice to remove domains. It would make life
> >     easier,
> >     >     and I could remove a lot of twiddly code! I kept it in for now
> to
> >     >     minimize the behavior changes from the old MaskedArray.
> >     >
> >     >
> >     > That makes sense. Could be separated out to a
> backwards-compatibility
> >     > class later.
> >     >
> >     >
> >     >     > In any case, I would think that a basic truth should be that
> >     >     everything
> >     >     > has a mask with a shape consistent with the data, so
> >     >     > 1. Each complex numbers has just one mask, and setting
> >     `a.imag` with a
> >     >     > masked array should definitely propagate the mask.
> >     >     > 2. For a masked array with structured dtype, I'd similarly
> say
> >     >     that the
> >     >     > default is for a mask to have the same shape as the array.
> >     But that
> >     >     > something like your collection makes sense for the case
> >     where one
> >     >     wants
> >     >     > to mask items in a structure.
> >     >
> >     >     Agreed that we should have a single bool per complex or
> structured
> >     >     element, and the mask shape is the same as the array shape.
> >     That's how I
> >     >     implemented it. But there is still a problem with complex.imag
> >     >     assignment:
> >     >
> >     >         >>> a = MaskedArray([1j, 2, X])
> >     >         >>> i = a.imag
> >     >         >>> i[:] = MaskedArray([1, X, 1])
> >     >
> >     >     If we make the last line copy the mask to the original array,
> what
> >     >     should the real part of a[2] be? Conversely, if we don't copy
> >     the mask,
> >     >     what should the imag part of a[1] be? It seems like we might
> >     "want" the
> >     >     masks to be OR'd instead, but then should i[2] be masked after
> >     we just
> >     >     set it to 1?
> >     >
> >     > Ah, I see the issue now... Easiest to implement and closest in
> analogy
> >     > to a regular view would be to just let it unmask a[2] (with
> >     whatever is
> >     > in real; user beware!).
> >     >
> >     > Perhaps better would be to special-case such that `imag` returns a
> >     > read-only view of the mask. Making `imag` itself read-only would
> >     prevent
> >     > possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but
> >     there is
> >     > no reason this should update the mask.
> >     >
> >     > Still, neither is really satisfactory...
> >     >
> >     >
> >     >
> >     >     > p.s. I started trying to implement the above "Mixin" class;
> will
> >     >     try to
> >     >     > clean that up a bit so that at least it uses `where` and
> >     push it up.
> >     >
> >     >     I played with "where", but didn't include it since 1.17 is not
> >     released.
> >     >     To avoid duplication of effort, I've attached a diff of what I
> >     tried. I
> >     >     actually get a slight slowdown of about 10% by using where...
> >     >
> >     >
> >     > Your implementation is indeed quite similar to what I got in
> >     > __array_ufunc__ (though one should "&" the where with ~mask).
> >     >
> >     > I think the main benefit is not to presume that whatever is
> underneath
> >     > understands 0 or 1, i.e., avoid filling.
> >     >
> >     >
> >     >     If you make progress with the mixin, a push is welcome. I
> >     imagine a
> >     >     problem is going to be that np.isscalar doesn't work to detect
> >     duck
> >     >     scalars.
> >     >
> >     > I fear that in my attempts I've simply decided that only array
> scalars
> >     > exist...
> >     >
> >     > -- Marten
> >     >
> >     > _______________________________________________
> >     > NumPy-Discussion mailing list
> >     > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> >     > https://mail.python.org/mailman/listinfo/numpy-discussion
> >     >
> >
> >     _______________________________________________
> >     NumPy-Discussion mailing list
> >     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> >     https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190621/8cf37f76/attachment-0001.html>

From sebastian at sipsolutions.net  Fri Jun 21 14:37:50 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 21 Jun 2019 11:37:50 -0700
Subject: [Numpy-discussion] Better way to create multiple independent random
 number generators
Message-ID: <e25bc7a7ac0f2af43a42ee713d7eeaacf9f95bac.camel@sipsolutions.net>

Hi all,

since this is going to be a new addition as part of the randomgen, I
thought I would just mention it on the mailing list. The Pull Request:

https://github.com/numpy/numpy/pull/13780

Implements a new SeedSequence object based on Robert Kern's proposal
and especially the work by Prof. O'Neill's which is included in C++.

This new API allows to create many independent random generators/random
number streams. For now this will be exposed by a new object:

```
entropy = {None, int, sequence[int]}  # None or "seed"
seed_seq = np.random.SeedSequence(entropy=entropy)

# Run 100 predictable independend streams:
for spawned_seed_seq in seed_seq.spawn(100):
    run_parallel_task(spawned_seed_seq)


# where `run_parallel_task` will do:
def run_parallel_task(seed_seq):
    # Create a BitGenerator and a Generator [1]
    bit_rng = np.random.PCG64(seed_seq)
    rng = np.random.Generator(bit_rng)
```

The beauty is that `run_parellel_task` can again use `seed_seq.spawn()`
to create another set of independent streams.

I am very happy with this new API. Right now we decided to opt for a
SeedSequence object. In the future we may opt to adding a `.spawn()`
method directly to the Generator or BitGenerator. This is mostly a
heads up, since it is a new set of API, which I believe has never been
mentioned/proposed on the mailing list.

Best,

Sebastian


[1] The new API separates the BitGenerator which creates the random
streams from the Generator which uses the random stream to give sample
distributions of random numbers providing `uniform`, or `normal`, etc.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190621/376bb24e/attachment.sig>

From robert.kern at gmail.com  Fri Jun 21 14:51:35 2019
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 21 Jun 2019 11:51:35 -0700
Subject: [Numpy-discussion] Better way to create multiple independent
 random number generators
In-Reply-To: <e25bc7a7ac0f2af43a42ee713d7eeaacf9f95bac.camel@sipsolutions.net>
References: <e25bc7a7ac0f2af43a42ee713d7eeaacf9f95bac.camel@sipsolutions.net>
Message-ID: <CAF6FJisP3j7ZFB0u_UG3OdObtOeQYfMgphoqUZYTgknPYzkuaA@mail.gmail.com>

On Fri, Jun 21, 2019 at 11:39 AM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> I am very happy with this new API. Right now we decided to opt for a
> SeedSequence object. In the future we may opt to adding a `.spawn()`
> method directly to the Generator or BitGenerator. This is mostly a
> heads up, since it is a new set of API, which I believe has never been
> mentioned/proposed on the mailing list.
>

We talked about the general desire for this kind of API a few years ago:

https://www.mail-archive.com/numpy-discussion at scipy.org/msg50383.html

The strength of the algorithm provided by Prof O'Neill is what enables this
kind of API, which is a sea change in how one can safely build reproducible
parallel stochastic programs and libraries.

-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190621/796231aa/attachment.html>

From allanhaldane at gmail.com  Sat Jun 22 10:54:07 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Sat, 22 Jun 2019 10:54:07 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
Message-ID: <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>

On 6/21/19 2:37 PM, Benjamin Root wrote:
> Just to note, data that is masked isn't always garbage. There are plenty
> of use-cases where one may want to temporarily apply a mask for a set of
> computation, or possibly want to apply a series of different masks to
> the data. I haven't read through this discussion deeply enough, but is
> this new class going to destroy underlying masked data? and will it be
> possible to swap out masks?
> 
> Cheers!
> Ben Root

Indeed my implementation currently feels free to clobber the data at 
masked positions and makes no guarantees not to.

I'd like to try to support reasonable use-cases like yours though. A few 
thoughts:

First, the old np.ma.MaskedArray explicitly does not promise to preserve 
masked values, with a big warning in the docs. I can't recall the 
examples, but I remember coming across cases where clobbering happens. 
So arguably your behavior was never supported, and perhaps this means 
that no-clobber behavior is difficult to reasonably support.

Second, the old np.ma.MaskedArray avoids frequent clobbering by making 
lots of copies. Therefore, in most cases you will not lose any 
performance in my new MaskedArray relative to the old one by making an 
explicit copy yourself. I.e, is it problematic to have to do

     >>> result = MaskedArray(data.copy(), trial_mask).sum()

instead of

     >>> marr.mask = trial_mask
     >>> result = marr.sum()

since they have similar performance?

Third, in the old np.ma.MaskedArray masked positions are very often 
"effectively" clobbered, in the sense that they are not computed. For 
example, if you do "c = a+b", and then change the mask of c, the values 
at masked position of the result of (a+b) do not correspond to the sum 
of the masked values in a and b. Thus, by "unmasking" c you are exposing 
nonsense values, which to me seems likely to cause heisenbugs.


In summary, by not making no-clobber guarantees and by strictly 
preventing exposure of nonsense values, I suspect that: 1. my new code 
is simpler and faster by avoiding lots of copies, and forces copies to 
be explicit in user code. 2. disallowing direct modification of the mask 
lowers the "API surface area" making people's MaskedArray code less 
buggy and easier to read: Exposure of nonsense values by "unmasking" is 
one less possibility to keep in mind.

Best,
Allan


> On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane <allanhaldane at gmail.com
> <mailto:allanhaldane at gmail.com>> wrote:
> 
>     On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
>     > Hi Allan,
>     >
>     > This is very impressive! I could get the tests that I wrote for my
>     class
>     > pass with yours using Quantity with what I would consider very minimal
>     > changes. I only could not find a good way to unmask data (I like the
>     > idea of setting the mask on some elements via `ma[item] = X`); is this
>     > on purpose?
> 
>     Yes, I want to make it difficult for the user to access the garbage
>     values under the mask, which are often clobbered values. The only way to
>     "remove" a masked value is by replacing it with a new non-masked value.
> 
> 
>     > Anyway, it would seem easily at the point where I should comment
>     on your
>     > repository rather than in the mailing list!
> 
>     To make further progress on this encapsulation idea I need a more
>     complete ducktype to pass into MaskedArray to test, so that's what I'll
>     work on next, when I have time. I'll either try to finish my
>     ArrayCollection type, or try making a simple NDunit ducktype
>     piggybacking on astropy's Unit.
> 
>     Best,
>     Allan
> 
> 
>     >
>     > All the best,
>     >
>     > Marten
>     >
>     >
>     > On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane
>     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>     > <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>>
>     wrote:
>     >
>     >? ? ?On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
>     >? ? ?>
>     >? ? ?>
>     >? ? ?> On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
>     >? ? ?<allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>     <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>
>     >? ? ?> <mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com> <mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com>>>>
>     >? ? ?wrote:
>     >? ? ?> <snip>
>     >? ? ?>
>     >? ? ?>? ? ?> This may be too much to ask from the initializer, but, if
>     >? ? ?so, it still
>     >? ? ?>? ? ?> seems most useful if it is made as easy as possible to do,
>     >? ? ?say, `class
>     >? ? ?>? ? ?> MaskedQuantity(Masked, Quantity): <very few overrides>`.
>     >? ? ?>
>     >? ? ?>? ? ?Currently MaskedArray does not accept ducktypes as
>     underlying
>     >? ? ?arrays,
>     >? ? ?>? ? ?but I think it shouldn't be too hard to modify it to do so.
>     >? ? ?Good idea!
>     >? ? ?>
>     >? ? ?>
>     >? ? ?> Looking back at my trial, I see that I also never got to
>     duck arrays -
>     >? ? ?> only ndarray subclasses - though I tried to make the code as
>     >? ? ?agnostic as
>     >? ? ?> possible.
>     >? ? ?>
>     >? ? ?> (Trial at
>     >? ? ?>
>     >   
>     ?https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1)
>     >? ? ?>
>     >? ? ?>? ? ?I already partly navigated this mixin-issue in the
>     >? ? ?>? ? ?"MaskedArrayCollection" class, which essentially does
>     >? ? ?>? ? ?ArrayCollection(MaskedArray(array)), and only takes about 30
>     >? ? ?lines of
>     >? ? ?>? ? ?boilerplate. That's the backwards encapsulation order from
>     >? ? ?what you want
>     >? ? ?>? ? ?though.
>     >? ? ?>
>     >? ? ?>
>     >? ? ?> Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) *
>     u.m,
>     >? ? ?> mask=[True, False, False])` does indeed not have a `.unit`
>     attribute
>     >? ? ?> (and cannot represent itself...); I'm not at all sure that my
>     >? ? ?method of
>     >? ? ?> just creating a mixed class is anything but a recipe for
>     disaster,
>     >? ? ?though!
>     >
>     >? ? ?Based on your suggestion I worked on this a little today, and
>     now my
>     >? ? ?MaskedArray more easily encapsulates both ducktypes and ndarray
>     >? ? ?subclasses (pushed to repo). Here's an example I got working
>     with masked
>     >? ? ?units using unyt:
>     >
>     >? ? ?[1]: from MaskedArray import X, MaskedArray, MaskedScalar
>     >
>     >? ? ?[2]: from unyt import m, km
>     >
>     >? ? ?[3]: import numpy as np
>     >
>     >? ? ?[4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>     >
>     >? ? ?[5]: uarr
>     >
>     >? ? ?MaskedArray([1., X , 3.])
>     >? ? ?[6]: uarr + 1*m
>     >
>     >? ? ?MaskedArray([1.001, X? ? , 3.001])
>     >? ? ?[7]: uarr.filled()
>     >
>     >? ? ?unyt_array([1., 0., 3.], 'km')
>     >? ? ?[8]: np.concatenate([uarr, 2*uarr]).filled()
>     >? ? ?unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>     >
>     >? ? ?The catch is the ducktype/subclass has to rigorously follow
>     numpy's
>     >? ? ?indexing rules, including distinguishing 0d arrays from
>     scalars. For now
>     >? ? ?only I used unyt in the example above since it happens to be
>     less strict
>     >? ? ??about dimensionless operations than astropy.units which trips
>     up my
>     >? ? ?repr code. (see below for example with astropy.units). Note in
>     the last
>     >? ? ?line I lost the dimensions, but that is because unyt does not
>     handle
>     >? ? ?np.concatenate. To get that to work we need a true ducktype
>     for units.
>     >
>     >? ? ?The example above doesn't expose the ".units" attribute
>     outside the
>     >? ? ?MaskedArray, and it doesn't print the units in the repr. But
>     you can
>     >? ? ?access them using "filled".
>     >
>     >? ? ?While I could make MaskedArray forward unknown attribute
>     accesses to the
>     >? ? ?encapsulated array, that seems a bit dangerous/bug-prone at first
>     >? ? ?glance, so probably I want to require the user to make a
>     MaskedArray
>     >? ? ?subclass to do so. I've just started playing with that
>     (probably buggy),
>     >? ? ?and Ive attached subclass examples for astropy.unit and unyt,
>     with some
>     >? ? ?example output below.
>     >
>     >? ? ?Cheers,
>     >? ? ?Allan
>     >
>     >
>     >
>     >? ? ?Example using the attached astropy unit subclass:
>     >
>     >? ? ?? ? >>> from astropy.units import m, km, s
>     >? ? ?? ? >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>     >? ? ?? ? >>> uarr
>     >? ? ?? ? MaskedQ([1., X , 1.], units=km)
>     >? ? ?? ? >>> uarr.units
>     >? ? ?? ? km
>     >? ? ?? ? >>> uarr + (1*m)
>     >? ? ?? ? MaskedQ([1.001, X? ? , 1.001], units=km)
>     >? ? ?? ? >>> uarr/(1*s)
>     >? ? ?? ? MaskedQ([1., X , 1.], units=km / s)
>     >? ? ?? ? >>> (uarr*(1*m))[1:]
>     >? ? ?? ? MaskedQ([X , 1.], units=km m)
>     >? ? ?? ? >>> np.add.outer(uarr, uarr)
>     >? ? ?? ? MaskedQ([[2., X , 2.],
>     >? ? ?? ? ? ? ? ? ?[X , X , X ],
>     >? ? ?? ? ? ? ? ? ?[2., X , 2.]], units=km)
>     >? ? ?? ? >>> print(uarr)
>     >? ? ?? ? [1. X? 1.] km m
>     >
>     >? ? ?Cheers,
>     >? ? ?Allan
>     >
>     >
>     >? ? ?>? ? ?> Even if this impossible, I think it is conceptually useful
>     >? ? ?to think
>     >? ? ?>? ? ?> about what the masking class should do. My sense is that,
>     >? ? ?e.g., it
>     >? ? ?>? ? ?> should not attempt to decide when an operation
>     succeeds or not,
>     >? ? ?>? ? ?but just
>     >? ? ?>? ? ?> "or together" input masks for regular, multiple-input
>     functions,
>     >? ? ?>? ? ?and let
>     >? ? ?>? ? ?> the underlying arrays skip elements for reductions by
>     using
>     >? ? ?`where`
>     >? ? ?>? ? ?> (hey, I did implement that for a reason... ;-). In
>     >? ? ?particular, it
>     >? ? ?>? ? ?> suggests one should not have things like domains and all
>     >? ? ?that (I never
>     >? ? ?>? ? ?> understood why `MaskedArray` did that). If one wants more,
>     >? ? ?the class
>     >? ? ?>? ? ?> should provide a method that updates the mask (a sensible
>     >? ? ?default
>     >? ? ?>? ? ?might
>     >? ? ?>? ? ?> be `mask |= ~np.isfinite(result)` - here, the class
>     being masked
>     >? ? ?>? ? ?should
>     >? ? ?>? ? ?> logically support ufuncs and functions, so it can
>     decide what
>     >? ? ?>? ? ?"isfinite"
>     >? ? ?>? ? ?> means).
>     >? ? ?>
>     >? ? ?>? ? ?I agree it would be nice to remove domains. It would
>     make life
>     >? ? ?easier,
>     >? ? ?>? ? ?and I could remove a lot of twiddly code! I kept it in
>     for now to
>     >? ? ?>? ? ?minimize the behavior changes from the old MaskedArray.
>     >? ? ?>
>     >? ? ?>
>     >? ? ?> That makes sense. Could be separated out to a
>     backwards-compatibility
>     >? ? ?> class later.
>     >? ? ?>
>     >? ? ?>
>     >? ? ?>? ? ?> In any case, I would think that a basic truth should
>     be that
>     >? ? ?>? ? ?everything
>     >? ? ?>? ? ?> has a mask with a shape consistent with the data, so
>     >? ? ?>? ? ?> 1. Each complex numbers has just one mask, and setting
>     >? ? ?`a.imag` with a
>     >? ? ?>? ? ?> masked array should definitely propagate the mask.
>     >? ? ?>? ? ?> 2. For a masked array with structured dtype, I'd
>     similarly say
>     >? ? ?>? ? ?that the
>     >? ? ?>? ? ?> default is for a mask to have the same shape as the array.
>     >? ? ?But that
>     >? ? ?>? ? ?> something like your collection makes sense for the case
>     >? ? ?where one
>     >? ? ?>? ? ?wants
>     >? ? ?>? ? ?> to mask items in a structure.
>     >? ? ?>
>     >? ? ?>? ? ?Agreed that we should have a single bool per complex or
>     structured
>     >? ? ?>? ? ?element, and the mask shape is the same as the array shape.
>     >? ? ?That's how I
>     >? ? ?>? ? ?implemented it. But there is still a problem with
>     complex.imag
>     >? ? ?>? ? ?assignment:
>     >? ? ?>
>     >? ? ?>? ? ?? ? >>> a = MaskedArray([1j, 2, X])
>     >? ? ?>? ? ?? ? >>> i = a.imag
>     >? ? ?>? ? ?? ? >>> i[:] = MaskedArray([1, X, 1])
>     >? ? ?>
>     >? ? ?>? ? ?If we make the last line copy the mask to the original
>     array, what
>     >? ? ?>? ? ?should the real part of a[2] be? Conversely, if we don't
>     copy
>     >? ? ?the mask,
>     >? ? ?>? ? ?what should the imag part of a[1] be? It seems like we might
>     >? ? ?"want" the
>     >? ? ?>? ? ?masks to be OR'd instead, but then should i[2] be masked
>     after
>     >? ? ?we just
>     >? ? ?>? ? ?set it to 1?
>     >? ? ?>
>     >? ? ?> Ah, I see the issue now... Easiest to implement and closest
>     in analogy
>     >? ? ?> to a regular view would be to just let it unmask a[2] (with
>     >? ? ?whatever is
>     >? ? ?> in real; user beware!).
>     >? ? ?>
>     >? ? ?> Perhaps better would be to special-case such that `imag`
>     returns a
>     >? ? ?> read-only view of the mask. Making `imag` itself read-only would
>     >? ? ?prevent
>     >? ? ?> possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but
>     >? ? ?there is
>     >? ? ?> no reason this should update the mask.
>     >? ? ?>
>     >? ? ?> Still, neither is really satisfactory...
>     >? ? ?>  
>     >? ? ?>
>     >? ? ?>
>     >? ? ?>? ? ?> p.s. I started trying to implement the above "Mixin"
>     class; will
>     >? ? ?>? ? ?try to
>     >? ? ?>? ? ?> clean that up a bit so that at least it uses `where` and
>     >? ? ?push it up.
>     >? ? ?>
>     >? ? ?>? ? ?I played with "where", but didn't include it since 1.17
>     is not
>     >? ? ?released.
>     >? ? ?>? ? ?To avoid duplication of effort, I've attached a diff of
>     what I
>     >? ? ?tried. I
>     >? ? ?>? ? ?actually get a slight slowdown of about 10% by using
>     where...
>     >? ? ?>
>     >? ? ?>
>     >? ? ?> Your implementation is indeed quite similar to what I got in
>     >? ? ?> __array_ufunc__ (though one should "&" the where with ~mask).
>     >? ? ?>
>     >? ? ?> I think the main benefit is not to presume that whatever is
>     underneath
>     >? ? ?> understands 0 or 1, i.e., avoid filling.
>     >? ? ?>
>     >? ? ?>
>     >? ? ?>? ? ?If you make progress with the mixin, a push is welcome. I
>     >? ? ?imagine a
>     >? ? ?>? ? ?problem is going to be that np.isscalar doesn't work to
>     detect
>     >? ? ?duck
>     >? ? ?>? ? ?scalars.
>     >? ? ?>
>     >? ? ?> I fear that in my attempts I've simply decided that only
>     array scalars
>     >? ? ?> exist...
>     >? ? ?>
>     >? ? ?> -- Marten
>     >? ? ?>
>     >? ? ?> _______________________________________________
>     >? ? ?> NumPy-Discussion mailing list
>     >? ? ?> NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>
>     <mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>>
>     >? ? ?> https://mail.python.org/mailman/listinfo/numpy-discussion
>     >? ? ?>
>     >
>     >? ? ?_______________________________________________
>     >? ? ?NumPy-Discussion mailing list
>     >? ? ?NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>
>     <mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>>
>     >? ? ?https://mail.python.org/mailman/listinfo/numpy-discussion
>     >
>     >
>     > _______________________________________________
>     > NumPy-Discussion mailing list
>     > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     > https://mail.python.org/mailman/listinfo/numpy-discussion
>     >
> 
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


From m.h.vankerkwijk at gmail.com  Sat Jun 22 11:50:40 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sat, 22 Jun 2019 11:50:40 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
Message-ID: <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>

Hi Allan,

I'm not sure I would go too much by what the old MaskedArray class did. It
indeed made an effort not to overwrite masked values with a new result,
even to the extend of copying back masked input data elements to the output
data array after an operation. But the fact that this is non-sensical if
the dtype changes (or the units in an operation on quantities) suggests
that this mental model simply does not work.

I think a sensible alternative mental model for the MaskedArray class is
that all it does is forward any operations to the data it holds and
separately propagate a mask, ORing elements together for binary operations,
etc., and explicitly skipping masked elements in reductions (ideally using
`where` to be as agnostic as possible about the underlying data, for which,
e.g., setting masked values to `0` for `np.reduce.add` may or may not be
the right thing to do - what if they are string?).

With this mental picture, the underlying data are always have well-defined
meaning: they have been operated on as if the mask did not exist. There
then is also less reason to try to avoid getting it back to the user.

As a concrete example (maybe Ben has others): in astropy we have a
sigma-clipping average routine, which uses a `MaskedArray` to iteratively
mask items that are too far off from the mean; here, the mask varies each
iteration (an initially masked element can come back into play), but the
data do not.

All the best,

Marten

On Sat, Jun 22, 2019 at 10:54 AM Allan Haldane <allanhaldane at gmail.com>
wrote:

> On 6/21/19 2:37 PM, Benjamin Root wrote:
> > Just to note, data that is masked isn't always garbage. There are plenty
> > of use-cases where one may want to temporarily apply a mask for a set of
> > computation, or possibly want to apply a series of different masks to
> > the data. I haven't read through this discussion deeply enough, but is
> > this new class going to destroy underlying masked data? and will it be
> > possible to swap out masks?
> >
> > Cheers!
> > Ben Root
>
> Indeed my implementation currently feels free to clobber the data at
> masked positions and makes no guarantees not to.
>
> I'd like to try to support reasonable use-cases like yours though. A few
> thoughts:
>
> First, the old np.ma.MaskedArray explicitly does not promise to preserve
> masked values, with a big warning in the docs. I can't recall the
> examples, but I remember coming across cases where clobbering happens.
> So arguably your behavior was never supported, and perhaps this means
> that no-clobber behavior is difficult to reasonably support.
>
> Second, the old np.ma.MaskedArray avoids frequent clobbering by making
> lots of copies. Therefore, in most cases you will not lose any
> performance in my new MaskedArray relative to the old one by making an
> explicit copy yourself. I.e, is it problematic to have to do
>
>      >>> result = MaskedArray(data.copy(), trial_mask).sum()
>
> instead of
>
>      >>> marr.mask = trial_mask
>      >>> result = marr.sum()
>
> since they have similar performance?
>
> Third, in the old np.ma.MaskedArray masked positions are very often
> "effectively" clobbered, in the sense that they are not computed. For
> example, if you do "c = a+b", and then change the mask of c, the values
> at masked position of the result of (a+b) do not correspond to the sum
> of the masked values in a and b. Thus, by "unmasking" c you are exposing
> nonsense values, which to me seems likely to cause heisenbugs.
>
>
> In summary, by not making no-clobber guarantees and by strictly
> preventing exposure of nonsense values, I suspect that: 1. my new code
> is simpler and faster by avoiding lots of copies, and forces copies to
> be explicit in user code. 2. disallowing direct modification of the mask
> lowers the "API surface area" making people's MaskedArray code less
> buggy and easier to read: Exposure of nonsense values by "unmasking" is
> one less possibility to keep in mind.
>
> Best,
> Allan
>
>
> > On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane <allanhaldane at gmail.com
> > <mailto:allanhaldane at gmail.com>> wrote:
> >
> >     On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
> >     > Hi Allan,
> >     >
> >     > This is very impressive! I could get the tests that I wrote for my
> >     class
> >     > pass with yours using Quantity with what I would consider very
> minimal
> >     > changes. I only could not find a good way to unmask data (I like
> the
> >     > idea of setting the mask on some elements via `ma[item] = X`); is
> this
> >     > on purpose?
> >
> >     Yes, I want to make it difficult for the user to access the garbage
> >     values under the mask, which are often clobbered values. The only
> way to
> >     "remove" a masked value is by replacing it with a new non-masked
> value.
> >
> >
> >     > Anyway, it would seem easily at the point where I should comment
> >     on your
> >     > repository rather than in the mailing list!
> >
> >     To make further progress on this encapsulation idea I need a more
> >     complete ducktype to pass into MaskedArray to test, so that's what
> I'll
> >     work on next, when I have time. I'll either try to finish my
> >     ArrayCollection type, or try making a simple NDunit ducktype
> >     piggybacking on astropy's Unit.
> >
> >     Best,
> >     Allan
> >
> >
> >     >
> >     > All the best,
> >     >
> >     > Marten
> >     >
> >     >
> >     > On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane
> >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
> >     > <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>>
> >     wrote:
> >     >
> >     >     On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> >     >     >
> >     >     >
> >     >     > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
> >     >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
> >     <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>
> >     >     > <mailto:allanhaldane at gmail.com
> >     <mailto:allanhaldane at gmail.com> <mailto:allanhaldane at gmail.com
> >     <mailto:allanhaldane at gmail.com>>>>
> >     >     wrote:
> >     >     > <snip>
> >     >     >
> >     >     >     > This may be too much to ask from the initializer, but,
> if
> >     >     so, it still
> >     >     >     > seems most useful if it is made as easy as possible to
> do,
> >     >     say, `class
> >     >     >     > MaskedQuantity(Masked, Quantity): <very few
> overrides>`.
> >     >     >
> >     >     >     Currently MaskedArray does not accept ducktypes as
> >     underlying
> >     >     arrays,
> >     >     >     but I think it shouldn't be too hard to modify it to do
> so.
> >     >     Good idea!
> >     >     >
> >     >     >
> >     >     > Looking back at my trial, I see that I also never got to
> >     duck arrays -
> >     >     > only ndarray subclasses - though I tried to make the code as
> >     >     agnostic as
> >     >     > possible.
> >     >     >
> >     >     > (Trial at
> >     >     >
> >     >
> >
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
> )
> >     >     >
> >     >     >     I already partly navigated this mixin-issue in the
> >     >     >     "MaskedArrayCollection" class, which essentially does
> >     >     >     ArrayCollection(MaskedArray(array)), and only takes
> about 30
> >     >     lines of
> >     >     >     boilerplate. That's the backwards encapsulation order
> from
> >     >     what you want
> >     >     >     though.
> >     >     >
> >     >     >
> >     >     > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) *
> >     u.m,
> >     >     > mask=[True, False, False])` does indeed not have a `.unit`
> >     attribute
> >     >     > (and cannot represent itself...); I'm not at all sure that my
> >     >     method of
> >     >     > just creating a mixed class is anything but a recipe for
> >     disaster,
> >     >     though!
> >     >
> >     >     Based on your suggestion I worked on this a little today, and
> >     now my
> >     >     MaskedArray more easily encapsulates both ducktypes and ndarray
> >     >     subclasses (pushed to repo). Here's an example I got working
> >     with masked
> >     >     units using unyt:
> >     >
> >     >     [1]: from MaskedArray import X, MaskedArray, MaskedScalar
> >     >
> >     >     [2]: from unyt import m, km
> >     >
> >     >     [3]: import numpy as np
> >     >
> >     >     [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
> >     >
> >     >     [5]: uarr
> >     >
> >     >     MaskedArray([1., X , 3.])
> >     >     [6]: uarr + 1*m
> >     >
> >     >     MaskedArray([1.001, X    , 3.001])
> >     >     [7]: uarr.filled()
> >     >
> >     >     unyt_array([1., 0., 3.], 'km')
> >     >     [8]: np.concatenate([uarr, 2*uarr]).filled()
> >     >     unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
> >     >
> >     >     The catch is the ducktype/subclass has to rigorously follow
> >     numpy's
> >     >     indexing rules, including distinguishing 0d arrays from
> >     scalars. For now
> >     >     only I used unyt in the example above since it happens to be
> >     less strict
> >     >      about dimensionless operations than astropy.units which trips
> >     up my
> >     >     repr code. (see below for example with astropy.units). Note in
> >     the last
> >     >     line I lost the dimensions, but that is because unyt does not
> >     handle
> >     >     np.concatenate. To get that to work we need a true ducktype
> >     for units.
> >     >
> >     >     The example above doesn't expose the ".units" attribute
> >     outside the
> >     >     MaskedArray, and it doesn't print the units in the repr. But
> >     you can
> >     >     access them using "filled".
> >     >
> >     >     While I could make MaskedArray forward unknown attribute
> >     accesses to the
> >     >     encapsulated array, that seems a bit dangerous/bug-prone at
> first
> >     >     glance, so probably I want to require the user to make a
> >     MaskedArray
> >     >     subclass to do so. I've just started playing with that
> >     (probably buggy),
> >     >     and Ive attached subclass examples for astropy.unit and unyt,
> >     with some
> >     >     example output below.
> >     >
> >     >     Cheers,
> >     >     Allan
> >     >
> >     >
> >     >
> >     >     Example using the attached astropy unit subclass:
> >     >
> >     >         >>> from astropy.units import m, km, s
> >     >         >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
> >     >         >>> uarr
> >     >         MaskedQ([1., X , 1.], units=km)
> >     >         >>> uarr.units
> >     >         km
> >     >         >>> uarr + (1*m)
> >     >         MaskedQ([1.001, X    , 1.001], units=km)
> >     >         >>> uarr/(1*s)
> >     >         MaskedQ([1., X , 1.], units=km / s)
> >     >         >>> (uarr*(1*m))[1:]
> >     >         MaskedQ([X , 1.], units=km m)
> >     >         >>> np.add.outer(uarr, uarr)
> >     >         MaskedQ([[2., X , 2.],
> >     >                  [X , X , X ],
> >     >                  [2., X , 2.]], units=km)
> >     >         >>> print(uarr)
> >     >         [1. X  1.] km m
> >     >
> >     >     Cheers,
> >     >     Allan
> >     >
> >     >
> >     >     >     > Even if this impossible, I think it is conceptually
> useful
> >     >     to think
> >     >     >     > about what the masking class should do. My sense is
> that,
> >     >     e.g., it
> >     >     >     > should not attempt to decide when an operation
> >     succeeds or not,
> >     >     >     but just
> >     >     >     > "or together" input masks for regular, multiple-input
> >     functions,
> >     >     >     and let
> >     >     >     > the underlying arrays skip elements for reductions by
> >     using
> >     >     `where`
> >     >     >     > (hey, I did implement that for a reason... ;-). In
> >     >     particular, it
> >     >     >     > suggests one should not have things like domains and
> all
> >     >     that (I never
> >     >     >     > understood why `MaskedArray` did that). If one wants
> more,
> >     >     the class
> >     >     >     > should provide a method that updates the mask (a
> sensible
> >     >     default
> >     >     >     might
> >     >     >     > be `mask |= ~np.isfinite(result)` - here, the class
> >     being masked
> >     >     >     should
> >     >     >     > logically support ufuncs and functions, so it can
> >     decide what
> >     >     >     "isfinite"
> >     >     >     > means).
> >     >     >
> >     >     >     I agree it would be nice to remove domains. It would
> >     make life
> >     >     easier,
> >     >     >     and I could remove a lot of twiddly code! I kept it in
> >     for now to
> >     >     >     minimize the behavior changes from the old MaskedArray.
> >     >     >
> >     >     >
> >     >     > That makes sense. Could be separated out to a
> >     backwards-compatibility
> >     >     > class later.
> >     >     >
> >     >     >
> >     >     >     > In any case, I would think that a basic truth should
> >     be that
> >     >     >     everything
> >     >     >     > has a mask with a shape consistent with the data, so
> >     >     >     > 1. Each complex numbers has just one mask, and setting
> >     >     `a.imag` with a
> >     >     >     > masked array should definitely propagate the mask.
> >     >     >     > 2. For a masked array with structured dtype, I'd
> >     similarly say
> >     >     >     that the
> >     >     >     > default is for a mask to have the same shape as the
> array.
> >     >     But that
> >     >     >     > something like your collection makes sense for the case
> >     >     where one
> >     >     >     wants
> >     >     >     > to mask items in a structure.
> >     >     >
> >     >     >     Agreed that we should have a single bool per complex or
> >     structured
> >     >     >     element, and the mask shape is the same as the array
> shape.
> >     >     That's how I
> >     >     >     implemented it. But there is still a problem with
> >     complex.imag
> >     >     >     assignment:
> >     >     >
> >     >     >         >>> a = MaskedArray([1j, 2, X])
> >     >     >         >>> i = a.imag
> >     >     >         >>> i[:] = MaskedArray([1, X, 1])
> >     >     >
> >     >     >     If we make the last line copy the mask to the original
> >     array, what
> >     >     >     should the real part of a[2] be? Conversely, if we don't
> >     copy
> >     >     the mask,
> >     >     >     what should the imag part of a[1] be? It seems like we
> might
> >     >     "want" the
> >     >     >     masks to be OR'd instead, but then should i[2] be masked
> >     after
> >     >     we just
> >     >     >     set it to 1?
> >     >     >
> >     >     > Ah, I see the issue now... Easiest to implement and closest
> >     in analogy
> >     >     > to a regular view would be to just let it unmask a[2] (with
> >     >     whatever is
> >     >     > in real; user beware!).
> >     >     >
> >     >     > Perhaps better would be to special-case such that `imag`
> >     returns a
> >     >     > read-only view of the mask. Making `imag` itself read-only
> would
> >     >     prevent
> >     >     > possibly reasonable things like `i[np.isclose(i, 0)] = 0` -
> but
> >     >     there is
> >     >     > no reason this should update the mask.
> >     >     >
> >     >     > Still, neither is really satisfactory...
> >     >     >
> >     >     >
> >     >     >
> >     >     >     > p.s. I started trying to implement the above "Mixin"
> >     class; will
> >     >     >     try to
> >     >     >     > clean that up a bit so that at least it uses `where`
> and
> >     >     push it up.
> >     >     >
> >     >     >     I played with "where", but didn't include it since 1.17
> >     is not
> >     >     released.
> >     >     >     To avoid duplication of effort, I've attached a diff of
> >     what I
> >     >     tried. I
> >     >     >     actually get a slight slowdown of about 10% by using
> >     where...
> >     >     >
> >     >     >
> >     >     > Your implementation is indeed quite similar to what I got in
> >     >     > __array_ufunc__ (though one should "&" the where with ~mask).
> >     >     >
> >     >     > I think the main benefit is not to presume that whatever is
> >     underneath
> >     >     > understands 0 or 1, i.e., avoid filling.
> >     >     >
> >     >     >
> >     >     >     If you make progress with the mixin, a push is welcome. I
> >     >     imagine a
> >     >     >     problem is going to be that np.isscalar doesn't work to
> >     detect
> >     >     duck
> >     >     >     scalars.
> >     >     >
> >     >     > I fear that in my attempts I've simply decided that only
> >     array scalars
> >     >     > exist...
> >     >     >
> >     >     > -- Marten
> >     >     >
> >     >     > _______________________________________________
> >     >     > NumPy-Discussion mailing list
> >     >     > NumPy-Discussion at python.org
> >     <mailto:NumPy-Discussion at python.org>
> >     <mailto:NumPy-Discussion at python.org
> >     <mailto:NumPy-Discussion at python.org>>
> >     >     > https://mail.python.org/mailman/listinfo/numpy-discussion
> >     >     >
> >     >
> >     >     _______________________________________________
> >     >     NumPy-Discussion mailing list
> >     >     NumPy-Discussion at python.org
> >     <mailto:NumPy-Discussion at python.org>
> >     <mailto:NumPy-Discussion at python.org
> >     <mailto:NumPy-Discussion at python.org>>
> >     >     https://mail.python.org/mailman/listinfo/numpy-discussion
> >     >
> >     >
> >     > _______________________________________________
> >     > NumPy-Discussion mailing list
> >     > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> >     > https://mail.python.org/mailman/listinfo/numpy-discussion
> >     >
> >
> >     _______________________________________________
> >     NumPy-Discussion mailing list
> >     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> >     https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190622/09795243/attachment-0001.html>

From ben.v.root at gmail.com  Sat Jun 22 21:21:31 2019
From: ben.v.root at gmail.com (Benjamin Root)
Date: Sat, 22 Jun 2019 21:21:31 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
Message-ID: <CANNq6Fn_VbuTbxU_A0uCjnq22tabtGW0uYX+iYqBmQGKgYfPGA@mail.gmail.com>

"""Third, in the old np.ma.MaskedArray masked positions are very often
"effectively" clobbered, in the sense that they are not computed. For
example, if you do "c = a+b", and then change the mask of c"""

My use-cases don't involve changing the mask of "c". It would involve
changing the mask of "a" or "b" after I have calculated "c", so that I
could calculate "d". As a fairly simple example, I frequently work with
satellite data. We have multiple masks, such as water, vegetation, sandy
loam, bare rock, etc. The underlying satellite data in any of these places
isn't bad, they just need to be dealt with differently.  I wouldn't want
the act of applying a mask for a set of calculations on things that aren't
bare rock to mess up my subsequent calculation on things that aren't water.
Right now, I have to handle this explicitly with flattened sparse arrays,
which makes visualization and conception difficult.

Ben Root

On Sat, Jun 22, 2019 at 11:51 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Allan,
>
> I'm not sure I would go too much by what the old MaskedArray class did. It
> indeed made an effort not to overwrite masked values with a new result,
> even to the extend of copying back masked input data elements to the output
> data array after an operation. But the fact that this is non-sensical if
> the dtype changes (or the units in an operation on quantities) suggests
> that this mental model simply does not work.
>
> I think a sensible alternative mental model for the MaskedArray class is
> that all it does is forward any operations to the data it holds and
> separately propagate a mask, ORing elements together for binary operations,
> etc., and explicitly skipping masked elements in reductions (ideally using
> `where` to be as agnostic as possible about the underlying data, for which,
> e.g., setting masked values to `0` for `np.reduce.add` may or may not be
> the right thing to do - what if they are string?).
>
> With this mental picture, the underlying data are always have well-defined
> meaning: they have been operated on as if the mask did not exist. There
> then is also less reason to try to avoid getting it back to the user.
>
> As a concrete example (maybe Ben has others): in astropy we have a
> sigma-clipping average routine, which uses a `MaskedArray` to iteratively
> mask items that are too far off from the mean; here, the mask varies each
> iteration (an initially masked element can come back into play), but the
> data do not.
>
> All the best,
>
> Marten
>
> On Sat, Jun 22, 2019 at 10:54 AM Allan Haldane <allanhaldane at gmail.com>
> wrote:
>
>> On 6/21/19 2:37 PM, Benjamin Root wrote:
>> > Just to note, data that is masked isn't always garbage. There are plenty
>> > of use-cases where one may want to temporarily apply a mask for a set of
>> > computation, or possibly want to apply a series of different masks to
>> > the data. I haven't read through this discussion deeply enough, but is
>> > this new class going to destroy underlying masked data? and will it be
>> > possible to swap out masks?
>> >
>> > Cheers!
>> > Ben Root
>>
>> Indeed my implementation currently feels free to clobber the data at
>> masked positions and makes no guarantees not to.
>>
>> I'd like to try to support reasonable use-cases like yours though. A few
>> thoughts:
>>
>> First, the old np.ma.MaskedArray explicitly does not promise to preserve
>> masked values, with a big warning in the docs. I can't recall the
>> examples, but I remember coming across cases where clobbering happens.
>> So arguably your behavior was never supported, and perhaps this means
>> that no-clobber behavior is difficult to reasonably support.
>>
>> Second, the old np.ma.MaskedArray avoids frequent clobbering by making
>> lots of copies. Therefore, in most cases you will not lose any
>> performance in my new MaskedArray relative to the old one by making an
>> explicit copy yourself. I.e, is it problematic to have to do
>>
>>      >>> result = MaskedArray(data.copy(), trial_mask).sum()
>>
>> instead of
>>
>>      >>> marr.mask = trial_mask
>>      >>> result = marr.sum()
>>
>> since they have similar performance?
>>
>> Third, in the old np.ma.MaskedArray masked positions are very often
>> "effectively" clobbered, in the sense that they are not computed. For
>> example, if you do "c = a+b", and then change the mask of c, the values
>> at masked position of the result of (a+b) do not correspond to the sum
>> of the masked values in a and b. Thus, by "unmasking" c you are exposing
>> nonsense values, which to me seems likely to cause heisenbugs.
>>
>>
>> In summary, by not making no-clobber guarantees and by strictly
>> preventing exposure of nonsense values, I suspect that: 1. my new code
>> is simpler and faster by avoiding lots of copies, and forces copies to
>> be explicit in user code. 2. disallowing direct modification of the mask
>> lowers the "API surface area" making people's MaskedArray code less
>> buggy and easier to read: Exposure of nonsense values by "unmasking" is
>> one less possibility to keep in mind.
>>
>> Best,
>> Allan
>>
>>
>> > On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane <allanhaldane at gmail.com
>> > <mailto:allanhaldane at gmail.com>> wrote:
>> >
>> >     On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
>> >     > Hi Allan,
>> >     >
>> >     > This is very impressive! I could get the tests that I wrote for my
>> >     class
>> >     > pass with yours using Quantity with what I would consider very
>> minimal
>> >     > changes. I only could not find a good way to unmask data (I like
>> the
>> >     > idea of setting the mask on some elements via `ma[item] = X`); is
>> this
>> >     > on purpose?
>> >
>> >     Yes, I want to make it difficult for the user to access the garbage
>> >     values under the mask, which are often clobbered values. The only
>> way to
>> >     "remove" a masked value is by replacing it with a new non-masked
>> value.
>> >
>> >
>> >     > Anyway, it would seem easily at the point where I should comment
>> >     on your
>> >     > repository rather than in the mailing list!
>> >
>> >     To make further progress on this encapsulation idea I need a more
>> >     complete ducktype to pass into MaskedArray to test, so that's what
>> I'll
>> >     work on next, when I have time. I'll either try to finish my
>> >     ArrayCollection type, or try making a simple NDunit ducktype
>> >     piggybacking on astropy's Unit.
>> >
>> >     Best,
>> >     Allan
>> >
>> >
>> >     >
>> >     > All the best,
>> >     >
>> >     > Marten
>> >     >
>> >     >
>> >     > On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane
>> >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>> >     > <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>>
>> >     wrote:
>> >     >
>> >     >     On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
>> >     >     >
>> >     >     >
>> >     >     > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
>> >     >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>> >     <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>
>> >     >     > <mailto:allanhaldane at gmail.com
>> >     <mailto:allanhaldane at gmail.com> <mailto:allanhaldane at gmail.com
>> >     <mailto:allanhaldane at gmail.com>>>>
>> >     >     wrote:
>> >     >     > <snip>
>> >     >     >
>> >     >     >     > This may be too much to ask from the initializer,
>> but, if
>> >     >     so, it still
>> >     >     >     > seems most useful if it is made as easy as possible
>> to do,
>> >     >     say, `class
>> >     >     >     > MaskedQuantity(Masked, Quantity): <very few
>> overrides>`.
>> >     >     >
>> >     >     >     Currently MaskedArray does not accept ducktypes as
>> >     underlying
>> >     >     arrays,
>> >     >     >     but I think it shouldn't be too hard to modify it to do
>> so.
>> >     >     Good idea!
>> >     >     >
>> >     >     >
>> >     >     > Looking back at my trial, I see that I also never got to
>> >     duck arrays -
>> >     >     > only ndarray subclasses - though I tried to make the code as
>> >     >     agnostic as
>> >     >     > possible.
>> >     >     >
>> >     >     > (Trial at
>> >     >     >
>> >     >
>> >
>> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
>> )
>> >     >     >
>> >     >     >     I already partly navigated this mixin-issue in the
>> >     >     >     "MaskedArrayCollection" class, which essentially does
>> >     >     >     ArrayCollection(MaskedArray(array)), and only takes
>> about 30
>> >     >     lines of
>> >     >     >     boilerplate. That's the backwards encapsulation order
>> from
>> >     >     what you want
>> >     >     >     though.
>> >     >     >
>> >     >     >
>> >     >     > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) *
>> >     u.m,
>> >     >     > mask=[True, False, False])` does indeed not have a `.unit`
>> >     attribute
>> >     >     > (and cannot represent itself...); I'm not at all sure that
>> my
>> >     >     method of
>> >     >     > just creating a mixed class is anything but a recipe for
>> >     disaster,
>> >     >     though!
>> >     >
>> >     >     Based on your suggestion I worked on this a little today, and
>> >     now my
>> >     >     MaskedArray more easily encapsulates both ducktypes and
>> ndarray
>> >     >     subclasses (pushed to repo). Here's an example I got working
>> >     with masked
>> >     >     units using unyt:
>> >     >
>> >     >     [1]: from MaskedArray import X, MaskedArray, MaskedScalar
>> >     >
>> >     >     [2]: from unyt import m, km
>> >     >
>> >     >     [3]: import numpy as np
>> >     >
>> >     >     [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>> >     >
>> >     >     [5]: uarr
>> >     >
>> >     >     MaskedArray([1., X , 3.])
>> >     >     [6]: uarr + 1*m
>> >     >
>> >     >     MaskedArray([1.001, X    , 3.001])
>> >     >     [7]: uarr.filled()
>> >     >
>> >     >     unyt_array([1., 0., 3.], 'km')
>> >     >     [8]: np.concatenate([uarr, 2*uarr]).filled()
>> >     >     unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>> >     >
>> >     >     The catch is the ducktype/subclass has to rigorously follow
>> >     numpy's
>> >     >     indexing rules, including distinguishing 0d arrays from
>> >     scalars. For now
>> >     >     only I used unyt in the example above since it happens to be
>> >     less strict
>> >     >      about dimensionless operations than astropy.units which trips
>> >     up my
>> >     >     repr code. (see below for example with astropy.units). Note in
>> >     the last
>> >     >     line I lost the dimensions, but that is because unyt does not
>> >     handle
>> >     >     np.concatenate. To get that to work we need a true ducktype
>> >     for units.
>> >     >
>> >     >     The example above doesn't expose the ".units" attribute
>> >     outside the
>> >     >     MaskedArray, and it doesn't print the units in the repr. But
>> >     you can
>> >     >     access them using "filled".
>> >     >
>> >     >     While I could make MaskedArray forward unknown attribute
>> >     accesses to the
>> >     >     encapsulated array, that seems a bit dangerous/bug-prone at
>> first
>> >     >     glance, so probably I want to require the user to make a
>> >     MaskedArray
>> >     >     subclass to do so. I've just started playing with that
>> >     (probably buggy),
>> >     >     and Ive attached subclass examples for astropy.unit and unyt,
>> >     with some
>> >     >     example output below.
>> >     >
>> >     >     Cheers,
>> >     >     Allan
>> >     >
>> >     >
>> >     >
>> >     >     Example using the attached astropy unit subclass:
>> >     >
>> >     >         >>> from astropy.units import m, km, s
>> >     >         >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>> >     >         >>> uarr
>> >     >         MaskedQ([1., X , 1.], units=km)
>> >     >         >>> uarr.units
>> >     >         km
>> >     >         >>> uarr + (1*m)
>> >     >         MaskedQ([1.001, X    , 1.001], units=km)
>> >     >         >>> uarr/(1*s)
>> >     >         MaskedQ([1., X , 1.], units=km / s)
>> >     >         >>> (uarr*(1*m))[1:]
>> >     >         MaskedQ([X , 1.], units=km m)
>> >     >         >>> np.add.outer(uarr, uarr)
>> >     >         MaskedQ([[2., X , 2.],
>> >     >                  [X , X , X ],
>> >     >                  [2., X , 2.]], units=km)
>> >     >         >>> print(uarr)
>> >     >         [1. X  1.] km m
>> >     >
>> >     >     Cheers,
>> >     >     Allan
>> >     >
>> >     >
>> >     >     >     > Even if this impossible, I think it is conceptually
>> useful
>> >     >     to think
>> >     >     >     > about what the masking class should do. My sense is
>> that,
>> >     >     e.g., it
>> >     >     >     > should not attempt to decide when an operation
>> >     succeeds or not,
>> >     >     >     but just
>> >     >     >     > "or together" input masks for regular, multiple-input
>> >     functions,
>> >     >     >     and let
>> >     >     >     > the underlying arrays skip elements for reductions by
>> >     using
>> >     >     `where`
>> >     >     >     > (hey, I did implement that for a reason... ;-). In
>> >     >     particular, it
>> >     >     >     > suggests one should not have things like domains and
>> all
>> >     >     that (I never
>> >     >     >     > understood why `MaskedArray` did that). If one wants
>> more,
>> >     >     the class
>> >     >     >     > should provide a method that updates the mask (a
>> sensible
>> >     >     default
>> >     >     >     might
>> >     >     >     > be `mask |= ~np.isfinite(result)` - here, the class
>> >     being masked
>> >     >     >     should
>> >     >     >     > logically support ufuncs and functions, so it can
>> >     decide what
>> >     >     >     "isfinite"
>> >     >     >     > means).
>> >     >     >
>> >     >     >     I agree it would be nice to remove domains. It would
>> >     make life
>> >     >     easier,
>> >     >     >     and I could remove a lot of twiddly code! I kept it in
>> >     for now to
>> >     >     >     minimize the behavior changes from the old MaskedArray.
>> >     >     >
>> >     >     >
>> >     >     > That makes sense. Could be separated out to a
>> >     backwards-compatibility
>> >     >     > class later.
>> >     >     >
>> >     >     >
>> >     >     >     > In any case, I would think that a basic truth should
>> >     be that
>> >     >     >     everything
>> >     >     >     > has a mask with a shape consistent with the data, so
>> >     >     >     > 1. Each complex numbers has just one mask, and setting
>> >     >     `a.imag` with a
>> >     >     >     > masked array should definitely propagate the mask.
>> >     >     >     > 2. For a masked array with structured dtype, I'd
>> >     similarly say
>> >     >     >     that the
>> >     >     >     > default is for a mask to have the same shape as the
>> array.
>> >     >     But that
>> >     >     >     > something like your collection makes sense for the
>> case
>> >     >     where one
>> >     >     >     wants
>> >     >     >     > to mask items in a structure.
>> >     >     >
>> >     >     >     Agreed that we should have a single bool per complex or
>> >     structured
>> >     >     >     element, and the mask shape is the same as the array
>> shape.
>> >     >     That's how I
>> >     >     >     implemented it. But there is still a problem with
>> >     complex.imag
>> >     >     >     assignment:
>> >     >     >
>> >     >     >         >>> a = MaskedArray([1j, 2, X])
>> >     >     >         >>> i = a.imag
>> >     >     >         >>> i[:] = MaskedArray([1, X, 1])
>> >     >     >
>> >     >     >     If we make the last line copy the mask to the original
>> >     array, what
>> >     >     >     should the real part of a[2] be? Conversely, if we don't
>> >     copy
>> >     >     the mask,
>> >     >     >     what should the imag part of a[1] be? It seems like we
>> might
>> >     >     "want" the
>> >     >     >     masks to be OR'd instead, but then should i[2] be masked
>> >     after
>> >     >     we just
>> >     >     >     set it to 1?
>> >     >     >
>> >     >     > Ah, I see the issue now... Easiest to implement and closest
>> >     in analogy
>> >     >     > to a regular view would be to just let it unmask a[2] (with
>> >     >     whatever is
>> >     >     > in real; user beware!).
>> >     >     >
>> >     >     > Perhaps better would be to special-case such that `imag`
>> >     returns a
>> >     >     > read-only view of the mask. Making `imag` itself read-only
>> would
>> >     >     prevent
>> >     >     > possibly reasonable things like `i[np.isclose(i, 0)] = 0` -
>> but
>> >     >     there is
>> >     >     > no reason this should update the mask.
>> >     >     >
>> >     >     > Still, neither is really satisfactory...
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     >     > p.s. I started trying to implement the above "Mixin"
>> >     class; will
>> >     >     >     try to
>> >     >     >     > clean that up a bit so that at least it uses `where`
>> and
>> >     >     push it up.
>> >     >     >
>> >     >     >     I played with "where", but didn't include it since 1.17
>> >     is not
>> >     >     released.
>> >     >     >     To avoid duplication of effort, I've attached a diff of
>> >     what I
>> >     >     tried. I
>> >     >     >     actually get a slight slowdown of about 10% by using
>> >     where...
>> >     >     >
>> >     >     >
>> >     >     > Your implementation is indeed quite similar to what I got in
>> >     >     > __array_ufunc__ (though one should "&" the where with
>> ~mask).
>> >     >     >
>> >     >     > I think the main benefit is not to presume that whatever is
>> >     underneath
>> >     >     > understands 0 or 1, i.e., avoid filling.
>> >     >     >
>> >     >     >
>> >     >     >     If you make progress with the mixin, a push is welcome.
>> I
>> >     >     imagine a
>> >     >     >     problem is going to be that np.isscalar doesn't work to
>> >     detect
>> >     >     duck
>> >     >     >     scalars.
>> >     >     >
>> >     >     > I fear that in my attempts I've simply decided that only
>> >     array scalars
>> >     >     > exist...
>> >     >     >
>> >     >     > -- Marten
>> >     >     >
>> >     >     > _______________________________________________
>> >     >     > NumPy-Discussion mailing list
>> >     >     > NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>
>> >     <mailto:NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>>
>> >     >     > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >     >     >
>> >     >
>> >     >     _______________________________________________
>> >     >     NumPy-Discussion mailing list
>> >     >     NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>
>> >     <mailto:NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>>
>> >     >     https://mail.python.org/mailman/listinfo/numpy-discussion
>> >     >
>> >     >
>> >     > _______________________________________________
>> >     > NumPy-Discussion mailing list
>> >     > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>> >     > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >     >
>> >
>> >     _______________________________________________
>> >     NumPy-Discussion mailing list
>> >     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>> >     https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190622/933e56fc/attachment-0001.html>

From shoyer at gmail.com  Sun Jun 23 02:29:06 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sun, 23 Jun 2019 09:29:06 +0300
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
Message-ID: <CAEQ_TvcmLyyomXgCnbXxSfLwydp=snbHqumoeeOtXM4wDBYNrw@mail.gmail.com>

On Thu, Jun 20, 2019 at 7:44 PM Allan Haldane <allanhaldane at gmail.com>
wrote:

> On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
> > Hi Allan,
> >
> > This is very impressive! I could get the tests that I wrote for my class
> > pass with yours using Quantity with what I would consider very minimal
> > changes. I only could not find a good way to unmask data (I like the
> > idea of setting the mask on some elements via `ma[item] = X`); is this
> > on purpose?
>
> Yes, I want to make it difficult for the user to access the garbage
> values under the mask, which are often clobbered values. The only way to
> "remove" a masked value is by replacing it with a new non-masked value.
>

I think we should make it possible to access (and even mutate) data under
the mask directly, while noting the lack of any guarantees about what those
values are.

MaskedArray has a minimal and transparent data model, consisting of data
and mask arrays. There are plenty of use cases where it is convenient to
access the underlying arrays directly, e.g., for efficient implementation
of low-level MaskedArray algorithms.

NumPy itself does a similar thing on ndarray by exposing data/strides.
Advanced users who learn the details of the data model find them useful,
and everyone else ignores them.


>
> > Anyway, it would seem easily at the point where I should comment on your
> > repository rather than in the mailing list!
>
> To make further progress on this encapsulation idea I need a more
> complete ducktype to pass into MaskedArray to test, so that's what I'll
> work on next, when I have time. I'll either try to finish my
> ArrayCollection type, or try making a simple NDunit ducktype
> piggybacking on astropy's Unit.
>

dask.array would be another good example to try. I think it already should
support __array_function__ (and if not it should be very easy to add).


> Best,
> Allan
>
>
> >
> > All the best,
> >
> > Marten
> >
> >
> > On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane <allanhaldane at gmail.com
> > <mailto:allanhaldane at gmail.com>> wrote:
> >
> >     On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
> >     >
> >     >
> >     > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
> >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
> >     > <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>>
> >     wrote:
> >     > <snip>
> >     >
> >     >     > This may be too much to ask from the initializer, but, if
> >     so, it still
> >     >     > seems most useful if it is made as easy as possible to do,
> >     say, `class
> >     >     > MaskedQuantity(Masked, Quantity): <very few overrides>`.
> >     >
> >     >     Currently MaskedArray does not accept ducktypes as underlying
> >     arrays,
> >     >     but I think it shouldn't be too hard to modify it to do so.
> >     Good idea!
> >     >
> >     >
> >     > Looking back at my trial, I see that I also never got to duck
> arrays -
> >     > only ndarray subclasses - though I tried to make the code as
> >     agnostic as
> >     > possible.
> >     >
> >     > (Trial at
> >     >
> >
> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
> )
> >     >
> >     >     I already partly navigated this mixin-issue in the
> >     >     "MaskedArrayCollection" class, which essentially does
> >     >     ArrayCollection(MaskedArray(array)), and only takes about 30
> >     lines of
> >     >     boilerplate. That's the backwards encapsulation order from
> >     what you want
> >     >     though.
> >     >
> >     >
> >     > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) * u.m,
> >     > mask=[True, False, False])` does indeed not have a `.unit`
> attribute
> >     > (and cannot represent itself...); I'm not at all sure that my
> >     method of
> >     > just creating a mixed class is anything but a recipe for disaster,
> >     though!
> >
> >     Based on your suggestion I worked on this a little today, and now my
> >     MaskedArray more easily encapsulates both ducktypes and ndarray
> >     subclasses (pushed to repo). Here's an example I got working with
> masked
> >     units using unyt:
> >
> >     [1]: from MaskedArray import X, MaskedArray, MaskedScalar
> >
> >     [2]: from unyt import m, km
> >
> >     [3]: import numpy as np
> >
> >     [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
> >
> >     [5]: uarr
> >
> >     MaskedArray([1., X , 3.])
> >     [6]: uarr + 1*m
> >
> >     MaskedArray([1.001, X    , 3.001])
> >     [7]: uarr.filled()
> >
> >     unyt_array([1., 0., 3.], 'km')
> >     [8]: np.concatenate([uarr, 2*uarr]).filled()
> >     unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
> >
> >     The catch is the ducktype/subclass has to rigorously follow numpy's
> >     indexing rules, including distinguishing 0d arrays from scalars. For
> now
> >     only I used unyt in the example above since it happens to be less
> strict
> >      about dimensionless operations than astropy.units which trips up my
> >     repr code. (see below for example with astropy.units). Note in the
> last
> >     line I lost the dimensions, but that is because unyt does not handle
> >     np.concatenate. To get that to work we need a true ducktype for
> units.
> >
> >     The example above doesn't expose the ".units" attribute outside the
> >     MaskedArray, and it doesn't print the units in the repr. But you can
> >     access them using "filled".
> >
> >     While I could make MaskedArray forward unknown attribute accesses to
> the
> >     encapsulated array, that seems a bit dangerous/bug-prone at first
> >     glance, so probably I want to require the user to make a MaskedArray
> >     subclass to do so. I've just started playing with that (probably
> buggy),
> >     and Ive attached subclass examples for astropy.unit and unyt, with
> some
> >     example output below.
> >
> >     Cheers,
> >     Allan
> >
> >
> >
> >     Example using the attached astropy unit subclass:
> >
> >         >>> from astropy.units import m, km, s
> >         >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
> >         >>> uarr
> >         MaskedQ([1., X , 1.], units=km)
> >         >>> uarr.units
> >         km
> >         >>> uarr + (1*m)
> >         MaskedQ([1.001, X    , 1.001], units=km)
> >         >>> uarr/(1*s)
> >         MaskedQ([1., X , 1.], units=km / s)
> >         >>> (uarr*(1*m))[1:]
> >         MaskedQ([X , 1.], units=km m)
> >         >>> np.add.outer(uarr, uarr)
> >         MaskedQ([[2., X , 2.],
> >                  [X , X , X ],
> >                  [2., X , 2.]], units=km)
> >         >>> print(uarr)
> >         [1. X  1.] km m
> >
> >     Cheers,
> >     Allan
> >
> >
> >     >     > Even if this impossible, I think it is conceptually useful
> >     to think
> >     >     > about what the masking class should do. My sense is that,
> >     e.g., it
> >     >     > should not attempt to decide when an operation succeeds or
> not,
> >     >     but just
> >     >     > "or together" input masks for regular, multiple-input
> functions,
> >     >     and let
> >     >     > the underlying arrays skip elements for reductions by using
> >     `where`
> >     >     > (hey, I did implement that for a reason... ;-). In
> >     particular, it
> >     >     > suggests one should not have things like domains and all
> >     that (I never
> >     >     > understood why `MaskedArray` did that). If one wants more,
> >     the class
> >     >     > should provide a method that updates the mask (a sensible
> >     default
> >     >     might
> >     >     > be `mask |= ~np.isfinite(result)` - here, the class being
> masked
> >     >     should
> >     >     > logically support ufuncs and functions, so it can decide what
> >     >     "isfinite"
> >     >     > means).
> >     >
> >     >     I agree it would be nice to remove domains. It would make life
> >     easier,
> >     >     and I could remove a lot of twiddly code! I kept it in for now
> to
> >     >     minimize the behavior changes from the old MaskedArray.
> >     >
> >     >
> >     > That makes sense. Could be separated out to a
> backwards-compatibility
> >     > class later.
> >     >
> >     >
> >     >     > In any case, I would think that a basic truth should be that
> >     >     everything
> >     >     > has a mask with a shape consistent with the data, so
> >     >     > 1. Each complex numbers has just one mask, and setting
> >     `a.imag` with a
> >     >     > masked array should definitely propagate the mask.
> >     >     > 2. For a masked array with structured dtype, I'd similarly
> say
> >     >     that the
> >     >     > default is for a mask to have the same shape as the array.
> >     But that
> >     >     > something like your collection makes sense for the case
> >     where one
> >     >     wants
> >     >     > to mask items in a structure.
> >     >
> >     >     Agreed that we should have a single bool per complex or
> structured
> >     >     element, and the mask shape is the same as the array shape.
> >     That's how I
> >     >     implemented it. But there is still a problem with complex.imag
> >     >     assignment:
> >     >
> >     >         >>> a = MaskedArray([1j, 2, X])
> >     >         >>> i = a.imag
> >     >         >>> i[:] = MaskedArray([1, X, 1])
> >     >
> >     >     If we make the last line copy the mask to the original array,
> what
> >     >     should the real part of a[2] be? Conversely, if we don't copy
> >     the mask,
> >     >     what should the imag part of a[1] be? It seems like we might
> >     "want" the
> >     >     masks to be OR'd instead, but then should i[2] be masked after
> >     we just
> >     >     set it to 1?
> >     >
> >     > Ah, I see the issue now... Easiest to implement and closest in
> analogy
> >     > to a regular view would be to just let it unmask a[2] (with
> >     whatever is
> >     > in real; user beware!).
> >     >
> >     > Perhaps better would be to special-case such that `imag` returns a
> >     > read-only view of the mask. Making `imag` itself read-only would
> >     prevent
> >     > possibly reasonable things like `i[np.isclose(i, 0)] = 0` - but
> >     there is
> >     > no reason this should update the mask.
> >     >
> >     > Still, neither is really satisfactory...
> >     >
> >     >
> >     >
> >     >     > p.s. I started trying to implement the above "Mixin" class;
> will
> >     >     try to
> >     >     > clean that up a bit so that at least it uses `where` and
> >     push it up.
> >     >
> >     >     I played with "where", but didn't include it since 1.17 is not
> >     released.
> >     >     To avoid duplication of effort, I've attached a diff of what I
> >     tried. I
> >     >     actually get a slight slowdown of about 10% by using where...
> >     >
> >     >
> >     > Your implementation is indeed quite similar to what I got in
> >     > __array_ufunc__ (though one should "&" the where with ~mask).
> >     >
> >     > I think the main benefit is not to presume that whatever is
> underneath
> >     > understands 0 or 1, i.e., avoid filling.
> >     >
> >     >
> >     >     If you make progress with the mixin, a push is welcome. I
> >     imagine a
> >     >     problem is going to be that np.isscalar doesn't work to detect
> >     duck
> >     >     scalars.
> >     >
> >     > I fear that in my attempts I've simply decided that only array
> scalars
> >     > exist...
> >     >
> >     > -- Marten
> >     >
> >     > _______________________________________________
> >     > NumPy-Discussion mailing list
> >     > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> >     > https://mail.python.org/mailman/listinfo/numpy-discussion
> >     >
> >
> >     _______________________________________________
> >     NumPy-Discussion mailing list
> >     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> >     https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/b34c0d15/attachment-0001.html>

From shoyer at gmail.com  Sun Jun 23 03:03:56 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sun, 23 Jun 2019 10:03:56 +0300
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
Message-ID: <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>

On Sat, Jun 22, 2019 at 6:50 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Allan,
>
> I'm not sure I would go too much by what the old MaskedArray class did. It
> indeed made an effort not to overwrite masked values with a new result,
> even to the extend of copying back masked input data elements to the output
> data array after an operation. But the fact that this is non-sensical if
> the dtype changes (or the units in an operation on quantities) suggests
> that this mental model simply does not work.
>
> I think a sensible alternative mental model for the MaskedArray class is
> that all it does is forward any operations to the data it holds and
> separately propagate a mask, ORing elements together for binary operations,
> etc., and explicitly skipping masked elements in reductions (ideally using
> `where` to be as agnostic as possible about the underlying data, for which,
> e.g., setting masked values to `0` for `np.reduce.add` may or may not be
> the right thing to do - what if they are string?).
>

+1, this sounds like the right model to me.

That said, I would still not guarantee values under the mask as part of
NumPy's API. The result of computations under the mask should be considered
an undefined implementation detail, sort of like integer overflow or dict
iteration order pre-Python 3.7. The values may even be entirely arbitrary,
e.g., in cases where the result is preallocated with empty().

I'm less confident about the right way to handle missing elements in
reductions. For example:
- Should median() also skip missing elements, even though there is no
identity element?
- If reductions/aggregations default to skipping missing elements, how is
it be possible to express "NA propagating" versions, which are also useful,
if slightly less common?

We may want to add a standard "skipna" argument on NumPy aggregations,
solely for the benefit of duck arrays (and dtypes with missing values). But
that could also be a source of confusion, especially if skipna=True refers
only "true NA" values, not including NaN, which is used as an alias for NA
in pandas and elsewhere.

With this mental picture, the underlying data are always have well-defined
> meaning: they have been operated on as if the mask did not exist. There
> then is also less reason to try to avoid getting it back to the user.
>
> As a concrete example (maybe Ben has others): in astropy we have a
> sigma-clipping average routine, which uses a `MaskedArray` to iteratively
> mask items that are too far off from the mean; here, the mask varies each
> iteration (an initially masked element can come back into play), but the
> data do not.
>
> All the best,
>
> Marten
>
> On Sat, Jun 22, 2019 at 10:54 AM Allan Haldane <allanhaldane at gmail.com>
> wrote:
>
>> On 6/21/19 2:37 PM, Benjamin Root wrote:
>> > Just to note, data that is masked isn't always garbage. There are plenty
>> > of use-cases where one may want to temporarily apply a mask for a set of
>> > computation, or possibly want to apply a series of different masks to
>> > the data. I haven't read through this discussion deeply enough, but is
>> > this new class going to destroy underlying masked data? and will it be
>> > possible to swap out masks?
>> >
>> > Cheers!
>> > Ben Root
>>
>> Indeed my implementation currently feels free to clobber the data at
>> masked positions and makes no guarantees not to.
>>
>> I'd like to try to support reasonable use-cases like yours though. A few
>> thoughts:
>>
>> First, the old np.ma.MaskedArray explicitly does not promise to preserve
>> masked values, with a big warning in the docs. I can't recall the
>> examples, but I remember coming across cases where clobbering happens.
>> So arguably your behavior was never supported, and perhaps this means
>> that no-clobber behavior is difficult to reasonably support.
>>
>> Second, the old np.ma.MaskedArray avoids frequent clobbering by making
>> lots of copies. Therefore, in most cases you will not lose any
>> performance in my new MaskedArray relative to the old one by making an
>> explicit copy yourself. I.e, is it problematic to have to do
>>
>>      >>> result = MaskedArray(data.copy(), trial_mask).sum()
>>
>> instead of
>>
>>      >>> marr.mask = trial_mask
>>      >>> result = marr.sum()
>>
>> since they have similar performance?
>>
>> Third, in the old np.ma.MaskedArray masked positions are very often
>> "effectively" clobbered, in the sense that they are not computed. For
>> example, if you do "c = a+b", and then change the mask of c, the values
>> at masked position of the result of (a+b) do not correspond to the sum
>> of the masked values in a and b. Thus, by "unmasking" c you are exposing
>> nonsense values, which to me seems likely to cause heisenbugs.
>>
>>
>> In summary, by not making no-clobber guarantees and by strictly
>> preventing exposure of nonsense values, I suspect that: 1. my new code
>> is simpler and faster by avoiding lots of copies, and forces copies to
>> be explicit in user code. 2. disallowing direct modification of the mask
>> lowers the "API surface area" making people's MaskedArray code less
>> buggy and easier to read: Exposure of nonsense values by "unmasking" is
>> one less possibility to keep in mind.
>>
>> Best,
>> Allan
>>
>>
>> > On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane <allanhaldane at gmail.com
>> > <mailto:allanhaldane at gmail.com>> wrote:
>> >
>> >     On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
>> >     > Hi Allan,
>> >     >
>> >     > This is very impressive! I could get the tests that I wrote for my
>> >     class
>> >     > pass with yours using Quantity with what I would consider very
>> minimal
>> >     > changes. I only could not find a good way to unmask data (I like
>> the
>> >     > idea of setting the mask on some elements via `ma[item] = X`); is
>> this
>> >     > on purpose?
>> >
>> >     Yes, I want to make it difficult for the user to access the garbage
>> >     values under the mask, which are often clobbered values. The only
>> way to
>> >     "remove" a masked value is by replacing it with a new non-masked
>> value.
>> >
>> >
>> >     > Anyway, it would seem easily at the point where I should comment
>> >     on your
>> >     > repository rather than in the mailing list!
>> >
>> >     To make further progress on this encapsulation idea I need a more
>> >     complete ducktype to pass into MaskedArray to test, so that's what
>> I'll
>> >     work on next, when I have time. I'll either try to finish my
>> >     ArrayCollection type, or try making a simple NDunit ducktype
>> >     piggybacking on astropy's Unit.
>> >
>> >     Best,
>> >     Allan
>> >
>> >
>> >     >
>> >     > All the best,
>> >     >
>> >     > Marten
>> >     >
>> >     >
>> >     > On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane
>> >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>> >     > <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>>
>> >     wrote:
>> >     >
>> >     >     On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
>> >     >     >
>> >     >     >
>> >     >     > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
>> >     >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>> >     <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>
>> >     >     > <mailto:allanhaldane at gmail.com
>> >     <mailto:allanhaldane at gmail.com> <mailto:allanhaldane at gmail.com
>> >     <mailto:allanhaldane at gmail.com>>>>
>> >     >     wrote:
>> >     >     > <snip>
>> >     >     >
>> >     >     >     > This may be too much to ask from the initializer,
>> but, if
>> >     >     so, it still
>> >     >     >     > seems most useful if it is made as easy as possible
>> to do,
>> >     >     say, `class
>> >     >     >     > MaskedQuantity(Masked, Quantity): <very few
>> overrides>`.
>> >     >     >
>> >     >     >     Currently MaskedArray does not accept ducktypes as
>> >     underlying
>> >     >     arrays,
>> >     >     >     but I think it shouldn't be too hard to modify it to do
>> so.
>> >     >     Good idea!
>> >     >     >
>> >     >     >
>> >     >     > Looking back at my trial, I see that I also never got to
>> >     duck arrays -
>> >     >     > only ndarray subclasses - though I tried to make the code as
>> >     >     agnostic as
>> >     >     > possible.
>> >     >     >
>> >     >     > (Trial at
>> >     >     >
>> >     >
>> >
>> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
>> )
>> >     >     >
>> >     >     >     I already partly navigated this mixin-issue in the
>> >     >     >     "MaskedArrayCollection" class, which essentially does
>> >     >     >     ArrayCollection(MaskedArray(array)), and only takes
>> about 30
>> >     >     lines of
>> >     >     >     boilerplate. That's the backwards encapsulation order
>> from
>> >     >     what you want
>> >     >     >     though.
>> >     >     >
>> >     >     >
>> >     >     > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) *
>> >     u.m,
>> >     >     > mask=[True, False, False])` does indeed not have a `.unit`
>> >     attribute
>> >     >     > (and cannot represent itself...); I'm not at all sure that
>> my
>> >     >     method of
>> >     >     > just creating a mixed class is anything but a recipe for
>> >     disaster,
>> >     >     though!
>> >     >
>> >     >     Based on your suggestion I worked on this a little today, and
>> >     now my
>> >     >     MaskedArray more easily encapsulates both ducktypes and
>> ndarray
>> >     >     subclasses (pushed to repo). Here's an example I got working
>> >     with masked
>> >     >     units using unyt:
>> >     >
>> >     >     [1]: from MaskedArray import X, MaskedArray, MaskedScalar
>> >     >
>> >     >     [2]: from unyt import m, km
>> >     >
>> >     >     [3]: import numpy as np
>> >     >
>> >     >     [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>> >     >
>> >     >     [5]: uarr
>> >     >
>> >     >     MaskedArray([1., X , 3.])
>> >     >     [6]: uarr + 1*m
>> >     >
>> >     >     MaskedArray([1.001, X    , 3.001])
>> >     >     [7]: uarr.filled()
>> >     >
>> >     >     unyt_array([1., 0., 3.], 'km')
>> >     >     [8]: np.concatenate([uarr, 2*uarr]).filled()
>> >     >     unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>> >     >
>> >     >     The catch is the ducktype/subclass has to rigorously follow
>> >     numpy's
>> >     >     indexing rules, including distinguishing 0d arrays from
>> >     scalars. For now
>> >     >     only I used unyt in the example above since it happens to be
>> >     less strict
>> >     >      about dimensionless operations than astropy.units which trips
>> >     up my
>> >     >     repr code. (see below for example with astropy.units). Note in
>> >     the last
>> >     >     line I lost the dimensions, but that is because unyt does not
>> >     handle
>> >     >     np.concatenate. To get that to work we need a true ducktype
>> >     for units.
>> >     >
>> >     >     The example above doesn't expose the ".units" attribute
>> >     outside the
>> >     >     MaskedArray, and it doesn't print the units in the repr. But
>> >     you can
>> >     >     access them using "filled".
>> >     >
>> >     >     While I could make MaskedArray forward unknown attribute
>> >     accesses to the
>> >     >     encapsulated array, that seems a bit dangerous/bug-prone at
>> first
>> >     >     glance, so probably I want to require the user to make a
>> >     MaskedArray
>> >     >     subclass to do so. I've just started playing with that
>> >     (probably buggy),
>> >     >     and Ive attached subclass examples for astropy.unit and unyt,
>> >     with some
>> >     >     example output below.
>> >     >
>> >     >     Cheers,
>> >     >     Allan
>> >     >
>> >     >
>> >     >
>> >     >     Example using the attached astropy unit subclass:
>> >     >
>> >     >         >>> from astropy.units import m, km, s
>> >     >         >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>> >     >         >>> uarr
>> >     >         MaskedQ([1., X , 1.], units=km)
>> >     >         >>> uarr.units
>> >     >         km
>> >     >         >>> uarr + (1*m)
>> >     >         MaskedQ([1.001, X    , 1.001], units=km)
>> >     >         >>> uarr/(1*s)
>> >     >         MaskedQ([1., X , 1.], units=km / s)
>> >     >         >>> (uarr*(1*m))[1:]
>> >     >         MaskedQ([X , 1.], units=km m)
>> >     >         >>> np.add.outer(uarr, uarr)
>> >     >         MaskedQ([[2., X , 2.],
>> >     >                  [X , X , X ],
>> >     >                  [2., X , 2.]], units=km)
>> >     >         >>> print(uarr)
>> >     >         [1. X  1.] km m
>> >     >
>> >     >     Cheers,
>> >     >     Allan
>> >     >
>> >     >
>> >     >     >     > Even if this impossible, I think it is conceptually
>> useful
>> >     >     to think
>> >     >     >     > about what the masking class should do. My sense is
>> that,
>> >     >     e.g., it
>> >     >     >     > should not attempt to decide when an operation
>> >     succeeds or not,
>> >     >     >     but just
>> >     >     >     > "or together" input masks for regular, multiple-input
>> >     functions,
>> >     >     >     and let
>> >     >     >     > the underlying arrays skip elements for reductions by
>> >     using
>> >     >     `where`
>> >     >     >     > (hey, I did implement that for a reason... ;-). In
>> >     >     particular, it
>> >     >     >     > suggests one should not have things like domains and
>> all
>> >     >     that (I never
>> >     >     >     > understood why `MaskedArray` did that). If one wants
>> more,
>> >     >     the class
>> >     >     >     > should provide a method that updates the mask (a
>> sensible
>> >     >     default
>> >     >     >     might
>> >     >     >     > be `mask |= ~np.isfinite(result)` - here, the class
>> >     being masked
>> >     >     >     should
>> >     >     >     > logically support ufuncs and functions, so it can
>> >     decide what
>> >     >     >     "isfinite"
>> >     >     >     > means).
>> >     >     >
>> >     >     >     I agree it would be nice to remove domains. It would
>> >     make life
>> >     >     easier,
>> >     >     >     and I could remove a lot of twiddly code! I kept it in
>> >     for now to
>> >     >     >     minimize the behavior changes from the old MaskedArray.
>> >     >     >
>> >     >     >
>> >     >     > That makes sense. Could be separated out to a
>> >     backwards-compatibility
>> >     >     > class later.
>> >     >     >
>> >     >     >
>> >     >     >     > In any case, I would think that a basic truth should
>> >     be that
>> >     >     >     everything
>> >     >     >     > has a mask with a shape consistent with the data, so
>> >     >     >     > 1. Each complex numbers has just one mask, and setting
>> >     >     `a.imag` with a
>> >     >     >     > masked array should definitely propagate the mask.
>> >     >     >     > 2. For a masked array with structured dtype, I'd
>> >     similarly say
>> >     >     >     that the
>> >     >     >     > default is for a mask to have the same shape as the
>> array.
>> >     >     But that
>> >     >     >     > something like your collection makes sense for the
>> case
>> >     >     where one
>> >     >     >     wants
>> >     >     >     > to mask items in a structure.
>> >     >     >
>> >     >     >     Agreed that we should have a single bool per complex or
>> >     structured
>> >     >     >     element, and the mask shape is the same as the array
>> shape.
>> >     >     That's how I
>> >     >     >     implemented it. But there is still a problem with
>> >     complex.imag
>> >     >     >     assignment:
>> >     >     >
>> >     >     >         >>> a = MaskedArray([1j, 2, X])
>> >     >     >         >>> i = a.imag
>> >     >     >         >>> i[:] = MaskedArray([1, X, 1])
>> >     >     >
>> >     >     >     If we make the last line copy the mask to the original
>> >     array, what
>> >     >     >     should the real part of a[2] be? Conversely, if we don't
>> >     copy
>> >     >     the mask,
>> >     >     >     what should the imag part of a[1] be? It seems like we
>> might
>> >     >     "want" the
>> >     >     >     masks to be OR'd instead, but then should i[2] be masked
>> >     after
>> >     >     we just
>> >     >     >     set it to 1?
>> >     >     >
>> >     >     > Ah, I see the issue now... Easiest to implement and closest
>> >     in analogy
>> >     >     > to a regular view would be to just let it unmask a[2] (with
>> >     >     whatever is
>> >     >     > in real; user beware!).
>> >     >     >
>> >     >     > Perhaps better would be to special-case such that `imag`
>> >     returns a
>> >     >     > read-only view of the mask. Making `imag` itself read-only
>> would
>> >     >     prevent
>> >     >     > possibly reasonable things like `i[np.isclose(i, 0)] = 0` -
>> but
>> >     >     there is
>> >     >     > no reason this should update the mask.
>> >     >     >
>> >     >     > Still, neither is really satisfactory...
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     >     > p.s. I started trying to implement the above "Mixin"
>> >     class; will
>> >     >     >     try to
>> >     >     >     > clean that up a bit so that at least it uses `where`
>> and
>> >     >     push it up.
>> >     >     >
>> >     >     >     I played with "where", but didn't include it since 1.17
>> >     is not
>> >     >     released.
>> >     >     >     To avoid duplication of effort, I've attached a diff of
>> >     what I
>> >     >     tried. I
>> >     >     >     actually get a slight slowdown of about 10% by using
>> >     where...
>> >     >     >
>> >     >     >
>> >     >     > Your implementation is indeed quite similar to what I got in
>> >     >     > __array_ufunc__ (though one should "&" the where with
>> ~mask).
>> >     >     >
>> >     >     > I think the main benefit is not to presume that whatever is
>> >     underneath
>> >     >     > understands 0 or 1, i.e., avoid filling.
>> >     >     >
>> >     >     >
>> >     >     >     If you make progress with the mixin, a push is welcome.
>> I
>> >     >     imagine a
>> >     >     >     problem is going to be that np.isscalar doesn't work to
>> >     detect
>> >     >     duck
>> >     >     >     scalars.
>> >     >     >
>> >     >     > I fear that in my attempts I've simply decided that only
>> >     array scalars
>> >     >     > exist...
>> >     >     >
>> >     >     > -- Marten
>> >     >     >
>> >     >     > _______________________________________________
>> >     >     > NumPy-Discussion mailing list
>> >     >     > NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>
>> >     <mailto:NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>>
>> >     >     > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >     >     >
>> >     >
>> >     >     _______________________________________________
>> >     >     NumPy-Discussion mailing list
>> >     >     NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>
>> >     <mailto:NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>>
>> >     >     https://mail.python.org/mailman/listinfo/numpy-discussion
>> >     >
>> >     >
>> >     > _______________________________________________
>> >     > NumPy-Discussion mailing list
>> >     > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>> >     > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >     >
>> >
>> >     _______________________________________________
>> >     NumPy-Discussion mailing list
>> >     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>> >     https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/aeb248cf/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Sun Jun 23 09:07:29 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 23 Jun 2019 09:07:29 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>
Message-ID: <CAJNV+9t_a0zG4bh986xmSyQdoUGHwLk1PJxqPPVKG=RHuUNDjw@mail.gmail.com>

> I think a sensible alternative mental model for the MaskedArray class is
>> that all it does is forward any operations to the data it holds and
>> separately propagate a mask, ORing elements together for binary operations,
>> etc., and explicitly skipping masked elements in reductions (ideally using
>> `where` to be as agnostic as possible about the underlying data, for which,
>> e.g., setting masked values to `0` for `np.reduce.add` may or may not be
>> the right thing to do - what if they are string?).
>>
>
> +1, this sounds like the right model to me.
>

One small worry is about name clashes - ideally one wants the masked array
to be somewhat of a drop-in for whatever class it is masking (independently
of whether it is an actual subclass of it). In this respect, `.data` is a
pretty terrible name (and, yes, the cause of lots of problems for astropy's
MaskedColumn - not my fault, that one!). In my own trials, thinking that
names that include "mask" are fair game, I've been considering a function
`.unmask(fill_value=None)` which would replace both `.filled(fill_value)`
and `.data` by having the default be not to fill anything (I don't see why
a masked array should carry a fill value along; one might use specific
strings such as 'minmax' for auto-generated cases). If wanted, one could
then add `unmasked = property(unmask)`.

Aside: my sense is to, at first at least, feel as unbound as possible from
the current MaskedArray - one could then use whatever it is to try to
create something that comes close to reproducing it, but only for ease of
transition.


> That said, I would still not guarantee values under the mask as part of
> NumPy's API. The result of computations under the mask should be considered
> an undefined implementation detail, sort of like integer overflow or dict
> iteration order pre-Python 3.7. The values may even be entirely arbitrary,
> e.g., in cases where the result is preallocated with empty().
>

I think that is reasonable. The use cases Ben and I described both are ones
where the array is being used as input for a set of computations which
differ only in their mask. (Admittedly, in both our cases one could just
reinitialize a masked array with the new mask; but I think we share the
mental model of that if I don't operate on the masked array, the data
doesn't change, so I should just be able to change the mask.)


> I'm less confident about the right way to handle missing elements in
> reductions. For example:
> - Should median() also skip missing elements, even though there is no
> identity element?
>

I think so. If for mean(), std(), etc., the number of unmasked elements
comes into play, I don't see why it wouldn't for median().

- If reductions/aggregations default to skipping missing elements, how is
> it be possible to express "NA propagating" versions, which are also useful,
> if slightly less common?
>

I have been playing with using a new `Mask(np.ndarray)` class for the mask,
which does the actual mask propagation (i.e., all single-operand ufuncs
just copy the mask, binary operations do `logical_or` and reductions do
`logical.and.reduce`). This way the `Masked` class itself can generally
apply a given operation on the data and the masks separately and then
combine the two results (reductions are the exception in that `where` has
to be set). Your particular example here could be solved with a different
`Mask` class, for which reductions do `logical.or.reduce`.

A larger issue is the accumulations. Personally, I think those are
basically meaningless for masked arrays, as to me logically the result on
the position of any masked item should be masked. But, if so, I do not see
how the ones "beyond" it could not be masked as well. Since here the right
answers seems at least unclear, my sense would be to refuse the temptation
to guess (i.e., the user should just explicitly fill with ufunc.identity if
this is the right thing to do in their case).

I should add that I'm slightly torn about a similar, somewhat related
issue: what should `np.minimum(a, b)` do for the case where either a or b
is masked? Currently, one just treats this as a bin-op, so the result is
masked, but one could argue that this ufunc is a bit like a 2-element
reduction, and thus that the unmasked item should "win by default".
Possibly, the answer should be different between `np.minimum` and `np.fmin`
(since the two differ in how they propagate `NaN` as well - note that you
don't include `fmin` and `fmax` in your coverage).

We may want to add a standard "skipna" argument on NumPy aggregations,
> solely for the benefit of duck arrays (and dtypes with missing values). But
> that could also be a source of confusion, especially if skipna=True refers
> only "true NA" values, not including NaN, which is used as an alias for NA
> in pandas and elsewhere.
>

It does seem `where` should suffice, no? If one wants to be super-fancy, we
could allow it to be a callable, which, if a ufunc, gets used inside the
loop (`where=np.isfinite` would be particularly useful).

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/8debfb7d/attachment.html>

From aldcroft at head.cfa.harvard.edu  Sun Jun 23 09:52:42 2019
From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas)
Date: Sun, 23 Jun 2019 09:52:42 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
Message-ID: <CAMtEP6zt0N4XNYkn9gHHM3QXfArjFRtZTLrd+7ubY4cr8J17xw@mail.gmail.com>

On Sat, Jun 22, 2019 at 11:51 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Allan,
>
> I'm not sure I would go too much by what the old MaskedArray class did. It
> indeed made an effort not to overwrite masked values with a new result,
> even to the extend of copying back masked input data elements to the output
> data array after an operation. But the fact that this is non-sensical if
> the dtype changes (or the units in an operation on quantities) suggests
> that this mental model simply does not work.
>
> I think a sensible alternative mental model for the MaskedArray class is
> that all it does is forward any operations to the data it holds and
> separately propagate a mask,
>

I'm generally on-board with that mental picture, and agree that the
use-case described by Ben (different layers of satellite imagery) is
important.  Same thing happens in astronomy data, e.g. you have a CCD image
of the sky and there are cosmic rays that contaminate the image.  Those are
not garbage data, just pixels that one wants to ignore in some, but not
all, contexts.

However, it's worth noting that one cannot blindly forward any operations
to the data it holds since the operation may be illegal on that data.  The
simplest example is dividing `a / b` where  `b` has data values of 0 but
they are masked.  That operation should succeed with no exception, and here
the resultant value under the mask is genuinely garbage.

The current MaskedArray seems a bit inconsistent in dealing with invalid
calcuations.  Dividing by 0 (if masked) is no problem and returns the
numerator.  Taking the log of a masked 0 gives the usual divide by zero
RuntimeWarning and puts a 1.0 under the mask of the output.

Perhaps the expression should not even be evaluated on elements where the
output mask is True, and all the masked output data values should be set to
a predictable value (e.g. zero for numerical, zero-length string for
string, or maybe a default fill value).  That at least provides consistent
and predictable behavior that is simple to explain.  Otherwise the story is
that the data under the mask *might* be OK, unless for a particular element
the computation was invalid in which case it is filled with some arbitrary
value.  I think that is actually an error-prone behavior that should be
avoided.

- Tom


> ORing elements together for binary operations, etc., and explicitly
> skipping masked elements in reductions (ideally using `where` to be as
> agnostic as possible about the underlying data, for which, e.g., setting
> masked values to `0` for `np.reduce.add` may or may not be the right thing
> to do - what if they are string?).
>
> With this mental picture, the underlying data are always have well-defined
> meaning: they have been operated on as if the mask did not exist. There
> then is also less reason to try to avoid getting it back to the user.
>
> As a concrete example (maybe Ben has others): in astropy we have a
> sigma-clipping average routine, which uses a `MaskedArray` to iteratively
> mask items that are too far off from the mean; here, the mask varies each
> iteration (an initially masked element can come back into play), but the
> data do not.
>
> All the best,
>
> Marten
>
> On Sat, Jun 22, 2019 at 10:54 AM Allan Haldane <allanhaldane at gmail.com>
> wrote:
>
>> On 6/21/19 2:37 PM, Benjamin Root wrote:
>> > Just to note, data that is masked isn't always garbage. There are plenty
>> > of use-cases where one may want to temporarily apply a mask for a set of
>> > computation, or possibly want to apply a series of different masks to
>> > the data. I haven't read through this discussion deeply enough, but is
>> > this new class going to destroy underlying masked data? and will it be
>> > possible to swap out masks?
>> >
>> > Cheers!
>> > Ben Root
>>
>> Indeed my implementation currently feels free to clobber the data at
>> masked positions and makes no guarantees not to.
>>
>> I'd like to try to support reasonable use-cases like yours though. A few
>> thoughts:
>>
>> First, the old np.ma.MaskedArray explicitly does not promise to preserve
>> masked values, with a big warning in the docs. I can't recall the
>> examples, but I remember coming across cases where clobbering happens.
>> So arguably your behavior was never supported, and perhaps this means
>> that no-clobber behavior is difficult to reasonably support.
>>
>> Second, the old np.ma.MaskedArray avoids frequent clobbering by making
>> lots of copies. Therefore, in most cases you will not lose any
>> performance in my new MaskedArray relative to the old one by making an
>> explicit copy yourself. I.e, is it problematic to have to do
>>
>>      >>> result = MaskedArray(data.copy(), trial_mask).sum()
>>
>> instead of
>>
>>      >>> marr.mask = trial_mask
>>      >>> result = marr.sum()
>>
>> since they have similar performance?
>>
>> Third, in the old np.ma.MaskedArray masked positions are very often
>> "effectively" clobbered, in the sense that they are not computed. For
>> example, if you do "c = a+b", and then change the mask of c, the values
>> at masked position of the result of (a+b) do not correspond to the sum
>> of the masked values in a and b. Thus, by "unmasking" c you are exposing
>> nonsense values, which to me seems likely to cause heisenbugs.
>>
>>
>> In summary, by not making no-clobber guarantees and by strictly
>> preventing exposure of nonsense values, I suspect that: 1. my new code
>> is simpler and faster by avoiding lots of copies, and forces copies to
>> be explicit in user code. 2. disallowing direct modification of the mask
>> lowers the "API surface area" making people's MaskedArray code less
>> buggy and easier to read: Exposure of nonsense values by "unmasking" is
>> one less possibility to keep in mind.
>>
>> Best,
>> Allan
>>
>>
>> > On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane <allanhaldane at gmail.com
>> > <mailto:allanhaldane at gmail.com>> wrote:
>> >
>> >     On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
>> >     > Hi Allan,
>> >     >
>> >     > This is very impressive! I could get the tests that I wrote for my
>> >     class
>> >     > pass with yours using Quantity with what I would consider very
>> minimal
>> >     > changes. I only could not find a good way to unmask data (I like
>> the
>> >     > idea of setting the mask on some elements via `ma[item] = X`); is
>> this
>> >     > on purpose?
>> >
>> >     Yes, I want to make it difficult for the user to access the garbage
>> >     values under the mask, which are often clobbered values. The only
>> way to
>> >     "remove" a masked value is by replacing it with a new non-masked
>> value.
>> >
>> >
>> >     > Anyway, it would seem easily at the point where I should comment
>> >     on your
>> >     > repository rather than in the mailing list!
>> >
>> >     To make further progress on this encapsulation idea I need a more
>> >     complete ducktype to pass into MaskedArray to test, so that's what
>> I'll
>> >     work on next, when I have time. I'll either try to finish my
>> >     ArrayCollection type, or try making a simple NDunit ducktype
>> >     piggybacking on astropy's Unit.
>> >
>> >     Best,
>> >     Allan
>> >
>> >
>> >     >
>> >     > All the best,
>> >     >
>> >     > Marten
>> >     >
>> >     >
>> >     > On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane
>> >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>> >     > <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>>
>> >     wrote:
>> >     >
>> >     >     On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
>> >     >     >
>> >     >     >
>> >     >     > On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
>> >     >     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>> >     <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>
>> >     >     > <mailto:allanhaldane at gmail.com
>> >     <mailto:allanhaldane at gmail.com> <mailto:allanhaldane at gmail.com
>> >     <mailto:allanhaldane at gmail.com>>>>
>> >     >     wrote:
>> >     >     > <snip>
>> >     >     >
>> >     >     >     > This may be too much to ask from the initializer,
>> but, if
>> >     >     so, it still
>> >     >     >     > seems most useful if it is made as easy as possible
>> to do,
>> >     >     say, `class
>> >     >     >     > MaskedQuantity(Masked, Quantity): <very few
>> overrides>`.
>> >     >     >
>> >     >     >     Currently MaskedArray does not accept ducktypes as
>> >     underlying
>> >     >     arrays,
>> >     >     >     but I think it shouldn't be too hard to modify it to do
>> so.
>> >     >     Good idea!
>> >     >     >
>> >     >     >
>> >     >     > Looking back at my trial, I see that I also never got to
>> >     duck arrays -
>> >     >     > only ndarray subclasses - though I tried to make the code as
>> >     >     agnostic as
>> >     >     > possible.
>> >     >     >
>> >     >     > (Trial at
>> >     >     >
>> >     >
>> >
>> https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1
>> )
>> >     >     >
>> >     >     >     I already partly navigated this mixin-issue in the
>> >     >     >     "MaskedArrayCollection" class, which essentially does
>> >     >     >     ArrayCollection(MaskedArray(array)), and only takes
>> about 30
>> >     >     lines of
>> >     >     >     boilerplate. That's the backwards encapsulation order
>> from
>> >     >     what you want
>> >     >     >     though.
>> >     >     >
>> >     >     >
>> >     >     > Yes, indeed, from a quick trial `MaskedArray(np.arange(3.) *
>> >     u.m,
>> >     >     > mask=[True, False, False])` does indeed not have a `.unit`
>> >     attribute
>> >     >     > (and cannot represent itself...); I'm not at all sure that
>> my
>> >     >     method of
>> >     >     > just creating a mixed class is anything but a recipe for
>> >     disaster,
>> >     >     though!
>> >     >
>> >     >     Based on your suggestion I worked on this a little today, and
>> >     now my
>> >     >     MaskedArray more easily encapsulates both ducktypes and
>> ndarray
>> >     >     subclasses (pushed to repo). Here's an example I got working
>> >     with masked
>> >     >     units using unyt:
>> >     >
>> >     >     [1]: from MaskedArray import X, MaskedArray, MaskedScalar
>> >     >
>> >     >     [2]: from unyt import m, km
>> >     >
>> >     >     [3]: import numpy as np
>> >     >
>> >     >     [4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>> >     >
>> >     >     [5]: uarr
>> >     >
>> >     >     MaskedArray([1., X , 3.])
>> >     >     [6]: uarr + 1*m
>> >     >
>> >     >     MaskedArray([1.001, X    , 3.001])
>> >     >     [7]: uarr.filled()
>> >     >
>> >     >     unyt_array([1., 0., 3.], 'km')
>> >     >     [8]: np.concatenate([uarr, 2*uarr]).filled()
>> >     >     unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>> >     >
>> >     >     The catch is the ducktype/subclass has to rigorously follow
>> >     numpy's
>> >     >     indexing rules, including distinguishing 0d arrays from
>> >     scalars. For now
>> >     >     only I used unyt in the example above since it happens to be
>> >     less strict
>> >     >      about dimensionless operations than astropy.units which trips
>> >     up my
>> >     >     repr code. (see below for example with astropy.units). Note in
>> >     the last
>> >     >     line I lost the dimensions, but that is because unyt does not
>> >     handle
>> >     >     np.concatenate. To get that to work we need a true ducktype
>> >     for units.
>> >     >
>> >     >     The example above doesn't expose the ".units" attribute
>> >     outside the
>> >     >     MaskedArray, and it doesn't print the units in the repr. But
>> >     you can
>> >     >     access them using "filled".
>> >     >
>> >     >     While I could make MaskedArray forward unknown attribute
>> >     accesses to the
>> >     >     encapsulated array, that seems a bit dangerous/bug-prone at
>> first
>> >     >     glance, so probably I want to require the user to make a
>> >     MaskedArray
>> >     >     subclass to do so. I've just started playing with that
>> >     (probably buggy),
>> >     >     and Ive attached subclass examples for astropy.unit and unyt,
>> >     with some
>> >     >     example output below.
>> >     >
>> >     >     Cheers,
>> >     >     Allan
>> >     >
>> >     >
>> >     >
>> >     >     Example using the attached astropy unit subclass:
>> >     >
>> >     >         >>> from astropy.units import m, km, s
>> >     >         >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>> >     >         >>> uarr
>> >     >         MaskedQ([1., X , 1.], units=km)
>> >     >         >>> uarr.units
>> >     >         km
>> >     >         >>> uarr + (1*m)
>> >     >         MaskedQ([1.001, X    , 1.001], units=km)
>> >     >         >>> uarr/(1*s)
>> >     >         MaskedQ([1., X , 1.], units=km / s)
>> >     >         >>> (uarr*(1*m))[1:]
>> >     >         MaskedQ([X , 1.], units=km m)
>> >     >         >>> np.add.outer(uarr, uarr)
>> >     >         MaskedQ([[2., X , 2.],
>> >     >                  [X , X , X ],
>> >     >                  [2., X , 2.]], units=km)
>> >     >         >>> print(uarr)
>> >     >         [1. X  1.] km m
>> >     >
>> >     >     Cheers,
>> >     >     Allan
>> >     >
>> >     >
>> >     >     >     > Even if this impossible, I think it is conceptually
>> useful
>> >     >     to think
>> >     >     >     > about what the masking class should do. My sense is
>> that,
>> >     >     e.g., it
>> >     >     >     > should not attempt to decide when an operation
>> >     succeeds or not,
>> >     >     >     but just
>> >     >     >     > "or together" input masks for regular, multiple-input
>> >     functions,
>> >     >     >     and let
>> >     >     >     > the underlying arrays skip elements for reductions by
>> >     using
>> >     >     `where`
>> >     >     >     > (hey, I did implement that for a reason... ;-). In
>> >     >     particular, it
>> >     >     >     > suggests one should not have things like domains and
>> all
>> >     >     that (I never
>> >     >     >     > understood why `MaskedArray` did that). If one wants
>> more,
>> >     >     the class
>> >     >     >     > should provide a method that updates the mask (a
>> sensible
>> >     >     default
>> >     >     >     might
>> >     >     >     > be `mask |= ~np.isfinite(result)` - here, the class
>> >     being masked
>> >     >     >     should
>> >     >     >     > logically support ufuncs and functions, so it can
>> >     decide what
>> >     >     >     "isfinite"
>> >     >     >     > means).
>> >     >     >
>> >     >     >     I agree it would be nice to remove domains. It would
>> >     make life
>> >     >     easier,
>> >     >     >     and I could remove a lot of twiddly code! I kept it in
>> >     for now to
>> >     >     >     minimize the behavior changes from the old MaskedArray.
>> >     >     >
>> >     >     >
>> >     >     > That makes sense. Could be separated out to a
>> >     backwards-compatibility
>> >     >     > class later.
>> >     >     >
>> >     >     >
>> >     >     >     > In any case, I would think that a basic truth should
>> >     be that
>> >     >     >     everything
>> >     >     >     > has a mask with a shape consistent with the data, so
>> >     >     >     > 1. Each complex numbers has just one mask, and setting
>> >     >     `a.imag` with a
>> >     >     >     > masked array should definitely propagate the mask.
>> >     >     >     > 2. For a masked array with structured dtype, I'd
>> >     similarly say
>> >     >     >     that the
>> >     >     >     > default is for a mask to have the same shape as the
>> array.
>> >     >     But that
>> >     >     >     > something like your collection makes sense for the
>> case
>> >     >     where one
>> >     >     >     wants
>> >     >     >     > to mask items in a structure.
>> >     >     >
>> >     >     >     Agreed that we should have a single bool per complex or
>> >     structured
>> >     >     >     element, and the mask shape is the same as the array
>> shape.
>> >     >     That's how I
>> >     >     >     implemented it. But there is still a problem with
>> >     complex.imag
>> >     >     >     assignment:
>> >     >     >
>> >     >     >         >>> a = MaskedArray([1j, 2, X])
>> >     >     >         >>> i = a.imag
>> >     >     >         >>> i[:] = MaskedArray([1, X, 1])
>> >     >     >
>> >     >     >     If we make the last line copy the mask to the original
>> >     array, what
>> >     >     >     should the real part of a[2] be? Conversely, if we don't
>> >     copy
>> >     >     the mask,
>> >     >     >     what should the imag part of a[1] be? It seems like we
>> might
>> >     >     "want" the
>> >     >     >     masks to be OR'd instead, but then should i[2] be masked
>> >     after
>> >     >     we just
>> >     >     >     set it to 1?
>> >     >     >
>> >     >     > Ah, I see the issue now... Easiest to implement and closest
>> >     in analogy
>> >     >     > to a regular view would be to just let it unmask a[2] (with
>> >     >     whatever is
>> >     >     > in real; user beware!).
>> >     >     >
>> >     >     > Perhaps better would be to special-case such that `imag`
>> >     returns a
>> >     >     > read-only view of the mask. Making `imag` itself read-only
>> would
>> >     >     prevent
>> >     >     > possibly reasonable things like `i[np.isclose(i, 0)] = 0` -
>> but
>> >     >     there is
>> >     >     > no reason this should update the mask.
>> >     >     >
>> >     >     > Still, neither is really satisfactory...
>> >     >     >
>> >     >     >
>> >     >     >
>> >     >     >     > p.s. I started trying to implement the above "Mixin"
>> >     class; will
>> >     >     >     try to
>> >     >     >     > clean that up a bit so that at least it uses `where`
>> and
>> >     >     push it up.
>> >     >     >
>> >     >     >     I played with "where", but didn't include it since 1.17
>> >     is not
>> >     >     released.
>> >     >     >     To avoid duplication of effort, I've attached a diff of
>> >     what I
>> >     >     tried. I
>> >     >     >     actually get a slight slowdown of about 10% by using
>> >     where...
>> >     >     >
>> >     >     >
>> >     >     > Your implementation is indeed quite similar to what I got in
>> >     >     > __array_ufunc__ (though one should "&" the where with
>> ~mask).
>> >     >     >
>> >     >     > I think the main benefit is not to presume that whatever is
>> >     underneath
>> >     >     > understands 0 or 1, i.e., avoid filling.
>> >     >     >
>> >     >     >
>> >     >     >     If you make progress with the mixin, a push is welcome.
>> I
>> >     >     imagine a
>> >     >     >     problem is going to be that np.isscalar doesn't work to
>> >     detect
>> >     >     duck
>> >     >     >     scalars.
>> >     >     >
>> >     >     > I fear that in my attempts I've simply decided that only
>> >     array scalars
>> >     >     > exist...
>> >     >     >
>> >     >     > -- Marten
>> >     >     >
>> >     >     > _______________________________________________
>> >     >     > NumPy-Discussion mailing list
>> >     >     > NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>
>> >     <mailto:NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>>
>> >     >     > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >     >     >
>> >     >
>> >     >     _______________________________________________
>> >     >     NumPy-Discussion mailing list
>> >     >     NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>
>> >     <mailto:NumPy-Discussion at python.org
>> >     <mailto:NumPy-Discussion at python.org>>
>> >     >     https://mail.python.org/mailman/listinfo/numpy-discussion
>> >     >
>> >     >
>> >     > _______________________________________________
>> >     > NumPy-Discussion mailing list
>> >     > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>> >     > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >     >
>> >
>> >     _______________________________________________
>> >     NumPy-Discussion mailing list
>> >     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>> >     https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/377ae8fa/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Sun Jun 23 11:11:20 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 23 Jun 2019 11:11:20 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAMtEP6zt0N4XNYkn9gHHM3QXfArjFRtZTLrd+7ubY4cr8J17xw@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <CAMtEP6zt0N4XNYkn9gHHM3QXfArjFRtZTLrd+7ubY4cr8J17xw@mail.gmail.com>
Message-ID: <CAJNV+9tgOzGS-b95q1SJwSDyFzLayin_61fVhBn06+T3rLfzjg@mail.gmail.com>

Hi Tom,


I think a sensible alternative mental model for the MaskedArray class is
>> that all it does is forward any operations to the data it holds and
>> separately propagate a mask,
>>
>
> I'm generally on-board with that mental picture, and agree that the
> use-case described by Ben (different layers of satellite imagery) is
> important.  Same thing happens in astronomy data, e.g. you have a CCD image
> of the sky and there are cosmic rays that contaminate the image.  Those are
> not garbage data, just pixels that one wants to ignore in some, but not
> all, contexts.
>
> However, it's worth noting that one cannot blindly forward any operations
> to the data it holds since the operation may be illegal on that data.  The
> simplest example is dividing `a / b` where  `b` has data values of 0 but
> they are masked.  That operation should succeed with no exception, and here
> the resultant value under the mask is genuinely garbage.
>

Even in the present implementation, the operation is just forwarded, with
numpy errstate set to ignore all errors. And then after the fact some
half-hearted remediation is done.


> The current MaskedArray seems a bit inconsistent in dealing with invalid
> calcuations.  Dividing by 0 (if masked) is no problem and returns the
> numerator.  Taking the log of a masked 0 gives the usual divide by zero
> RuntimeWarning and puts a 1.0 under the mask of the output.
>
> Perhaps the expression should not even be evaluated on elements where the
> output mask is True, and all the masked output data values should be set to
> a predictable value (e.g. zero for numerical, zero-length string for
> string, or maybe a default fill value).  That at least provides consistent
> and predictable behavior that is simple to explain.  Otherwise the story is
> that the data under the mask *might* be OK, unless for a particular element
> the computation was invalid in which case it is filled with some arbitrary
> value.  I think that is actually an error-prone behavior that should be
> avoided.
>

I think I agree with Allan here, that after a computation, one generally
simply cannot safely assume anything for masked elements.

But it is reasonable for subclasses to define what they want to do
"post-operation"; e.g., for numerical arrays, it might make generally make
sense to do
```
    notok = ~np.isfinite(result)
    mask |= notok
```
and one could then also do
```
    result[notok] = fill_value
```

But I think one might want to leave that to the user.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/0a23bdcf/attachment.html>

From shoyer at gmail.com  Sun Jun 23 12:05:32 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sun, 23 Jun 2019 19:05:32 +0300
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9t_a0zG4bh986xmSyQdoUGHwLk1PJxqPPVKG=RHuUNDjw@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>
 <CAJNV+9t_a0zG4bh986xmSyQdoUGHwLk1PJxqPPVKG=RHuUNDjw@mail.gmail.com>
Message-ID: <CAEQ_Tvf8TQhnJtgfxae2dmxJAKvYvT-9ZoGiFFE92eNsDdu_CA@mail.gmail.com>

On Sun, Jun 23, 2019 at 4:07 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> - If reductions/aggregations default to skipping missing elements, how is
>> it be possible to express "NA propagating" versions, which are also useful,
>> if slightly less common?
>>
>
> I have been playing with using a new `Mask(np.ndarray)` class for the
> mask, which does the actual mask propagation (i.e., all single-operand
> ufuncs just copy the mask, binary operations do `logical_or` and reductions
> do `logical.and.reduce`). This way the `Masked` class itself can generally
> apply a given operation on the data and the masks separately and then
> combine the two results (reductions are the exception in that `where` has
> to be set). Your particular example here could be solved with a different
> `Mask` class, for which reductions do `logical.or.reduce`.
>

I think it would be much better to use duck-typing for the mask as well, if
possible, rather than a NumPy array subclass. This would facilitate using
alternative mask implementations, e.g., distributed masks, sparse masks,
bit-array masks, etc.

Are there use-cases for propagating masks separately from data? If not, it
might make sense to only define mask operations along with data, which
could be much simpler.


> We may want to add a standard "skipna" argument on NumPy aggregations,
>> solely for the benefit of duck arrays (and dtypes with missing values). But
>> that could also be a source of confusion, especially if skipna=True refers
>> only "true NA" values, not including NaN, which is used as an alias for NA
>> in pandas and elsewhere.
>>
>
> It does seem `where` should suffice, no? If one wants to be super-fancy,
> we could allow it to be a callable, which, if a ufunc, gets used inside the
> loop (`where=np.isfinite` would be particularly useful).
>

Let me try to make the API issue more concrete. Suppose we have a
MaskedArray with values [1, 2, NA]. How do I get:
1. The sum ignoring masked values, i.e., 3.
2. The sum that is tainted by masked values, i.e., NA.

Here's how this works with existing array libraries:
- With base NumPy using NaN as a sentinel value for NA, you can get (1)
with np.sum and (2) with np.nansum.
- With pandas and xarray, the default behavior is (1) and to get (2) you
need to write array.sum(skipna=False).
- With NumPy's current MaskedArray, it appears that you can only get (1).
Maybe there isn't as strong a need for (2) as I thought?

Your proposal would be something like np.sum(array,
where=np.ones_like(array))? This seems rather verbose for a common
operation. Perhaps np.sum(array, where=True) would work, making use of
broadcasting? (I haven't actually checked whether this is well-defined yet.)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/00db954a/attachment.html>

From stewartclelland at gmail.com  Sun Jun 23 15:04:44 2019
From: stewartclelland at gmail.com (Stewart Clelland)
Date: Sun, 23 Jun 2019 15:04:44 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
Message-ID: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>

Hi All,

Based on discussion with Marten on github
<https://github.com/numpy/numpy/issues/13797>, I have a couple of
suggestions on syntax improvements on array transpose operations.

First, introducing a shorthand for the Hermitian Transpose operator. I
thought "A.HT" might be a viable candidate.

Second, the adding an array method that operates like a normal transpose.
To my understanding,
"A.tranpose()" currently inverts the usual order of all dimensions. This
may be useful in some applications involving tensors, but is not what I
would usually assume a transpose on a multi-dimensional array would entail.
I suggest a syntax of "A.MT" to indicate a transpose of the last two
dimensions by default, maybe with optional arguments (i,j) to indicate
which two dimensions to transpose.

I'm new to this mailing list format, hopefully I'm doing this right :)

Thanks,
Stew
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/a9025c5d/attachment.html>

From wieser.eric+numpy at gmail.com  Sun Jun 23 15:24:18 2019
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Sun, 23 Jun 2019 12:24:18 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
Message-ID: <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>

This might be contentious, but I wonder if, with a long enough deprecation
cycle, we can change the meaning of .T. That would look like:

* Emit a future warning on `more_than_2d.T` with a message like "in future
.T will transpose just the last two dimensions, not all dimensions. Use
are.transpose() if transposing all {n} dimensions is deliberate"
* Wait 5 releases or so, see how many matches Google / GitHub has for this
warning.
* If the impact is minimal, change .T
* If the impact is large, change to a deprecation warning

An argument for this approach: a good amount of code I've seen in the wild
already assumes T is a 2d transpose, and as a result does not work
correctly when called with stacks of arrays. Changing T might fix this
broken code automatically.

If the change would be too intrusive, then keeping the deprecation warning
at least prevents new users deliberately using .T for >2d transposes, which
is possibly valuable for readers.

Eric


On Sun, Jun 23, 2019, 12:05 Stewart Clelland <stewartclelland at gmail.com>
wrote:

> Hi All,
>
> Based on discussion with Marten on github
> <https://github.com/numpy/numpy/issues/13797>, I have a couple of
> suggestions on syntax improvements on array transpose operations.
>
> First, introducing a shorthand for the Hermitian Transpose operator. I
> thought "A.HT" might be a viable candidate.
>
> Second, the adding an array method that operates like a normal transpose.
> To my understanding,
> "A.tranpose()" currently inverts the usual order of all dimensions. This
> may be useful in some applications involving tensors, but is not what I
> would usually assume a transpose on a multi-dimensional array would entail.
> I suggest a syntax of "A.MT" to indicate a transpose of the last two
> dimensions by default, maybe with optional arguments (i,j) to indicate
> which two dimensions to transpose.
>
> I'm new to this mailing list format, hopefully I'm doing this right :)
>
> Thanks,
> Stew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/7fb5a986/attachment-0001.html>

From einstein.edison at gmail.com  Sun Jun 23 15:51:49 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Sun, 23 Jun 2019 19:51:49 +0000
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>,
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
Message-ID: <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>

+1 for this. I have often seen (and sometimes written) code that does this automatically, and it is a common mistake.

However, we will need some way to filter for intent, as the people who write this code are the ones who didn?t read docs on it at the time, and so there might be a fair amount of noise even if it fixes their code.

I also agree that a transpose of an array with ndim > 2 doesn?t make sense without specifying the order, at least for the applications I have seen so far.

Get Outlook for iOS<https://aka.ms/o0ukef>

________________________________
From: NumPy-Discussion <numpy-discussion-bounces+einstein.edison=gmail.com at python.org> on behalf of Eric Wieser <wieser.eric+numpy at gmail.com>
Sent: Sunday, June 23, 2019 9:24 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Syntax Improvement for Array Transpose

This might be contentious, but I wonder if, with a long enough deprecation cycle, we can change the meaning of .T. That would look like:

* Emit a future warning on `more_than_2d.T` with a message like "in future .T will transpose just the last two dimensions, not all dimensions. Use are.transpose() if transposing all {n} dimensions is deliberate"
* Wait 5 releases or so, see how many matches Google / GitHub has for this warning.
* If the impact is minimal, change .T
* If the impact is large, change to a deprecation warning

An argument for this approach: a good amount of code I've seen in the wild already assumes T is a 2d transpose, and as a result does not work correctly when called with stacks of arrays. Changing T might fix this broken code automatically.

If the change would be too intrusive, then keeping the deprecation warning at least prevents new users deliberately using .T for >2d transposes, which is possibly valuable for readers.

Eric


On Sun, Jun 23, 2019, 12:05 Stewart Clelland <stewartclelland at gmail.com<mailto:stewartclelland at gmail.com>> wrote:
Hi All,

Based on discussion with Marten on github<https://github.com/numpy/numpy/issues/13797>, I have a couple of suggestions on syntax improvements on array transpose operations.

First, introducing a shorthand for the Hermitian Transpose operator. I thought "A.HT<http://A.HT>" might be a viable candidate.

Second, the adding an array method that operates like a normal transpose. To my understanding,
"A.tranpose()" currently inverts the usual order of all dimensions. This may be useful in some applications involving tensors, but is not what I would usually assume a transpose on a multi-dimensional array would entail. I suggest a syntax of "A.MT<http://A.MT>" to indicate a transpose of the last two dimensions by default, maybe with optional arguments (i,j) to indicate which two dimensions to transpose.

I'm new to this mailing list format, hopefully I'm doing this right :)

Thanks,
Stew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org<mailto:NumPy-Discussion at python.org>
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/8f7cff40/attachment.html>

From sebastian at sipsolutions.net  Sun Jun 23 16:36:48 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 23 Jun 2019 13:36:48 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 , <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
Message-ID: <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>

On Sun, 2019-06-23 at 19:51 +0000, Hameer Abbasi wrote:
> +1 for this. I have often seen (and sometimes written) code that does
> this automatically, and it is a common mistake.

Yeah, likely worth a short. I doubt many uses for the n-dimensional
axis transpose, so maybe a futurewarning approach can work. If not, I
suppose the solution is the deprecation for ndim != 2.

Another point about the `.T` is the 1-dimensional case, which commonly
causes confusion. If we do something here, should think about that as
well.

- Sebastian


> 
> However, we will need some way to filter for intent, as the people
> who write this code are the ones who didn?t read docs on it at the
> time, and so there might be a fair amount of noise even if it fixes
> their code.
> 
> I also agree that a transpose of an array with ndim > 2 doesn?t make
> sense without specifying the order, at least for the applications I
> have seen so far.
> 
> Get Outlook for iOS
>  
> From: NumPy-Discussion <
> numpy-discussion-bounces+einstein.edison=gmail.com at python.org> on
> behalf of Eric Wieser <wieser.eric+numpy at gmail.com>
> Sent: Sunday, June 23, 2019 9:24 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Syntax Improvement for Array
> Transpose
>  
> This might be contentious, but I wonder if, with a long enough
> deprecation cycle, we can change the meaning of .T. That would look
> like:
> 
> * Emit a future warning on `more_than_2d.T` with a message like "in
> future .T will transpose just the last two dimensions, not all
> dimensions. Use are.transpose() if transposing all {n} dimensions is
> deliberate"
> * Wait 5 releases or so, see how many matches Google / GitHub has for
> this warning.
> * If the impact is minimal, change .T
> * If the impact is large, change to a deprecation warning
> 
> An argument for this approach: a good amount of code I've seen in the
> wild already assumes T is a 2d transpose, and as a result does not
> work correctly when called with stacks of arrays. Changing T might
> fix this broken code automatically.
> 
> If the change would be too intrusive, then keeping the deprecation
> warning at least prevents new users deliberately using .T for >2d
> transposes, which is possibly valuable for readers.
> 
> Eric
> 
> 
> On Sun, Jun 23, 2019, 12:05 Stewart Clelland <
> stewartclelland at gmail.com> wrote:
> > Hi All,
> > 
> > Based on discussion with Marten on github, I have a couple of
> > suggestions on syntax improvements on array transpose operations.
> > 
> > First, introducing a shorthand for the Hermitian Transpose
> > operator. I thought "A.HT" might be a viable candidate.
> > 
> > Second, the adding an array method that operates like a normal
> > transpose. To my understanding,
> > "A.tranpose()" currently inverts the usual order of all dimensions.
> > This may be useful in some applications involving tensors, but is
> > not what I would usually assume a transpose on a multi-dimensional
> > array would entail. I suggest a syntax of "A.MT" to indicate a
> > transpose of the last two dimensions by default, maybe with
> > optional arguments (i,j) to indicate which two dimensions to
> > transpose.
> > 
> > I'm new to this mailing list format, hopefully I'm doing this right
> > :)
> > 
> > Thanks,
> > Stew
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/057e3f6f/attachment.sig>

From m.h.vankerkwijk at gmail.com  Sun Jun 23 16:54:07 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 23 Jun 2019 16:54:07 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAEQ_Tvf8TQhnJtgfxae2dmxJAKvYvT-9ZoGiFFE92eNsDdu_CA@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>
 <CAJNV+9t_a0zG4bh986xmSyQdoUGHwLk1PJxqPPVKG=RHuUNDjw@mail.gmail.com>
 <CAEQ_Tvf8TQhnJtgfxae2dmxJAKvYvT-9ZoGiFFE92eNsDdu_CA@mail.gmail.com>
Message-ID: <CAJNV+9vQC4pv_ASGA4RiKjWOBzyr4PhJDrtpygvwbaXcUzQG6g@mail.gmail.com>

Hi Stephan,

In slightly changed order:

Let me try to make the API issue more concrete. Suppose we have a
> MaskedArray with values [1, 2, NA]. How do I get:
> 1. The sum ignoring masked values, i.e., 3.
> 2. The sum that is tainted by masked values, i.e., NA.
>
> Here's how this works with existing array libraries:
> - With base NumPy using NaN as a sentinel value for NA, you can get (1)
> with np.sum and (2) with np.nansum.
> - With pandas and xarray, the default behavior is (1) and to get (2) you
> need to write array.sum(skipna=False).
> - With NumPy's current MaskedArray, it appears that you can only get (1).
> Maybe there isn't as strong a need for (2) as I thought?
>

I think this is all correct.

>
> Your proposal would be something like np.sum(array,
> where=np.ones_like(array))? This seems rather verbose for a common
> operation. Perhaps np.sum(array, where=True) would work, making use of
> broadcasting? (I haven't actually checked whether this is well-defined yet.)
>
> I think we'd need to consider separately the operation on the mask and on
the data. In my proposal, the data would always do `np.sum(array,
where=~mask)`, while how the mask would propagate might depend on the mask
itself, i.e., we'd have different mask types for `skipna=True` (default)
and `False` ("contagious") reductions, which differed in doing
`logical_and.reduce` or `logical_or.reduce` on the mask.

I have been playing with using a new `Mask(np.ndarray)` class for the mask,
>> which does the actual mask propagation (i.e., all single-operand ufuncs
>> just copy the mask, binary operations do `logical_or` and reductions do
>> `logical.and.reduce`). This way the `Masked` class itself can generally
>> apply a given operation on the data and the masks separately and then
>> combine the two results (reductions are the exception in that `where` has
>> to be set). Your particular example here could be solved with a different
>> `Mask` class, for which reductions do `logical.or.reduce`.
>>
>
> I think it would be much better to use duck-typing for the mask as well,
> if possible, rather than a NumPy array subclass. This would facilitate
> using alternative mask implementations, e.g., distributed masks, sparse
> masks, bit-array masks, etc.
>

Implicitly in the above, I agree with having the mask not necessarily be a
plain ndarray, but something that can determine part of the action. Makes
sense to generalize that to duck arrays for the reasons you give. Indeed,
if we let the mask do the mask propagation as well, it might help make the
implementation substantially easier (e.g., `logical_and.reduce` and
`logical_or.reduce` can be super-fast on a bitmask!).


> Are there use-cases for propagating masks separately from data? If not, it
> might make sense to only define mask operations along with data, which
> could be much simpler.
>

I had only thought about separating out the concern of mask propagation
from the "MaskedArray" class to the mask proper, but it might indeed make
things easier if the mask also did any required preparation for passing
things on to the data (such as adjusting the "where" argument in a
reduction). I also like that this way the mask can determine even before
the data what functionality is available (i.e., it could be the place from
which to return `NotImplemented` for a ufunc.at call with a masked index
argument).

It may be good to collect a few more test cases... E.g., I'd like to mask
some of the astropy classes that are only very partial duck arrays, in that
they cover only the shape aspect, and which do have some operators and for
which it would be nice not to feel forced to use __array_ufunc__.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/562239ef/attachment-0001.html>

From deak.andris at gmail.com  Sun Jun 23 17:03:01 2019
From: deak.andris at gmail.com (Andras Deak)
Date: Sun, 23 Jun 2019 23:03:01 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
Message-ID: <CAMEWA4N7NztobsNq67TBtPOpp2ySrKaMLMbH68Eo_mRUkCGM6g@mail.gmail.com>

On Sun, Jun 23, 2019 at 10:37 PM Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> Yeah, likely worth a short. I doubt many uses for the n-dimensional
> axis transpose, so maybe a futurewarning approach can work. If not, I
> suppose the solution is the deprecation for ndim != 2.

Any chance that the n-dimensional transpose is being used in code
interfacing fortran/matlab and python? One thing the current
multidimensional transpose is good for is to switch between row-major
and column-major order. I don't know, however, whether this switch
actually has to be done often in code, in practice.

Andr?s

From m.h.vankerkwijk at gmail.com  Sun Jun 23 17:12:41 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 23 Jun 2019 17:12:41 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
Message-ID: <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>

Hi All,

I'd love to have `.T` mean the right thing, and am happy that people are
suggesting it after I told Steward this was likely off-limits (which, in
fairness, did seem to be the conclusion when we visited this before...).
But is there something we can do to make it possible to use it already but
ensure that code on previous numpy versions breaks? (Or works, but that
seems impossible...)

For instance, in python2, one had `from __future__ import division (etc.);
could we have, e.g., a `from numpy.__future__ import matrix_transpose`,
which, when imported, implied that `.T` just did the right thing without
any warning? (Obviously, since that __future__.matrix_transpose wouldn't
exist on older versions of numpy, it would correctly break the code when
used with those.)

Also, a bit more towards the original request in the PR of a hermitian
transpose, if we're trying to go for `.T` eventually having the obvious
meaning, should we directly move towards also having `.H` as a short-cut
for `.T.conj()`? We could even expose that only with the above future
import - otherwise, the risk of abuse of `.T` would only grow...

Finally, on the meaning of `.T` for 1-D arrays, the sensible choices would
seem to (1) error; or (2) change shape to `(n, 1)`. Since while writing
this sentence I changed my preference twice, I guess I should go for
erroring (I think we need a separate solution for easily making stacks of
row/column vectors).

All the best,

Marten

On Sun, Jun 23, 2019 at 4:37 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Sun, 2019-06-23 at 19:51 +0000, Hameer Abbasi wrote:
> > +1 for this. I have often seen (and sometimes written) code that does
> > this automatically, and it is a common mistake.
>
> Yeah, likely worth a short. I doubt many uses for the n-dimensional
> axis transpose, so maybe a futurewarning approach can work. If not, I
> suppose the solution is the deprecation for ndim != 2.
>
> Another point about the `.T` is the 1-dimensional case, which commonly
> causes confusion. If we do something here, should think about that as
> well.
>
> - Sebastian
>
>
> >
> > However, we will need some way to filter for intent, as the people
> > who write this code are the ones who didn?t read docs on it at the
> > time, and so there might be a fair amount of noise even if it fixes
> > their code.
> >
> > I also agree that a transpose of an array with ndim > 2 doesn?t make
> > sense without specifying the order, at least for the applications I
> > have seen so far.
> >
> > Get Outlook for iOS
> >
> > From: NumPy-Discussion <
> > numpy-discussion-bounces+einstein.edison=gmail.com at python.org> on
> > behalf of Eric Wieser <wieser.eric+numpy at gmail.com>
> > Sent: Sunday, June 23, 2019 9:24 PM
> > To: Discussion of Numerical Python
> > Subject: Re: [Numpy-discussion] Syntax Improvement for Array
> > Transpose
> >
> > This might be contentious, but I wonder if, with a long enough
> > deprecation cycle, we can change the meaning of .T. That would look
> > like:
> >
> > * Emit a future warning on `more_than_2d.T` with a message like "in
> > future .T will transpose just the last two dimensions, not all
> > dimensions. Use are.transpose() if transposing all {n} dimensions is
> > deliberate"
> > * Wait 5 releases or so, see how many matches Google / GitHub has for
> > this warning.
> > * If the impact is minimal, change .T
> > * If the impact is large, change to a deprecation warning
> >
> > An argument for this approach: a good amount of code I've seen in the
> > wild already assumes T is a 2d transpose, and as a result does not
> > work correctly when called with stacks of arrays. Changing T might
> > fix this broken code automatically.
> >
> > If the change would be too intrusive, then keeping the deprecation
> > warning at least prevents new users deliberately using .T for >2d
> > transposes, which is possibly valuable for readers.
> >
> > Eric
> >
> >
> > On Sun, Jun 23, 2019, 12:05 Stewart Clelland <
> > stewartclelland at gmail.com> wrote:
> > > Hi All,
> > >
> > > Based on discussion with Marten on github, I have a couple of
> > > suggestions on syntax improvements on array transpose operations.
> > >
> > > First, introducing a shorthand for the Hermitian Transpose
> > > operator. I thought "A.HT" might be a viable candidate.
> > >
> > > Second, the adding an array method that operates like a normal
> > > transpose. To my understanding,
> > > "A.tranpose()" currently inverts the usual order of all dimensions.
> > > This may be useful in some applications involving tensors, but is
> > > not what I would usually assume a transpose on a multi-dimensional
> > > array would entail. I suggest a syntax of "A.MT" to indicate a
> > > transpose of the last two dimensions by default, maybe with
> > > optional arguments (i,j) to indicate which two dimensions to
> > > transpose.
> > >
> > > I'm new to this mailing list format, hopefully I'm doing this right
> > > :)
> > >
> > > Thanks,
> > > Stew
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/cf1910be/attachment.html>

From sebastian at sipsolutions.net  Sun Jun 23 17:19:54 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 23 Jun 2019 14:19:54 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAMEWA4N7NztobsNq67TBtPOpp2ySrKaMLMbH68Eo_mRUkCGM6g@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAMEWA4N7NztobsNq67TBtPOpp2ySrKaMLMbH68Eo_mRUkCGM6g@mail.gmail.com>
Message-ID: <576c2c6a981ef688cba7ddf172dd84ed63f3c9d6.camel@sipsolutions.net>

On Sun, 2019-06-23 at 23:03 +0200, Andras Deak wrote:
> On Sun, Jun 23, 2019 at 10:37 PM Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> > Yeah, likely worth a short. I doubt many uses for the n-dimensional
> > axis transpose, so maybe a futurewarning approach can work. If not,
> > I
> > suppose the solution is the deprecation for ndim != 2.
> 
> Any chance that the n-dimensional transpose is being used in code
> interfacing fortran/matlab and python? One thing the current
> multidimensional transpose is good for is to switch between row-major
> and column-major order. I don't know, however, whether this switch
> actually has to be done often in code, in practice.
> 

I suppose there is a chance for that, to fix the order for returned
arrays (for input arrays you probably need to fix the memory order, so
that `copy(..., order="F")` or `np.ensure` is more likely what you
want.

Those users should be fine to switch over to `arr.transpose()`. The
question is mostly if it hits so much code that it is painful.

- Sebastian


> Andr?s
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/4aabda83/attachment-0001.sig>

From sebastian at sipsolutions.net  Sun Jun 23 17:27:51 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 23 Jun 2019 14:27:51 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
Message-ID: <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>

On Sun, 2019-06-23 at 17:12 -0400, Marten van Kerkwijk wrote:
> Hi All,
> 
> I'd love to have `.T` mean the right thing, and am happy that people
> are suggesting it after I told Steward this was likely off-limits
> (which, in fairness, did seem to be the conclusion when we visited
> this before...). But is there something we can do to make it possible
> to use it already but ensure that code on previous numpy versions
> breaks? (Or works, but that seems impossible...)
> 
> For instance, in python2, one had `from __future__ import division
> (etc.); could we have, e.g., a `from numpy.__future__ import
> matrix_transpose`, which, when imported, implied that `.T` just did
> the right thing without any warning? (Obviously, since that
> __future__.matrix_transpose wouldn't exist on older versions of
> numpy, it would correctly break the code when used with those.)
> 

If I remember correctly, this is actually possible but hacky. So it
would probably be nicer to not go there. But yes, you are right, that
would mean that we practically limit `.T` to 2-D arrays for at least 2
years.

> Also, a bit more towards the original request in the PR of a
> hermitian transpose, if we're trying to go for `.T` eventually having
> the obvious meaning, should we directly move towards also having `.H`
> as a short-cut for `.T.conj()`? We could even expose that only with
> the above future import - otherwise, the risk of abuse of `.T` would
> only grow...

This opens the general question of how many and which attributes we
actually want on ndarray. My first gut reaction is that I am -0 on it,
but OTOH, for some math it is very nice and not a huge amount of
clutter...


> 
> Finally, on the meaning of `.T` for 1-D arrays, the sensible choices
> would seem to (1) error; or (2) change shape to `(n, 1)`. Since while
> writing this sentence I changed my preference twice, I guess I should
> go for erroring (I think we need a separate solution for easily
> making stacks of row/column vectors).

Probably an error is good, which is nice, because we can just tag on a
warning and not worry about it for a while ;).

> 
> All the best,
> 
> Marten
> 
> On Sun, Jun 23, 2019 at 4:37 PM Sebastian Berg <
> sebastian at sipsolutions.net> wrote:
> > On Sun, 2019-06-23 at 19:51 +0000, Hameer Abbasi wrote:
> > > +1 for this. I have often seen (and sometimes written) code that
> > does
> > > this automatically, and it is a common mistake.
> > 
> > Yeah, likely worth a short. I doubt many uses for the n-dimensional
> > axis transpose, so maybe a futurewarning approach can work. If not,
> > I
> > suppose the solution is the deprecation for ndim != 2.
> > 
> > Another point about the `.T` is the 1-dimensional case, which
> > commonly
> > causes confusion. If we do something here, should think about that
> > as
> > well.
> > 
> > - Sebastian
> > 
> > 
> > > 
> > > However, we will need some way to filter for intent, as the
> > people
> > > who write this code are the ones who didn?t read docs on it at
> > the
> > > time, and so there might be a fair amount of noise even if it
> > fixes
> > > their code.
> > > 
> > > I also agree that a transpose of an array with ndim > 2 doesn?t
> > make
> > > sense without specifying the order, at least for the applications
> > I
> > > have seen so far.
> > > 
> > > Get Outlook for iOS
> > >  
> > > From: NumPy-Discussion <
> > > numpy-discussion-bounces+einstein.edison=gmail.com at python.org> on
> > > behalf of Eric Wieser <wieser.eric+numpy at gmail.com>
> > > Sent: Sunday, June 23, 2019 9:24 PM
> > > To: Discussion of Numerical Python
> > > Subject: Re: [Numpy-discussion] Syntax Improvement for Array
> > > Transpose
> > >  
> > > This might be contentious, but I wonder if, with a long enough
> > > deprecation cycle, we can change the meaning of .T. That would
> > look
> > > like:
> > > 
> > > * Emit a future warning on `more_than_2d.T` with a message like
> > "in
> > > future .T will transpose just the last two dimensions, not all
> > > dimensions. Use are.transpose() if transposing all {n} dimensions
> > is
> > > deliberate"
> > > * Wait 5 releases or so, see how many matches Google / GitHub has
> > for
> > > this warning.
> > > * If the impact is minimal, change .T
> > > * If the impact is large, change to a deprecation warning
> > > 
> > > An argument for this approach: a good amount of code I've seen in
> > the
> > > wild already assumes T is a 2d transpose, and as a result does
> > not
> > > work correctly when called with stacks of arrays. Changing T
> > might
> > > fix this broken code automatically.
> > > 
> > > If the change would be too intrusive, then keeping the
> > deprecation
> > > warning at least prevents new users deliberately using .T for >2d
> > > transposes, which is possibly valuable for readers.
> > > 
> > > Eric
> > > 
> > > 
> > > On Sun, Jun 23, 2019, 12:05 Stewart Clelland <
> > > stewartclelland at gmail.com> wrote:
> > > > Hi All,
> > > > 
> > > > Based on discussion with Marten on github, I have a couple of
> > > > suggestions on syntax improvements on array transpose
> > operations.
> > > > 
> > > > First, introducing a shorthand for the Hermitian Transpose
> > > > operator. I thought "A.HT" might be a viable candidate.
> > > > 
> > > > Second, the adding an array method that operates like a normal
> > > > transpose. To my understanding,
> > > > "A.tranpose()" currently inverts the usual order of all
> > dimensions.
> > > > This may be useful in some applications involving tensors, but
> > is
> > > > not what I would usually assume a transpose on a multi-
> > dimensional
> > > > array would entail. I suggest a syntax of "A.MT" to indicate a
> > > > transpose of the last two dimensions by default, maybe with
> > > > optional arguments (i,j) to indicate which two dimensions to
> > > > transpose.
> > > > 
> > > > I'm new to this mailing list format, hopefully I'm doing this
> > right
> > > > :)
> > > > 
> > > > Thanks,
> > > > Stew
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > 
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/f5323642/attachment.sig>

From wieser.eric+numpy at gmail.com  Sun Jun 23 18:19:00 2019
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Sun, 23 Jun 2019 15:19:00 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
Message-ID: <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>

If I remember correctly, [numpy.future imports are] actually possible
but hacky. So it would probably be nicer to not go there.

There was some discussion of this at
https://stackoverflow.com/q/29905278/102441.
I agree with the conclusion we should not go there - in particular,
note that every builtin __future__ feature has been an
interpreter-level change, not an object-level change.
from __future__ import division changes the meaning of / not of int.__div__.
Framing the numpy change this way would mean rewriting Attribute(obj,
attr, Load) ast nodes to Call(np._attr_override, obj, attr), which is
obvious not interoperable with any other module wanting to do the same
thing.

This opens other unpleasant cans of worms about ?builtin? modules that
perform attribute access:

Should getattr(arr, 'T') change behavior based on the module that calls it?
Should operator.itemgetter('T') change behavior ?

So I do not think we want to go down that road.

On Sun, 23 Jun 2019 at 14:28, Sebastian Berg <sebastian at sipsolutions.net> wrote:
>
> On Sun, 2019-06-23 at 17:12 -0400, Marten van Kerkwijk wrote:
> > Hi All,
> >
> > I'd love to have `.T` mean the right thing, and am happy that people
> > are suggesting it after I told Steward this was likely off-limits
> > (which, in fairness, did seem to be the conclusion when we visited
> > this before...). But is there something we can do to make it possible
> > to use it already but ensure that code on previous numpy versions
> > breaks? (Or works, but that seems impossible...)
> >
> > For instance, in python2, one had `from __future__ import division
> > (etc.); could we have, e.g., a `from numpy.__future__ import
> > matrix_transpose`, which, when imported, implied that `.T` just did
> > the right thing without any warning? (Obviously, since that
> > __future__.matrix_transpose wouldn't exist on older versions of
> > numpy, it would correctly break the code when used with those.)
> >
>
> If I remember correctly, this is actually possible but hacky. So it
> would probably be nicer to not go there. But yes, you are right, that
> would mean that we practically limit `.T` to 2-D arrays for at least 2
> years.
>
> > Also, a bit more towards the original request in the PR of a
> > hermitian transpose, if we're trying to go for `.T` eventually having
> > the obvious meaning, should we directly move towards also having `.H`
> > as a short-cut for `.T.conj()`? We could even expose that only with
> > the above future import - otherwise, the risk of abuse of `.T` would
> > only grow...
>
> This opens the general question of how many and which attributes we
> actually want on ndarray. My first gut reaction is that I am -0 on it,
> but OTOH, for some math it is very nice and not a huge amount of
> clutter...
>
>
> >
> > Finally, on the meaning of `.T` for 1-D arrays, the sensible choices
> > would seem to (1) error; or (2) change shape to `(n, 1)`. Since while
> > writing this sentence I changed my preference twice, I guess I should
> > go for erroring (I think we need a separate solution for easily
> > making stacks of row/column vectors).
>
> Probably an error is good, which is nice, because we can just tag on a
> warning and not worry about it for a while ;).
>
> >
> > All the best,
> >
> > Marten
> >
> > On Sun, Jun 23, 2019 at 4:37 PM Sebastian Berg <
> > sebastian at sipsolutions.net> wrote:
> > > On Sun, 2019-06-23 at 19:51 +0000, Hameer Abbasi wrote:
> > > > +1 for this. I have often seen (and sometimes written) code that
> > > does
> > > > this automatically, and it is a common mistake.
> > >
> > > Yeah, likely worth a short. I doubt many uses for the n-dimensional
> > > axis transpose, so maybe a futurewarning approach can work. If not,
> > > I
> > > suppose the solution is the deprecation for ndim != 2.
> > >
> > > Another point about the `.T` is the 1-dimensional case, which
> > > commonly
> > > causes confusion. If we do something here, should think about that
> > > as
> > > well.
> > >
> > > - Sebastian
> > >
> > >
> > > >
> > > > However, we will need some way to filter for intent, as the
> > > people
> > > > who write this code are the ones who didn?t read docs on it at
> > > the
> > > > time, and so there might be a fair amount of noise even if it
> > > fixes
> > > > their code.
> > > >
> > > > I also agree that a transpose of an array with ndim > 2 doesn?t
> > > make
> > > > sense without specifying the order, at least for the applications
> > > I
> > > > have seen so far.
> > > >
> > > > Get Outlook for iOS
> > > >
> > > > From: NumPy-Discussion <
> > > > numpy-discussion-bounces+einstein.edison=gmail.com at python.org> on
> > > > behalf of Eric Wieser <wieser.eric+numpy at gmail.com>
> > > > Sent: Sunday, June 23, 2019 9:24 PM
> > > > To: Discussion of Numerical Python
> > > > Subject: Re: [Numpy-discussion] Syntax Improvement for Array
> > > > Transpose
> > > >
> > > > This might be contentious, but I wonder if, with a long enough
> > > > deprecation cycle, we can change the meaning of .T. That would
> > > look
> > > > like:
> > > >
> > > > * Emit a future warning on `more_than_2d.T` with a message like
> > > "in
> > > > future .T will transpose just the last two dimensions, not all
> > > > dimensions. Use are.transpose() if transposing all {n} dimensions
> > > is
> > > > deliberate"
> > > > * Wait 5 releases or so, see how many matches Google / GitHub has
> > > for
> > > > this warning.
> > > > * If the impact is minimal, change .T
> > > > * If the impact is large, change to a deprecation warning
> > > >
> > > > An argument for this approach: a good amount of code I've seen in
> > > the
> > > > wild already assumes T is a 2d transpose, and as a result does
> > > not
> > > > work correctly when called with stacks of arrays. Changing T
> > > might
> > > > fix this broken code automatically.
> > > >
> > > > If the change would be too intrusive, then keeping the
> > > deprecation
> > > > warning at least prevents new users deliberately using .T for >2d
> > > > transposes, which is possibly valuable for readers.
> > > >
> > > > Eric
> > > >
> > > >
> > > > On Sun, Jun 23, 2019, 12:05 Stewart Clelland <
> > > > stewartclelland at gmail.com> wrote:
> > > > > Hi All,
> > > > >
> > > > > Based on discussion with Marten on github, I have a couple of
> > > > > suggestions on syntax improvements on array transpose
> > > operations.
> > > > >
> > > > > First, introducing a shorthand for the Hermitian Transpose
> > > > > operator. I thought "A.HT" might be a viable candidate.
> > > > >
> > > > > Second, the adding an array method that operates like a normal
> > > > > transpose. To my understanding,
> > > > > "A.tranpose()" currently inverts the usual order of all
> > > dimensions.
> > > > > This may be useful in some applications involving tensors, but
> > > is
> > > > > not what I would usually assume a transpose on a multi-
> > > dimensional
> > > > > array would entail. I suggest a syntax of "A.MT" to indicate a
> > > > > transpose of the last two dimensions by default, maybe with
> > > > > optional arguments (i,j) to indicate which two dimensions to
> > > > > transpose.
> > > > >
> > > > > I'm new to this mailing list format, hopefully I'm doing this
> > > right
> > > > > :)
> > > > >
> > > > > Thanks,
> > > > > Stew
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion at python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > >
> > > >
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From shoyer at gmail.com  Sun Jun 23 18:25:29 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 24 Jun 2019 01:25:29 +0300
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9vQC4pv_ASGA4RiKjWOBzyr4PhJDrtpygvwbaXcUzQG6g@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>
 <CAJNV+9t_a0zG4bh986xmSyQdoUGHwLk1PJxqPPVKG=RHuUNDjw@mail.gmail.com>
 <CAEQ_Tvf8TQhnJtgfxae2dmxJAKvYvT-9ZoGiFFE92eNsDdu_CA@mail.gmail.com>
 <CAJNV+9vQC4pv_ASGA4RiKjWOBzyr4PhJDrtpygvwbaXcUzQG6g@mail.gmail.com>
Message-ID: <CAEQ_TvfzdyxgNo8EeV=533=y=9GtD7A+3K9X40QZ=kPQTCkczQ@mail.gmail.com>

On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Your proposal would be something like np.sum(array,
>> where=np.ones_like(array))? This seems rather verbose for a common
>> operation. Perhaps np.sum(array, where=True) would work, making use of
>> broadcasting? (I haven't actually checked whether this is well-defined yet.)
>>
>> I think we'd need to consider separately the operation on the mask and on
> the data. In my proposal, the data would always do `np.sum(array,
> where=~mask)`, while how the mask would propagate might depend on the mask
> itself, i.e., we'd have different mask types for `skipna=True` (default)
> and `False` ("contagious") reductions, which differed in doing
> `logical_and.reduce` or `logical_or.reduce` on the mask.
>

OK, I think I finally understand what you're getting at. So suppose this
this how we implement it internally. Would we really insist on a user
creating a new MaskedArray with a new mask object, e.g., with a GreedyMask?
We could add sugar for this, but certainly array.greedy_masked().sum() is
significantly less clear than array.sum(skipna=False).

I'm also a little concerned about a proliferation of MaskedArray/Mask
types. New types are significantly harder to understand than new functions
(or new arguments on existing functions). I don't know if we have enough
distinct use cases for this many types.

Are there use-cases for propagating masks separately from data? If not, it
>> might make sense to only define mask operations along with data, which
>> could be much simpler.
>>
>
> I had only thought about separating out the concern of mask propagation
> from the "MaskedArray" class to the mask proper, but it might indeed make
> things easier if the mask also did any required preparation for passing
> things on to the data (such as adjusting the "where" argument in a
> reduction). I also like that this way the mask can determine even before
> the data what functionality is available (i.e., it could be the place from
> which to return `NotImplemented` for a ufunc.at call with a masked index
> argument).
>

You're going to have to come up with something more compelling than
"separation of concerns" to convince me that this extra Mask abstraction is
worthwhile. On its own, I think a separate Mask class would only obfuscate
MaskedArray functions.

For example, compare these two implementations of add:

def  add1(x, y):
    return MaskedArray(x.data + y.data,  x.mask | y.mask)

def  add2(x, y):
    return MaskedArray(x.data + y.data,  x.mask + y.mask)

The second version requires that you *also* know how Mask classes work, and
how they implement +. So now you need to look in at least twice as many
places to understand add() for MaskedArray objects.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/8a6798d0/attachment-0001.html>

From wieser.eric+numpy at gmail.com  Sun Jun 23 18:58:59 2019
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Sun, 23 Jun 2019 15:58:59 -0700
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAEQ_TvfzdyxgNo8EeV=533=y=9GtD7A+3K9X40QZ=kPQTCkczQ@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>
 <CAJNV+9t_a0zG4bh986xmSyQdoUGHwLk1PJxqPPVKG=RHuUNDjw@mail.gmail.com>
 <CAEQ_Tvf8TQhnJtgfxae2dmxJAKvYvT-9ZoGiFFE92eNsDdu_CA@mail.gmail.com>
 <CAJNV+9vQC4pv_ASGA4RiKjWOBzyr4PhJDrtpygvwbaXcUzQG6g@mail.gmail.com>
 <CAEQ_TvfzdyxgNo8EeV=533=y=9GtD7A+3K9X40QZ=kPQTCkczQ@mail.gmail.com>
Message-ID: <CAL1kJvCSJt6oGTtZdLoqoYQDtD6fxBzwmUMdiPwA1x740QtkCg@mail.gmail.com>

I think we?d need to consider separately the operation on the mask and on
the data. In my proposal, the data would always do np.sum(array,
where=~mask), while how the mask would propagate might depend on the mask
itself,

I quite like this idea, and I think Stephan?s strawman design is actually
plausible, where MaskedArray.mask is either an InvalidMask or a IgnoreMask
instance to pick between the different propagation types. Both classes
could simply have an underlying ._array attribute pointing to a duck-array
of some kind that backs their boolean data.

The second version requires that you *also* know how Mask classes work, and
how they implement +

I remain unconvinced that Mask classes should behave differently on
different ufuncs. I don?t think np.minimum(ignore_na, b) is any different
to np.add(ignore_na, b) - either both should produce b, or both should
produce ignore_na. I would lean towards produxing ignore_na, and
propagation behavior differing between ?ignore? and ?invalid? only for
reduce / accumulate operations, where the concept of skipping an
application is well-defined.

Some possible follow-up questions that having two distinct masked types
raise:

   - what if I want my data to support both invalid and skip fields at the
   same time? sum([invalid, skip, 1]) == invalid
   - is there a use case for more that these two types of mask?
   invalid_due_to_reason_A, invalid_due_to_reason_B would be interesting
   things to track through a calculation, possibly a dictionary of named masks.

Eric

On Sun, 23 Jun 2019 at 15:28, Stephan Hoyer <shoyer at gmail.com> wrote:

> On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>> Your proposal would be something like np.sum(array,
>>> where=np.ones_like(array))? This seems rather verbose for a common
>>> operation. Perhaps np.sum(array, where=True) would work, making use of
>>> broadcasting? (I haven't actually checked whether this is well-defined yet.)
>>>
>>> I think we'd need to consider separately the operation on the mask and
>> on the data. In my proposal, the data would always do `np.sum(array,
>> where=~mask)`, while how the mask would propagate might depend on the mask
>> itself, i.e., we'd have different mask types for `skipna=True` (default)
>> and `False` ("contagious") reductions, which differed in doing
>> `logical_and.reduce` or `logical_or.reduce` on the mask.
>>
>
> OK, I think I finally understand what you're getting at. So suppose this
> this how we implement it internally. Would we really insist on a user
> creating a new MaskedArray with a new mask object, e.g., with a GreedyMask?
> We could add sugar for this, but certainly array.greedy_masked().sum() is
> significantly less clear than array.sum(skipna=False).
>
> I'm also a little concerned about a proliferation of MaskedArray/Mask
> types. New types are significantly harder to understand than new functions
> (or new arguments on existing functions). I don't know if we have enough
> distinct use cases for this many types.
>
> Are there use-cases for propagating masks separately from data? If not, it
>>> might make sense to only define mask operations along with data, which
>>> could be much simpler.
>>>
>>
>> I had only thought about separating out the concern of mask propagation
>> from the "MaskedArray" class to the mask proper, but it might indeed make
>> things easier if the mask also did any required preparation for passing
>> things on to the data (such as adjusting the "where" argument in a
>> reduction). I also like that this way the mask can determine even before
>> the data what functionality is available (i.e., it could be the place from
>> which to return `NotImplemented` for a ufunc.at call with a masked index
>> argument).
>>
>
> You're going to have to come up with something more compelling than
> "separation of concerns" to convince me that this extra Mask abstraction is
> worthwhile. On its own, I think a separate Mask class would only obfuscate
> MaskedArray functions.
>
> For example, compare these two implementations of add:
>
> def  add1(x, y):
>     return MaskedArray(x.data + y.data,  x.mask | y.mask)
>
> def  add2(x, y):
>     return MaskedArray(x.data + y.data,  x.mask + y.mask)
>
> The second version requires that you *also* know how Mask classes work,
> and how they implement +. So now you need to look in at least twice as many
> places to understand add() for MaskedArray objects.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/4920e4af/attachment.html>

From m.h.vankerkwijk at gmail.com  Sun Jun 23 21:09:22 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 23 Jun 2019 21:09:22 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
Message-ID: <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>

I had not looked at any implementation (only remembered the nice idea of
"importing from the future"), and looking at the links Eric shared, it
seems that the only way this would work is, effectively, pre-compilation
doing a `<codetext>.replace('.T', '._T_from_the_future')`, where you'd be
hoping that there never is any other meaning for a `.T` attribute, for any
class, since it is impossible to be sure a given variable is an ndarray.
(Actually, a lot less implausible than for the case of numpy indexing
discussed in the link...)

Anyway, what I had in mind was something along the lines of inside the `.T`
code there being be a check on whether a particular future item was present
in the environment. But thinking more, I can see that it is not trivial to
get to know something about the environment in which the code that called
you was written....

So, it seems there is no (simple) way to tell numpy that inside a given
module you want `.T` to have the new behaviour, but still to warn if
outside the module it is used in the old way (when risky)?

-- Marten

p.s. I'm somewhat loath to add new properties to ndarray, but `.T` and `.H`
have such obvious and clear meaning to anyone dealing with (complex)
matrices that I think it is worth it. See
https://mail.python.org/pipermail/numpy-discussion/2019-June/079584.html
for a list of options of attributes that we might deprecate "in exchange"...

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/a4f4a9d8/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Sun Jun 23 22:04:05 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 23 Jun 2019 22:04:05 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAL1kJvCSJt6oGTtZdLoqoYQDtD6fxBzwmUMdiPwA1x740QtkCg@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>
 <CAJNV+9t_a0zG4bh986xmSyQdoUGHwLk1PJxqPPVKG=RHuUNDjw@mail.gmail.com>
 <CAEQ_Tvf8TQhnJtgfxae2dmxJAKvYvT-9ZoGiFFE92eNsDdu_CA@mail.gmail.com>
 <CAJNV+9vQC4pv_ASGA4RiKjWOBzyr4PhJDrtpygvwbaXcUzQG6g@mail.gmail.com>
 <CAEQ_TvfzdyxgNo8EeV=533=y=9GtD7A+3K9X40QZ=kPQTCkczQ@mail.gmail.com>
 <CAL1kJvCSJt6oGTtZdLoqoYQDtD6fxBzwmUMdiPwA1x740QtkCg@mail.gmail.com>
Message-ID: <CAJNV+9scHMs=rDottJzhiSamfb=p0e_iwSeMV65++vYd_TEaiQ@mail.gmail.com>

Hi Stephan,

Eric perhaps explained my concept better than I could!

I do agree that, as written, your example would be clearer, but Allan's
code and the current MaskedArray code do have not that much semblance to
it, and mine even less, as they deal with operators as whole groups.

For mine, it may be useful to tell that this quite possibly crazy idea came
from considering how one would quite logically implement all shape
operations, which effect the data and mask the same way, so one soon writes
down something where (as in my ShapedLikeNDArray mixin;
https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L858)
all methods pass on to `return self._apply(method, <arguments>), with
`_apply` looking like:
```
    def _apply(self, method, *args, **kwargs):
        # For use with NDArrayShapeMethods.
        data = getattr(self._data, method)(*args, **kwargs)
        mask = getattr(self._mask, method)(*args, **kwargs)

        return self.__class__(data, mask=mask)
```
(Or the same is auto-generated for all those methods.)

Now considering this, for `__array_ufunc__`, one might similarly do
(__call__ only, ignoring outputs)
```
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        data = []
        masks = []
        for input_ in inputs:
            d, m = self._data_mask(input_)
            data.append(d)
            masks.append(m)
        result_mask = getattr(ufunc, method)(*masks, **mask_kwargs)
        if result_mask is NotImplemented:
            return NotImplemented

        result_data = getattr(ufunc, method)(*data, **kwargs)
        return self._masked_result(result_data, result_mask, out)
```
The beauty here is that the Masked class itself needs to know very little
about how to do masking, or how to implement it; it is almost like a
collection of arrays: the only thing that separates `__array_ufunc__` from
`_apply` above is that, since other objects are involved, the data and mask
have to be separated out for those. (And perhaps for people trying to
change it, it actually helps that the special-casing of single-, double-,
and triple-input ufuncs can all be done inside the mask class, and
adjustments can be made there without having to understand the machinery
for getting masks out of data, etc.?)

But what brought me to this was not __array_ufunc__ itself, but rather the
next step, that I really would like to have masked version of classes that
are array-like in shape but do not generally work with numpy ufuncs or
functions, and where I prefer not to force the use of numpy ufuncs. So,
inside the masked class I have to override the operators. Obviously, there
again I could do it with functions like you describe, but I can also just
split off data and masks as above, and just call the same operator on both.
Then, the code could basically use that of `__array_ufunc__` above when
called as `self.__array_ufunc__(operator, '__add__', self, other)`.

I think (but am not sure, not actually having tried it yet), that this
would also make it easier to mostly auto-generated masked classes: those
supported by both the mask and the data are relatively easy; those separate
for the data need some hints from the data class (e.g., `.unit` on a
`Quantity` is independent of the mask).

Anyway, just to be clear, I am *not* sure that this is the right way
forward. I think the more important really was your suggestion to take mask
duck types into account a priori (for bit-packed ones, etc.). But it seems
worth thinking it through now, so we don't repeat the things that ended up
making `MaskedArray` quite annoying.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/4d81aadf/attachment.html>

From m.h.vankerkwijk at gmail.com  Sun Jun 23 22:27:52 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 23 Jun 2019 22:27:52 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAL1kJvCSJt6oGTtZdLoqoYQDtD6fxBzwmUMdiPwA1x740QtkCg@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>
 <CAJNV+9t_a0zG4bh986xmSyQdoUGHwLk1PJxqPPVKG=RHuUNDjw@mail.gmail.com>
 <CAEQ_Tvf8TQhnJtgfxae2dmxJAKvYvT-9ZoGiFFE92eNsDdu_CA@mail.gmail.com>
 <CAJNV+9vQC4pv_ASGA4RiKjWOBzyr4PhJDrtpygvwbaXcUzQG6g@mail.gmail.com>
 <CAEQ_TvfzdyxgNo8EeV=533=y=9GtD7A+3K9X40QZ=kPQTCkczQ@mail.gmail.com>
 <CAL1kJvCSJt6oGTtZdLoqoYQDtD6fxBzwmUMdiPwA1x740QtkCg@mail.gmail.com>
Message-ID: <CAJNV+9vBdbfAOHbY8sJg=VP5GZySsZ_qOsCTh6au4LRuaq7PKQ@mail.gmail.com>

Hi Eric,

On your other points:

I remain unconvinced that Mask classes should behave differently on
> different ufuncs. I don?t think np.minimum(ignore_na, b) is any different
> to np.add(ignore_na, b) - either both should produce b, or both should
> produce ignore_na. I would lean towards produxing ignore_na, and
> propagation behavior differing between ?ignore? and ?invalid? only for
> reduce / accumulate operations, where the concept of skipping an
> application is well-defined.
>
I think I mostly agree - this is really about reductions. And this fact
that there are apparently only two choices weakens the case for pushing the
logic into the mask class itself.

But the one case that still tempts me to break with the strict rule for
ufunc.__call__ is `fmin, fmax` vs `minimum, maximum`... What do you think?


> Some possible follow-up questions that having two distinct masked types
> raise:
>
>    - what if I want my data to support both invalid and skip fields at
>    the same time? sum([invalid, skip, 1]) == invalid
>
> Have a triple-valued mask? Would be easy to implement if all the logic is
in the mask...

(Indeed, for all I care it could implement weighting! That would actually
care about the actual operation, so would be a real example. Though of
course it also does need access to the actual data, so perhaps best not to
go there...)

>
>    - is there a use case for more that these two types of mask?
>    invalid_due_to_reason_A, invalid_due_to_reason_B would be interesting
>    things to track through a calculation, possibly a dictionary of named masks.
>
> For astropy's NDData, there has been quite a bit of discussion of a
`Flags` object, which works exactly as you describe, an OR together of
different reasons for why data is invalid (HST uses this, though the
discussion was for the Large Synodic Survey Telescope data pipeline). Those
flags are propagated like masks.

I think in most cases, these examples would not require more than allowing
the mask to be a duck type. Though perhaps for some reductions, it might
matter what the reduction of the data is actually doing (e.g.,
`np.minimum.reduce` might need different rules than `np.add.reduce`). And,
of course, one can argue that for such a case it might be best to subclass
MaskedArray itself, and do the logic there.

All the best,

Marten

p.s. For accumulations, I'm still not sure I find them well-defined. I
could see that np.add.reduce([0, 1, 1, --, 3])` could lead to `[0, 1, 2,
5]`, i.e., a shorter sequence, but this doesn't work on arrays where
different rows can have different numbers of masked elements. It then
perhaps suggests `[0, 1, 2, --, 5]` is OK, but the annoyance I have is that
there is nothing that tells me what the underlying data should be, i.e.,
this is truly different from having a `where` keyword in `np.add.reduce`.
But perhaps it is that I do not see much use for accumulation beyond
changing a histogram into its cumulative version - for which masked
elements really make no sense - one somehow has to interpolate over, not
just discount the masked one.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190623/525925e4/attachment-0001.html>

From u.lupo at l2f.ch  Mon Jun 24 04:35:36 2019
From: u.lupo at l2f.ch (Umberto Lupo)
Date: Mon, 24 Jun 2019 08:35:36 +0000
Subject: [Numpy-discussion] =?windows-1252?q?=5BJOB=5D_Full-time_opportun?=
 =?windows-1252?q?ity_=96_Software_engineer_for_open_source_project?=
Message-ID: <32FD8B3F-DC7F-40EB-82A9-93FF3541EA12@l2f.ch>

Who we are
L2F is a start-up based on the EPFL Innovation Park (Lausanne, Switzerland).  We are currently working at the frontier of machine learning and topological data analysis, in collaboration with several academic partners.

Our Mission
We are developing an open source Python library implementing innovative topological data analysis algorithms which are being designed by our team of full-time research scientists and post-doctoral researchers.  The library shall be user-friendly, well documented, high-performance and well integrated with state-of-the-art machine learning libraries (such as NumPy/SciPy, scikit-learn and Keras or other popular deep learning frameworks).  We are offering a full-time job in our company to help us develop this library.  The candidate will work in the L2F research team.

Profile description
We are looking for a computer scientist matching these characteristics:

  *   2+ years of experience in software engineering.
  *   Skilled with Python and C++ (in particular, at ease wrapping C++ code for Python).
  *   Aware of how open source communities work.  Better if he/she contributed in open-source collaborations, such as scikit-learn.
  *   At ease writing specifications, developer documentation and good user documentation.
  *   Fluent with continuous integration, Git and common developer tools.
  *   Skilled in testing architectures (unit tests, integration tests, etc.).

How to apply
Applicants can write an e-mail to Dr. Matteo Caorsi (m.caorsi at l2f.ch<mailto:m.caorsi at l2f.ch>) attaching their CV and a short letter detailing their relevant experience and motivation.

Starting date
This position is available for immediate start for the right candidate.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/dc46538c/attachment.html>

From ilhanpolat at gmail.com  Mon Jun 24 05:56:38 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Mon, 24 Jun 2019 11:56:38 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
Message-ID: <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>

Please don't introduce more errors for 1D arrays. They are already very
counter-intuitive for transposition and for other details not relevant to
this issue. Emitting errors for such a basic operation is very bad for user
experience. This already is the case with wildly changing slicing syntax.
It would have made sense if 2D arrays were the default objects and 1D
required extra effort to create. But it is the other way around. Hence a
transpose operation is "expected" from it. This would kind of force all
NumPy users to shift their code one tab further to accomodate for the extra
try, catch blocks for "Oh wait, what if a 1D array comes in?" checks for
the existence of transposability everytime I write down `.T` in the code.

Code example; I am continuously writing code involving lots of matrix
products with inverses and transposes/hermitians (say, the 2nd eq.,
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve_continuous_are.html
)
That means I have to check at least 4-6 matrices if any of them are
transposable to make that equation go through.

The dot-H solution is actually my ideal choice but I get the point that the
base namespace is already crowded. I am even OK with having
`x.conj(T=True)` having a keyword for extra transposition so that I can get
away with `x.conj(1)`; it doesn't solve the fundamental issue but at least
gives some convenience.

Best,
ilhan


On Mon, Jun 24, 2019 at 3:11 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> I had not looked at any implementation (only remembered the nice idea of
> "importing from the future"), and looking at the links Eric shared, it
> seems that the only way this would work is, effectively, pre-compilation
> doing a `<codetext>.replace('.T', '._T_from_the_future')`, where you'd be
> hoping that there never is any other meaning for a `.T` attribute, for any
> class, since it is impossible to be sure a given variable is an ndarray.
> (Actually, a lot less implausible than for the case of numpy indexing
> discussed in the link...)
>
> Anyway, what I had in mind was something along the lines of inside the
> `.T` code there being be a check on whether a particular future item was
> present in the environment. But thinking more, I can see that it is not
> trivial to get to know something about the environment in which the code
> that called you was written....
>
> So, it seems there is no (simple) way to tell numpy that inside a given
> module you want `.T` to have the new behaviour, but still to warn if
> outside the module it is used in the old way (when risky)?
>
> -- Marten
>
> p.s. I'm somewhat loath to add new properties to ndarray, but `.T` and
> `.H` have such obvious and clear meaning to anyone dealing with (complex)
> matrices that I think it is worth it. See
> https://mail.python.org/pipermail/numpy-discussion/2019-June/079584.html
> for a list of options of attributes that we might deprecate "in exchange"...
>
> All the best,
>
> Marten
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/5806379b/attachment.html>

From einstein.edison at gmail.com  Mon Jun 24 06:24:54 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Mon, 24 Jun 2019 12:24:54 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
Message-ID: <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>

Given that np.dot and np.matmul do the right thing for 1-D arrays, I would be opposed to introducing an error as well.

 
From: NumPy-Discussion <numpy-discussion-bounces+einstein.edison=gmail.com at python.org> on behalf of Ilhan Polat <ilhanpolat at gmail.com>
Reply-To: Discussion of Numerical Python <numpy-discussion at python.org>
Date: Monday, 24. June 2019 at 11:58
To: Discussion of Numerical Python <numpy-discussion at python.org>
Subject: Re: [Numpy-discussion] Syntax Improvement for Array Transpose

 
Please don't introduce more errors for 1D arrays. They are already very counter-intuitive for transposition and for other details not relevant to this issue. Emitting errors for such a basic operation is very bad for user experience. This already is the case with wildly changing slicing syntax. It would have made sense if 2D arrays were the default objects and 1D required extra effort to create. But it is the other way around. Hence a transpose operation is "expected" from it. This would kind of force all NumPy users to shift their code one tab further to accomodate for the extra try, catch blocks for "Oh wait, what if a 1D array comes in?" checks for the existence of transposability everytime I write down `.T` in the code. 

 
Code example; I am continuously writing code involving lots of matrix products with inverses and transposes/hermitians (say, the 2nd eq., https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve_continuous_are.html )

That means I have to check at least 4-6 matrices if any of them are transposable to make that equation go through.

 
The dot-H solution is actually my ideal choice but I get the point that the base namespace is already crowded. I am even OK with having `x.conj(T=True)` having a keyword for extra transposition so that I can get away with `x.conj(1)`; it doesn't solve the fundamental issue but at least gives some convenience.

 
Best,

ilhan

 
On Mon, Jun 24, 2019 at 3:11 AM Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> wrote:

I had not looked at any implementation (only remembered the nice idea of "importing from the future"), and looking at the links Eric shared, it seems that the only way this would work is, effectively, pre-compilation doing a `<codetext>.replace('.T', '._T_from_the_future')`, where you'd be hoping that there never is any other meaning for a `.T` attribute, for any class, since it is impossible to be sure a given variable is an ndarray. (Actually, a lot less implausible than for the case of numpy indexing discussed in the link...)

 
Anyway, what I had in mind was something along the lines of inside the `.T` code there being be a check on whether a particular future item was present in the environment. But thinking more, I can see that it is not trivial to get to know something about the environment in which the code that called you was written....

 
So, it seems there is no (simple) way to tell numpy that inside a given module you want `.T` to have the new behaviour, but still to warn if outside the module it is used in the old way (when risky)?

 
-- Marten

 
p.s. I'm somewhat loath to add new properties to ndarray, but `.T` and `.H` have such obvious and clear meaning to anyone dealing with (complex) matrices that I think it is worth it. See https://mail.python.org/pipermail/numpy-discussion/2019-June/079584.html for a list of options of attributes that we might deprecate "in exchange"...

 
All the best,

 
Marten

 
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/ed1e02b9/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Mon Jun 24 09:24:22 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 24 Jun 2019 09:24:22 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
Message-ID: <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>

Dear Hameer, Ilhan,

Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape
of `(n, 1)` the right behaviour? I.e., it should still change from what it
is now - which is to leave the shape at `(n,)`.

Your argument about `dot` and `matmul` having similar behaviour certainly
adds weight (but then, as I wrote before, my opinion on this changes by the
second, so I'm very happy to defer to others who have a clearer sense of
what is the right thing to do here!).

I think my main worry now is how to get to be able to use a new state
without having to wait 4..6 releases...

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/28b79df9/attachment.html>

From einstein.edison at gmail.com  Mon Jun 24 09:53:30 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Mon, 24 Jun 2019 13:53:30 +0000
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>,
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
Message-ID: <AM0PR04MB6306B5011458D87742E50B7FF7E00@AM0PR04MB6306.eurprd04.prod.outlook.com>

Hello Marten,

I was suggesting not changing the shape at all, since dot/matmul/solve do the right thing already in such a case.

In my proposal, only for ndim >=2 do we switch the last two dimensions.

Ilhan is right that adding a special case for ndim=1 (error) adds programmer overhead, which is against the general philosophy of NumPy I feel.

Get Outlook for iOS<https://aka.ms/o0ukef>

________________________________
From: NumPy-Discussion <numpy-discussion-bounces+einstein.edison=gmail.com at python.org> on behalf of Marten van Kerkwijk <m.h.vankerkwijk at gmail.com>
Sent: Monday, June 24, 2019 3:24 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Syntax Improvement for Array Transpose

Dear Hameer, Ilhan,

Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape of `(n, 1)` the right behaviour? I.e., it should still change from what it is now - which is to leave the shape at `(n,)`.

Your argument about `dot` and `matmul` having similar behaviour certainly adds weight (but then, as I wrote before, my opinion on this changes by the second, so I'm very happy to defer to others who have a clearer sense of what is the right thing to do here!).

I think my main worry now is how to get to be able to use a new state without having to wait 4..6 releases...

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/290837f2/attachment.html>

From toddrjen at gmail.com  Mon Jun 24 10:24:09 2019
From: toddrjen at gmail.com (Todd)
Date: Mon, 24 Jun 2019 10:24:09 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <AM0PR04MB6306B5011458D87742E50B7FF7E00@AM0PR04MB6306.eurprd04.prod.outlook.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <AM0PR04MB6306B5011458D87742E50B7FF7E00@AM0PR04MB6306.eurprd04.prod.outlook.com>
Message-ID: <CAFpSVp+N7tSL74yZb88bzqBSowyCVqaHrufcPqtPTqw6DC3W0A@mail.gmail.com>

I think we need to do something about the 1D case.  I know from a strict
mathematical standpoint it doesn't do anything, and philosophically we
should avoid special cases, but I think the current solution leads to
enough confusion and silently doing an unexpected thing that I think we
need a better approach.

Personally I think it is a nonsensical operation and so should result in an
exception, but at the very least I think it needs to raise a warning.

On Mon, Jun 24, 2019, 09:54 Hameer Abbasi <einstein.edison at gmail.com> wrote:

> Hello Marten,
>
> I was suggesting not changing the shape at all, since dot/matmul/solve do
> the right thing already in such a case.
>
> In my proposal, only for ndim >=2 do we switch the last two dimensions.
>
> Ilhan is right that adding a special case for ndim=1 (error) adds
> programmer overhead, which is against the general philosophy of NumPy I
> feel.
>
> Get Outlook for iOS <https://aka.ms/o0ukef>
>
> ------------------------------
> *From:* NumPy-Discussion <numpy-discussion-bounces+einstein.edison=
> gmail.com at python.org> on behalf of Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com>
> *Sent:* Monday, June 24, 2019 3:24 PM
> *To:* Discussion of Numerical Python
> *Subject:* Re: [Numpy-discussion] Syntax Improvement for Array Transpose
>
> Dear Hameer, Ilhan,
>
> Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape
> of `(n, 1)` the right behaviour? I.e., it should still change from what it
> is now - which is to leave the shape at `(n,)`.
>
> Your argument about `dot` and `matmul` having similar behaviour certainly
> adds weight (but then, as I wrote before, my opinion on this changes by the
> second, so I'm very happy to defer to others who have a clearer sense of
> what is the right thing to do here!).
>
> I think my main worry now is how to get to be able to use a new state
> without having to wait 4..6 releases...
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/7e8628c8/attachment.html>

From alan.isaac at gmail.com  Mon Jun 24 10:32:07 2019
From: alan.isaac at gmail.com (Alan Isaac)
Date: Mon, 24 Jun 2019 10:32:07 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <AM0PR04MB6306B5011458D87742E50B7FF7E00@AM0PR04MB6306.eurprd04.prod.outlook.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <AM0PR04MB6306B5011458D87742E50B7FF7E00@AM0PR04MB6306.eurprd04.prod.outlook.com>
Message-ID: <bad1f11f-2dda-e28b-ad70-3ee7c930fd39@gmail.com>

Points of reference:
Mathematica: https://reference.wolfram.com/language/ref/Transpose.html
Matlab: https://www.mathworks.com/help/matlab/ref/permute.html

Personally I would find any divergence between a.T and a.transpose()
to be rather surprising.

Cheers, Alan Isaac

From toddrjen at gmail.com  Mon Jun 24 10:45:13 2019
From: toddrjen at gmail.com (Todd)
Date: Mon, 24 Jun 2019 10:45:13 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <bad1f11f-2dda-e28b-ad70-3ee7c930fd39@gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <AM0PR04MB6306B5011458D87742E50B7FF7E00@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <bad1f11f-2dda-e28b-ad70-3ee7c930fd39@gmail.com>
Message-ID: <CAFpSVpJPHcDnLg2WwM_xraA0ySjT8X6-fsOPyvQeyw8Du6rCxA@mail.gmail.com>

I think the corresponding MATLAB function/operation is this:

https://www.mathworks.com/help/matlab/ref/transpose.html


On Mon, Jun 24, 2019, 10:33 Alan Isaac <alan.isaac at gmail.com> wrote:

> Points of reference:
> Mathematica: https://reference.wolfram.com/language/ref/Transpose.html
> Matlab: https://www.mathworks.com/help/matlab/ref/permute.html
>
> Personally I would find any divergence between a.T and a.transpose()
> to be rather surprising.
>
> Cheers, Alan Isaac
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/7fe4b34e/attachment.html>

From shoyer at gmail.com  Mon Jun 24 10:59:42 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 24 Jun 2019 17:59:42 +0300
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
Message-ID: <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>

On Sun, Jun 23, 2019 at 10:05 PM Stewart Clelland <stewartclelland at gmail.com>
wrote:

> Hi All,
>
> Based on discussion with Marten on github
> <https://github.com/numpy/numpy/issues/13797>, I have a couple of
> suggestions on syntax improvements on array transpose operations.
>
> First, introducing a shorthand for the Hermitian Transpose operator. I
> thought "A.HT" might be a viable candidate.
>

I agree that short-hand for the Hermitian transpose would make sense,
though I would try to stick with "A.H". It's one of the last reasons to
prefer the venerable np.matrix. NumPy arrays already has loads of
methods/properties, and this is a case (like @ for matrix multiplication)
where the operator significantly improves readability: consider "(x.H @
M @ x) / (x.H @ x)" vs "(x.conj().T @ M @ x) / (x.conj().T @ x)" [1].
Nearly everyone who does linear algebra with complex numbers would find
this useful.

If I recall correctly, the last time this came up, it was suggested that we
might implement this with NumPy view as  a "complex conjugate" dtype rather
than a memory copy. This would allow the operation to be essentially free.
I find this very appealing, both due to symmetry with ".T" and because of
the principle that properties should be cheap to compute.

So my tentative vote would be (1) yes, let's do the short-hand attribute,
but (2) let's wait until we have a complex conjugate dtype that do this
efficiently. My hope is that this should be relatively doable in a year or
two after current dtype refactor/usability effect comes to fruition.

Best,
Stephan

[1]  I copied the first non-trivial example off the Wikipedia page for a
Hermitian matrix:  https://en.wikipedia.org/wiki/Hermitian_matrix
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/ff98058f/attachment.html>

From toddrjen at gmail.com  Mon Jun 24 11:10:35 2019
From: toddrjen at gmail.com (Todd)
Date: Mon, 24 Jun 2019 11:10:35 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
Message-ID: <CAFpSVpKfJFd0xXBrKzU9=zfR6E91meu0HLkHV_eM0O1UszdZqA@mail.gmail.com>

On Mon, Jun 24, 2019 at 11:00 AM Stephan Hoyer <shoyer at gmail.com> wrote:

> On Sun, Jun 23, 2019 at 10:05 PM Stewart Clelland <
> stewartclelland at gmail.com> wrote:
>
>> Hi All,
>>
>> Based on discussion with Marten on github
>> <https://github.com/numpy/numpy/issues/13797>, I have a couple of
>> suggestions on syntax improvements on array transpose operations.
>>
>> First, introducing a shorthand for the Hermitian Transpose operator. I
>> thought "A.HT" might be a viable candidate.
>>
>
> I agree that short-hand for the Hermitian transpose would make sense,
> though I would try to stick with "A.H". It's one of the last reasons to
> prefer the venerable np.matrix. NumPy arrays already has loads of
> methods/properties, and this is a case (like @ for matrix multiplication)
> where the operator significantly improves readability: consider "(x.H @
> M @ x) / (x.H @ x)" vs "(x.conj().T @ M @ x) / (x.conj().T @ x)" [1].
> Nearly everyone who does linear algebra with complex numbers would find
> this useful.
>
> If I recall correctly, the last time this came up, it was suggested that
> we might implement this with NumPy view as  a "complex conjugate" dtype
> rather than a memory copy. This would allow the operation to be essentially
> free. I find this very appealing, both due to symmetry with ".T" and
> because of the principle that properties should be cheap to compute.
>
> So my tentative vote would be (1) yes, let's do the short-hand attribute,
> but (2) let's wait until we have a complex conjugate dtype that do this
> efficiently. My hope is that this should be relatively doable in a year or
> two after current dtype refactor/usability effect comes to fruition.
>
> Best,
> Stephan
>
> [1]  I copied the first non-trivial example off the Wikipedia page for a
> Hermitian matrix:  https://en.wikipedia.org/wiki/Hermitian_matrix
>
>
I would call it .CT or something like that, based on the term "Conjugate
transpose".  Wikipedia redirects "Hermitian transpose" to "Conjugate
transpose", and google has 49,800 results for "Hermitian transpose" vs
201,000 for "Conjugate transpose" (both with quotes).  So "Conjugate
transpose" seems to be the more widely-known name.  Further, I think what a
"Conjugate transpose" does is immediately obvious to someone who isn't
already familiar with the term so long as they know what a "conjugate" and
"transpose" are, while no one would be able to tell what a "Hermitian
transpose" unless they are already familiar with the name.  So I have no
problem calling it a "Hermitian transpose" somewhere in the docs, but I
think the naming and documentation should focus on the "Conjugate
transpose" term.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/292a4ff5/attachment.html>

From alan.isaac at gmail.com  Mon Jun 24 11:30:24 2019
From: alan.isaac at gmail.com (Alan Isaac)
Date: Mon, 24 Jun 2019 11:30:24 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAFpSVpJPHcDnLg2WwM_xraA0ySjT8X6-fsOPyvQeyw8Du6rCxA@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <AM0PR04MB6306B5011458D87742E50B7FF7E00@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <bad1f11f-2dda-e28b-ad70-3ee7c930fd39@gmail.com>
 <CAFpSVpJPHcDnLg2WwM_xraA0ySjT8X6-fsOPyvQeyw8Du6rCxA@mail.gmail.com>
Message-ID: <f8018f01-0064-e428-de3e-11b86c3c0a24@gmail.com>

Iirc, that works only on (2-d) matrices.
Cheers, Alan Isaac


On 6/24/2019 10:45 AM, Todd wrote:
> I think the corresponding MATLAB function/operation is this:
> 
> https://www.mathworks.com/help/matlab/ref/transpose.html


From allanhaldane at gmail.com  Mon Jun 24 11:46:02 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Mon, 24 Jun 2019 11:46:02 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
Message-ID: <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>

On 6/22/19 11:50 AM, Marten van Kerkwijk wrote:
> Hi Allan,
> 
> I'm not sure I would go too much by what the old MaskedArray class did. 
> It indeed made an effort not to overwrite masked values with a new 
> result, even to the extend of copying back masked input data elements to 
> the output data array after an operation. But the fact that this is 
> non-sensical if the dtype changes (or the units in an operation on 
> quantities) suggests that this mental model simply does not work.
> 
> I think a sensible alternative mental model for the MaskedArray class is 
> that all it does is forward any operations to the data it holds and 
> separately propagate a mask, ORing elements together for binary 
> operations, etc., and explicitly skipping masked elements in reductions 
> (ideally using `where` to be as agnostic as possible about the 
> underlying data, for which, e.g., setting masked values to `0` for 
> `np.reduce.add` may or may not be the right thing to do - what if they 
> are string? >
> With this mental picture, the underlying data are always have 
> well-defined meaning: they have been operated on as if the mask did not 
> exist. There then is also less reason to try to avoid getting it back to 
> the user.
> 
> As a concrete example (maybe Ben has others): in astropy we have a 
> sigma-clipping average routine, which uses a `MaskedArray` to 
> iteratively mask items that are too far off from the mean; here, the 
> mask varies each iteration (an initially masked element can come back 
> into play), but the data do not.
> 
> All the best,
> 
> Marten

I want to distinguish 3 different behaviors we are discussing:

 1. Making a "no-clobber" guarantee on the underlying data
 2. whether the data under the mask should have "sensible" values
 3. whether to allow "unmasking"


1. For "no clobber"
===================

I agree this would be a nice guarantee to make, as long as it does not
impose too much burden on us. Sometimes it is better not to chain
ourselves down.

By using "where", I can indeed make many api-functions and ufuncs
guarantee no-clobber. There are still a bunch of functions that seem
tricky to implement currently without either clobbering or making a
copy: dot, cross, outer, sort/argsort/searchsorted, correlate, convolve,
nonzero, and similar functions.

I'll think about how to implement those in a no-clobber way. Any
suggestions welcome, eg for np.dot/outer.

A no-clobber guarantee makes your "iterative mask" example solvable in
an efficient (no-copy) way:

    mask, last_mask = False
    while True:
        dat_mean = np.mean(MaskedArray(data, mask))
        mask, last_mask = np.abs(data - mask) > cutoff, mask
        if np.all(mask == last_mask):
            break

The MaskedArray constructor should have pretty minimal overhead.


2. Whether we can make "masked" data keep sensible values
=========================================================

I'm not confident this is a good guarantee to make. Certainly in a
simple case like c = a + b we can make masked values in c contain the
correct sum of the masked data in a and b. But I foresee complications
in other cases. For instance,

    MaskedArray([4,4,4])/MaskedArray([0, 1, 2], mask=[1,0,1])

If we use the "where" ufunc argument to evaluate the division operation,
a division-by-0 warning is not output which is good. However, if we use
"where" index 2 does not get updated correctly and will contain
"nonsense". If we use "where" a lot (which I think we should) we can
expect a lot of uninitialized masked values to commonly appear.

So my interpretation is that this comment:

> I think a sensible alternative mental model for the MaskedArray class
is that all it does is forward any operations to the data it holds and
separately propagate a mask

should only roughly be true, in the sense that we will not simply
"forward any operations" but we will also use "where" arguments which
produce nonsense masked values.


3. Whether to allow unmasking
=============================

If we agree that masked values will contain nonsense, it seems like a
bad idea for those values to be easily exposed.

Further, in all the comments so far I have not seen an example of a need
for unmasking that is not more easily, efficiently and safely achieved
by simply creating a new MaskedArray with a different mask.

If super-users want to access the ._data attribute they can, but I don't
think it should be recommended.

Normal users can use the ".filled" method, which by the way I
implemented to optionally support returning a readonly view rather than
a copy (the default).

Cheers,
Allan

> 
> On Sat, Jun 22, 2019 at 10:54 AM Allan Haldane <allanhaldane at gmail.com 
> <mailto:allanhaldane at gmail.com>> wrote:
> 
>     On 6/21/19 2:37 PM, Benjamin Root wrote:
>      > Just to note, data that is masked isn't always garbage. There are
>     plenty
>      > of use-cases where one may want to temporarily apply a mask for a
>     set of
>      > computation, or possibly want to apply a series of different masks to
>      > the data. I haven't read through this discussion deeply enough,
>     but is
>      > this new class going to destroy underlying masked data? and will
>     it be
>      > possible to swap out masks?
>      >
>      > Cheers!
>      > Ben Root
> 
>     Indeed my implementation currently feels free to clobber the data at
>     masked positions and makes no guarantees not to.
> 
>     I'd like to try to support reasonable use-cases like yours though. A
>     few
>     thoughts:
> 
>     First, the old np.ma.MaskedArray explicitly does not promise to
>     preserve
>     masked values, with a big warning in the docs. I can't recall the
>     examples, but I remember coming across cases where clobbering happens.
>     So arguably your behavior was never supported, and perhaps this means
>     that no-clobber behavior is difficult to reasonably support.
> 
>     Second, the old np.ma.MaskedArray avoids frequent clobbering by making
>     lots of copies. Therefore, in most cases you will not lose any
>     performance in my new MaskedArray relative to the old one by making an
>     explicit copy yourself. I.e, is it problematic to have to do
> 
>      ? ? ?>>> result = MaskedArray(data.copy(), trial_mask).sum()
> 
>     instead of
> 
>      ? ? ?>>> marr.mask = trial_mask
>      ? ? ?>>> result = marr.sum()
> 
>     since they have similar performance?
> 
>     Third, in the old np.ma.MaskedArray masked positions are very often
>     "effectively" clobbered, in the sense that they are not computed. For
>     example, if you do "c = a+b", and then change the mask of c, the values
>     at masked position of the result of (a+b) do not correspond to the sum
>     of the masked values in a and b. Thus, by "unmasking" c you are
>     exposing
>     nonsense values, which to me seems likely to cause heisenbugs.
> 
> 
>     In summary, by not making no-clobber guarantees and by strictly
>     preventing exposure of nonsense values, I suspect that: 1. my new code
>     is simpler and faster by avoiding lots of copies, and forces copies to
>     be explicit in user code. 2. disallowing direct modification of the
>     mask
>     lowers the "API surface area" making people's MaskedArray code less
>     buggy and easier to read: Exposure of nonsense values by "unmasking" is
>     one less possibility to keep in mind.
> 
>     Best,
>     Allan
> 
> 
>      > On Thu, Jun 20, 2019 at 12:44 PM Allan Haldane
>     <allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>      > <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>>
>     wrote:
>      >
>      >? ? ?On 6/19/19 10:19 PM, Marten van Kerkwijk wrote:
>      >? ? ?> Hi Allan,
>      >? ? ?>
>      >? ? ?> This is very impressive! I could get the tests that I wrote
>     for my
>      >? ? ?class
>      >? ? ?> pass with yours using Quantity with what I would consider
>     very minimal
>      >? ? ?> changes. I only could not find a good way to unmask data (I
>     like the
>      >? ? ?> idea of setting the mask on some elements via `ma[item] =
>     X`); is this
>      >? ? ?> on purpose?
>      >
>      >? ? ?Yes, I want to make it difficult for the user to access the
>     garbage
>      >? ? ?values under the mask, which are often clobbered values. The
>     only way to
>      >? ? ?"remove" a masked value is by replacing it with a new
>     non-masked value.
>      >
>      >
>      >? ? ?> Anyway, it would seem easily at the point where I should
>     comment
>      >? ? ?on your
>      >? ? ?> repository rather than in the mailing list!
>      >
>      >? ? ?To make further progress on this encapsulation idea I need a more
>      >? ? ?complete ducktype to pass into MaskedArray to test, so that's
>     what I'll
>      >? ? ?work on next, when I have time. I'll either try to finish my
>      >? ? ?ArrayCollection type, or try making a simple NDunit ducktype
>      >? ? ?piggybacking on astropy's Unit.
>      >
>      >? ? ?Best,
>      >? ? ?Allan
>      >
>      >
>      >? ? ?>
>      >? ? ?> All the best,
>      >? ? ?>
>      >? ? ?> Marten
>      >? ? ?>
>      >? ? ?>
>      >? ? ?> On Wed, Jun 19, 2019 at 5:45 PM Allan Haldane
>      >? ? ?<allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>     <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>
>      >? ? ?> <mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com> <mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com>>>>
>      >? ? ?wrote:
>      >? ? ?>
>      >? ? ?>? ? ?On 6/18/19 2:04 PM, Marten van Kerkwijk wrote:
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> On Tue, Jun 18, 2019 at 12:55 PM Allan Haldane
>      >? ? ?>? ? ?<allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>
>     <mailto:allanhaldane at gmail.com <mailto:allanhaldane at gmail.com>>
>      >? ? ?<mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com> <mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com>>>
>      >? ? ?>? ? ?> <mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com>
>      >? ? ?<mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com>> <mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com>
>      >? ? ?<mailto:allanhaldane at gmail.com
>     <mailto:allanhaldane at gmail.com>>>>>
>      >? ? ?>? ? ?wrote:
>      >? ? ?>? ? ?> <snip>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?> This may be too much to ask from the
>     initializer, but, if
>      >? ? ?>? ? ?so, it still
>      >? ? ?>? ? ?>? ? ?> seems most useful if it is made as easy as
>     possible to do,
>      >? ? ?>? ? ?say, `class
>      >? ? ?>? ? ?>? ? ?> MaskedQuantity(Masked, Quantity): <very few
>     overrides>`.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?Currently MaskedArray does not accept ducktypes as
>      >? ? ?underlying
>      >? ? ?>? ? ?arrays,
>      >? ? ?>? ? ?>? ? ?but I think it shouldn't be too hard to modify it
>     to do so.
>      >? ? ?>? ? ?Good idea!
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> Looking back at my trial, I see that I also never got to
>      >? ? ?duck arrays -
>      >? ? ?>? ? ?> only ndarray subclasses - though I tried to make the
>     code as
>      >? ? ?>? ? ?agnostic as
>      >? ? ?>? ? ?> possible.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> (Trial at
>      >? ? ?>? ? ?>
>      >? ? ?>
>      >
>     https://github.com/astropy/astropy/compare/master...mhvk:utils-masked-class?expand=1)
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?I already partly navigated this mixin-issue in the
>      >? ? ?>? ? ?>? ? ?"MaskedArrayCollection" class, which essentially does
>      >? ? ?>? ? ?>? ? ?ArrayCollection(MaskedArray(array)), and only
>     takes about 30
>      >? ? ?>? ? ?lines of
>      >? ? ?>? ? ?>? ? ?boilerplate. That's the backwards encapsulation
>     order from
>      >? ? ?>? ? ?what you want
>      >? ? ?>? ? ?>? ? ?though.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> Yes, indeed, from a quick trial
>     `MaskedArray(np.arange(3.) *
>      >? ? ?u.m,
>      >? ? ?>? ? ?> mask=[True, False, False])` does indeed not have a
>     `.unit`
>      >? ? ?attribute
>      >? ? ?>? ? ?> (and cannot represent itself...); I'm not at all sure
>     that my
>      >? ? ?>? ? ?method of
>      >? ? ?>? ? ?> just creating a mixed class is anything but a recipe for
>      >? ? ?disaster,
>      >? ? ?>? ? ?though!
>      >? ? ?>
>      >? ? ?>? ? ?Based on your suggestion I worked on this a little
>     today, and
>      >? ? ?now my
>      >? ? ?>? ? ?MaskedArray more easily encapsulates both ducktypes and
>     ndarray
>      >? ? ?>? ? ?subclasses (pushed to repo). Here's an example I got
>     working
>      >? ? ?with masked
>      >? ? ?>? ? ?units using unyt:
>      >? ? ?>
>      >? ? ?>? ? ?[1]: from MaskedArray import X, MaskedArray, MaskedScalar
>      >? ? ?>
>      >? ? ?>? ? ?[2]: from unyt import m, km
>      >? ? ?>
>      >? ? ?>? ? ?[3]: import numpy as np
>      >? ? ?>
>      >? ? ?>? ? ?[4]: uarr = MaskedArray([1., 2., 3.]*km, mask=[0,1,0])
>      >? ? ?>
>      >? ? ?>? ? ?[5]: uarr
>      >? ? ?>
>      >? ? ?>? ? ?MaskedArray([1., X , 3.])
>      >? ? ?>? ? ?[6]: uarr + 1*m
>      >? ? ?>
>      >? ? ?>? ? ?MaskedArray([1.001, X? ? , 3.001])
>      >? ? ?>? ? ?[7]: uarr.filled()
>      >? ? ?>
>      >? ? ?>? ? ?unyt_array([1., 0., 3.], 'km')
>      >? ? ?>? ? ?[8]: np.concatenate([uarr, 2*uarr]).filled()
>      >? ? ?>? ? ?unyt_array([1., 0., 3., 2., 0., 6.], '(dimensionless)')
>      >? ? ?>
>      >? ? ?>? ? ?The catch is the ducktype/subclass has to rigorously follow
>      >? ? ?numpy's
>      >? ? ?>? ? ?indexing rules, including distinguishing 0d arrays from
>      >? ? ?scalars. For now
>      >? ? ?>? ? ?only I used unyt in the example above since it happens
>     to be
>      >? ? ?less strict
>      >? ? ?>? ? ??about dimensionless operations than astropy.units
>     which trips
>      >? ? ?up my
>      >? ? ?>? ? ?repr code. (see below for example with astropy.units).
>     Note in
>      >? ? ?the last
>      >? ? ?>? ? ?line I lost the dimensions, but that is because unyt
>     does not
>      >? ? ?handle
>      >? ? ?>? ? ?np.concatenate. To get that to work we need a true ducktype
>      >? ? ?for units.
>      >? ? ?>
>      >? ? ?>? ? ?The example above doesn't expose the ".units" attribute
>      >? ? ?outside the
>      >? ? ?>? ? ?MaskedArray, and it doesn't print the units in the
>     repr. But
>      >? ? ?you can
>      >? ? ?>? ? ?access them using "filled".
>      >? ? ?>
>      >? ? ?>? ? ?While I could make MaskedArray forward unknown attribute
>      >? ? ?accesses to the
>      >? ? ?>? ? ?encapsulated array, that seems a bit
>     dangerous/bug-prone at first
>      >? ? ?>? ? ?glance, so probably I want to require the user to make a
>      >? ? ?MaskedArray
>      >? ? ?>? ? ?subclass to do so. I've just started playing with that
>      >? ? ?(probably buggy),
>      >? ? ?>? ? ?and Ive attached subclass examples for astropy.unit and
>     unyt,
>      >? ? ?with some
>      >? ? ?>? ? ?example output below.
>      >? ? ?>
>      >? ? ?>? ? ?Cheers,
>      >? ? ?>? ? ?Allan
>      >? ? ?>
>      >? ? ?>
>      >? ? ?>
>      >? ? ?>? ? ?Example using the attached astropy unit subclass:
>      >? ? ?>
>      >? ? ?>? ? ?? ? >>> from astropy.units import m, km, s
>      >? ? ?>? ? ?? ? >>> uarr = MaskedQ(np.ones(3), units=km, mask=[0,1,0])
>      >? ? ?>? ? ?? ? >>> uarr
>      >? ? ?>? ? ?? ? MaskedQ([1., X , 1.], units=km)
>      >? ? ?>? ? ?? ? >>> uarr.units
>      >? ? ?>? ? ?? ? km
>      >? ? ?>? ? ?? ? >>> uarr + (1*m)
>      >? ? ?>? ? ?? ? MaskedQ([1.001, X? ? , 1.001], units=km)
>      >? ? ?>? ? ?? ? >>> uarr/(1*s)
>      >? ? ?>? ? ?? ? MaskedQ([1., X , 1.], units=km / s)
>      >? ? ?>? ? ?? ? >>> (uarr*(1*m))[1:]
>      >? ? ?>? ? ?? ? MaskedQ([X , 1.], units=km m)
>      >? ? ?>? ? ?? ? >>> np.add.outer(uarr, uarr)
>      >? ? ?>? ? ?? ? MaskedQ([[2., X , 2.],
>      >? ? ?>? ? ?? ? ? ? ? ? ?[X , X , X ],
>      >? ? ?>? ? ?? ? ? ? ? ? ?[2., X , 2.]], units=km)
>      >? ? ?>? ? ?? ? >>> print(uarr)
>      >? ? ?>? ? ?? ? [1. X? 1.] km m
>      >? ? ?>
>      >? ? ?>? ? ?Cheers,
>      >? ? ?>? ? ?Allan
>      >? ? ?>
>      >? ? ?>
>      >? ? ?>? ? ?>? ? ?> Even if this impossible, I think it is
>     conceptually useful
>      >? ? ?>? ? ?to think
>      >? ? ?>? ? ?>? ? ?> about what the masking class should do. My
>     sense is that,
>      >? ? ?>? ? ?e.g., it
>      >? ? ?>? ? ?>? ? ?> should not attempt to decide when an operation
>      >? ? ?succeeds or not,
>      >? ? ?>? ? ?>? ? ?but just
>      >? ? ?>? ? ?>? ? ?> "or together" input masks for regular,
>     multiple-input
>      >? ? ?functions,
>      >? ? ?>? ? ?>? ? ?and let
>      >? ? ?>? ? ?>? ? ?> the underlying arrays skip elements for
>     reductions by
>      >? ? ?using
>      >? ? ?>? ? ?`where`
>      >? ? ?>? ? ?>? ? ?> (hey, I did implement that for a reason... ;-). In
>      >? ? ?>? ? ?particular, it
>      >? ? ?>? ? ?>? ? ?> suggests one should not have things like
>     domains and all
>      >? ? ?>? ? ?that (I never
>      >? ? ?>? ? ?>? ? ?> understood why `MaskedArray` did that). If one
>     wants more,
>      >? ? ?>? ? ?the class
>      >? ? ?>? ? ?>? ? ?> should provide a method that updates the mask
>     (a sensible
>      >? ? ?>? ? ?default
>      >? ? ?>? ? ?>? ? ?might
>      >? ? ?>? ? ?>? ? ?> be `mask |= ~np.isfinite(result)` - here, the class
>      >? ? ?being masked
>      >? ? ?>? ? ?>? ? ?should
>      >? ? ?>? ? ?>? ? ?> logically support ufuncs and functions, so it can
>      >? ? ?decide what
>      >? ? ?>? ? ?>? ? ?"isfinite"
>      >? ? ?>? ? ?>? ? ?> means).
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?I agree it would be nice to remove domains. It would
>      >? ? ?make life
>      >? ? ?>? ? ?easier,
>      >? ? ?>? ? ?>? ? ?and I could remove a lot of twiddly code! I kept
>     it in
>      >? ? ?for now to
>      >? ? ?>? ? ?>? ? ?minimize the behavior changes from the old
>     MaskedArray.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> That makes sense. Could be separated out to a
>      >? ? ?backwards-compatibility
>      >? ? ?>? ? ?> class later.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?> In any case, I would think that a basic truth
>     should
>      >? ? ?be that
>      >? ? ?>? ? ?>? ? ?everything
>      >? ? ?>? ? ?>? ? ?> has a mask with a shape consistent with the
>     data, so
>      >? ? ?>? ? ?>? ? ?> 1. Each complex numbers has just one mask, and
>     setting
>      >? ? ?>? ? ?`a.imag` with a
>      >? ? ?>? ? ?>? ? ?> masked array should definitely propagate the mask.
>      >? ? ?>? ? ?>? ? ?> 2. For a masked array with structured dtype, I'd
>      >? ? ?similarly say
>      >? ? ?>? ? ?>? ? ?that the
>      >? ? ?>? ? ?>? ? ?> default is for a mask to have the same shape as
>     the array.
>      >? ? ?>? ? ?But that
>      >? ? ?>? ? ?>? ? ?> something like your collection makes sense for
>     the case
>      >? ? ?>? ? ?where one
>      >? ? ?>? ? ?>? ? ?wants
>      >? ? ?>? ? ?>? ? ?> to mask items in a structure.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?Agreed that we should have a single bool per
>     complex or
>      >? ? ?structured
>      >? ? ?>? ? ?>? ? ?element, and the mask shape is the same as the
>     array shape.
>      >? ? ?>? ? ?That's how I
>      >? ? ?>? ? ?>? ? ?implemented it. But there is still a problem with
>      >? ? ?complex.imag
>      >? ? ?>? ? ?>? ? ?assignment:
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?? ? >>> a = MaskedArray([1j, 2, X])
>      >? ? ?>? ? ?>? ? ?? ? >>> i = a.imag
>      >? ? ?>? ? ?>? ? ?? ? >>> i[:] = MaskedArray([1, X, 1])
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?If we make the last line copy the mask to the
>     original
>      >? ? ?array, what
>      >? ? ?>? ? ?>? ? ?should the real part of a[2] be? Conversely, if
>     we don't
>      >? ? ?copy
>      >? ? ?>? ? ?the mask,
>      >? ? ?>? ? ?>? ? ?what should the imag part of a[1] be? It seems
>     like we might
>      >? ? ?>? ? ?"want" the
>      >? ? ?>? ? ?>? ? ?masks to be OR'd instead, but then should i[2] be
>     masked
>      >? ? ?after
>      >? ? ?>? ? ?we just
>      >? ? ?>? ? ?>? ? ?set it to 1?
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> Ah, I see the issue now... Easiest to implement and
>     closest
>      >? ? ?in analogy
>      >? ? ?>? ? ?> to a regular view would be to just let it unmask a[2]
>     (with
>      >? ? ?>? ? ?whatever is
>      >? ? ?>? ? ?> in real; user beware!).
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> Perhaps better would be to special-case such that `imag`
>      >? ? ?returns a
>      >? ? ?>? ? ?> read-only view of the mask. Making `imag` itself
>     read-only would
>      >? ? ?>? ? ?prevent
>      >? ? ?>? ? ?> possibly reasonable things like `i[np.isclose(i, 0)]
>     = 0` - but
>      >? ? ?>? ? ?there is
>      >? ? ?>? ? ?> no reason this should update the mask.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> Still, neither is really satisfactory...
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?> p.s. I started trying to implement the above
>     "Mixin"
>      >? ? ?class; will
>      >? ? ?>? ? ?>? ? ?try to
>      >? ? ?>? ? ?>? ? ?> clean that up a bit so that at least it uses
>     `where` and
>      >? ? ?>? ? ?push it up.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?I played with "where", but didn't include it
>     since 1.17
>      >? ? ?is not
>      >? ? ?>? ? ?released.
>      >? ? ?>? ? ?>? ? ?To avoid duplication of effort, I've attached a
>     diff of
>      >? ? ?what I
>      >? ? ?>? ? ?tried. I
>      >? ? ?>? ? ?>? ? ?actually get a slight slowdown of about 10% by using
>      >? ? ?where...
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> Your implementation is indeed quite similar to what I
>     got in
>      >? ? ?>? ? ?> __array_ufunc__ (though one should "&" the where with
>     ~mask).
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> I think the main benefit is not to presume that
>     whatever is
>      >? ? ?underneath
>      >? ? ?>? ? ?> understands 0 or 1, i.e., avoid filling.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?>? ? ?If you make progress with the mixin, a push is
>     welcome. I
>      >? ? ?>? ? ?imagine a
>      >? ? ?>? ? ?>? ? ?problem is going to be that np.isscalar doesn't
>     work to
>      >? ? ?detect
>      >? ? ?>? ? ?duck
>      >? ? ?>? ? ?>? ? ?scalars.
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> I fear that in my attempts I've simply decided that only
>      >? ? ?array scalars
>      >? ? ?>? ? ?> exist...
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> -- Marten
>      >? ? ?>? ? ?>
>      >? ? ?>? ? ?> _______________________________________________
>      >? ? ?>? ? ?> NumPy-Discussion mailing list
>      >? ? ?>? ? ?> NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>
>      >? ? ?<mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>>
>      >? ? ?<mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>
>      >? ? ?<mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>>>
>      >? ? ?>? ? ?> https://mail.python.org/mailman/listinfo/numpy-discussion
>      >? ? ?>? ? ?>
>      >? ? ?>
>      >? ? ?>? ? ?_______________________________________________
>      >? ? ?>? ? ?NumPy-Discussion mailing list
>      >? ? ?> NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>
>      >? ? ?<mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>>
>      >? ? ?<mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>
>      >? ? ?<mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>>>
>      >? ? ?> https://mail.python.org/mailman/listinfo/numpy-discussion
>      >? ? ?>
>      >? ? ?>
>      >? ? ?> _______________________________________________
>      >? ? ?> NumPy-Discussion mailing list
>      >? ? ?> NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>
>     <mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>>
>      >? ? ?> https://mail.python.org/mailman/listinfo/numpy-discussion
>      >? ? ?>
>      >
>      >? ? ?_______________________________________________
>      >? ? ?NumPy-Discussion mailing list
>      > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     <mailto:NumPy-Discussion at python.org
>     <mailto:NumPy-Discussion at python.org>>
>      > https://mail.python.org/mailman/listinfo/numpy-discussion
>      >
>      >
>      > _______________________________________________
>      > NumPy-Discussion mailing list
>      > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>      > https://mail.python.org/mailman/listinfo/numpy-discussion
>      >
> 
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


From shoyer at gmail.com  Mon Jun 24 11:48:04 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 24 Jun 2019 08:48:04 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAFpSVpKfJFd0xXBrKzU9=zfR6E91meu0HLkHV_eM0O1UszdZqA@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAFpSVpKfJFd0xXBrKzU9=zfR6E91meu0HLkHV_eM0O1UszdZqA@mail.gmail.com>
Message-ID: <CAEQ_Tvcnkf_187j0pS+oF87519WcNjq67nJPNVbRvOuf81kkqg@mail.gmail.com>

On Mon, Jun 24, 2019 at 8:10 AM Todd <toddrjen at gmail.com> wrote:

> On Mon, Jun 24, 2019 at 11:00 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> On Sun, Jun 23, 2019 at 10:05 PM Stewart Clelland <
>> stewartclelland at gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Based on discussion with Marten on github
>>> <https://github.com/numpy/numpy/issues/13797>, I have a couple of
>>> suggestions on syntax improvements on array transpose operations.
>>>
>>> First, introducing a shorthand for the Hermitian Transpose operator. I
>>> thought "A.HT" might be a viable candidate.
>>>
>>
>> I agree that short-hand for the Hermitian transpose would make sense,
>> though I would try to stick with "A.H". It's one of the last reasons to
>> prefer the venerable np.matrix. NumPy arrays already has loads of
>> methods/properties, and this is a case (like @ for matrix multiplication)
>> where the operator significantly improves readability: consider "(x.H @
>> M @ x) / (x.H @ x)" vs "(x.conj().T @ M @ x) / (x.conj().T @ x)" [1].
>> Nearly everyone who does linear algebra with complex numbers would find
>> this useful.
>>
>> If I recall correctly, the last time this came up, it was suggested that
>> we might implement this with NumPy view as  a "complex conjugate" dtype
>> rather than a memory copy. This would allow the operation to be essentially
>> free. I find this very appealing, both due to symmetry with ".T" and
>> because of the principle that properties should be cheap to compute.
>>
>> So my tentative vote would be (1) yes, let's do the short-hand attribute,
>> but (2) let's wait until we have a complex conjugate dtype that do this
>> efficiently. My hope is that this should be relatively doable in a year or
>> two after current dtype refactor/usability effect comes to fruition.
>>
>> Best,
>> Stephan
>>
>> [1]  I copied the first non-trivial example off the Wikipedia page for a
>> Hermitian matrix:  https://en.wikipedia.org/wiki/Hermitian_matrix
>>
>>
> I would call it .CT or something like that, based on the term "Conjugate
> transpose".  Wikipedia redirects "Hermitian transpose" to "Conjugate
> transpose", and google has 49,800 results for "Hermitian transpose" vs
> 201,000 for "Conjugate transpose" (both with quotes).  So "Conjugate
> transpose" seems to be the more widely-known name.  Further, I think what a
> "Conjugate transpose" does is immediately obvious to someone who isn't
> already familiar with the term so long as they know what a "conjugate" and
> "transpose" are, while no one would be able to tell what a "Hermitian
> transpose" unless they are already familiar with the name.  So I have no
> problem calling it a "Hermitian transpose" somewhere in the docs, but I
> think the naming and documentation should focus on the "Conjugate
> transpose" term.
>

Sure, we should absolutely document the name as the "Conjugate transpose".
But the standard mathematical notation is definitely "A^H" rather than
"A^{CT}".


> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/467ae3c3/attachment.html>

From allanhaldane at gmail.com  Mon Jun 24 11:59:36 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Mon, 24 Jun 2019 11:59:36 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAL1kJvCSJt6oGTtZdLoqoYQDtD6fxBzwmUMdiPwA1x740QtkCg@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <CAEQ_TvdyO-WuaTs3CL-fu7Y6MpPAzS8_jXdrHnp2z9_ADWX8gw@mail.gmail.com>
 <CAJNV+9t_a0zG4bh986xmSyQdoUGHwLk1PJxqPPVKG=RHuUNDjw@mail.gmail.com>
 <CAEQ_Tvf8TQhnJtgfxae2dmxJAKvYvT-9ZoGiFFE92eNsDdu_CA@mail.gmail.com>
 <CAJNV+9vQC4pv_ASGA4RiKjWOBzyr4PhJDrtpygvwbaXcUzQG6g@mail.gmail.com>
 <CAEQ_TvfzdyxgNo8EeV=533=y=9GtD7A+3K9X40QZ=kPQTCkczQ@mail.gmail.com>
 <CAL1kJvCSJt6oGTtZdLoqoYQDtD6fxBzwmUMdiPwA1x740QtkCg@mail.gmail.com>
Message-ID: <53ead052-6757-c942-daa2-0f293288fab9@gmail.com>

On 6/23/19 6:58 PM, Eric Wieser wrote:
>     I think we?d need to consider separately the operation on the mask
>     and on the data. In my proposal, the data would always do
>     |np.sum(array, where=~mask)|, while how the mask would propagate
>     might depend on the mask itself,
> 
> I quite like this idea, and I think Stephan?s strawman design is
> actually plausible, where |MaskedArray.mask| is either an |InvalidMask|
> or a |IgnoreMask| instance to pick between the different propagation
> types. Both classes could simply have an underlying |._array| attribute
> pointing to a duck-array of some kind that backs their boolean data.
> 
>     The second version requires that you /also/ know how Mask classes
>     work, and how they implement +
> 
> I remain unconvinced that Mask classes should behave differently on
> different ufuncs. I don?t think |np.minimum(ignore_na, b)| is any
> different to |np.add(ignore_na, b)| - either both should produce |b|, or
> both should produce |ignore_na|. I would lean towards produxing
> |ignore_na|, and propagation behavior differing between ?ignore? and
> ?invalid? only for |reduce| / |accumulate| operations, where the concept
> of skipping an application is well-defined.
> 
> Some possible follow-up questions that having two distinct masked types
> raise:
> 
>   * what if I want my data to support both invalid and skip fields at
>     the same time? sum([invalid, skip, 1]) == invalid
>   * is there a use case for more that these two types of mask?
>     |invalid_due_to_reason_A|, |invalid_due_to_reason_B| would be
>     interesting things to track through a calculation, possibly a
>     dictionary of named masks.
> 
> Eric

General comments on the last few emails:

For now I intentionally decided not to worry about NA masks in my
implementation. I want to get a first working implementation finished
for IGNORE style.

I agree it would be nice to have some support for NA style later, either
by a new MaskedArray subclass, a ducktype'd .mask attribute, or by some
other modification. In the latter category, consider that currently the
mask is stored as a boolean (1 byte) mask. One idea I have not put much
thought into is that internally we could make the mask a uint8, so
unmasked would be "0" IGNORE mask would be 1, and NA mask would be 2.
That allows mixing of mask types. Not sure how messy it would be to
implement.

For the idea of replacing the mask by a ducktype for NA style, my
instinct is that would be tricky. Many of the MaskedArray
__array_function__ method implementations, like sort and dot and many
others, do "complicated" computations using the mask that I don't think
you could easily get to work right by simply substituting a mask
ducktype. I think those would need to be reimplemented for NA style, in
other words you would need to make a MaskedArray subclass anyway.

Cheers,
Allan


> On Sun, 23 Jun 2019 at 15:28, Stephan Hoyer <shoyer at gmail.com
> <mailto:shoyer at gmail.com>> wrote:
> 
>     On Sun, Jun 23, 2019 at 11:55 PM Marten van Kerkwijk
>     <m.h.vankerkwijk at gmail.com <mailto:m.h.vankerkwijk at gmail.com>> wrote:
> 
>             Your proposal would be something like np.sum(array,
>             where=np.ones_like(array))? This seems rather verbose for a
>             common operation.?Perhaps np.sum(array, where=True) would
>             work, making use of broadcasting? (I haven't actually
>             checked whether this is well-defined yet.)
> 
>         I think we'd need to consider separately the operation on the
>         mask and on the data. In my proposal, the data would always do
>         `np.sum(array, where=~mask)`, while how the mask would propagate
>         might depend on the mask itself, i.e., we'd have different mask
>         types for `skipna=True` (default) and `False` ("contagious")
>         reductions, which differed in doing `logical_and.reduce` or
>         `logical_or.reduce` on the mask.
> 
> 
>     OK, I think I finally understand what you're getting at. So suppose
>     this this how we implement it internally. Would we really insist on
>     a user creating a new MaskedArray with a new mask object, e.g., with
>     a GreedyMask? We could add sugar for this, but certainly
>     array.greedy_masked().sum() is significantly less clear than
>     array.sum(skipna=False).
> 
>     I'm also a little concerned about a proliferation of
>     MaskedArray/Mask types. New types are significantly harder to
>     understand than new functions (or new arguments on existing
>     functions). I don't know if we have enough distinct use cases for
>     this many types.
> 
>             Are there use-cases for propagating masks separately from
>             data? If not, it might make sense to only define mask
>             operations along with data, which could be much simpler.
> 
> 
>         I had only thought about separating out the concern of mask
>         propagation from the "MaskedArray" class to the mask proper, but
>         it might indeed make things easier if the mask also did any
>         required preparation for passing things on to the data (such as
>         adjusting the "where" argument in a reduction). I also like that
>         this way the mask can determine even before the data what
>         functionality is available (i.e., it could be the place from
>         which to return `NotImplemented` for a ufunc.at
>         <http://ufunc.at> call with a masked index argument).
> 
> 
>     You're going to have to come up with something more compelling than
>     "separation of concerns" to convince me that this extra Mask
>     abstraction is worthwhile. On its own, I think a separate Mask class
>     would only obfuscate MaskedArray functions.
> 
>     For example, compare these two implementations of add:
> 
>     def? add1(x, y):
>     ? ? return MaskedArray(x.data?+ y.data,? x.mask | y.mask)
> 
>     def? add2(x, y):
>     ? ? return MaskedArray(x.data?+ y.data,? x.mask + y.mask)
> 
>     The second version requires that you *also* know how Mask classes
>     work, and how they implement +. So now you need to look in at least
>     twice as many places to understand add() for MaskedArray objects.
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


From allanhaldane at gmail.com  Mon Jun 24 12:05:52 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Mon, 24 Jun 2019 12:05:52 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
Message-ID: <a0202408-285a-438f-b8a5-a3790ba408df@gmail.com>

On 6/24/19 11:46 AM, Allan Haldane wrote:
> A no-clobber guarantee makes your "iterative mask" example solvable in
> an efficient (no-copy) way:
> 
>     mask, last_mask = False
>     while True:
>         dat_mean = np.mean(MaskedArray(data, mask))
>         mask, last_mask = np.abs(data - mask) > cutoff, mask
>         if np.all(mask == last_mask):
>             break

Whoops, that should read "np.abs(data - dat_mean)" in there.

Allan

From shoyer at gmail.com  Mon Jun 24 12:16:08 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 24 Jun 2019 09:16:08 -0700
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
Message-ID: <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>

On Mon, Jun 24, 2019 at 8:46 AM Allan Haldane <allanhaldane at gmail.com>
wrote:

>  1. Making a "no-clobber" guarantee on the underlying data
>

Hi Allan -- could kindly clarify what you mean by "no-clobber"?

Is this referring to allowing masked arrays to mutate masked data values
in-place, even on apparently non-in-place operators? If so, that definitely
seems like a bad idea to me. I would much rather do an unnecessary copy
than have surprisingly non-thread-safe operations.


>  If we agree that masked values will contain nonsense, it seems like a
> bad idea for those values to be easily exposed.
>
> Further, in all the comments so far I have not seen an example of a need
> for unmasking that is not more easily, efficiently and safely achieved
> by simply creating a new MaskedArray with a different mask.


My understanding is that essentially every low-level MaskedArray function
is implemented by looking at the data and mask separately. If so, we should
definitely expose this API directly to users (as part of the public API for
MaskedArray), so they can write their own efficient algorithms.

As a concrete example, suppose I wanted to implement a low-level "grouped
mean" operation for masked arrays like that found in pandas. This isn't a
builtin NumPy function, so I would need to write this myself. This would be
relatively straightforward to do in Numba or Cython with raw NumPy arrays
(e.g., see my example here for a NaN skipping version:
https://github.com/shoyer/numbagg/blob/v0.1.0/numbagg/grouped.py), but to
do it efficiently you definitely don't want to make an unnecessary copy.

The usual reason for hiding implementation details is when we want to
reserve the right to change them. But if we're sure about the data model
(which I think we are for MaskedArray) then I think there's a lot of value
in exposing it directly to users, even if it's lower level than it
appropriate to use in most cases.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/c7ddb147/attachment.html>

From allanhaldane at gmail.com  Mon Jun 24 13:38:07 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Mon, 24 Jun 2019 13:38:07 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
Message-ID: <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>

On 6/24/19 12:16 PM, Stephan Hoyer wrote:
> On Mon, Jun 24, 2019 at 8:46 AM Allan Haldane <allanhaldane at gmail.com
> <mailto:allanhaldane at gmail.com>> wrote:
> 
>     ?1. Making a "no-clobber" guarantee on the underlying data
> 
> 
> Hi Allan -- could kindly clarify what you mean by "no-clobber"?
> 
> Is this referring to allowing masked arrays to mutate masked data values
> in-place, even on apparently non-in-place operators? If so, that
> definitely seems like a bad idea to me. I would much rather do an
> unnecessary copy than have surprisingly non-thread-safe operations.

Yes. In my current implementation, the operation:

     >>> a = np.arange(6)
     >>> m = MaskedArray(a, mask=a < 3)
     >>> res = np.dot(m, m)

will clobber elements of a. It appears that to avoid clobbering we will
need to have np.dot make a copy. I also discuss how my implementation
clobbers views in the docs:

https://github.com/ahaldane/ndarray_ducktypes/blob/master/doc/MaskedArray.md#views-and-copies

I expect I could be convinced to make a no-clobber guarantee, if others
agree it is better to accept the performance loss by making a copy
internally.

I just still have a hard time thinking of cases where clobbering is
really that confusing, or easily avoidable by the user making an
explicit copy. I like giving the user control over whether a copy is
made or not, since I expect in the vast majority of cases a copy is
unnecessary.

I think it will be rare usage for people to hold on to the data array
("a" in the example above). Most of the time you create the MaskedArray
on data created on the spot which you never touch directly again. We are
all already used to numpy's "view" behavior (eg, for the np.array
function), where if you don't explicitly make a copy of your orginal
array you can expect further operations to modify it. Admittedly for
MaskedArray it's a bit different since apparently readonly operations
like np.dot can clobber, but again, it doesn't seem hard to know about
or burdensome to avoid by explicit copy, and can give big performance
advantages.

> 
>     ?If we agree that masked values will contain nonsense, it seems like a
>     bad idea for those values to be easily exposed.
> 
>     Further, in all the comments so far I have not seen an example of a need
>     for unmasking that is not more easily, efficiently and safely achieved
>     by simply creating a new MaskedArray with a different mask.
> 
> 
> My understanding is that essentially every low-level MaskedArray
> function is implemented by looking at the data and mask separately. If
> so, we should definitely expose this API directly to users (as part of
> the public API for MaskedArray), so they can write their own efficient
> algorithms.>
> As a concrete example, suppose I wanted to implement a low-level
> "grouped mean" operation for masked arrays like that found in pandas.
> This isn't a builtin NumPy function, so I would need to write this
> myself. This would be relatively straightforward to do in Numba or
> Cython with raw NumPy arrays (e.g., see my example here for a NaN
> skipping
> version:?https://github.com/shoyer/numbagg/blob/v0.1.0/numbagg/grouped.py),
> but to do it efficiently you definitely don't want to make an
> unnecessary copy.
> 
> The usual reason for hiding implementation details is when we want to
> reserve the right to change them. But if we're sure about the data model
> (which I think we are for MaskedArray) then I think there's a lot of
> value in exposing it directly to users, even if it's lower level than it
> appropriate to use in most cases.

Fair enough, I think it is all right to allow people access to ._data
and make some guarantees about it if they are implementing subclasses or
defining new ducktypes.

There should be a section in the documentation describing what
guanantees we make about ._data (or ._array if we change the name) and
how/when to use it.

Best,
Allan


> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


From sebastian at sipsolutions.net  Mon Jun 24 14:02:21 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 24 Jun 2019 11:02:21 -0700
Subject: [Numpy-discussion] NumPy Community Meeting Thursday, June 26
Message-ID: <7ee73632c863cda414b0a840b398856d68c290f2.camel@sipsolutions.net>

Hi all,

There will be a NumPy Community meeting on _Thursday_ June 26 at 11 am
Pacific Time. Everyone is invited to join in and edit the work-in-
progress meeting notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian


PS: We decided to move to Thursday, since at least Matti is traveling.
If you have something to discuss and Thursday is not good, we can do
Wednesday as well.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/4cf01bed/attachment-0001.sig>

From sebastian at sipsolutions.net  Mon Jun 24 14:04:21 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 24 Jun 2019 11:04:21 -0700
Subject: [Numpy-discussion] NumPy Community Meeting Thursday, June 27
In-Reply-To: <7ee73632c863cda414b0a840b398856d68c290f2.camel@sipsolutions.net>
References: <7ee73632c863cda414b0a840b398856d68c290f2.camel@sipsolutions.net>
Message-ID: <a51cf3fa1e26680d7ea0d3c4218d19d09c4914c0.camel@sipsolutions.net>

Sorry, Thursday is the 27th of course.


On Mon, 2019-06-24 at 11:02 -0700, Sebastian Berg wrote:
> Hi all,
> 
> There will be a NumPy Community meeting on _Thursday_ June 26 at 11
> am
> Pacific Time. Everyone is invited to join in and edit the work-in-
> progress meeting notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both
> 
> Best wishes
> 
> Sebastian
> 
> 
> PS: We decided to move to Thursday, since at least Matti is
> traveling.
> If you have something to discuss and Thursday is not good, we can do
> Wednesday as well.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/a82ae247/attachment.sig>

From m.h.vankerkwijk at gmail.com  Mon Jun 24 14:18:02 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 24 Jun 2019 14:18:02 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
Message-ID: <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>

Hi Stephan,

Yes, the complex conjugate dtype would make things a lot faster, but I
don't quite see why we would wait for that with introducing the `.H`
property.

I do agree that `.H` is the correct name, giving most immediate clarity
(i.e., people who know what conjugate transpose is, will recognize it,
while likely having to look up `.CT`, while people who do not know will
have to look up regardless). But at the same time agree that the docstring
and other documentation should start with "Conjugate tranpose" - good to
try to avoid using names of people where you have to be in the "in crowd"
to know what it means.

The above said, if we were going with the initial suggestion of `.MT` for
matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate version.

But it seems there is little interest in that suggestion, although sadly a
clear path forward has not yet emerged either.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/7c51ce4f/attachment.html>

From m.h.vankerkwijk at gmail.com  Mon Jun 24 15:09:00 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 24 Jun 2019 15:09:00 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
Message-ID: <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>

Hi Allan,

Thanks for bringing up the noclobber explicitly (and Stephan for asking for
clarification; I was similarly confused).

It does clarify the difference in mental picture. In mine, the operation
would indeed be guaranteed to be done on the underlying data, without copy
and without `.filled(...)`. I should clarify further that I use `where`
only to skip reading elements (i.e., in reductions), not writing elements
(as you mention, the unwritten element will often be nonsense - e.g., wrong
units - which to me is worse than infinity or something similar; I've not
worried at all about runtime warnings). Note that my main reason here is
not that I'm against filling with numbers for numerical arrays, but rather
wanting to make minimal assumptions about the underlying data itself. This
may well be a mistake (but I want to find out where it breaks).

Anyway, it would seem in many ways all the better that our models are quite
different. I definitely see the advantages of your choice to decide one can
do with masked data elements whatever is logical ahead of an operation!

Thanks also for bringing up a useful example with `np.dot(m, m)` - clearly,
I didn't yet get beyond overriding ufuncs!

In my mental model, where I'd apply `np.dot` on the data and the mask
separately, the result will be wrong, so the mask has to be set (which it
would be). For your specific example, that might not be the best solution,
but when using `np.dot(matrix_shaped, matrix_shaped)`, I think it does give
the correct masking: any masked element in a matrix better propagate to all
parts that it influences, even if there is a reduction of sorts happening.
So, perhaps a price to pay for a function that tries to do multiple things.

The alternative solution in my model would be to replace `np.dot` with a
masked-specific implementation of what `np.dot` is supposed to stand for
(in your simple example, `np.add.reduce(np.multiply(m, m))` - more
generally, add relevant `outer` and `axes`). This would be similar to what
I think all implementations do for `.mean()` - we cannot calculate that
from the data using any fill value or skipping, so rather use a more easily
cared-for `.sum()` and divide by a suitable number of elements. But in both
examples the disadvantage is that we took away the option to use the
underlying class's `.dot()` or `.mean()` implementations.

(Aside: considerations such as these underlie my longed-for exposure of
standard implementations of functions in terms of basic ufunc calls.)

Another example of a function for which I think my model is not
particularly insightful (and for which it is difficult to know what to do
generally) is `np.fft.fft`. Since an fft is equivalent to a sine/cosine
fits to data points, the answer for masked data is in principle quite
well-defined. But much less easy to implement!

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/a44b16f4/attachment.html>

From efiring at hawaii.edu  Mon Jun 24 16:40:20 2019
From: efiring at hawaii.edu (Eric Firing)
Date: Mon, 24 Jun 2019 10:40:20 -1000
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
Message-ID: <59f47ff7-3841-fa81-eac4-0033feebdc6f@hawaii.edu>

On 2019/06/24 9:09 AM, Marten van Kerkwijk wrote:
> Another example of a function for which I think my model is not 
> particularly insightful (and for which it is difficult to know what to 
> do generally) is `np.fft.fft`. Since an fft is equivalent to a 
> sine/cosine fits to data points, the answer for masked data is in 
> principle quite well-defined. But much less easy to implement!

How is it well-defined?  If you have N points of which M are masked, you 
have a vector with N-M pieces of information in the time domain.  That 
can't be *uniquely* mapped to N points in the frequency domain.  I think 
fft applied to a MaskedArray input should raise an exception, at least 
if there are any masked points.  It must be left to the user to decide 
how to handle the masked points, e.g. fill with zero, or with the mean, 
or with linear interpolation, etc.

Eric

From m.h.vankerkwijk at gmail.com  Mon Jun 24 17:39:00 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 24 Jun 2019 17:39:00 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <59f47ff7-3841-fa81-eac4-0033feebdc6f@hawaii.edu>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
 <59f47ff7-3841-fa81-eac4-0033feebdc6f@hawaii.edu>
Message-ID: <CAJNV+9u+1UrL0Bj+09LkZFpGo0-aFXbx9JoSe=hq=0vQGDzk0A@mail.gmail.com>

Hi Eric,

The easiest definitely is for the mask to just propagate, which that even
if just one point is masked, all points in the fft will be masked.

On the direct point I made, I think it is correct that since one can think
of the Fourier transform of a sine/cosine fit, then there is a solution
even in the presence of some masked data, and this solution is distinct
from that for a specific choice of fill value. But of course it is also
true that the solution will be at least partially degenerate in its result
and possibly indeterminate (e.g., for the extreme example of a real
transform for which all but the first point are masked, all cosine term
amplitudes are equal to the value of the first term, and are completely
degenerate with each other, and all sine term amplitudes are indeterminate;
one has only one piece of information, after all). Yet the inverse of any
of those choices reproduces the input. That said, clearly there is a choice
to be made whether this solution is at all interesting, which means that
you are right that it needs an explicit user decision.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/f809c57c/attachment.html>

From warren.weckesser at gmail.com  Mon Jun 24 17:50:35 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Mon, 24 Jun 2019 17:50:35 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9u+1UrL0Bj+09LkZFpGo0-aFXbx9JoSe=hq=0vQGDzk0A@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
 <59f47ff7-3841-fa81-eac4-0033feebdc6f@hawaii.edu>
 <CAJNV+9u+1UrL0Bj+09LkZFpGo0-aFXbx9JoSe=hq=0vQGDzk0A@mail.gmail.com>
Message-ID: <CAGzF1ud83=_CoA0M-PjBxU5p0TZU=YWyy+ZsqDJTVX7GOdVt-A@mail.gmail.com>

On 6/24/19, Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> wrote:
> Hi Eric,
>
> The easiest definitely is for the mask to just propagate, which that even
> if just one point is masked, all points in the fft will be masked.
>
> On the direct point I made, I think it is correct that since one can think
> of the Fourier transform of a sine/cosine fit, then there is a solution
> even in the presence of some masked data, and this solution is distinct
> from that for a specific choice of fill value. But of course it is also
> true that the solution will be at least partially degenerate in its result
> and possibly indeterminate (e.g., for the extreme example of a real
> transform for which all but the first point are masked, all cosine term
> amplitudes are equal to the value of the first term, and are completely
> degenerate with each other, and all sine term amplitudes are indeterminate;
> one has only one piece of information, after all). Yet the inverse of any
> of those choices reproduces the input. That said, clearly there is a choice
> to be made whether this solution is at all interesting, which means that
> you are right that it needs an explicit user decision.
>

FWIW: The discrete Fourier transform is equivalent to a matrix
multiplication (https://en.wikipedia.org/wiki/DFT_matrix,
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.dft.html),
so whatever behavior you define for a nonmasked array times a masked
vector also applies to the FFT of a masked vector.

Warren


> All the best,
>
> Marten
>

From efiring at hawaii.edu  Mon Jun 24 17:52:59 2019
From: efiring at hawaii.edu (Eric Firing)
Date: Mon, 24 Jun 2019 11:52:59 -1000
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9u+1UrL0Bj+09LkZFpGo0-aFXbx9JoSe=hq=0vQGDzk0A@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
 <59f47ff7-3841-fa81-eac4-0033feebdc6f@hawaii.edu>
 <CAJNV+9u+1UrL0Bj+09LkZFpGo0-aFXbx9JoSe=hq=0vQGDzk0A@mail.gmail.com>
Message-ID: <737698a3-e5b7-66f6-3482-4ee4132e1077@hawaii.edu>

On 2019/06/24 11:39 AM, Marten van Kerkwijk wrote:
> Hi Eric,
> 
> The easiest definitely is for the mask to just propagate, which that 
> even if just one point is masked, all points in the fft will be masked.

This is perfectly reasonable, and consistent with what happens with 
nans, of course.  My suggestion of raising an Exception is probably not 
a good idea, as I realized shortly after sending the message.

As a side note, I am happy to see the current burst of effort toward 
improved MaskedArray functionality, and very grateful for the work you, 
Allan, and others are doing in that regard.

Eric


> 
> On the direct point I made, I think it is correct that since one can 
> think of the Fourier transform of a sine/cosine fit, then there is a 
> solution even in the presence of some masked data, and this solution is 
> distinct from that for a specific choice of fill value. But of course it 
> is also true that the solution will be at least partially degenerate in 
> its result and possibly indeterminate (e.g., for the extreme example of 
> a real transform for which all but the first point are masked, all 
> cosine term amplitudes are equal to the value of the first term, and are 
> completely degenerate with each other, and all sine term amplitudes are 
> indeterminate; one has only one piece of information, after all). Yet 
> the inverse of any of those choices reproduces the input. That said, 
> clearly there is a choice to be made whether this solution is at all 
> interesting, which means that you are right that it needs an explicit 
> user decision.
> 
> All the best,
> 
> Marten
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


From charlesr.harris at gmail.com  Mon Jun 24 17:57:40 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 24 Jun 2019 15:57:40 -0600
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9u+1UrL0Bj+09LkZFpGo0-aFXbx9JoSe=hq=0vQGDzk0A@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
 <59f47ff7-3841-fa81-eac4-0033feebdc6f@hawaii.edu>
 <CAJNV+9u+1UrL0Bj+09LkZFpGo0-aFXbx9JoSe=hq=0vQGDzk0A@mail.gmail.com>
Message-ID: <CAB6mnxLEEapqEKkQ=ttok5sqSW_M+2k2TL+f-pxDSmQp7-3VrQ@mail.gmail.com>

On Mon, Jun 24, 2019 at 3:40 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Eric,
>
> The easiest definitely is for the mask to just propagate, which that even
> if just one point is masked, all points in the fft will be masked.
>
> On the direct point I made, I think it is correct that since one can think
> of the Fourier transform of a sine/cosine fit, then there is a solution
> even in the presence of some masked data, and this solution is distinct
> from that for a specific choice of fill value. But of course it is also
> true that the solution will be at least partially degenerate in its result
> and possibly indeterminate (e.g., for the extreme example of a real
> transform for which all but the first point are masked, all cosine term
> amplitudes are equal to the value of the first term, and are completely
> degenerate with each other, and all sine term amplitudes are indeterminate;
> one has only one piece of information, after all). Yet the inverse of any
> of those choices reproduces the input. That said, clearly there is a choice
> to be made whether this solution is at all interesting, which means that
> you are right that it needs an explicit user decision.
>
>
Might be a good fit with the NUFFT
<https://jakevdp.github.io/blog/2015/02/24/optimizing-python-with-numpy-and-numba/>
.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/28a32b6b/attachment-0001.html>

From ilhanpolat at gmail.com  Mon Jun 24 18:41:12 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Tue, 25 Jun 2019 00:41:12 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
Message-ID: <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>

I think enumerating the cases along the way makes it a bit more tangible
for the discussion


import numpy as np
z = 1+1j
z.conjugate()  # 1-1j

zz = np.array(z)
zz  # array(1+1j)
zz.T  # array(1+1j)  # OK expected.
zz.conj()  # 1-1j ?? what happened; no arrays?
zz.conjugate()  # 1-1j ?? same

zz1d = np.array([z]*3)
zz1d.T  # no change so this is not the regular 2D array
zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None] is
known

zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None] is
known
zz2d.conj()  # 2D col vec conjugated
zz2d.conj().T  # 2D col vec conjugated transposed

zz3d = np.arange(24.).reshape(2,3,4).view(complex)
zz3d.conj()  # no surprises, conjugated
zz3d.conj().T  # ?? Why not the last two dims swapped like other stacked ops

# For scalar arrays conjugation strips the number
# For 1D arrays transpose is a no-op but conjugation works
# For 2D arrays conjugate it is the matlab's elementwise conjugation op .'
#     and transpose is acting like expected
# For 3D arrays conjugate it is the matlab's elementwise conjugation op .'
#     but transpose is the reversing all dims just like matlab's permute()
#     with static dimorder.

and so on. Maybe we can try to identify all the use cases and the quirks
before we can make design the solution. Because these are a bit more
involved and I don't even know if this is exhaustive.


On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Stephan,
>
> Yes, the complex conjugate dtype would make things a lot faster, but I
> don't quite see why we would wait for that with introducing the `.H`
> property.
>
> I do agree that `.H` is the correct name, giving most immediate clarity
> (i.e., people who know what conjugate transpose is, will recognize it,
> while likely having to look up `.CT`, while people who do not know will
> have to look up regardless). But at the same time agree that the docstring
> and other documentation should start with "Conjugate tranpose" - good to
> try to avoid using names of people where you have to be in the "in crowd"
> to know what it means.
>
> The above said, if we were going with the initial suggestion of `.MT` for
> matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate version.
>
> But it seems there is little interest in that suggestion, although sadly a
> clear path forward has not yet emerged either.
>
> All the best,
>
> Marten
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/2880dab7/attachment.html>

From allanhaldane at gmail.com  Mon Jun 24 18:55:10 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Mon, 24 Jun 2019 18:55:10 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
Message-ID: <1d4580bc-bea6-6d83-66d4-f96088bfa511@gmail.com>

On 6/24/19 3:09 PM, Marten van Kerkwijk wrote:
> Hi Allan,
> 
> Thanks for bringing up the noclobber explicitly (and Stephan for asking
> for clarification; I was similarly confused).
> 
> It does clarify the difference in mental picture. In mine, the operation
> would indeed be guaranteed to be done on the underlying data, without
> copy and without `.filled(...)`. I should clarify further that I use
> `where` only to skip reading elements (i.e., in reductions), not writing
> elements (as you mention, the unwritten element will often be nonsense -
> e.g., wrong units - which to me is worse than infinity or something
> similar; I've not worried at all about runtime warnings). Note that my
> main reason here is not that I'm against filling with numbers for
> numerical arrays, but rather wanting to make minimal assumptions about
> the underlying data itself. This may well be a mistake (but I want to
> find out where it breaks).
> 
> Anyway, it would seem in many ways all the better that our models are
> quite different. I definitely see the advantages of your choice to
> decide one can do with masked data elements whatever is logical ahead of
> an operation!
> 
> Thanks also for bringing up a useful example with `np.dot(m, m)` -
> clearly, I didn't yet get beyond overriding ufuncs!
> 
> In my mental model, where I'd apply `np.dot` on the data and the mask
> separately, the result will be wrong, so the mask has to be set (which
> it would be). For your specific example, that might not be the best
> solution, but when using `np.dot(matrix_shaped, matrix_shaped)`, I think
> it does give the correct masking: any masked element in a matrix better
> propagate to all parts that it influences, even if there is a reduction
> of sorts happening. So, perhaps a price to pay for a function that tries
> to do multiple things.
> 
> The alternative solution in my model would be to replace `np.dot` with a
> masked-specific implementation of what `np.dot` is supposed to stand for
> (in your simple example, `np.add.reduce(np.multiply(m, m))` - more
> generally, add relevant `outer` and `axes`). This would be similar to
> what I think all implementations do for `.mean()` - we cannot calculate
> that from the data using any fill value or skipping, so rather use a
> more easily cared-for `.sum()` and divide by a suitable number of
> elements. But in both examples the disadvantage is that we took away the
> option to use the underlying class's `.dot()` or `.mean()` implementations.

Just to note, my current implementation uses the IGNORE style of mask,
so does not propagate the mask in np.dot:

    >>> a = MaskedArray([[1,1,1], [1,X,1], [1,1,1]])
    >>> np.dot(a, a)

    MaskedArray([[3, 2, 3],
                 [2, 2, 2],
                 [3, 2, 3]])

I'm not at all set on that behavior and we can do something else. For
now, I chose this way since it seemed to best match the "IGNORE" mask
behavior.

The behavior you described further above where the output row/col would
be masked corresponds better to "NA" (propagating) mask behavior, which
I am leaving for later implementation.

best,
Allan

> 
> (Aside: considerations such as these underlie my longed-for exposure of
> standard implementations of functions in terms of basic ufunc calls.)
> 
> Another example of a function for which I think my model is not
> particularly insightful (and for which it is difficult to know what to
> do generally) is `np.fft.fft`. Since an fft is equivalent to a
> sine/cosine fits to data points, the answer for masked data is in
> principle quite well-defined. But much less easy to implement!
> 
> All the best,
> 
> Marten
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


From shoyer at gmail.com  Mon Jun 24 19:21:17 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 24 Jun 2019 16:21:17 -0700
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <1d4580bc-bea6-6d83-66d4-f96088bfa511@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
 <1d4580bc-bea6-6d83-66d4-f96088bfa511@gmail.com>
Message-ID: <CAEQ_Tveys4Y9J5i+F+HZKAMAvA9pe5BD1Zm5p25F+RpTgyoQPQ@mail.gmail.com>

On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane <allanhaldane at gmail.com>
wrote:

> I'm not at all set on that behavior and we can do something else. For
> now, I chose this way since it seemed to best match the "IGNORE" mask
> behavior.
>
> The behavior you described further above where the output row/col would
> be masked corresponds better to "NA" (propagating) mask behavior, which
> I am leaving for later implementation.


This does seem like a clean way to *implement* things, but from a user
perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
masks.

I tend to think of "IGNORE" vs "NA" as descriptions of particular
operations rather than the data itself. There are a spectrum of ways to
handle missing data, and the right way to propagating missing values is
often highly context dependent. The right way to set this is in functions
where operations are defined, not on classes that may be defined far away
from where the computation happen. For example, pandas has a "min_count"
parameter in functions for intermediate use-cases between "IGNORE" and "NA"
semantics, e.g., "take an average, unless the number of data points is
fewer than min_count."

Are there examples of existing projects that define separate user-facing
types for different styles of masks?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/9d23900e/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Mon Jun 24 19:54:09 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 24 Jun 2019 19:54:09 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <1d4580bc-bea6-6d83-66d4-f96088bfa511@gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
 <1d4580bc-bea6-6d83-66d4-f96088bfa511@gmail.com>
Message-ID: <CAJNV+9swTXcZt0ua-4QqDDnk0nhOpK+tFi4tVYHPv4atb+oVGQ@mail.gmail.com>

Hi Allan,

> The alternative solution in my model would be to replace `np.dot` with a
> > masked-specific implementation of what `np.dot` is supposed to stand for
> > (in your simple example, `np.add.reduce(np.multiply(m, m))` - more
> > generally, add relevant `outer` and `axes`). This would be similar to
> > what I think all implementations do for `.mean()` - we cannot calculate
> > that from the data using any fill value or skipping, so rather use a
> > more easily cared-for `.sum()` and divide by a suitable number of
> > elements. But in both examples the disadvantage is that we took away the
> > option to use the underlying class's `.dot()` or `.mean()`
> implementations.
>
> Just to note, my current implementation uses the IGNORE style of mask,
> so does not propagate the mask in np.dot:
>
>     >>> a = MaskedArray([[1,1,1], [1,X,1], [1,1,1]])
>     >>> np.dot(a, a)
>
>     MaskedArray([[3, 2, 3],
>                  [2, 2, 2],
>                  [3, 2, 3]])
>
> I'm not at all set on that behavior and we can do something else. For
> now, I chose this way since it seemed to best match the "IGNORE" mask
> behavior.
>

It is a nice example, I think. In terms of action on the data, one would
get this result as well in my pseudo-representation of
`np.add.reduce(np.multiply(m, m))` - as long as the multiply is taken as
outer product along the relevant axes (which does not ignore the mask,
i.e., if either element is masked, the product is too), and subsequently a
sum which works like other reductions and skips masked elements.

>From the FFT array multiplication analogy, though, it is not clear that,
effectively, replacing masked elements by 0 is reasonable.

Equivalently, thinking of `np.dot` in its 1-D form as presenting the length
of the projection of one vector along another, it is not clear what a
single masked element is supposed to mean. In a way, masking just one
element of a vector or of a matrix makes vector or matrix operations
meaningless.

I thought fitting data with a mask might give a counterexample, but in that
one usually calculates at some point r = y - A x, so no masking of the
matrix, and subtraction y-Ax passing on a mask, and summing of r ignoring
masked elements does just the right thing.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/4c163dd5/attachment.html>

From m.h.vankerkwijk at gmail.com  Mon Jun 24 20:34:34 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 24 Jun 2019 20:34:34 -0400
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAEQ_Tveys4Y9J5i+F+HZKAMAvA9pe5BD1Zm5p25F+RpTgyoQPQ@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
 <1d4580bc-bea6-6d83-66d4-f96088bfa511@gmail.com>
 <CAEQ_Tveys4Y9J5i+F+HZKAMAvA9pe5BD1Zm5p25F+RpTgyoQPQ@mail.gmail.com>
Message-ID: <CAJNV+9tQq1m6zjFEdvwJgYN_gwmSTA2nF68f7JWL_vYFsU+kWg@mail.gmail.com>

On Mon, Jun 24, 2019 at 7:21 PM Stephan Hoyer <shoyer at gmail.com> wrote:

> On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane <allanhaldane at gmail.com>
> wrote:
>
>> I'm not at all set on that behavior and we can do something else. For
>> now, I chose this way since it seemed to best match the "IGNORE" mask
>> behavior.
>>
>> The behavior you described further above where the output row/col would
>> be masked corresponds better to "NA" (propagating) mask behavior, which
>> I am leaving for later implementation.
>
>
> This does seem like a clean way to *implement* things, but from a user
> perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
> masks.
>
> I tend to think of "IGNORE" vs "NA" as descriptions of particular
> operations rather than the data itself. There are a spectrum of ways to
> handle missing data, and the right way to propagating missing values is
> often highly context dependent. The right way to set this is in functions
> where operations are defined, not on classes that may be defined far away
> from where the computation happen. For example, pandas has a "min_count"
> parameter in functions for intermediate use-cases between "IGNORE" and "NA"
> semantics, e.g., "take an average, unless the number of data points is
> fewer than min_count."
>

Anything that specific like that is probably indeed outside of the purview
of a MaskedArray class.

But your general point is well taken: we really need to ask clearly what
the mask means not in terms of operations but conceptually.

Personally, I guess like Benjamin I have mostly thought of it as "data here
is bad" (because corrupted, etc.) or "data here is irrelevant" (because of
sea instead of land in a map). And I would like to proceed nevertheless
with calculating things on the remainder. For an expectation value (or,
less obviously, a minimum or maximum), this is mostly OK: just ignore the
masked elements. But even for something as simple as a sum, what is correct
is not obvious: if I ignore the count, I'm effectively assuming the
expectation is symmetric around zero (this is why `vector.dot(vector)`
fails); a better estimate would be `np.add.reduce(data, where=~mask) *
N(total) / N(unmasked)`.

Of course, the logical conclusion would be that this is not possible to do
without guidance from the user, or, thinking more, that really a masked
array is not at all what I want for this problem; really I am just using
(1-mask) as a weight, and the sum of the weights matters, so I should have
a WeightArray class where that is returned along with the sum of the data
(or, a bit less extreme, a `CountArray` class, or, more extreme, a value
and its uncertainty - heck, sounds a lot like my Variable class from 4
years ago, https://github.com/astropy/astropy/pull/3715, which even takes
care of covariance [following the Uncertainty package]).

OK, it seems I've definitely worked myself in a corner tonight where I'm
not sure any more what a masked array is good for in the first place...
I'll stop for the day!

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/ed9f8869/attachment.html>

From shoyer at gmail.com  Mon Jun 24 21:03:38 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 24 Jun 2019 18:03:38 -0700
Subject: [Numpy-discussion] new MaskedArray class
In-Reply-To: <CAJNV+9tQq1m6zjFEdvwJgYN_gwmSTA2nF68f7JWL_vYFsU+kWg@mail.gmail.com>
References: <407a8a7a-0c33-46d4-dc15-d4c660dd8a2c@gmail.com>
 <CAJNV+9sGJ=P6VdE8=as2+VbZEHHJZj_QUrj_1_ixBV41hQk5Sg@mail.gmail.com>
 <168aacc7-a217-5635-1c1c-2b8ba1c1d9f1@gmail.com>
 <CAJNV+9svQukg9fScm2vCyvDR5+wAbzm4t68UfqzUfZ7-bWQezQ@mail.gmail.com>
 <6b9ebebe-e125-03d7-09ee-708cb882fbe3@gmail.com>
 <CAJNV+9sB7swmKfRtGpPDZqy9SzMKxawb80L48hp=6kBGXFqGSA@mail.gmail.com>
 <41f1e983-503f-5efc-9d9c-abe49d701a7a@gmail.com>
 <CANNq6FkTx6S80WBEJ48ktU7x8KPrO+5cHQ=-fEaRpqrTmQE84w@mail.gmail.com>
 <02276f62-a133-a026-b99d-280cd860b77e@gmail.com>
 <CAJNV+9unBrWKA1sG=ZhhGpJFo9bPMhJm2ynPjjw4-qy_nKm+2g@mail.gmail.com>
 <67ff789a-292e-8604-be0e-0e652e257762@gmail.com>
 <CAEQ_TvdbfcG4GQsViLNzxfdpUUL5QV208ndsi9SHXkzr18JYqQ@mail.gmail.com>
 <d5f9ae17-516a-3d4e-4082-a276d23538d0@gmail.com>
 <CAJNV+9tHypZDurAwGhhp7pUYae4iFYXvNSKk0nxAZfeahej8VA@mail.gmail.com>
 <1d4580bc-bea6-6d83-66d4-f96088bfa511@gmail.com>
 <CAEQ_Tveys4Y9J5i+F+HZKAMAvA9pe5BD1Zm5p25F+RpTgyoQPQ@mail.gmail.com>
 <CAJNV+9tQq1m6zjFEdvwJgYN_gwmSTA2nF68f7JWL_vYFsU+kWg@mail.gmail.com>
Message-ID: <CAEQ_TvcWde2AUuNn221Ao7vvsc-3xgF6SD8WydVZHG0aFMf5AA@mail.gmail.com>

On Mon, Jun 24, 2019 at 5:36 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

>
>
> On Mon, Jun 24, 2019 at 7:21 PM Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> On Mon, Jun 24, 2019 at 3:56 PM Allan Haldane <allanhaldane at gmail.com>
>> wrote:
>>
>>> I'm not at all set on that behavior and we can do something else. For
>>> now, I chose this way since it seemed to best match the "IGNORE" mask
>>> behavior.
>>>
>>> The behavior you described further above where the output row/col would
>>> be masked corresponds better to "NA" (propagating) mask behavior, which
>>> I am leaving for later implementation.
>>
>>
>> This does seem like a clean way to *implement* things, but from a user
>> perspective I'm not sure I would want separate classes for "IGNORE" vs "NA"
>> masks.
>>
>> I tend to think of "IGNORE" vs "NA" as descriptions of particular
>> operations rather than the data itself. There are a spectrum of ways to
>> handle missing data, and the right way to propagating missing values is
>> often highly context dependent. The right way to set this is in functions
>> where operations are defined, not on classes that may be defined far away
>> from where the computation happen. For example, pandas has a "min_count"
>> parameter in functions for intermediate use-cases between "IGNORE" and "NA"
>> semantics, e.g., "take an average, unless the number of data points is
>> fewer than min_count."
>>
>
> Anything that specific like that is probably indeed outside of the purview
> of a MaskedArray class.
>

I agree that it doesn't make much sense to have a "min_count" attribute on
a MaskedArray class, but certainly it makes sense for operations on
MaskedArray objects, e.g., to write something like
masked_array.mean(min_count=10). This is what users do in pandas today.


> But your general point is well taken: we really need to ask clearly what
> the mask means not in terms of operations but conceptually.
>
> Personally, I guess like Benjamin I have mostly thought of it as "data
> here is bad" (because corrupted, etc.) or "data here is irrelevant"
> (because of sea instead of land in a map). And I would like to proceed
> nevertheless with calculating things on the remainder. For an expectation
> value (or, less obviously, a minimum or maximum), this is mostly OK: just
> ignore the masked elements. But even for something as simple as a sum, what
> is correct is not obvious: if I ignore the count, I'm effectively assuming
> the expectation is symmetric around zero (this is why `vector.dot(vector)`
> fails); a better estimate would be `np.add.reduce(data, where=~mask) *
> N(total) / N(unmasked)`.
>

I think it's fine and logical to define default semantics for operations on
MaskedArray objects. Much of the time, replacing masked values with 0 is
the right thing to do for sum. Certainly IGNORE semantics are more useful
overall than the NA semantics.

But even if a MaskedArray conceptually always represents "bad" or
"irrelevant" data, the way to handle those missing values will differ based
on the use case, and not everything will fall cleanly into either IGNORE or
NA buckets. I think it makes sense to provide users with functions/methods
that expose these options, rather than requiring that they convert their
data into a different type MaskedArray.

"It is better to have 100 functions operate on one data structure than 10
functions on 10 data structures." ?Alan Perlis
https://stackoverflow.com/questions/6016271/why-is-it-better-to-have-100-functions-operate-on-one-data-structure-than-10-fun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/041680d4/attachment-0001.html>

From cameronjblocker at gmail.com  Mon Jun 24 22:29:01 2019
From: cameronjblocker at gmail.com (Cameron Blocker)
Date: Mon, 24 Jun 2019 22:29:01 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
Message-ID: <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>

I would love for there to be .H property. I have .conj().T in almost every
math function that I write so that it will be general enough for complex
numbers.

Besides being less readable, what puts me in a bind is trying to
accommodate LinearOperator/LinearMap like duck type objects in place of
matrix inputs, such as an object that does an FFT, but acts like a matrix
and supports @. For my objects to work in my code, I have to create .conj()
and .T methods which are not as simple as defining .H (the adjoint) for say
an FFT. Sometimes I just define .T to be the adjoint/conjugate transpose,
and .conj() to do nothing so it will work with my code and I can avoid
making useless objects along the way, but then I am in a weird state where
np.asarray(A).T != np.asarray(A.T).

In my opinion, the matrix transpose operator and the conjugate transpose
operator should be one and the same. Something nice about both Julia and
MATLAB is that it takes more keystrokes to do a regular transpose instead
of a conjugate transpose. Then people who work exclusively with real
numbers can just forget that it's a conjugate transpose, and for relatively
simple algorithms, their code will just work with complex numbers with
little modification.

Ideally, I'd like to see a .H that was the defacto Matrix/Linear
Algebra/Conjugate transpose that for 2 or more dimensions, conjugate
transposes the last two dimensions and for 1 dimension just conjugates (if
necessary). And then .T can stay the Array/Tensor transpose for general
axis manipulation. I'd be okay with .T raising an error/warning on 1D
arrays if .H did not. I commonly write things like u.conj().T at v even if I
know both u and v are 1D just so it looks more like an inner product.

-Cameron

On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat <ilhanpolat at gmail.com> wrote:

> I think enumerating the cases along the way makes it a bit more tangible
> for the discussion
>
>
> import numpy as np
> z = 1+1j
> z.conjugate()  # 1-1j
>
> zz = np.array(z)
> zz  # array(1+1j)
> zz.T  # array(1+1j)  # OK expected.
> zz.conj()  # 1-1j ?? what happened; no arrays?
> zz.conjugate()  # 1-1j ?? same
>
> zz1d = np.array([z]*3)
> zz1d.T  # no change so this is not the regular 2D array
> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
> zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None] is
> known
>
> zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None] is
> known
> zz2d.conj()  # 2D col vec conjugated
> zz2d.conj().T  # 2D col vec conjugated transposed
>
> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
> zz3d.conj()  # no surprises, conjugated
> zz3d.conj().T  # ?? Why not the last two dims swapped like other stacked
> ops
>
> # For scalar arrays conjugation strips the number
> # For 1D arrays transpose is a no-op but conjugation works
> # For 2D arrays conjugate it is the matlab's elementwise conjugation op .'
> #     and transpose is acting like expected
> # For 3D arrays conjugate it is the matlab's elementwise conjugation op .'
> #     but transpose is the reversing all dims just like matlab's permute()
> #     with static dimorder.
>
> and so on. Maybe we can try to identify all the use cases and the quirks
> before we can make design the solution. Because these are a bit more
> involved and I don't even know if this is exhaustive.
>
>
> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>> Hi Stephan,
>>
>> Yes, the complex conjugate dtype would make things a lot faster, but I
>> don't quite see why we would wait for that with introducing the `.H`
>> property.
>>
>> I do agree that `.H` is the correct name, giving most immediate clarity
>> (i.e., people who know what conjugate transpose is, will recognize it,
>> while likely having to look up `.CT`, while people who do not know will
>> have to look up regardless). But at the same time agree that the docstring
>> and other documentation should start with "Conjugate tranpose" - good to
>> try to avoid using names of people where you have to be in the "in crowd"
>> to know what it means.
>>
>> The above said, if we were going with the initial suggestion of `.MT` for
>> matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate version.
>>
>> But it seems there is little interest in that suggestion, although sadly
>> a clear path forward has not yet emerged either.
>>
>> All the best,
>>
>> Marten
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190624/85a74b2b/attachment.html>

From deak.andris at gmail.com  Tue Jun 25 04:54:55 2019
From: deak.andris at gmail.com (Andras Deak)
Date: Tue, 25 Jun 2019 10:54:55 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
Message-ID: <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>

On Tue, Jun 25, 2019 at 4:29 AM Cameron Blocker
<cameronjblocker at gmail.com> wrote:
>
> In my opinion, the matrix transpose operator and the conjugate transpose operator should be one and the same. Something nice about both Julia and MATLAB is that it takes more keystrokes to do a regular transpose instead of a conjugate transpose. Then people who work exclusively with real numbers can just forget that it's a conjugate transpose, and for relatively simple algorithms, their code will just work with complex numbers with little modification.
>

I'd argue that MATLAB's feature of `'` meaning adjoint (conjugate
transpose etc.) and `.'` meaning regular transpose causes a lot of
confusion and probably a lot of subtle bugs. Most people are unaware
that `'` does a conjugate transpose and use it habitually, and when
for once they have a complex array they don't understand why the
values are off (assuming they even notice). Even the MATLAB docs
conflate the two operations occasionally, which doesn't help at all.
Transpose should _not_ incur conjugation automatically. I'm already a
bit wary of special-casing matrix dynamics this much, when ndarrays
are naturally multidimensional objects. Making transposes be more than
transposes would be a huge mistake in my opinion, already for matrices
(2d arrays) and especially for everything else.

Andr?s


> Ideally, I'd like to see a .H that was the defacto Matrix/Linear Algebra/Conjugate transpose that for 2 or more dimensions, conjugate transposes the last two dimensions and for 1 dimension just conjugates (if necessary). And then .T can stay the Array/Tensor transpose for general axis manipulation. I'd be okay with .T raising an error/warning on 1D arrays if .H did not. I commonly write things like u.conj().T at v even if I know both u and v are 1D just so it looks more like an inner product.
>
> -Cameron
>
> On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat <ilhanpolat at gmail.com> wrote:
>>
>> I think enumerating the cases along the way makes it a bit more tangible for the discussion
>>
>>
>> import numpy as np
>> z = 1+1j
>> z.conjugate()  # 1-1j
>>
>> zz = np.array(z)
>> zz  # array(1+1j)
>> zz.T  # array(1+1j)  # OK expected.
>> zz.conj()  # 1-1j ?? what happened; no arrays?
>> zz.conjugate()  # 1-1j ?? same
>>
>> zz1d = np.array([z]*3)
>> zz1d.T  # no change so this is not the regular 2D array
>> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
>> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>> zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None] is known
>>
>> zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None] is known
>> zz2d.conj()  # 2D col vec conjugated
>> zz2d.conj().T  # 2D col vec conjugated transposed
>>
>> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
>> zz3d.conj()  # no surprises, conjugated
>> zz3d.conj().T  # ?? Why not the last two dims swapped like other stacked ops
>>
>> # For scalar arrays conjugation strips the number
>> # For 1D arrays transpose is a no-op but conjugation works
>> # For 2D arrays conjugate it is the matlab's elementwise conjugation op .'
>> #     and transpose is acting like expected
>> # For 3D arrays conjugate it is the matlab's elementwise conjugation op .'
>> #     but transpose is the reversing all dims just like matlab's permute()
>> #     with static dimorder.
>>
>> and so on. Maybe we can try to identify all the use cases and the quirks before we can make design the solution. Because these are a bit more involved and I don't even know if this is exhaustive.
>>
>>
>> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> wrote:
>>>
>>> Hi Stephan,
>>>
>>> Yes, the complex conjugate dtype would make things a lot faster, but I don't quite see why we would wait for that with introducing the `.H` property.
>>>
>>> I do agree that `.H` is the correct name, giving most immediate clarity (i.e., people who know what conjugate transpose is, will recognize it, while likely having to look up `.CT`, while people who do not know will have to look up regardless). But at the same time agree that the docstring and other documentation should start with "Conjugate tranpose" - good to try to avoid using names of people where you have to be in the "in crowd" to know what it means.
>>>
>>> The above said, if we were going with the initial suggestion of `.MT` for matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate version.
>>>
>>> But it seems there is little interest in that suggestion, although sadly a clear path forward has not yet emerged either.
>>>
>>> All the best,
>>>
>>> Marten
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From ilhanpolat at gmail.com  Tue Jun 25 07:03:11 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Tue, 25 Jun 2019 13:03:11 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
Message-ID: <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>

I have to disagree, I hardly ever saw such bugs and moreover <Zbar, Z> is
not compatible if you don't also transpose it but expected in almost all
contexts of matrices, vectors and scalars. Elementwise conjugation is well
inline with other elementwise operations starting with a dot in matlab
hence still consistent.

I would still expect an conjugation+transposition to be the default since
just transposing a complex array is way more special and rare than its
ubiquitous regular usage.

ilhan


On Tue, Jun 25, 2019 at 10:57 AM Andras Deak <deak.andris at gmail.com> wrote:

> On Tue, Jun 25, 2019 at 4:29 AM Cameron Blocker
> <cameronjblocker at gmail.com> wrote:
> >
> > In my opinion, the matrix transpose operator and the conjugate transpose
> operator should be one and the same. Something nice about both Julia and
> MATLAB is that it takes more keystrokes to do a regular transpose instead
> of a conjugate transpose. Then people who work exclusively with real
> numbers can just forget that it's a conjugate transpose, and for relatively
> simple algorithms, their code will just work with complex numbers with
> little modification.
> >
>
> I'd argue that MATLAB's feature of `'` meaning adjoint (conjugate
> transpose etc.) and `.'` meaning regular transpose causes a lot of
> confusion and probably a lot of subtle bugs. Most people are unaware
> that `'` does a conjugate transpose and use it habitually, and when
> for once they have a complex array they don't understand why the
> values are off (assuming they even notice). Even the MATLAB docs
> conflate the two operations occasionally, which doesn't help at all.
> Transpose should _not_ incur conjugation automatically. I'm already a
> bit wary of special-casing matrix dynamics this much, when ndarrays
> are naturally multidimensional objects. Making transposes be more than
> transposes would be a huge mistake in my opinion, already for matrices
> (2d arrays) and especially for everything else.
>
> Andr?s
>
>
>
> > Ideally, I'd like to see a .H that was the defacto Matrix/Linear
> Algebra/Conjugate transpose that for 2 or more dimensions, conjugate
> transposes the last two dimensions and for 1 dimension just conjugates (if
> necessary). And then .T can stay the Array/Tensor transpose for general
> axis manipulation. I'd be okay with .T raising an error/warning on 1D
> arrays if .H did not. I commonly write things like u.conj().T at v even if I
> know both u and v are 1D just so it looks more like an inner product.
> >
> > -Cameron
> >
> > On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat <ilhanpolat at gmail.com>
> wrote:
> >>
> >> I think enumerating the cases along the way makes it a bit more
> tangible for the discussion
> >>
> >>
> >> import numpy as np
> >> z = 1+1j
> >> z.conjugate()  # 1-1j
> >>
> >> zz = np.array(z)
> >> zz  # array(1+1j)
> >> zz.T  # array(1+1j)  # OK expected.
> >> zz.conj()  # 1-1j ?? what happened; no arrays?
> >> zz.conjugate()  # 1-1j ?? same
> >>
> >> zz1d = np.array([z]*3)
> >> zz1d.T  # no change so this is not the regular 2D array
> >> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
> >> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
> >> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
> >> zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None] is
> known
> >>
> >> zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None] is
> known
> >> zz2d.conj()  # 2D col vec conjugated
> >> zz2d.conj().T  # 2D col vec conjugated transposed
> >>
> >> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
> >> zz3d.conj()  # no surprises, conjugated
> >> zz3d.conj().T  # ?? Why not the last two dims swapped like other
> stacked ops
> >>
> >> # For scalar arrays conjugation strips the number
> >> # For 1D arrays transpose is a no-op but conjugation works
> >> # For 2D arrays conjugate it is the matlab's elementwise conjugation op
> .'
> >> #     and transpose is acting like expected
> >> # For 3D arrays conjugate it is the matlab's elementwise conjugation op
> .'
> >> #     but transpose is the reversing all dims just like matlab's
> permute()
> >> #     with static dimorder.
> >>
> >> and so on. Maybe we can try to identify all the use cases and the
> quirks before we can make design the solution. Because these are a bit more
> involved and I don't even know if this is exhaustive.
> >>
> >>
> >> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
> >>>
> >>> Hi Stephan,
> >>>
> >>> Yes, the complex conjugate dtype would make things a lot faster, but I
> don't quite see why we would wait for that with introducing the `.H`
> property.
> >>>
> >>> I do agree that `.H` is the correct name, giving most immediate
> clarity (i.e., people who know what conjugate transpose is, will recognize
> it, while likely having to look up `.CT`, while people who do not know will
> have to look up regardless). But at the same time agree that the docstring
> and other documentation should start with "Conjugate tranpose" - good to
> try to avoid using names of people where you have to be in the "in crowd"
> to know what it means.
> >>>
> >>> The above said, if we were going with the initial suggestion of `.MT`
> for matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate
> version.
> >>>
> >>> But it seems there is little interest in that suggestion, although
> sadly a clear path forward has not yet emerged either.
> >>>
> >>> All the best,
> >>>
> >>> Marten
> >>>
> >>> _______________________________________________
> >>> NumPy-Discussion mailing list
> >>> NumPy-Discussion at python.org
> >>> https://mail.python.org/mailman/listinfo/numpy-discussion
> >>
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion at python.org
> >> https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/84e8f35d/attachment-0001.html>

From deak.andris at gmail.com  Tue Jun 25 08:07:01 2019
From: deak.andris at gmail.com (Andras Deak)
Date: Tue, 25 Jun 2019 14:07:01 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
Message-ID: <CAMEWA4PwD0cGABB7rnPE3CDv6zJeWBav+AErZBwp6g_0DRx3+Q@mail.gmail.com>

On Tue, Jun 25, 2019 at 1:03 PM Ilhan Polat <ilhanpolat at gmail.com> wrote:
>
> I have to disagree, I hardly ever saw such bugs <snip>

I know the exact behaviour of MATLAB isn't very relevant for this
discussion, but anyway the reason I think this is a problem in MATLAB
is that there are a bunch of confused questions on Stack Overflow due
to this behaviour. Just from the first page this[1] query I could find
examples [2-8] (quite a few are about porting MATLAB to numpy or vice
versa).

> <snip> and moreover <Zbar, Z> is not compatible if you don't also transpose it but expected in almost all contexts of matrices, vectors and scalars. Elementwise conjugation is well inline with other elementwise operations starting with a dot in matlab hence still consistent.

I probably misunderstood your point here, Ilhan, because it sounds to
me that you're arguing that conjugation should not come without a
transpose. This is different from saying that transpose should not
come without conjugation (although I'd object to both). And `.'` is
exactly _not_ an elementwise operation in MATLAB: it's the transpose,
despite the seemingly element-wise syntax. `arr.'` will _not_
conjugate your array, `arr'` will (while both will transpose).
Finally, I don't think "MATLAB does it" is a very good argument
anyway; my subjective impression is that several of the issues with
np.matrix are due to behaviour that resembles that of MATLAB. But
MATLAB is very much built for matrices, and there are no 1d arrays, so
they don't end up with some of the pitfalls that numpy does.

>
> I would still expect an conjugation+transposition to be the default since just transposing a complex array is way more special and rare than its ubiquitous regular usage.
>

Coming back to numpy, I disagree with your statement. I'd say "just
transposing a complex _matrix_ is way more special and rare than its
ubiquitous regular usage", which is true. Admittedly I have a patchy
math background, but to me it seems that the "conjugation and
transpose go hand in hand" claim is mostly valid for linear algebra,
i.e. actual matrices and vectors. However numpy arrays are much more
general, and I very often want to reverse the shape of a complex
non-matrix 2d array (i.e. transpose it) for purposes of broadcasting
or vectorized matrix operations, and not want to conjugate it in the
process.
Do you at least agree that the feature of conjugate+transpose as
default mostly makes sense for linear algebra, or am I missing other
typical (and general numerical programming) use cases?

Andr?s

[1]: https://stackoverflow.com/search?q=%5Bmatlab%5D+conjugate+transpose+is%3Aa&mixed=1
[2]: https://stackoverflow.com/a/45272576
[3]: https://stackoverflow.com/a/54179564
[4]: https://stackoverflow.com/a/42320906
[5]: https://stackoverflow.com/a/23510668
[6]: https://stackoverflow.com/a/11416502
[7]: https://stackoverflow.com/a/49057640
[8]: https://stackoverflow.com/a/54309764

> ilhan
>
>
> On Tue, Jun 25, 2019 at 10:57 AM Andras Deak <deak.andris at gmail.com> wrote:
>>
>> On Tue, Jun 25, 2019 at 4:29 AM Cameron Blocker
>> <cameronjblocker at gmail.com> wrote:
>> >
>> > In my opinion, the matrix transpose operator and the conjugate transpose operator should be one and the same. Something nice about both Julia and MATLAB is that it takes more keystrokes to do a regular transpose instead of a conjugate transpose. Then people who work exclusively with real numbers can just forget that it's a conjugate transpose, and for relatively simple algorithms, their code will just work with complex numbers with little modification.
>> >
>>
>> I'd argue that MATLAB's feature of `'` meaning adjoint (conjugate
>> transpose etc.) and `.'` meaning regular transpose causes a lot of
>> confusion and probably a lot of subtle bugs. Most people are unaware
>> that `'` does a conjugate transpose and use it habitually, and when
>> for once they have a complex array they don't understand why the
>> values are off (assuming they even notice). Even the MATLAB docs
>> conflate the two operations occasionally, which doesn't help at all.
>> Transpose should _not_ incur conjugation automatically. I'm already a
>> bit wary of special-casing matrix dynamics this much, when ndarrays
>> are naturally multidimensional objects. Making transposes be more than
>> transposes would be a huge mistake in my opinion, already for matrices
>> (2d arrays) and especially for everything else.
>>
>> Andr?s
>>
>>
>>
>> > Ideally, I'd like to see a .H that was the defacto Matrix/Linear Algebra/Conjugate transpose that for 2 or more dimensions, conjugate transposes the last two dimensions and for 1 dimension just conjugates (if necessary). And then .T can stay the Array/Tensor transpose for general axis manipulation. I'd be okay with .T raising an error/warning on 1D arrays if .H did not. I commonly write things like u.conj().T at v even if I know both u and v are 1D just so it looks more like an inner product.
>> >
>> > -Cameron
>> >
>> > On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat <ilhanpolat at gmail.com> wrote:
>> >>
>> >> I think enumerating the cases along the way makes it a bit more tangible for the discussion
>> >>
>> >>
>> >> import numpy as np
>> >> z = 1+1j
>> >> z.conjugate()  # 1-1j
>> >>
>> >> zz = np.array(z)
>> >> zz  # array(1+1j)
>> >> zz.T  # array(1+1j)  # OK expected.
>> >> zz.conj()  # 1-1j ?? what happened; no arrays?
>> >> zz.conjugate()  # 1-1j ?? same
>> >>
>> >> zz1d = np.array([z]*3)
>> >> zz1d.T  # no change so this is not the regular 2D array
>> >> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>> >> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
>> >> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>> >> zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None] is known
>> >>
>> >> zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None] is known
>> >> zz2d.conj()  # 2D col vec conjugated
>> >> zz2d.conj().T  # 2D col vec conjugated transposed
>> >>
>> >> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
>> >> zz3d.conj()  # no surprises, conjugated
>> >> zz3d.conj().T  # ?? Why not the last two dims swapped like other stacked ops
>> >>
>> >> # For scalar arrays conjugation strips the number
>> >> # For 1D arrays transpose is a no-op but conjugation works
>> >> # For 2D arrays conjugate it is the matlab's elementwise conjugation op .'
>> >> #     and transpose is acting like expected
>> >> # For 3D arrays conjugate it is the matlab's elementwise conjugation op .'
>> >> #     but transpose is the reversing all dims just like matlab's permute()
>> >> #     with static dimorder.
>> >>
>> >> and so on. Maybe we can try to identify all the use cases and the quirks before we can make design the solution. Because these are a bit more involved and I don't even know if this is exhaustive.
>> >>
>> >>
>> >> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> wrote:
>> >>>
>> >>> Hi Stephan,
>> >>>
>> >>> Yes, the complex conjugate dtype would make things a lot faster, but I don't quite see why we would wait for that with introducing the `.H` property.
>> >>>
>> >>> I do agree that `.H` is the correct name, giving most immediate clarity (i.e., people who know what conjugate transpose is, will recognize it, while likely having to look up `.CT`, while people who do not know will have to look up regardless). But at the same time agree that the docstring and other documentation should start with "Conjugate tranpose" - good to try to avoid using names of people where you have to be in the "in crowd" to know what it means.
>> >>>
>> >>> The above said, if we were going with the initial suggestion of `.MT` for matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate version.
>> >>>
>> >>> But it seems there is little interest in that suggestion, although sadly a clear path forward has not yet emerged either.
>> >>>
>> >>> All the best,
>> >>>
>> >>> Marten
>> >>>
>> >>> _______________________________________________
>> >>> NumPy-Discussion mailing list
>> >>> NumPy-Discussion at python.org
>> >>> https://mail.python.org/mailman/listinfo/numpy-discussion
>> >>
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion at python.org
>> >> https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From m.h.vankerkwijk at gmail.com  Tue Jun 25 09:05:13 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 25 Jun 2019 09:05:13 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAMEWA4PwD0cGABB7rnPE3CDv6zJeWBav+AErZBwp6g_0DRx3+Q@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAMEWA4PwD0cGABB7rnPE3CDv6zJeWBav+AErZBwp6g_0DRx3+Q@mail.gmail.com>
Message-ID: <CAJNV+9sDhJ=1R0E1YedvLkKy3W3mkkGi4wb34Ht2UxpWa6aidA@mail.gmail.com>

Hi All,

The examples with different notation brought back memory of another
solution: define
`m.?` and m.?`. This is possible, since python3 allows any unicode for
names, nicely readable, but admittedly a bit annoying to enter (in emacs,
set-input-method to TeX and then ^T, ^H).

More seriously, still hoping to move to just being able to use .T and .H as
matrix (conjugate) transpose in newer numpy versions, is it really not
possible within a property to know whether the context where the operation
was defined has some specific "matrix_transpose" variable set? After all,
an error or warning generates a stack backtrace and from the ndarray C code
one would have to look only one stack level up (inside a warning, one can
even ask for the warning to be given from inside a different level, if I
recall correctly).

If that is truly impossible, then I think we need different names for both
.T and .H.  Some suggestions:

1. a.MT, a.MH (original suggestion at top of the thread)
2. a.mT, a.mH (m still for matrix, but no standing out as much, maybe
making it is easier to guess what is means)
3. a.RT and .CT (regular and conjugate transpose - the C also reminds of
complex)

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/00fc37d3/attachment.html>

From jni.soma at gmail.com  Tue Jun 25 09:34:51 2019
From: jni.soma at gmail.com (Juan Nunez-Iglesias)
Date: Tue, 25 Jun 2019 23:34:51 +1000
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
Message-ID: <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>

On Mon, 24 Jun 2019, at 11:25 PM, Marten van Kerkwijk wrote:
> Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape of `(n, 1)` the right behaviour? I.e., it should still change from what it is now - which is to leave the shape at `(n,)`.

Just to chime in as a user: v.T should continue to be a silent no-op for 1D arrays. NumPy makes it arbitrary whether a 1D array is viewed as a row or column vector, but we often want to write .T to match the notation in a paper we're implementing.

More deeply, I think .T should never change the number of dimensions of an array.

I'm ambivalent about the whole discussion in this thread, but generally I think NumPy should err on the side of caution when deprecating behaviour. It's unclear to me whether the benefits of making .T transpose only the last two dimensions outweigh the costs of deprecation. Despite some people's assertion that those using .T to transpose >2D arrays are probably making a mistake, we have two perfectly correct uses in scikit-image. These could be easily changed to .transpose() (honestly they probably should!), but they illustrate that there is some amount of correct code out there that would be forced to keep up with an update here.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/47cc3596/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Tue Jun 25 10:05:08 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 25 Jun 2019 10:05:08 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
Message-ID: <CAJNV+9sKdLzy_=GD4Zn+uEo6C+cZotLOx3h-KkfEDyL68UO4GA@mail.gmail.com>

Hi Juan,

On Tue, Jun 25, 2019 at 9:35 AM Juan Nunez-Iglesias <jni.soma at gmail.com>
wrote:

> On Mon, 24 Jun 2019, at 11:25 PM, Marten van Kerkwijk wrote:
>
> Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape
> of `(n, 1)` the right behaviour? I.e., it should still change from what it
> is now - which is to leave the shape at `(n,)`.
>
>
> Just to chime in as a user: v.T should continue to be a silent no-op for
> 1D arrays. NumPy makes it arbitrary whether a 1D array is viewed as a row
> or column vector, but we often want to write .T to match the notation in a
> paper we're implementing.
>
> More deeply, I think .T should never change the number of dimensions of an
> array.
>

OK, that makes three of you, all agreeing on the same core argument, but
with you now adding another strong one, of not changing the number of
dimensions. Let's consider this aspect settled.


>
> I'm ambivalent about the whole discussion in this thread, but generally I
> think NumPy should err on the side of caution when deprecating behaviour.
> It's unclear to me whether the benefits of making .T transpose only the
> last two dimensions outweigh the costs of deprecation. Despite some
> people's assertion that those using .T to transpose >2D arrays are probably
> making a mistake, we have two perfectly correct uses in scikit-image. These
> could be easily changed to .transpose() (honestly they probably should!),
> but they illustrate that there is some amount of correct code out there
> that would be forced to keep up with an update here.
>

Fair enough, there are people who actually read the manual and use things
correctly! Though, being generally one of those, I still was very
disappointed to find `.T` didn't do the last two axes.

Any preference for alternative spellings?

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/5d846e12/attachment.html>

From toddrjen at gmail.com  Tue Jun 25 10:06:36 2019
From: toddrjen at gmail.com (Todd)
Date: Tue, 25 Jun 2019 10:06:36 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
Message-ID: <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>

That is how it is in your field, but not mine.  For us we only use the
conventional transpose, even for complex numbers.  And I routinely see bugs
in MATLAB because of its choice of defaults, and there are probably many
more that don't get caught because they happen silently.

I think the principle of least surprise should apply here.  For people who
need the conjugate transform the know to make sure they use the right
operation.  But a lot of people aren't even aware that there conjugate
transpose exists, they are just going to copy what they see in the examples
without realizing it does the completely wrong thing in certain cases.
They wouldn't bother to check because they don't even know there is a
second transpose operation they need to look out for.  So it would hurt a
lot of people without helping anyone.

On Tue, Jun 25, 2019, 07:03 Ilhan Polat <ilhanpolat at gmail.com> wrote:

> I have to disagree, I hardly ever saw such bugs and moreover <Zbar, Z> is
> not compatible if you don't also transpose it but expected in almost all
> contexts of matrices, vectors and scalars. Elementwise conjugation is well
> inline with other elementwise operations starting with a dot in matlab
> hence still consistent.
>
> I would still expect an conjugation+transposition to be the default since
> just transposing a complex array is way more special and rare than its
> ubiquitous regular usage.
>
> ilhan
>
>
> On Tue, Jun 25, 2019 at 10:57 AM Andras Deak <deak.andris at gmail.com>
> wrote:
>
>> On Tue, Jun 25, 2019 at 4:29 AM Cameron Blocker
>> <cameronjblocker at gmail.com> wrote:
>> >
>> > In my opinion, the matrix transpose operator and the conjugate
>> transpose operator should be one and the same. Something nice about both
>> Julia and MATLAB is that it takes more keystrokes to do a regular transpose
>> instead of a conjugate transpose. Then people who work exclusively with
>> real numbers can just forget that it's a conjugate transpose, and for
>> relatively simple algorithms, their code will just work with complex
>> numbers with little modification.
>> >
>>
>> I'd argue that MATLAB's feature of `'` meaning adjoint (conjugate
>> transpose etc.) and `.'` meaning regular transpose causes a lot of
>> confusion and probably a lot of subtle bugs. Most people are unaware
>> that `'` does a conjugate transpose and use it habitually, and when
>> for once they have a complex array they don't understand why the
>> values are off (assuming they even notice). Even the MATLAB docs
>> conflate the two operations occasionally, which doesn't help at all.
>> Transpose should _not_ incur conjugation automatically. I'm already a
>> bit wary of special-casing matrix dynamics this much, when ndarrays
>> are naturally multidimensional objects. Making transposes be more than
>> transposes would be a huge mistake in my opinion, already for matrices
>> (2d arrays) and especially for everything else.
>>
>> Andr?s
>>
>>
>>
>> > Ideally, I'd like to see a .H that was the defacto Matrix/Linear
>> Algebra/Conjugate transpose that for 2 or more dimensions, conjugate
>> transposes the last two dimensions and for 1 dimension just conjugates (if
>> necessary). And then .T can stay the Array/Tensor transpose for general
>> axis manipulation. I'd be okay with .T raising an error/warning on 1D
>> arrays if .H did not. I commonly write things like u.conj().T at v even if
>> I know both u and v are 1D just so it looks more like an inner product.
>> >
>> > -Cameron
>> >
>> > On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat <ilhanpolat at gmail.com>
>> wrote:
>> >>
>> >> I think enumerating the cases along the way makes it a bit more
>> tangible for the discussion
>> >>
>> >>
>> >> import numpy as np
>> >> z = 1+1j
>> >> z.conjugate()  # 1-1j
>> >>
>> >> zz = np.array(z)
>> >> zz  # array(1+1j)
>> >> zz.T  # array(1+1j)  # OK expected.
>> >> zz.conj()  # 1-1j ?? what happened; no arrays?
>> >> zz.conjugate()  # 1-1j ?? same
>> >>
>> >> zz1d = np.array([z]*3)
>> >> zz1d.T  # no change so this is not the regular 2D array
>> >> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>> >> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
>> >> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>> >> zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None]
>> is known
>> >>
>> >> zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None]
>> is known
>> >> zz2d.conj()  # 2D col vec conjugated
>> >> zz2d.conj().T  # 2D col vec conjugated transposed
>> >>
>> >> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
>> >> zz3d.conj()  # no surprises, conjugated
>> >> zz3d.conj().T  # ?? Why not the last two dims swapped like other
>> stacked ops
>> >>
>> >> # For scalar arrays conjugation strips the number
>> >> # For 1D arrays transpose is a no-op but conjugation works
>> >> # For 2D arrays conjugate it is the matlab's elementwise conjugation
>> op .'
>> >> #     and transpose is acting like expected
>> >> # For 3D arrays conjugate it is the matlab's elementwise conjugation
>> op .'
>> >> #     but transpose is the reversing all dims just like matlab's
>> permute()
>> >> #     with static dimorder.
>> >>
>> >> and so on. Maybe we can try to identify all the use cases and the
>> quirks before we can make design the solution. Because these are a bit more
>> involved and I don't even know if this is exhaustive.
>> >>
>> >>
>> >> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <
>> m.h.vankerkwijk at gmail.com> wrote:
>> >>>
>> >>> Hi Stephan,
>> >>>
>> >>> Yes, the complex conjugate dtype would make things a lot faster, but
>> I don't quite see why we would wait for that with introducing the `.H`
>> property.
>> >>>
>> >>> I do agree that `.H` is the correct name, giving most immediate
>> clarity (i.e., people who know what conjugate transpose is, will recognize
>> it, while likely having to look up `.CT`, while people who do not know will
>> have to look up regardless). But at the same time agree that the docstring
>> and other documentation should start with "Conjugate tranpose" - good to
>> try to avoid using names of people where you have to be in the "in crowd"
>> to know what it means.
>> >>>
>> >>> The above said, if we were going with the initial suggestion of `.MT`
>> for matrix transpose, then I'd prefer `.CT` over `.HT` as its conjugate
>> version.
>> >>>
>> >>> But it seems there is little interest in that suggestion, although
>> sadly a clear path forward has not yet emerged either.
>> >>>
>> >>> All the best,
>> >>>
>> >>> Marten
>> >>>
>> >>> _______________________________________________
>> >>> NumPy-Discussion mailing list
>> >>> NumPy-Discussion at python.org
>> >>> https://mail.python.org/mailman/listinfo/numpy-discussion
>> >>
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion at python.org
>> >> https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/e666eb32/attachment-0001.html>

From toddrjen at gmail.com  Tue Jun 25 10:20:18 2019
From: toddrjen at gmail.com (Todd)
Date: Tue, 25 Jun 2019 10:20:18 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
Message-ID: <CAFpSVpLS_HvuQjLF6Bcxm5QALA-UxZAifA-K6-b=SEBPs9d5eQ@mail.gmail.com>

On Tue, Jun 25, 2019 at 9:35 AM Juan Nunez-Iglesias <jni.soma at gmail.com>
wrote:

> On Mon, 24 Jun 2019, at 11:25 PM, Marten van Kerkwijk wrote:
>
> Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape
> of `(n, 1)` the right behaviour? I.e., it should still change from what it
> is now - which is to leave the shape at `(n,)`.
>
>
> Just to chime in as a user: v.T should continue to be a silent no-op for
> 1D arrays. NumPy makes it arbitrary whether a 1D array is viewed as a row
> or column vector, but we often want to write .T to match the notation in a
> paper we're implementing.
>

Why should it be silent?  This is a source of bugs.  At least in my
experience, generally when people write v.T it is a mistake.  Either they
are coming from another language that works differently, or they failed to
properly check their function arguments.  And if you are doing it on
purpose, you are doing something you know is a no-op for essentially
documentation purposes, and I would think that is the sort of thing you
need to make as explicit as possible.  "Errors should never pass silently.
Unless explicitly silenced."

So as I said, I think at the very least this should be a warning.  People
who are doing this on purpose can easily silence (or just ignore) the
warning, but it will help people who do it by mistake.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/cd570936/attachment.html>

From shoyer at gmail.com  Tue Jun 25 10:41:39 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 25 Jun 2019 07:41:39 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAFpSVpLS_HvuQjLF6Bcxm5QALA-UxZAifA-K6-b=SEBPs9d5eQ@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
 <CAFpSVpLS_HvuQjLF6Bcxm5QALA-UxZAifA-K6-b=SEBPs9d5eQ@mail.gmail.com>
Message-ID: <CAEQ_Tvfp6E-++Ly4sep1er7XDnBJ=aVB-1j6yOqHTuquMF9b6A@mail.gmail.com>

On Tue, Jun 25, 2019 at 7:20 AM Todd <toddrjen at gmail.com> wrote:

> On Tue, Jun 25, 2019 at 9:35 AM Juan Nunez-Iglesias <jni.soma at gmail.com>
> wrote:
>
>> On Mon, 24 Jun 2019, at 11:25 PM, Marten van Kerkwijk wrote:
>>
>> Just to be sure: for a 1-d array, you'd both consider `.T` giving a shape
>> of `(n, 1)` the right behaviour? I.e., it should still change from what it
>> is now - which is to leave the shape at `(n,)`.
>>
>>
>> Just to chime in as a user: v.T should continue to be a silent no-op for
>> 1D arrays. NumPy makes it arbitrary whether a 1D array is viewed as a row
>> or column vector, but we often want to write .T to match the notation in a
>> paper we're implementing.
>>
>
> Why should it be silent?  This is a source of bugs.  At least in my
> experience, generally when people write v.T it is a mistake.  Either they
> are coming from another language that works differently, or they failed to
> properly check their function arguments.  And if you are doing it on
> purpose, you are doing something you know is a no-op for essentially
> documentation purposes, and I would think that is the sort of thing you
> need to make as explicit as possible.  "Errors should never pass silently.
> Unless explicitly silenced."
>

Writing v.T is also sensible if you're writing code that could apply
equally well to either a single vector or a stack of vectors. This mirrors
the behavior of @, which also allows either single vectors or stacks of
vectors (matrices) with the same syntax.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/e987e827/attachment.html>

From alan.isaac at gmail.com  Tue Jun 25 11:01:35 2019
From: alan.isaac at gmail.com (Alan Isaac)
Date: Tue, 25 Jun 2019 11:01:35 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9sDhJ=1R0E1YedvLkKy3W3mkkGi4wb34Ht2UxpWa6aidA@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAMEWA4PwD0cGABB7rnPE3CDv6zJeWBav+AErZBwp6g_0DRx3+Q@mail.gmail.com>
 <CAJNV+9sDhJ=1R0E1YedvLkKy3W3mkkGi4wb34Ht2UxpWa6aidA@mail.gmail.com>
Message-ID: <86a39fed-59a7-cd83-dbab-97659cd1779c@gmail.com>

I wish this discussion would be clearer that a.T is not going anywhere,
should not change, and in any case should match a.transpose().
Anything else threatens to break existing code for no good payoff.

How many people in this discussion are proposing that a widely used
library like numpy should make a breaking change in syntax just because
someone guesses it won't break too much code "out there"?
I'm having trouble telling if this is an actual view.

Because a.T cannot reasonably change,
if a.H is allowed, it should mean a.conj().transpose().
This also supports the easiest and least buggy transition away from np.matrix.

But since `a.H` would not be a view of `a`, most probably any `a.H`
proposal should be discarded as misleading and not materially better than
the existing syntax (a.conj().T).

I trust nobody is proposing to change `transpose`.

Cheers, Alan Isaac

From toddrjen at gmail.com  Tue Jun 25 11:03:30 2019
From: toddrjen at gmail.com (Todd)
Date: Tue, 25 Jun 2019 11:03:30 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEQ_Tvfp6E-++Ly4sep1er7XDnBJ=aVB-1j6yOqHTuquMF9b6A@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAL1kJvCkd7qAuK3GbEwN_orcid9sGZ5PVZhU5VmT2_xjeq8S7g@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
 <CAFpSVpLS_HvuQjLF6Bcxm5QALA-UxZAifA-K6-b=SEBPs9d5eQ@mail.gmail.com>
 <CAEQ_Tvfp6E-++Ly4sep1er7XDnBJ=aVB-1j6yOqHTuquMF9b6A@mail.gmail.com>
Message-ID: <CAFpSVp+duUQVS+uX6MntM6o2MWL_cE-bw7pZe_ry+zbAVQMPdw@mail.gmail.com>

On Tue, Jun 25, 2019 at 10:42 AM Stephan Hoyer <shoyer at gmail.com> wrote:

> On Tue, Jun 25, 2019 at 7:20 AM Todd <toddrjen at gmail.com> wrote:
>
>> On Tue, Jun 25, 2019 at 9:35 AM Juan Nunez-Iglesias <jni.soma at gmail.com>
>> wrote:
>>
>>> On Mon, 24 Jun 2019, at 11:25 PM, Marten van Kerkwijk wrote:
>>>
>>> Just to be sure: for a 1-d array, you'd both consider `.T` giving a
>>> shape of `(n, 1)` the right behaviour? I.e., it should still change from
>>> what it is now - which is to leave the shape at `(n,)`.
>>>
>>>
>>> Just to chime in as a user: v.T should continue to be a silent no-op for
>>> 1D arrays. NumPy makes it arbitrary whether a 1D array is viewed as a row
>>> or column vector, but we often want to write .T to match the notation in a
>>> paper we're implementing.
>>>
>>
>> Why should it be silent?  This is a source of bugs.  At least in my
>> experience, generally when people write v.T it is a mistake.  Either they
>> are coming from another language that works differently, or they failed to
>> properly check their function arguments.  And if you are doing it on
>> purpose, you are doing something you know is a no-op for essentially
>> documentation purposes, and I would think that is the sort of thing you
>> need to make as explicit as possible.  "Errors should never pass silently.
>> Unless explicitly silenced."
>>
>
> Writing v.T is also sensible if you're writing code that could apply
> equally well to either a single vector or a stack of vectors. This mirrors
> the behavior of @, which also allows either single vectors or stacks of
> vectors (matrices) with the same syntax.
>

Fair enough.  But although there are valid reasons to do a divide by zero,
it still causes a warning because it is a problem often enough that people
should be made aware of it.  I think this is a similar scenario.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/78a0b26d/attachment-0001.html>

From alan.isaac at gmail.com  Tue Jun 25 11:46:56 2019
From: alan.isaac at gmail.com (Alan Isaac)
Date: Tue, 25 Jun 2019 11:46:56 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAFpSVp+duUQVS+uX6MntM6o2MWL_cE-bw7pZe_ry+zbAVQMPdw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
 <CAFpSVpLS_HvuQjLF6Bcxm5QALA-UxZAifA-K6-b=SEBPs9d5eQ@mail.gmail.com>
 <CAEQ_Tvfp6E-++Ly4sep1er7XDnBJ=aVB-1j6yOqHTuquMF9b6A@mail.gmail.com>
 <CAFpSVp+duUQVS+uX6MntM6o2MWL_cE-bw7pZe_ry+zbAVQMPdw@mail.gmail.com>
Message-ID: <42c14036-1678-4740-4757-6c72b6577a52@gmail.com>

On 6/25/2019 11:03 AM, Todd wrote:
> Fair enough.? But although there are valid reasons to do a divide by zero, it still causes a warning because it is a problem often enough that people should be made aware of it.? I 
> think this is a similar scenario.


I side with Stephan on this, but when there are opinions on both sides,
I wonder what the resolution strategy is.  I suppose there is a possible tension:

1. Existing practice should be privileged (no change for the sake of change).
2. Documented user issues need to be addressed.

So what is an "in the wild" example of where numpy users create errors that pass
silently because a 1-d array transpose did not behave as expected?
Why would the unexpected array shape of the result not alert the user if it happens?

In your favor, Mathematica's `Transpose` raises an error when applied to 1d arrays,
and the Mma designs are usually carefully considered.

Cheers, Alan Isaac


From ilhanpolat at gmail.com  Tue Jun 25 12:04:46 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Tue, 25 Jun 2019 18:04:46 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
Message-ID: <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>

I think we would have seen a lot of evidence in the last four decades if
this was that problematic.

You are the second person to memtion these bugs. Care to show me some
examples of these bugs?

Maybe I am missing the point here. I haven't seen any bugs because somebody
thought they are just transposing.

Using transpose to reshape an array is a different story. That we can
discuss.

On Tue, Jun 25, 2019, 16:10 Todd <toddrjen at gmail.com> wrote:

> That is how it is in your field, but not mine.  For us we only use the
> conventional transpose, even for complex numbers.  And I routinely see bugs
> in MATLAB because of its choice of defaults, and there are probably many
> more that don't get caught because they happen silently.
>
> I think the principle of least surprise should apply here.  For people who
> need the conjugate transform the know to make sure they use the right
> operation.  But a lot of people aren't even aware that there conjugate
> transpose exists, they are just going to copy what they see in the examples
> without realizing it does the completely wrong thing in certain cases.
> They wouldn't bother to check because they don't even know there is a
> second transpose operation they need to look out for.  So it would hurt a
> lot of people without helping anyone.
>
> On Tue, Jun 25, 2019, 07:03 Ilhan Polat <ilhanpolat at gmail.com> wrote:
>
>> I have to disagree, I hardly ever saw such bugs and moreover <Zbar, Z> is
>> not compatible if you don't also transpose it but expected in almost all
>> contexts of matrices, vectors and scalars. Elementwise conjugation is well
>> inline with other elementwise operations starting with a dot in matlab
>> hence still consistent.
>>
>> I would still expect an conjugation+transposition to be the default since
>> just transposing a complex array is way more special and rare than its
>> ubiquitous regular usage.
>>
>> ilhan
>>
>>
>> On Tue, Jun 25, 2019 at 10:57 AM Andras Deak <deak.andris at gmail.com>
>> wrote:
>>
>>> On Tue, Jun 25, 2019 at 4:29 AM Cameron Blocker
>>> <cameronjblocker at gmail.com> wrote:
>>> >
>>> > In my opinion, the matrix transpose operator and the conjugate
>>> transpose operator should be one and the same. Something nice about both
>>> Julia and MATLAB is that it takes more keystrokes to do a regular transpose
>>> instead of a conjugate transpose. Then people who work exclusively with
>>> real numbers can just forget that it's a conjugate transpose, and for
>>> relatively simple algorithms, their code will just work with complex
>>> numbers with little modification.
>>> >
>>>
>>> I'd argue that MATLAB's feature of `'` meaning adjoint (conjugate
>>> transpose etc.) and `.'` meaning regular transpose causes a lot of
>>> confusion and probably a lot of subtle bugs. Most people are unaware
>>> that `'` does a conjugate transpose and use it habitually, and when
>>> for once they have a complex array they don't understand why the
>>> values are off (assuming they even notice). Even the MATLAB docs
>>> conflate the two operations occasionally, which doesn't help at all.
>>> Transpose should _not_ incur conjugation automatically. I'm already a
>>> bit wary of special-casing matrix dynamics this much, when ndarrays
>>> are naturally multidimensional objects. Making transposes be more than
>>> transposes would be a huge mistake in my opinion, already for matrices
>>> (2d arrays) and especially for everything else.
>>>
>>> Andr?s
>>>
>>>
>>>
>>> > Ideally, I'd like to see a .H that was the defacto Matrix/Linear
>>> Algebra/Conjugate transpose that for 2 or more dimensions, conjugate
>>> transposes the last two dimensions and for 1 dimension just conjugates (if
>>> necessary). And then .T can stay the Array/Tensor transpose for general
>>> axis manipulation. I'd be okay with .T raising an error/warning on 1D
>>> arrays if .H did not. I commonly write things like u.conj().T at v even if
>>> I know both u and v are 1D just so it looks more like an inner product.
>>> >
>>> > -Cameron
>>> >
>>> > On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat <ilhanpolat at gmail.com>
>>> wrote:
>>> >>
>>> >> I think enumerating the cases along the way makes it a bit more
>>> tangible for the discussion
>>> >>
>>> >>
>>> >> import numpy as np
>>> >> z = 1+1j
>>> >> z.conjugate()  # 1-1j
>>> >>
>>> >> zz = np.array(z)
>>> >> zz  # array(1+1j)
>>> >> zz.T  # array(1+1j)  # OK expected.
>>> >> zz.conj()  # 1-1j ?? what happened; no arrays?
>>> >> zz.conjugate()  # 1-1j ?? same
>>> >>
>>> >> zz1d = np.array([z]*3)
>>> >> zz1d.T  # no change so this is not the regular 2D array
>>> >> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>>> >> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
>>> >> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>>> >> zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None]
>>> is known
>>> >>
>>> >> zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None]
>>> is known
>>> >> zz2d.conj()  # 2D col vec conjugated
>>> >> zz2d.conj().T  # 2D col vec conjugated transposed
>>> >>
>>> >> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
>>> >> zz3d.conj()  # no surprises, conjugated
>>> >> zz3d.conj().T  # ?? Why not the last two dims swapped like other
>>> stacked ops
>>> >>
>>> >> # For scalar arrays conjugation strips the number
>>> >> # For 1D arrays transpose is a no-op but conjugation works
>>> >> # For 2D arrays conjugate it is the matlab's elementwise conjugation
>>> op .'
>>> >> #     and transpose is acting like expected
>>> >> # For 3D arrays conjugate it is the matlab's elementwise conjugation
>>> op .'
>>> >> #     but transpose is the reversing all dims just like matlab's
>>> permute()
>>> >> #     with static dimorder.
>>> >>
>>> >> and so on. Maybe we can try to identify all the use cases and the
>>> quirks before we can make design the solution. Because these are a bit more
>>> involved and I don't even know if this is exhaustive.
>>> >>
>>> >>
>>> >> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <
>>> m.h.vankerkwijk at gmail.com> wrote:
>>> >>>
>>> >>> Hi Stephan,
>>> >>>
>>> >>> Yes, the complex conjugate dtype would make things a lot faster, but
>>> I don't quite see why we would wait for that with introducing the `.H`
>>> property.
>>> >>>
>>> >>> I do agree that `.H` is the correct name, giving most immediate
>>> clarity (i.e., people who know what conjugate transpose is, will recognize
>>> it, while likely having to look up `.CT`, while people who do not know will
>>> have to look up regardless). But at the same time agree that the docstring
>>> and other documentation should start with "Conjugate tranpose" - good to
>>> try to avoid using names of people where you have to be in the "in crowd"
>>> to know what it means.
>>> >>>
>>> >>> The above said, if we were going with the initial suggestion of
>>> `.MT` for matrix transpose, then I'd prefer `.CT` over `.HT` as its
>>> conjugate version.
>>> >>>
>>> >>> But it seems there is little interest in that suggestion, although
>>> sadly a clear path forward has not yet emerged either.
>>> >>>
>>> >>> All the best,
>>> >>>
>>> >>> Marten
>>> >>>
>>> >>> _______________________________________________
>>> >>> NumPy-Discussion mailing list
>>> >>> NumPy-Discussion at python.org
>>> >>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>> >>
>>> >> _______________________________________________
>>> >> NumPy-Discussion mailing list
>>> >> NumPy-Discussion at python.org
>>> >> https://mail.python.org/mailman/listinfo/numpy-discussion
>>> >
>>> > _______________________________________________
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion at python.org
>>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/c1934e9c/attachment.html>

From toddrjen at gmail.com  Tue Jun 25 13:12:35 2019
From: toddrjen at gmail.com (Todd)
Date: Tue, 25 Jun 2019 13:12:35 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <42c14036-1678-4740-4757-6c72b6577a52@gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
 <CAFpSVpLS_HvuQjLF6Bcxm5QALA-UxZAifA-K6-b=SEBPs9d5eQ@mail.gmail.com>
 <CAEQ_Tvfp6E-++Ly4sep1er7XDnBJ=aVB-1j6yOqHTuquMF9b6A@mail.gmail.com>
 <CAFpSVp+duUQVS+uX6MntM6o2MWL_cE-bw7pZe_ry+zbAVQMPdw@mail.gmail.com>
 <42c14036-1678-4740-4757-6c72b6577a52@gmail.com>
Message-ID: <CAFpSVpKmdiXRnBf8_BgyKoEb8ECAYK4=-fuURnfSkBho-ey1wQ@mail.gmail.com>

On Tue, Jun 25, 2019 at 11:47 AM Alan Isaac <alan.isaac at gmail.com> wrote:

> On 6/25/2019 11:03 AM, Todd wrote:
> > Fair enough.  But although there are valid reasons to do a divide by
> zero, it still causes a warning because it is a problem often enough that
> people should be made aware of it.  I
> > think this is a similar scenario.
>
>
>
> I side with Stephan on this, but when there are opinions on both sides,
> I wonder what the resolution strategy is.  I suppose there is a possible
> tension:
>
> 1. Existing practice should be privileged (no change for the sake of
> change).
> 2. Documented user issues need to be addressed.
>

Note that the behavior wouldn't change.  Transposing vectors would do
exactly what it has always done: nothing.  But people would be made aware
that the operation they are doing won't actually do anything.

I completely agree that change for the sake of change is not a good thing.
But we are talking about a no-op here.  If someone is intentionally doing
something that does nothing, I would like to think that they could deal
with a warning that can be easily silenced.


> So what is an "in the wild" example of where numpy users create errors
> that pass
> silently because a 1-d array transpose did not behave as expected?
>

Part of the problem with silent errors is that we typically aren't going to
see them, by definition.  The only way you could catch a silent error like
that is if someone noticed the results looked different than they expected,
but that can easily be hidden if the error is a corner case that is
averaged out.  That is the whole point of having a warning, to make it not
silent.  It reminds me of the old Weisert quote, "As far as we know, our
computer has never had an undetected error."

The problems I typically encounter is when people write their code assuming
that, for example, a trial will have multiple results.  It usually does,
but on occasion it doesn't.  This sort of thing usually results in an
error, although it is typically an error far removed from where the problem
actually occurs and is therefor extremely hard to debug.  I haven't seen
truly completely silent errors, but again I wouldn't expect to.

We can't really tell how common this sort of thing is until we actively
check for it.  Remember how many silent errors in encoding were caught once
python3 starting enforcing proper encoding/decoding handling?  People
insisted encoding was being handled properly with python2, but it wasn't
even in massive, mature projects.  People just didn't notice the problems
before because they were silent.

At the very least, the warning could tell people coming from other
languages why the transpose is doing something different than they expect,
as this is not an uncommon issue on stackoverflow. [1]


> Why would the unexpected array shape of the result not alert the user if
> it happens?


I think counting on the code to produce an error is really dangerous.  I
have seen people do a lot of really bizarre things with their code.


> In your favor, Mathematica's `Transpose` raises an error when applied to
> 1d arrays,
> and the Mma designs are usually carefully considered.


Yes, numpy is really the outlier here in making this a silent no-op.
MATLAB, Julia, R, and SAS all transpose vectors, coercing them to matrices
if needed.  Again, I don't think we should change how the transpose works,
it is too late for that.  But I do think that people should be warned about
it.

[1] https://stackoverflow.com/search?q=numpy+transpose+vector (not all of
these are relevant, but there are a bunch on there)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/3e892b83/attachment-0001.html>

From toddrjen at gmail.com  Tue Jun 25 13:15:05 2019
From: toddrjen at gmail.com (Todd)
Date: Tue, 25 Jun 2019 13:15:05 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
Message-ID: <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>

I was saying we shouldn't change the default transpose operation to be
conjugate transpose.  We don't currently have a conjugate transpose so it
isn't an issue.  I think having a conjugate transpose is a great idea, I
just don't think it should be the default.

On Tue, Jun 25, 2019 at 12:12 PM Ilhan Polat <ilhanpolat at gmail.com> wrote:

> I think we would have seen a lot of evidence in the last four decades if
> this was that problematic.
>
> You are the second person to memtion these bugs. Care to show me some
> examples of these bugs?
>
> Maybe I am missing the point here. I haven't seen any bugs because
> somebody thought they are just transposing.
>
> Using transpose to reshape an array is a different story. That we can
> discuss.
>
> On Tue, Jun 25, 2019, 16:10 Todd <toddrjen at gmail.com> wrote:
>
>> That is how it is in your field, but not mine.  For us we only use the
>> conventional transpose, even for complex numbers.  And I routinely see bugs
>> in MATLAB because of its choice of defaults, and there are probably many
>> more that don't get caught because they happen silently.
>>
>> I think the principle of least surprise should apply here.  For people
>> who need the conjugate transform the know to make sure they use the right
>> operation.  But a lot of people aren't even aware that there conjugate
>> transpose exists, they are just going to copy what they see in the examples
>> without realizing it does the completely wrong thing in certain cases.
>> They wouldn't bother to check because they don't even know there is a
>> second transpose operation they need to look out for.  So it would hurt a
>> lot of people without helping anyone.
>>
>> On Tue, Jun 25, 2019, 07:03 Ilhan Polat <ilhanpolat at gmail.com> wrote:
>>
>>> I have to disagree, I hardly ever saw such bugs and moreover <Zbar, Z>
>>> is not compatible if you don't also transpose it but expected in almost all
>>> contexts of matrices, vectors and scalars. Elementwise conjugation is well
>>> inline with other elementwise operations starting with a dot in matlab
>>> hence still consistent.
>>>
>>> I would still expect an conjugation+transposition to be the default
>>> since just transposing a complex array is way more special and rare than
>>> its ubiquitous regular usage.
>>>
>>> ilhan
>>>
>>>
>>> On Tue, Jun 25, 2019 at 10:57 AM Andras Deak <deak.andris at gmail.com>
>>> wrote:
>>>
>>>> On Tue, Jun 25, 2019 at 4:29 AM Cameron Blocker
>>>> <cameronjblocker at gmail.com> wrote:
>>>> >
>>>> > In my opinion, the matrix transpose operator and the conjugate
>>>> transpose operator should be one and the same. Something nice about both
>>>> Julia and MATLAB is that it takes more keystrokes to do a regular transpose
>>>> instead of a conjugate transpose. Then people who work exclusively with
>>>> real numbers can just forget that it's a conjugate transpose, and for
>>>> relatively simple algorithms, their code will just work with complex
>>>> numbers with little modification.
>>>> >
>>>>
>>>> I'd argue that MATLAB's feature of `'` meaning adjoint (conjugate
>>>> transpose etc.) and `.'` meaning regular transpose causes a lot of
>>>> confusion and probably a lot of subtle bugs. Most people are unaware
>>>> that `'` does a conjugate transpose and use it habitually, and when
>>>> for once they have a complex array they don't understand why the
>>>> values are off (assuming they even notice). Even the MATLAB docs
>>>> conflate the two operations occasionally, which doesn't help at all.
>>>> Transpose should _not_ incur conjugation automatically. I'm already a
>>>> bit wary of special-casing matrix dynamics this much, when ndarrays
>>>> are naturally multidimensional objects. Making transposes be more than
>>>> transposes would be a huge mistake in my opinion, already for matrices
>>>> (2d arrays) and especially for everything else.
>>>>
>>>> Andr?s
>>>>
>>>>
>>>>
>>>> > Ideally, I'd like to see a .H that was the defacto Matrix/Linear
>>>> Algebra/Conjugate transpose that for 2 or more dimensions, conjugate
>>>> transposes the last two dimensions and for 1 dimension just conjugates (if
>>>> necessary). And then .T can stay the Array/Tensor transpose for general
>>>> axis manipulation. I'd be okay with .T raising an error/warning on 1D
>>>> arrays if .H did not. I commonly write things like u.conj().T at v even
>>>> if I know both u and v are 1D just so it looks more like an inner product.
>>>> >
>>>> > -Cameron
>>>> >
>>>> > On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat <ilhanpolat at gmail.com>
>>>> wrote:
>>>> >>
>>>> >> I think enumerating the cases along the way makes it a bit more
>>>> tangible for the discussion
>>>> >>
>>>> >>
>>>> >> import numpy as np
>>>> >> z = 1+1j
>>>> >> z.conjugate()  # 1-1j
>>>> >>
>>>> >> zz = np.array(z)
>>>> >> zz  # array(1+1j)
>>>> >> zz.T  # array(1+1j)  # OK expected.
>>>> >> zz.conj()  # 1-1j ?? what happened; no arrays?
>>>> >> zz.conjugate()  # 1-1j ?? same
>>>> >>
>>>> >> zz1d = np.array([z]*3)
>>>> >> zz1d.T  # no change so this is not the regular 2D array
>>>> >> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>>>> >> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
>>>> >> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>>>> >> zz1d[:, None].conj()  # 2D column vector - no surprises if [:, None]
>>>> is known
>>>> >>
>>>> >> zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:, None]
>>>> is known
>>>> >> zz2d.conj()  # 2D col vec conjugated
>>>> >> zz2d.conj().T  # 2D col vec conjugated transposed
>>>> >>
>>>> >> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
>>>> >> zz3d.conj()  # no surprises, conjugated
>>>> >> zz3d.conj().T  # ?? Why not the last two dims swapped like other
>>>> stacked ops
>>>> >>
>>>> >> # For scalar arrays conjugation strips the number
>>>> >> # For 1D arrays transpose is a no-op but conjugation works
>>>> >> # For 2D arrays conjugate it is the matlab's elementwise conjugation
>>>> op .'
>>>> >> #     and transpose is acting like expected
>>>> >> # For 3D arrays conjugate it is the matlab's elementwise conjugation
>>>> op .'
>>>> >> #     but transpose is the reversing all dims just like matlab's
>>>> permute()
>>>> >> #     with static dimorder.
>>>> >>
>>>> >> and so on. Maybe we can try to identify all the use cases and the
>>>> quirks before we can make design the solution. Because these are a bit more
>>>> involved and I don't even know if this is exhaustive.
>>>> >>
>>>> >>
>>>> >> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <
>>>> m.h.vankerkwijk at gmail.com> wrote:
>>>> >>>
>>>> >>> Hi Stephan,
>>>> >>>
>>>> >>> Yes, the complex conjugate dtype would make things a lot faster,
>>>> but I don't quite see why we would wait for that with introducing the `.H`
>>>> property.
>>>> >>>
>>>> >>> I do agree that `.H` is the correct name, giving most immediate
>>>> clarity (i.e., people who know what conjugate transpose is, will recognize
>>>> it, while likely having to look up `.CT`, while people who do not know will
>>>> have to look up regardless). But at the same time agree that the docstring
>>>> and other documentation should start with "Conjugate tranpose" - good to
>>>> try to avoid using names of people where you have to be in the "in crowd"
>>>> to know what it means.
>>>> >>>
>>>> >>> The above said, if we were going with the initial suggestion of
>>>> `.MT` for matrix transpose, then I'd prefer `.CT` over `.HT` as its
>>>> conjugate version.
>>>> >>>
>>>> >>> But it seems there is little interest in that suggestion, although
>>>> sadly a clear path forward has not yet emerged either.
>>>> >>>
>>>> >>> All the best,
>>>> >>>
>>>> >>> Marten
>>>> >>>
>>>> >>> _______________________________________________
>>>> >>> NumPy-Discussion mailing list
>>>> >>> NumPy-Discussion at python.org
>>>> >>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>> >>
>>>> >> _______________________________________________
>>>> >> NumPy-Discussion mailing list
>>>> >> NumPy-Discussion at python.org
>>>> >> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>> >
>>>> > _______________________________________________
>>>> > NumPy-Discussion mailing list
>>>> > NumPy-Discussion at python.org
>>>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/4bd21045/attachment.html>

From shoyer at gmail.com  Tue Jun 25 13:55:13 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 25 Jun 2019 10:55:13 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAFpSVpKmdiXRnBf8_BgyKoEb8ECAYK4=-fuURnfSkBho-ey1wQ@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
 <CAFpSVpLS_HvuQjLF6Bcxm5QALA-UxZAifA-K6-b=SEBPs9d5eQ@mail.gmail.com>
 <CAEQ_Tvfp6E-++Ly4sep1er7XDnBJ=aVB-1j6yOqHTuquMF9b6A@mail.gmail.com>
 <CAFpSVp+duUQVS+uX6MntM6o2MWL_cE-bw7pZe_ry+zbAVQMPdw@mail.gmail.com>
 <42c14036-1678-4740-4757-6c72b6577a52@gmail.com>
 <CAFpSVpKmdiXRnBf8_BgyKoEb8ECAYK4=-fuURnfSkBho-ey1wQ@mail.gmail.com>
Message-ID: <CAEQ_TvfktMzqs7ZnaoK-eqH6MdrmfLJFp5rEuAVDL=L1BC=3dg@mail.gmail.com>

On Tue, Jun 25, 2019 at 10:14 AM Todd <toddrjen at gmail.com> wrote:

> On Tue, Jun 25, 2019 at 11:47 AM Alan Isaac <alan.isaac at gmail.com> wrote:
>
>> On 6/25/2019 11:03 AM, Todd wrote:
>> > Fair enough.  But although there are valid reasons to do a divide by
>> zero, it still causes a warning because it is a problem often enough that
>> people should be made aware of it.  I
>> > think this is a similar scenario.
>>
>>
>>
>> I side with Stephan on this, but when there are opinions on both sides,
>> I wonder what the resolution strategy is.  I suppose there is a possible
>> tension:
>>
>> 1. Existing practice should be privileged (no change for the sake of
>> change).
>> 2. Documented user issues need to be addressed.
>>
>
> Note that the behavior wouldn't change.  Transposing vectors would do
> exactly what it has always done: nothing.  But people would be made aware
> that the operation they are doing won't actually do anything.
>
> I completely agree that change for the sake of change is not a good
> thing.  But we are talking about a no-op here.  If someone is intentionally
> doing something that does nothing, I would like to think that they could
> deal with a warning that can be easily silenced.
>

I am strongly opposed to adding warnings for documented and correct
behavior that we are not going to change. Warnings are only appropriate in
rare cases that demand user's attention, i.e., code that is almost
certainly not correct, like division by 0. We have already documented use
cases for .T on 1D arrays, such as compatibility with operations also
defined on 2D arrays.

I also agree with Alan that probably it's too late to change the behavior
of .T for arrays with more than 2-dimensions. NumPy could certainly use a
more comprehensive policy around backwards compatibility, but we certainly
need to meet a *very* high bar to break backwards compatibility. I am
skeptical that the slightly cleaner code facilitated by this new definition
for .T would be worth it.


>
>
>> So what is an "in the wild" example of where numpy users create errors
>> that pass
>> silently because a 1-d array transpose did not behave as expected?
>>
>
> Part of the problem with silent errors is that we typically aren't going
> to see them, by definition.  The only way you could catch a silent error
> like that is if someone noticed the results looked different than they
> expected, but that can easily be hidden if the error is a corner case that
> is averaged out.  That is the whole point of having a warning, to make it
> not silent.  It reminds me of the old Weisert quote, "As far as we know,
> our computer has never had an undetected error."
>
> The problems I typically encounter is when people write their code
> assuming that, for example, a trial will have multiple results.  It usually
> does, but on occasion it doesn't.  This sort of thing usually results in an
> error, although it is typically an error far removed from where the problem
> actually occurs and is therefor extremely hard to debug.  I haven't seen
> truly completely silent errors, but again I wouldn't expect to.
>
> We can't really tell how common this sort of thing is until we actively
> check for it.  Remember how many silent errors in encoding were caught once
> python3 starting enforcing proper encoding/decoding handling?  People
> insisted encoding was being handled properly with python2, but it wasn't
> even in massive, mature projects.  People just didn't notice the problems
> before because they were silent.
>
> At the very least, the warning could tell people coming from other
> languages why the transpose is doing something different than they expect,
> as this is not an uncommon issue on stackoverflow. [1]
>
>
>> Why would the unexpected array shape of the result not alert the user if
>> it happens?
>
>
> I think counting on the code to produce an error is really dangerous.  I
> have seen people do a lot of really bizarre things with their code.
>
>
>> In your favor, Mathematica's `Transpose` raises an error when applied to
>> 1d arrays,
>> and the Mma designs are usually carefully considered.
>
>
> Yes, numpy is really the outlier here in making this a silent no-op.
> MATLAB, Julia, R, and SAS all transpose vectors, coercing them to matrices
> if needed.  Again, I don't think we should change how the transpose works,
> it is too late for that.  But I do think that people should be warned about
> it.
>
> [1] https://stackoverflow.com/search?q=numpy+transpose+vector (not all of
> these are relevant, but there are a bunch on there)
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/1ebdde93/attachment-0001.html>

From cameronjblocker at gmail.com  Tue Jun 25 14:18:49 2019
From: cameronjblocker at gmail.com (Cameron Blocker)
Date: Tue, 25 Jun 2019 14:18:49 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
Message-ID: <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>

It seems to me that the general consensus is that we shouldn't be changing
.T to do what we've termed matrix transpose or conjugate transpose. As
such, the discussion of whether .T should be changed to throw errors or
warnings on 1D arrays seems a bit off topic (not that it shouldn't be
discussed).

My suggestion that conjugate transpose and matrix transpose be a single
operation .H was partially because I thought it would fill 90% of the use
cases while limiting the added API. The are times that I am batch
processing complex-valued images when what I want is .MT with no
conjugation, but I just figured those use cases would be rare or not
benefit from a short as much, and we could then add .MT later if the demand
presented itself. I agree that the fact that the difference in MATLAB is
implicit is bad, to me .H is explicit to people working with complex
numbers, but I could be wrong.

In regards to Marten's earlier post on names, my preference is .mT for
matrix transpose. I prefer .H if what is implemented is equivalent to
.conj().T, but if what is implemented is equivalent to .conj().mT, then I'd
prefer .mH for symmetry.

I also hope there is a way to implement .H/.mH without a copy as was
briefly discussed above, otherwise .H()/.mH() might be better at making the
copy explicit.

On Tue, Jun 25, 2019 at 1:17 PM Todd <toddrjen at gmail.com> wrote:

> I was saying we shouldn't change the default transpose operation to be
> conjugate transpose.  We don't currently have a conjugate transpose so it
> isn't an issue.  I think having a conjugate transpose is a great idea, I
> just don't think it should be the default.
>
> On Tue, Jun 25, 2019 at 12:12 PM Ilhan Polat <ilhanpolat at gmail.com> wrote:
>
>> I think we would have seen a lot of evidence in the last four decades if
>> this was that problematic.
>>
>> You are the second person to memtion these bugs. Care to show me some
>> examples of these bugs?
>>
>> Maybe I am missing the point here. I haven't seen any bugs because
>> somebody thought they are just transposing.
>>
>> Using transpose to reshape an array is a different story. That we can
>> discuss.
>>
>> On Tue, Jun 25, 2019, 16:10 Todd <toddrjen at gmail.com> wrote:
>>
>>> That is how it is in your field, but not mine.  For us we only use the
>>> conventional transpose, even for complex numbers.  And I routinely see bugs
>>> in MATLAB because of its choice of defaults, and there are probably many
>>> more that don't get caught because they happen silently.
>>>
>>> I think the principle of least surprise should apply here.  For people
>>> who need the conjugate transform the know to make sure they use the right
>>> operation.  But a lot of people aren't even aware that there conjugate
>>> transpose exists, they are just going to copy what they see in the examples
>>> without realizing it does the completely wrong thing in certain cases.
>>> They wouldn't bother to check because they don't even know there is a
>>> second transpose operation they need to look out for.  So it would hurt a
>>> lot of people without helping anyone.
>>>
>>> On Tue, Jun 25, 2019, 07:03 Ilhan Polat <ilhanpolat at gmail.com> wrote:
>>>
>>>> I have to disagree, I hardly ever saw such bugs and moreover <Zbar, Z>
>>>> is not compatible if you don't also transpose it but expected in almost all
>>>> contexts of matrices, vectors and scalars. Elementwise conjugation is well
>>>> inline with other elementwise operations starting with a dot in matlab
>>>> hence still consistent.
>>>>
>>>> I would still expect an conjugation+transposition to be the default
>>>> since just transposing a complex array is way more special and rare than
>>>> its ubiquitous regular usage.
>>>>
>>>> ilhan
>>>>
>>>>
>>>> On Tue, Jun 25, 2019 at 10:57 AM Andras Deak <deak.andris at gmail.com>
>>>> wrote:
>>>>
>>>>> On Tue, Jun 25, 2019 at 4:29 AM Cameron Blocker
>>>>> <cameronjblocker at gmail.com> wrote:
>>>>> >
>>>>> > In my opinion, the matrix transpose operator and the conjugate
>>>>> transpose operator should be one and the same. Something nice about both
>>>>> Julia and MATLAB is that it takes more keystrokes to do a regular transpose
>>>>> instead of a conjugate transpose. Then people who work exclusively with
>>>>> real numbers can just forget that it's a conjugate transpose, and for
>>>>> relatively simple algorithms, their code will just work with complex
>>>>> numbers with little modification.
>>>>> >
>>>>>
>>>>> I'd argue that MATLAB's feature of `'` meaning adjoint (conjugate
>>>>> transpose etc.) and `.'` meaning regular transpose causes a lot of
>>>>> confusion and probably a lot of subtle bugs. Most people are unaware
>>>>> that `'` does a conjugate transpose and use it habitually, and when
>>>>> for once they have a complex array they don't understand why the
>>>>> values are off (assuming they even notice). Even the MATLAB docs
>>>>> conflate the two operations occasionally, which doesn't help at all.
>>>>> Transpose should _not_ incur conjugation automatically. I'm already a
>>>>> bit wary of special-casing matrix dynamics this much, when ndarrays
>>>>> are naturally multidimensional objects. Making transposes be more than
>>>>> transposes would be a huge mistake in my opinion, already for matrices
>>>>> (2d arrays) and especially for everything else.
>>>>>
>>>>> Andr?s
>>>>>
>>>>>
>>>>>
>>>>> > Ideally, I'd like to see a .H that was the defacto Matrix/Linear
>>>>> Algebra/Conjugate transpose that for 2 or more dimensions, conjugate
>>>>> transposes the last two dimensions and for 1 dimension just conjugates (if
>>>>> necessary). And then .T can stay the Array/Tensor transpose for general
>>>>> axis manipulation. I'd be okay with .T raising an error/warning on 1D
>>>>> arrays if .H did not. I commonly write things like u.conj().T at v even
>>>>> if I know both u and v are 1D just so it looks more like an inner product.
>>>>> >
>>>>> > -Cameron
>>>>> >
>>>>> > On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat <ilhanpolat at gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> I think enumerating the cases along the way makes it a bit more
>>>>> tangible for the discussion
>>>>> >>
>>>>> >>
>>>>> >> import numpy as np
>>>>> >> z = 1+1j
>>>>> >> z.conjugate()  # 1-1j
>>>>> >>
>>>>> >> zz = np.array(z)
>>>>> >> zz  # array(1+1j)
>>>>> >> zz.T  # array(1+1j)  # OK expected.
>>>>> >> zz.conj()  # 1-1j ?? what happened; no arrays?
>>>>> >> zz.conjugate()  # 1-1j ?? same
>>>>> >>
>>>>> >> zz1d = np.array([z]*3)
>>>>> >> zz1d.T  # no change so this is not the regular 2D array
>>>>> >> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>>>>> >> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
>>>>> >> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
>>>>> >> zz1d[:, None].conj()  # 2D column vector - no surprises if [:,
>>>>> None] is known
>>>>> >>
>>>>> >> zz2d = zz1d[:, None]  # 2D column vector - no surprises if [:,
>>>>> None] is known
>>>>> >> zz2d.conj()  # 2D col vec conjugated
>>>>> >> zz2d.conj().T  # 2D col vec conjugated transposed
>>>>> >>
>>>>> >> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
>>>>> >> zz3d.conj()  # no surprises, conjugated
>>>>> >> zz3d.conj().T  # ?? Why not the last two dims swapped like other
>>>>> stacked ops
>>>>> >>
>>>>> >> # For scalar arrays conjugation strips the number
>>>>> >> # For 1D arrays transpose is a no-op but conjugation works
>>>>> >> # For 2D arrays conjugate it is the matlab's elementwise
>>>>> conjugation op .'
>>>>> >> #     and transpose is acting like expected
>>>>> >> # For 3D arrays conjugate it is the matlab's elementwise
>>>>> conjugation op .'
>>>>> >> #     but transpose is the reversing all dims just like matlab's
>>>>> permute()
>>>>> >> #     with static dimorder.
>>>>> >>
>>>>> >> and so on. Maybe we can try to identify all the use cases and the
>>>>> quirks before we can make design the solution. Because these are a bit more
>>>>> involved and I don't even know if this is exhaustive.
>>>>> >>
>>>>> >>
>>>>> >> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <
>>>>> m.h.vankerkwijk at gmail.com> wrote:
>>>>> >>>
>>>>> >>> Hi Stephan,
>>>>> >>>
>>>>> >>> Yes, the complex conjugate dtype would make things a lot faster,
>>>>> but I don't quite see why we would wait for that with introducing the `.H`
>>>>> property.
>>>>> >>>
>>>>> >>> I do agree that `.H` is the correct name, giving most immediate
>>>>> clarity (i.e., people who know what conjugate transpose is, will recognize
>>>>> it, while likely having to look up `.CT`, while people who do not know will
>>>>> have to look up regardless). But at the same time agree that the docstring
>>>>> and other documentation should start with "Conjugate tranpose" - good to
>>>>> try to avoid using names of people where you have to be in the "in crowd"
>>>>> to know what it means.
>>>>> >>>
>>>>> >>> The above said, if we were going with the initial suggestion of
>>>>> `.MT` for matrix transpose, then I'd prefer `.CT` over `.HT` as its
>>>>> conjugate version.
>>>>> >>>
>>>>> >>> But it seems there is little interest in that suggestion, although
>>>>> sadly a clear path forward has not yet emerged either.
>>>>> >>>
>>>>> >>> All the best,
>>>>> >>>
>>>>> >>> Marten
>>>>> >>>
>>>>> >>> _______________________________________________
>>>>> >>> NumPy-Discussion mailing list
>>>>> >>> NumPy-Discussion at python.org
>>>>> >>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> NumPy-Discussion mailing list
>>>>> >> NumPy-Discussion at python.org
>>>>> >> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>> >
>>>>> > _______________________________________________
>>>>> > NumPy-Discussion mailing list
>>>>> > NumPy-Discussion at python.org
>>>>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion at python.org
>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/12efc25f/attachment-0001.html>

From sebastian at sipsolutions.net  Tue Jun 25 14:30:19 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 25 Jun 2019 11:30:19 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
Message-ID: <2b1babd6f758f1446790c0ece5aad92b5a8850dd.camel@sipsolutions.net>

On Tue, 2019-06-25 at 14:18 -0400, Cameron Blocker wrote:
> It seems to me that the general consensus is that we shouldn't be
> changing .T to do what we've termed matrix transpose or conjugate
> transpose. As such, the discussion of whether .T should be changed to
> throw errors or warnings on 1D arrays seems a bit off topic (not that
> it shouldn't be discussed). 
> 

Yeah, it is a separate thing and is likely better to be discussed after
the other discussion has somewhat settled down.

> My suggestion that conjugate transpose and matrix transpose be a
> single operation .H was partially because I thought it would fill 90%
> of the use cases while limiting the added API. The are times that I
> am batch processing complex-valued images when what I want is .MT
> with no conjugation, but I just figured those use cases would be rare
> or not benefit from a short as much, and we could then add .MT later
> if the demand presented itself. I agree that the fact that the
> difference in MATLAB is implicit is bad, to me .H is explicit to
> people working with complex numbers, but I could be wrong.
> 

True, a lot of use cases may be happy to use .H just to get the matrix
transpose operation (although that relies on no-copy semantics
possibly). On the other hand, it might be a confusing change to have .T
and .H not be interchangable, which is a point for `.mT` and `mH` even
if I am even more hesitant on it right now.

> In regards to Marten's earlier post on names, my preference is .mT
> for matrix transpose. I prefer .H if what is implemented is
> equivalent to .conj().T, but if what is implemented is equivalent to
> .conj().mT, then I'd prefer .mH for symmetry. 
> 
> I also hope there is a way to implement .H/.mH without a copy as was
> briefly discussed above, otherwise .H()/.mH() might be better at
> making the copy explicit.
> 

To be honest, the copy/no-copy thing is a going to be a small issue in
any case. There is the idea of no-copy for complex values. Which is
great, however, that does have the issue that it does not work for
object arrays, which have to call `.conjugate()` on each element (and
thus have to copy).

Which is not to say we cannot do it. If we go there, but it is a
confusion (also that .H and .T behave quite differently is). We can
return read-only views, which at least fixes one direction of
unintentional change here. Making it `.H()` could be better in that
regard...

- Sebastian


> On Tue, Jun 25, 2019 at 1:17 PM Todd <toddrjen at gmail.com> wrote:
> > I was saying we shouldn't change the default transpose operation to
> > be conjugate transpose.  We don't currently have a conjugate
> > transpose so it isn't an issue.  I think having a conjugate
> > transpose is a great idea, I just don't think it should be the
> > default.
> > 
> > On Tue, Jun 25, 2019 at 12:12 PM Ilhan Polat <ilhanpolat at gmail.com>
> > wrote:
> > > I think we would have seen a lot of evidence in the last four
> > > decades if this was that problematic.
> > > 
> > > You are the second person to memtion these bugs. Care to show me
> > > some examples of these bugs? 
> > > 
> > > Maybe I am missing the point here. I haven't seen any bugs
> > > because somebody thought they are just transposing.
> > > 
> > > Using transpose to reshape an array is a different story. That we
> > > can discuss. 
> > > 
> > > On Tue, Jun 25, 2019, 16:10 Todd <toddrjen at gmail.com> wrote:
> > > > That is how it is in your field, but not mine.  For us we only
> > > > use the conventional transpose, even for complex numbers.  And
> > > > I routinely see bugs in MATLAB because of its choice of
> > > > defaults, and there are probably many more that don't get
> > > > caught because they happen silently.
> > > > 
> > > > I think the principle of least surprise should apply here.  For
> > > > people who need the conjugate transform the know to make sure
> > > > they use the right operation.  But a lot of people aren't even
> > > > aware that there conjugate transpose exists, they are just
> > > > going to copy what they see in the examples without realizing
> > > > it does the completely wrong thing in certain cases.  They
> > > > wouldn't bother to check because they don't even know there is
> > > > a second transpose operation they need to look out for.  So it
> > > > would hurt a lot of people without helping anyone.
> > > > 
> > > > On Tue, Jun 25, 2019, 07:03 Ilhan Polat <ilhanpolat at gmail.com>
> > > > wrote:
> > > > > I have to disagree, I hardly ever saw such bugs and moreover
> > > > > <Zbar, Z> is not compatible if you don't also transpose it
> > > > > but expected in almost all contexts of matrices, vectors and
> > > > > scalars. Elementwise conjugation is well inline with other
> > > > > elementwise operations starting with a dot in matlab hence
> > > > > still consistent. 
> > > > > 
> > > > > I would still expect an conjugation+transposition to be the
> > > > > default since just transposing a complex array is way more
> > > > > special and rare than its ubiquitous regular usage.
> > > > > 
> > > > > ilhan
> > > > > 
> > > > > 
> > > > > On Tue, Jun 25, 2019 at 10:57 AM Andras Deak <
> > > > > deak.andris at gmail.com> wrote:
> > > > > > On Tue, Jun 25, 2019 at 4:29 AM Cameron Blocker
> > > > > > <cameronjblocker at gmail.com> wrote:
> > > > > > >
> > > > > > > In my opinion, the matrix transpose operator and the
> > > > > > conjugate transpose operator should be one and the same.
> > > > > > Something nice about both Julia and MATLAB is that it takes
> > > > > > more keystrokes to do a regular transpose instead of a
> > > > > > conjugate transpose. Then people who work exclusively with
> > > > > > real numbers can just forget that it's a conjugate
> > > > > > transpose, and for relatively simple algorithms, their code
> > > > > > will just work with complex numbers with little
> > > > > > modification.
> > > > > > >
> > > > > > 
> > > > > > I'd argue that MATLAB's feature of `'` meaning adjoint
> > > > > > (conjugate
> > > > > > transpose etc.) and `.'` meaning regular transpose causes a
> > > > > > lot of
> > > > > > confusion and probably a lot of subtle bugs. Most people
> > > > > > are unaware
> > > > > > that `'` does a conjugate transpose and use it habitually,
> > > > > > and when
> > > > > > for once they have a complex array they don't understand
> > > > > > why the
> > > > > > values are off (assuming they even notice). Even the MATLAB
> > > > > > docs
> > > > > > conflate the two operations occasionally, which doesn't
> > > > > > help at all.
> > > > > > Transpose should _not_ incur conjugation automatically. I'm
> > > > > > already a
> > > > > > bit wary of special-casing matrix dynamics this much, when
> > > > > > ndarrays
> > > > > > are naturally multidimensional objects. Making transposes
> > > > > > be more than
> > > > > > transposes would be a huge mistake in my opinion, already
> > > > > > for matrices
> > > > > > (2d arrays) and especially for everything else.
> > > > > > 
> > > > > > Andr?s
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > Ideally, I'd like to see a .H that was the defacto
> > > > > > Matrix/Linear Algebra/Conjugate transpose that for 2 or
> > > > > > more dimensions, conjugate transposes the last two
> > > > > > dimensions and for 1 dimension just conjugates (if
> > > > > > necessary). And then .T can stay the Array/Tensor transpose
> > > > > > for general axis manipulation. I'd be okay with .T raising
> > > > > > an error/warning on 1D arrays if .H did not. I commonly
> > > > > > write things like u.conj().T at v even if I know both u and v
> > > > > > are 1D just so it looks more like an inner product.
> > > > > > >
> > > > > > > -Cameron
> > > > > > >
> > > > > > > On Mon, Jun 24, 2019 at 6:43 PM Ilhan Polat <
> > > > > > ilhanpolat at gmail.com> wrote:
> > > > > > >>
> > > > > > >> I think enumerating the cases along the way makes it a
> > > > > > bit more tangible for the discussion
> > > > > > >>
> > > > > > >>
> > > > > > >> import numpy as np
> > > > > > >> z = 1+1j
> > > > > > >> z.conjugate()  # 1-1j
> > > > > > >>
> > > > > > >> zz = np.array(z)
> > > > > > >> zz  # array(1+1j)
> > > > > > >> zz.T  # array(1+1j)  # OK expected.
> > > > > > >> zz.conj()  # 1-1j ?? what happened; no arrays?
> > > > > > >> zz.conjugate()  # 1-1j ?? same
> > > > > > >>
> > > > > > >> zz1d = np.array([z]*3)
> > > > > > >> zz1d.T  # no change so this is not the regular 2D array
> > > > > > >> zz1d.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
> > > > > > >> zz1d.conj().T  # array([1.-1.j, 1.-1.j, 1.-1.j])
> > > > > > >> zz1d.T.conj()  # array([1.-1.j, 1.-1.j, 1.-1.j])
> > > > > > >> zz1d[:, None].conj()  # 2D column vector - no surprises
> > > > > > if [:, None] is known
> > > > > > >>
> > > > > > >> zz2d = zz1d[:, None]  # 2D column vector - no surprises
> > > > > > if [:, None] is known
> > > > > > >> zz2d.conj()  # 2D col vec conjugated
> > > > > > >> zz2d.conj().T  # 2D col vec conjugated transposed
> > > > > > >>
> > > > > > >> zz3d = np.arange(24.).reshape(2,3,4).view(complex)
> > > > > > >> zz3d.conj()  # no surprises, conjugated
> > > > > > >> zz3d.conj().T  # ?? Why not the last two dims swapped
> > > > > > like other stacked ops
> > > > > > >>
> > > > > > >> # For scalar arrays conjugation strips the number
> > > > > > >> # For 1D arrays transpose is a no-op but conjugation
> > > > > > works
> > > > > > >> # For 2D arrays conjugate it is the matlab's elementwise
> > > > > > conjugation op .'
> > > > > > >> #     and transpose is acting like expected
> > > > > > >> # For 3D arrays conjugate it is the matlab's elementwise
> > > > > > conjugation op .'
> > > > > > >> #     but transpose is the reversing all dims just like
> > > > > > matlab's permute()
> > > > > > >> #     with static dimorder.
> > > > > > >>
> > > > > > >> and so on. Maybe we can try to identify all the use
> > > > > > cases and the quirks before we can make design the
> > > > > > solution. Because these are a bit more involved and I don't
> > > > > > even know if this is exhaustive.
> > > > > > >>
> > > > > > >>
> > > > > > >> On Mon, Jun 24, 2019 at 8:21 PM Marten van Kerkwijk <
> > > > > > m.h.vankerkwijk at gmail.com> wrote:
> > > > > > >>>
> > > > > > >>> Hi Stephan,
> > > > > > >>>
> > > > > > >>> Yes, the complex conjugate dtype would make things a
> > > > > > lot faster, but I don't quite see why we would wait for
> > > > > > that with introducing the `.H` property.
> > > > > > >>>
> > > > > > >>> I do agree that `.H` is the correct name, giving most
> > > > > > immediate clarity (i.e., people who know what conjugate
> > > > > > transpose is, will recognize it, while likely having to
> > > > > > look up `.CT`, while people who do not know will have to
> > > > > > look up regardless). But at the same time agree that the
> > > > > > docstring and other documentation should start with
> > > > > > "Conjugate tranpose" - good to try to avoid using names of
> > > > > > people where you have to be in the "in crowd" to know what
> > > > > > it means.
> > > > > > >>>
> > > > > > >>> The above said, if we were going with the initial
> > > > > > suggestion of `.MT` for matrix transpose, then I'd prefer
> > > > > > `.CT` over `.HT` as its conjugate version.
> > > > > > >>>
> > > > > > >>> But it seems there is little interest in that
> > > > > > suggestion, although sadly a clear path forward has not yet
> > > > > > emerged either.
> > > > > > >>>
> > > > > > >>> All the best,
> > > > > > >>>
> > > > > > >>> Marten
> > > > > > >>>
> > > > > > >>> _______________________________________________
> > > > > > >>> NumPy-Discussion mailing list
> > > > > > >>> NumPy-Discussion at python.org
> > > > > > >>> 
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > > >>
> > > > > > >> _______________________________________________
> > > > > > >> NumPy-Discussion mailing list
> > > > > > >> NumPy-Discussion at python.org
> > > > > > >> 
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > NumPy-Discussion mailing list
> > > > > > > NumPy-Discussion at python.org
> > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > > _______________________________________________
> > > > > > NumPy-Discussion mailing list
> > > > > > NumPy-Discussion at python.org
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > 
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion at python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > 
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/1c6f240b/attachment.sig>

From matthew.brett at gmail.com  Tue Jun 25 14:47:03 2019
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 25 Jun 2019 11:47:03 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEQ_TvfktMzqs7ZnaoK-eqH6MdrmfLJFp5rEuAVDL=L1BC=3dg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <AM0PR04MB63061E5DA759D9F458F8C454F7E10@AM0PR04MB6306.eurprd04.prod.outlook.com>
 <26d4bb07fc89b90b2b36812ba0ee1d781ada8117.camel@sipsolutions.net>
 <CAJNV+9sf43MaEjSe1ZH1gwZD1fEiegtrDqN2Dz7OBr3Qkumaew@mail.gmail.com>
 <0e17686d63452272d9f222b577e213e2880fcf25.camel@sipsolutions.net>
 <CAL1kJvBu+QtY9NUKVRE_MUPG+oKsk9fD1tu+ozUy7PJ2TP_rWg@mail.gmail.com>
 <CAJNV+9vL7cOCC2MEVzROgzPRemJ551hvK80qGMbh+X0t-fq1Rw@mail.gmail.com>
 <CAEBuzr9okiVQqSAJhErDckp0MxSWPp6V4yMxiXu3rcsH6c7Zhg@mail.gmail.com>
 <8CF63ED6-CFE5-4A32-A6B3-5221B72877A6@gmail.com>
 <CAJNV+9vbwRXxpw1fNAcRg0U4ZjG3vjqo1sXMWaf_ekNCE__ywQ@mail.gmail.com>
 <e1bb33c4-13bd-40cd-a5f1-8cc13a494fa7@www.fastmail.com>
 <CAFpSVpLS_HvuQjLF6Bcxm5QALA-UxZAifA-K6-b=SEBPs9d5eQ@mail.gmail.com>
 <CAEQ_Tvfp6E-++Ly4sep1er7XDnBJ=aVB-1j6yOqHTuquMF9b6A@mail.gmail.com>
 <CAFpSVp+duUQVS+uX6MntM6o2MWL_cE-bw7pZe_ry+zbAVQMPdw@mail.gmail.com>
 <42c14036-1678-4740-4757-6c72b6577a52@gmail.com>
 <CAFpSVpKmdiXRnBf8_BgyKoEb8ECAYK4=-fuURnfSkBho-ey1wQ@mail.gmail.com>
 <CAEQ_TvfktMzqs7ZnaoK-eqH6MdrmfLJFp5rEuAVDL=L1BC=3dg@mail.gmail.com>
Message-ID: <CAH6Pt5ogWFygVDWpEaR=nLWVd6XEdUXXNQy4mF0wcoDZYabpEg@mail.gmail.com>

Hi,

On Tue, Jun 25, 2019 at 10:57 AM Stephan Hoyer <shoyer at gmail.com>
[snip]
...
> I also agree with Alan that probably it's too late to change the behavior of .T for arrays with more than 2-dimensions. NumPy could certainly use a more comprehensive policy around backwards compatibility, but we certainly need to meet a *very* high bar to break backwards compatibility. I am skeptical that the slightly cleaner code facilitated by this new definition for .T would be worth it.
>

I feel strongly that we should have the following policy:

* Under no circumstances should we make changes that mean that correct
old code will give different results with new Numpy.

On the other hand, it's OK (with a suitable period of deprecation) for
correct old code to raise an informative error with new Numpy.

That means that a.T deprecation -> a.T error -> a.T means a.MT is
forbidden, but a.T deprecation -> a.T error is OK.

Cheers,

Matthew

From kirillbalunov at gmail.com  Tue Jun 25 16:16:05 2019
From: kirillbalunov at gmail.com (Kirill Balunov)
Date: Tue, 25 Jun 2019 23:16:05 +0300
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
Message-ID: <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>

??, 25 ???. 2019 ?. ? 21:20, Cameron Blocker <cameronjblocker at gmail.com>:

> It seems to me that the general consensus is that we shouldn't be changing
> .T to do what we've termed matrix transpose or conjugate transpose.
>

Reading through this thread, I can not say that I have the same opinion -
at first, many looked positively at the possibility of change - `arr.T` to
mean a transpose of the last two dimensions by default, and then people
start discussing several different (albeit related) topics at once. So, I
want to point out that it is rather difficult to follow what is currently
discussed in this thread, probably because several different (albeit
related) topics are being discussed at once. I would suggest at first
discuss `arr.T` change, because other topics somewhat depend on that
(`arr.MT`/`arr.CT`/`arr.H` and others).

p.s:  Documentation about  `.T` shows only two examples, for 1d - to show
that it works and for 2d case. Maybe it means something? (especially for
new `numpy` users. )

with kind regards,
-gdg
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/d2cae1ff/attachment-0001.html>

From ralf.gommers at gmail.com  Tue Jun 25 17:00:22 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 25 Jun 2019 23:00:22 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
Message-ID: <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>

On Tue, Jun 25, 2019 at 10:17 PM Kirill Balunov <kirillbalunov at gmail.com>
wrote:

>
> ??, 25 ???. 2019 ?. ? 21:20, Cameron Blocker <cameronjblocker at gmail.com>:
>
>> It seems to me that the general consensus is that we shouldn't be
>> changing .T to do what we've termed matrix transpose or conjugate
>> transpose.
>>
>
> Reading through this thread, I can not say that I have the same opinion -
> at first, many looked positively at the possibility of change - `arr.T` to
> mean a transpose of the last two dimensions by default, and then people
> start discussing several different (albeit related) topics at once. So, I
> want to point out that it is rather difficult to follow what is currently
> discussed in this thread, probably because several different (albeit
> related) topics are being discussed at once. I would suggest at first
> discuss `arr.T` change, because other topics somewhat depend on that
> (`arr.MT`/`arr.CT`/`arr.H` and others).
>

Perhaps not full consensus between the many people with different opinions
and interests. But for the first one, arr.T change: it's clear that this
won't happen. Between Juan's examples of valid use, and what Stephan and
Matthew said, there's not much more to add. We're not going to change
correct code for minor benefits.


> p.s:  Documentation about  `.T` shows only two examples, for 1d - to show
> that it works and for 2d case. Maybe it means something? (especially for
> new `numpy` users. )
>

That only means that there's a limit to the number of examples we've
managed to put in docstrings.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/a2cb9afa/attachment.html>

From m.h.vankerkwijk at gmail.com  Tue Jun 25 17:00:25 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 25 Jun 2019 17:00:25 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
Message-ID: <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>

Hi Kirill, others,

Indeed, it is becoming long! That said, while initially I was quite charmed
by Eric's suggestion of deprecating and then changing `.T`, I think the
well-argued opposition to it has changed my opinion. Perhaps most
persuasive to me was Matthew's point just now that code (or a code snippet)
that worked on an old numpy should not silently do something different on a
new numpy (unless the old behaviour was a true bug, of course; but here
`.T` has always had a very well-defined meaning - even though you are right
that the documentation does not exactly lead the novice user away from
using it for matrix transpose! If someone has the time to open a PR that
clarifies it.........).

Note that I do agree with the sentiment that the deprecation/change would
likely expose some hidden bugs - and, as noted, it is hard to know where
those bugs are if they are hidden! (FWIW, I did find some in astropy's
coordinate implementation, which was initially written for scalar
coordinates where `.T` worked was just fine; as a result, astropy gained a
`matrix_transpose` utility function.) Still, it does not quite outweigh to
me the disadvantages enumerated.

One thing seems clear: if `.T` is out, that means `.H` is out as well (at
least as a matrix transpose, the only sensible meaning I think it has).
Having `.H` as a conjugate matrix transpose would just cause more confusion
about the meaning of `.T`.

For the names, my suggestion of lower-casing the M in the initial one,
i.e., `.mT` and `.mH`, so far seemed most supported (and I think we should
discuss *assuming* those would eventually involve not copying data; let's
not worry about implementation details).

So, specific items to confirm:

1) Is this a worthy addition? (certainly, their existence would reduce
confusion about `.T`... so far, my sense is tentative yes)

2) Are `.mT` and `.mH` indeed the consensus? [1]

3) What, if anything, should these new properties do for 0-d and 1-d
arrays: pass through, change shape, or error? (logically, I think *new*
properties should never emit warnings: either do something or error).
- In favour of pass-through: 1-d is a vector `dot` and `matmul` work fine
with this;
- In favour of shape change: "m" stands for matrix; can be generous on
input, but should be strict on output. After all, other code may not make
the same assumption that 1-d arrays are fine as row and column vectors.
- In favour of error: "m" stands for matrix and the input is not a matrix!
Let the user add np.newaxis in the right place, which will make the intent
clear.

All the best,

Marten

[1] Some sadness about m? and m? - but, then, there is
http://www.modernemacs.com/post/prettify-mode/

On Tue, Jun 25, 2019 at 4:17 PM Kirill Balunov <kirillbalunov at gmail.com>
wrote:

>
> ??, 25 ???. 2019 ?. ? 21:20, Cameron Blocker <cameronjblocker at gmail.com>:
>
>> It seems to me that the general consensus is that we shouldn't be
>> changing .T to do what we've termed matrix transpose or conjugate
>> transpose.
>>
>
> Reading through this thread, I can not say that I have the same opinion -
> at first, many looked positively at the possibility of change - `arr.T` to
> mean a transpose of the last two dimensions by default, and then people
> start discussing several different (albeit related) topics at once. So, I
> want to point out that it is rather difficult to follow what is currently
> discussed in this thread, probably because several different (albeit
> related) topics are being discussed at once. I would suggest at first
> discuss `arr.T` change, because other topics somewhat depend on that
> (`arr.MT`/`arr.CT`/`arr.H` and others).
>
> p.s:  Documentation about  `.T` shows only two examples, for 1d - to show
> that it works and for 2d case. Maybe it means something? (especially for
> new `numpy` users. )
>
> with kind regards,
> -gdg
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/81e9e916/attachment-0001.html>

From sebastian at sipsolutions.net  Tue Jun 25 17:49:33 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 25 Jun 2019 14:49:33 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
Message-ID: <b522b1ecfb8e502c58940dee6334d286abf24834.camel@sipsolutions.net>

On Tue, 2019-06-25 at 17:00 -0400, Marten van Kerkwijk wrote:
> Hi Kirill, others,
> 
> Indeed, it is becoming long! That said, while initially I was quite
> charmed by Eric's suggestion of deprecating and then changing `.T`, I
> think the well-argued opposition to it has changed my opinion.
> Perhaps most persuasive to me was Matthew's point just now that code
> (or a code snippet) that worked on an old numpy should not silently
> do something different on a new numpy (unless the old behaviour was a
> true bug, of course; but here `.T` has always had a very well-defined 
> meaning - even though you are right that the documentation does not
> exactly lead the novice user away from using it for matrix transpose!
> If someone has the time to open a PR that clarifies it.........).
> 
> Note that I do agree with the sentiment that the deprecation/change
> would likely expose some hidden bugs - and, as noted, it is hard to
> know where those bugs are if they are hidden! (FWIW, I did find some
> in astropy's coordinate implementation, which was initially written
> for scalar coordinates where `.T` worked was just fine; as a result,
> astropy gained a `matrix_transpose` utility function.) Still, it does
> not quite outweigh to me the disadvantages enumerated.
> 

True, eventually switching is much more problematic than only
deprecation, and yes, I guess the last step is likely forbidding.

I do not care too much, but the at least the deprecation/warning does
not seem too bad to me unless it is really widely used for high
dimensions. Sure, it requires to touch code and may make it uglier, but
a change requiring to touch a fair amount of scripts is not all that
uncommon, especially if it can find some bugs (e.g. for me
scipy.misc.factorial moving for example meant I had to change a lot of
scripts, annoying but I could live with it).

Although, I might prefer to spend our "force users to do annoying code
changes" chips on better things. And I guess there may not be much of a
point in a mere deprecation.


> One thing seems clear: if `.T` is out, that means `.H` is out as well
> (at least as a matrix transpose, the only sensible meaning I think it
> has). Having `.H` as a conjugate matrix transpose would just cause
> more confusion about the meaning of `.T`.
> 

I tend to agree, the only way that could work seems if T was deprecated
for high dimensions.


> For the names, my suggestion of lower-casing the M in the initial
> one, i.e., `.mT` and `.mH`, so far seemed most supported (and I think
> we should discuss *assuming* those would eventually involve not
> copying data; let's not worry about implementation details). 

It would be a nice assumption, but as I said, I do see an issue with
object array support. Which makes it likely that `.H` could only be
supported on some dtypes (similar to `.real/.imag`).
(Strictly speaking it would be possible to make a ConugateObject dtype
and define casting for it, I have some doubt that the added complexity
is worth it though). The no-copy conjugate is a cool idea but
ultimately may be a bit too cool?

> So, specific items to confirm:
> 
> 1) Is this a worthy addition? (certainly, their existence would
> reduce confusion about `.T`... so far, my sense is tentative yes)
> 
> 2) Are `.mT` and `.mH` indeed the consensus? [1]
> 

It is likely the only reasonable option, unless you make `H` object
which does `arr_like**H` but I doubt that is a good idea.

> 3) What, if anything, should these new properties do for 0-d and 1-d
> arrays: pass through, change shape, or error? (logically, I think
> *new* properties should never emit warnings: either do something or
> error).
<snip>
> Marten
> 
> [1] Some sadness about m? and m? - but, then, there is 
> http://www.modernemacs.com/post/prettify-mode/
> 

Hehe, you are using a block for Phonetic Extensions, and that block has
a second H which looks the same on my font but is Cyrillic. Lucky us,
we could make one of them for row vectors and the other for column
vectors ;).

- Sebastian


> On Tue, Jun 25, 2019 at 4:17 PM Kirill Balunov <
> kirillbalunov at gmail.com> wrote:
> > ??, 25 ???. 2019 ?. ? 21:20, Cameron Blocker <
> > cameronjblocker at gmail.com>:
> > > It seems to me that the general consensus is that we shouldn't be
> > > changing .T to do what we've termed matrix transpose or conjugate
> > > transpose.
> > > 
> >  
> > Reading through this thread, I can not say that I have the same
> > opinion - at first, many looked positively at the possibility of
> > change - `arr.T` to mean a transpose of the last two dimensions by
> > default, and then people start discussing several different (albeit
> > related) topics at once. So, I want to point out that it is rather
> > difficult to follow what is currently discussed in this thread,
> > probably because several different (albeit related) topics are
> > being discussed at once. I would suggest at first discuss `arr.T`
> > change, because other topics somewhat depend on that
> > (`arr.MT`/`arr.CT`/`arr.H` and others).
> > 
> > p.s:  Documentation about  `.T` shows only two examples, for 1d -
> > to show that it works and for 2d case. Maybe it means something?
> > (especially for new `numpy` users. )
> > 
> > with kind regards,
> > -gdg
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/1ee4ed0f/attachment.sig>

From wieser.eric+numpy at gmail.com  Tue Jun 25 18:11:41 2019
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Tue, 25 Jun 2019 15:11:41 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <b522b1ecfb8e502c58940dee6334d286abf24834.camel@sipsolutions.net>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
 <b522b1ecfb8e502c58940dee6334d286abf24834.camel@sipsolutions.net>
Message-ID: <CAL1kJvBktQG9Rw=HtUfN3zAQ+3TTDfD5Cww7BLmUKjaXOtnOZQ@mail.gmail.com>

One other approach here that perhaps treads a little too close to np.matrix:

class MatrixOpWrapper:
    def __init__(self, arr):  # todo: accept axis arguments here?
        self._array = arr  # todo: assert that arr.ndim >= 2 / call atleast1d
    @property
    def T(self):
        return linalg.transpose(self._array)
    @property
    def H(self):
        return M(self._array.conj()).T
    # add .I too?

M = MatrixOpWrapper

So M(arr).T instead of arr.mT, which has the benefit of not expanding the
number of ndarray members (and those needed by duck-types) further.


On Tue, 25 Jun 2019 at 14:50, Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Tue, 2019-06-25 at 17:00 -0400, Marten van Kerkwijk wrote:
> > Hi Kirill, others,
> >
> > Indeed, it is becoming long! That said, while initially I was quite
> > charmed by Eric's suggestion of deprecating and then changing `.T`, I
> > think the well-argued opposition to it has changed my opinion.
> > Perhaps most persuasive to me was Matthew's point just now that code
> > (or a code snippet) that worked on an old numpy should not silently
> > do something different on a new numpy (unless the old behaviour was a
> > true bug, of course; but here `.T` has always had a very well-defined
> > meaning - even though you are right that the documentation does not
> > exactly lead the novice user away from using it for matrix transpose!
> > If someone has the time to open a PR that clarifies it.........).
> >
> > Note that I do agree with the sentiment that the deprecation/change
> > would likely expose some hidden bugs - and, as noted, it is hard to
> > know where those bugs are if they are hidden! (FWIW, I did find some
> > in astropy's coordinate implementation, which was initially written
> > for scalar coordinates where `.T` worked was just fine; as a result,
> > astropy gained a `matrix_transpose` utility function.) Still, it does
> > not quite outweigh to me the disadvantages enumerated.
> >
>
> True, eventually switching is much more problematic than only
> deprecation, and yes, I guess the last step is likely forbidding.
>
> I do not care too much, but the at least the deprecation/warning does
> not seem too bad to me unless it is really widely used for high
> dimensions. Sure, it requires to touch code and may make it uglier, but
> a change requiring to touch a fair amount of scripts is not all that
> uncommon, especially if it can find some bugs (e.g. for me
> scipy.misc.factorial moving for example meant I had to change a lot of
> scripts, annoying but I could live with it).
>
> Although, I might prefer to spend our "force users to do annoying code
> changes" chips on better things. And I guess there may not be much of a
> point in a mere deprecation.
>
>
> > One thing seems clear: if `.T` is out, that means `.H` is out as well
> > (at least as a matrix transpose, the only sensible meaning I think it
> > has). Having `.H` as a conjugate matrix transpose would just cause
> > more confusion about the meaning of `.T`.
> >
>
> I tend to agree, the only way that could work seems if T was deprecated
> for high dimensions.
>
>
> > For the names, my suggestion of lower-casing the M in the initial
> > one, i.e., `.mT` and `.mH`, so far seemed most supported (and I think
> > we should discuss *assuming* those would eventually involve not
> > copying data; let's not worry about implementation details).
>
> It would be a nice assumption, but as I said, I do see an issue with
> object array support. Which makes it likely that `.H` could only be
> supported on some dtypes (similar to `.real/.imag`).
> (Strictly speaking it would be possible to make a ConugateObject dtype
> and define casting for it, I have some doubt that the added complexity
> is worth it though). The no-copy conjugate is a cool idea but
> ultimately may be a bit too cool?
>
> > So, specific items to confirm:
> >
> > 1) Is this a worthy addition? (certainly, their existence would
> > reduce confusion about `.T`... so far, my sense is tentative yes)
> >
> > 2) Are `.mT` and `.mH` indeed the consensus? [1]
> >
>
> It is likely the only reasonable option, unless you make `H` object
> which does `arr_like**H` but I doubt that is a good idea.
>
> > 3) What, if anything, should these new properties do for 0-d and 1-d
> > arrays: pass through, change shape, or error? (logically, I think
> > *new* properties should never emit warnings: either do something or
> > error).
> <snip>
> > Marten
> >
> > [1] Some sadness about m? and m? - but, then, there is
> > http://www.modernemacs.com/post/prettify-mode/
> >
>
> Hehe, you are using a block for Phonetic Extensions, and that block has
> a second H which looks the same on my font but is Cyrillic. Lucky us,
> we could make one of them for row vectors and the other for column
> vectors ;).
>
> - Sebastian
>
>
> > On Tue, Jun 25, 2019 at 4:17 PM Kirill Balunov <
> > kirillbalunov at gmail.com> wrote:
> > > ??, 25 ???. 2019 ?. ? 21:20, Cameron Blocker <
> > > cameronjblocker at gmail.com>:
> > > > It seems to me that the general consensus is that we shouldn't be
> > > > changing .T to do what we've termed matrix transpose or conjugate
> > > > transpose.
> > > >
> > >
> > > Reading through this thread, I can not say that I have the same
> > > opinion - at first, many looked positively at the possibility of
> > > change - `arr.T` to mean a transpose of the last two dimensions by
> > > default, and then people start discussing several different (albeit
> > > related) topics at once. So, I want to point out that it is rather
> > > difficult to follow what is currently discussed in this thread,
> > > probably because several different (albeit related) topics are
> > > being discussed at once. I would suggest at first discuss `arr.T`
> > > change, because other topics somewhat depend on that
> > > (`arr.MT`/`arr.CT`/`arr.H` and others).
> > >
> > > p.s:  Documentation about  `.T` shows only two examples, for 1d -
> > > to show that it works and for 2d case. Maybe it means something?
> > > (especially for new `numpy` users. )
> > >
> > > with kind regards,
> > > -gdg
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/a457f97c/attachment-0001.html>

From ralf.gommers at gmail.com  Tue Jun 25 18:30:42 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 26 Jun 2019 00:30:42 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
Message-ID: <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>

On Tue, Jun 25, 2019 at 11:02 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

>
> For the names, my suggestion of lower-casing the M in the initial one,
> i.e., `.mT` and `.mH`, so far seemed most supported (and I think we should
> discuss *assuming* those would eventually involve not copying data; let's
> not worry about implementation details).
>

For the record, this is not an implementation detail. It was the consensus
before that `H` is a bad idea unless it returns a view just like `T`:
https://github.com/numpy/numpy/issues/8882


> So, specific items to confirm:
>
> 1) Is this a worthy addition? (certainly, their existence would reduce
> confusion about `.T`... so far, my sense is tentative yes)
>
> 2) Are `.mT` and `.mH` indeed the consensus? [1]
>

I think `H` would be good to revisit *if* it can be made to return a view.
I think a tweak on `T` for >2-D input does not meet the bar for inclusion.

Cheers,
Ralf


> 3) What, if anything, should these new properties do for 0-d and 1-d
> arrays: pass through, change shape, or error? (logically, I think *new*
> properties should never emit warnings: either do something or error).
> - In favour of pass-through: 1-d is a vector `dot` and `matmul` work fine
> with this;
> - In favour of shape change: "m" stands for matrix; can be generous on
> input, but should be strict on output. After all, other code may not make
> the same assumption that 1-d arrays are fine as row and column vectors.
> - In favour of error: "m" stands for matrix and the input is not a matrix!
> Let the user add np.newaxis in the right place, which will make the intent
> clear.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/9010e127/attachment.html>

From m.h.vankerkwijk at gmail.com  Tue Jun 25 21:54:10 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 25 Jun 2019 21:54:10 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
 <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>
Message-ID: <CAJNV+9tScf9BOZjHBd3F_aKeFZxOY+RQc1OMj7Tkamxbp-YhCQ@mail.gmail.com>

Hi Ralf,

On Tue, Jun 25, 2019 at 6:31 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Tue, Jun 25, 2019 at 11:02 PM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>>
>> For the names, my suggestion of lower-casing the M in the initial one,
>> i.e., `.mT` and `.mH`, so far seemed most supported (and I think we should
>> discuss *assuming* those would eventually involve not copying data; let's
>> not worry about implementation details).
>>
>
> For the record, this is not an implementation detail. It was the consensus
> before that `H` is a bad idea unless it returns a view just like `T`:
> https://github.com/numpy/numpy/issues/8882
>

Is there more than an issue in which Nathaniel rejecting it mentioning some
previous consensus? I was part of the discussion of the complex conjugate
dtype, but do not recall any consensus beyond a "wish to keep properties
simple". Certainly the "property does not do any calculation" rule seems
arbitrary; the only strict rule I would apply myself is that the
computation should not be able to fail (computationally, out-of-memory does
not count; that's like large integer overflow). So,  I'd definitely agree
with you if we were discussion a property `.I` for matrix inverse (and
indeed have said so in related issues). But for .H, not so much. Certainly
whoever wrote np.Matrix didn't seem to feel bound by it.

Note that for *matrix* transpose (as opposed to general axis reordering
with .tranpose()), I see far less use for what is returned being a writable
view. Indeed, for conjugate transpose, I would rather never allow writing
back even if it we had the conjugate dtype since one would surely get it
wrong (likely, `.conj()` would also return a read-only view, at least by
default; perhaps one should even go as far as only allowing
`a.view(conjugate-dtype)` as they way to get a writable view).


> So, specific items to confirm:
>
> 1) Is this a worthy addition? (certainly, their existence would reduce
> confusion about `.T`... so far, my sense is tentative yes)
>
> 2) Are `.mT` and `.mH` indeed the consensus? [1]
>

> I think `H` would be good to revisit *if* it can be made to return a
view. I think a tweak on `T` for >2-D input does not meet the bar for
inclusion.

Well, I guess it is obvious I disagree: I think this more than meets the
bar for inclusion. To me, this certainly is a much bigger deal that
something like oindex or vindex (which I do like).

Indeed, it would seem to me that if a visually more convenient way to do
(stacks of) matrix multiplication for numpy is good enough to warrant
changing the python syntax, then surely having a visually more convenient
standard way to do matrix transpose should not be considered off-limits for
ndarray; how often do you see a series matrix manipulations that does not
involve both multiplication and transpose?

It certainly doesn't seem to me much of an argument that someone previously
decided to use .T for a shortcut for the computer scientist idea of
transpose to not allow the mathematical/physical-scientist one - one I
would argue is guaranteed to be used much more.

The latter of course just repeats what many others have written above, but
since given that you call it a "tweak", perhaps it is worth backing up. For
astropy, a quick grep gives:

- 28 uses of the matrix_transpose function I wrote because numpy doesn't
have even a simple function for that and the people who wrote the original
code used the Matrix class which had the proper .T (but doesn't extend to
multiple dimensions; we might still be using it otherwise).

- 11 uses of .T,  all of which seem to be on 2-D arrays and are certainly
used as if they were matrix transpose (most are for fitting). Certainly,
all of these are bugs lying in waiting if the arrays ever get to be >2-D.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190625/d4b60088/attachment.html>

From ralf.gommers at gmail.com  Tue Jun 25 22:37:48 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 26 Jun 2019 04:37:48 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9tScf9BOZjHBd3F_aKeFZxOY+RQc1OMj7Tkamxbp-YhCQ@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
 <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>
 <CAJNV+9tScf9BOZjHBd3F_aKeFZxOY+RQc1OMj7Tkamxbp-YhCQ@mail.gmail.com>
Message-ID: <CABL7CQjTEkLW8+0sVvkfOk9oErQu2sHrV_z_fpLJj3+hatqLCQ@mail.gmail.com>

On Wed, Jun 26, 2019 at 3:56 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Ralf,
>
> On Tue, Jun 25, 2019 at 6:31 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>>
>> On Tue, Jun 25, 2019 at 11:02 PM Marten van Kerkwijk <
>> m.h.vankerkwijk at gmail.com> wrote:
>>
>>>
>>> For the names, my suggestion of lower-casing the M in the initial one,
>>> i.e., `.mT` and `.mH`, so far seemed most supported (and I think we should
>>> discuss *assuming* those would eventually involve not copying data; let's
>>> not worry about implementation details).
>>>
>>
>> For the record, this is not an implementation detail. It was the
>> consensus before that `H` is a bad idea unless it returns a view just like
>> `T`: https://github.com/numpy/numpy/issues/8882
>>
>
> Is there more than an issue in which Nathaniel rejecting it mentioning
> some previous consensus?
>

Yes, this has been discussed in lots of detail before, also on this list
(as Nathaniel mentioned in the issue). I spent 10 minutes to try and find
it but that wasn't enough. I do think it's not necessarily my
responsibility though to dig up all the history here - that should be on
the proposers of a new feature ....

I was part of the discussion of the complex conjugate dtype, but do not
> recall any consensus beyond a "wish to keep properties simple". Certainly
> the "property does not do any calculation" rule seems arbitrary; the only
> strict rule I would apply myself is that the computation should not be able
> to fail (computationally, out-of-memory does not count; that's like large
> integer overflow). So,  I'd definitely agree with you if we were discussion
> a property `.I` for matrix inverse (and indeed have said so in related
> issues). But for .H, not so much. Certainly whoever wrote np.Matrix didn't
> seem to feel bound by it.
>
> Note that for *matrix* transpose (as opposed to general axis reordering
> with .tranpose()), I see far less use for what is returned being a writable
> view. Indeed, for conjugate transpose, I would rather never allow writing
> back even if it we had the conjugate dtype since one would surely get it
> wrong (likely, `.conj()` would also return a read-only view, at least by
> default; perhaps one should even go as far as only allowing
> `a.view(conjugate-dtype)` as they way to get a writable view).
>
>
>> So, specific items to confirm:
>>
>> 1) Is this a worthy addition? (certainly, their existence would reduce
>> confusion about `.T`... so far, my sense is tentative yes)
>>
>> 2) Are `.mT` and `.mH` indeed the consensus? [1]
>>
>
> > I think `H` would be good to revisit *if* it can be made to return a
> view. I think a tweak on `T` for >2-D input does not meet the bar for
> inclusion.
>
> Well, I guess it is obvious I disagree: I think this more than meets the
> bar for inclusion. To me, this certainly is a much bigger deal that
> something like oindex or vindex (which I do like).
>

Honestly, I don't really want to be arguing against this (or even be forced
to spend time following along here). My main problem with this proposal
right now is that we've had this discussion multiple times, and it was
rejected with solid arguments after taking up a lot of time. Restarting
that discussion from scratch without considering the history feels wrong.
It's like a democracy voting on becoming a dictatorship repeatedly: you can
have a "no" vote several times, but if you rerun the vote often enough at
some point you'll get a "yes", and then it's a done deal.

I think this requires a serious write-up, as either a NEP or a GitHub issue
with a good set of cross-links and addressing all previous arguments.


> Indeed, it would seem to me that if a visually more convenient way to do
> (stacks of) matrix multiplication for numpy is good enough to warrant
> changing the python syntax, then surely having a visually more convenient
> standard way to do matrix transpose should not be considered off-limits for
> ndarray; how often do you see a series matrix manipulations that does not
> involve both multiplication and transpose?
>
> It certainly doesn't seem to me much of an argument that someone
> previously decided to use .T for a shortcut for the computer scientist idea
> of transpose to not allow the mathematical/physical-scientist one - one I
> would argue is guaranteed to be used much more.
>
> The latter of course just repeats what many others have written above, but
> since given that you call it a "tweak", perhaps it is worth backing up. For
> astropy, a quick grep gives:
>
> - 28 uses of the matrix_transpose function I wrote because numpy doesn't
> have even a simple function for that and the people who wrote the original
> code used the Matrix class which had the proper .T (but doesn't extend to
> multiple dimensions; we might still be using it otherwise).
>

A utility function in scipy.linalg would be a more low-API-impact approach
to addressing this.


> - 11 uses of .T,  all of which seem to be on 2-D arrays and are certainly
> used as if they were matrix transpose (most are for fitting). Certainly,
> all of these are bugs lying in waiting if the arrays ever get to be >2-D.
>

Most linalg is 2-D, that's why numpy.matrix and scipy.sparse matrices are
2-D only. If it's a real worry for those 11 cases, you could just add some
comments or tests that prevent introducing bugs.

More importantly, your assumption that >2-D arrays are "stacks of matrices"
and that other usage is for "computer scientists" is arguably incorrect.
There are many use cases for 3-D and higher-dimensional arrays that are not
just "vectorized matrix math". As a physicist, I've done lots of work with
3-D and 4-D grids for everything from quantum physics to engineering
problems in semiconductor equipment. NumPy is great for that, and I've
never needed >=3-D linalg for any of it (and transposing is useful). So
please don't claim the physicial-scientist view for this:)

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/77cc5188/attachment-0001.html>

From ilhanpolat at gmail.com  Wed Jun 26 03:03:37 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Wed, 26 Jun 2019 09:03:37 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CABL7CQjTEkLW8+0sVvkfOk9oErQu2sHrV_z_fpLJj3+hatqLCQ@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
 <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>
 <CAJNV+9tScf9BOZjHBd3F_aKeFZxOY+RQc1OMj7Tkamxbp-YhCQ@mail.gmail.com>
 <CABL7CQjTEkLW8+0sVvkfOk9oErQu2sHrV_z_fpLJj3+hatqLCQ@mail.gmail.com>
Message-ID: <CAEBuzr-jd5G=TrS86jSi2KX24Y9WvAxPD4S01D4njdTKmOy=oA@mail.gmail.com>

Maybe a bit of a grouping would help, because I am also losing track here.
Let's see if I could manage to get something sensible because, just like
Marten mentioned, I am confusing myself even when I am thinking about this

1- Transpose operation on 1D arrays:
    This is a well-known confusion point for anyone that arrives at NumPy
usage from, say matlab background or any linear algebra based user. Andras
mentioned already that this is a subset of NumPy users so we have to be
careful about the user assumptions. 1D arrays are computational constructs
and mathematically they don't exist and this is the basis that matlab
enforced since day 1. Any numerical object is an at least 2D array
including scalars hence transposition flips the dimensions even for a col
vector or row vector. That doesn't mean we cannot change it or we need to
follow matlab but this is kind of what anybody kinda sorta wouda expect.
For some historical reason, on numpy side transposition on 1D arrays did
nothing since they have single dimensions. Hence you have to create a 2D
vector for transpose from the get go to match the linear algebra intuition.
Points that has been discussed so far are about whether we should go
further and even intercept this behavior such that 1D transpose gives
errors or warnings as opposed to the current behavior of silent no-op. as
far as I can tell, we have a consensus that this behavior is here to stay
for the foreseeable future.

2- Using transpose to reshape the (complex) array or flip its dimensions
    This is a usage that has been mentioned above that I don't know much
about. I usually go the "reshape() et al." way for this but apparently
folks use it to flip dimensions and they don't want the automatically
conjugation which is exactly the opposite of a linear algebra oriented user
is used to have as an adjoint operator. Therefore points that have been
discussed about are whether to inject conjugation into .T behavior of
complex arrays or not. If not can we have an extra .H or something that
specifically does .conj().T together (or .T.conj() order doesn't matter).
The main feel (that I got so far) is that we shouldn't touch the current
way and hopefully bring in another attribute.

3- Having a shorthand notation such as .H or .mH etc.
    If the previous assertion is true then the issue becomes what should be
the new name of the attribute and how can it have the nice properties of a
transpose such as returning a view etc. However this has been proposed and
rejected before e.g., GH-8882 and GH-13797. There is a catch here though,
because if the alternative is .conj().T then it doesn't matter whether it
copies or not because .conj().T doesn't return a view either and therefore
the user receives a new array anyways. Therefore no benefits lost. Since
the idea is to have a shorthand notation, it seems to me that this point is
artificial in that sense and not necessarily a valid argument for
rejection. But from the reluctance of Ralf I feel like there is a
historical wear-out on this subject.

4- transpose of 3+D arrays
    I think we missed the bus on this one for changing the default behavior
now and there are glimpses of confirmation of this above in the previous
mails. I would suggest discussing this separately.

So if you are not already worn out and not feeling sour about it, I would
like to propose the discussion of item 3 opened once again. Because the
need is real and we don't need to get choked on the implementation details
right away.

Disclaimer: I do applied math so I have a natural bias towards the linalg-y
way of doing things. And sorry about that if I did that above, sometimes
typing quickly loses the intention.


Best,
ilhan


On Wed, Jun 26, 2019 at 4:39 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Wed, Jun 26, 2019 at 3:56 AM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>> Hi Ralf,
>>
>> On Tue, Jun 25, 2019 at 6:31 PM Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Jun 25, 2019 at 11:02 PM Marten van Kerkwijk <
>>> m.h.vankerkwijk at gmail.com> wrote:
>>>
>>>>
>>>> For the names, my suggestion of lower-casing the M in the initial one,
>>>> i.e., `.mT` and `.mH`, so far seemed most supported (and I think we should
>>>> discuss *assuming* those would eventually involve not copying data; let's
>>>> not worry about implementation details).
>>>>
>>>
>>> For the record, this is not an implementation detail. It was the
>>> consensus before that `H` is a bad idea unless it returns a view just like
>>> `T`: https://github.com/numpy/numpy/issues/8882
>>>
>>
>> Is there more than an issue in which Nathaniel rejecting it mentioning
>> some previous consensus?
>>
>
> Yes, this has been discussed in lots of detail before, also on this list
> (as Nathaniel mentioned in the issue). I spent 10 minutes to try and find
> it but that wasn't enough. I do think it's not necessarily my
> responsibility though to dig up all the history here - that should be on
> the proposers of a new feature ....
>
> I was part of the discussion of the complex conjugate dtype, but do not
>> recall any consensus beyond a "wish to keep properties simple". Certainly
>> the "property does not do any calculation" rule seems arbitrary; the only
>> strict rule I would apply myself is that the computation should not be able
>> to fail (computationally, out-of-memory does not count; that's like large
>> integer overflow). So,  I'd definitely agree with you if we were discussion
>> a property `.I` for matrix inverse (and indeed have said so in related
>> issues). But for .H, not so much. Certainly whoever wrote np.Matrix didn't
>> seem to feel bound by it.
>>
>> Note that for *matrix* transpose (as opposed to general axis reordering
>> with .tranpose()), I see far less use for what is returned being a writable
>> view. Indeed, for conjugate transpose, I would rather never allow writing
>> back even if it we had the conjugate dtype since one would surely get it
>> wrong (likely, `.conj()` would also return a read-only view, at least by
>> default; perhaps one should even go as far as only allowing
>> `a.view(conjugate-dtype)` as they way to get a writable view).
>>
>>
>>> So, specific items to confirm:
>>>
>>> 1) Is this a worthy addition? (certainly, their existence would reduce
>>> confusion about `.T`... so far, my sense is tentative yes)
>>>
>>> 2) Are `.mT` and `.mH` indeed the consensus? [1]
>>>
>>
>> > I think `H` would be good to revisit *if* it can be made to return a
>> view. I think a tweak on `T` for >2-D input does not meet the bar for
>> inclusion.
>>
>> Well, I guess it is obvious I disagree: I think this more than meets the
>> bar for inclusion. To me, this certainly is a much bigger deal that
>> something like oindex or vindex (which I do like).
>>
>
> Honestly, I don't really want to be arguing against this (or even be
> forced to spend time following along here). My main problem with this
> proposal right now is that we've had this discussion multiple times, and it
> was rejected with solid arguments after taking up a lot of time. Restarting
> that discussion from scratch without considering the history feels wrong.
> It's like a democracy voting on becoming a dictatorship repeatedly: you can
> have a "no" vote several times, but if you rerun the vote often enough at
> some point you'll get a "yes", and then it's a done deal.
>
> I think this requires a serious write-up, as either a NEP or a GitHub
> issue with a good set of cross-links and addressing all previous arguments.
>
>
>> Indeed, it would seem to me that if a visually more convenient way to do
>> (stacks of) matrix multiplication for numpy is good enough to warrant
>> changing the python syntax, then surely having a visually more convenient
>> standard way to do matrix transpose should not be considered off-limits for
>> ndarray; how often do you see a series matrix manipulations that does not
>> involve both multiplication and transpose?
>>
>> It certainly doesn't seem to me much of an argument that someone
>> previously decided to use .T for a shortcut for the computer scientist idea
>> of transpose to not allow the mathematical/physical-scientist one - one I
>> would argue is guaranteed to be used much more.
>>
>> The latter of course just repeats what many others have written above,
>> but since given that you call it a "tweak", perhaps it is worth backing up.
>> For astropy, a quick grep gives:
>>
>> - 28 uses of the matrix_transpose function I wrote because numpy doesn't
>> have even a simple function for that and the people who wrote the original
>> code used the Matrix class which had the proper .T (but doesn't extend to
>> multiple dimensions; we might still be using it otherwise).
>>
>
> A utility function in scipy.linalg would be a more low-API-impact approach
> to addressing this.
>
>
>> - 11 uses of .T,  all of which seem to be on 2-D arrays and are certainly
>> used as if they were matrix transpose (most are for fitting). Certainly,
>> all of these are bugs lying in waiting if the arrays ever get to be >2-D.
>>
>
> Most linalg is 2-D, that's why numpy.matrix and scipy.sparse matrices are
> 2-D only. If it's a real worry for those 11 cases, you could just add some
> comments or tests that prevent introducing bugs.
>
> More importantly, your assumption that >2-D arrays are "stacks of
> matrices" and that other usage is for "computer scientists" is arguably
> incorrect. There are many use cases for 3-D and higher-dimensional arrays
> that are not just "vectorized matrix math". As a physicist, I've done lots
> of work with 3-D and 4-D grids for everything from quantum physics to
> engineering problems in semiconductor equipment. NumPy is great for that,
> and I've never needed >=3-D linalg for any of it (and transposing is
> useful). So please don't claim the physicial-scientist view for this:)
>
> Cheers,
> Ralf
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/8c3f8300/attachment-0001.html>

From deak.andris at gmail.com  Wed Jun 26 04:49:23 2019
From: deak.andris at gmail.com (Andras Deak)
Date: Wed, 26 Jun 2019 10:49:23 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEBuzr-jd5G=TrS86jSi2KX24Y9WvAxPD4S01D4njdTKmOy=oA@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
 <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>
 <CAJNV+9tScf9BOZjHBd3F_aKeFZxOY+RQc1OMj7Tkamxbp-YhCQ@mail.gmail.com>
 <CABL7CQjTEkLW8+0sVvkfOk9oErQu2sHrV_z_fpLJj3+hatqLCQ@mail.gmail.com>
 <CAEBuzr-jd5G=TrS86jSi2KX24Y9WvAxPD4S01D4njdTKmOy=oA@mail.gmail.com>
Message-ID: <CAMEWA4ON4=de_G8E3okedgunM23O+L7LLqAueMgx8xGFCc06+A@mail.gmail.com>

Dear Ilhan,

Thanks for writing these up.
I feel that from a usability standpoint most people would support #3
(.H/.mH), especially considering Marten's very good argument about @.
Having to wrap your transposed matrices in function calls half defeats
the purpose of being able to write stacked matrix operations elegantly
within the ndarray class. The question is of course whether it's
feasible from a project management/API design stand point (just to
state the obvious).
Regarding #1 (1d transpose): I just want to make it clear as someone
who switched from MATLAB to python (and couldn't be happier) that we
should treat MATLAB's behaviour as more of a cautionary tale rather
than design ethos. I paused for exactly 5 seconds the first time I ran
into the no-op of 1d transposes, and then I thought "yes, this makes
sense", and that was it. To put it differently, I think it's more
about MATLAB injecting false assumptions into users than about numpy
behaving surprisingly. (On a side note, MATLAB's quirks are one of the
reasons that the Spyder IDE, designed to be a MATLAB replacement, has
very weird quirks that regularly trip up python users.)
Regards,

Andr?s

On Wed, Jun 26, 2019 at 9:04 AM Ilhan Polat <ilhanpolat at gmail.com> wrote:
>
> Maybe a bit of a grouping would help, because I am also losing track here. Let's see if I could manage to get something sensible because, just like Marten mentioned, I am confusing myself even when I am thinking about this
>
> 1- Transpose operation on 1D arrays:
>     This is a well-known confusion point for anyone that arrives at NumPy usage from, say matlab background or any linear algebra based user. Andras mentioned already that this is a subset of NumPy users so we have to be careful about the user assumptions. 1D arrays are computational constructs and mathematically they don't exist and this is the basis that matlab enforced since day 1. Any numerical object is an at least 2D array including scalars hence transposition flips the dimensions even for a col vector or row vector. That doesn't mean we cannot change it or we need to follow matlab but this is kind of what anybody kinda sorta wouda expect. For some historical reason, on numpy side transposition on 1D arrays did nothing since they have single dimensions. Hence you have to create a 2D vector for transpose from the get go to match the linear algebra intuition. Points that has been discussed so far are about whether we should go further and even intercept this behavior such that 1D transpose gives errors or warnings as opposed to the current behavior of silent no-op. as far as I can tell, we have a consensus that this behavior is here to stay for the foreseeable future.
>
> 2- Using transpose to reshape the (complex) array or flip its dimensions
>     This is a usage that has been mentioned above that I don't know much about. I usually go the "reshape() et al." way for this but apparently folks use it to flip dimensions and they don't want the automatically conjugation which is exactly the opposite of a linear algebra oriented user is used to have as an adjoint operator. Therefore points that have been discussed about are whether to inject conjugation into .T behavior of complex arrays or not. If not can we have an extra .H or something that specifically does .conj().T together (or .T.conj() order doesn't matter). The main feel (that I got so far) is that we shouldn't touch the current way and hopefully bring in another attribute.
>
> 3- Having a shorthand notation such as .H or .mH etc.
>     If the previous assertion is true then the issue becomes what should be the new name of the attribute and how can it have the nice properties of a transpose such as returning a view etc. However this has been proposed and rejected before e.g., GH-8882 and GH-13797. There is a catch here though, because if the alternative is .conj().T then it doesn't matter whether it copies or not because .conj().T doesn't return a view either and therefore the user receives a new array anyways. Therefore no benefits lost. Since the idea is to have a shorthand notation, it seems to me that this point is artificial in that sense and not necessarily a valid argument for rejection. But from the reluctance of Ralf I feel like there is a historical wear-out on this subject.
>
> 4- transpose of 3+D arrays
>     I think we missed the bus on this one for changing the default behavior now and there are glimpses of confirmation of this above in the previous mails. I would suggest discussing this separately.
>
> So if you are not already worn out and not feeling sour about it, I would like to propose the discussion of item 3 opened once again. Because the need is real and we don't need to get choked on the implementation details right away.
>
> Disclaimer: I do applied math so I have a natural bias towards the linalg-y way of doing things. And sorry about that if I did that above, sometimes typing quickly loses the intention.
>
>
> Best,
> ilhan
>
>
> On Wed, Jun 26, 2019 at 4:39 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>>
>>
>>
>> On Wed, Jun 26, 2019 at 3:56 AM Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> wrote:
>>>
>>> Hi Ralf,
>>>
>>> On Tue, Jun 25, 2019 at 6:31 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Jun 25, 2019 at 11:02 PM Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> wrote:
>>>>>
>>>>>
>>>>> For the names, my suggestion of lower-casing the M in the initial one, i.e., `.mT` and `.mH`, so far seemed most supported (and I think we should discuss *assuming* those would eventually involve not copying data; let's not worry about implementation details).
>>>>
>>>>
>>>> For the record, this is not an implementation detail. It was the consensus before that `H` is a bad idea unless it returns a view just like `T`: https://github.com/numpy/numpy/issues/8882
>>>
>>>
>>> Is there more than an issue in which Nathaniel rejecting it mentioning some previous consensus?
>>
>>
>> Yes, this has been discussed in lots of detail before, also on this list (as Nathaniel mentioned in the issue). I spent 10 minutes to try and find it but that wasn't enough. I do think it's not necessarily my responsibility though to dig up all the history here - that should be on the proposers of a new feature ....
>>
>>> I was part of the discussion of the complex conjugate dtype, but do not recall any consensus beyond a "wish to keep properties simple". Certainly the "property does not do any calculation" rule seems arbitrary; the only strict rule I would apply myself is that the computation should not be able to fail (computationally, out-of-memory does not count; that's like large integer overflow). So,  I'd definitely agree with you if we were discussion a property `.I` for matrix inverse (and indeed have said so in related issues). But for .H, not so much. Certainly whoever wrote np.Matrix didn't seem to feel bound by it.
>>>
>>> Note that for *matrix* transpose (as opposed to general axis reordering with .tranpose()), I see far less use for what is returned being a writable view. Indeed, for conjugate transpose, I would rather never allow writing back even if it we had the conjugate dtype since one would surely get it wrong (likely, `.conj()` would also return a read-only view, at least by default; perhaps one should even go as far as only allowing `a.view(conjugate-dtype)` as they way to get a writable view).
>>>
>>>>
>>>> So, specific items to confirm:
>>>>
>>>> 1) Is this a worthy addition? (certainly, their existence would reduce confusion about `.T`... so far, my sense is tentative yes)
>>>>
>>>> 2) Are `.mT` and `.mH` indeed the consensus? [1]
>>>
>>>
>>> > I think `H` would be good to revisit *if* it can be made to return a view. I think a tweak on `T` for >2-D input does not meet the bar for inclusion.
>>>
>>> Well, I guess it is obvious I disagree: I think this more than meets the bar for inclusion. To me, this certainly is a much bigger deal that something like oindex or vindex (which I do like).
>>
>>
>> Honestly, I don't really want to be arguing against this (or even be forced to spend time following along here). My main problem with this proposal right now is that we've had this discussion multiple times, and it was rejected with solid arguments after taking up a lot of time. Restarting that discussion from scratch without considering the history feels wrong. It's like a democracy voting on becoming a dictatorship repeatedly: you can have a "no" vote several times, but if you rerun the vote often enough at some point you'll get a "yes", and then it's a done deal.
>>
>> I think this requires a serious write-up, as either a NEP or a GitHub issue with a good set of cross-links and addressing all previous arguments.
>>
>>>
>>> Indeed, it would seem to me that if a visually more convenient way to do (stacks of) matrix multiplication for numpy is good enough to warrant changing the python syntax, then surely having a visually more convenient standard way to do matrix transpose should not be considered off-limits for ndarray; how often do you see a series matrix manipulations that does not involve both multiplication and transpose?
>>>
>>> It certainly doesn't seem to me much of an argument that someone previously decided to use .T for a shortcut for the computer scientist idea of transpose to not allow the mathematical/physical-scientist one - one I would argue is guaranteed to be used much more.
>>>
>>> The latter of course just repeats what many others have written above, but since given that you call it a "tweak", perhaps it is worth backing up. For astropy, a quick grep gives:
>>>
>>> - 28 uses of the matrix_transpose function I wrote because numpy doesn't have even a simple function for that and the people who wrote the original code used the Matrix class which had the proper .T (but doesn't extend to multiple dimensions; we might still be using it otherwise).
>>
>>
>> A utility function in scipy.linalg would be a more low-API-impact approach to addressing this.
>>
>>>
>>> - 11 uses of .T,  all of which seem to be on 2-D arrays and are certainly used as if they were matrix transpose (most are for fitting). Certainly, all of these are bugs lying in waiting if the arrays ever get to be >2-D.
>>
>>
>> Most linalg is 2-D, that's why numpy.matrix and scipy.sparse matrices are 2-D only. If it's a real worry for those 11 cases, you could just add some comments or tests that prevent introducing bugs.
>>
>> More importantly, your assumption that >2-D arrays are "stacks of matrices" and that other usage is for "computer scientists" is arguably incorrect. There are many use cases for 3-D and higher-dimensional arrays that are not just "vectorized matrix math". As a physicist, I've done lots of work with 3-D and 4-D grids for everything from quantum physics to engineering problems in semiconductor equipment. NumPy is great for that, and I've never needed >=3-D linalg for any of it (and transposing is useful). So please don't claim the physicial-scientist view for this:)
>>
>> Cheers,
>> Ralf
>>
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From cameronjblocker at gmail.com  Wed Jun 26 12:32:04 2019
From: cameronjblocker at gmail.com (Cameron Blocker)
Date: Wed, 26 Jun 2019 12:32:04 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAMEWA4ON4=de_G8E3okedgunM23O+L7LLqAueMgx8xGFCc06+A@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
 <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>
 <CAJNV+9tScf9BOZjHBd3F_aKeFZxOY+RQc1OMj7Tkamxbp-YhCQ@mail.gmail.com>
 <CABL7CQjTEkLW8+0sVvkfOk9oErQu2sHrV_z_fpLJj3+hatqLCQ@mail.gmail.com>
 <CAEBuzr-jd5G=TrS86jSi2KX24Y9WvAxPD4S01D4njdTKmOy=oA@mail.gmail.com>
 <CAMEWA4ON4=de_G8E3okedgunM23O+L7LLqAueMgx8xGFCc06+A@mail.gmail.com>
Message-ID: <CAMFejUBpu4ystrqZBsviCiVGJv_FRXEHphX9yh7Jvi14aWduzQ@mail.gmail.com>

A previous discussion of adding a .H operator on the mailing list can be
found here:
http://numpy-discussion.10968.n7.nabble.com/add-H-attribute-td34474.html
that thread refers to an earlier discussion at
http://thread.gmane.org/gmane.comp.python.numeric.general/6637
but that link was broken for me at least, but Ralf summarized it as "No
strong arguments against and then several more votes in favor."

In summary, people seemed to like the idea of .H if it could return a view(
or iterator) like .T, and didn't want it to return a copy temporarily until
that could happen. A couple of people thought that .H was out of scope for
an array library.

This discussion also seems to be before the deprecation of np.Matrix had
started, so the demand was maybe less evident then?

Is what is stopping .H from happening just that no one has stepped up to
implement a conjugate view? If so, I am happy to contribute my time to
this. I commonly work with large complex arrays and would appreciate saving
the copy.


On Wed, Jun 26, 2019 at 4:50 AM Andras Deak <deak.andris at gmail.com> wrote:

> Dear Ilhan,
>
> Thanks for writing these up.
> I feel that from a usability standpoint most people would support #3
> (.H/.mH), especially considering Marten's very good argument about @.
> Having to wrap your transposed matrices in function calls half defeats
> the purpose of being able to write stacked matrix operations elegantly
> within the ndarray class. The question is of course whether it's
> feasible from a project management/API design stand point (just to
> state the obvious).
> Regarding #1 (1d transpose): I just want to make it clear as someone
> who switched from MATLAB to python (and couldn't be happier) that we
> should treat MATLAB's behaviour as more of a cautionary tale rather
> than design ethos. I paused for exactly 5 seconds the first time I ran
> into the no-op of 1d transposes, and then I thought "yes, this makes
> sense", and that was it. To put it differently, I think it's more
> about MATLAB injecting false assumptions into users than about numpy
> behaving surprisingly. (On a side note, MATLAB's quirks are one of the
> reasons that the Spyder IDE, designed to be a MATLAB replacement, has
> very weird quirks that regularly trip up python users.)
> Regards,
>
> Andr?s
>
> On Wed, Jun 26, 2019 at 9:04 AM Ilhan Polat <ilhanpolat at gmail.com> wrote:
> >
> > Maybe a bit of a grouping would help, because I am also losing track
> here. Let's see if I could manage to get something sensible because, just
> like Marten mentioned, I am confusing myself even when I am thinking about
> this
> >
> > 1- Transpose operation on 1D arrays:
> >     This is a well-known confusion point for anyone that arrives at
> NumPy usage from, say matlab background or any linear algebra based user.
> Andras mentioned already that this is a subset of NumPy users so we have to
> be careful about the user assumptions. 1D arrays are computational
> constructs and mathematically they don't exist and this is the basis that
> matlab enforced since day 1. Any numerical object is an at least 2D array
> including scalars hence transposition flips the dimensions even for a col
> vector or row vector. That doesn't mean we cannot change it or we need to
> follow matlab but this is kind of what anybody kinda sorta wouda expect.
> For some historical reason, on numpy side transposition on 1D arrays did
> nothing since they have single dimensions. Hence you have to create a 2D
> vector for transpose from the get go to match the linear algebra intuition.
> Points that has been discussed so far are about whether we should go
> further and even intercept this behavior such that 1D transpose gives
> errors or warnings as opposed to the current behavior of silent no-op. as
> far as I can tell, we have a consensus that this behavior is here to stay
> for the foreseeable future.
> >
> > 2- Using transpose to reshape the (complex) array or flip its dimensions
> >     This is a usage that has been mentioned above that I don't know much
> about. I usually go the "reshape() et al." way for this but apparently
> folks use it to flip dimensions and they don't want the automatically
> conjugation which is exactly the opposite of a linear algebra oriented user
> is used to have as an adjoint operator. Therefore points that have been
> discussed about are whether to inject conjugation into .T behavior of
> complex arrays or not. If not can we have an extra .H or something that
> specifically does .conj().T together (or .T.conj() order doesn't matter).
> The main feel (that I got so far) is that we shouldn't touch the current
> way and hopefully bring in another attribute.
> >
> > 3- Having a shorthand notation such as .H or .mH etc.
> >     If the previous assertion is true then the issue becomes what should
> be the new name of the attribute and how can it have the nice properties of
> a transpose such as returning a view etc. However this has been proposed
> and rejected before e.g., GH-8882 and GH-13797. There is a catch here
> though, because if the alternative is .conj().T then it doesn't matter
> whether it copies or not because .conj().T doesn't return a view either and
> therefore the user receives a new array anyways. Therefore no benefits
> lost. Since the idea is to have a shorthand notation, it seems to me that
> this point is artificial in that sense and not necessarily a valid argument
> for rejection. But from the reluctance of Ralf I feel like there is a
> historical wear-out on this subject.
> >
> > 4- transpose of 3+D arrays
> >     I think we missed the bus on this one for changing the default
> behavior now and there are glimpses of confirmation of this above in the
> previous mails. I would suggest discussing this separately.
> >
> > So if you are not already worn out and not feeling sour about it, I
> would like to propose the discussion of item 3 opened once again. Because
> the need is real and we don't need to get choked on the implementation
> details right away.
> >
> > Disclaimer: I do applied math so I have a natural bias towards the
> linalg-y way of doing things. And sorry about that if I did that above,
> sometimes typing quickly loses the intention.
> >
> >
> > Best,
> > ilhan
> >
> >
> > On Wed, Jun 26, 2019 at 4:39 AM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> >>
> >>
> >>
> >> On Wed, Jun 26, 2019 at 3:56 AM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
> >>>
> >>> Hi Ralf,
> >>>
> >>> On Tue, Jun 25, 2019 at 6:31 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Jun 25, 2019 at 11:02 PM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>> For the names, my suggestion of lower-casing the M in the initial
> one, i.e., `.mT` and `.mH`, so far seemed most supported (and I think we
> should discuss *assuming* those would eventually involve not copying data;
> let's not worry about implementation details).
> >>>>
> >>>>
> >>>> For the record, this is not an implementation detail. It was the
> consensus before that `H` is a bad idea unless it returns a view just like
> `T`: https://github.com/numpy/numpy/issues/8882
> >>>
> >>>
> >>> Is there more than an issue in which Nathaniel rejecting it mentioning
> some previous consensus?
> >>
> >>
> >> Yes, this has been discussed in lots of detail before, also on this
> list (as Nathaniel mentioned in the issue). I spent 10 minutes to try and
> find it but that wasn't enough. I do think it's not necessarily my
> responsibility though to dig up all the history here - that should be on
> the proposers of a new feature ....
> >>
> >>> I was part of the discussion of the complex conjugate dtype, but do
> not recall any consensus beyond a "wish to keep properties simple".
> Certainly the "property does not do any calculation" rule seems arbitrary;
> the only strict rule I would apply myself is that the computation should
> not be able to fail (computationally, out-of-memory does not count; that's
> like large integer overflow). So,  I'd definitely agree with you if we were
> discussion a property `.I` for matrix inverse (and indeed have said so in
> related issues). But for .H, not so much. Certainly whoever wrote np.Matrix
> didn't seem to feel bound by it.
> >>>
> >>> Note that for *matrix* transpose (as opposed to general axis
> reordering with .tranpose()), I see far less use for what is returned being
> a writable view. Indeed, for conjugate transpose, I would rather never
> allow writing back even if it we had the conjugate dtype since one would
> surely get it wrong (likely, `.conj()` would also return a read-only view,
> at least by default; perhaps one should even go as far as only allowing
> `a.view(conjugate-dtype)` as they way to get a writable view).
> >>>
> >>>>
> >>>> So, specific items to confirm:
> >>>>
> >>>> 1) Is this a worthy addition? (certainly, their existence would
> reduce confusion about `.T`... so far, my sense is tentative yes)
> >>>>
> >>>> 2) Are `.mT` and `.mH` indeed the consensus? [1]
> >>>
> >>>
> >>> > I think `H` would be good to revisit *if* it can be made to return a
> view. I think a tweak on `T` for >2-D input does not meet the bar for
> inclusion.
> >>>
> >>> Well, I guess it is obvious I disagree: I think this more than meets
> the bar for inclusion. To me, this certainly is a much bigger deal that
> something like oindex or vindex (which I do like).
> >>
> >>
> >> Honestly, I don't really want to be arguing against this (or even be
> forced to spend time following along here). My main problem with this
> proposal right now is that we've had this discussion multiple times, and it
> was rejected with solid arguments after taking up a lot of time. Restarting
> that discussion from scratch without considering the history feels wrong.
> It's like a democracy voting on becoming a dictatorship repeatedly: you can
> have a "no" vote several times, but if you rerun the vote often enough at
> some point you'll get a "yes", and then it's a done deal.
> >>
> >> I think this requires a serious write-up, as either a NEP or a GitHub
> issue with a good set of cross-links and addressing all previous arguments.
> >>
> >>>
> >>> Indeed, it would seem to me that if a visually more convenient way to
> do (stacks of) matrix multiplication for numpy is good enough to warrant
> changing the python syntax, then surely having a visually more convenient
> standard way to do matrix transpose should not be considered off-limits for
> ndarray; how often do you see a series matrix manipulations that does not
> involve both multiplication and transpose?
> >>>
> >>> It certainly doesn't seem to me much of an argument that someone
> previously decided to use .T for a shortcut for the computer scientist idea
> of transpose to not allow the mathematical/physical-scientist one - one I
> would argue is guaranteed to be used much more.
> >>>
> >>> The latter of course just repeats what many others have written above,
> but since given that you call it a "tweak", perhaps it is worth backing up.
> For astropy, a quick grep gives:
> >>>
> >>> - 28 uses of the matrix_transpose function I wrote because numpy
> doesn't have even a simple function for that and the people who wrote the
> original code used the Matrix class which had the proper .T (but doesn't
> extend to multiple dimensions; we might still be using it otherwise).
> >>
> >>
> >> A utility function in scipy.linalg would be a more low-API-impact
> approach to addressing this.
> >>
> >>>
> >>> - 11 uses of .T,  all of which seem to be on 2-D arrays and are
> certainly used as if they were matrix transpose (most are for fitting).
> Certainly, all of these are bugs lying in waiting if the arrays ever get to
> be >2-D.
> >>
> >>
> >> Most linalg is 2-D, that's why numpy.matrix and scipy.sparse matrices
> are 2-D only. If it's a real worry for those 11 cases, you could just add
> some comments or tests that prevent introducing bugs.
> >>
> >> More importantly, your assumption that >2-D arrays are "stacks of
> matrices" and that other usage is for "computer scientists" is arguably
> incorrect. There are many use cases for 3-D and higher-dimensional arrays
> that are not just "vectorized matrix math". As a physicist, I've done lots
> of work with 3-D and 4-D grids for everything from quantum physics to
> engineering problems in semiconductor equipment. NumPy is great for that,
> and I've never needed >=3-D linalg for any of it (and transposing is
> useful). So please don't claim the physicial-scientist view for this:)
> >>
> >> Cheers,
> >> Ralf
> >>
> >>
> >>
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion at python.org
> >> https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/83995bd0/attachment-0001.html>

From ralf.gommers at gmail.com  Wed Jun 26 13:00:20 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 26 Jun 2019 19:00:20 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAMFejUBpu4ystrqZBsviCiVGJv_FRXEHphX9yh7Jvi14aWduzQ@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
 <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>
 <CAJNV+9tScf9BOZjHBd3F_aKeFZxOY+RQc1OMj7Tkamxbp-YhCQ@mail.gmail.com>
 <CABL7CQjTEkLW8+0sVvkfOk9oErQu2sHrV_z_fpLJj3+hatqLCQ@mail.gmail.com>
 <CAEBuzr-jd5G=TrS86jSi2KX24Y9WvAxPD4S01D4njdTKmOy=oA@mail.gmail.com>
 <CAMEWA4ON4=de_G8E3okedgunM23O+L7LLqAueMgx8xGFCc06+A@mail.gmail.com>
 <CAMFejUBpu4ystrqZBsviCiVGJv_FRXEHphX9yh7Jvi14aWduzQ@mail.gmail.com>
Message-ID: <CABL7CQgHVdM53n0qZh1-qMx7dMUSPe0F=wRmNMnwzmeSRuty=g@mail.gmail.com>

On Wed, Jun 26, 2019 at 6:32 PM Cameron Blocker <cameronjblocker at gmail.com>
wrote:

> A previous discussion of adding a .H operator on the mailing list can be
> found here:
> http://numpy-discussion.10968.n7.nabble.com/add-H-attribute-td34474.html
> that thread refers to an earlier discussion at
> http://thread.gmane.org/gmane.comp.python.numeric.general/6637
> but that link was broken for me at least, but Ralf summarized it as "No
> strong arguments against and then several more votes in favor."
>

Thanks for digging up that history!

Summary is that it was indeed about copy/view. Travis, Dag Sverre and
Nathaniel all argued against .H with copy behavior.


> In summary, people seemed to like the idea of .H if it could return a
> view( or iterator) like .T, and didn't want it to return a copy temporarily
> until that could happen. A couple of people thought that .H was out of
> scope for an array library.
>
> This discussion also seems to be before the deprecation of np.Matrix had
> started, so the demand was maybe less evident then?
>

Probably not, that thread is from 5 years after it was clear that np.matrix
should not be used anymore.


> Is what is stopping .H from happening just that no one has stepped up to
> implement a conjugate view?
>

I think so, yes.

If so, I am happy to contribute my time to this. I commonly work with large
> complex arrays and would appreciate saving the copy.
>

Thanks, that would be really welcome.

If that doesn't work out, the alternative proposed by Eric yesterday to
write a better new matrix object (
https://github.com/numpy/numpy/issues/13835) is probably the way to go. Or
it may be preferred anyway / as well, because you can add more niceties
like row/column vectors and enforcing >= 2-D while still not causing
problems like np.matrix did by changing semantics of operators or indexing.


>
> On Wed, Jun 26, 2019 at 4:50 AM Andras Deak <deak.andris at gmail.com> wrote:
>
>> Dear Ilhan,
>>
>> Thanks for writing these up.
>> I feel that from a usability standpoint most people would support #3
>> (.H/.mH), especially considering Marten's very good argument about @.
>>
>
The main motivation for the @ PEP was actually to be able to get rid of
objects like np.matrix and scipy.sparse matrices that redefine the meaning
of the * operator. Quote: "This PEP proposes the minimum effective change
to Python syntax that will allow us to drain this swamp [meaning np.matrix
& co]."

Notably, the @ PEP was written by Nathaniel, who was opposed to a copying
.H.

Cheers,
Ralf


Having to wrap your transposed matrices in function calls half defeats
>> the purpose of being able to write stacked matrix operations elegantly
>> within the ndarray class. The question is of course whether it's
>> feasible from a project management/API design stand point (just to
>> state the obvious).
>> Regarding #1 (1d transpose): I just want to make it clear as someone
>> who switched from MATLAB to python (and couldn't be happier) that we
>> should treat MATLAB's behaviour as more of a cautionary tale rather
>> than design ethos. I paused for exactly 5 seconds the first time I ran
>> into the no-op of 1d transposes, and then I thought "yes, this makes
>> sense", and that was it. To put it differently, I think it's more
>> about MATLAB injecting false assumptions into users than about numpy
>> behaving surprisingly. (On a side note, MATLAB's quirks are one of the
>> reasons that the Spyder IDE, designed to be a MATLAB replacement, has
>> very weird quirks that regularly trip up python users.)
>> Regards,
>>
>> Andr?s
>>
>> On Wed, Jun 26, 2019 at 9:04 AM Ilhan Polat <ilhanpolat at gmail.com> wrote:
>> >
>> > Maybe a bit of a grouping would help, because I am also losing track
>> here. Let's see if I could manage to get something sensible because, just
>> like Marten mentioned, I am confusing myself even when I am thinking about
>> this
>> >
>> > 1- Transpose operation on 1D arrays:
>> >     This is a well-known confusion point for anyone that arrives at
>> NumPy usage from, say matlab background or any linear algebra based user.
>> Andras mentioned already that this is a subset of NumPy users so we have to
>> be careful about the user assumptions. 1D arrays are computational
>> constructs and mathematically they don't exist and this is the basis that
>> matlab enforced since day 1. Any numerical object is an at least 2D array
>> including scalars hence transposition flips the dimensions even for a col
>> vector or row vector. That doesn't mean we cannot change it or we need to
>> follow matlab but this is kind of what anybody kinda sorta wouda expect.
>> For some historical reason, on numpy side transposition on 1D arrays did
>> nothing since they have single dimensions. Hence you have to create a 2D
>> vector for transpose from the get go to match the linear algebra intuition.
>> Points that has been discussed so far are about whether we should go
>> further and even intercept this behavior such that 1D transpose gives
>> errors or warnings as opposed to the current behavior of silent no-op. as
>> far as I can tell, we have a consensus that this behavior is here to stay
>> for the foreseeable future.
>> >
>> > 2- Using transpose to reshape the (complex) array or flip its dimensions
>> >     This is a usage that has been mentioned above that I don't know
>> much about. I usually go the "reshape() et al." way for this but apparently
>> folks use it to flip dimensions and they don't want the automatically
>> conjugation which is exactly the opposite of a linear algebra oriented user
>> is used to have as an adjoint operator. Therefore points that have been
>> discussed about are whether to inject conjugation into .T behavior of
>> complex arrays or not. If not can we have an extra .H or something that
>> specifically does .conj().T together (or .T.conj() order doesn't matter).
>> The main feel (that I got so far) is that we shouldn't touch the current
>> way and hopefully bring in another attribute.
>> >
>> > 3- Having a shorthand notation such as .H or .mH etc.
>> >     If the previous assertion is true then the issue becomes what
>> should be the new name of the attribute and how can it have the nice
>> properties of a transpose such as returning a view etc. However this has
>> been proposed and rejected before e.g., GH-8882 and GH-13797. There is a
>> catch here though, because if the alternative is .conj().T then it doesn't
>> matter whether it copies or not because .conj().T doesn't return a view
>> either and therefore the user receives a new array anyways. Therefore no
>> benefits lost. Since the idea is to have a shorthand notation, it seems to
>> me that this point is artificial in that sense and not necessarily a valid
>> argument for rejection. But from the reluctance of Ralf I feel like there
>> is a historical wear-out on this subject.
>> >
>> > 4- transpose of 3+D arrays
>> >     I think we missed the bus on this one for changing the default
>> behavior now and there are glimpses of confirmation of this above in the
>> previous mails. I would suggest discussing this separately.
>> >
>> > So if you are not already worn out and not feeling sour about it, I
>> would like to propose the discussion of item 3 opened once again. Because
>> the need is real and we don't need to get choked on the implementation
>> details right away.
>> >
>> > Disclaimer: I do applied math so I have a natural bias towards the
>> linalg-y way of doing things. And sorry about that if I did that above,
>> sometimes typing quickly loses the intention.
>> >
>> >
>> > Best,
>> > ilhan
>> >
>> >
>> > On Wed, Jun 26, 2019 at 4:39 AM Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>> >>
>> >>
>> >>
>> >> On Wed, Jun 26, 2019 at 3:56 AM Marten van Kerkwijk <
>> m.h.vankerkwijk at gmail.com> wrote:
>> >>>
>> >>> Hi Ralf,
>> >>>
>> >>> On Tue, Jun 25, 2019 at 6:31 PM Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Jun 25, 2019 at 11:02 PM Marten van Kerkwijk <
>> m.h.vankerkwijk at gmail.com> wrote:
>> >>>>>
>> >>>>>
>> >>>>> For the names, my suggestion of lower-casing the M in the initial
>> one, i.e., `.mT` and `.mH`, so far seemed most supported (and I think we
>> should discuss *assuming* those would eventually involve not copying data;
>> let's not worry about implementation details).
>> >>>>
>> >>>>
>> >>>> For the record, this is not an implementation detail. It was the
>> consensus before that `H` is a bad idea unless it returns a view just like
>> `T`: https://github.com/numpy/numpy/issues/8882
>> >>>
>> >>>
>> >>> Is there more than an issue in which Nathaniel rejecting it
>> mentioning some previous consensus?
>> >>
>> >>
>> >> Yes, this has been discussed in lots of detail before, also on this
>> list (as Nathaniel mentioned in the issue). I spent 10 minutes to try and
>> find it but that wasn't enough. I do think it's not necessarily my
>> responsibility though to dig up all the history here - that should be on
>> the proposers of a new feature ....
>> >>
>> >>> I was part of the discussion of the complex conjugate dtype, but do
>> not recall any consensus beyond a "wish to keep properties simple".
>> Certainly the "property does not do any calculation" rule seems arbitrary;
>> the only strict rule I would apply myself is that the computation should
>> not be able to fail (computationally, out-of-memory does not count; that's
>> like large integer overflow). So,  I'd definitely agree with you if we were
>> discussion a property `.I` for matrix inverse (and indeed have said so in
>> related issues). But for .H, not so much. Certainly whoever wrote np.Matrix
>> didn't seem to feel bound by it.
>> >>>
>> >>> Note that for *matrix* transpose (as opposed to general axis
>> reordering with .tranpose()), I see far less use for what is returned being
>> a writable view. Indeed, for conjugate transpose, I would rather never
>> allow writing back even if it we had the conjugate dtype since one would
>> surely get it wrong (likely, `.conj()` would also return a read-only view,
>> at least by default; perhaps one should even go as far as only allowing
>> `a.view(conjugate-dtype)` as they way to get a writable view).
>> >>>
>> >>>>
>> >>>> So, specific items to confirm:
>> >>>>
>> >>>> 1) Is this a worthy addition? (certainly, their existence would
>> reduce confusion about `.T`... so far, my sense is tentative yes)
>> >>>>
>> >>>> 2) Are `.mT` and `.mH` indeed the consensus? [1]
>> >>>
>> >>>
>> >>> > I think `H` would be good to revisit *if* it can be made to return
>> a view. I think a tweak on `T` for >2-D input does not meet the bar for
>> inclusion.
>> >>>
>> >>> Well, I guess it is obvious I disagree: I think this more than meets
>> the bar for inclusion. To me, this certainly is a much bigger deal that
>> something like oindex or vindex (which I do like).
>> >>
>> >>
>> >> Honestly, I don't really want to be arguing against this (or even be
>> forced to spend time following along here). My main problem with this
>> proposal right now is that we've had this discussion multiple times, and it
>> was rejected with solid arguments after taking up a lot of time. Restarting
>> that discussion from scratch without considering the history feels wrong.
>> It's like a democracy voting on becoming a dictatorship repeatedly: you can
>> have a "no" vote several times, but if you rerun the vote often enough at
>> some point you'll get a "yes", and then it's a done deal.
>> >>
>> >> I think this requires a serious write-up, as either a NEP or a GitHub
>> issue with a good set of cross-links and addressing all previous arguments.
>> >>
>> >>>
>> >>> Indeed, it would seem to me that if a visually more convenient way to
>> do (stacks of) matrix multiplication for numpy is good enough to warrant
>> changing the python syntax, then surely having a visually more convenient
>> standard way to do matrix transpose should not be considered off-limits for
>> ndarray; how often do you see a series matrix manipulations that does not
>> involve both multiplication and transpose?
>> >>>
>> >>> It certainly doesn't seem to me much of an argument that someone
>> previously decided to use .T for a shortcut for the computer scientist idea
>> of transpose to not allow the mathematical/physical-scientist one - one I
>> would argue is guaranteed to be used much more.
>> >>>
>> >>> The latter of course just repeats what many others have written
>> above, but since given that you call it a "tweak", perhaps it is worth
>> backing up. For astropy, a quick grep gives:
>> >>>
>> >>> - 28 uses of the matrix_transpose function I wrote because numpy
>> doesn't have even a simple function for that and the people who wrote the
>> original code used the Matrix class which had the proper .T (but doesn't
>> extend to multiple dimensions; we might still be using it otherwise).
>> >>
>> >>
>> >> A utility function in scipy.linalg would be a more low-API-impact
>> approach to addressing this.
>> >>
>> >>>
>> >>> - 11 uses of .T,  all of which seem to be on 2-D arrays and are
>> certainly used as if they were matrix transpose (most are for fitting).
>> Certainly, all of these are bugs lying in waiting if the arrays ever get to
>> be >2-D.
>> >>
>> >>
>> >> Most linalg is 2-D, that's why numpy.matrix and scipy.sparse matrices
>> are 2-D only. If it's a real worry for those 11 cases, you could just add
>> some comments or tests that prevent introducing bugs.
>> >>
>> >> More importantly, your assumption that >2-D arrays are "stacks of
>> matrices" and that other usage is for "computer scientists" is arguably
>> incorrect. There are many use cases for 3-D and higher-dimensional arrays
>> that are not just "vectorized matrix math". As a physicist, I've done lots
>> of work with 3-D and 4-D grids for everything from quantum physics to
>> engineering problems in semiconductor equipment. NumPy is great for that,
>> and I've never needed >=3-D linalg for any of it (and transposing is
>> useful). So please don't claim the physicial-scientist view for this:)
>> >>
>> >> Cheers,
>> >> Ralf
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion at python.org
>> >> https://mail.python.org/mailman/listinfo/numpy-discussion
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at python.org
>> > https://mail.python.org/mailman/listinfo/numpy-discussion
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/7ced3332/attachment-0001.html>

From kirillbalunov at gmail.com  Wed Jun 26 16:03:34 2019
From: kirillbalunov at gmail.com (Kirill Balunov)
Date: Wed, 26 Jun 2019 23:03:34 +0300
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>
Message-ID: <CABwCVQBudczfJfeccrijzJMwvhZWQoh5ni1uFTuY2VokymWbqg@mail.gmail.com>

 Only concerns #4 from Ilhan's list.

??, 26 ???. 2019 ?. ? 00:01, Ralf Gommers <ralf.gommers at gmail.com>:

>
> [....]
>
> Perhaps not full consensus between the many people with different opinions
> and interests. But for the first one, arr.T change: it's clear that this
> won't happen.
>

To begin with, I must admit that I am not familiar with the accepted policy
of introducing changes to NumPy. But I find it quite nonconstructive just
to say - it will not happen. What then is the point in the discussion?


> Between Juan's examples of valid use, and what Stephan and Matthew said,
> there's not much more to add. We're not going to change correct code for
> minor benefits.
>

I fully agree that any feature can find its use, valid or not is another
question. Juan did not present these examples, but I will allow myself to
assume that it is more correct to describe what is being done there as a
permutation, and not a transpose. In addition, in the very next sentence,
Juan adds that "These could be easily changed to .transpose() (honestly
they probably should!)"

We're not going to change correct code for minor benefits.
>

It's fair, I personally have no preferences in both cases, the most
important thing for me is that in the 2d case it works correctly. To be
honest, until today, I thought that `.T` will raise for` ndim > 2`. At
least that's what my experience told me. For example in

    Matlab - Error using  .' Transpose on ND array is not defined. Use
PERMUTE instead.

    Julia - transpose not defined for Array(Float64, 3). Consider using
permutedims for higher-dimensional arrays.

    Sympy - raise ValueError("array rank not 2")

Here, I agree with the authors that, to begin with, `transpose` is not the
best name, since in general it doesn?t fit as an any mathematical
definition (of course it will depend on what we take as an element) or a
definition from linear algebra. Thus the name `transpose` only leads to
confusion.

For a note about another suggestion - `.T` to mean a transpose of the last
two dimensions, in Mathematica authors for some reason did the
opposite (personally,
I could not understand why they made such a choice :) ):

    Transpose[list]
        transposes the first two levels in list.

    I feel strongly that we should have the following policy:
>
>     * Under no circumstances should we make changes that mean that correct
>     old code will give different results with new Numpy.
>

I find this overly strict rules that do not allow to evolve. I completely
agree that a silent change in behavior is a disaster, that changing
behavior (if it is not an error) in the same minor version (1.X.Y) is not
acceptable, but I see no reason to extend this rule for a major version
bump (2.A.B.),  especially if it allows something to improve.

I would see such a rough version of a roadmap of change (I foresee my
loneliness in this :)) Also considering this comment

    Personally I would find any divergence between a.T and a.transpose()
>     to be rather surprising.
>

it will be as follows:

1. in 1.18 add the `.permute` method to the array, with the same semantics
as `.transpose`.
2. Starting from 1.18, emit  `FutureWarning`, ` DeprectationWarning` for
`.transpose` and advise replacing it with `.permute`.
3. Starting from 1.18 for `.T` with` ndim> 2`, emit a `FutureWarning`, with
a note that in future versions the behavior will change.
4. In version 2, remove the `.transpose` and change the behavior for `.T`.

Regarding `.T` with` ndim> 2` - I don?t have preferences between error or
transpose of the last two dimensions.

with kind regards,
-gdg
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/4f79c674/attachment.html>

From ralf.gommers at gmail.com  Wed Jun 26 16:18:05 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 26 Jun 2019 22:18:05 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CABwCVQBudczfJfeccrijzJMwvhZWQoh5ni1uFTuY2VokymWbqg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>
 <CABwCVQBudczfJfeccrijzJMwvhZWQoh5ni1uFTuY2VokymWbqg@mail.gmail.com>
Message-ID: <CABL7CQg5jDsUYz6zvwZBRSv75MQBdBaAtW1zx+q0jhoqZvNxJg@mail.gmail.com>

On Wed, Jun 26, 2019 at 10:04 PM Kirill Balunov <kirillbalunov at gmail.com>
wrote:

> Only concerns #4 from Ilhan's list.
>
> ??, 26 ???. 2019 ?. ? 00:01, Ralf Gommers <ralf.gommers at gmail.com>:
>
>>
>> [....]
>>
>> Perhaps not full consensus between the many people with different
>> opinions and interests. But for the first one, arr.T change: it's clear
>> that this won't happen.
>>
>
> To begin with, I must admit that I am not familiar with the accepted
> policy of introducing changes to NumPy. But I find it quite
> nonconstructive just to say - it will not happen. What then is the point
> in the discussion?
>

There has been a *very* long discussion already, and several others on the
same topic before. There are also long-standing ways of dealing with
backwards compatibility - e.g. what Matthew said is not new, it's an agreed
upon way of working.
http://www.numpy.org/neps/nep-0023-backwards-compatibility.html lists some
principles. That NEP is not yet accepted (it needs rework), but it gives a
good idea of what does and does not go.


>
>
>> Between Juan's examples of valid use, and what Stephan and Matthew said,
>> there's not much more to add. We're not going to change correct code for
>> minor benefits.
>>
>
> I fully agree that any feature can find its use, valid or not is another
> question. Juan did not present these examples, but I will allow myself to
> assume that it is more correct to describe what is being done there as a
> permutation, and not a transpose. In addition, in the very next sentence,
> Juan adds that "These could be easily changed to .transpose() (honestly
> they probably should!)"
>
> We're not going to change correct code for minor benefits.
>>
>
> It's fair, I personally have no preferences in both cases, the most
> important thing for me is that in the 2d case it works correctly. To be
> honest, until today, I thought that `.T` will raise for` ndim > 2`. At
> least that's what my experience told me. For example in
>
>     Matlab - Error using  .' Transpose on ND array is not defined. Use
> PERMUTE instead.
>
>     Julia - transpose not defined for Array(Float64, 3). Consider using
> permutedims for higher-dimensional arrays.
>
>     Sympy - raise ValueError("array rank not 2")
>
> Here, I agree with the authors that, to begin with, `transpose` is not the
> best name, since in general it doesn?t fit as an any mathematical
> definition (of course it will depend on what we take as an element) or a
> definition from linear algebra. Thus the name `transpose` only leads to
> confusion.
>
> For a note about another suggestion - `.T` to mean a transpose of the last
> two dimensions, in Mathematica authors for some reason did the opposite (personally,
> I could not understand why they made such a choice :) ):
>
>     Transpose[list]
>         transposes the first two levels in list.
>
>     I feel strongly that we should have the following policy:
>>
>>     * Under no circumstances should we make changes that mean that correct
>>     old code will give different results with new Numpy.
>>
>
> I find this overly strict rules that do not allow to evolve. I completely
> agree that a silent change in behavior is a disaster, that changing
> behavior (if it is not an error) in the same minor version (1.X.Y) is not
> acceptable, but I see no reason to extend this rule for a major version
> bump (2.A.B.),  especially if it allows something to improve.
>

I'm sorry, you'll have to live with this rule. We've had lots of discussion
about this rule in many concrete cases. When existing code is buggy or is
consistently confusing many users, we can discuss. But in general changing
old code to do something else is a terrible idea.


> I would see such a rough version of a roadmap of change (I foresee my
> loneliness in this :)) Also considering this comment
>
>     Personally I would find any divergence between a.T and a.transpose()
>>     to be rather surprising.
>>
>
> it will be as follows:
>
> 1. in 1.18 add the `.permute` method to the array, with the same semantics
> as `.transpose`.
> 2. Starting from 1.18, emit  `FutureWarning`, ` DeprectationWarning` for
> `.transpose` and advise replacing it with `.permute`.
> 3. Starting from 1.18 for `.T` with` ndim> 2`, emit a `FutureWarning`,
> with a note that in future versions the behavior will change.
> 4. In version 2, remove the `.transpose` and change the behavior for `.T`.
>

This is simply not enough. Many users will skip versions when upgrading.
There must be an exceptionally good reason to change numerical results, and
this simply is not one.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/a2cb49e3/attachment-0001.html>

From m.h.vankerkwijk at gmail.com  Wed Jun 26 16:24:14 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 26 Jun 2019 16:24:14 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CABL7CQgHVdM53n0qZh1-qMx7dMUSPe0F=wRmNMnwzmeSRuty=g@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
 <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>
 <CAJNV+9tScf9BOZjHBd3F_aKeFZxOY+RQc1OMj7Tkamxbp-YhCQ@mail.gmail.com>
 <CABL7CQjTEkLW8+0sVvkfOk9oErQu2sHrV_z_fpLJj3+hatqLCQ@mail.gmail.com>
 <CAEBuzr-jd5G=TrS86jSi2KX24Y9WvAxPD4S01D4njdTKmOy=oA@mail.gmail.com>
 <CAMEWA4ON4=de_G8E3okedgunM23O+L7LLqAueMgx8xGFCc06+A@mail.gmail.com>
 <CAMFejUBpu4ystrqZBsviCiVGJv_FRXEHphX9yh7Jvi14aWduzQ@mail.gmail.com>
 <CABL7CQgHVdM53n0qZh1-qMx7dMUSPe0F=wRmNMnwzmeSRuty=g@mail.gmail.com>
Message-ID: <CAJNV+9u13Bai=t4RmYqLRHUdPnmPgPtKpu3N_ZGVep0xXZytfw@mail.gmail.com>

> The main motivation for the @ PEP was actually to be able to get rid of
> objects like np.matrix and scipy.sparse matrices that redefine the meaning
> of the * operator. Quote: "This PEP proposes the minimum effective change
> to Python syntax that will allow us to drain this swamp [meaning np.matrix
> & co]."
>
> Notably, the @ PEP was written by Nathaniel, who was opposed to a copying
> .H.
>

I should note that my comment invoking the history of @ was about the
regular transpose, .mT.  The executive summary of the PEP includes the
following relevant sentence:
"""
Currently, most numerical Python code uses * for elementwise
multiplication, and function/method syntax for matrix multiplication;
however, this leads to ugly and unreadable code in common circumstances.
"""

Exactly the same holds for matrix transpose, and indeed for many matrix
expressions the gain in readability is lost without a clean option to do
the transpose.

-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/d602ba23/attachment.html>

From ralf.gommers at gmail.com  Wed Jun 26 17:22:39 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 26 Jun 2019 23:22:39 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9u13Bai=t4RmYqLRHUdPnmPgPtKpu3N_ZGVep0xXZytfw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CAJNV+9usxAzdTpwXPbUqHDf=WysVDJs9qaeDQAutBM=QM2_xZg@mail.gmail.com>
 <CABL7CQjff1qu8nc5W-B-jXsvif0f5c46ep49bu5+QE0YyE3j2w@mail.gmail.com>
 <CAJNV+9tScf9BOZjHBd3F_aKeFZxOY+RQc1OMj7Tkamxbp-YhCQ@mail.gmail.com>
 <CABL7CQjTEkLW8+0sVvkfOk9oErQu2sHrV_z_fpLJj3+hatqLCQ@mail.gmail.com>
 <CAEBuzr-jd5G=TrS86jSi2KX24Y9WvAxPD4S01D4njdTKmOy=oA@mail.gmail.com>
 <CAMEWA4ON4=de_G8E3okedgunM23O+L7LLqAueMgx8xGFCc06+A@mail.gmail.com>
 <CAMFejUBpu4ystrqZBsviCiVGJv_FRXEHphX9yh7Jvi14aWduzQ@mail.gmail.com>
 <CABL7CQgHVdM53n0qZh1-qMx7dMUSPe0F=wRmNMnwzmeSRuty=g@mail.gmail.com>
 <CAJNV+9u13Bai=t4RmYqLRHUdPnmPgPtKpu3N_ZGVep0xXZytfw@mail.gmail.com>
Message-ID: <CABL7CQjkhk+==JKKyvwnEyY8UG4sUhzzhSup-41bU=XsUOi0wg@mail.gmail.com>

On Wed, Jun 26, 2019 at 10:24 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

>
> The main motivation for the @ PEP was actually to be able to get rid of
>> objects like np.matrix and scipy.sparse matrices that redefine the meaning
>> of the * operator. Quote: "This PEP proposes the minimum effective change
>> to Python syntax that will allow us to drain this swamp [meaning np.matrix
>> & co]."
>>
>> Notably, the @ PEP was written by Nathaniel, who was opposed to a copying
>> .H.
>>
>
> I should note that my comment invoking the history of @ was about the
> regular transpose, .mT.  The executive summary of the PEP includes the
> following relevant sentence:
> """
> Currently, most numerical Python code uses * for elementwise
> multiplication, and function/method syntax for matrix multiplication;
> however, this leads to ugly and unreadable code in common circumstances.
> """
>
> Exactly the same holds for matrix transpose, and indeed for many matrix
> expressions the gain in readability is lost without a clean option to do
> the transpose.
>

Yes, but that's not at all equivalent. The point for @ was that you cannot
just create your own operator in Python. Therefore there really is no
alternative to a builtin operator. For methods however, there's nothing
stopping you from building a well-designed matrix class. You can then add
all the new properties that you want.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/d58e04d1/attachment.html>

From m.h.vankerkwijk at gmail.com  Wed Jun 26 17:22:47 2019
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 26 Jun 2019 17:22:47 -0400
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CABL7CQg5jDsUYz6zvwZBRSv75MQBdBaAtW1zx+q0jhoqZvNxJg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>
 <CABwCVQBudczfJfeccrijzJMwvhZWQoh5ni1uFTuY2VokymWbqg@mail.gmail.com>
 <CABL7CQg5jDsUYz6zvwZBRSv75MQBdBaAtW1zx+q0jhoqZvNxJg@mail.gmail.com>
Message-ID: <CAJNV+9sFYY_icN128WkshL07usB0iRx=8QPzzxxU06n3tJh6Pw@mail.gmail.com>

Hi Ralf,

I realize you feel strongly that this whole thread is rehashing history,
but I think it is worth pointing out that many seem to consider that the
criterion for allowing backward incompatible changes, i.e., that "existing
code is buggy or is consistently confusing many users", is actually
fulfilled here.

Indeed, this appears true to such an extent that even those among the
steering council do not agree: while the topic of this thread was about
introducing *new* properties (because in the relevant issue I had suggested
to Steward it was not possible to change .T), it was Eric who brought up
the question whether we shouldn't just change `.T` after all. And in the
relevant issue, Sebastian noted that "I am not quite convinced that we
cannot change .T (at least in the sense of deprecation) myself", with Chuck
chiming in that "I don't recall being in opposition, and I also think the
current transpose is not what we want."

That makes three of your fellow steering council members who are not sure,
despite all the previous discussions (of which Chuck surely has seen most -
sorry, Chuck!).

It seems to me the only sure way in which we can avoid future discussions
is to actually address the underlying problem. E.g., is the cost of
deprecating & changing .T truly that much more than even having this
discussion?

All the best,

Marten


On Wed, Jun 26, 2019 at 4:18 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Wed, Jun 26, 2019 at 10:04 PM Kirill Balunov <kirillbalunov at gmail.com>
> wrote:
>
>> Only concerns #4 from Ilhan's list.
>>
>> ??, 26 ???. 2019 ?. ? 00:01, Ralf Gommers <ralf.gommers at gmail.com>:
>>
>>>
>>> [....]
>>>
>>> Perhaps not full consensus between the many people with different
>>> opinions and interests. But for the first one, arr.T change: it's clear
>>> that this won't happen.
>>>
>>
>> To begin with, I must admit that I am not familiar with the accepted
>> policy of introducing changes to NumPy. But I find it quite
>> nonconstructive just to say - it will not happen. What then is the point
>> in the discussion?
>>
>
> There has been a *very* long discussion already, and several others on the
> same topic before. There are also long-standing ways of dealing with
> backwards compatibility - e.g. what Matthew said is not new, it's an agreed
> upon way of working.
> http://www.numpy.org/neps/nep-0023-backwards-compatibility.html lists
> some principles. That NEP is not yet accepted (it needs rework), but it
> gives a good idea of what does and does not go.
>
>
>>
>>
>>> Between Juan's examples of valid use, and what Stephan and Matthew said,
>>> there's not much more to add. We're not going to change correct code for
>>> minor benefits.
>>>
>>
>> I fully agree that any feature can find its use, valid or not is another
>> question. Juan did not present these examples, but I will allow myself
>> to assume that it is more correct to describe what is being done there as a
>> permutation, and not a transpose. In addition, in the very next
>> sentence, Juan adds that "These could be easily changed to .transpose()
>> (honestly they probably should!)"
>>
>> We're not going to change correct code for minor benefits.
>>>
>>
>> It's fair, I personally have no preferences in both cases, the most
>> important thing for me is that in the 2d case it works correctly. To be
>> honest, until today, I thought that `.T` will raise for` ndim > 2`. At
>> least that's what my experience told me. For example in
>>
>>     Matlab - Error using  .' Transpose on ND array is not defined. Use
>> PERMUTE instead.
>>
>>     Julia - transpose not defined for Array(Float64, 3). Consider using
>> permutedims for higher-dimensional arrays.
>>
>>     Sympy - raise ValueError("array rank not 2")
>>
>> Here, I agree with the authors that, to begin with, `transpose` is not
>> the best name, since in general it doesn?t fit as an any mathematical
>> definition (of course it will depend on what we take as an element) or a
>> definition from linear algebra. Thus the name `transpose` only leads to
>> confusion.
>>
>> For a note about another suggestion - `.T` to mean a transpose of the
>> last two dimensions, in Mathematica authors for some reason did the
>> opposite (personally, I could not understand why they made such a choice
>> :) ):
>>
>>     Transpose[list]
>>         transposes the first two levels in list.
>>
>>     I feel strongly that we should have the following policy:
>>>
>>>     * Under no circumstances should we make changes that mean that
>>> correct
>>>     old code will give different results with new Numpy.
>>>
>>
>> I find this overly strict rules that do not allow to evolve. I
>> completely agree that a silent change in behavior is a disaster, that
>> changing behavior (if it is not an error) in the same minor version (1.X.Y)
>> is not acceptable, but I see no reason to extend this rule for a major
>> version bump (2.A.B.),  especially if it allows something to improve.
>>
>
> I'm sorry, you'll have to live with this rule. We've had lots of
> discussion about this rule in many concrete cases. When existing code is
> buggy or is consistently confusing many users, we can discuss. But in
> general changing old code to do something else is a terrible idea.
>
>
>> I would see such a rough version of a roadmap of change (I foresee my
>> loneliness in this :)) Also considering this comment
>>
>>     Personally I would find any divergence between a.T and a.transpose()
>>>     to be rather surprising.
>>>
>>
>> it will be as follows:
>>
>> 1. in 1.18 add the `.permute` method to the array, with the same
>> semantics as `.transpose`.
>> 2. Starting from 1.18, emit  `FutureWarning`, ` DeprectationWarning` for
>> `.transpose` and advise replacing it with `.permute`.
>> 3. Starting from 1.18 for `.T` with` ndim> 2`, emit a `FutureWarning`,
>> with a note that in future versions the behavior will change.
>> 4. In version 2, remove the `.transpose` and change the behavior for `.T`.
>>
>
> This is simply not enough. Many users will skip versions when upgrading.
> There must be an exceptionally good reason to change numerical results, and
> this simply is not one.
>
> Cheers,
> Ralf
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/be3d86fc/attachment-0001.html>

From sebastian at sipsolutions.net  Wed Jun 26 17:51:41 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 26 Jun 2019 14:51:41 -0700
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9sFYY_icN128WkshL07usB0iRx=8QPzzxxU06n3tJh6Pw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>
 <CABwCVQBudczfJfeccrijzJMwvhZWQoh5ni1uFTuY2VokymWbqg@mail.gmail.com>
 <CABL7CQg5jDsUYz6zvwZBRSv75MQBdBaAtW1zx+q0jhoqZvNxJg@mail.gmail.com>
 <CAJNV+9sFYY_icN128WkshL07usB0iRx=8QPzzxxU06n3tJh6Pw@mail.gmail.com>
Message-ID: <c106c62cb18975a34712f8e60b61cb4a7cdb55f4.camel@sipsolutions.net>

On Wed, 2019-06-26 at 17:22 -0400, Marten van Kerkwijk wrote:
> Hi Ralf,
> 
> I realize you feel strongly that this whole thread is rehashing
> history, but I think it is worth pointing out that many seem to
> consider that the criterion for allowing backward incompatible
> changes, i.e., that "existing code is buggy or is consistently
> confusing many users", is actually fulfilled here.
> 
> Indeed, this appears true to such an extent that even those among the
> steering council do not agree: while the topic of this thread was
> about introducing *new* properties (because in the relevant issue I
> had suggested to Steward it was not possible to change .T), it was
> Eric who brought up the question whether we shouldn't just change
> `.T` after all. And in the relevant issue, Sebastian noted that "I am
> not quite convinced that we cannot change .T (at least in the sense
> of deprecation) myself", with Chuck chiming in that "I don't recall
> being in opposition, and I also think the current transpose is not
> what we want."
> 
> That makes three of your fellow steering council members who are not
> sure, despite all the previous discussions (of which Chuck surely has
> seen most - sorry, Chuck!).
> 
> It seems to me the only sure way in which we can avoid future
> discussions is to actually address the underlying problem. E.g., is
> the cost of deprecating & changing .T truly that much more than even
> having this discussion?
> 


To me, I think what we have here is simply that if we want to do it, it
will be an uphill battle. And an uphill battle may mean that we have to
write something close to an NEP. Including seeing how much code blows
up, e.g. by providing an environment variable switchable behaviour or
so. I think it would be better to approach it from that side: What is
necessary to be convincing enough?

The problem of going from one behaviour to another (at least without an
many-year waiting period) is real though, it not uncommon to leave
scripts lying around for 5 years...

So in that sense, I would agree that to really switch behaviour (not
just error), it would need extremely careful analysis, which may not be
feasible. OTOH, some other options, such as a new name or deprecations
(or warning) do not have such fundamental problems. Quite honestly, I
am not sure that deprecating `.T` completely for high dimensions is
much more painful then e.g. the move of factorial in scipy, which
forced me to modify a lot of my scripts (ok, its search+replace instead
of replacing one line).

We could go further of course, and say we do a "painful major" release
at some point with things like py3k warnings and all. But we probably
need more good reasons than a `.T`, and in-person discussions before
even considering it.

Best,

Sebastian


> All the best,
> 
> Marten
> 
> 
> On Wed, Jun 26, 2019 at 4:18 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> > 
> > On Wed, Jun 26, 2019 at 10:04 PM Kirill Balunov <
> > kirillbalunov at gmail.com> wrote:
> > > Only concerns #4 from Ilhan's list.
> > > 
> > > ??, 26 ???. 2019 ?. ? 00:01, Ralf Gommers <ralf.gommers at gmail.com
> > > >:
> > > > [....]
> > > > 
> > > > Perhaps not full consensus between the many people with
> > > > different opinions and interests. But for the first one, arr.T
> > > > change: it's clear that this won't happen.
> > > > 
> > > 
> > > To begin with, I must admit that I am not familiar with the
> > > accepted policy of introducing changes to NumPy. But I find it
> > > quite nonconstructive just to say - it will not happen. What then
> > > is the point in the discussion?
> > > 
> > 
> > There has been a *very* long discussion already, and several others
> > on the same topic before. There are also long-standing ways of
> > dealing with backwards compatibility - e.g. what Matthew said is
> > not new, it's an agreed upon way of working. 
> > http://www.numpy.org/neps/nep-0023-backwards-compatibility.html
> > lists some principles. That NEP is not yet accepted (it needs rework), but it gives a good idea of what does and does not go.
> >  
> > >  
> > > > Between Juan's examples of valid use, and what Stephan and
> > > > Matthew said, there's not much more to add. We're not going to
> > > > change correct code for minor benefits.
> > > > 
> > > 
> > > I fully agree that any feature can find its use, valid or not is
> > > another question. Juan did not present these examples, but I will
> > > allow myself to assume that it is more correct to describe what
> > > is being done there as a permutation, and not a transpose. In
> > > addition, in the very next sentence, Juan adds that "These could
> > > be easily changed to .transpose() (honestly they probably
> > > should!)"
> > > 
> > > > We're not going to change correct code for minor benefits. 
> > > 
> > > It's fair, I personally have no preferences in both cases, the
> > > most important thing for me is that in the 2d case it works
> > > correctly. To be honest, until today, I thought that `.T` will
> > > raise for` ndim > 2`. At least that's what my experience told me.
> > > For example in
> > > 
> > >     Matlab - Error using  .' Transpose on ND array is not
> > > defined. Use PERMUTE instead.
> > > 
> > >     Julia - transpose not defined for Array(Float64, 3). Consider
> > > using permutedims for higher-dimensional arrays.
> > > 
> > >     Sympy - raise ValueError("array rank not 2") 
> > > 
> > > Here, I agree with the authors that, to begin with, `transpose`
> > > is not the best name, since in general it doesn?t fit as an any
> > > mathematical definition (of course it will depend on what we take
> > > as an element) or a definition from linear algebra. Thus the name
> > > `transpose` only leads to confusion. 
> > > 
> > > For a note about another suggestion - `.T` to mean a transpose of
> > > the last two dimensions, in Mathematica authors for some reason
> > > did the opposite (personally, I could not understand why they
> > > made such a choice :) ):
> > > 
> > >     Transpose[list]
> > >         transposes the first two levels in list. 
> > > 
> > > >     I feel strongly that we should have the following policy:
> > > > 
> > > >     * Under no circumstances should we make changes that mean
> > > > that correct
> > > >     old code will give different results with new Numpy.
> > > > 
> > > 
> > > I find this overly strict rules that do not allow to evolve. I
> > > completely agree that a silent change in behavior is a disaster,
> > > that changing behavior (if it is not an error) in the same minor
> > > version (1.X.Y) is not acceptable, but I see no reason to extend
> > > this rule for a major version bump (2.A.B.),  especially if it
> > > allows something to improve.
> > > 
> > 
> > I'm sorry, you'll have to live with this rule. We've had lots of
> > discussion about this rule in many concrete cases. When existing
> > code is buggy or is consistently confusing many users, we can
> > discuss. But in general changing old code to do something else is a
> > terrible idea.
> > 
> > > I would see such a rough version of a roadmap of change (I
> > > foresee my loneliness in this :)) Also considering this comment  
> > > 
> > > >     Personally I would find any divergence between a.T and
> > > > a.transpose()
> > > >     to be rather surprising.
> > > > 
> > > 
> > > it will be as follows:
> > > 
> > > 1. in 1.18 add the `.permute` method to the array, with the same
> > > semantics as `.transpose`.  
> > > 2. Starting from 1.18, emit  `FutureWarning`, `
> > > DeprectationWarning` for `.transpose` and advise replacing it
> > > with `.permute`.
> > > 3. Starting from 1.18 for `.T` with` ndim> 2`, emit a
> > > `FutureWarning`, with a note that in future versions the behavior
> > > will change.
> > > 4. In version 2, remove the `.transpose` and change the behavior
> > > for `.T`.
> > > 
> > 
> > This is simply not enough. Many users will skip versions when
> > upgrading. There must be an exceptionally good reason to change
> > numerical results, and this simply is not one.
> > 
> > Cheers,
> > Ralf
> > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/c3beb1a2/attachment.sig>

From ralf.gommers at gmail.com  Wed Jun 26 17:54:58 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 26 Jun 2019 23:54:58 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAJNV+9sFYY_icN128WkshL07usB0iRx=8QPzzxxU06n3tJh6Pw@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>
 <CABwCVQBudczfJfeccrijzJMwvhZWQoh5ni1uFTuY2VokymWbqg@mail.gmail.com>
 <CABL7CQg5jDsUYz6zvwZBRSv75MQBdBaAtW1zx+q0jhoqZvNxJg@mail.gmail.com>
 <CAJNV+9sFYY_icN128WkshL07usB0iRx=8QPzzxxU06n3tJh6Pw@mail.gmail.com>
Message-ID: <CABL7CQgRf=PQqYcBo50g52hMxjf8NzX++rz_EGU-36OkHC=14g@mail.gmail.com>

On Wed, Jun 26, 2019 at 11:24 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Ralf,
>
> I realize you feel strongly that this whole thread is rehashing history,
>

The .H part was. But Cameron volunteered to work on a solution that
satisfies all concerns.

but I think it is worth pointing out that many seem to consider that the
> criterion for allowing backward incompatible changes, i.e., that "existing
> code is buggy or is consistently confusing many users", is actually
> fulfilled here.
>
> Indeed, this appears true to such an extent that even those among the
> steering council do not agree: while the topic of this thread was about
> introducing *new* properties (because in the relevant issue I had suggested
> to Steward it was not possible to change .T), it was Eric who brought up
> the question whether we shouldn't just change `.T` after all.
>

Yes, and then he came up with a better suggestion in
https://github.com/numpy/numpy/issues/13835.
If a comment starts with "this may be contentious" and then real-world
correct code like in scikit-image shows up, it's better not to dig it back
up after 60 emails to make your point.....

And in the relevant issue, Sebastian noted that "I am not quite convinced
> that we cannot change .T (at least in the sense of deprecation)
>

deprecation I interpret as "to raise an error after"

myself", with Chuck chiming in that "I don't recall being in opposition,
> and I also think the current transpose is not what we want."
>
> That makes three of your fellow steering council members who are not sure,
> despite all the previous discussions (of which Chuck surely has seen most -
> sorry, Chuck!).
>
> It seems to me the only sure way in which we can avoid future discussions
> is to actually address the underlying problem. E.g., is the cost of
> deprecating & changing .T truly that much more than even having this
> discussion?
>

Yes it is. Seriously, every time someone proposes something like this,
eventually a better solution is found. Raising for .T on >2-D is a
possibility, in case the problem is really *that* bad (which doesn't seem
to be the case - but if so, please propose that instead). Changing the
meaning of .T to give changed numerical results is not acceptable (not just
my opinion, also Matthew, Alan, and Stephan said no). If you've been on
this list for a few years, you really should understand that by now.

I'll quote Matthew again, who said it best:
"Under no circumstances should we make changes that mean that correct old
code will give different results with new Numpy.
On the other hand, it's OK (with a suitable period of deprecation) for
correct old code to raise an informative error with new Numpy."

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/5c46cf61/attachment.html>

From charlesr.harris at gmail.com  Wed Jun 26 18:01:56 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 26 Jun 2019 16:01:56 -0600
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CABL7CQg5jDsUYz6zvwZBRSv75MQBdBaAtW1zx+q0jhoqZvNxJg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>
 <CABwCVQBudczfJfeccrijzJMwvhZWQoh5ni1uFTuY2VokymWbqg@mail.gmail.com>
 <CABL7CQg5jDsUYz6zvwZBRSv75MQBdBaAtW1zx+q0jhoqZvNxJg@mail.gmail.com>
Message-ID: <CAB6mnxK6nRLQ_sj__w7vU125KD82vVEmdtAdS0cCRC3JaZY3eQ@mail.gmail.com>

On Wed, Jun 26, 2019 at 2:18 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Wed, Jun 26, 2019 at 10:04 PM Kirill Balunov <kirillbalunov at gmail.com>
> wrote:
>
>> Only concerns #4 from Ilhan's list.
>>
>> ??, 26 ???. 2019 ?. ? 00:01, Ralf Gommers <ralf.gommers at gmail.com>:
>>
>>>
>>> [....]
>>>
>>> Perhaps not full consensus between the many people with different
>>> opinions and interests. But for the first one, arr.T change: it's clear
>>> that this won't happen.
>>>
>>
>> To begin with, I must admit that I am not familiar with the accepted
>> policy of introducing changes to NumPy. But I find it quite
>> nonconstructive just to say - it will not happen. What then is the point
>> in the discussion?
>>
>
> There has been a *very* long discussion already, and several others on the
> same topic before. There are also long-standing ways of dealing with
> backwards compatibility - e.g. what Matthew said is not new, it's an agreed
> upon way of working.
> http://www.numpy.org/neps/nep-0023-backwards-compatibility.html lists
> some principles. That NEP is not yet accepted (it needs rework), but it
> gives a good idea of what does and does not go.
>
>
>>
>>
>>> Between Juan's examples of valid use, and what Stephan and Matthew said,
>>> there's not much more to add. We're not going to change correct code for
>>> minor benefits.
>>>
>>
>> I fully agree that any feature can find its use, valid or not is another
>> question. Juan did not present these examples, but I will allow myself
>> to assume that it is more correct to describe what is being done there as a
>> permutation, and not a transpose. In addition, in the very next
>> sentence, Juan adds that "These could be easily changed to .transpose()
>> (honestly they probably should!)"
>>
>> We're not going to change correct code for minor benefits.
>>>
>>
>> It's fair, I personally have no preferences in both cases, the most
>> important thing for me is that in the 2d case it works correctly. To be
>> honest, until today, I thought that `.T` will raise for` ndim > 2`. At
>> least that's what my experience told me. For example in
>>
>>     Matlab - Error using  .' Transpose on ND array is not defined. Use
>> PERMUTE instead.
>>
>>     Julia - transpose not defined for Array(Float64, 3). Consider using
>> permutedims for higher-dimensional arrays.
>>
>>     Sympy - raise ValueError("array rank not 2")
>>
>> Here, I agree with the authors that, to begin with, `transpose` is not
>> the best name, since in general it doesn?t fit as an any mathematical
>> definition (of course it will depend on what we take as an element) or a
>> definition from linear algebra. Thus the name `transpose` only leads to
>> confusion.
>>
>> For a note about another suggestion - `.T` to mean a transpose of the
>> last two dimensions, in Mathematica authors for some reason did the
>> opposite (personally, I could not understand why they made such a choice
>> :) ):
>>
>>     Transpose[list]
>>         transposes the first two levels in list.
>>
>>     I feel strongly that we should have the following policy:
>>>
>>>     * Under no circumstances should we make changes that mean that
>>> correct
>>>     old code will give different results with new Numpy.
>>>
>>
>> I find this overly strict rules that do not allow to evolve. I
>> completely agree that a silent change in behavior is a disaster, that
>> changing behavior (if it is not an error) in the same minor version (1.X.Y)
>> is not acceptable, but I see no reason to extend this rule for a major
>> version bump (2.A.B.),  especially if it allows something to improve.
>>
>
> I'm sorry, you'll have to live with this rule. We've had lots of
> discussion about this rule in many concrete cases. When existing code is
> buggy or is consistently confusing many users, we can discuss. But in
> general changing old code to do something else is a terrible idea.
>
>
>> I would see such a rough version of a roadmap of change (I foresee my
>> loneliness in this :)) Also considering this comment
>>
>>     Personally I would find any divergence between a.T and a.transpose()
>>>     to be rather surprising.
>>>
>>
>> it will be as follows:
>>
>> 1. in 1.18 add the `.permute` method to the array, with the same
>> semantics as `.transpose`.
>> 2. Starting from 1.18, emit  `FutureWarning`, ` DeprectationWarning` for
>> `.transpose` and advise replacing it with `.permute`.
>> 3. Starting from 1.18 for `.T` with` ndim> 2`, emit a `FutureWarning`,
>> with a note that in future versions the behavior will change.
>> 4. In version 2, remove the `.transpose` and change the behavior for `.T`.
>>
>
> This is simply not enough. Many users will skip versions when upgrading.
> There must be an exceptionally good reason to change numerical results, and
> this simply is not one.
>
>
I agree with Ralf that `*.T` should be left alone, it is widely used and
changing its behavior is bound to lead to broken code. I could see `*.mT`
or `*.mH`, but I'm beginning to wonder if we would not be better served
with a better matrix class that could also deal intelligently with stacks
of row and column vectors. In the past I have preferred `einsum` over `@`
precisely because it made handling those variations easy. The `@` operator
is very convenient at a low level, but it simply cannot deal with stacks of
mixed types in generality. With a class we could do something about that.

 Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190626/8581329d/attachment-0001.html>

From ilhanpolat at gmail.com  Wed Jun 26 22:18:32 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Thu, 27 Jun 2019 04:18:32 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAB6mnxK6nRLQ_sj__w7vU125KD82vVEmdtAdS0cCRC3JaZY3eQ@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>
 <CABwCVQBudczfJfeccrijzJMwvhZWQoh5ni1uFTuY2VokymWbqg@mail.gmail.com>
 <CABL7CQg5jDsUYz6zvwZBRSv75MQBdBaAtW1zx+q0jhoqZvNxJg@mail.gmail.com>
 <CAB6mnxK6nRLQ_sj__w7vU125KD82vVEmdtAdS0cCRC3JaZY3eQ@mail.gmail.com>
Message-ID: <CAEBuzr--RZUyyXOREaXkYnJbVGe-Y4R8qYjCs=mQmXiA7XirGg@mail.gmail.com>

I've finally gone through the old discussion and finally got the
counter-argument in one of the Dag Sverre's replies
http://numpy-discussion.10968.n7.nabble.com/add-H-attribute-tp34474p34668.html

TL; DR

I disagree with [...adding the .H attribute...] being forward looking, as
> it explicitly creates a situation where code will break if .H becomes a
> view
>

This actually makes perfect sense and a valid concern that I have not
considered before.

The remaining question is why we treat as if returning a view is a
requirement. We have been using .conj().T and receiving the copies of the
arrays since that day with equally inefficient code after many years. Then
the discussion diverges to other things hence I am not sure where does this
requirement come from.

But I guess this part should be rehashed clearer until next time :)


On Thu, Jun 27, 2019 at 12:03 AM Charles R Harris <charlesr.harris at gmail.com>
wrote:

>
>
> On Wed, Jun 26, 2019 at 2:18 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>>
>> On Wed, Jun 26, 2019 at 10:04 PM Kirill Balunov <kirillbalunov at gmail.com>
>> wrote:
>>
>>> Only concerns #4 from Ilhan's list.
>>>
>>> ??, 26 ???. 2019 ?. ? 00:01, Ralf Gommers <ralf.gommers at gmail.com>:
>>>
>>>>
>>>> [....]
>>>>
>>>> Perhaps not full consensus between the many people with different
>>>> opinions and interests. But for the first one, arr.T change: it's clear
>>>> that this won't happen.
>>>>
>>>
>>> To begin with, I must admit that I am not familiar with the accepted
>>> policy of introducing changes to NumPy. But I find it quite
>>> nonconstructive just to say - it will not happen. What then is the
>>> point in the discussion?
>>>
>>
>> There has been a *very* long discussion already, and several others on
>> the same topic before. There are also long-standing ways of dealing with
>> backwards compatibility - e.g. what Matthew said is not new, it's an agreed
>> upon way of working.
>> http://www.numpy.org/neps/nep-0023-backwards-compatibility.html lists
>> some principles. That NEP is not yet accepted (it needs rework), but it
>> gives a good idea of what does and does not go.
>>
>>
>>>
>>>
>>>> Between Juan's examples of valid use, and what Stephan and Matthew
>>>> said, there's not much more to add. We're not going to change correct code
>>>> for minor benefits.
>>>>
>>>
>>> I fully agree that any feature can find its use, valid or not is another
>>> question. Juan did not present these examples, but I will allow myself
>>> to assume that it is more correct to describe what is being done there as a
>>> permutation, and not a transpose. In addition, in the very next
>>> sentence, Juan adds that "These could be easily changed to .transpose()
>>> (honestly they probably should!)"
>>>
>>> We're not going to change correct code for minor benefits.
>>>>
>>>
>>> It's fair, I personally have no preferences in both cases, the most
>>> important thing for me is that in the 2d case it works correctly. To be
>>> honest, until today, I thought that `.T` will raise for` ndim > 2`. At
>>> least that's what my experience told me. For example in
>>>
>>>     Matlab - Error using  .' Transpose on ND array is not defined. Use
>>> PERMUTE instead.
>>>
>>>     Julia - transpose not defined for Array(Float64, 3). Consider using
>>> permutedims for higher-dimensional arrays.
>>>
>>>     Sympy - raise ValueError("array rank not 2")
>>>
>>> Here, I agree with the authors that, to begin with, `transpose` is not
>>> the best name, since in general it doesn?t fit as an any mathematical
>>> definition (of course it will depend on what we take as an element) or a
>>> definition from linear algebra. Thus the name `transpose` only leads to
>>> confusion.
>>>
>>> For a note about another suggestion - `.T` to mean a transpose of the
>>> last two dimensions, in Mathematica authors for some reason did the
>>> opposite (personally, I could not understand why they made such a
>>> choice :) ):
>>>
>>>     Transpose[list]
>>>         transposes the first two levels in list.
>>>
>>>     I feel strongly that we should have the following policy:
>>>>
>>>>     * Under no circumstances should we make changes that mean that
>>>> correct
>>>>     old code will give different results with new Numpy.
>>>>
>>>
>>> I find this overly strict rules that do not allow to evolve. I
>>> completely agree that a silent change in behavior is a disaster, that
>>> changing behavior (if it is not an error) in the same minor version (1.X.Y)
>>> is not acceptable, but I see no reason to extend this rule for a major
>>> version bump (2.A.B.),  especially if it allows something to improve.
>>>
>>
>> I'm sorry, you'll have to live with this rule. We've had lots of
>> discussion about this rule in many concrete cases. When existing code is
>> buggy or is consistently confusing many users, we can discuss. But in
>> general changing old code to do something else is a terrible idea.
>>
>>
>>> I would see such a rough version of a roadmap of change (I foresee my
>>> loneliness in this :)) Also considering this comment
>>>
>>>     Personally I would find any divergence between a.T and a.transpose()
>>>>     to be rather surprising.
>>>>
>>>
>>> it will be as follows:
>>>
>>> 1. in 1.18 add the `.permute` method to the array, with the same
>>> semantics as `.transpose`.
>>> 2. Starting from 1.18, emit  `FutureWarning`, ` DeprectationWarning` for
>>> `.transpose` and advise replacing it with `.permute`.
>>> 3. Starting from 1.18 for `.T` with` ndim> 2`, emit a `FutureWarning`,
>>> with a note that in future versions the behavior will change.
>>> 4. In version 2, remove the `.transpose` and change the behavior for
>>> `.T`.
>>>
>>
>> This is simply not enough. Many users will skip versions when upgrading.
>> There must be an exceptionally good reason to change numerical results, and
>> this simply is not one.
>>
>>
> I agree with Ralf that `*.T` should be left alone, it is widely used and
> changing its behavior is bound to lead to broken code. I could see `*.mT`
> or `*.mH`, but I'm beginning to wonder if we would not be better served
> with a better matrix class that could also deal intelligently with stacks
> of row and column vectors. In the past I have preferred `einsum` over `@`
> precisely because it made handling those variations easy. The `@` operator
> is very convenient at a low level, but it simply cannot deal with stacks of
> mixed types in generality. With a class we could do something about that.
>
>  Chuck
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190627/42fdcfe0/attachment-0001.html>

From ralf.gommers at gmail.com  Wed Jun 26 22:26:02 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 27 Jun 2019 04:26:02 +0200
Subject: [Numpy-discussion] Syntax Improvement for Array Transpose
In-Reply-To: <CAEBuzr--RZUyyXOREaXkYnJbVGe-Y4R8qYjCs=mQmXiA7XirGg@mail.gmail.com>
References: <CA+yAOojf-27vf+mF8tukCaXnpxzOmf+hyRtEs3hZPbwAkD1Kdw@mail.gmail.com>
 <CAEQ_TveiusVB4VZ8evs76QbYgJ-UNGSQXYj3c+i0CMk9jTfiJA@mail.gmail.com>
 <CAJNV+9vGqLze6ii=Jao7mfvFf6i-H4jmQD15rRsqiAEJcrkn4A@mail.gmail.com>
 <CAEBuzr_P1xsGSE4MDatKVpaCxoXMLPx3bJmHkW5H2+n9O0ynpg@mail.gmail.com>
 <CAMFejUB52gvv2jVjjnLP3VXH=vje93ssdtF_L+5+kjMnVixRVw@mail.gmail.com>
 <CAMEWA4PAwDOEoTv5CO8WwY9m26be0g5K7UQsEOsH7-QdBySCEw@mail.gmail.com>
 <CAEBuzr_a4jUu0783SZhL209VbtOWN62g-kTR5yTO92u7yU-nTw@mail.gmail.com>
 <CAFpSVpLyXOs6YNvwqmte=SMuP9YxgaG9Cv=5NfObZkUvS3Tjkw@mail.gmail.com>
 <CAEBuzr8rxjP1M4rjyzY=X4scP7-ZLbV7=ugt-HobZJj=smwxSA@mail.gmail.com>
 <CAFpSVpLcfjUuXWzv3hfSdN_dbTo63a1soXkJ7FDR+Ow_j_t=mw@mail.gmail.com>
 <CAMFejUDVh9YmJ02n_oSFy88Pfmqw1GxshgV2Aftan-rkpate2A@mail.gmail.com>
 <CABwCVQC6XiaEOAZePSfC-9DjwEGEB4RBhP5aT1MQoy_3yDnceg@mail.gmail.com>
 <CABL7CQisvU+o1DLgNFxcE_-TTkiaw7RA-MDTofXa+de_2+tKCA@mail.gmail.com>
 <CABwCVQBudczfJfeccrijzJMwvhZWQoh5ni1uFTuY2VokymWbqg@mail.gmail.com>
 <CABL7CQg5jDsUYz6zvwZBRSv75MQBdBaAtW1zx+q0jhoqZvNxJg@mail.gmail.com>
 <CAB6mnxK6nRLQ_sj__w7vU125KD82vVEmdtAdS0cCRC3JaZY3eQ@mail.gmail.com>
 <CAEBuzr--RZUyyXOREaXkYnJbVGe-Y4R8qYjCs=mQmXiA7XirGg@mail.gmail.com>
Message-ID: <CABL7CQjhsheUosz2=kNfta9KKYGHNsjyNwX-OfbhZOC_W1kmsg@mail.gmail.com>

On Thu, Jun 27, 2019 at 4:19 AM Ilhan Polat <ilhanpolat at gmail.com> wrote:

> I've finally gone through the old discussion and finally got the
> counter-argument in one of the Dag Sverre's replies
> http://numpy-discussion.10968.n7.nabble.com/add-H-attribute-tp34474p34668.html
>
> TL; DR
>
> I disagree with [...adding the .H attribute...] being forward looking, as
>> it explicitly creates a situation where code will break if .H becomes a
>> view
>>
>
> This actually makes perfect sense and a valid concern that I have not
> considered before.
>
> The remaining question is why we treat as if returning a view is a
> requirement. We have been using .conj().T and receiving the copies of the
> arrays since that day with equally inefficient code after many years. Then
> the discussion diverges to other things hence I am not sure where does this
> requirement come from.
>

I think that's in that thread somewhere in more detail, but the summary is:
1. properties imply that they're cheap computationally
2. .T returning a view and .H a copy would be inconsistent and unintuitive
There may be one more argument, this is just from memory.

Cheers,
Ralf


> But I guess this part should be rehashed clearer until next time :)
>
>
>
>
> On Thu, Jun 27, 2019 at 12:03 AM Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>>
>>
>> On Wed, Jun 26, 2019 at 2:18 PM Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 26, 2019 at 10:04 PM Kirill Balunov <kirillbalunov at gmail.com>
>>> wrote:
>>>
>>>> Only concerns #4 from Ilhan's list.
>>>>
>>>> ??, 26 ???. 2019 ?. ? 00:01, Ralf Gommers <ralf.gommers at gmail.com>:
>>>>
>>>>>
>>>>> [....]
>>>>>
>>>>> Perhaps not full consensus between the many people with different
>>>>> opinions and interests. But for the first one, arr.T change: it's clear
>>>>> that this won't happen.
>>>>>
>>>>
>>>> To begin with, I must admit that I am not familiar with the accepted
>>>> policy of introducing changes to NumPy. But I find it quite
>>>> nonconstructive just to say - it will not happen. What then is the
>>>> point in the discussion?
>>>>
>>>
>>> There has been a *very* long discussion already, and several others on
>>> the same topic before. There are also long-standing ways of dealing with
>>> backwards compatibility - e.g. what Matthew said is not new, it's an agreed
>>> upon way of working.
>>> http://www.numpy.org/neps/nep-0023-backwards-compatibility.html lists
>>> some principles. That NEP is not yet accepted (it needs rework), but it
>>> gives a good idea of what does and does not go.
>>>
>>>
>>>>
>>>>
>>>>> Between Juan's examples of valid use, and what Stephan and Matthew
>>>>> said, there's not much more to add. We're not going to change correct code
>>>>> for minor benefits.
>>>>>
>>>>
>>>> I fully agree that any feature can find its use, valid or not is
>>>> another question. Juan did not present these examples, but I will
>>>> allow myself to assume that it is more correct to describe what is being
>>>> done there as a permutation, and not a transpose. In addition, in the
>>>> very next sentence, Juan adds that "These could be easily changed to
>>>> .transpose() (honestly they probably should!)"
>>>>
>>>> We're not going to change correct code for minor benefits.
>>>>>
>>>>
>>>> It's fair, I personally have no preferences in both cases, the most
>>>> important thing for me is that in the 2d case it works correctly. To
>>>> be honest, until today, I thought that `.T` will raise for` ndim > 2`. At
>>>> least that's what my experience told me. For example in
>>>>
>>>>     Matlab - Error using  .' Transpose on ND array is not defined. Use
>>>> PERMUTE instead.
>>>>
>>>>     Julia - transpose not defined for Array(Float64, 3). Consider using
>>>> permutedims for higher-dimensional arrays.
>>>>
>>>>     Sympy - raise ValueError("array rank not 2")
>>>>
>>>> Here, I agree with the authors that, to begin with, `transpose` is not
>>>> the best name, since in general it doesn?t fit as an any mathematical
>>>> definition (of course it will depend on what we take as an element) or a
>>>> definition from linear algebra. Thus the name `transpose` only leads
>>>> to confusion.
>>>>
>>>> For a note about another suggestion - `.T` to mean a transpose of the
>>>> last two dimensions, in Mathematica authors for some reason did the
>>>> opposite (personally, I could not understand why they made such a
>>>> choice :) ):
>>>>
>>>>     Transpose[list]
>>>>         transposes the first two levels in list.
>>>>
>>>>     I feel strongly that we should have the following policy:
>>>>>
>>>>>     * Under no circumstances should we make changes that mean that
>>>>> correct
>>>>>     old code will give different results with new Numpy.
>>>>>
>>>>
>>>> I find this overly strict rules that do not allow to evolve. I
>>>> completely agree that a silent change in behavior is a disaster, that
>>>> changing behavior (if it is not an error) in the same minor version (1.X.Y)
>>>> is not acceptable, but I see no reason to extend this rule for a major
>>>> version bump (2.A.B.),  especially if it allows something to improve.
>>>>
>>>
>>> I'm sorry, you'll have to live with this rule. We've had lots of
>>> discussion about this rule in many concrete cases. When existing code is
>>> buggy or is consistently confusing many users, we can discuss. But in
>>> general changing old code to do something else is a terrible idea.
>>>
>>>
>>>> I would see such a rough version of a roadmap of change (I foresee my
>>>> loneliness in this :)) Also considering this comment
>>>>
>>>>     Personally I would find any divergence between a.T and a.transpose()
>>>>>     to be rather surprising.
>>>>>
>>>>
>>>> it will be as follows:
>>>>
>>>> 1. in 1.18 add the `.permute` method to the array, with the same
>>>> semantics as `.transpose`.
>>>> 2. Starting from 1.18, emit  `FutureWarning`, ` DeprectationWarning` for
>>>> `.transpose` and advise replacing it with `.permute`.
>>>> 3. Starting from 1.18 for `.T` with` ndim> 2`, emit a `FutureWarning`,
>>>> with a note that in future versions the behavior will change.
>>>> 4. In version 2, remove the `.transpose` and change the behavior for
>>>> `.T`.
>>>>
>>>
>>> This is simply not enough. Many users will skip versions when upgrading.
>>> There must be an exceptionally good reason to change numerical results, and
>>> this simply is not one.
>>>
>>>
>> I agree with Ralf that `*.T` should be left alone, it is widely used and
>> changing its behavior is bound to lead to broken code. I could see `*.mT`
>> or `*.mH`, but I'm beginning to wonder if we would not be better served
>> with a better matrix class that could also deal intelligently with stacks
>> of row and column vectors. In the past I have preferred `einsum` over `@`
>> precisely because it made handling those variations easy. The `@` operator
>> is very convenient at a low level, but it simply cannot deal with stacks of
>> mixed types in generality. With a class we could do something about that.
>>
>>  Chuck
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190627/a5d7e901/attachment-0001.html>

From charlesr.harris at gmail.com  Sat Jun 29 18:28:42 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 29 Jun 2019 16:28:42 -0600
Subject: [Numpy-discussion] 1.17. 0 Release
Message-ID: <CAB6mnx+=B-3TDU4DGArkTjzLk-QvkMV0LJ4tJsshWLdct4PqtA@mail.gmail.com>

Hi All,

I've put up a partially edited version of the 1.17.0 release notes
<https://github.com/numpy/numpy/pull/13869>, review would be much
appreciated. The use of ` and ' can be especially tricky. Once the
preparation is finished, I will make a branch and put out the first release
candidate.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190629/1c61f295/attachment.html>

From charlesr.harris at gmail.com  Sun Jun 30 14:37:51 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 30 Jun 2019 12:37:51 -0600
Subject: [Numpy-discussion] NumPy master open for 1.18 development.
Message-ID: <CAB6mnxJP+Uvhs4Mde=jk5YCJKY==CT+ub7=Bf2=PiSvY9gdbvg@mail.gmail.com>

Hi All,

NumPy 1.17.x has been branched.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190630/c9a28460/attachment.html>

From charlesr.harris at gmail.com  Sun Jun 30 18:47:25 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 30 Jun 2019 16:47:25 -0600
Subject: [Numpy-discussion] NumPy 1.17.0rc1 released
Message-ID: <CAB6mnxJPcSS8SbzqgPZXFDwqdDJS4jnceqX-5nuKwxT7hpq6og@mail.gmail.com>

Hi All,

On behalf of the NumPy team I am pleased to announce the release of NumPy
1.17.0rc1. The 1.17 release contains a number of new features that should
substantially improve its performance and usefulness. The Python versions
supported are 3.5-3.7, note that Python 2.7 has been dropped. Python 3.8b1
should work with the released source packages, but there are no guarantees
about future releases. Highlights of this release are:

   - A new extensible random module along with four selectable random
   numbe5 generators and improved seeding designed for use in parallel
   processes has been added. The currently available bit generators are
   MT19937, PCG64, Philox, and SFC64.
   - NumPy's FFT implementation was changed from fftpack to pocketfft,
   resulting in faster, more accurate transforms and better handling of
   datasets of prime length.
   - New radix sort and timsort sorting methods. It is currently not
   possible to choose which will be used, but they are hardwired to the
   datatype and used when either ``stable`` or ``mergesort`` is passed as the
   method.
   - Overriding numpy functions is now possible by default

Downstream developers should use Cython >= 0.29.10 for Python 3.8 support
and OpenBLAS >= 3.7 (not currently out) to avoid problems on the Skylake
architecture. The NumPy wheels on PyPI are built from the OpenBLAS
development branch in order to avoid those problems. Wheels for this
release can be downloaded from PyPI
<https://pypi.org/project/numpy/1.17.0rc1/>, source archives and release
notes are available from Github
<https://github.com/numpy/numpy/releases/tag/v1.17.0rc1>.

*Contributors*

A total of 142 people contributed to this release.  People with a "+" by
their
names contributed a patch for the first time.

   - Aaron Voelker +
   - Abdur Rehman +
   - Abdur-Rahmaan Janhangeer +
   - Abhinav Sagar +
   - Adam J. Stewart +
   - Adam Orr +
   - Albert Thomas +
   - Alex Watt +
   - Alexander Blinne +
   - Alexander Shadchin
   - Allan Haldane
   - Ander Ustarroz +
   - Andras Deak
   - Andreas Schwab
   - Andrew Naguib +
   - Andy Scholand +
   - Ankit Shukla +
   - Anthony Sottile
   - Antoine Pitrou
   - Antony Lee
   - Arcesio Castaneda Medina +
   - Assem +
   - Bernardt Duvenhage +
   - Bharat Raghunathan +
   - Bharat123rox +
   - Bran +
   - Bruce Merry +
   - Charles Harris
   - Chirag Nighut +
   - Christoph Gohlke
   - Christopher Whelan +
   - Chuanzhu Xu +
   - Daniel Hrisca
   - Daniel Lawrence +
   - Debsankha Manik +
   - Dennis Zollo +
   - Dieter Werthm?ller +
   - Dominic Jack +
   - EelcoPeacs +
   - Eric Larson
   - Eric Wieser
   - Fabrice Fontaine +
   - Gary Gurlaskie +
   - Gregory Lee +
   - Gregory R. Lee
   - Hameer Abbasi
   - Haoyu Sun +
   - He Jia +
   - Hunter Damron +
   - Ian Sanders +
   - Ilja +
   - Isaac Virshup +
   - Isaiah Norton +
   - Jaime Fernandez
   - Jakub Wilk
   - Jan S. (Milania1) +
   - Jarrod Millman
   - Javier Dehesa +
   - Jeremy Lay +
   - Jim Turner +
   - Jingbei Li +
   - Joachim Hereth +
   - John Belmonte +
   - John Kirkham
   - John Law +
   - Jonas Jensen
   - Joseph Fox-Rabinovitz
   - Joseph Martinot-Lagarde
   - Josh Wilson
   - Juan Luis Cano Rodr?guez
   - Julian Taylor
   - J?r?mie du Boisberranger +
   - Kai Striega +
   - Katharine Hyatt +
   - Kevin Sheppard
   - Kexuan Sun
   - Kiko Correoso +
   - Kriti Singh +
   - Lars Grueter +
   - Maksim Shabunin +
   - Manvi07 +
   - Mark Harfouche
   - Marten van Kerkwijk
   - Martin Reinecke +
   - Matthew Brett
   - Matthias Bussonnier
   - Matti Picus
   - Michel Fruchart +
   - Mike Lui +
   - Mike Taves +
   - Min ho Kim +
   - Mircea Akos Bruma
   - Nick Minkyu Lee
   - Nick Papior
   - Nick R. Papior +
   - Nicola Soranzo +
   - Nimish Telang +
   - OBATA Akio +
   - Oleksandr Pavlyk
   - Ori Broda +
   - Paul Ivanov
   - Pauli Virtanen
   - Peter Andreas Entschev +
   - Peter Bell +
   - Pierre de Buyl
   - Piyush Jaipuriayar +
   - Prithvi MK +
   - Raghuveer Devulapalli +
   - Ralf Gommers
   - Richard Harris +
   - Rishabh Chakrabarti +
   - Riya Sharma +
   - Robert Kern
   - Roman Yurchak
   - Ryan Levy +
   - Sebastian Berg
   - Sergei Lebedev +
   - Shekhar Prasad Rajak +
   - Stefan van der Walt
   - Stephan Hoyer
   - SuryaChand P +
   - S?ren Rasmussen +
   - Thibault Hallouin +
   - Thomas A Caswell
   - Tobias Uelwer +
   - Tony LaTorre +
   - Toshiki Kataoka
   - Tyler Moncur +
   - Tyler Reddy
   - Valentin Haenel
   - Vrinda Narayan +
   - Warren Weckesser
   - Weitang Li
   - Wojtek Ruszczewski
   - Yu Feng
   - Yu Kobayashi +
   - Yury Kirienko +
   - @aashuli +
   - @euronion +
   - @luzpaz
   - @parul +
   - @spacescientist +

Cheers,

Charles Harris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190630/ee258e1b/attachment.html>