From friedrichromstedt at gmail.com Mon Feb 1 02:57:40 2021 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Mon, 1 Feb 2021 08:57:40 +0100 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: Hi, On Tue, Jan 26, 2021 at 09:48, Friedrich Romstedt wrote: > > [...] The following Python > code crashes:: > > image = <... Image production ...> > ar = numpy.asarray(image) > > However, when I say:: > > image = <... Image production ...> > print("---") > ar = numpy.asarray(image) > > the entire program is executing properly with correct data in the > numpy ndarray produced using the buffer interface. > > [...] Does anyone have an idea about this? By the way, I noticed that this mailing list turned pretty quiet, am I missing something? For completeness, the abovementioned "crash" shows up as just a premature exit of the program. There is no error message whatsoever. The buffer view producing function raises Exceptions properly when something goes wrong; also notice that this code completes without error when the ``print("---")`` statement is in action. So I presume the culprit lies somewhere on the C level. I can only guess that it might be some side-effect unknown to me. Best, Friedrich From hameerabbasi at yahoo.com Mon Feb 1 03:03:33 2021 From: hameerabbasi at yahoo.com (Hameer Abbasi) Date: Mon, 1 Feb 2021 09:03:33 +0100 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: Hey Friedrich, If you can produce an MVCE that would be really helpful, along with your hardware and environment. Without that, it isn't possible to be of much help. https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports Best Regards, Hameer Abbasi -- Sent from Canary (https://canarymail.io) > On Tuesday, Jan. 
26, 2021 at 9:49 AM, Friedrich Romstedt wrote: > Hi, > > This is with Python 3.8.2 64-bit and numpy 1.19.2 on Windows 10. I'd > like to be able to convert some C++ extension type to a numpy array by > using ``numpy.asarray``. The extension type implements the Python > buffer interface to support this. > > The extension type, called "Image" here, holds some chunk of > ``double``, C order, contiguous, 2 dimensions. It "owns" the buffer; > the buffer is not shared with other objects. The following Python > code crashes:: > > image = <... Image production ...> > ar = numpy.asarray(image) > > However, when I say:: > > image = <... Image production ...> > print("---") > ar = numpy.asarray(image) > > the entire program is executing properly with correct data in the > numpy ndarray produced using the buffer interface. > > The extension type permits reading the pixel values by a method; > copying them over by a Python loop works fine. I am ``Py_INCREF``-ing > the producer in the C++ buffer view creation function properly. The > shapes and strides of the buffer view are ``delete[]``-ed upon > releasing the buffer; avoiding this does not prevent the crash. I am > catching ``std::exception`` in the view creation function; no such > exception occurs. The shapes and strides are allocated by ``new > Py_ssize_t[2]``, so they will survive the view creation function. > > I spent some hours trying to figure out what I am doing wrong. Maybe > someone has an idea about this? I double-checked each line of code > related to this problem and couldn't find any mistake. Probably I am > not looking at the right aspect. > > Best, > Friedrich > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matti.picus at gmail.com Mon Feb 1 03:46:37 2021 From: matti.picus at gmail.com (Matti Picus) Date: Mon, 1 Feb 2021 10:46:37 +0200 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: On 2/1/21 9:57 AM, Friedrich Romstedt wrote: > Hi, > > On Tue, Jan 26, 2021 at 09:48, Friedrich Romstedt > wrote: >> [...] The following Python >> code crashes:: >> >> image = <... Image production ...> >> ar = numpy.asarray(image) >> >> However, when I say:: >> >> image = <... Image production ...> >> print("---") >> ar = numpy.asarray(image) >> >> the entire program is executing properly with correct data in the >> numpy ndarray produced using the buffer interface. >> >> [...] > Does anyone have an idea about this? By the way, I noticed that this > mailing list turned pretty quiet, am I missing something? > > For completeness, the abovementioned "crash" shows up as just a > premature exit of the program. There is no error message whatsoever. > The buffer view producing function raises Exceptions properly when > something goes wrong; also notice that this code completes without > error when the ``print("---")`` statement is in action. So I presume > the culprit lies somewhere on the C level. I can only guess that it > might be some side-effect unknown to me. > > Best, > Friedrich It is very hard to help you from this description. It may be a refcount problem, it may be a buffer protocol problem, it may be something else. Typically, one would create a complete example and then point to the code (as repo or pastebin, not as an attachment to a mail here). A few things you might want to check: - Make sure you give instructions how to build your project for Linux, since most of the people on this list do not use windows. - There are tools out there to analyze refcount problems. Python has some built-in tools for switching allocation strategies. 
- numpy.asarray has a number of strategies to convert instances, which one is it using? Matti From sebastian at sipsolutions.net Tue Feb 2 19:02:26 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 02 Feb 2021 18:02:26 -0600 Subject: [Numpy-discussion] NumPy Community Meeting Wednesday Message-ID: <080bd5452f7b3f5af6188bd0fde7e95c48a88234.camel@sipsolutions.net> Hi all, There will be a NumPy Community meeting Wednesday February 3rd at 12pm Pacific Time (20:00 UTC). Everyone is invited and encouraged to join in and edit the work-in-progress meeting topics and notes at: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both Best wishes Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From calhoun137 at gmail.com Wed Feb 3 18:18:49 2021 From: calhoun137 at gmail.com (Matt Calhoun) Date: Wed, 3 Feb 2021 18:18:49 -0500 Subject: [Numpy-discussion] Math Inspector Beta Message-ID: Hi Everyone! I have been using numpy for an extremely long time, but this is the first time emailing the list. I recently released the beta version of my free open source math app called math inspector, and so far the response has been really amazing, it was on the front page of hacker news all day sunday and went from 15 stars to 348 on GitHub since then. I wanted to reach out to the community to find out if people like this project, have any feedback/suggestions/feature requests, or would possibly be interested in placing a link to the website (mathinspector.com) on the numpy homepage. Math inspector is a python interpreter which contains a frozen version of python and numpy, this makes it very easy for non-technical people to get started, it also creates a block coding environment which represents the memory of the running program. 
This block coding environment is at such a high level of generality that it's capable of working for all of python. It also has an interactive graphing system made in pygame which updates and modernizes all of the functionality in matplotlib. This graphing system is its own stand-alone module by the way. Math inspector also has a documentation browser which creates a beautiful interactive experience for exploring the documentation. Everything in math inspector has been designed specifically for numpy, even though it works for all of python. I started it 2 years ago when I got really confused after searching through the numpy website, and I wanted to build a system where I could dig into the modules in a directory file type structure that was highly organized. From there everything just took off. The main goal of this project is to support the mathematics education community on youtube, by providing a free tool that everyone can use to share code samples for their videos, but I believe it has a wide range of additional applications for scientific computing as well. I have been working really hard on this project, and I really hope everyone likes it! You can find the full source code on the GitHub page: https://github.com/MathInspector/MathInspector Cheers! - Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From mansourmoufid at gmail.com Wed Feb 3 22:03:26 2021 From: mansourmoufid at gmail.com (Mansour Moufid) Date: Wed, 3 Feb 2021 22:03:26 -0500 Subject: [Numpy-discussion] Math Inspector Beta In-Reply-To: References: Message-ID: Very cool! But the Mac disk image (mathinspector_0.9.1.dmg) isn't opening ("corrupt image"). It's 145279488 bytes and the shasum ends with f1ed9231. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From friedrichromstedt at gmail.com Thu Feb 4 03:07:59 2021 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Thu, 4 Feb 2021 09:07:59 +0100 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: Hello Matti, On Mon, Feb 1, 2021 at 09:46, Matti Picus wrote: > > [...] > > It is very hard to help you from this description. It may be a refcount > problem, it may be a buffer protocol problem, it may be something else. Yes, indeed! > Typically, one would create a complete example and then pointing to the > code (as repo or pastebin, not as an attachment to a mail here). https://github.com/friedrichromstedt/bughunting-01 I boiled it down considerably, compared to the program where I stumbled upon the problem. In the abovementioned repo, you find a Python test script in the `test/` folder. Therein, a single `print` statement can be used to trigger or to avoid the error. On Linux, I get a somewhat more precise description than just from the premature exit on Windows: It is a segfault. Certainly it is still asking quite a lot to skim through my source code; however, I hope that I have trimmed it down sufficiently. > - Make sure you give instructions how to build your project for Linux, > since most of the people on this list do not use windows. The code reproducing the segfault can be compiled by `$ python3 setup.py install`, both on Windows as well as on Linux. > - There are tools out there to analyze refcount problems. Python has > some built-in tools for switching allocation strategies. Can you give me some pointers about this? > - numpy.asarray has a number of strategies to convert instances, which > one is it using? I've tried to read about this, but couldn't find anything. What are these different strategies? 
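For what it's worth, the Python-level side of this can be exercised without the extension type at all: ``memoryview`` goes through the same buffer-protocol slots that ``numpy.asarray`` consumes, so it is a quick way to check what an exporter publishes. A minimal sketch, using a plain ``bytearray`` as a stand-in for the C++ ``Image`` exporter (which is not available here):

```python
import struct

import numpy as np

# Stand-in exporter: 16 C doubles in one contiguous block. Any object
# implementing the buffer protocol behaves the same way from Python.
raw = bytearray(struct.pack("16d", *range(16)))

# Consuming the buffer via memoryview exercises the same getbuffer slot
# that numpy.asarray uses, so a broken view tends to misbehave here too.
view = memoryview(raw).cast("d", shape=(4, 4))
print(view.format, view.shape, view.strides)  # d (4, 4) (32, 8)

# numpy.asarray then wraps the same buffer without copying.
ar = np.asarray(view)
print(ar.shape, ar.dtype)  # (4, 4) float64
```

If the memoryview already crashes or reports the wrong format, shape, or strides, the problem is in the getbuffer slot rather than in NumPy.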
Many thanks in advance, Friedrich From ralf.gommers at gmail.com Thu Feb 4 04:36:58 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 4 Feb 2021 10:36:58 +0100 Subject: [Numpy-discussion] Math Inspector Beta In-Reply-To: References: Message-ID: Hi Matt, Very cool, thanks for sharing! On Thu, Feb 4, 2021 at 12:19 AM Matt Calhoun wrote: > Hi Everyone! I have been using numpy for an extremely long time, but this > is the first time emailing the list. I recently released the beta version > of my free open source math app called math inspector, and so far the > response has been really amazing, it was on the front page of hacker news > all day sunday and went from 15 stars to 348 on GitHub since then. I > wanted to reach out to the community to find out if people like this > project, have any feedback/suggestions/feature requests, or would possibly > be interested in placing a link to the website (mathinspector.com) on the > numpy homepage. > We have an Ecosystem section on numpy.org, we can add it there. There's an Interactive Computing section where it kind of fits (although a place labeled education would be better). There's some discussion on the numpy.org issue tracker ( https://github.com/numpy/numpy.org/issues/313#issuecomment-751466980) about moving that to its own tab instead of having it as an entry under "Scientific computing", but for now we could add it there under Jupyter/IPython/Binder. > Math inspector is a python interpreter which contains a frozen version of > python and numpy, this makes it very easy for non-technical people to get > started, it also creates a block coding environment which represents the > memory of the running program. This block coding environment is at such a > high level of generality that it's capable of working for all of python. > It also has an interactive graphing system made in pygame which updates and > modernizes all of the functionality in matplotlib. 
This graphing system is > it's own stand alone module by the way. Math inspector also has a > documentation browser which creates a beautiful interactive experience for > exploring the documentation. > > Everything in math inspector has been designed specifically for > numpy, even though it works for all of python. I started it 2 years ago > when I got really confused after searching through the numpy website, and I > wanted to build a system where I could dig into the modules in a directory > file type structure that was highly organized. From there everything just > took off. > One thing I realized when browsing through the video on your front page is that the public module layout we have is very unhelpful for this kind of education - it'd be good if we had a way to hide things like core, emath, matrixlib, etc. that we don't want people to import and use directly. Essentially we'd want to teach people mostly about the main namespace, and fft, linalg, and random. If you have other thoughts on what would help you to make NumPy more approachable, in Math Inspector or in general, those would be great to hear. Cheers, Ralf > The main goal of this project is to support the mathematics education > community on youtube, by providing a free tool that everyone can use to > share code samples for their videos, but I believe it has a wide range of > additional applications for scientific computing as well. > > I have been working really hard on this project, and I really hope > everyone likes it! > > You can find the full source code on the GitHub page: > https://github.com/MathInspector/MathInspector > > Cheers! > - Matt > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From melissawm at gmail.com Thu Feb 4 08:55:50 2021 From: melissawm at gmail.com (Melissa Mendonça) Date: Thu, 4 Feb 2021 10:55:50 -0300 Subject: [Numpy-discussion] Math Inspector Beta In-Reply-To: References: Message-ID: Hi Matt! This is great timing - we actually talked about mathinspector in our Documentation Team meeting on Monday (you can see the meeting notes here: https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg). If you are interested, you are welcome to join our slack space and/or our docs meetings, we would love to chat in more detail. Cheers, Melissa On Thu, Feb 4, 2021 at 6:38 AM Ralf Gommers wrote: > Hi Matt, > > Very cool, thanks for sharing! > > > On Thu, Feb 4, 2021 at 12:19 AM Matt Calhoun wrote: > >> Hi Everyone! I have been using numpy for an extremely long time, but >> this is the first time emailing the list. I recently released the beta >> version of my free open source math app called math inspector, and so far >> the response has been really amazing, it was on the front page of hacker >> news all day sunday and went from 15 stars to 348 on GitHub since then. I >> wanted to reach out to the community to find out if people like this >> project, have any feedback/suggestions/feature requests, or would possibly >> be interested in placing a link to the website (mathinspector.com) on >> the numpy homepage. >> > > We have an Ecosystem section on numpy.org, we can add it there. There's > an Interactive Computing section where it kind of fits (although a place > labeled education would be better). There's some discussion on the > numpy.org issue tracker ( > https://github.com/numpy/numpy.org/issues/313#issuecomment-751466980) > about moving that to its own tab instead of having it as an entry under > "Scientific computing", but for now we could add it there under > Jupyter/IPython/Binder. 
> > >> Math inspector is a python interpreter which contains a frozen version of >> python and numpy, this makes it very easy for non-technical people to get >> started, it also creates a block coding environment which represents the >> memory of the running program. This block coding environment is at such a >> high level of generality that it's capable of working for all of python. >> It also has an interactive graphing system made in pygame which updates and >> modernizes all of the functionality in matplotlib. This graphing system is >> it's own stand alone module by the way. Math inspector also has a >> documentation browser which creates a beautiful interactive experience for >> exploring the documentation. >> >> Everything in math inspector has been designed specifically for >> numpy, even though it works for all of python. I started it 2 years ago >> when I got really confused after searching through the numpy website, and I >> wanted to build a system where I could dig into the modules in a directory >> file type structure that was highly organized. From there everything just >> took off. >> > > One thing I realized when browsing through the video on your front page is > that the public module layout we have is very unhelpful for this kind of > education - it'd be good if we had a way to hide things like core, emath, > matrixlib, etc. that we don't want people to import and use directly. > Essentially we'd to teach people mostly about the main namespace, and fft, > linalg, and random. > > If you have other thoughts on what would help you to make NumPy more > approachable, in Math Inspector or in general, those would be great to hear. > > Cheers, > Ralf > > > >> The main goal of this project is to support the mathematics education >> community on youtube, by providing a free tool that everyone can use to >> share code samples for their videos, but I believe it has a wide range of >> additional applications for scientific computing as well. 
>> >> I have been working really hard on this project, and I really hope >> everyone likes it! >> >> You can find the full source code on the GitHub page: >> https://github.com/MathInspector/MathInspector >> >> Cheers! >> - Matt >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From calhoun137 at gmail.com Thu Feb 4 09:09:01 2021 From: calhoun137 at gmail.com (Matt Calhoun) Date: Thu, 4 Feb 2021 09:09:01 -0500 Subject: [Numpy-discussion] Math Inspector Beta Message-ID: @ Mansour Moufid > Very cool! > But the Mac disk image (mathinspector_0.9.1.dmg) isn't opening ("corrupt image"). > It's 145279488 bytes and the shasum ends with f1ed9231. Oh no, whoops! The .dmg file has been code signed with my apple developer id, notarized with apple, and passes all verification checks on my machine when I download it from the website. Ever since sunday I have been scrambling to support every platform and os version out there basically, and this is the first time I saw this one. For the sake of avoiding using the mailing list to debug, would you be willing to open an issue on the Math Inspector GitHub page? Thanks! (btw I checked the file on my machine and it's the same filesize with the same shasum, so my guess is there is a pyinstaller issue related to an os version conflict, or a code signing issue, not sure though, I built it on BigSur 11.1) @ Ralf Gommers > Very cool, thanks for sharing! Thank you!!! 
> We have an Ecosystem section on numpy.org, we can add it there It's really important to me to make math inspector a part of the numpy ecosystem, and since this is the first time I am reaching out to the mailing list, I'd like to emphasize that I am more than willing to work with the community to improve the product, respond to bug reports & feature requests, and in general I strongly value constructive criticism. > One thing I realized when browsing through the video on your front page is > that the public module layout we have is very unhelpful for this kind of > education...If you have other thoughts on what would help you to make NumPy more > approachable, in Math Inspector or in general, those would be great to hear. I completely agree with your observation here. It hadn't occurred to me to change numpy to make it better for math inspector, but I think you are hitting the nail on the head when you suggest re-organizing the file structure of the core package. The main suggestion I have is to update the documentation in a way that leverages the power of math inspector. The math inspector doc browser is a powerful tool with lots of extra functionality that is not available from the website or in the normal python help() function. This extra functionality could be used to make numpy more approachable. For example, replace references to matplotlib in the docs with mathinspector.plot(), and substitute mathinspector for IPython as the recommended tool. Thanks for this fantastic feedback! -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Feb 4 12:20:39 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Feb 2021 10:20:39 -0700 Subject: [Numpy-discussion] Math Inspector Beta In-Reply-To: References: Message-ID: On Wed, Feb 3, 2021 at 4:19 PM Matt Calhoun wrote: > Hi Everyone! I have been using numpy for an extremely long time, but this > is the first time emailing the list. 
I recently released the beta version > of my free open source math app called math inspector, and so far the > response has been really amazing, it was on the front page of hacker news > all day sunday and went from 15 stars to 348 on GitHub since then. I > wanted to reach out to the community to find out if people like this > project, have any feedback/suggestions/feature requests, or would possibly > be interested in placing a link to the website (mathinspector.com) on the > numpy homepage. > > Math inspector is a python interpreter which contains a frozen version of > python and numpy, this makes it very easy for non-technical people to get > started, it also creates a block coding environment which represents the > memory of the running program. This block coding environment is at such a > high level of generality that it's capable of working for all of python. > It also has an interactive graphing system made in pygame which updates and > modernizes all of the functionality in matplotlib. This graphing system is > it's own stand alone module by the way. Math inspector also has a > documentation browser which creates a beautiful interactive experience for > exploring the documentation. > > Everything in math inspector has been designed specifically for > numpy, even though it works for all of python. I started it 2 years ago > when I got really confused after searching through the numpy website, and I > wanted to build a system where I could dig into the modules in a directory > file type structure that was highly organized. From there everything just > took off. > > The main goal of this project is to support the mathematics education > community on youtube, by providing a free tool that everyone can use to > share code samples for their videos, but I believe it has a wide range of > additional applications for scientific computing as well. > > I have been working really hard on this project, and I really hope > everyone likes it! 
> > You can find the full source code on the GitHub page: > https://github.com/MathInspector/MathInspector > > Cheers! > - Matt > Somewhat off topic, but this brought to mind Model Based Design. MBD is a different subject, but I suspect the same underlying tools used for MathInspector might be useful. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From camel-cdr at protonmail.com Sat Feb 6 04:31:40 2021 From: camel-cdr at protonmail.com (camel-cdr at protonmail.com) Date: Sat, 06 Feb 2021 09:31:40 +0000 Subject: [Numpy-discussion] Question about optimizing random_standard_normal Message-ID: I tried a different implementation of the ziggurat method for generating standard normal distributions that is about twice as fast as the old one and uses 2/3 of the memory. I tested the implementation separately and am very confident it's correct, but it does fail 28 tests in coverage testing. Checking the testing code I found out that all the failed tests are inside TestRandomDist which has the goal of "Make[ing] sure the random distribution returns the correct value for a given seed". Why would this be needed? The only explanation I can come up with is that standard_normal is, in regards to seeding, required to be backwards compatible. If that's the case, how could one even implement a new algorithm? -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom at swirly.com Sat Feb 6 06:22:04 2021 From: tom at swirly.com (Tom Swirly) Date: Sat, 6 Feb 2021 12:22:04 +0100 Subject: [Numpy-discussion] Question about optimizing random_standard_normal In-Reply-To: References: Message-ID: Well, I can tell you why it needs to be backward compatible! I use random numbers fairly frequently, and to unit test them I set a specific seed and then make sure I get the same answers. If your change went in (and I were using numpy normal distributions, which I am not) then my tests would break. 
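That testing pattern can be sketched in a few lines (the seed and sample size here are arbitrary choices, but ``default_rng`` is NumPy's documented seeding API):

```python
import numpy as np

# Two identically seeded generators must produce identical draws.
# A unit test written this way pins the exact output stream, which is
# why any change to the underlying sampling algorithm makes it fail.
a = np.random.default_rng(12345).standard_normal(1000)
b = np.random.default_rng(12345).standard_normal(1000)
assert np.array_equal(a, b)
print("streams match")
```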
Particularly, you'd have the unfixable problem that it would be impossible to write tests for your code that worked regardless of the version of numpy that was installed. Yes, I agree that in your use case, this is powerfully unfortunate, and prevents you from making a change that would otherwise benefit everyone. The three ways to do this would be the following: - Add a new parameter to the function call, say, faster=False, which you set True to get the new behavior - Add a global flag somewhere you set to get the new behavior everywhere - Create a new function called normal_faster or some such All of these are ugly for obvious reasons. On Sat, Feb 6, 2021 at 10:33 AM wrote: > I tried to implement a different implementation of the ziggurat method for > generating standard normal distributions that is about twice as fast and > uses 2/3 of the memory than the old one. > I tested the implementation separately and am very confident it's correct, > but it does fail 28 test in coverage testing. > Checking the testing code I found out that all the failed tests are inside > TestRandomDist which has the goal of "Make[ing] sure the random > distribution returns the correct value for a given seed". Why would this be > needed? > The only explanation I can come up with is that it's standard_normal is, > in regards to seeding, required to be backwards compatible. If that's the > case how would, could one even implement a new algorithm? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- /t PGP Key: https://flowcrypt.com/pub/tom.ritchford at gmail.com *https://tom.ritchford.com * *https://tom.swirly.com * -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kevin.k.sheppard at gmail.com Sat Feb 6 06:54:32 2021 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Sat, 6 Feb 2021 11:54:32 +0000 Subject: [Numpy-discussion] Question about optimizing random_standard_normal In-Reply-To: References: Message-ID: Have you benchmarked it using the generator interface? The structure of this as a non-monolithic generator makes it a good deal slower than generating in straight C (with everything inline). I'm not sure a factor of 2 is enough to justify a change (for me 10x is, 1.2x is not, but I don't know where the cutoff is). Can you post benchmarks from using it through Generator? Also, those tests would be replaced with new values if the patch was accepted, so don't worry about them. Kevin On Sat, Feb 6, 2021, 09:32 wrote: > I tried to implement a different implementation of the ziggurat method for > generating standard normal distributions that is about twice as fast and > uses 2/3 of the memory than the old one. > I tested the implementation separately and am very confident it's correct, > but it does fail 28 test in coverage testing. > Checking the testing code I found out that all the failed tests are inside > TestRandomDist which has the goal of "Make[ing] sure the random > distribution returns the correct value for a given seed". Why would this be > needed? > The only explanation I can come up with is that it's standard_normal is, > in regards to seeding, required to be backwards compatible. If that's the > case how would, could one even implement a new algorithm? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From camel-cdr at protonmail.com Sat Feb 6 07:25:49 2021 From: camel-cdr at protonmail.com (camel-cdr at protonmail.com) Date: Sat, 06 Feb 2021 12:25:49 +0000 Subject: [Numpy-discussion] Question about optimizing random_standard_normal In-Reply-To: References: Message-ID: > Well, I can tell you why it needs to be backward compatible! I use random numbers fairly frequently, and to unit test them I set a specific seed and then make sure I get the same answers. Hmm, I guess that makes sense. I tried to adjust my algorithms to do the same thing with the same bits as the original one, but I couldn't get it to work. > Have you benchmarked it using the generator interface? The structure of this as a no monolithic generator makes it a good deal slower than generating in straight C (with everything inline). While I'm not sure a factor of 2 is enough to justify a change (for me 10x, 1.2x is not but I don't know where the cutoff is). I originally benchmarked my implementation against a bunch of other ones in C (because I was developing a C library https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928). But I did run the built-in benchmark: ./runtests.py --bench bench_random.RNG.time_normal_zig and the results are: new old PCG64 589±3µs 1.06±0.03ms MT19937 985±4µs 1.44±0.01ms Philox 981±30µs 1.39±0.01ms SFC64 508±4µs 900±4µs numpy 2.99±0.06ms 2.98±0.01ms # no change for /dev/urandom I'm not yet 100% certain about the implementations, but I attached a diff of my current progress. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ziggurat.diff
Type: text/x-diff
Size: 59720 bytes
Desc: not available
URL: 

From charlesr.harris at gmail.com  Sat Feb  6 09:19:33 2021
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 6 Feb 2021 07:19:33 -0700
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: 
Message-ID: 

On Sat, Feb 6, 2021 at 5:27 AM wrote:

> Well, I can tell you why it needs to be backward compatible! I use random
> numbers fairly frequently, and to unit test them I set a specific seed and
> then make sure I get the same answers.
>
> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
> same thing with the same bits as the original one, but I couldn't get it
> to work.
>
> Have you benchmarked it using the generator interface? The structure of
> this as a non-monolithic generator makes it a good deal slower than
> generating in straight C (with everything inlined). I'm not sure a factor
> of 2 is enough to justify a change (for me 10x would be, 1.2x would not,
> but I don't know where the cutoff is).
>
> I originally benchmarked my implementation against a bunch of other ones
> in C (because I was developing a C library:
> https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
> But I did run the built-in benchmark ./runtests.py --bench
> bench_random.RNG.time_normal_zig and the results are:
>
>             new          old
> PCG64       589±3µs      1.06±0.03ms
> MT19937     985±4µs      1.44±0.01ms
> Philox      981±30µs     1.39±0.01ms
> SFC64       508±4µs      900±4µs
> numpy       2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom
>
> I'm not yet 100% certain about the implementations, but I attached a diff
> of my current progress.
>
You can actually get rid of the loop entirely and implement the
exponential function directly by using an exponential bound on the bottom
ziggurat block ends. It just requires a slight change in the block
boundaries.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com  Sat Feb  6 09:29:46 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 6 Feb 2021 09:29:46 -0500
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: 
Message-ID: 

On Sat, Feb 6, 2021 at 7:27 AM wrote:

> Well, I can tell you why it needs to be backward compatible! I use random
> numbers fairly frequently, and to unit test them I set a specific seed and
> then make sure I get the same answers.
>
> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
> same thing with the same bits as the original one, but I couldn't get it
> to work.
>
To be clear, this is not our backwards compatibility policy for the
methods that you have modified. Our policy is spelled out here:

https://numpy.org/neps/nep-0019-rng-policy.html

The TestRandomDist suite of tests was adapted from the older RandomState
(which is indeed frozen and not allowed to change algorithms). It's a mix
of correctness tests that are valid regardless of the precise algorithm
(does this method reject invalid arguments? do degenerate arguments yield
the correct constant value?) and actual "has this algorithm changed
unexpectedly?" tests. The former are the most valuable, but the latter are
useful for testing in cross-platform contexts. Compilers and different
CPUs can do naughty things sometimes, and we want the cross-platform
differences to be minimal. When you do change an algorithm implementation
for Generator, as you have done, you are expected to do thorough tests
(offline, not in the unit tests) that it is correctly sampling from the
target probability distribution, then once satisfied, change the
hard-coded values in TestRandomDist to match whatever you are generating.

-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
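A minimal sketch of the kind of offline goodness-of-fit check described
above, using only numpy and the standard library (the seed and threshold
are illustrative; scipy.stats.kstest(samples, "norm") performs the same
Kolmogorov-Smirnov test in one call and also reports a p-value):

```python
import math
import numpy as np

# Kolmogorov-Smirnov statistic of a sample against N(0, 1).
def ks_statistic_std_normal(samples):
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    # Standard normal CDF via the error function.
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])
    upper = np.arange(1, n + 1) / n - cdf   # D+ terms
    lower = cdf - np.arange(0, n) / n       # D- terms
    return max(upper.max(), lower.max())

rng = np.random.default_rng(2021)  # arbitrary seed for this sketch
d = ks_statistic_std_normal(rng.standard_normal(100_000))

# For a correct sampler at n = 100_000 the statistic should be tiny;
# 0.01 sits far above the ~0.0043 critical value at the 5% level.
assert d < 0.01
```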
URL: 

From camel-cdr at protonmail.com  Sat Feb  6 09:49:07 2021
From: camel-cdr at protonmail.com (camel-cdr at protonmail.com)
Date: Sat, 06 Feb 2021 14:49:07 +0000
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: 
Message-ID: 

------- Original Message -------
On Saturday, February 6, 2021 3:29 PM, Robert Kern wrote:

> On Sat, Feb 6, 2021 at 7:27 AM wrote:
>
>>> Well, I can tell you why it needs to be backward compatible! I use
>>> random numbers fairly frequently, and to unit test them I set a
>>> specific seed and then make sure I get the same answers.
>>
>> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
>> same thing with the same bits as the original one, but I couldn't get it
>> to work.
>
> To be clear, this is not our backwards compatibility policy for the
> methods that you have modified. Our policy is spelled out here:
>
> https://numpy.org/neps/nep-0019-rng-policy.html
>
> The TestRandomDist suite of tests was adapted from the older RandomState
> (which is indeed frozen and not allowed to change algorithms). It's a mix
> of correctness tests that are valid regardless of the precise algorithm
> (does this method reject invalid arguments? do degenerate arguments yield
> the correct constant value?) and actual "has this algorithm changed
> unexpectedly?" tests. The former are the most valuable, but the latter
> are useful for testing in cross-platform contexts. Compilers and
> different CPUs can do naughty things sometimes, and we want the
> cross-platform differences to be minimal. When you do change an algorithm
> implementation for Generator, as you have done, you are expected to do
> thorough tests (offline, not in the unit tests) that it is correctly
> sampling from the target probability distribution, then once satisfied,
> change the hard-coded values in TestRandomDist to match whatever you are
> generating.
>
> --
> Robert Kern

Ok, cool, that basically explains a lot.
> When you do change an algorithm implementation for Generator, as you
> have done, you are expected to do thorough tests (offline, not in the
> unit tests) that it is correctly sampling from the target probability
> distribution, then once satisfied, change the hard-coded values in
> TestRandomDist to match whatever you are generating.

I'm probably not versed enough in statistics to do thorough testing.
I used the tests from https://www.seehuhn.de/pages/ziggurat and plotted
histograms to verify correctness; that probably won't be sufficient.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Sun Feb  7 13:12:21 2021
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 7 Feb 2021 11:12:21 -0700
Subject: [Numpy-discussion] Pearu Peterson has joined the NumPy developers team.
Message-ID: 

Hi All,

Pearu Peterson has joined the NumPy developers team. Pearu was responsible
for contributing f2py and much of distutils in the early days of NumPy.
Welcome back Pearu.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stefanv at berkeley.edu  Sun Feb  7 15:08:41 2021
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Sun, 07 Feb 2021 12:08:41 -0800
Subject: [Numpy-discussion] Pearu Peterson has joined the NumPy developers team.
In-Reply-To: 
References: 
Message-ID: <6fe3b84f-6ae0-4b01-934c-5cd8605a23bc@www.fastmail.com>

On Sun, Feb 7, 2021, at 10:12, Charles R Harris wrote:
> Pearu Peterson has joined the NumPy developers team. Pearu was
> responsible for contributing f2py and much of distutils in the early
> days of NumPy. Welcome back Pearu.

Welcome back, it's good to see you around more, Pearu!

Best regards,
Stéfan
-------------- next part --------------
An HTML attachment was scrubbed...
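Returning to the statistical-testing question in the message above: a
quick moment check can complement histogram plots, though it is likewise
no substitute for proper goodness-of-fit tests. A sketch (the seed is
arbitrary; the tolerances are loose multiples of the standard errors at
this sample size):

```python
import numpy as np

# Draw a large sample and compare its first four moments to N(0, 1).
rng = np.random.default_rng(42)
x = rng.standard_normal(1_000_000)

mean = x.mean()
std = x.std()
skew = ((x - mean) ** 3).mean() / std ** 3
excess_kurtosis = ((x - mean) ** 4).mean() / std ** 4 - 3.0

# N(0, 1) has mean 0, std 1, skewness 0, excess kurtosis 0.
assert abs(mean) < 0.01
assert abs(std - 1.0) < 0.01
assert abs(skew) < 0.05
assert abs(excess_kurtosis) < 0.1
```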
URL: 

From charlesr.harris at gmail.com  Sun Feb  7 16:23:04 2021
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 7 Feb 2021 14:23:04 -0700
Subject: [Numpy-discussion] NumPy 1.20.1 released.
Message-ID: 

Hi All,

On behalf of the NumPy team I am pleased to announce the release of NumPy
1.20.1. NumPy 1.20.1 is a rapid bugfix release fixing several bugs and
regressions reported after the 1.20.0 release. The Python versions
supported for this release are 3.7-3.9. Wheels can be downloaded from
PyPI; source archives, release notes, and wheel hashes are available on
GitHub. Linux users will need pip >= 19.3 in order to install
manylinux2010 and manylinux2014 wheels.

*Highlights*

- The distutils bug that caused problems with downstream projects is fixed.
- The ``random.shuffle`` regression is fixed.

*Contributors*

A total of 8 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

- Bas van Beek
- Charles Harris
- Nicholas McKibben +
- Pearu Peterson
- Ralf Gommers
- Sebastian Berg
- Tyler Reddy
- @Aerysv +

*Pull requests merged*

A total of 15 pull requests were merged for this release.

- gh-18306: MAINT: Add missing placeholder annotations
- gh-18310: BUG: Fix typo in ``numpy.__init__.py``
- gh-18326: BUG: don't mutate list of fake libraries while iterating over...
- gh-18327: MAINT: gracefully shuffle memoryviews
- gh-18328: BUG: Use C linkage for random distributions
- gh-18336: CI: fix when GitHub Actions builds trigger, and allow ci skips
- gh-18337: BUG: Allow unmodified use of isclose, allclose, etc. with timedelta
- gh-18345: BUG: Allow pickling all relevant DType types/classes
- gh-18351: BUG: Fix missing signed_char dependency. Closes #18335.
- gh-18352: DOC: Change license date 2020 -> 2021
- gh-18353: CI: CircleCI seems to occasionally time out, increase the limit
- gh-18354: BUG: Fix f2py bugs when wrapping F90 subroutines.
- gh-18356: MAINT: crackfortran regex simplify
- gh-18357: BUG: threads.h existence test requires GLIBC > 2.12.
- gh-18359: REL: Prepare for the NumPy 1.20.1 release.

Cheers,

Charles Harris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kevin.k.sheppard at gmail.com  Mon Feb  8 03:03:34 2021
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Mon, 8 Feb 2021 08:03:34 +0000
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: 
Message-ID: 

If I understand correctly, there is no gain when applying this patch to
Generator. I'm not that surprised that this is the case, since the
compiler is much more limited in what it can do in Generator than when
filling a large array directly, with functions available for inlining and
unrolling. Again, if I understand correctly, I think it will be difficult
to justify breaking the stream for a negligible gain in performance.

Kevin

On Sat, Feb 6, 2021 at 12:27 PM wrote:

> Well, I can tell you why it needs to be backward compatible! I use random
> numbers fairly frequently, and to unit test them I set a specific seed and
> then make sure I get the same answers.
>
> Hmm, I guess that makes sense. I tried to adjust my algorithms to do the
> same thing with the same bits as the original one, but I couldn't get it
> to work.
>
> Have you benchmarked it using the generator interface? The structure of
> this as a non-monolithic generator makes it a good deal slower than
> generating in straight C (with everything inlined). I'm not sure a factor
> of 2 is enough to justify a change (for me 10x would be, 1.2x would not,
> but I don't know where the cutoff is).
>
> I originally benchmarked my implementation against a bunch of other ones
> in C (because I was developing a C library:
> https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h#L1928).
> But I did run the built-in benchmark ./runtests.py --bench
> bench_random.RNG.time_normal_zig and the results are:
>
>             new          old
> PCG64       589±3µs      1.06±0.03ms
> MT19937     985±4µs      1.44±0.01ms
> Philox      981±30µs     1.39±0.01ms
> SFC64       508±4µs      900±4µs
> numpy       2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom
>
> I'm not yet 100% certain about the implementations, but I attached a diff
> of my current progress.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ilhanpolat at gmail.com  Mon Feb  8 04:51:13 2021
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Mon, 8 Feb 2021 10:51:13 +0100
Subject: [Numpy-discussion] Pearu Peterson has joined the NumPy developers team.
In-Reply-To: <6fe3b84f-6ae0-4b01-934c-5cd8605a23bc@www.fastmail.com>
References: <6fe3b84f-6ae0-4b01-934c-5cd8605a23bc@www.fastmail.com>
Message-ID: 

This is very comforting news :) Welcome back

On Sun, Feb 7, 2021 at 9:10 PM Stefan van der Walt wrote:

> On Sun, Feb 7, 2021, at 10:12, Charles R Harris wrote:
>
> Pearu Peterson has joined the NumPy developers team. Pearu was responsible
> for contributing f2py and much of distutils in the early days of NumPy.
> Welcome back Pearu.
>
> Welcome back, it's good to see you around more, Pearu!
>
> Best regards,
> Stéfan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com  Mon Feb  8 10:40:40 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 8 Feb 2021 10:40:40 -0500
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: 
Message-ID: 

On Mon, Feb 8, 2021 at 3:05 AM Kevin Sheppard wrote:

> If I understand correctly, there is no gain when applying this patch to
> Generator. I'm not that surprised that this is the case, since the
> compiler is much more limited in what it can do in Generator than when
> filling a large array directly, with functions available for inlining and
> unrolling. Again, if I understand correctly, I think it will be difficult
> to justify breaking the stream for a negligible gain in performance.
>
Can you explain your understanding of the benchmark results? To me, it
looks like nearly a 2x improvement with the faster BitGenerators (our
default PCG64 and SFC64). That may or may not be worth breaking the
stream, but it's far from negligible.

>> But I did run the built-in benchmark ./runtests.py --bench
>> bench_random.RNG.time_normal_zig and the results are:
>>
>>             new          old
>> PCG64       589±3µs      1.06±0.03ms
>> MT19937     985±4µs      1.44±0.01ms
>> Philox      981±30µs     1.39±0.01ms
>> SFC64       508±4µs      900±4µs
>> numpy       2.99±0.06ms  2.98±0.01ms  # no change for /dev/urandom
>>
-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kevin.k.sheppard at gmail.com  Mon Feb  8 10:52:05 2021
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Mon, 8 Feb 2021 15:52:05 +0000
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: 
Message-ID: <11BB0603-25E4-4B02-9B55-F783E54B51DA@hxcore.ol>

An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com  Mon Feb  8 11:05:27 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 8 Feb 2021 11:05:27 -0500
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: <11BB0603-25E4-4B02-9B55-F783E54B51DA@hxcore.ol>
References: <11BB0603-25E4-4B02-9B55-F783E54B51DA@hxcore.ol>
Message-ID: 

On Mon, Feb 8, 2021 at 10:53 AM Kevin Sheppard wrote:

> My reading is that the first 4 are pure C, presumably using the standard
> practice of inlining so as to make the tightest loop possible, and to
> allow the compiler to make other optimizations. The final line is what
> happens when you replace the existing ziggurat in NumPy with the new one.
> I read it this way since it has both "new" and "old" with numpy. If it
> isn't this, then I'm unsure what "new" and "old" could mean in the
> context of this thread.
>
No, these are our benchmarks of `Generator`. `numpy` is testing
`RandomState`, which wasn't touched by their contribution.

https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L93-L97
https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L123-L124

> I suppose camel-cdr can clarify what was actually done.
>
> But I did run the built-in benchmark: ./runtests.py --bench
> bench_random.RNG.time_normal_zig and the results are:
>
-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kevin.k.sheppard at gmail.com  Mon Feb  8 11:36:51 2021
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Mon, 8 Feb 2021 16:36:51 +0000
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: <11BB0603-25E4-4B02-9B55-F783E54B51DA@hxcore.ol>
Message-ID: 

An HTML attachment was scrubbed...
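For readers following along, the distinction drawn above between the two
interfaces can be sketched as follows (the seed values are arbitrary):

```python
import numpy as np

# RandomState: the legacy, stream-frozen API (always MT19937-backed).
legacy = np.random.RandomState(1234)
# Generator: the newer API whose algorithms may change under NEP 19.
modern = np.random.Generator(np.random.PCG64(1234))

a = legacy.standard_normal(3)   # algorithm guaranteed never to change
b = modern.standard_normal(3)   # algorithm may be improved in a release

assert a.shape == b.shape == (3,)
# Different bit generators and sampling algorithms => different streams,
# even when constructed from the same seed value.
assert not np.allclose(a, b)
```

This is why the `numpy` row in the benchmark table (which times
`RandomState`) is unchanged by the patch, while the `Generator` rows are
not.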
URL: 

From robert.kern at gmail.com  Mon Feb  8 12:05:28 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 8 Feb 2021 12:05:28 -0500
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: <11BB0603-25E4-4B02-9B55-F783E54B51DA@hxcore.ol>
Message-ID: 

On Mon, Feb 8, 2021 at 11:38 AM Kevin Sheppard wrote:

> That is good news indeed. Seems like a good upgrade, especially given the
> breadth of application of normals and the multiple appearances within
> distributions.c (e.g., Cauchy). Is there a deprecation for a change like
> this? Or is it just a note and new random numbers in the next major? I
> think this is the first time a substantially new algo has replaced an
> existing one.
>
Per NEP 19, a change like this is a new feature that can be included in a
feature release, like any other feature. I would like to see some more
testing of the quality of the sequences beyond what has already been
quoted, using Kolmogorov-Smirnov and/or Anderson-Darling tests as well,
which should be more thorough.

https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L604-L620

There are also some subtle issues involved in ziggurat method
implementations that go beyond just the marginal distributions. I'm not
sure, even, that our current implementation deals with the issues raised
in the following paper, but I'd like to do no worse.

https://www.doornik.com/research/ziggurat.pdf

-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Mon Feb  8 12:04:23 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 08 Feb 2021 11:04:23 -0600
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: <11BB0603-25E4-4B02-9B55-F783E54B51DA@hxcore.ol>
Message-ID: 

On Mon, 2021-02-08 at 16:36 +0000, Kevin Sheppard wrote:
> That is good news indeed.
> Seems like a good upgrade, especially given the breadth of application
> of normals and the multiple appearances within distributions.c (e.g.,
> Cauchy). Is there a deprecation for a change like this? Or is it just a
> note and new random numbers in the next major? I think this is the first
> time a substantially new algo has replaced an existing one.
>

I don't think we can deprecate or even warn about it; that would just
result in noise that cannot be silenced. If we really think warnings are
necessary, it sounds like you would need an opt-in
`numpy.random.set_warn_if_streams_will_change()`. That sounds like
diminishing returns on first sight.

It may be good that this happens now, rather than in a few years when
adoption of the new API is probably still on the low side.

This type of change should be in the release notes undoubtedly and likely
a `.. versionchanged::` directive in the docstring.

Maybe the best thing would be to create a single, prominent but brief,
changelog listing all (or almost all) stream changes? (Possibly instead
of documenting it in the individual function as `.. versionchanged::`)
I am thinking just a table with:

* version changed
* short description
* affected functions
* how the stream changed (if that is ever relevant)

Cheers,

Sebastian

> Kevin
>
> From: Robert Kern
> Sent: Monday, February 8, 2021 4:06 PM
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Question about optimizing
> random_standard_normal
>
> On Mon, Feb 8, 2021 at 10:53 AM Kevin Sheppard <
> kevin.k.sheppard at gmail.com> wrote:
> > My reading is that the first 4 are pure C, presumably using the
> > standard practice of inlining so as to make the tightest loop
> > possible, and to allow the compiler to make other optimizations.
> > The final line is what happens when you replace the existing
> > ziggurat in NumPy with the new one. I read it this way since it has
> > both "new" and "old" with numpy.
> > If it isn't this, then I'm unsure what "new" and "old" could mean in
> > the context of this thread.
>
> No, these are our benchmarks of `Generator`. `numpy` is testing
> `RandomState`, which wasn't touched by their contribution.
>
> https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L93-L97
> https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_random.py#L123-L124
>
> > I suppose camel-cdr can clarify what was actually done.
>
> > > But I did run the built-in benchmark ./runtests.py --bench
> > > bench_random.RNG.time_normal_zig and the results are:
>
> --
> Robert Kern
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From charlesr.harris at gmail.com  Mon Feb  8 17:37:11 2021
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 8 Feb 2021 15:37:11 -0700
Subject: [Numpy-discussion] Question about optimizing random_standard_normal
In-Reply-To: 
References: 
Message-ID: 

On Sat, Feb 6, 2021 at 2:32 AM wrote:

> I tried to implement a different version of the ziggurat method for
> generating standard normal distributions that is about twice as fast and
> uses 2/3 of the memory of the old one.
> I tested the implementation separately and am very confident it's correct,
> but it does fail 28 tests in coverage testing.
> Checking the testing code I found out that all the failed tests are inside
> TestRandomDist, which has the goal of "Mak[ing] sure the random
> distribution returns the correct value for a given seed". Why would this be
> The only explanation I can come up with is that it's standard_normal is, > in regards to seeding, required to be backwards compatible. If that's the > case how would, could one even implement a new algorithm? > Just for fun, I've attached the (C++) implementation that uses the exponentially extended base block. Note that the constructor produces the table. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: RandomNormal.hpp Type: text/x-c++hdr Size: 1913 bytes Desc: not available URL: From robert.kern at gmail.com Mon Feb 8 18:00:11 2021 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 8 Feb 2021 18:00:11 -0500 Subject: [Numpy-discussion] Question about optimizing random_standard_normal In-Reply-To: References: <11BB0603-25E4-4B02-9B55-F783E54B51DA@hxcore.ol> Message-ID: On Mon, Feb 8, 2021 at 12:10 PM Sebastian Berg wrote: > > This type of change should be in the release notes undoubtedly and > likely a `.. versionchanged::` directive in the docstring. > > Maybe the best thing would be to create a single, prominent but brief, > changelog listing all (or almost all) stream changes? (Possibly instead > of documenting it in the individual function as `.. versionchanged::`) > > I am thinking just a table with: > * version changed > * short description > * affected functions > * how the stream changed (if that is ever relevant) > Both are probably useful. Good ideas. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Feb 10 00:25:20 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 09 Feb 2021 23:25:20 -0600 Subject: [Numpy-discussion] NumPy Development Meeting Wednesday - Triage Focus Message-ID: Hi all, Our bi-weekly triage-focused NumPy development meeting is Wednesday, Feb 10th at 11 am Pacific Time (19:00 UTC). 
Everyone is invited to join in and edit the work-in-progress meeting
topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

I encourage everyone to notify us of issues or PRs that you feel should
be prioritized, discussed, or reviewed.

Best regards

Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From jfoxrabinovitz at gmail.com  Wed Feb 10 17:31:35 2021
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Wed, 10 Feb 2021 17:31:35 -0500
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
Message-ID: 

I've created PR#18386 to add a function called atleast_nd to numpy and
numpy.ma. This would generalize the existing atleast_1d, atleast_2d, and
atleast_3d functions.

I proposed a similar idea about four and a half years ago:
https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html,
PR#7804. The reception was ambivalent, but a couple of folks have asked me
about this, so I'm bringing it back.

Some pros:

- This closes issue #12336
- There are a couple of Stack Overflow questions that would benefit
- Been asked about this a couple of times
- Implementation of the three existing atleast_*d functions gets easier
- Looks nicer than the equivalent broadcasting and reshaping

Some cons:

- Cluttering up the API
- Maintenance burden (but not a big one)
- This is just a utility function, which can be achieved through
  broadcasting and reshaping

If this meets with approval, there are a couple of interface issues that
probably need to be hashed out:

- The consensus was that this function should accept a single array,
  rather than a tuple or multiple arrays as the other atleast_*d
  functions do. Does that need to be revisited?
- Right now, a `pos` argument specifies where to place new axes, if any.
  That can be specified in different ways.
Another way might be to specify the offset of the existing dimensions,
or something entirely different.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Wed Feb 10 17:48:30 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 10 Feb 2021 16:48:30 -0600
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: 
Message-ID: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>

On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:
> I've created PR#18386 to add a function called atleast_nd to numpy and
> numpy.ma. This would generalize the existing atleast_1d, atleast_2d,
> and atleast_3d functions.
>
> I proposed a similar idea about four and a half years ago:
> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html,
> PR#7804. The reception was ambivalent, but a couple of folks have
> asked me about this, so I'm bringing it back.
>
> Some pros:
>
> - This closes issue #12336
> - There are a couple of Stack Overflow questions that would benefit
> - Been asked about this a couple of times
> - Implementation of the three existing atleast_*d functions gets easier
> - Looks nicer than the equivalent broadcasting and reshaping
>
> Some cons:
>
> - Cluttering up the API
> - Maintenance burden (but not a big one)
> - This is just a utility function, which can be achieved through
>   broadcasting and reshaping
>

My main concern would be the namespace cluttering. I can't say I use
even the `atleast_2d` etc. functions personally, so I would tend to be
slightly against the addition. But if others land on the "useful" side
here (and it seemed a bit at least on github), I am also not opposed.
It is a clean name that lines up with existing ones, so it doesn't seem
like a big "mental load" with respect to namespace cluttering.

Bike shedding the API is probably a good idea in any case.
I have pasted the current PR documentation (as html) below for quick
reference. I wonder a bit about the reasoning for having `pos` specify a
value rather than just a side?


numpy.atleast_nd(ary, ndim, pos=0)

    View input as array with at least ndim dimensions. New unit
    dimensions are inserted at the index given by pos if necessary.

    Parameters
    ----------
    ary : array_like
        The input array. Non-array inputs are converted to arrays.
        Arrays that already have ndim or more dimensions are preserved.
    ndim : int
        The minimum number of dimensions required.
    pos : int, optional
        The index to insert the new dimensions. May range from
        -ary.ndim - 1 to +ary.ndim (inclusive). Non-negative indices
        indicate locations before the corresponding axis: pos=0 means
        to insert at the very beginning. Negative indices indicate
        locations after the corresponding axis: pos=-1 means to insert
        at the very end. 0 and -1 are always guaranteed to work. Any
        other number will depend on the dimensions of the existing
        array. Default is 0.

    Returns
    -------
    res : ndarray
        An array with res.ndim >= ndim. A view is returned for array
        inputs. Dimensions are prepended if pos is 0, so for example,
        a 1-D array of shape (N,) with ndim=4 becomes a view of shape
        (1, 1, 1, N). Dimensions are appended if pos is -1, so for
        example a 2-D array of shape (M, N) becomes a view of shape
        (M, N, 1, 1) when ndim=4.

    See also
    --------
    atleast_1d, atleast_2d, atleast_3d

    Notes
    -----
    This function does not follow the convention of the other
    atleast_*d functions in numpy in that it only accepts a single
    array argument. To process multiple arrays, use a comprehension or
    loop around the function call. See examples below.

    Setting pos=0 is equivalent to how the array would be interpreted
    by numpy's broadcasting rules. There is no need to call this
    function for simple broadcasting. This is also roughly (but not
    exactly) equivalent to np.array(ary, copy=False, subok=True,
    ndmin=ndim).
    It is easy to create functions for specific dimensions similar to
    the other atleast_*d functions using Python's functools.partial
    function. An example is shown below.

    Examples
    --------
    >>> np.atleast_nd(3.0, 4)
    array([[[[ 3.]]]])

    >>> x = np.arange(3.0)
    >>> np.atleast_nd(x, 2).shape
    (1, 3)

    >>> x = np.arange(12.0).reshape(4, 3)
    >>> np.atleast_nd(x, 5).shape
    (1, 1, 1, 4, 3)
    >>> np.atleast_nd(x, 5).base is x.base
    True

    >>> [np.atleast_nd(x, 2) for x in ((1, 2), [[1, 2]], [[[1, 2]]])]
    [array([[1, 2]]), array([[1, 2]]), array([[[1, 2]]])]

    >>> np.atleast_nd((1, 2), 5, pos=0).shape
    (1, 1, 1, 1, 2)
    >>> np.atleast_nd((1, 2), 5, pos=-1).shape
    (2, 1, 1, 1, 1)

    >>> from functools import partial
    >>> atleast_4d = partial(np.atleast_nd, ndim=4)
    >>> atleast_4d([1, 2, 3])
    array([[[[1, 2, 3]]]])
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From jni at fastmail.com  Thu Feb 11 00:46:50 2021
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Thu, 11 Feb 2021 16:46:50 +1100
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

I totally agree with the namespace clutter concern, but honestly, I would
use `atleast_nd` with its `pos` argument (I might rename it to `position`,
`axis`, or `axis_position`) any day over `atleast_{1,2,3}d`, for which I
had no idea where the new axes would end up.

So, I'm in favour of including it, and optionally deprecating
`atleast_{1,2,3}d`.

Juan.

> On 11 Feb 2021, at 9:48 am, Sebastian Berg wrote:
>
> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:
>> I've created PR#18386 to add a function called atleast_nd to numpy and
>> numpy.ma.
This would generalize the existing atleast_1d, atleast_2d, and
>> atleast_3d functions.
>>
>> I proposed a similar idea about four and a half years ago:
>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html ,
>> PR#7804. The reception was ambivalent, but a couple of folks have asked me
>> about this, so I'm bringing it back.
>>
>> Some pros:
>>
>> - This closes issue #12336
>> - There are a couple of Stack Overflow questions that would benefit
>> - Been asked about this a couple of times
>> - Implementation of three existing atleast_*d functions gets easier
>> - Looks nicer than the equivalent broadcasting and reshaping
>>
>> Some cons:
>>
>> - Cluttering up the API
>> - Maintenance burden (but not a big one)
>> - This is just a utility function, which can be achieved through
>> broadcasting and reshaping
>
> My main concern would be the namespace cluttering. I can't say I use even
> the `atleast_2d` etc. functions personally, so I would tend to be slightly
> against the addition. But if others land on the "useful" side here (and it
> seemed a bit at least on github), I am also not opposed. It is a clean
> name that lines up with existing ones, so it doesn't seem like a big
> "mental load" with respect to namespace cluttering.
>
> Bike shedding the API is probably a good idea in any case.
>
> I have pasted the current PR documentation (as html) below for quick
> reference. I wonder a bit about the reasoning for having `pos` specify a
> value rather than just a side?
>
> [...]
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shoyer at gmail.com Thu Feb 11 01:55:30 2021
From: shoyer at gmail.com (Stephan Hoyer)
Date: Wed, 10 Feb 2021 22:55:30 -0800
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias wrote:

> I totally agree with the namespace clutter concern, but honestly, I would
> use `atleast_nd` with its `pos` argument (I might rename it to `position`,
> `axis`, or `axis_position`) any day over `atleast_{1,2,3}d`, for which I
> had no idea where the new axes would end up.
>
> So, I'm in favour of including it, and optionally deprecating
> `atleast_{1,2,3}d`.

I appreciate that `atleast_nd` feels more sensible than `atleast_{1,2,3}d`, but I don't think "better" than a pattern we would not recommend is a good enough reason for inclusion in NumPy. It needs to stand on its own.

What would be the recommended use-cases for this new function? Have any libraries building on top of NumPy implemented a version of this?

> Juan.
> On 11 Feb 2021, at 9:48 am, Sebastian Berg wrote:
>
> [...]

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ben.v.root at gmail.com Thu Feb 11 12:40:16 2021
From: ben.v.root at gmail.com (Benjamin Root)
Date: Thu, 11 Feb 2021 12:40:16 -0500
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

For me, I find that the atleast_{1,2,3}d functions are useful for sanitizing inputs. Having an atleast_nd() function can be viewed as a step towards cleaning up the API, not cluttering it (although the deprecation period for the existing functions should probably be long, given how long they have existed).
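[As a minimal sketch of the sanitizing pattern in question; the function name and data are made up for illustration, not taken from any particular library:]

```python
import numpy as np

def row_sums(data):
    """Accept a single row or a stack of rows alike."""
    # atleast_2d promotes a scalar or 1-D input by prepending axes,
    # so the loop below can always assume a 2-D array.
    data = np.atleast_2d(data)
    return [int(row.sum()) for row in data]

row_sums([1, 2, 3])          # one row -> [6]
row_sums([[1, 2], [3, 4]])   # two rows -> [3, 7]
```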
On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer wrote:

> [...]
>
> What would be the recommended use-cases for this new function?
> Have any libraries building on top of NumPy implemented a version of this?
>
> [...]
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part
--------------
An HTML attachment was scrubbed...
URL: 

From jfoxrabinovitz at gmail.com Thu Feb 11 12:48:40 2021
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Thu, 11 Feb 2021 12:48:40 -0500
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

The original functions appear to have been written for things like *stack, which goes a long way toward explaining the inconsistent argument list.

- Joe

On Thu, Feb 11, 2021, 12:41 Benjamin Root wrote:

> [...]
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wieser.eric+numpy at gmail.com Thu Feb 11 13:12:51 2021
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Thu, 11 Feb 2021 18:12:51 +0000
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

> I find that the atleast_{1,2,3}d functions are useful for sanitizing inputs

IMO, this type of "sanitization" goes against "In the face of ambiguity, refuse the temptation to guess".
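[As a small illustration of the ambiguity: whether a 1-D input was meant as a row or as a column is a guess. The sketch below uses plain NumPy; the helper name is made up:]

```python
import numpy as np

x = np.arange(3.0)                  # shape (3,)

# atleast_2d silently decides that x is a row vector...
print(np.atleast_2d(x).shape)       # (1, 3)
# ...but the caller may just as well have meant a column:
print(x.reshape(-1, 1).shape)       # (3, 1)

# Validating instead of coercing surfaces the mismatch immediately:
def require_2d(a):
    a = np.asarray(a)
    if a.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {a.ndim}-D")
    return a

try:
    require_2d(x)
except ValueError as e:
    print(e)                        # expected a 2-D array, got 1-D
```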
Instead of using `at_least{n}d`, it could be argued that `if np.ndim(x) != n: raise ValueError` is a safer bet, which forces the user to think about what's actually going on and saves them from silent headaches.

Of course, this is just an argument for discouraging users from using these functions, and for the fact that we perhaps should not have had them in the first place. Given we already have some of them, adding `atleast_nd` probably isn't going to make things any worse. In principle, it could actually make things better: we could put a "Notes" section in the new function's docs that describes the XY problem that makes atleast_nd look like a better solution than it is, and that presents better alternatives; the other three functions' docs could link there.

Eric

On Thu, 11 Feb 2021 at 17:41, Benjamin Root wrote:

> [...]
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shoyer at gmail.com Thu Feb 11 13:13:20 2021
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 11 Feb 2021 10:13:20 -0800
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root wrote:

> For me, I find that the atleast_{1,2,3}d functions are useful for
> sanitizing inputs. Having an atleast_nd() function can be viewed as a step
> towards cleaning up the API, not cluttering it (although the deprecation
> period for the existing functions should probably be long, given how long
> they have existed).

I would love to see examples of this -- perhaps in matplotlib? My thinking is that in most cases it's probably a better idea to keep the interface simpler, and raise an error for lower-dimensional arrays. Automatic conversion is convenient (and endemic within the SciPy ecosystem), but is also a common source of bugs.

> On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer wrote:
>
>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias wrote:
>>
>>> I totally agree with the namespace clutter concern, but honestly, I
>>> would use `atleast_nd` with its `pos` argument (I might rename it to
>>> `position`, `axis`, or `axis_position`) any day over `atleast_{1,2,3}d`,
>>> for which I had no idea where the new axes would end up.
>>>
>>> So, I'm in favour of including it, and optionally deprecating
>>> `atleast_{1,2,3}d`.
>>>
>>> >>> On 11 Feb 2021, at 9:48 am, Sebastian Berg >>> wrote: >>> >>> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote: >>> >>> I've created PR#18386 to add a function called atleast_nd to numpy and >>> numpy.ma. This would generalize the existing atleast_1d, atleast_2d, and >>> atleast_3d functions. >>> >>> I proposed a similar idea about four and a half years ago: >>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html >>> , >>> PR#7804. The reception was ambivalent, but a couple of folks have asked >>> me >>> about this, so I'm bringing it back. >>> >>> Some pros: >>> >>> - This closes issue #12336 >>> - There are a couple of Stack Overflow questions that would benefit >>> - Been asked about this a couple of times >>> - Implementation of three existing atleast_*d functions gets easier >>> - Looks nicer that the equivalent broadcasting and reshaping >>> >>> Some cons: >>> >>> - Cluttering up the API >>> - Maintenance burden (but not a big one) >>> - This is just a utility function, which can be achieved through >>> broadcasting and reshaping >>> >>> >>> My main concern would be the namespace cluttering. I can't say I use >>> even the `atleast_2d` etc. functions personally, so I would tend to be >>> slightly against the addition. But if others land on the "useful" side here >>> (and it seemed a bit at least on github), I am also not opposed. It is a >>> clean name that lines up with existing ones, so it doesn't seem like a big >>> "mental load" with respect to namespace cluttering. >>> >>> Bike shedding the API is probably a good idea in any case. >>> >>> I have pasted the current PR documentation (as html) below for quick >>> reference. I wonder a bit about the reasoning for having `pos` specify a >>> value rather than just a side? >>> >>> >>> >>> numpy.atleast_nd(*ary*, *ndim*, *pos=0*) >>> View input as array with at least ndim dimensions. >>> New unit dimensions are inserted at the index given by *pos* if >>> necessary. 
>>> Parameters
>>> *ary* : array_like
>>> The input array. Non-array inputs are converted to arrays. Arrays that
>>> already have ndim or more dimensions are preserved.
>>> *ndim* : int
>>> The minimum number of dimensions required.
>>> *pos* : int, optional
>>> The index to insert the new dimensions. May range from -ary.ndim - 1 to
>>> +ary.ndim (inclusive). Non-negative indices indicate locations before
>>> the corresponding axis: pos=0 means to insert at the very beginning.
>>> Negative indices indicate locations after the corresponding axis: pos=-1
>>> means to insert at the very end. 0 and -1 are always guaranteed to
>>> work. Any other number will depend on the dimensions of the existing array.
>>> Default is 0.
>>> Returns
>>> *res* : ndarray
>>> An array with res.ndim >= ndim. A view is returned for array inputs.
>>> Dimensions are prepended if *pos* is 0, so for example, a 1-D array of
>>> shape (N,) with ndim=4 becomes a view of shape (1, 1, 1, N). Dimensions
>>> are appended if *pos* is -1, so for example a 2-D array of shape (M, N)
>>> becomes a view of shape (M, N, 1, 1) when ndim=4.
>>> *See also*
>>> atleast_1d, atleast_2d, atleast_3d
>>> *Notes*
>>> This function does not follow the convention of the other atleast_*d
>>> functions in numpy in that it only accepts a single array argument. To
>>> process multiple arrays, use a comprehension or loop around the function
>>> call. See examples below.
>>> Setting pos=0 is equivalent to how the array would be interpreted by
>>> numpy's broadcasting rules. There is no need to call this function for
>>> simple broadcasting. This is also roughly (but not exactly) equivalent to
>>> np.array(ary, copy=False, subok=True, ndmin=ndim).
>>> It is easy to create functions for specific dimensions similar to the
>>> other atleast_*d functions using Python's functools.partial function.
>>> An example is shown below.
>>> *Examples*
>>>
>>> >>> np.atleast_nd(3.0, 4)
>>> array([[[[ 3.]]]])
>>>
>>> >>> x = np.arange(3.0)
>>> >>> np.atleast_nd(x, 2).shape
>>> (1, 3)
>>>
>>> >>> x = np.arange(12.0).reshape(4, 3)
>>> >>> np.atleast_nd(x, 5).shape
>>> (1, 1, 1, 4, 3)
>>> >>> np.atleast_nd(x, 5).base is x.base
>>> True
>>>
>>> >>> [np.atleast_nd(x, 2) for x in ((1, 2), [[1, 2]], [[[1, 2]]])]
>>> [array([[1, 2]]), array([[1, 2]]), array([[[1, 2]]])]
>>>
>>> >>> np.atleast_nd((1, 2), 5, pos=0).shape
>>> (1, 1, 1, 1, 2)
>>> >>> np.atleast_nd((1, 2), 5, pos=-1).shape
>>> (2, 1, 1, 1, 1)
>>>
>>> >>> from functools import partial
>>> >>> atleast_4d = partial(np.atleast_nd, ndim=4)
>>> >>> atleast_4d([1, 2, 3])
>>> array([[[[1, 2, 3]]]])
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ben.v.root at gmail.com  Thu Feb 11 13:26:29 2021
From: ben.v.root at gmail.com (Benjamin Root)
Date: Thu, 11 Feb 2021 13:26:29 -0500
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To:
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID:

My original use case for these was dealing with output data from Matlab
where those users would use `squeeze()` quite liberally.
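To make the discussion concrete: the pos/ndim semantics described in the quoted PR documentation can be sketched in a few lines of plain NumPy. This is only a minimal illustration of the documented rules, not the implementation from the PR, and `atleast_nd_sketch` is a hypothetical name:

```python
import numpy as np

def atleast_nd_sketch(ary, ndim, pos=0):
    """Sketch of the documented behaviour: view `ary` with at least
    `ndim` dimensions, inserting new length-1 axes at index `pos`."""
    ary = np.asanyarray(ary)
    extra = ndim - ary.ndim
    if extra <= 0:
        return ary  # already has enough dimensions; preserved as-is
    # Per the quoted docs: non-negative pos inserts *before* that axis,
    # negative pos inserts *after* it (pos=-1 appends at the end).
    if pos < 0:
        pos += ary.ndim + 1
    new_shape = ary.shape[:pos] + (1,) * extra + ary.shape[pos:]
    return ary.reshape(new_shape)  # a view for array inputs

print(atleast_nd_sketch(3.0, 4).shape)              # (1, 1, 1, 1)
print(atleast_nd_sketch(np.ones((4, 3)), 5).shape)  # (1, 1, 1, 4, 3)
print(atleast_nd_sketch((1, 2), 5, pos=-1).shape)   # (2, 1, 1, 1, 1)
```

Note how the default pos=0 prepends axes, matching NumPy's own broadcasting convention of adding leading dimensions.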
In addition, there was the problem of the implicit squeeze() in the numpy's loadtxt() for which I added the ndmin kwarg for in case an input CSV file had just one row or no rows. np.atleast_1d() is used in matplotlib in a bunch of places where inputs are allowed to be scalar or lists. On Thu, Feb 11, 2021 at 1:15 PM Stephan Hoyer wrote: > On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root > wrote: > >> for me, I find that the at_least{1,2,3}d functions are useful for >> sanitizing inputs. Having an at_leastnd() function can be viewed as a step >> towards cleaning up the API, not cluttering it (although, deprecations of >> the existing functions probably should be long given how long they have >> existed). >> > > I would love to see examples of this -- perhaps in matplotlib? > > My thinking is that in most cases it's probably a better idea to keep the > interface simpler, and raise an error for lower-dimensional arrays. > Automatic conversion is convenient (and endemic within the SciPy > ecosystem), but is also a common source of bugs. > > On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer wrote: >> >>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias >>> wrote: >>> >>>> I totally agree with the namespace clutter concern, but honestly, I >>>> would use `atleast_nd` with its `pos` argument (I might rename it to >>>> `position`, `axis`, or `axis_position`) any day over `at_least{1,2,3}d`, >>>> for which I had no idea where the new axes would end up. >>>> >>>> So, I?m in favour of including it, and optionally deprecating >>>> `atleast_{1,2,3}d`. >>>> >>>> >>> I appreciate that `atleast_nd` feels more sensible than >>> `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not >>> recommend is a good enough reason for inclusion in NumPy. It needs to stand >>> on its own. >>> >>> What would be the recommended use-cases for this new function? >>> Have any libraries building on top of NumPy implemented a version of >>> this? >>> >>> >>>> Juan. 
>>>> >>>> On 11 Feb 2021, at 9:48 am, Sebastian Berg >>>> wrote: >>>> >>>> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote: >>>> >>>> I've created PR#18386 to add a function called atleast_nd to numpy and >>>> numpy.ma. This would generalize the existing atleast_1d, atleast_2d, >>>> and >>>> atleast_3d functions. >>>> >>>> I proposed a similar idea about four and a half years ago: >>>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html >>>> , >>>> PR#7804. The reception was ambivalent, but a couple of folks have asked >>>> me >>>> about this, so I'm bringing it back. >>>> >>>> Some pros: >>>> >>>> - This closes issue #12336 >>>> - There are a couple of Stack Overflow questions that would benefit >>>> - Been asked about this a couple of times >>>> - Implementation of three existing atleast_*d functions gets easier >>>> - Looks nicer that the equivalent broadcasting and reshaping >>>> >>>> Some cons: >>>> >>>> - Cluttering up the API >>>> - Maintenance burden (but not a big one) >>>> - This is just a utility function, which can be achieved through >>>> broadcasting and reshaping >>>> >>>> >>>> My main concern would be the namespace cluttering. I can't say I use >>>> even the `atleast_2d` etc. functions personally, so I would tend to be >>>> slightly against the addition. But if others land on the "useful" side here >>>> (and it seemed a bit at least on github), I am also not opposed. It is a >>>> clean name that lines up with existing ones, so it doesn't seem like a big >>>> "mental load" with respect to namespace cluttering. >>>> >>>> Bike shedding the API is probably a good idea in any case. >>>> >>>> I have pasted the current PR documentation (as html) below for quick >>>> reference. I wonder a bit about the reasoning for having `pos` specify a >>>> value rather than just a side? >>>> >>>> >>>> >>>> numpy.atleast_nd(*ary*, *ndim*, *pos=0*) >>>> View input as array with at least ndim dimensions. 
>>>> New unit dimensions are inserted at the index given by *pos* if
>>>> necessary.
>>>> Parameters
>>>> *ary* : array_like
>>>> The input array. Non-array inputs are converted to arrays. Arrays that
>>>> already have ndim or more dimensions are preserved.
>>>> *ndim* : int
>>>> The minimum number of dimensions required.
>>>> *pos* : int, optional
>>>> The index to insert the new dimensions. May range from -ary.ndim - 1 to
>>>> +ary.ndim (inclusive). Non-negative indices indicate locations before
>>>> the corresponding axis: pos=0 means to insert at the very beginning.
>>>> Negative indices indicate locations after the corresponding axis:
>>>> pos=-1 means to insert at the very end. 0 and -1 are always guaranteed
>>>> to work. Any other number will depend on the dimensions of the existing
>>>> array. Default is 0.
>>>> Returns
>>>> *res* : ndarray
>>>> An array with res.ndim >= ndim. A view is returned for array inputs.
>>>> Dimensions are prepended if *pos* is 0, so for example, a 1-D array of
>>>> shape (N,) with ndim=4 becomes a view of shape (1, 1, 1, N). Dimensions
>>>> are appended if *pos* is -1, so for example a 2-D array of shape (M, N)
>>>> becomes a view of shape (M, N, 1, 1) when ndim=4.
>>>> *See also*
>>>> atleast_1d, atleast_2d, atleast_3d
>>>> *Notes*
>>>> This function does not follow the convention of the other atleast_*d
>>>> functions in numpy in that it only accepts a single array argument. To
>>>> process multiple arrays, use a comprehension or loop around the
>>>> function call. See examples below.
>>>> Setting pos=0 is equivalent to how the array would be interpreted by
>>>> numpy's broadcasting rules. There is no need to call this function for
>>>> simple broadcasting. This is also roughly (but not exactly) equivalent
>>>> to np.array(ary, copy=False, subok=True, ndmin=ndim).
>>>> It is easy to create functions for specific dimensions similar to the
>>>> other atleast_*d functions using Python's functools.partial function.
>>>> An example is shown below.
>>>> *Examples*
>>>>
>>>> >>> np.atleast_nd(3.0, 4)
>>>> array([[[[ 3.]]]])
>>>>
>>>> >>> x = np.arange(3.0)
>>>> >>> np.atleast_nd(x, 2).shape
>>>> (1, 3)
>>>>
>>>> >>> x = np.arange(12.0).reshape(4, 3)
>>>> >>> np.atleast_nd(x, 5).shape
>>>> (1, 1, 1, 4, 3)
>>>> >>> np.atleast_nd(x, 5).base is x.base
>>>> True
>>>>
>>>> >>> [np.atleast_nd(x, 2) for x in ((1, 2), [[1, 2]], [[[1, 2]]])]
>>>> [array([[1, 2]]), array([[1, 2]]), array([[[1, 2]]])]
>>>>
>>>> >>> np.atleast_nd((1, 2), 5, pos=0).shape
>>>> (1, 1, 1, 1, 2)
>>>> >>> np.atleast_nd((1, 2), 5, pos=-1).shape
>>>> (2, 1, 1, 1, 1)
>>>>
>>>> >>> from functools import partial
>>>> >>> atleast_4d = partial(np.atleast_nd, ndim=4)
>>>> >>> atleast_4d([1, 2, 3])
>>>> array([[[[1, 2, 3]]]])
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From wieser.eric+numpy at gmail.com Thu Feb 11 13:32:42 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 11 Feb 2021 18:32:42 +0000 Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function In-Reply-To: References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net> Message-ID: I did a quick search of matplotlib, and found a few uses of all three functions: * https://github.com/matplotlib/matplotlib/blob/fed55c63a314351cd39a12783f385009782c06e1/lib/matplotlib/_layoutgrid.py#L441-L446 This one isn't really numpy at all, and is really just a shorthand for normalizing an argument `x=n` to `x=[n, n]` * https://github.com/matplotlib/matplotlib/blob/dd249744270f6abe3f540f81b7a77c0cb728ddbb/lib/matplotlib/mlab.py#L888 This one is the classic "either multivariate or single-variable data" thing endemic to the SciPy ecosystem. * https://github.com/matplotlib/matplotlib/blob/1eef019109b64ee4085732544cb5e310e69451ab/lib/matplotlib/cbook/__init__.py#L1325-L1326 Matplotlib has their own `_check_1d` function for input sanitization, although github says it's only used to parse the arguments to `plot`, which at this point are fairly established as being flexible. * https://github.com/matplotlib/matplotlib/blob/f72adc49092fe0233a8cd21aa0f317918dafb18d/lib/matplotlib/transforms.py#L631 This just looks like "defensive programming", and if the argument isn't already 3d then something is probably wrong. This isn't an exhaustive list, just a handful of different situations the functions were used. Eric On Thu, 11 Feb 2021 at 18:15, Stephan Hoyer wrote: > On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root > wrote: > >> for me, I find that the at_least{1,2,3}d functions are useful for >> sanitizing inputs. Having an at_leastnd() function can be viewed as a step >> towards cleaning up the API, not cluttering it (although, deprecations of >> the existing functions probably should be long given how long they have >> existed). 
>> > > I would love to see examples of this -- perhaps in matplotlib? > > My thinking is that in most cases it's probably a better idea to keep the > interface simpler, and raise an error for lower-dimensional arrays. > Automatic conversion is convenient (and endemic within the SciPy > ecosystem), but is also a common source of bugs. > > On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer wrote: >> >>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias >>> wrote: >>> >>>> I totally agree with the namespace clutter concern, but honestly, I >>>> would use `atleast_nd` with its `pos` argument (I might rename it to >>>> `position`, `axis`, or `axis_position`) any day over `at_least{1,2,3}d`, >>>> for which I had no idea where the new axes would end up. >>>> >>>> So, I?m in favour of including it, and optionally deprecating >>>> `atleast_{1,2,3}d`. >>>> >>>> >>> I appreciate that `atleast_nd` feels more sensible than >>> `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not >>> recommend is a good enough reason for inclusion in NumPy. It needs to stand >>> on its own. >>> >>> What would be the recommended use-cases for this new function? >>> Have any libraries building on top of NumPy implemented a version of >>> this? >>> >>> >>>> Juan. >>>> >>>> On 11 Feb 2021, at 9:48 am, Sebastian Berg >>>> wrote: >>>> >>>> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote: >>>> >>>> I've created PR#18386 to add a function called atleast_nd to numpy and >>>> numpy.ma. This would generalize the existing atleast_1d, atleast_2d, >>>> and >>>> atleast_3d functions. >>>> >>>> I proposed a similar idea about four and a half years ago: >>>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html >>>> , >>>> PR#7804. The reception was ambivalent, but a couple of folks have asked >>>> me >>>> about this, so I'm bringing it back. 
>>>> >>>> Some pros: >>>> >>>> - This closes issue #12336 >>>> - There are a couple of Stack Overflow questions that would benefit >>>> - Been asked about this a couple of times >>>> - Implementation of three existing atleast_*d functions gets easier >>>> - Looks nicer that the equivalent broadcasting and reshaping >>>> >>>> Some cons: >>>> >>>> - Cluttering up the API >>>> - Maintenance burden (but not a big one) >>>> - This is just a utility function, which can be achieved through >>>> broadcasting and reshaping >>>> >>>> >>>> My main concern would be the namespace cluttering. I can't say I use >>>> even the `atleast_2d` etc. functions personally, so I would tend to be >>>> slightly against the addition. But if others land on the "useful" side here >>>> (and it seemed a bit at least on github), I am also not opposed. It is a >>>> clean name that lines up with existing ones, so it doesn't seem like a big >>>> "mental load" with respect to namespace cluttering. >>>> >>>> Bike shedding the API is probably a good idea in any case. >>>> >>>> I have pasted the current PR documentation (as html) below for quick >>>> reference. I wonder a bit about the reasoning for having `pos` specify a >>>> value rather than just a side? >>>> >>>> >>>> >>>> numpy.atleast_nd(*ary*, *ndim*, *pos=0*) >>>> View input as array with at least ndim dimensions. >>>> New unit dimensions are inserted at the index given by *pos* if >>>> necessary. >>>> Parameters*ary *array_like >>>> The input array. Non-array inputs are converted to arrays. Arrays that >>>> already have ndim or more dimensions are preserved. >>>> *ndim *int >>>> The minimum number of dimensions required. >>>> *pos *int, optional >>>> The index to insert the new dimensions. May range from -ary.ndim - 1 to >>>> +ary.ndim (inclusive). Non-negative indices indicate locations before >>>> the corresponding axis: pos=0 means to insert at the very beginning. 
>>>> Negative indices indicate locations after the corresponding axis:
>>>> pos=-1 means to insert at the very end. 0 and -1 are always guaranteed
>>>> to work. Any other number will depend on the dimensions of the existing
>>>> array. Default is 0.
>>>> Returns
>>>> *res* : ndarray
>>>> An array with res.ndim >= ndim. A view is returned for array inputs.
>>>> Dimensions are prepended if *pos* is 0, so for example, a 1-D array of
>>>> shape (N,) with ndim=4 becomes a view of shape (1, 1, 1, N). Dimensions
>>>> are appended if *pos* is -1, so for example a 2-D array of shape (M, N)
>>>> becomes a view of shape (M, N, 1, 1) when ndim=4.
>>>> *See also*
>>>> atleast_1d, atleast_2d, atleast_3d
>>>> *Notes*
>>>> This function does not follow the convention of the other atleast_*d
>>>> functions in numpy in that it only accepts a single array argument. To
>>>> process multiple arrays, use a comprehension or loop around the
>>>> function call. See examples below.
>>>> Setting pos=0 is equivalent to how the array would be interpreted by
>>>> numpy's broadcasting rules. There is no need to call this function for
>>>> simple broadcasting. This is also roughly (but not exactly) equivalent
>>>> to np.array(ary, copy=False, subok=True, ndmin=ndim).
>>>> It is easy to create functions for specific dimensions similar to the
>>>> other atleast_*d functions using Python's functools.partial function.
>>>> An example is shown below.
>>>> *Examples*
>>>>
>>>> >>> np.atleast_nd(3.0, 4)
>>>> array([[[[ 3.]]]])
>>>>
>>>> >>> x = np.arange(3.0)
>>>> >>> np.atleast_nd(x, 2).shape
>>>> (1, 3)
>>>>
>>>> >>> x = np.arange(12.0).reshape(4, 3)
>>>> >>> np.atleast_nd(x, 5).shape
>>>> (1, 1, 1, 4, 3)
>>>> >>> np.atleast_nd(x, 5).base is x.base
>>>> True
>>>>
>>>> >>> [np.atleast_nd(x, 2) for x in ((1, 2), [[1, 2]], [[[1, 2]]])]
>>>> [array([[1, 2]]), array([[1, 2]]), array([[[1, 2]]])]
>>>>
>>>> >>> np.atleast_nd((1, 2), 5, pos=0).shape
>>>> (1, 1, 1, 1, 2)
>>>> >>> np.atleast_nd((1, 2), 5, pos=-1).shape
>>>> (2, 1, 1, 1, 1)
>>>>
>>>> >>> from functools import partial
>>>> >>> atleast_4d = partial(np.atleast_nd, ndim=4)
>>>> >>> atleast_4d([1, 2, 3])
>>>> array([[[[1, 2, 3]]]])
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jni at fastmail.com  Thu Feb 11 21:31:56 2021
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Fri, 12 Feb 2021 13:31:56 +1100
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To:
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID:

both napari and scikit-image use atleast_ a few times. I don't have many
examples of where I used nd because it didn't exist. But I have the very
distinct impression of needing it repeatedly. In some places, I've used
`np.broadcast_to` to signal the same intention, where `atleast_nd` would
have been the more readable solution.

I don't buy the argument that it's just a way to mask errors. NumPy
broadcasting also has that same potential but I hope no one would seriously
consider deprecating it. Indeed, even if we accept that we (library
authors) should force users to provide an array of the right
dimensionality, that still argues for making it convenient for users to do
that!

I don't feel super strongly about this. But I think atleast_nd is a move
in a positive direction and I'd prefer it to what's there now:

In [1]: import numpy as np

In [2]: np.atleast_3d(np.ones(4)).shape
Out[2]: (1, 4, 1)

There might be some linear algebraic reason why those axis positions make
sense, but I'm not aware of it...

Juan.

> On 12 Feb 2021, at 5:32 am, Eric Wieser wrote:
>
> I did a quick search of matplotlib, and found a few uses of all three functions:
>
> * https://github.com/matplotlib/matplotlib/blob/fed55c63a314351cd39a12783f385009782c06e1/lib/matplotlib/_layoutgrid.py#L441-L446
> This one isn't really numpy at all, and is really just a shorthand for normalizing an argument `x=n` to `x=[n, n]`
> * https://github.com/matplotlib/matplotlib/blob/dd249744270f6abe3f540f81b7a77c0cb728ddbb/lib/matplotlib/mlab.py#L888
> This one is the classic "either multivariate or single-variable data" thing endemic to the SciPy ecosystem.
> * https://github.com/matplotlib/matplotlib/blob/1eef019109b64ee4085732544cb5e310e69451ab/lib/matplotlib/cbook/__init__.py#L1325-L1326 > Matplotlib has their own `_check_1d` function for input sanitization, although github says it's only used to parse the arguments to `plot`, which at this point are fairly established as being flexible. > * https://github.com/matplotlib/matplotlib/blob/f72adc49092fe0233a8cd21aa0f317918dafb18d/lib/matplotlib/transforms.py#L631 > This just looks like "defensive programming", and if the argument isn't already 3d then something is probably wrong. > > This isn't an exhaustive list, just a handful of different situations the functions were used. > > Eric > > > > On Thu, 11 Feb 2021 at 18:15, Stephan Hoyer > wrote: > On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root > wrote: > for me, I find that the at_least{1,2,3}d functions are useful for sanitizing inputs. Having an at_leastnd() function can be viewed as a step towards cleaning up the API, not cluttering it (although, deprecations of the existing functions probably should be long given how long they have existed). > > I would love to see examples of this -- perhaps in matplotlib? > > My thinking is that in most cases it's probably a better idea to keep the interface simpler, and raise an error for lower-dimensional arrays. Automatic conversion is convenient (and endemic within the SciPy ecosystem), but is also a common source of bugs. > > On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer > wrote: > On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias > wrote: > I totally agree with the namespace clutter concern, but honestly, I would use `atleast_nd` with its `pos` argument (I might rename it to `position`, `axis`, or `axis_position`) any day over `at_least{1,2,3}d`, for which I had no idea where the new axes would end up. > > So, I?m in favour of including it, and optionally deprecating `atleast_{1,2,3}d`. 
> > > I appreciate that `atleast_nd` feels more sensible than `at_least{1,2,3}d`, but I don't think "better" than a pattern we would not recommend is a good enough reason for inclusion in NumPy. It needs to stand on its own. > > What would be the recommended use-cases for this new function? > Have any libraries building on top of NumPy implemented a version of this? > > Juan. > >> On 11 Feb 2021, at 9:48 am, Sebastian Berg > wrote: >> >> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote: >>> I've created PR#18386 to add a function called atleast_nd to numpy and >>> numpy.ma . This would generalize the existing atleast_1d, atleast_2d, and >>> atleast_3d functions. >>> >>> I proposed a similar idea about four and a half years ago: >>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html , >>> PR#7804. The reception was ambivalent, but a couple of folks have asked me >>> about this, so I'm bringing it back. >>> >>> Some pros: >>> >>> - This closes issue #12336 >>> - There are a couple of Stack Overflow questions that would benefit >>> - Been asked about this a couple of times >>> - Implementation of three existing atleast_*d functions gets easier >>> - Looks nicer that the equivalent broadcasting and reshaping >>> >>> Some cons: >>> >>> - Cluttering up the API >>> - Maintenance burden (but not a big one) >>> - This is just a utility function, which can be achieved through >>> broadcasting and reshaping >>> >> >> My main concern would be the namespace cluttering. I can't say I use even the `atleast_2d` etc. functions personally, so I would tend to be slightly against the addition. But if others land on the "useful" side here (and it seemed a bit at least on github), I am also not opposed. It is a clean name that lines up with existing ones, so it doesn't seem like a big "mental load" with respect to namespace cluttering. >> >> Bike shedding the API is probably a good idea in any case. 
>>
>> I have pasted the current PR documentation (as html) below for quick reference. I wonder a bit about the reasoning for having `pos` specify a value rather than just a side?
>>
>>
>> numpy.atleast_nd(ary, ndim, pos=0)
>> View input as array with at least ndim dimensions.
>> New unit dimensions are inserted at the index given by pos if necessary.
>> Parameters
>> ary array_like
>> The input array. Non-array inputs are converted to arrays. Arrays that already have ndim or more dimensions are preserved.
>> ndim int
>> The minimum number of dimensions required.
>> pos int, optional
>> The index to insert the new dimensions. May range from -ary.ndim - 1 to +ary.ndim (inclusive). Non-negative indices indicate locations before the corresponding axis: pos=0 means to insert at the very beginning. Negative indices indicate locations after the corresponding axis: pos=-1 means to insert at the very end. 0 and -1 are always guaranteed to work. Any other number will depend on the dimensions of the existing array. Default is 0.
>> Returns
>> res ndarray
>> An array with res.ndim >= ndim. A view is returned for array inputs. Dimensions are prepended if pos is 0, so for example, a 1-D array of shape (N,) with ndim=4 becomes a view of shape (1, 1, 1, N). Dimensions are appended if pos is -1, so for example a 2-D array of shape (M, N) becomes a view of shape (M, N, 1, 1) when ndim=4.
>> See also
>> atleast_1d, atleast_2d, atleast_3d
>> Notes
>> This function does not follow the convention of the other atleast_*d functions in numpy in that it only accepts a single array argument. To process multiple arrays, use a comprehension or loop around the function call. See examples below.
>> Setting pos=0 is equivalent to how the array would be interpreted by numpy's broadcasting rules. There is no need to call this function for simple broadcasting. This is also roughly (but not exactly) equivalent to np.array(ary, copy=False, subok=True, ndmin=ndim).
>> It is easy to create functions for specific dimensions similar to the other atleast_*d functions using Python's functools.partial function. An example is shown below.
>> Examples
>> >>> np.atleast_nd(3.0, 4)
>> array([[[[ 3.]]]])
>>
>> >>> x = np.arange(3.0)
>> >>> np.atleast_nd(x, 2).shape
>> (1, 3)
>>
>> >>> x = np.arange(12.0).reshape(4, 3)
>> >>> np.atleast_nd(x, 5).shape
>> (1, 1, 1, 4, 3)
>> >>> np.atleast_nd(x, 5).base is x.base
>> True
>>
>> >>> [np.atleast_nd(x, 2) for x in ((1, 2), [[1, 2]], [[[1, 2]]])]
>> [array([[1, 2]]), array([[1, 2]]), array([[[1, 2]]])]
>>
>> >>> np.atleast_nd((1, 2), 5, pos=0).shape
>> (1, 1, 1, 1, 2)
>>
>> >>> np.atleast_nd((1, 2), 5, pos=-1).shape
>> (2, 1, 1, 1, 1)
>>
>> >>> from functools import partial
>> >>> atleast_4d = partial(np.atleast_nd, ndim=4)
>> >>> atleast_4d([1, 2, 3])
>> array([[[[1, 2, 3]]]])
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ralf.gommers at gmail.com  Fri Feb 12 05:13:18 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 12 Feb 2021 11:13:18 +0100
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

On Fri, Feb 12, 2021 at 3:32 AM Juan Nunez-Iglesias wrote:

> both napari and scikit-image use atleast_ a few times. I don't have many
> examples of where I used nd because it didn't exist. But I have the very
> distinct impression of needing it repeatedly. In some places, I've used
> `np.broadcast_to` to signal the same intention, where `atleast_nd` would
> have been the more readable solution.
>
> I don't buy the argument that it's just a way to mask errors. NumPy
> broadcasting also has that same potential, but I hope no one would
> seriously consider deprecating it. Indeed, even if we accept that we
> (library authors) should force users to provide an array of the right
> dimensionality, that still argues for making it convenient for users to
> do that!
>
> I don't feel super strongly about this. But I think atleast_nd is a move
> in a positive direction and I'd prefer it to what's there now:
>
> In [1]: import numpy as np
> In [2]: np.atleast_3d(np.ones(4)).shape
> Out[2]: (1, 4, 1)
>
> There might be some linear algebraic reason why those axis positions
> make sense, but I'm not aware of it...
>

Yes, that's pretty weird. I'm also not sure there's a reason.

If atleast_nd is not going to replicate this behavior, it would be good
to deprecate atleast_3d (perhaps a release or two after the introduction
of atleast_nd). Not having `atleast_3d(x) == atleast_nd(x, pos=3)` is
unnecessarily confusing.

Ralf

> Juan.
> On 12 Feb 2021, at 5:32 am, Eric Wieser wrote:
>
> I did a quick search of matplotlib, and found a few uses of all three
> functions:
>
> *
> https://github.com/matplotlib/matplotlib/blob/fed55c63a314351cd39a12783f385009782c06e1/lib/matplotlib/_layoutgrid.py#L441-L446
> This one isn't really numpy at all, and is really just a shorthand for
> normalizing an argument `x=n` to `x=[n, n]`
> *
> https://github.com/matplotlib/matplotlib/blob/dd249744270f6abe3f540f81b7a77c0cb728ddbb/lib/matplotlib/mlab.py#L888
> This one is the classic "either multivariate or single-variable data"
> thing endemic to the SciPy ecosystem.
> *
> https://github.com/matplotlib/matplotlib/blob/1eef019109b64ee4085732544cb5e310e69451ab/lib/matplotlib/cbook/__init__.py#L1325-L1326
> Matplotlib has their own `_check_1d` function for input sanitization,
> although github says it's only used to parse the arguments to `plot`,
> which at this point are fairly established as being flexible.
> *
> https://github.com/matplotlib/matplotlib/blob/f72adc49092fe0233a8cd21aa0f317918dafb18d/lib/matplotlib/transforms.py#L631
> This just looks like "defensive programming", and if the argument isn't
> already 3d then something is probably wrong.
>
> This isn't an exhaustive list, just a handful of different situations in
> which the functions were used.
>
> Eric
>
> On Thu, 11 Feb 2021 at 18:15, Stephan Hoyer wrote:
>
>> On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root wrote:
>>
>>> for me, I find that the at_least{1,2,3}d functions are useful for
>>> sanitizing inputs. Having an at_leastnd() function can be viewed as a
>>> step towards cleaning up the API, not cluttering it (although
>>> deprecations of the existing functions probably should be long, given
>>> how long they have existed).
>>>
>>
>> I would love to see examples of this -- perhaps in matplotlib?
>>
>> My thinking is that in most cases it's probably a better idea to keep
>> the interface simpler, and raise an error for lower-dimensional arrays.
>> Automatic conversion is convenient (and endemic within the SciPy
>> ecosystem), but is also a common source of bugs.
>>
>> On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer wrote:
>>>
>>>> On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias
>>>> wrote:
>>>>
>>>>> I totally agree with the namespace clutter concern, but honestly, I
>>>>> would use `atleast_nd` with its `pos` argument (I might rename it to
>>>>> `position`, `axis`, or `axis_position`) any day over
>>>>> `at_least{1,2,3}d`, for which I had no idea where the new axes would
>>>>> end up.
>>>>>
>>>>> So, I'm in favour of including it, and optionally deprecating
>>>>> `atleast_{1,2,3}d`.
>>>>>
>>>> I appreciate that `atleast_nd` feels more sensible than
>>>> `at_least{1,2,3}d`, but I don't think "better" than a pattern we
>>>> would not recommend is a good enough reason for inclusion in NumPy.
>>>> It needs to stand on its own.
>>>>
>>>> What would be the recommended use-cases for this new function?
>>>> Have any libraries building on top of NumPy implemented a version of
>>>> this?
>>>>
>>>>> Juan.
>>>>>
>>>>> On 11 Feb 2021, at 9:48 am, Sebastian Berg
>>>>> wrote:
>>>>>
>>>>> On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz wrote:
>>>>>
>>>>> I've created PR#18386 to add a function called atleast_nd to numpy
>>>>> and numpy.ma. This would generalize the existing atleast_1d,
>>>>> atleast_2d, and atleast_3d functions.
>>>>>
>>>>> I proposed a similar idea about four and a half years ago:
>>>>> https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html,
>>>>> PR#7804. The reception was ambivalent, but a couple of folks have
>>>>> asked me about this, so I'm bringing it back.
>>>>>
>>>>> Some pros:
>>>>>
>>>>> - This closes issue #12336
>>>>> - There are a couple of Stack Overflow questions that would benefit
>>>>> - Been asked about this a couple of times
>>>>> - Implementation of the three existing atleast_*d functions gets
>>>>>   easier
>>>>> - Looks nicer than the equivalent broadcasting and reshaping
>>>>>
>>>>> Some cons:
>>>>>
>>>>> - Cluttering up the API
>>>>> - Maintenance burden (but not a big one)
>>>>> - This is just a utility function, which can be achieved through
>>>>>   broadcasting and reshaping
>>>>>
>>>>> My main concern would be the namespace cluttering. I can't say I
>>>>> use even the `atleast_2d` etc. functions personally, so I would
>>>>> tend to be slightly against the addition. But if others land on the
>>>>> "useful" side here (and it seemed a bit at least on github), I am
>>>>> also not opposed. It is a clean name that lines up with existing
>>>>> ones, so it doesn't seem like a big "mental load" with respect to
>>>>> namespace cluttering.
>>>>>
>>>>> Bike shedding the API is probably a good idea in any case.
>>>>>
>>>>> I have pasted the current PR documentation (as html) below for
>>>>> quick reference. I wonder a bit about the reasoning for having
>>>>> `pos` specify a value rather than just a side?
>>>>>
>>>>> [...]
>>>>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
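Juan's observation about `atleast_3d` is easy to check side by side with NumPy's broadcasting-style promotion (which is the behaviour `pos=0` would give under the proposal); this snippet uses only existing NumPy functions:

```python
import numpy as np

x = np.ones(4)

# atleast_3d places the original data on the *middle* axis...
print(np.atleast_3d(x).shape)      # (1, 4, 1)

# ...while broadcasting-style promotion, as in np.array(..., ndmin=3),
# prepends the new axes instead:
print(np.array(x, ndmin=3).shape)  # (1, 1, 4)
```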
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wieser.eric+numpy at gmail.com  Fri Feb 12 05:14:23 2021
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Fri, 12 Feb 2021 10:14:23 +0000
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

> There might be some linear algebraic reason why those axis positions
> make sense, but I'm not aware of it...

My guess is that the historical motivation was to allow grayscale `(H, W)`
images to be converted into `(H, W, 1)` images so that they can be
broadcast against `(H, W, 3)` RGB images.

Eric

On Fri, 12 Feb 2021 at 02:32, Juan Nunez-Iglesias wrote:

> both napari and scikit-image use atleast_ a few times. [...]
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
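Eric's guess about the motivation can be illustrated directly: the `(H, W, 1)` view produced by `atleast_3d` broadcasts cleanly against an `(H, W, 3)` RGB array:

```python
import numpy as np

gray = np.arange(6.0).reshape(2, 3)  # a (H, W) grayscale "image"
rgb = np.ones((2, 3, 3))             # a (H, W, 3) RGB "image"

# (2, 3) -> (2, 3, 1), which then broadcasts against (2, 3, 3):
blended = np.atleast_3d(gray) * rgb
print(blended.shape)  # (2, 3, 3)
```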
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Fri Feb 12 09:29:28 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 12 Feb 2021 08:29:28 -0600
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: <5f9919794e073c5474c3d19bb8ccbf3542e4ad09.camel@sipsolutions.net>

On Fri, 2021-02-12 at 11:13 +0100, Ralf Gommers wrote:
> [...]
>
> Yes, that's pretty weird. I'm also not sure there's a reason.
>
> If atleast_nd is not going to replicate this behavior, it would be
> good to deprecate atleast_3d (perhaps a release or two after the
> introduction of atleast_nd).

Planning to replace `atleast_3d` (not right now, but soon) sounds like a
good way forward. "1, 2, nd" is pretty good. `atleast_3d` seems not used
all that much and is an odd one out. Having the `nd` version should make
a future deprecation painless, so long term we will be better off.

- Sebastian

> Not having `atleast_3d(x) == atleast_nd(x, pos=3)` is unnecessarily
> confusing.
>
> Ralf
> > > > Eric > > > > > > > > On Thu, 11 Feb 2021 at 18:15, Stephan Hoyer > > wrote: > > > > > On Thu, Feb 11, 2021 at 9:42 AM Benjamin Root < > > > ben.v.root at gmail.com> > > > wrote: > > > > > > > for me, I find that the at_least{1,2,3}d functions are useful > > > > for > > > > sanitizing inputs. Having an at_leastnd() function can be > > > > viewed as a step > > > > towards cleaning up the API, not cluttering it (although, > > > > deprecations of > > > > the existing functions probably should be long given how long > > > > they have > > > > existed). > > > > > > > > > > I would love to see examples of this -- perhaps in matplotlib? > > > > > > My thinking is that in most cases it's probably a better idea to > > > keep the > > > interface simpler, and raise an error for lower-dimensional > > > arrays. > > > Automatic conversion is convenient (and endemic within the SciPy > > > ecosystem), but is also a common source of bugs. > > > > > > On Thu, Feb 11, 2021 at 1:56 AM Stephan Hoyer > > > wrote: > > > > > > > > > On Wed, Feb 10, 2021 at 9:48 PM Juan Nunez-Iglesias < > > > > > jni at fastmail.com> > > > > > wrote: > > > > > > > > > > > I totally agree with the namespace clutter concern, but > > > > > > honestly, I > > > > > > would use `atleast_nd` with its `pos` argument (I might > > > > > > rename it to > > > > > > `position`, `axis`, or `axis_position`) any day over > > > > > > `at_least{1,2,3}d`, > > > > > > for which I had no idea where the new axes would end up. > > > > > > > > > > > > So, I?m in favour of including it, and optionally > > > > > > deprecating > > > > > > `atleast_{1,2,3}d`. > > > > > > > > > > > > > > > > > I appreciate that `atleast_nd` feels more sensible than > > > > > `at_least{1,2,3}d`, but I don't think "better" than a pattern > > > > > we would not > > > > > recommend is a good enough reason for inclusion in NumPy. It > > > > > needs to stand > > > > > on its own. 
> > > > > > > > > > What would be the recommended use-cases for this new > > > > > function? > > > > > Have any libraries building on top of NumPy implemented a > > > > > version of > > > > > this? > > > > > > > > > > > > > > > > Juan. > > > > > > > > > > > > On 11 Feb 2021, at 9:48 am, Sebastian Berg < > > > > > > sebastian at sipsolutions.net> > > > > > > wrote: > > > > > > > > > > > > On Wed, 2021-02-10 at 17:31 -0500, Joseph Fox-Rabinovitz > > > > > > wrote: > > > > > > > > > > > > I've created PR#18386 to add a function called atleast_nd > > > > > > to numpy and > > > > > > numpy.ma. This would generalize the existing atleast_1d, > > > > > > atleast_2d, > > > > > > and > > > > > > atleast_3d functions. > > > > > > > > > > > > I proposed a similar idea about four and a half years ago: > > > > > > > > > > > > https://mail.python.org/pipermail/numpy-discussion/2016-July/075722.html > > > > > > , > > > > > > PR#7804. The reception was ambivalent, but a couple of > > > > > > folks have > > > > > > asked me > > > > > > about this, so I'm bringing it back. > > > > > > > > > > > > Some pros: > > > > > > > > > > > > - This closes issue #12336 > > > > > > - There are a couple of Stack Overflow questions that would > > > > > > benefit > > > > > > - Been asked about this a couple of times > > > > > > - Implementation of three existing atleast_*d functions > > > > > > gets easier > > > > > > - Looks nicer that the equivalent broadcasting and > > > > > > reshaping > > > > > > > > > > > > Some cons: > > > > > > > > > > > > - Cluttering up the API > > > > > > - Maintenance burden (but not a big one) > > > > > > - This is just a utility function, which can be achieved > > > > > > through > > > > > > broadcasting and reshaping > > > > > > > > > > > > > > > > > > My main concern would be the namespace cluttering. I can't > > > > > > say I use > > > > > > even the `atleast_2d` etc. functions personally, so I would > > > > > > tend to be > > > > > > slightly against the addition. 
But if others land on the > > > > > > "useful" side here > > > > > > (and it seemed a bit at least on github), I am also not > > > > > > opposed.? It is a > > > > > > clean name that lines up with existing ones, so it doesn't > > > > > > seem like a big > > > > > > "mental load" with respect to namespace cluttering. > > > > > > > > > > > > Bike shedding the API is probably a good idea in any case. > > > > > > > > > > > > I have pasted the current PR documentation (as html) below > > > > > > for quick > > > > > > reference. I wonder a bit about the reasoning for having > > > > > > `pos` specify a > > > > > > value rather than just a side? > > > > > > > > > > > > > > > > > > > > > > > > numpy.atleast_nd(*ary*, *ndim*, *pos=0*) > > > > > > View input as array with at least ndim dimensions. > > > > > > New unit dimensions are inserted at the index given by > > > > > > *pos* if > > > > > > necessary. > > > > > > Parameters*ary? *array_like > > > > > > The input array. Non-array inputs are converted to arrays. > > > > > > Arrays that > > > > > > already have ndim or more dimensions are preserved. > > > > > > *ndim? *int > > > > > > The minimum number of dimensions required. > > > > > > *pos? *int, optional > > > > > > The index to insert the new dimensions. May range from - > > > > > > ary.ndim - 1 > > > > > > to +ary.ndim (inclusive). Non-negative indices indicate > > > > > > locations > > > > > > before the corresponding axis: pos=0 means to insert at the > > > > > > very > > > > > > beginning. Negative indices indicate locations after the > > > > > > corresponding axis: > > > > > > ?pos=-1 means to insert at the very end. 0 and -1 are > > > > > > always > > > > > > guaranteed to work. Any other number will depend on the > > > > > > dimensions of the > > > > > > existing array. Default is 0. > > > > > > Returns*res? *ndarray > > > > > > An array with res.ndim >= ndim. A view is returned for > > > > > > array inputs. 
> > > > > > Dimensions are prepended if *pos* is 0, so for example, a > > > > > > 1-D array > > > > > > of shape (N,) with ndim=4becomes a view of shape (1, 1, 1, > > > > > > N). > > > > > > Dimensions are appended if *pos* is -1, so for example a 2- > > > > > > D array of > > > > > > shape (M, N) becomes a view of shape (M, N, 1, 1)when > > > > > > ndim=4. > > > > > > *See also* > > > > > > atleast_1d > > > > > > < > > > > > > https://18298-908607-gh.circle-artifacts.com/0/doc/build/html/reference/generated/numpy.atleast_1d.html#numpy.atleast_1d > > > > > > > > > > > > > , atleast_2d > > > > > > < > > > > > > https://18298-908607-gh.circle-artifacts.com/0/doc/build/html/reference/generated/numpy.atleast_2d.html#numpy.atleast_2d > > > > > > > > > > > > > , atleast_3d > > > > > > < > > > > > > https://18298-908607-gh.circle-artifacts.com/0/doc/build/html/reference/generated/numpy.atleast_3d.html#numpy.atleast_3d > > > > > > > > > > > > > *Notes* > > > > > > This function does not follow the convention of the other > > > > > > atleast_*d functions > > > > > > in numpy in that it only accepts a single array argument. > > > > > > To process > > > > > > multiple arrays, use a comprehension or loop around the > > > > > > function call. See > > > > > > examples below. > > > > > > Setting pos=0 is equivalent to how the array would be > > > > > > interpreted by > > > > > > numpy?s broadcasting rules. There is no need to call this > > > > > > function for > > > > > > simple broadcasting. This is also roughly (but not exactly) > > > > > > equivalent to > > > > > > ?np.array(ary, copy=False, subok=True, ndmin=ndim). > > > > > > It is easy to create functions for specific dimensions > > > > > > similar to the > > > > > > other atleast_*d functions using Python?s functools.partial > > > > > > < > > > > > > https://docs.python.org/dev/library/functools.html#functools.partial > > > > > > > > > > > > > ?function. An example is shown below. 
> > > > > >     Examples
> > > > > >     >>> np.atleast_nd(3.0, 4)
> > > > > >     array([[[[ 3.]]]])
> > > > > >
> > > > > >     >>> x = np.arange(3.0)
> > > > > >     >>> np.atleast_nd(x, 2).shape
> > > > > >     (1, 3)
> > > > > >
> > > > > >     >>> x = np.arange(12.0).reshape(4, 3)
> > > > > >     >>> np.atleast_nd(x, 5).shape
> > > > > >     (1, 1, 1, 4, 3)
> > > > > >     >>> np.atleast_nd(x, 5).base is x.base
> > > > > >     True
> > > > > >
> > > > > >     >>> [np.atleast_nd(x) for x in ((1, 2), [[1, 2]], [[[1, 2]]])]
> > > > > >     [array([[1, 2]]), array([[1, 2]]), array([[[1, 2]]])]
> > > > > >
> > > > > >     >>> np.atleast_nd((1, 2), 5, pos=0).shape
> > > > > >     (1, 1, 1, 1, 2)
> > > > > >     >>> np.atleast_nd((1, 2), 5, pos=-1).shape
> > > > > >     (2, 1, 1, 1, 1)
> > > > > >
> > > > > >     >>> from functools import partial
> > > > > >     >>> atleast_4d = partial(np.atleast_nd, ndim=4)
> > > > > >     >>> atleast_4d([1, 2, 3])
> > > > > >     [[[[1, 2, 3]]]]
> > > > > >
> > > > > > _______________________________________________
> > > > > > NumPy-Discussion mailing list
> > > > > > NumPy-Discussion at python.org
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From robert.kern at gmail.com  Fri Feb 12 09:30:07 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 12 Feb 2021 09:30:07 -0500
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

On Fri, Feb 12, 2021 at 5:15 AM Eric Wieser wrote:

> > There might be some linear algebraic reason why those axis positions
> make sense, but I'm not aware of it...
>
> My guess is that the historical motivation was to allow grayscale `(H,
> W)` images to be converted into `(H, W, 1)` images so that they can be
> broadcast against `(H, W, 3)` RGB images.
>

Correct. If you do introduce atleast_nd(), I'm not sure why you'd
deprecate and remove the one existing function that *isn't* made redundant
thereby.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
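To make the grayscale/RGB broadcasting use case described above concrete, here is a small sketch (the shapes are illustrative):

```python
import numpy as np

gray = np.random.rand(4, 5)        # (H, W) grayscale image
rgb = np.random.rand(4, 5, 3)      # (H, W, 3) RGB image

# (H, W) -> (H, W, 1): for 2-D input, atleast_3d appends the new axis
gray3 = np.atleast_3d(gray)
print(gray3.shape)                 # (4, 5, 1)

# (H, W, 1) broadcasts against (H, W, 3), applying the grayscale
# values uniformly across the three channels:
blended = 0.5 * gray3 + 0.5 * rgb
print(blended.shape)               # (4, 5, 3)
```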
URL: 

From jfoxrabinovitz at gmail.com  Fri Feb 12 09:44:31 2021
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Fri, 12 Feb 2021 09:44:31 -0500
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

On Fri, Feb 12, 2021, 09:32 Robert Kern wrote:

> On Fri, Feb 12, 2021 at 5:15 AM Eric Wieser wrote:
>
>> > There might be some linear algebraic reason why those axis positions
>> make sense, but I'm not aware of it...
>>
>> My guess is that the historical motivation was to allow grayscale `(H,
>> W)` images to be converted into `(H, W, 1)` images so that they can be
>> broadcast against `(H, W, 3)` RGB images.
>>
>
> Correct. If you do introduce atleast_nd(), I'm not sure why you'd
> deprecate and remove the one existing function that *isn't* made redundant
> thereby.
>

`atleast_nd` handles the promotion of 2D to 3D correctly. The `pos`
argument lets you tell it where to put the new axes. What's unintuitive to
me is that the 1D case gets promoted from shape `(x,)` to shape `(1, x,
1)`. It takes two calls to `atleast_nd` to replicate that behavior.

One modification to `atleast_nd` I've thought about is making `pos` refer
to the position of the existing axes in the new array rather than the
position of the new axes, but that's likely not a useful way to go about
it.

- Joe

> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
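The 1-D behavior discussed above can be seen directly: `atleast_3d` pads a 1-D input on both sides, which a single generic "insert axes at one position" call cannot reproduce.

```python
import numpy as np

x = np.arange(3.0)                      # shape (3,)

# atleast_3d promotes 1-D input to (1, N, 1), padding both ends:
print(np.atleast_3d(x).shape)           # (1, 3, 1)

# Replicating that needs two separate axis insertions,
# one at the front and one at the back:
y = np.expand_dims(np.expand_dims(x, 0), -1)
print(y.shape)                          # (1, 3, 1)

# For 2-D input, by contrast, atleast_3d only appends:
print(np.atleast_3d(np.ones((4, 5))).shape)  # (4, 5, 1)
```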
URL: 

From robert.kern at gmail.com  Fri Feb 12 10:08:45 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 12 Feb 2021 10:08:45 -0500
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: 

On Fri, Feb 12, 2021 at 9:45 AM Joseph Fox-Rabinovitz <
jfoxrabinovitz at gmail.com> wrote:

> On Fri, Feb 12, 2021, 09:32 Robert Kern wrote:
>
>> On Fri, Feb 12, 2021 at 5:15 AM Eric Wieser wrote:
>>
>>> > There might be some linear algebraic reason why those axis positions
>>> make sense, but I'm not aware of it...
>>>
>>> My guess is that the historical motivation was to allow grayscale `(H,
>>> W)` images to be converted into `(H, W, 1)` images so that they can be
>>> broadcast against `(H, W, 3)` RGB images.
>>>
>>
>> Correct. If you do introduce atleast_nd(), I'm not sure why you'd
>> deprecate and remove the one existing function that *isn't* made
>> redundant thereby.
>>
>
> `atleast_nd` handles the promotion of 2D to 3D correctly. The `pos`
> argument lets you tell it where to put the new axes. What's unintuitive to
> me is that the 1D case gets promoted from shape `(x,)` to shape `(1, x,
> 1)`. It takes two calls to `atleast_nd` to replicate that behavior.
>

When thinking about channeled images, the channel axis is not of the same
kind as the H and W axes. Really, you tend to want to think about an RGB
image as an (H, W) array of colors rather than an (H, W, 3) ndarray of
intensity values. As much as possible, you want to treat RGB images
similarly to (H, W)-shaped grayscale images. Let's say I want to make a
separable filter to convolve with my image, that is, we have a 1D filter
for each of the H and W axes, and they are repeated for each channel, if
RGB. Setting up a separable filter for (H, W) grayscale is straightforward
with broadcasting semantics. I can use an (ntaps,)-shaped vector for the W
axis and an (ntaps, 1)-shaped filter for the H axis. Now, when I go to the
RGB case, I want the same thing. atleast_3d() adapts those correctly for
the (H, W, nchannels) case.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Fri Feb 12 13:25:31 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 12 Feb 2021 12:25:31 -0600
Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function
In-Reply-To: 
References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net>
Message-ID: <417beafed3212391571b55dcd10c0e6e4311034e.camel@sipsolutions.net>

On Fri, 2021-02-12 at 10:08 -0500, Robert Kern wrote:
> On Fri, Feb 12, 2021 at 9:45 AM Joseph Fox-Rabinovitz <
> jfoxrabinovitz at gmail.com> wrote:
>
> > On Fri, Feb 12, 2021, 09:32 Robert Kern wrote:
> >
> > > On Fri, Feb 12, 2021 at 5:15 AM Eric Wieser <
> > > wieser.eric+numpy at gmail.com> wrote:
> > >
> > > > > There might be some linear algebraic reason why those axis
> > > > > positions make sense, but I'm not aware of it...
> > > >
> > > > My guess is that the historical motivation was to allow
> > > > grayscale `(H, W)` images to be converted into `(H, W, 1)`
> > > > images so that they can be broadcast against `(H, W, 3)` RGB
> > > > images.
> > >
> > > Correct. If you do introduce atleast_nd(), I'm not sure why you'd
> > > deprecate and remove the one existing function that *isn't* made
> > > redundant thereby.
> >
> > `atleast_nd` handles the promotion of 2D to 3D correctly. The `pos`
> > argument lets you tell it where to put the new axes. What's
> > unintuitive to me is that the 1D case gets promoted from shape
> > `(x,)` to shape `(1, x, 1)`. It takes two calls to `atleast_nd` to
> > replicate that behavior.
>
> When thinking about channeled images, the channel axis is not of the
> same kind as the H and W axes. Really, you tend to want to think about
> an RGB image as an (H, W) array of colors rather than an (H, W, 3)
> ndarray of intensity values. As much as possible, you want to treat RGB
> images similarly to (H, W)-shaped grayscale images. Let's say I want to
> make a separable filter to convolve with my image, that is, we have a
> 1D filter for each of the H and W axes, and they are repeated for each
> channel, if RGB. Setting up a separable filter for (H, W) grayscale is
> straightforward with broadcasting semantics. I can use an
> (ntaps,)-shaped vector for the W axis and an (ntaps, 1)-shaped filter
> for the H axis. Now, when I go to the RGB case, I want the same thing.
> atleast_3d() adapts those correctly for the (H, W, nchannels) case.

Right, my initial feeling is that without such context `atleast_3d` is
pretty surprising. So I wonder if we can design `atleast_nd` in a way
that it is explicit about this context.

The `pos` argument is the current solution to this, but maybe there is a
better way [2]? Meshgrid for example defaults to `indexing='xy'` and
has `indexing='ij'` for a similar purpose [1].

Of course, if `atleast_3d` is common enough, I guess that argument
could also swing to adding a keyword-only argument to `atleast_3d`
(that way we can/will never change the default).

- Sebastian


[1] Not sure the purposes are comparable, but in both cases, they
provide information about the "context" in which meshgrid/atleast_3d
are used.

[2] It feels a bit like you may have to think about what `pos=3` will
actually do (in the sense that we will all just end up doing trial and
error :)). At which point I am not sure there is too much gained over
the surprise of `atleast_3d`.
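For readers following along, the `pos=0`/`pos=-1` semantics under discussion can be imitated with existing NumPy primitives. This is only an illustrative sketch, not the PR's actual implementation; the helper name `atleast_nd_sketch` is made up here.

```python
import numpy as np

def atleast_nd_sketch(ary, ndim, pos=0):
    """Illustrative stand-in for the proposed atleast_nd.

    New length-1 axes are inserted at `pos` until the array has at
    least `ndim` dimensions (only pos=0 and pos=-1 are exercised here).
    """
    ary = np.asanyarray(ary)
    missing = max(ndim - ary.ndim, 0)
    if missing == 0:
        return ary
    if pos >= 0:
        # insert before the axis at `pos` (pos=0: prepend)
        axes = tuple(range(pos, pos + missing))
    else:
        # insert after the axis counted from the end (pos=-1: append)
        end = ary.ndim + missing + pos + 1
        axes = tuple(range(end - missing, end))
    return np.expand_dims(ary, axes)

x = np.ones((4, 3))
print(atleast_nd_sketch(x, 5, pos=0).shape)   # (1, 1, 1, 4, 3)
print(atleast_nd_sketch(x, 5, pos=-1).shape)  # (4, 3, 1, 1, 1)
```

The two printed shapes match the prepend/append examples in the PR documentation quoted earlier in the thread.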
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From ralf.gommers at gmail.com Fri Feb 12 13:46:02 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 12 Feb 2021 19:46:02 +0100 Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function In-Reply-To: <417beafed3212391571b55dcd10c0e6e4311034e.camel@sipsolutions.net> References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net> <417beafed3212391571b55dcd10c0e6e4311034e.camel@sipsolutions.net> Message-ID: On Fri, Feb 12, 2021 at 7:25 PM Sebastian Berg wrote: > On Fri, 2021-02-12 at 10:08 -0500, Robert Kern wrote: > > On Fri, Feb 12, 2021 at 9:45 AM Joseph Fox-Rabinovitz < > > jfoxrabinovitz at gmail.com> wrote: > > > > > > > > > > > On Fri, Feb 12, 2021, 09:32 Robert Kern > > > wrote: > > > > > > > On Fri, Feb 12, 2021 at 5:15 AM Eric Wieser < > > > > wieser.eric+numpy at gmail.com> > > > > wrote: > > > > > > > > > > There might be some linear algebraic reason why those axis > > > > > > positions > > > > > make sense, but I?m not aware of it... > > > > > > > > > > My guess is that the historical motivation was to allow > > > > > grayscale `(H, > > > > > W)` images to be converted into `(H, W, 1)` images so that they > > > > > can be > > > > > broadcast against `(H, W, 3)` RGB images. > > > > > > > > > > > > > Correct. If you do introduce atleast_nd(), I'm not sure why you'd > > > > deprecate and remove the one existing function that *isn't* made > > > > redundant > > > > thereby. > > > > > > > > > > `atleast_nd` handles the promotion of 2D to 3D correctly. The `pos` > > > argument lets you tell it where to put the new axes. 
What's > > > unintuitive to > > > my is that the 1D case gets promoted to from shape `(x,)` to shape > > > `(1, x, > > > 1)`. It takes two calls to `atleast_nd` to replicate that behavior. > > > > > > > When thinking about channeled images, the channel axis is not of the > > same > > kind as the H and W axes. Really, you tend to want to think about an > > RGB > > image as a (H, W) array of colors rather than an (H, W, 3) ndarray of > > intensity values. As much as possible, you want to treat RGB images > > similar > > to (H, W)-shaped grayscale images. Let's say I want to make a > > separable > > filter to convolve with my image, that is, we have a 1D filter for > > each of > > the H and W axes, and they are repeated for each channel, if RGB. > > Setting > > up a separable filter for (H, W) grayscale is straightforward with > > broadcasting semantics. I can use (ntaps,)-shaped vector for the W > > axis and > > (ntaps, 1)-shaped filter for the H axis. Now, when I go to the RGB > > case, I > > want the same thing. atleast_3d() adapts those correctly for the (H, > > W, > > nchannels) case. > > Right, my initial feeling it that without such context `atleast_3d` is > pretty surprising. So I wonder if we can design `atleast_nd` in a way > that it is explicit about this context. > Agreed. I think such a use case is probably too specific to design a single function for, at least in such a hardcoded way. There's also "channels first" and "channels last" versions of RGB images as 3-D arrays, and "channels first" is the default in most deep learning frameworks - so the choice atleast_3d makes is a little outdated by now. Cheers, Ralf > The `pos` argument is the current solution to this, but maybe is a > better way [2]? Meshgrid for example defaults to `indexing='xy'` and > has `indexing='ij'` for a similar purpose [1]. 
> > Of course, if `atleast_3d` is common enough, I guess that argument > could also swing to adding a keyword-only argument to `atleast_3d` > (that way we can/will never change the default). > > - Sebastian > > > [1] Not sure the purposes are comparable, but in both cases, they > provide information about the "context" in which meshgrid/atleast_3d > are used. > > [2] It feels a bit like you may have to think about what `pos=3` will > actually do (in the sense, that we will all just end up doing trial and > error :)). At which point I am not sure there is too much gained over > the surprise of `atleast_3d`. > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
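For concreteness, the `meshgrid` precedent mentioned in the quoted text above: its `indexing` keyword selects between two axis-order conventions, in the same spirit as a context-style argument to `atleast_nd` might.

```python
import numpy as np

x = np.arange(3)   # length 3
y = np.arange(2)   # length 2

# Default 'xy' (Cartesian/plotting order): output shape (len(y), len(x))
X, Y = np.meshgrid(x, y, indexing='xy')
print(X.shape)     # (2, 3)

# 'ij' (matrix/indexing order): output shape (len(x), len(y))
X, Y = np.meshgrid(x, y, indexing='ij')
print(X.shape)     # (3, 2)
```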
URL: From robert.kern at gmail.com Fri Feb 12 15:20:21 2021 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 12 Feb 2021 15:20:21 -0500 Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function In-Reply-To: References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net> <417beafed3212391571b55dcd10c0e6e4311034e.camel@sipsolutions.net> Message-ID: On Fri, Feb 12, 2021 at 1:47 PM Ralf Gommers wrote: > > On Fri, Feb 12, 2021 at 7:25 PM Sebastian Berg > wrote: > >> On Fri, 2021-02-12 at 10:08 -0500, Robert Kern wrote: >> > On Fri, Feb 12, 2021 at 9:45 AM Joseph Fox-Rabinovitz < >> > jfoxrabinovitz at gmail.com> wrote: >> > >> > > >> > > >> > > On Fri, Feb 12, 2021, 09:32 Robert Kern >> > > wrote: >> > > >> > > > On Fri, Feb 12, 2021 at 5:15 AM Eric Wieser < >> > > > wieser.eric+numpy at gmail.com> >> > > > wrote: >> > > > >> > > > > > There might be some linear algebraic reason why those axis >> > > > > > positions >> > > > > make sense, but I?m not aware of it... >> > > > > >> > > > > My guess is that the historical motivation was to allow >> > > > > grayscale `(H, >> > > > > W)` images to be converted into `(H, W, 1)` images so that they >> > > > > can be >> > > > > broadcast against `(H, W, 3)` RGB images. >> > > > > >> > > > >> > > > Correct. If you do introduce atleast_nd(), I'm not sure why you'd >> > > > deprecate and remove the one existing function that *isn't* made >> > > > redundant >> > > > thereby. >> > > > >> > > >> > > `atleast_nd` handles the promotion of 2D to 3D correctly. The `pos` >> > > argument lets you tell it where to put the new axes. What's >> > > unintuitive to >> > > my is that the 1D case gets promoted to from shape `(x,)` to shape >> > > `(1, x, >> > > 1)`. It takes two calls to `atleast_nd` to replicate that behavior. >> > > >> > >> > When thinking about channeled images, the channel axis is not of the >> > same >> > kind as the H and W axes. 
Really, you tend to want to think about an >> > RGB >> > image as a (H, W) array of colors rather than an (H, W, 3) ndarray of >> > intensity values. As much as possible, you want to treat RGB images >> > similar >> > to (H, W)-shaped grayscale images. Let's say I want to make a >> > separable >> > filter to convolve with my image, that is, we have a 1D filter for >> > each of >> > the H and W axes, and they are repeated for each channel, if RGB. >> > Setting >> > up a separable filter for (H, W) grayscale is straightforward with >> > broadcasting semantics. I can use (ntaps,)-shaped vector for the W >> > axis and >> > (ntaps, 1)-shaped filter for the H axis. Now, when I go to the RGB >> > case, I >> > want the same thing. atleast_3d() adapts those correctly for the (H, >> > W, >> > nchannels) case. >> >> Right, my initial feeling it that without such context `atleast_3d` is >> pretty surprising. So I wonder if we can design `atleast_nd` in a way >> that it is explicit about this context. >> > > Agreed. I think such a use case is probably too specific to design a > single function for, at least in such a hardcoded way. > That might be an argument for not designing a new one (or at least not giving it such a name). Not sure it's a good argument for removing a long-standing one. Broadcasting is a very powerful convention that makes coding with arrays tolerable. It makes some choices (namely, prepending 1s to the shape) to make some common operations with mixed-dimension arrays work "by default". But it doesn't cover all of the desired operations conveniently. atleast_3d() bridges the gap to an important convention for a major use-case of arrays. There's also "channels first" and "channels last" versions of RGB images as > 3-D arrays, and "channels first" is the default in most deep learning > frameworks - so the choice atleast_3d makes is a little outdated by now. 
>

DL frameworks do not constitute the majority of image processing code,
which has a very strong channels-last contingent. But nonetheless, the very
popular Tensorflow defaults to channels-last.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From melissawm at gmail.com  Fri Feb 12 15:36:50 2021
From: melissawm at gmail.com (Melissa Mendonça)
Date: Fri, 12 Feb 2021 17:36:50 -0300
Subject: [Numpy-discussion] Documentation Team meeting - Monday February 15
In-Reply-To: 
References: 
Message-ID: 

Hi all!

Our next Documentation Team meeting will be on *Monday, February 15* at
***4PM UTC***. All are welcome - you don't need to already be a
contributor to join. If you have questions or are curious about what
we're doing, we'll be happy to meet you!

If you wish to join on Zoom, use this link:
https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success

Here's the permanent hackmd document with the meeting notes (still being
updated in the next few days!):
https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg

Hope to see you around!

** You can click this link to get the correct time at your timezone:
https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210215T16&p1=1440&ah=1

- Melissa
-------------- next part --------------
An HTML attachment was scrubbed...
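As an aside on the channels-last versus channels-first layouts discussed above: converting a minibatch between the two conventions is a single axis move (the shapes here are illustrative).

```python
import numpy as np

# Channels-last minibatch, the BHWC convention: (batch, H, W, channels)
batch_hwc = np.zeros((8, 32, 32, 3))

# Move the channel axis to position 1 to get BCHW (channels-first)
batch_chw = np.moveaxis(batch_hwc, -1, 1)
print(batch_chw.shape)  # (8, 3, 32, 32)
```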
URL: From ralf.gommers at gmail.com Fri Feb 12 15:41:26 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 12 Feb 2021 21:41:26 +0100 Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function In-Reply-To: References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net> <417beafed3212391571b55dcd10c0e6e4311034e.camel@sipsolutions.net> Message-ID: On Fri, Feb 12, 2021 at 9:21 PM Robert Kern wrote: > On Fri, Feb 12, 2021 at 1:47 PM Ralf Gommers > wrote: > >> >> On Fri, Feb 12, 2021 at 7:25 PM Sebastian Berg < >> sebastian at sipsolutions.net> wrote: >> >>> On Fri, 2021-02-12 at 10:08 -0500, Robert Kern wrote: >>> > On Fri, Feb 12, 2021 at 9:45 AM Joseph Fox-Rabinovitz < >>> > jfoxrabinovitz at gmail.com> wrote: >>> > >>> > > >>> > > >>> > > On Fri, Feb 12, 2021, 09:32 Robert Kern >>> > > wrote: >>> > > >>> > > > On Fri, Feb 12, 2021 at 5:15 AM Eric Wieser < >>> > > > wieser.eric+numpy at gmail.com> >>> > > > wrote: >>> > > > >>> > > > > > There might be some linear algebraic reason why those axis >>> > > > > > positions >>> > > > > make sense, but I?m not aware of it... >>> > > > > >>> > > > > My guess is that the historical motivation was to allow >>> > > > > grayscale `(H, >>> > > > > W)` images to be converted into `(H, W, 1)` images so that they >>> > > > > can be >>> > > > > broadcast against `(H, W, 3)` RGB images. >>> > > > > >>> > > > >>> > > > Correct. If you do introduce atleast_nd(), I'm not sure why you'd >>> > > > deprecate and remove the one existing function that *isn't* made >>> > > > redundant >>> > > > thereby. >>> > > > >>> > > >>> > > `atleast_nd` handles the promotion of 2D to 3D correctly. The `pos` >>> > > argument lets you tell it where to put the new axes. What's >>> > > unintuitive to >>> > > my is that the 1D case gets promoted to from shape `(x,)` to shape >>> > > `(1, x, >>> > > 1)`. It takes two calls to `atleast_nd` to replicate that behavior. 
>>> > When thinking about channeled images, the channel axis is not of the
>>> > same kind as the H and W axes. Really, you tend to want to think
>>> > about an RGB image as an (H, W) array of colors rather than an
>>> > (H, W, 3) ndarray of intensity values. As much as possible, you want
>>> > to treat RGB images similarly to (H, W)-shaped grayscale images.
>>> > Let's say I want to make a separable filter to convolve with my
>>> > image, that is, we have a 1D filter for each of the H and W axes, and
>>> > they are repeated for each channel, if RGB. Setting up a separable
>>> > filter for (H, W) grayscale is straightforward with broadcasting
>>> > semantics. I can use an (ntaps,)-shaped vector for the W axis and an
>>> > (ntaps, 1)-shaped filter for the H axis. Now, when I go to the RGB
>>> > case, I want the same thing. atleast_3d() adapts those correctly for
>>> > the (H, W, nchannels) case.
>>>
>>> Right, my initial feeling is that without such context `atleast_3d` is
>>> pretty surprising. So I wonder if we can design `atleast_nd` in a way
>>> that it is explicit about this context.
>>>
>>
>> Agreed. I think such a use case is probably too specific to design a
>> single function for, at least in such a hardcoded way.
>>
>
> That might be an argument for not designing a new one (or at least not
> giving it such a name). Not sure it's a good argument for removing a
> long-standing one.
>

I agree. I'm not sure deprecating is best. But introducing new
functionality where `nd(pos=3) != 3d` is also not great.

At the very least, atleast_3d should be better documented. It also is
telling that Juan (a long-time scikit-image dev) doesn't like atleast_3d
and there's very little usage of it in scikit-image.

Cheers,
Ralf

> Broadcasting is a very powerful convention that makes coding with arrays
> tolerable.
It makes some choices (namely, prepending 1s to the shape) to > make some common operations with mixed-dimension arrays work "by default". > But it doesn't cover all of the desired operations conveniently. > atleast_3d() bridges the gap to an important convention for a major > use-case of arrays. > > There's also "channels first" and "channels last" versions of RGB images >> as 3-D arrays, and "channels first" is the default in most deep learning >> frameworks - so the choice atleast_3d makes is a little outdated by now. >> > > DL frameworks do not constitute the majority of image processing code, > which has a very strong channels-last contingent. But nonetheless, the very > popular Tensorflow defaults to channels-last. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Feb 12 16:04:49 2021 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 12 Feb 2021 16:04:49 -0500 Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function In-Reply-To: References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net> <417beafed3212391571b55dcd10c0e6e4311034e.camel@sipsolutions.net> Message-ID: On Fri, Feb 12, 2021 at 3:42 PM Ralf Gommers wrote: > > On Fri, Feb 12, 2021 at 9:21 PM Robert Kern wrote: > >> On Fri, Feb 12, 2021 at 1:47 PM Ralf Gommers >> wrote: >> >>> >>> On Fri, Feb 12, 2021 at 7:25 PM Sebastian Berg < >>> sebastian at sipsolutions.net> wrote: >>> >>>> >>>> Right, my initial feeling it that without such context `atleast_3d` is >>>> pretty surprising. So I wonder if we can design `atleast_nd` in a way >>>> that it is explicit about this context. >>>> >>> >>> Agreed. I think such a use case is probably too specific to design a >>> single function for, at least in such a hardcoded way. 
>>>
>> That might be an argument for not designing a new one (or at least not
>> giving it such a name). Not sure it's a good argument for removing a
>> long-standing one.
>>
>
> I agree. I'm not sure deprecating is best. But introducing new
> functionality where `nd(pos=3) != 3d` is also not great.
>
> At the very least, atleast_3d should be better documented. It also is
> telling that Juan (a long-time scikit-image dev) doesn't like atleast_3d
> and there's very little usage of it in scikit-image.
>

I'm fairly neutral on atleast_nd(). I think that for n=1 and n=2, you can
derive The One Way to Do It from broadcasting semantics, but for n>=3, I'm
not sure there's much value in trying to systematize it to a single
convention. I think that once you get up to those dimensions, you start to
want domain-specific semantics. I do agree that, in retrospect,
atleast_3d() probably should have been named more specifically. It was of
a piece with other conveniences like dstack() that did special things to
support channels-last images (and implicitly treat 3D arrays as such). For
DL frameworks that assemble channeled images into minibatches (with
different conventions like BHWC and BCHW), you'd want the n=4 behavior to
do different things. I _think_ you'd just want to do those with different
functions rather than with a complicated set of arguments to one function.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From friedrichromstedt at gmail.com  Mon Feb 15 04:12:56 2021
From: friedrichromstedt at gmail.com (Friedrich Romstedt)
Date: Mon, 15 Feb 2021 10:12:56 +0100
Subject: [Numpy-discussion] Unreliable crash when converting using
 numpy.asarray via C buffer interface
In-Reply-To: 
References: 
Message-ID: 

Hi,

On Thu, Feb 4, 2021 at 09:07, Friedrich Romstedt wrote:
> On Mon, Feb 1, 2021 at 09:46, Matti Picus wrote:
> > Typically, one would create a complete example and then point to the
> > code (as repo or pastebin, not as an attachment to a mail here).
>
> https://github.com/friedrichromstedt/bughunting-01

Last week I updated my example code to be slimmer. There now exists
a single-file extension module:
https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bughuntingfrmod/bughuntingfrmod.cpp.
The corresponding test program
https://github.com/friedrichromstedt/bughunting-01/blob/master/test/2021-02-11_0909.py
crashes "properly" both on Windows 10 (Python 3.8.2, numpy 1.19.2) and
on Arch Linux (Python 3.9.1, numpy 1.20.0) when the ``print``
statement contained in the test file is commented out.

My hope to be able to fix my error myself by reducing the code to
reproduce the problem has not been fulfilled. I feel that the
abovementioned test code is short enough to ask for help with it here.
Any hint on how I could solve my problem would be appreciated very
much.

There are some points which were not clarified yet; I am citing them
below.

So far,
Friedrich

> > - There are tools out there to analyze refcount problems. Python has
> > some built-in tools for switching allocation strategies.
>
> Can you give me some pointer about this?
>
> > - numpy.asarray has a number of strategies to convert instances, which
> > one is it using?
>
> I've tried to read about this, but couldn't find anything. What are
> these different strategies?

From Pietro.Fontana at synopsys.com  Mon Feb 15 10:38:09 2021
From: Pietro.Fontana at synopsys.com (Pietro Fontana)
Date: Mon, 15 Feb 2021 15:38:09 +0000
Subject: [Numpy-discussion] Compile NumPy with ifort, MSVC and MKL - DLL
 load failed
Message-ID: 

Hi all,

I've been trying to compile NumPy from source on Windows 10, with the
MSVC compiler and Intel MKL. Whenever I link to MKL, it fails to load
DLLs.
I am running Windows 10.0.18363 with Microsoft Visual Studio 2019 (16.8.5)
and Intel MKL 2017.8.275. I managed to reproduce the issue with a minimal
setup, using the latest Python and NumPy:

1. Download the latest Python (3.9.1) and latest NumPy (1.20.1) source.
2. Open a VS command prompt, unpack the Python source, build with
   PCbuild\build.bat
3. Run mklvars.bat intel64 to get the right environment variables set.
4. Add the Intel compilers (needed for ifort) to PATH:
   set PATH=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017\windows\bin\intel64;%PATH%
5. Create a virtual env, copy a few files from the Python build and
   activate the virtual env:
   copy Python\PCbuild\amd64\python39.dll venv\Scripts
   copy Python\PC\pyconfig.h venv\Include
6. Build NumPy from source and install: pip install . -v
7. Try to import NumPy: python -c "import numpy"

The error message appears as follows:

Traceback (most recent call last):
  File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\core\__init__.py", line 22, in <module>
    from . import multiarray
  File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\core\multiarray.py", line 12, in <module>
    from . import overrides
  File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\core\overrides.py", line 7, in <module>
    from numpy.core._multiarray_umath import (
ImportError: DLL load failed while importing _multiarray_umath: The
specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\__init__.py", line 145, in <module>
    from . import core
  File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\core\__init__.py", line 48, in <module>
    raise ImportError(msg)
ImportError: [... useful suggestions that however did not lead to a
solution...]

Original error was: DLL load failed while importing _multiarray_umath: The
specified module could not be found.
The MKL libraries are picked up during compilation since it returns:

FOUND:
    libraries = ['mkl_rt']
    library_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries\\windows\\mkl\\lib\\intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries\\windows\\mkl\\lib']

I tried to analyze the DLL resolution on _multiarray_umath.pyd with Dependencies (the newer version of Dependency Walker) but it seems that the MKL DLL loads fine. There are some DLLs that appear as not correctly loaded, but as far as I understand it is caused by the inspection software limit with Windows API sets (api-ms-win-core-*, ext-ms-onecore-*, ext-ms-win-*, and similar), not by actual problems with these DLLs, so I think the system is correctly set up.

If I skip the initialization of MKL environment variables, then the MKL libraries are not picked up and NumPy is compiled to a functional state.

In the past this setup used to work with Python 3.6, VS2015 and a similar version of Intel MKL. I was able to reproduce the issue with NumPy 1.16.2, 1.17 and 1.20.1; with Python 3.8.6 and Python 3.9.1; with Intel MKL 2017 and oneAPI 2020.

Am I missing any obvious step to succeed in this adventure?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From sebastian at sipsolutions.net Mon Feb 15 10:54:19 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 15 Feb 2021 09:54:19 -0600 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: On Mon, 2021-02-15 at 10:12 +0100, Friedrich Romstedt wrote: > Hi, > > Am Do., 4. Feb. 2021 um 09:07 Uhr schrieb Friedrich Romstedt > : > > Am Mo., 1. Feb.
2021 um 09:46 Uhr schrieb Matti Picus < > > matti.picus at gmail.com>: > > > Typically, one would create a complete example and then pointing > > > to the > > > code (as repo or pastebin, not as an attachment to a mail here). > > > > https://github.com/friedrichromstedt/bughunting-01 > > Last week I updated my example code to be more slim. There now > exists > a single-file extension module: > https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bughuntingfrmod/bughuntingfrmod.cpp > . > The corresponding test program > https://github.com/friedrichromstedt/bughunting-01/blob/master/test/2021-02-11_0909.py > crashes "properly" both on Windows 10 (Python 3.8.2, numpy 1.19.2) as > well as on Arch Linux (Python 3.9.1, numpy 1.20.0), when the > ``print`` > statement contained in the test file is commented out. > > My hope to be able to fix my error myself by reducing the code to > reproduce the problem has not been fulfilled. I feel that the > abovementioned test code is short enough to ask for help with it > here. > Any hint on how I could solve my problem would be appreciated very > much. I have tried it out, and can confirm that using debugging tools (namely valgrind) will allow you to track down the issue (valgrind reports it from within python, running a python without debug symbols may obfuscate the actual problem; if that is limiting you, I can post my valgrind output). Since you are running a linux system, I am confident that you can run it in valgrind to find it yourself. (There may be other ways.) Just remember to run valgrind with `PYTHONMALLOC=malloc valgrind` and ignore some errors e.g. when importing NumPy. Cheers, Sebastian > > There are some points which were not clarified yet; I am citing them > below. > > So far, > Friedrich > > > > - There are tools out there to analyze refcount problems. Python > > > has > > > some built-in tools for switching allocation strategies. > > > > Can you give me some pointer about this?
> > > > > - numpy.asarray has a number of strategies to convert instances, > > > which > > > one is it using? > > > > I've tried to read about this, but couldn't find anything.? What > > are > > these different strategies? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From cgohlke at uci.edu Mon Feb 15 11:34:36 2021 From: cgohlke at uci.edu (Christoph Gohlke) Date: Mon, 15 Feb 2021 08:34:36 -0800 Subject: [Numpy-discussion] Compile NumPy with ifort, MSVC and MKL - DLL load failed In-Reply-To: References: Message-ID: <2297ae37-cf0d-7929-400e-8033206bdb5d@uci.edu> Hello, On 2/15/2021 7:38 AM, Pietro Fontana wrote: > Hi all, > > I've been trying to compile NumPy from source on Windows 10, with MSVC > compiler and Intel MKL. Whenever I link to MKL it fails at loading DLLs. > I am running Windows 10.0.18363 with Microsoft Visual Studio 2019 > (16.8.5) and Intel MKL 2017.8.275. > > I managed to reproduce the issue with a minimal setup, using latest > Python and NumPy. > > 1. Download latest Python (3.9.1) and latest NumPy (1.20.1) source. > 2. Open a VS command prompt, unpack Python source, build with > PCbuild\build.bat > 3. Run mklvars.bat intel64 to get the right environment variables set. > 4. Add the Intel compilers (needed for ifort) to PATH: > > set PATH=C:\Program Files > (x86)\IntelSWTools\compilers_and_libraries_2017\windows\bin\intel64;%PATH% > > 5. Create a virtual env, copy a few files from the Python build and > activate the virtual env: > > copy Python\PCbuild\amd64\python39.dll venv\Scripts > copy Python\PC\pyconfig.h venv\Include > > 6. Build NumPy from source and install: pip install . -v > 7. 
Try to import NumPy: python -c "import numpy"
>
> The error message appears as follows:
>
> Traceback (most recent call last):
>   File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\core\__init__.py", line 22, in <module>
>     from . import multiarray
>   File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\core\multiarray.py", line 12, in <module>
>     from . import overrides
>   File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\core\overrides.py", line 7, in <module>
>     from numpy.core._multiarray_umath import (
> ImportError: DLL load failed while importing _multiarray_umath: The specified module could not be found.
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\__init__.py", line 145, in <module>
>     from . import core
>   File "C:\path\numpy_clean_env\venv\lib\site-packages\numpy\core\__init__.py", line 48, in <module>
>     raise ImportError(msg)
> ImportError: [... useful suggestions that however did not lead to a solution...]
>
> Original error was: DLL load failed while importing _multiarray_umath: The specified module could not be found.
>
> The MKL libraries are picked up during compilation since it returns:
>
> FOUND:
>     libraries = ['mkl_rt']
>     library_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries\\windows\\mkl\\lib\\intel64']
>     define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
>     include_dirs = ['C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries\\windows\\mkl', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries\\windows\\mkl\\include', 'C:\\Program Files (x86)\\IntelSWTools\\compilers_and_libraries\\windows\\mkl\\lib']
>
> I tried to analyze the DLL resolution on _multiarray_umath.pyd with
> Dependencies (the newer version of Dependency Walker) but it seems that
> the MKL DLL loads fine. There are some DLLs that appear as not correctly
> loaded, but as far as I understand it is caused by the inspection
> software limit with Windows API sets (api-ms-win-core-*,
> ext-ms-onecore-*, ext-ms-win-*, and similar), not by actual problems
> with these DLLs, so I think the system is correctly set up.
>
> If I skip the initialization of MKL environment variables, then the MKL
> libraries are not picked up and NumPy is compiled to a functional state.
>
> In the past this setup used to work with Python 3.6, VS2015 and a
> similar version of Intel MKL.
> I was able to reproduce the issue with NumPy 1.16.2, 1.17 and 1.20.1;
> with Python 3.8.6 and Python 3.9.1; with Intel MKL 2017 and oneAPI 2020.
>
> Am I missing any obvious step to succeed in this adventure?

Python >= 3.8 will no longer use PATH for resolving dependencies of extension modules. Use os.add_dll_directory(mkl_bin_path) in all your scripts before importing numpy or add the call to a _distributor_init.py file in the numpy package directory.
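For example (the MKL path below is only an illustration, substitute the directory that actually contains mkl_rt.dll on your system):

```python
import os

def register_dll_dir(path):
    """Register *path* for extension-module DLL resolution.

    On Python >= 3.8 under Windows, PATH is no longer searched for the
    dependencies of .pyd files, so directories such as the MKL runtime
    folder must be added explicitly.  Returns True if the directory was
    registered, False otherwise (non-Windows, or missing directory).
    """
    if hasattr(os, "add_dll_directory") and os.path.isdir(path):
        os.add_dll_directory(path)
        return True
    return False

# Illustrative path only; use the folder holding mkl_rt.dll on your machine.
mkl_bin_path = r"C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\redist\intel64\mkl"
register_dll_dir(mkl_bin_path)

# ... after this point, "import numpy" can resolve the MKL DLLs.
```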
Christoph From lev.maximov at gmail.com Mon Feb 15 12:49:34 2021 From: lev.maximov at gmail.com (Lev Maximov) Date: Tue, 16 Feb 2021 00:49:34 +0700 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: Hi Friedrich, Try adding view->suboffsets = NULL; view->internal = NULL; to Image_getbuffer Best regards, Lev On Mon, Feb 15, 2021 at 10:57 PM Sebastian Berg wrote: > On Mon, 2021-02-15 at 10:12 +0100, Friedrich Romstedt wrote: > > Hi, > > > > Am Do., 4. Feb. 2021 um 09:07 Uhr schrieb Friedrich Romstedt > > : > > > Am Mo., 1. Feb. 2021 um 09:46 Uhr schrieb Matti Picus < > > > matti.picus at gmail.com>: > > > > Typically, one would create a complete example and then pointing > > > > to the > > > > code (as repo or pastebin, not as an attachment to a mail here). > > > > > > https://github.com/friedrichromstedt/bughunting-01 > > > > Last week I updated my example code to be more slim. There now > > exists > > a single-file extension module: > > > https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bughuntingfrmod/bughuntingfrmod.cpp > > . > > The corresponding test program > > > https://github.com/friedrichromstedt/bughunting-01/blob/master/test/2021-02-11_0909.py > > crashes "properly" both on Windows 10 (Python 3.8.2, numpy 1.19.2) as > > well as on Arch Linux (Python 3.9.1, numpy 1.20.0), when the > > ``print`` > > statement contained in the test file is commented out. > > > > My hope to be able to fix my error myself by reducing the code to > > reproduce the problem has not been fulfillled. I feel that the > > abovementioned test code is short enough to ask for help with it > > here. > > Any hint on how I could solve my problem would be appreciated very > > much. 
> > I have tried it out, and can confirm that using debugging tools (namely > valgrind), will allow you track down the issue (valgrind reports it > from within python, running a python without debug symbols may > obfuscate the actual problem; if that is the limiting you, I can post > my valgrind output). > Since you are running a linux system, I am confident that you can run > it in valgrind to find it yourself. (There may be other ways.) > > Just remember to run valgrind with `PYTHONMALLOC=malloc valgrind` and > ignore some errors e.g. when importing NumPy. > > Cheers, > > Sebastian > > > > > > There are some points which were not clarified yet; I am citing them > > below. > > > > So far, > > Friedrich > > > > > > - There are tools out there to analyze refcount problems. Python > > > > has > > > > some built-in tools for switching allocation strategies. > > > > > > Can you give me some pointer about this? > > > > > > > - numpy.asarray has a number of strategies to convert instances, > > > > which > > > > one is it using? > > > > > > I've tried to read about this, but couldn't find anything. What > > > are > > > these different strategies? > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Pietro.Fontana at synopsys.com Mon Feb 15 13:12:10 2021 From: Pietro.Fontana at synopsys.com (Pietro Fontana) Date: Mon, 15 Feb 2021 18:12:10 +0000 Subject: [Numpy-discussion] Compile NumPy with ifort, MSVC and MKL - DLL load failed In-Reply-To: <2297ae37-cf0d-7929-400e-8033206bdb5d@uci.edu> References: <2297ae37-cf0d-7929-400e-8033206bdb5d@uci.edu> Message-ID: Hi, thank you very much for pointing me at this. I managed not to find this bit of information despite spending quite some time on the issue. Cheers, Pietro From mansourmoufid at gmail.com Mon Feb 15 19:35:35 2021 From: mansourmoufid at gmail.com (Mansour Moufid) Date: Mon, 15 Feb 2021 19:35:35 -0500 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: On Tue, Jan 26, 2021 at 3:50 AM Friedrich Romstedt wrote: > > Hi, > > This is with Python 3.8.2 64-bit and numpy 1.19.2 on Windows 10. I'd > like to be able to convert some C++ extension type to a numpy array by > using ``numpy.asarray``. The extension type implements the Python > buffer interface to support this. > > The extension type, called "Image" here, holds some chunk of > ``double``, C order, contiguous, 2 dimensions. It "owns" the buffer; > the buffer is not shared with other objects. The following Python > code crashes:: > > image = <... Image production ...> > ar = numpy.asarray(image) > > However, when I say:: > > image = <... Image production ...> > print("---") > ar = numpy.asarray(image) > > the entire program is executing properly with correct data in the > numpy ndarray produced using the buffer interface. Maybe a dereference bug. Try setting pointers to NULL after freeing, something like this: delete[] view->shape; view->shape = NULL; delete[] view->strides; view->strides = NULL; ... 
delete[] self->data; self->data = NULL; From mansourmoufid at gmail.com Mon Feb 15 19:47:42 2021 From: mansourmoufid at gmail.com (Mansour Moufid) Date: Mon, 15 Feb 2021 19:47:42 -0500 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: On Mon, Feb 15, 2021 at 7:35 PM Mansour Moufid wrote: > > On Tue, Jan 26, 2021 at 3:50 AM Friedrich Romstedt > wrote: > > > > Hi, > > > > This is with Python 3.8.2 64-bit and numpy 1.19.2 on Windows 10. I'd > > like to be able to convert some C++ extension type to a numpy array by > > using ``numpy.asarray``. The extension type implements the Python > > buffer interface to support this. > > > > The extension type, called "Image" here, holds some chunk of > > ``double``, C order, contiguous, 2 dimensions. It "owns" the buffer; > > the buffer is not shared with other objects. The following Python > > code crashes:: > > > > image = <... Image production ...> > > ar = numpy.asarray(image) > > > > However, when I say:: > > > > image = <... Image production ...> > > print("---") > > ar = numpy.asarray(image) > > > > the entire program is executing properly with correct data in the > > numpy ndarray produced using the buffer interface. > > Maybe a dereference bug. > > Try setting pointers to NULL after freeing, something like this: > > delete[] view->shape; > view->shape = NULL; > delete[] view->strides; > view->strides = NULL; > > ... > > delete[] self->data; > self->data = NULL; Sorry for two messages in a row, I just noticed: I don't see the type's tp_free member defined? 
You can set it to PyObject_Free in Init_ImageType: ImageType.tp_free = PyObject_Free; See here: https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_free From pierre.augier at univ-grenoble-alpes.fr Tue Feb 16 04:14:18 2021 From: pierre.augier at univ-grenoble-alpes.fr (PIERRE AUGIER) Date: Tue, 16 Feb 2021 10:14:18 +0100 (CET) Subject: [Numpy-discussion] Type annotation for Numpy arrays, accelerators and numpy.typing Message-ID: <308708215.6518305.1613466858105.JavaMail.zimbra@univ-grenoble-alpes.fr> Hi, When Numpy 1.20 was released, I discovered numpy.typing and its documentation https://numpy.org/doc/stable/reference/typing.html I know that it is very new but I'm a bit lost. A good API to describe Array type would be useful not only for type checkers but also for Python accelerators using ndarrays (in particular Pythran, Numba, Cython, Transonic). For Transonic, I'd like to be able to use internally numpy.typing to have a better implementation of what we need in transonic.typing (in particular compatible with type checkers like MyPy). However, it seems that I can't do anything with what I see today in numpy.typing. For Python-Numpy accelerators, we need to be able to define precise array types to limit the compilation time and give useful hints for optimizations (ndim, partial or full shape). We also need fused types. What can be done with Transonic is described in these pages: https://transonic.readthedocs.io/en/latest/examples/type_hints.html and https://transonic.readthedocs.io/en/latest/generated/transonic.typing.html I think it would be good to be able to do things like that with numpy.typing. It may be already possible but I can't find how in the doc. I can give few examples here. 
First very simple: from transonic import Array Af3d = Array[float, "3d"] # Note that this can also be written without Array just as Af3d = "float[:,:,:]" # same thing but only contiguous C ordered Af3d = Array[float, "3d", "C"] Note: being able to limit the compilation just for C-aligned arrays is very important since it can drastically decrease the compilation time/memory and that some numerical kernels are anyway written to be efficient only with C (or Fortran) ordered arrays. # 2d color image A_im = Array[np.int16, "[:,:,3]"] Now, fused types. This example is taken from a real life case (https://foss.heptapod.net/fluiddyn/fluidsim/-/blob/branch/default/fluidsim/base/time_stepping/pseudo_spect.py) so it's really useful in practice. from transonic import Type, NDim, Array, Union N = NDim(2, 3, 4) A = Array[np.complex128, N, "C"] Am1 = Array[np.complex128, N - 1, "C"] N123 = NDim(1, 2, 3) A123c = Array[np.complex128, N123, "C"] A123f = Array[np.float64, N123, "C"] T = Type(np.float64, np.complex128) A1 = Array[T, N, "C"] A2 = Array[T, N - 1, "C"] ArrayDiss = Union[A1, A2] To summarize, type annotations are and will also be used for Python-Numpy accelerators. It would be good to also consider this application when designing numpy.typing. Cheers, Pierre From friedrichromstedt at gmail.com Tue Feb 16 05:00:34 2021 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Tue, 16 Feb 2021 11:00:34 +0100 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: Hello again, Am Mo., 15. Feb. 2021 um 16:57 Uhr schrieb Sebastian Berg : > > On Mon, 2021-02-15 at 10:12 +0100, Friedrich Romstedt wrote: > > Last week I updated my example code to be more slim. There now > > exists > > a single-file extension module: > > https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bughuntingfrmod/bughuntingfrmod.cpp > > . 
> > The corresponding test program
> > https://github.com/friedrichromstedt/bughunting-01/blob/master/test/2021-02-11_0909.py
> > crashes "properly" both on Windows 10 (Python 3.8.2, numpy 1.19.2) as
> > well as on Arch Linux (Python 3.9.1, numpy 1.20.0), when the ``print``
> > statement contained in the test file is commented out.
>
> I have tried it out, and can confirm that using debugging tools (namely
> valgrind) will allow you to track down the issue (valgrind reports it
> from within python, running a python without debug symbols may
> obfuscate the actual problem; if that is limiting you, I can post
> my valgrind output).
> Since you are running a linux system, I am confident that you can run
> it in valgrind to find it yourself. (There may be other ways.)
>
> Just remember to run valgrind with `PYTHONMALLOC=malloc valgrind` and
> ignore some errors e.g. when importing NumPy.

From running ``PYTHONMALLOC=malloc valgrind python3 2021-02-11_0909.py`` (with the preceding call of ``print`` in :file:`2021-02-11_0909.py` commented out) I found a few things:

- The call might or might not succeed. It doesn't always lead to a segfault.
- "at 0x4A64A73: ??? (in /usr/lib/libpython3.9.so.1.0), called by 0x4A64914: PyMemoryView_FromObject (in /usr/lib/libpython3.9.so.1.0)", a "Conditional jump or move depends on uninitialised value(s)". After one more block of valgrind output ("Use of uninitialised value of size 8 at 0x48EEA1B: ??? (in /usr/lib/libpython3.9.so.1.0)"), it finally leads either to "Invalid read of size 8 at 0x48EEA1B: ??? (in /usr/lib/libpython3.9.so.1.0) [...] Address 0x1 is not stack'd, malloc'd or (recently) free'd", resulting in a segfault, or just to another "Use of uninitialised value of size 8 at 0x48EEA15: ??? (in /usr/lib/libpython3.9.so.1.0)", after which the program completes successfully.
- All this happens within "PyMemoryView_FromObject".
So I can only guess that the "uninitialised value" is compared to 0x0, and when it is different (e.g. 0x1), it leads via "Address 0x1 is not stack'd, malloc'd or (recently) free'd" to the segfault observed. I suppose I need to compile Python and numpy myself to see the debug symbols instead of the "???" marks? Maybe even with ``-O0``? Furthermore, the shared object belonging to my code isn't involved directly in any way, so the segfault possibly has to do with some data I am leaving "uninitialised" at the moment. Thanks for the other replies as well; for the moment I feel that going the valgrind way might teach me how to debug errors of this kind myself. So far, Friedrich From lev.maximov at gmail.com Tue Feb 16 05:48:31 2021 From: lev.maximov at gmail.com (Lev Maximov) Date: Tue, 16 Feb 2021 17:48:31 +0700 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: I've reproduced the error you've described and got rid of it without valgrind. Those two lines are enough to avoid the segfault. But feel free to find it yourself :) Best regards, Lev On Tue, Feb 16, 2021 at 5:02 PM Friedrich Romstedt < friedrichromstedt at gmail.com> wrote: > Hello again, > > Am Mo., 15. Feb. 2021 um 16:57 Uhr schrieb Sebastian Berg > : > > > > On Mon, 2021-02-15 at 10:12 +0100, Friedrich Romstedt wrote: > > > Last week I updated my example code to be more slim. There now > > > exists > > > a single-file extension module: > > > > https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bughuntingfrmod/bughuntingfrmod.cpp > > > . > > > The corresponding test program > > > > https://github.com/friedrichromstedt/bughunting-01/blob/master/test/2021-02-11_0909.py > > > crashes "properly" both on Windows 10 (Python 3.8.2, numpy 1.19.2) as > > > well as on Arch Linux (Python 3.9.1, numpy 1.20.0), when the > > > ``print`` > > > statement contained in the test file is commented out. 
> > > > I have tried it out, and can confirm that using debugging tools (namely > > valgrind), will allow you track down the issue (valgrind reports it > > from within python, running a python without debug symbols may > > obfuscate the actual problem; if that is the limiting you, I can post > > my valgrind output). > > Since you are running a linux system, I am confident that you can run > > it in valgrind to find it yourself. (There may be other ways.) > > > > Just remember to run valgrind with `PYTHONMALLOC=malloc valgrind` and > > ignore some errors e.g. when importing NumPy. > > From running ``PYTHONMALLOC=malloc valgrind python3 > 2021-01-11_0909.py`` (with the preceding call of ``print`` in > :file:`2021-01-11_0909.py` commented out) I found a few things: > > - The call might or might not succeed. It doesn't always lead to a > segfault. > - "at 0x4A64A73: ??? (in /usr/lib/libpython3.9.so.1.0), called by > 0x4A64914: PyMemoryView_FromObject (in /usr/lib/libpython3.9.so.1.0)", > a "Conditional jump or move depends on uninitialised value(s)". After > one more block of valgrind output ("Use of uninitialised value of size > 8 at 0x48EEA1B: ??? (in /usr/lib/libpython3.9.so.1.0)"), it finally > leads either to "Invalid read of size 8 at 0x48EEA1B: ??? (in > /usr/lib/libpython3.9.so.1.0) [...] Address 0x1 is not stack'd, > malloc'd or (recently) free'd", resulting in a segfault, or just to > another "Use of uninitialised value of size 8 at 0x48EEA15: ??? (in > /usr/lib/libpython3.9.so.1.0)", after which the program completes > successfully. > - All this happens within "PyMemoryView_FromObject". > > So I can only guess that the "uninitialised value" is compared to 0x0, > and when it is different (e.g. 0x1), it leads via "Address 0x1 is not > stack'd, malloc'd or (recently) free'd" to the segfault observed. > > I suppose I need to compile Python and numpy myself to see the debug > symbols instead of the "???" marks? Maybe even with ``-O0``? 
> > Furthermore, the shared object belonging to my code isn't involved > directly in any way, so the segfault possibly has to do with some data > I am leaving "uninitialised" at the moment. > > Thanks for the other replies as well; for the moment I feel that going > the valgrind way might teach me how to debug errors of this kind > myself. > > So far, > Friedrich > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Feb 16 05:50:48 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 16 Feb 2021 11:50:48 +0100 Subject: [Numpy-discussion] Type annotation for Numpy arrays, accelerators and numpy.typing In-Reply-To: <308708215.6518305.1613466858105.JavaMail.zimbra@univ-grenoble-alpes.fr> References: <308708215.6518305.1613466858105.JavaMail.zimbra@univ-grenoble-alpes.fr> Message-ID: On Tue, Feb 16, 2021 at 10:20 AM PIERRE AUGIER < pierre.augier at univ-grenoble-alpes.fr> wrote: > Hi, > > When Numpy 1.20 was released, I discovered numpy.typing and its > documentation https://numpy.org/doc/stable/reference/typing.html > > I know that it is very new but I'm a bit lost. A good API to describe > Array type would be useful not only for type checkers but also for Python > accelerators using ndarrays (in particular Pythran, Numba, Cython, > Transonic). > > For Transonic, I'd like to be able to use internally numpy.typing to have > a better implementation of what we need in transonic.typing (in particular > compatible with type checkers like MyPy). > > However, it seems that I can't do anything with what I see today in > numpy.typing. > > For Python-Numpy accelerators, we need to be able to define precise array > types to limit the compilation time and give useful hints for optimizations > (ndim, partial or full shape). 
We also need fused types. > Hi Pierre, I think what you are getting at is that ArrayLike isn't useful for accelerators, right? ArrayLike is needed to add annotations to functions that use np.asarray to coerce their inputs, which may be scalars, lists, etc. That's indeed never what you want for an accelerator, and it'd be great if people stopped writing that kind of code - but we're stuck with a lot of it in SciPy and many other downstream libraries. For your purposes, I think you want one of two things: 1. functions that only take `ndarray`, or maybe at most `Union[float, ndarray]` 2. perhaps in the future, a well-defined array Protocol, to support multiple array types (this is hinted at in https://data-apis.github.io/array-api/latest/design_topics/static_typing.html ) You don't need numpy.typing for (1), you can directly annotate with `x : np.ndarray` > What can be done with Transonic is described in these pages: > https://transonic.readthedocs.io/en/latest/examples/type_hints.html and > https://transonic.readthedocs.io/en/latest/generated/transonic.typing.html > > I think it would be good to be able to do things like that with > numpy.typing. It may be already possible but I can't find how in the doc. > Two things that are still work-in-progress are annotating arrays with dtypes and with shapes. Your examples already have that, so that's useful input. For C/F-contiguity, I believe that's useful but normally shouldn't show up in user-facing APIs (only in internal helper routines) so probably less urgent. For dtype annotations, a lot of work is being done at the moment by Bas van Beek. Example: https://github.com/numpy/numpy/pull/18128. That all turns out to be quite complex, because there's so many valid ways of specifying a dtype. It's the same kind of flexibility problem as with `asarray` - the complexity is needed to correctly type current code in NumPy, SciPy et al., but it's not what you want for an accelerator. 
For that you'd want to accept only one way of spelling this, `dtype=`. > I can give few examples here. First very simple: > > from transonic import Array > > Af3d = Array[float, "3d"] > > # Note that this can also be written without Array just as > Af3d = "float[:,:,:]" > > # same thing but only contiguous C ordered > Af3d = Array[float, "3d", "C"] > > Note: being able to limit the compilation just for C-aligned arrays is > very important since it can drastically decrease the compilation > time/memory and that some numerical kernels are anyway written to be > efficient only with C (or Fortran) ordered arrays. > > # 2d color image > A_im = Array[np.int16, "[:,:,3]"] > > Now, fused types. This example is taken from a real life case ( > https://foss.heptapod.net/fluiddyn/fluidsim/-/blob/branch/default/fluidsim/base/time_stepping/pseudo_spect.py) > so it's really useful in practice. > Yes definitely useful, there's also a lot of Cython code in downstream libraries that shows this. Annotations for fused types, when dtypes are just type literals, should hopefully work out of the box with TypeVar without us having to do anything special in numpy. Cheers, Ralf > from transonic import Type, NDim, Array, Union > > N = NDim(2, 3, 4) > A = Array[np.complex128, N, "C"] > Am1 = Array[np.complex128, N - 1, "C"] > > N123 = NDim(1, 2, 3) > A123c = Array[np.complex128, N123, "C"] > A123f = Array[np.float64, N123, "C"] > > T = Type(np.float64, np.complex128) > A1 = Array[T, N, "C"] > A2 = Array[T, N - 1, "C"] > ArrayDiss = Union[A1, A2] > > To summarize, type annotations are and will also be used for Python-Numpy > accelerators. It would be good to also consider this application when > designing numpy.typing. > > Cheers, > Pierre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From friedrichromstedt at gmail.com Tue Feb 16 06:40:56 2021 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Tue, 16 Feb 2021 12:40:56 +0100 Subject: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface In-Reply-To: References: Message-ID: Hi Lev, Am Di., 16. Feb. 2021 um 11:50 Uhr schrieb Lev Maximov : > > I've reproduced the error you've described and got rid of it without valgrind. > Those two lines are enough to avoid the segfault. Okay, good to know, I'll try it! Thanks for looking into it. > But feel free to find it yourself :) Yes :-D Best wishes, Friedrich From jfoxrabinovitz at gmail.com Tue Feb 16 10:49:23 2021 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Tue, 16 Feb 2021 10:49:23 -0500 Subject: [Numpy-discussion] ENH: Proposal to add atleast_nd function In-Reply-To: References: <1001ae35de9d51204170cfb5742a6ffab6e89990.camel@sipsolutions.net> <417beafed3212391571b55dcd10c0e6e4311034e.camel@sipsolutions.net> Message-ID: I'm getting a generally lukewarm not negative response. Should we put it to a vote? - Joe On Fri, Feb 12, 2021, 16:06 Robert Kern wrote: > On Fri, Feb 12, 2021 at 3:42 PM Ralf Gommers > wrote: > >> >> On Fri, Feb 12, 2021 at 9:21 PM Robert Kern >> wrote: >> >>> On Fri, Feb 12, 2021 at 1:47 PM Ralf Gommers >>> wrote: >>> >>>> >>>> On Fri, Feb 12, 2021 at 7:25 PM Sebastian Berg < >>>> sebastian at sipsolutions.net> wrote: >>>> >>>>> >>>>> Right, my initial feeling it that without such context `atleast_3d` is >>>>> pretty surprising. So I wonder if we can design `atleast_nd` in a way >>>>> that it is explicit about this context. >>>>> >>>> >>>> Agreed. I think such a use case is probably too specific to design a >>>> single function for, at least in such a hardcoded way. >>>> >>> >>> That might be an argument for not designing a new one (or at least not >>> giving it such a name). Not sure it's a good argument for removing a >>> long-standing one. 
>>>
>>
>> I agree. I'm not sure deprecating is best. But introducing new
>> functionality where `nd(pos=3) != 3d` is also not great.
>>
>> At the very least, atleast_3d should be better documented. It is also
>> telling that Juan (a long-time scikit-image dev) doesn't like
>> atleast_3d, and there's very little usage of it in scikit-image.
>>
>
> I'm fairly neutral on atleast_nd(). I think that for n=1 and n=2, you
> can derive The One Way to Do It from broadcasting semantics, but for
> n>=3, I'm not sure there's much value in trying to systematize it to a
> single convention. I think that once you get up to those dimensions,
> you start to want domain-specific semantics. I do agree that, in
> retrospect, atleast_3d() probably should have been named more
> specifically. It was of a piece with other conveniences like dstack()
> that did special things to support channel-last images (and implicitly
> treated 3D arrays as such). For example, for DL frameworks that
> assemble channeled images into minibatches (with different conventions
> like BHWC and BCHW), you'd want the n=4 behavior to do different
> things. I _think_ you'd just want to do those with different functions
> rather than with a complicated set of arguments to one function.
>
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sebastian at sipsolutions.net  Tue Feb 16 11:00:12 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 16 Feb 2021 10:00:12 -0600
Subject: [Numpy-discussion] Unreliable crash when converting using
 numpy.asarray via C buffer interface
In-Reply-To: References: Message-ID: <1c350429f25ade94f7b2b5a97aee7e7666c24bd7.camel@sipsolutions.net>

On Tue, 2021-02-16 at 12:40 +0100, Friedrich Romstedt wrote:
> Hi Lev,
>
> Am Di., 16. Feb.
2021 um 11:50 Uhr schrieb Lev Maximov <
> lev.maximov at gmail.com>:
> >
> > I've reproduced the error you've described and got rid of it
> > without valgrind.
> > Those two lines are enough to avoid the segfault.
>
> Okay, good to know, I'll try it! Thanks for looking into it.

Yeah, sorry if I was too fuzzy. Your error was random, and checking
valgrind in that case is often helpful and typically quick (it runs
slowly, but not much preparation is needed). Especially because you
reported it succeeding sometimes, which is where valgrind's
"uninitialized" checks might help, although I guess a `gdb` backtrace
in the crash case might have been just as clear.

With debugging symbols in Python (a full debug build makes sense), it
mentioned "suboffsets" in a function name for me (maybe when a crash
happened). A debug Python will also default to a debug malloc:

https://docs.python.org/3/using/cmdline.html#envvar-PYTHONMALLOC

That would not have been very useful here, but it could be if you
access a Python object after it was freed, for example.

Uninitialized + "suboffsets" seemed fairly clear, but I may have
underestimated that a lot because I recognize "suboffsets" for buffers
immediately.

Cheers,

Sebastian

> > But feel free to find it yourself :)
>
> Yes :-D
>
> Best wishes,
> Friedrich
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL:

From sebastian at sipsolutions.net  Tue Feb 16 18:08:48 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 16 Feb 2021 17:08:48 -0600
Subject: [Numpy-discussion] What to do about structured string dtype and
 string regression?
Message-ID:

Hi all,

In https://github.com/numpy/numpy/issues/18407 it was reported that
there is a regression for `np.array()` and friends in NumPy 1.20 for
code such as:

np.array(["1234"], dtype=("U1", 4))
# NumPy 1.20: array(['1', '1', '1', '1'], dtype='<U1') [...]

>>> np.array(["1234"], dtype="(4)U1,i")
array([(['1', '1', '1', '1'], 1234)],
      dtype=[('f0', '<U1', (4,)), ('f1', '<i4')]) [...]

>>> np.array("1234", dtype="(4)U1,")
# NumPy 1.20: array(['1', '1', '1', '1'], dtype='<U1') [...]

>>> np.array(["12"], dtype=("(2,2)U1,"))
array([[['1', '1'], ['2', '2']]], dtype='<U1') [...]
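The debugging workflow Sebastian outlines earlier in this digest (valgrind for uninitialized reads, a gdb backtrace, and the debug allocator) can be summarized as shell commands. This is a hedged sketch: `crash.py` is a placeholder for the reproducing script, and the suppressions file path assumes a checkout of CPython's source tree.

```shell
# 1. valgrind flags reads of uninitialized memory and use-after-free;
#    it runs slowly, but needs essentially no preparation:
valgrind --tool=memcheck --suppressions=Misc/valgrind-python.supp \
    python3 crash.py

# 2. A gdb backtrace taken at the segfault is often just as telling,
#    especially with a debug build of Python for full symbols:
gdb -batch -ex run -ex bt --args python3 crash.py

# 3. The debug allocator (the PYTHONMALLOC mechanism linked above)
#    also works on a regular, non-debug build:
PYTHONMALLOC=debug python3 crash.py
```

The debug allocator catches writes past buffer ends and access to freed Python objects, which complements valgrind's uninitialized-memory checks for crashes like the buffer-interface one discussed in this thread.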