From ralf.gommers at gmail.com  Mon Mar  1 04:53:11 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 1 Mar 2021 10:53:11 +0100
Subject: [Numpy-discussion] guide for downstream package authors & setting
 version constraints
Message-ID: <CABL7CQgVyrhapMMMB6s0ty4r897zgZBM3VUSz0Kg1Bsu=xrh2w@mail.gmail.com>

Hi all,

We now have a guide for downstream package authors, talking about API and
ABI stability, NumPy versioning, testing against NumPy master, and how to
add build time and runtime dependencies for numpy:
https://numpy.org/devdocs/user/depending_on_numpy.html

Especially the version constraints and setting upper bounds for
install_requires correctly is important - almost no packages do this
correctly (or at all really). If your package depends on NumPy and you deal
with packaging it, please check it out!

And for even more practical details on release process steps for a
downstream package, see
http://scipy.github.io/devdocs/dev/core-dev/index.html#updating-upper-bounds-of-dependencies

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210301/db66f603/attachment.html>

From sebastian at sipsolutions.net  Wed Mar  3 10:44:44 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 03 Mar 2021 09:44:44 -0600
Subject: [Numpy-discussion] NumPy Community Meeting Wednesday (Today)
Message-ID: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>

Hi all,

There will be a NumPy Community meeting Wednesday March 3rd at 12pm
Pacific Time (20:00 UTC). Everyone is invited and encouraged to
join in and edit the work-in-progress meeting topics and notes at:

https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian


PS: Sorry for the late reminder.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210303/97741978/attachment.sig>

From robbmcleod at gmail.com  Wed Mar  3 13:30:12 2021
From: robbmcleod at gmail.com (Robert McLeod)
Date: Wed, 3 Mar 2021 10:30:12 -0800
Subject: [Numpy-discussion] ANN: NumExpr 2.7.3
Message-ID: <CAEFUWWXvzR_ZZd=r2-WGJv_EkmcXDx-47Md_=q5=q3H6B_hf0Q@mail.gmail.com>

========================
Announcing NumExpr 2.7.3
========================

Hi everyone,

This is a maintenance release to make use of the oldest supported NumPy
version
when building wheels, in an effort to alleviate issues seen on Windows
machines
that do not have the latest Windows MSVC runtime installed. It also adds
wheels built via GitHub Actions for ARMv8 platforms.

Project documentation is available at:

http://numexpr.readthedocs.io/

Changes from 2.7.2 to 2.7.3
---------------------------

- Pinned Numpy versions to minimum supported version in an effort to
alleviate
  issues seen in Windows machines not having the same Windows SDK installed
as
  was used to build the wheels.
- ARMv8 wheels are now available, thanks to `odidev` for the pull request.

What's Numexpr?
---------------

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for a some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

Where I can find Numexpr?
-------------------------

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Documentation is hosted at:

http://numexpr.readthedocs.io/en/latest/

Share your experience
---------------------

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Enjoy data!


-- 
Robert McLeod
robbmcleod at gmail.com
robert.mcleod at hitachi-hhtc.ca
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210303/9cf97e7d/attachment.html>

From stefanv at berkeley.edu  Thu Mar  4 02:08:59 2021
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Wed, 03 Mar 2021 23:08:59 -0800
Subject: [Numpy-discussion] Development branches renamed
Message-ID: <68d4ea36-941e-4232-b679-dbf1ff8f4469@www.fastmail.com>

Hi everyone,

The development branches of most of the repositories on github.com/numpy have been renamed to `main` (this is the GitHub default for newly created repositories).  The move has not yet been made for sub-projects such as `numpydoc` or `numpy.org`, but those should follow soon.

We were able to preserve all PRs, other than those for which the original branches have been deleted.

You can update your locally cloned repository to have a `main` branch as follows:

git branch -m master main
git fetch <YOUR_UPSTREAM_REMOTE>
git branch -u <YOUR_UPSTREAM_REMOTE>/main main

(where YOUR_UPSTREAM_REMOTE is typically called `upstream` or `origin`)

If you have any trouble, let us know.

Best regards,
St?fan

From ralf.gommers at gmail.com  Thu Mar  4 03:32:52 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 4 Mar 2021 09:32:52 +0100
Subject: [Numpy-discussion] Development branches renamed
In-Reply-To: <68d4ea36-941e-4232-b679-dbf1ff8f4469@www.fastmail.com>
References: <68d4ea36-941e-4232-b679-dbf1ff8f4469@www.fastmail.com>
Message-ID: <CABL7CQi9+FXqETqyPLd+7LE2Fq=7uqmmBTrBZtZE=ACVLdky+g@mail.gmail.com>

On Thu, Mar 4, 2021 at 8:09 AM Stefan van der Walt <stefanv at berkeley.edu>
wrote:

> Hi everyone,
>
> The development branches of most of the repositories on github.com/numpy
> have been renamed to `main` (this is the GitHub default for newly created
> repositories).  The move has not yet been made for sub-projects such as
> `numpydoc` or `numpy.org`, but those should follow soon.
>

Thanks for working on this St?fan!

Cheers,
Ralf


> We were able to preserve all PRs, other than those for which the original
> branches have been deleted.
>
> You can update your locally cloned repository to have a `main` branch as
> follows:
>
> git branch -m master main
> git fetch <YOUR_UPSTREAM_REMOTE>
> git branch -u <YOUR_UPSTREAM_REMOTE>/main main
>
> (where YOUR_UPSTREAM_REMOTE is typically called `upstream` or `origin`)
>
> If you have any trouble, let us know.
>
> Best regards,
> St?fan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210304/14b66867/attachment.html>

From klark--kent at yandex.ru  Thu Mar  4 17:37:40 2021
From: klark--kent at yandex.ru (klark--kent at yandex.ru)
Date: Fri, 05 Mar 2021 01:37:40 +0300
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>
Message-ID: <2911614896816@mail.yandex.ru>

An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210305/eb637b00/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: IMG-20201122-WA0001.jpg
Type: image/jpeg
Size: 80498 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210305/eb637b00/attachment-0001.jpg>

From shoyer at gmail.com  Thu Mar  4 19:54:07 2021
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 4 Mar 2021 16:54:07 -0800
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To: <2911614896816@mail.yandex.ru>
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>
 <2911614896816@mail.yandex.ru>
Message-ID: <CAEQ_TvcBKZZvNVJuLLMDuHimq=K-hRyy6czA6cykJF2ryEbgPg@mail.gmail.com>

I love your mittens!

NumPy really should be in the NumFOCUS store, but it currently isn't:
https://shop.spreadshirt.com/numfocus/all

I'll make some inquiries to see if we can sort that out :).

On Thu, Mar 4, 2021 at 2:43 PM <klark--kent at yandex.ru> wrote:

> Hello. I was looking for a T-shirt with Numpy logo but didn't find
> anywhere. Anybody knows if there's a merchandise with Numpy? So I have to
> kneet mittens with Numpy logo for myself.
>
> Best regards!
> Konstantin
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210304/5f8f53ca/attachment.html>

From jni at fastmail.com  Thu Mar  4 19:57:42 2021
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Fri, 5 Mar 2021 11:57:42 +1100
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To: <CAEQ_TvcBKZZvNVJuLLMDuHimq=K-hRyy6czA6cykJF2ryEbgPg@mail.gmail.com>
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>
 <2911614896816@mail.yandex.ru>
 <CAEQ_TvcBKZZvNVJuLLMDuHimq=K-hRyy6czA6cykJF2ryEbgPg@mail.gmail.com>
Message-ID: <D8338B21-ED84-48AF-BC3E-A560C5166A3A@fastmail.com>

Yeah I desperately want those mitts!

 <https://www.google.com/url?sa=i&url=https%3A%2F%2Fscaledynamix.com%2Fblog%2Fhow-site-performance-affects-revenue%2F&psig=AOvVaw3lU_Cr7A8iTGEHDVIJy5rK&ust=1614992213924000&source=images&cd=vfe&ved=0CAIQjRxqFwoTCJCV0oD5l-8CFQAAAAAdAAAAABAD>


> On 5 Mar 2021, at 11:54 am, Stephan Hoyer <shoyer at gmail.com> wrote:
> 
> I love your mittens!
> 
> NumPy really should be in the NumFOCUS store, but it currently isn't:
> https://shop.spreadshirt.com/numfocus/all <https://shop.spreadshirt.com/numfocus/all>
> 
> I'll make some inquiries to see if we can sort that out :).
> 
> On Thu, Mar 4, 2021 at 2:43 PM <klark--kent at yandex.ru <mailto:klark--kent at yandex.ru>> wrote:
> Hello. I was looking for a T-shirt with Numpy logo but didn't find anywhere. Anybody knows if there's a merchandise with Numpy? So I have to kneet mittens with Numpy logo for myself.
> 
> Best regards!
> Konstantin
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
> https://mail.python.org/mailman/listinfo/numpy-discussion <https://mail.python.org/mailman/listinfo/numpy-discussion>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210305/7469b199/attachment.html>

From melissawm at gmail.com  Fri Mar  5 07:58:03 2021
From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=)
Date: Fri, 5 Mar 2021 09:58:03 -0300
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To: <D8338B21-ED84-48AF-BC3E-A560C5166A3A@fastmail.com>
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>
 <2911614896816@mail.yandex.ru>
 <CAEQ_TvcBKZZvNVJuLLMDuHimq=K-hRyy6czA6cykJF2ryEbgPg@mail.gmail.com>
 <D8338B21-ED84-48AF-BC3E-A560C5166A3A@fastmail.com>
Message-ID: <CAC7J6VaPBaG3yaLbXk415sWaC2CJw2du2dPSQuzaKTTj1BQrsw@mail.gmail.com>

Hi Konstantin,

Would you mind open-sourcing that recipe? :D

Cheers,

Melissa


Em qui, 4 de mar de 2021 21:59, Juan Nunez-Iglesias <jni at fastmail.com>
escreveu:

> Yeah I desperately want those mitts!
>
> [image: Shut up and take my money! - How site performance affects revenue
> - Scale Dynamix]
> <https://www.google.com/url?sa=i&url=https%3A%2F%2Fscaledynamix.com%2Fblog%2Fhow-site-performance-affects-revenue%2F&psig=AOvVaw3lU_Cr7A8iTGEHDVIJy5rK&ust=1614992213924000&source=images&cd=vfe&ved=0CAIQjRxqFwoTCJCV0oD5l-8CFQAAAAAdAAAAABAD>
>
>
> On 5 Mar 2021, at 11:54 am, Stephan Hoyer <shoyer at gmail.com> wrote:
>
> I love your mittens!
>
> NumPy really should be in the NumFOCUS store, but it currently isn't:
> https://shop.spreadshirt.com/numfocus/all
>
> I'll make some inquiries to see if we can sort that out :).
>
> On Thu, Mar 4, 2021 at 2:43 PM <klark--kent at yandex.ru> wrote:
>
>> Hello. I was looking for a T-shirt with Numpy logo but didn't find
>> anywhere. Anybody knows if there's a merchandise with Numpy? So I have to
>> kneet mittens with Numpy logo for myself.
>>
>> Best regards!
>> Konstantin
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210305/c8807cb1/attachment-0001.html>

From klark--kent at yandex.ru  Fri Mar  5 17:02:02 2021
From: klark--kent at yandex.ru (klark--kent at yandex.ru)
Date: Sat, 06 Mar 2021 01:02:02 +0300
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To: <CAC7J6VaPBaG3yaLbXk415sWaC2CJw2du2dPSQuzaKTTj1BQrsw@mail.gmail.com>
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>
 <2911614896816@mail.yandex.ru>
 <CAEQ_TvcBKZZvNVJuLLMDuHimq=K-hRyy6czA6cykJF2ryEbgPg@mail.gmail.com>
 <D8338B21-ED84-48AF-BC3E-A560C5166A3A@fastmail.com>
 <CAC7J6VaPBaG3yaLbXk415sWaC2CJw2du2dPSQuzaKTTj1BQrsw@mail.gmail.com>
Message-ID: <214931614981686@mail.yandex.ru>

An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210306/d75817ea/attachment.html>

From toddrjen at gmail.com  Sat Mar  6 09:44:17 2021
From: toddrjen at gmail.com (Todd)
Date: Sat, 6 Mar 2021 09:44:17 -0500
Subject: [Numpy-discussion] String accessor methods
Message-ID: <CAFpSVpJ7CXZBYohufvr6gQr3sJrb+vgWQT6qs2mj8WRre-z3Uw@mail.gmail.com>

Currently. working with strings in numpy is not very convenient. You have
to use a separate set of functions in a separate namespace, and those
functions are relatively limited and poorly-documented.

A solution several other projects, including pandas [0] and xarray [1],
have found are string accessor methods. These are a set of methods attached
to a `str` attribute of the class.  These have the advantage that they are
always available and have a well-defined object they operate on.  On
non-str dtypes, it would raise an exception.

This would also provide a standardized set of methods and behaviors that
are part of the numpy api that other classes could depend on.

An example would be something like this:

>>> mystr = np.array(["test first", "test second", "test third"])
>>> mystr.str.title()
array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')

[0]
https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#string-methods
[1]
https://xarray.pydata.org/en/stable/generated/xarray.core.accessor_str.StringAccessor.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210306/68f979fa/attachment.html>

From blkzol001 at myuct.ac.za  Sat Mar  6 12:36:35 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Sat, 6 Mar 2021 10:36:35 -0700 (MST)
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in
 random_poisson sampler
Message-ID: <1615052195691-0.post@n7.nabble.com>

Hi All,

I noticed that the transformed rejection method for generating Poisson
random variables used in numpy makes use of the `random_loggam` function
which directly calculates the log-gamma function. It appears that a
log-factorial lookup table was added a few years back which could be used in
place of random_loggam since the input is always an integer. Is there a
reason for not using this table instead? See link below for the line of
code:

https://github.com/numpy/numpy/blob/6222e283fa0b8fb9ba562dabf6ca9ea7ed65be39/numpy/random/src/distributions/distributions.c#L572

Regards
Zolisa


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From blkzol001 at myuct.ac.za  Sat Mar  6 12:52:26 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Sat, 6 Mar 2021 10:52:26 -0700 (MST)
Subject: [Numpy-discussion] guide for downstream package authors &
 setting version constraints
In-Reply-To: <CABL7CQgVyrhapMMMB6s0ty4r897zgZBM3VUSz0Kg1Bsu=xrh2w@mail.gmail.com>
References: <CABL7CQgVyrhapMMMB6s0ty4r897zgZBM3VUSz0Kg1Bsu=xrh2w@mail.gmail.com>
Message-ID: <1615053146181-0.post@n7.nabble.com>

Thanks you, this looks very informative. Is there a best practice guide
somewhere in the docs on how to correctly expose C-level code to third
parties via .pxd files, similarly to how one can access the c_distributions
of numpy via cython? I tried this previously and failed miserably. It seemed
like symbols for some C functions I tried to expose to the user via
cython declaration could not be found. I know I did something wrong, but im
not sure what (I linked the header files and everything). The Cython docs
were not very helpful. Maybe scipy/numpy devs could shed some light on how
this is properly done?


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From dan_patterson at outlook.com  Sat Mar  6 12:57:04 2021
From: dan_patterson at outlook.com (dan_patterson)
Date: Sat, 6 Mar 2021 10:57:04 -0700 (MST)
Subject: [Numpy-discussion] String accessor methods
In-Reply-To: <CAFpSVpJ7CXZBYohufvr6gQr3sJrb+vgWQT6qs2mj8WRre-z3Uw@mail.gmail.com>
References: <CAFpSVpJ7CXZBYohufvr6gQr3sJrb+vgWQT6qs2mj8WRre-z3Uw@mail.gmail.com>
Message-ID: <1615053424270-0.post@n7.nabble.com>

The are  in np.char

mystr = np.array(["test first", "test second", "test third"])

np.char.title(mystr)
array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From warren.weckesser at gmail.com  Sat Mar  6 13:44:21 2021
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sat, 6 Mar 2021 13:44:21 -0500
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in
 random_poisson sampler
In-Reply-To: <1615052195691-0.post@n7.nabble.com>
References: <1615052195691-0.post@n7.nabble.com>
Message-ID: <CAGzF1udf6WuVOTVD5ipmXFcQ0MR4C5OHkMgFRKLGiHFfZ6VXVg@mail.gmail.com>

On 3/6/21, zoj613 <blkzol001 at myuct.ac.za> wrote:
> Hi All,
>
> I noticed that the transformed rejection method for generating Poisson
> random variables used in numpy makes use of the `random_loggam` function
> which directly calculates the log-gamma function. It appears that a
> log-factorial lookup table was added a few years back which could be used
> in
> place of random_loggam since the input is always an integer. Is there a
> reason for not using this table instead? See link below for the line of
> code:
>
> https://github.com/numpy/numpy/blob/6222e283fa0b8fb9ba562dabf6ca9ea7ed65be39/numpy/random/src/distributions/distributions.c#L572
>
> Regards
> Zolisa
>

Hi Zolisa,

In the pull request where the C function logfactorial was added
(https://github.com/numpy/numpy/pull/13761), I originally modified the
Poisson code to use logfactorial as you suggest, but Kevin (@bashtage
on github) pointed out that the change could potentially alter the
random stream for the legacy version. Making the change requires
creating separate C functions, one for the legacy code that remains
unchanged, and one for the newer Generator class that would use
logfactorial.  You can see the comments here (click on "Show
resolved"):

    https://github.com/numpy/numpy/pull/13761#pullrequestreview-249973405

At the time, making that change was not a high priority, so I didn't
pursue it. It does make sense to use the logfactorial function there,
and I'd be happy to see it updated, but be aware that making the
change is more work than changing just the function call.

Warren

>
>
> --
> Sent from: http://numpy-discussion.10968.n7.nabble.com/
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>

From blkzol001 at myuct.ac.za  Sat Mar  6 14:05:14 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Sat, 6 Mar 2021 12:05:14 -0700 (MST)
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in
 random_poisson sampler
In-Reply-To: <CAGzF1udf6WuVOTVD5ipmXFcQ0MR4C5OHkMgFRKLGiHFfZ6VXVg@mail.gmail.com>
References: <1615052195691-0.post@n7.nabble.com>
 <CAGzF1udf6WuVOTVD5ipmXFcQ0MR4C5OHkMgFRKLGiHFfZ6VXVg@mail.gmail.com>
Message-ID: <1615057514907-0.post@n7.nabble.com>

Ah, I had a suspicion that it was to preserve the random stream but wasn't
too sure. Thanks for the clarification.


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From robert.kern at gmail.com  Sat Mar  6 21:39:38 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 6 Mar 2021 21:39:38 -0500
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in
 random_poisson sampler
In-Reply-To: <CAGzF1udf6WuVOTVD5ipmXFcQ0MR4C5OHkMgFRKLGiHFfZ6VXVg@mail.gmail.com>
References: <1615052195691-0.post@n7.nabble.com>
 <CAGzF1udf6WuVOTVD5ipmXFcQ0MR4C5OHkMgFRKLGiHFfZ6VXVg@mail.gmail.com>
Message-ID: <CAF6FJisQ8QydLF3GiABk5OFireZPmZxeDWaDHNHbuZ-0Rbkj=Q@mail.gmail.com>

On Sat, Mar 6, 2021 at 1:45 PM Warren Weckesser <warren.weckesser at gmail.com>
wrote:

> At the time, making that change was not a high priority, so I didn't
> pursue it. It does make sense to use the logfactorial function there,
> and I'd be happy to see it updated, but be aware that making the
> change is more work than changing just the function call.
>

Does it make a big difference? Per NEP 19, even in `Generator`, we do weigh
the cost of changing the stream reasonably highly. Improved accuracy is
likely worthwhile, but a minor performance improvement is probably not.

-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210306/3faa02a4/attachment.html>

From matti.picus at gmail.com  Sun Mar  7 00:23:36 2021
From: matti.picus at gmail.com (Matti Picus)
Date: Sun, 7 Mar 2021 07:23:36 +0200
Subject: [Numpy-discussion] guide for downstream package authors &
 setting version constraints
In-Reply-To: <1615053146181-0.post@n7.nabble.com>
References: <CABL7CQgVyrhapMMMB6s0ty4r897zgZBM3VUSz0Kg1Bsu=xrh2w@mail.gmail.com>
 <1615053146181-0.post@n7.nabble.com>
Message-ID: <ece3423f-ef5b-77cc-7c81-4ef6dbc5d6e6@gmail.com>

This is a different topic altogether. I think you would get better 
results asking on the cython-users mailing list with a concrete example 
of something that didn't work.

Matti


On 3/6/21 7:52 PM, zoj613 wrote:
> Is there a best practice guide
> somewhere in the docs on how to correctly expose C-level code to third
> parties via .pxd files, similarly to how one can access the c_distributions
> of numpy via cython?

From toddrjen at gmail.com  Sun Mar  7 00:30:56 2021
From: toddrjen at gmail.com (Todd)
Date: Sun, 7 Mar 2021 00:30:56 -0500
Subject: [Numpy-discussion] String accessor methods
In-Reply-To: <1615053424270-0.post@n7.nabble.com>
References: <CAFpSVpJ7CXZBYohufvr6gQr3sJrb+vgWQT6qs2mj8WRre-z3Uw@mail.gmail.com>
 <1615053424270-0.post@n7.nabble.com>
Message-ID: <CAFpSVpJF4iimRXzzBXo9eSTLf4RMxwCknPesURS-h9=KxL0d4A@mail.gmail.com>

On Sat, Mar 6, 2021 at 12:57 PM dan_patterson <dan_patterson at outlook.com>
wrote:

> The are  in np.char
>
> mystr = np.array(["test first", "test second", "test third"])
>
> np.char.title(mystr)
> array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')
>

I mentioned those in my email, but they are far less convenient to use than
class methods, nor do they relate well to how built-in strings are used in
Python. That is why other projects have started using accessor methods and
why Python removed all the separate string functions in Python 3.
The functions in np.char are also limited in their capabilities, and fairly
poorly documented in my opinion.  Some of those limitations are impossible
to overcome, for example they inherently can never support operators,
addition or multiplication, or slicing like Python strings can, while
an accessor could.

However, putting them as top-level methods for ndarray would pollute the
methods too much. That is why I am suggesting numpy do the same thing that
pandas, xarray, etc. are doing and putting those as methods under a 'str'
attribute for ndarrays rather than as separate functions.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210307/ee506e6e/attachment.html>

From kevin.k.sheppard at gmail.com  Sun Mar  7 04:34:29 2021
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Sun, 7 Mar 2021 09:34:29 +0000
Subject: [Numpy-discussion] String accessor methods
In-Reply-To: <CAFpSVpJF4iimRXzzBXo9eSTLf4RMxwCknPesURS-h9=KxL0d4A@mail.gmail.com>
References: <CAFpSVpJ7CXZBYohufvr6gQr3sJrb+vgWQT6qs2mj8WRre-z3Uw@mail.gmail.com>
 <1615053424270-0.post@n7.nabble.com>
 <CAFpSVpJF4iimRXzzBXo9eSTLf4RMxwCknPesURS-h9=KxL0d4A@mail.gmail.com>
Message-ID: <CAE8Ss2EwihsURhH0VcPnbxX6H2xMbHsAvbbZ9wL8hJuO6isRYg@mail.gmail.com>

I think that and string functions that are exposed from an ndarray would
have to be guaranteed to work in-place. Requiring casting to objects to use
the methods feels more like syntactic sugar than an essential case. I think
most of the ones mentioned are low performance and can't take advantage of
the storage as a blob of int8 (ascii) or int32 (utf32) that underlay Numpy
string arrays.

I also think the existence of these in pandas reduces the case for them
being in Numpy.

On Sun, Mar 7, 2021, 05:32 Todd <toddrjen at gmail.com> wrote:

> On Sat, Mar 6, 2021 at 12:57 PM dan_patterson <dan_patterson at outlook.com>
> wrote:
>
>> The are  in np.char
>>
>> mystr = np.array(["test first", "test second", "test third"])
>>
>> np.char.title(mystr)
>> array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')
>>
>
> I mentioned those in my email, but they are far less convenient to use
> than class methods, nor do they relate well to how built-in strings are
> used in Python. That is why other projects have started using accessor
> methods and why Python removed all the separate string functions in Python
> 3. The functions in np.char are also limited in their capabilities, and
> fairly poorly documented in my opinion.  Some of those limitations are
> impossible to overcome, for example they inherently can never support
> operators, addition or multiplication, or slicing like Python strings can,
> while an accessor could.
>
> However, putting them as top-level methods for ndarray would pollute the
> methods too much. That is why I am suggesting numpy do the same thing that
> pandas, xarray, etc. are doing and putting those as methods under a 'str'
> attribute for ndarrays rather than as separate functions.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210307/efe765de/attachment.html>

From sebastian at sipsolutions.net  Sun Mar  7 11:16:36 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 07 Mar 2021 10:16:36 -0600
Subject: [Numpy-discussion] String accessor methods
In-Reply-To: <CAE8Ss2EwihsURhH0VcPnbxX6H2xMbHsAvbbZ9wL8hJuO6isRYg@mail.gmail.com>
References: <CAFpSVpJ7CXZBYohufvr6gQr3sJrb+vgWQT6qs2mj8WRre-z3Uw@mail.gmail.com>
 <1615053424270-0.post@n7.nabble.com>
 <CAFpSVpJF4iimRXzzBXo9eSTLf4RMxwCknPesURS-h9=KxL0d4A@mail.gmail.com>
 <CAE8Ss2EwihsURhH0VcPnbxX6H2xMbHsAvbbZ9wL8hJuO6isRYg@mail.gmail.com>
Message-ID: <85bb80fe21d2666a68a78c7a8af792c7bd2d5b7d.camel@sipsolutions.net>

On Sun, 2021-03-07 at 09:34 +0000, Kevin Sheppard wrote:
> I think that and string functions that are exposed from an ndarray
> would
> have to be guaranteed to work in-place. Requiring casting to objects
> to use
> the methods feels more like syntactic sugar than an essential case. I
> think
> most of the ones mentioned are low performance and can't take
> advantage of
> the storage as a blob of int8 (ascii) or int32 (utf32) that underlay
> Numpy
> string arrays.
> 
> I also think the existence of these in pandas reduces the case for
> them
> being in Numpy.

I agree with this, the need seems much lower in NumPy. And NumPy's
currently somewhat weird strings at least for me makes it even less
appealing to expose more string utilities of any kind at this time.

In general, there is probably something to be said about such
"accessor", in the sense of having a place to put methods which are
specific to the array's dtype.  Other examples are datetime/timedelta
or Units and probably many potential DTypes [1]. It is one advantage
that the `astropy.units.Quantity` subclass has over a DType based
solution: `methods` can be added very transparently.

Basically: The current `np.char` functions are a bit weird and I would
need a quite a bit more convincing to expose them at this time.
But, I would be delighted if we can think of a solution that goes
beyond `str` [2].  Probably not adding `ndarray.str` at all or only if
the array has a string DType.
But do it in way that generalizes!  That could be a DType specific
mixin class, or I had previously played with the thought of a "generic"
accessor:
    `ndarray.elementwise.<ufuncs-provided-by-DType>`

But those go beyond the original string request and need some smart
idea/thoughts!

An interesting aside is that `arr.imag` and `arr.real` fall into the
same category. But they are narrow enough that we can just have a
specific solution for them.

Cheers,

Sebastian


[1] Datetimes/timedelta might have some use of basic timezone handling
(not sure if relevant to NumPy's naive datetimes).

`astropy.units.Quantity` has a few extra methods/properties:

* `.cgs`, `.si`, `.decompose()`, `.to()`: cast to different unit.
* `.unit`
* `.value`: get a value array view without any unit.
* `.to_value()` method that returns a copy, not a view.

Of course we can spell those using DTypes, but I think it might be
long: `arr.astype(arr.dtype.cgs)`, or `arr.view(arr.dtype.unitless)`.
Utility functions similar to `np.char` also can simplify all of this,
but methods do have merit.
Other user DTypes could very well have more compelling use-cases.


[2] But it probably won't reach my serious thinking cycles for a while.
For starters, dedicated utility functions seem decent enough...


> 
> On Sun, Mar 7, 2021, 05:32 Todd <toddrjen at gmail.com> wrote:
> 
> > On Sat, Mar 6, 2021 at 12:57 PM dan_patterson <
> > dan_patterson at outlook.com>
> > wrote:
> > 
> > > The are? in np.char
> > > 
> > > mystr = np.array(["test first", "test second", "test third"])
> > > 
> > > np.char.title(mystr)
> > > array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')
> > > 
> > 
> > I mentioned those in my email, but they are far less convenient to
> > use
> > than class methods, nor do they relate well to how built-in strings
> > are
> > used in Python. That is why other projects have started using
> > accessor
> > methods and why Python removed all the separate string functions in
> > Python
> > 3. The functions in np.char are also limited in their capabilities,
> > and
> > fairly poorly documented in my opinion.? Some of those limitations
> > are
> > impossible to overcome, for example they inherently can never
> > support
> > operators, addition or multiplication, or slicing like Python
> > strings can,
> > while an accessor could.
> > 
> > However, putting them as top-level methods for ndarray would
> > pollute the
> > methods too much. That is why I am suggesting numpy do the same
> > thing that
> > pandas, xarray, etc. are doing and putting those as methods under a
> > 'str'
> > attribute for ndarrays rather than as separate functions.
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210307/37ed5d6d/attachment.sig>

From blkzol001 at myuct.ac.za  Mon Mar  8 04:51:51 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Mon, 8 Mar 2021 02:51:51 -0700 (MST)
Subject: [Numpy-discussion] guide for downstream package authors &
 setting version constraints
In-Reply-To: <ece3423f-ef5b-77cc-7c81-4ef6dbc5d6e6@gmail.com>
References: <CABL7CQgVyrhapMMMB6s0ty4r897zgZBM3VUSz0Kg1Bsu=xrh2w@mail.gmail.com>
 <1615053146181-0.post@n7.nabble.com>
 <ece3423f-ef5b-77cc-7c81-4ef6dbc5d6e6@gmail.com>
Message-ID: <1615197111584-0.post@n7.nabble.com>

Thanks for the suggestion. However I was able to solve the issue I had by
just creating inline wrapper functions in cython for the C functions so I
dont have to link them when importing in other 3rd party cython modules.


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From kevin.k.sheppard at gmail.com  Mon Mar  8 11:43:26 2021
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Mon, 8 Mar 2021 16:43:26 +0000
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in
 random_poisson sampler
In-Reply-To: <CAF6FJisQ8QydLF3GiABk5OFireZPmZxeDWaDHNHbuZ-0Rbkj=Q@mail.gmail.com>
References: <1615052195691-0.post@n7.nabble.com>
 <CAGzF1udf6WuVOTVD5ipmXFcQ0MR4C5OHkMgFRKLGiHFfZ6VXVg@mail.gmail.com>
 <CAF6FJisQ8QydLF3GiABk5OFireZPmZxeDWaDHNHbuZ-0Rbkj=Q@mail.gmail.com>
Message-ID: <CAE8Ss2Hio3NothS4n5Kng-Rm2AYnhVrGGXEdqDMUGhYJFDi85A@mail.gmail.com>

I did a quick test and using random_loggam was about 6% faster than using
logfactorial (on Windows).

Kevin


On Sun, Mar 7, 2021 at 2:40 AM Robert Kern <robert.kern at gmail.com> wrote:

> On Sat, Mar 6, 2021 at 1:45 PM Warren Weckesser <
> warren.weckesser at gmail.com> wrote:
>
>> At the time, making that change was not a high priority, so I didn't
>> pursue it. It does make sense to use the logfactorial function there,
>> and I'd be happy to see it updated, but be aware that making the
>> change is more work than changing just the function call.
>>
>
> Does it make a big difference? Per NEP 19, even in `Generator`, we do
> weigh the cost of changing the stream reasonably highly. Improved accuracy
> is likely worthwhile, but a minor performance improvement is probably not.
>
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210308/314eca65/attachment.html>

From blkzol001 at myuct.ac.za  Mon Mar  8 12:05:48 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Mon, 8 Mar 2021 10:05:48 -0700 (MST)
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in
 random_poisson sampler
In-Reply-To: <CAE8Ss2Hio3NothS4n5Kng-Rm2AYnhVrGGXEdqDMUGhYJFDi85A@mail.gmail.com>
References: <1615052195691-0.post@n7.nabble.com>
 <CAGzF1udf6WuVOTVD5ipmXFcQ0MR4C5OHkMgFRKLGiHFfZ6VXVg@mail.gmail.com>
 <CAF6FJisQ8QydLF3GiABk5OFireZPmZxeDWaDHNHbuZ-0Rbkj=Q@mail.gmail.com>
 <CAE8Ss2Hio3NothS4n5Kng-Rm2AYnhVrGGXEdqDMUGhYJFDi85A@mail.gmail.com>
Message-ID: <1615223148767-0.post@n7.nabble.com>

What do you think is the explanation for that? I had assumed that using a
lookup table would be faster considering that the loggam implementation has
loops and makes calls to elementary functions in it.


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From sebastian at sipsolutions.net  Mon Mar  8 15:15:01 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 08 Mar 2021 14:15:01 -0600
Subject: [Numpy-discussion] NumPy logo merchandise available at spreadshirt
 NumFOCUS shop
In-Reply-To: <CAEQ_TvcBKZZvNVJuLLMDuHimq=K-hRyy6czA6cykJF2ryEbgPg@mail.gmail.com>
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>
 <2911614896816@mail.yandex.ru>
 <CAEQ_TvcBKZZvNVJuLLMDuHimq=K-hRyy6czA6cykJF2ryEbgPg@mail.gmail.com>
Message-ID: <a589ceea690a91e84d01f17b61765122d64b9642.camel@sipsolutions.net>

On Thu, 2021-03-04 at 16:54 -0800, Stephan Hoyer wrote:
> I love your mittens!
> 
> NumPy really should be in the NumFOCUS store, but it currently isn't:
> https://shop.spreadshirt.com/numfocus/all
> 

Various items with the NumPy logo (text below cube) are now available
here from the NumFOCUS spreadshirt shop [1] here:

https://shop.spreadshirt.com/numfocus/numpy?idea=604683f7998267255de40bcc

If there is popular demand to add something else that should be no
problem :).
Unfortunately, I doubt we can add those amazing mittens there!

Cheers,

Sebastian


[1] https://shop.spreadshirt.com/numfocus


> I'll make some inquiries to see if we can sort that out :).
> 
> On Thu, Mar 4, 2021 at 2:43 PM <klark--kent at yandex.ru> wrote:
> 
> > Hello. I was looking for a T-shirt with Numpy logo but didn't find
> > anywhere. Anybody knows if there's a merchandise with Numpy? So I
> > have to
> > kneet mittens with Numpy logo for myself.
> > 
> > Best regards!
> > Konstantin
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210308/acf21790/attachment.sig>

From sebastian at sipsolutions.net  Tue Mar  9 22:57:25 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 09 Mar 2021 21:57:25 -0600
Subject: [Numpy-discussion] NumPy Development Meeting Wednesday - Triage
 Focus
Message-ID: <6b3a8d4f360a7449ebd63fa5ef280dc55a63f5c5.camel@sipsolutions.net>

Hi all,

Our bi-weekly triage-focused NumPy development meeting is Wednesday,
March 10th at 11 am Pacific Time (19:00 UTC).
Everyone is invited to join in and edit the work-in-progress meeting
topics and notes:
https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

I encourage everyone to notify us of issues or PRs that you feel should
be prioritized, discussed, or reviewed.

Best regards

Sebastian


PS: We will probably schedule the community meeting in UTC next week
probably, shifting to avoid shifting it by one hour.
Which means that the times will shift for whoever has daylight saving
time changes (which is this Sunday e.g. in the US).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210309/281a5dd3/attachment.sig>

From sebastian at sipsolutions.net  Wed Mar 10 12:36:22 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 10 Mar 2021 11:36:22 -0600
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <CABL7CQhBcxRsba4hu2YXC66Rm-j6AFZXBy+P+9bhTZ8h3YR89w@mail.gmail.com>
References: <CABL7CQhBcxRsba4hu2YXC66Rm-j6AFZXBy+P+9bhTZ8h3YR89w@mail.gmail.com>
Message-ID: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>

Top Posting, to discuss post specific questions about NEP 47 and
partially the start on implementing it in:

    https://github.com/numpy/numpy/pull/18585

There are probably many more that will crop up. But for me, each of
these is a pretty major difficulty without a clear answer as of now.

1. I still need clarity how a library is supposed to use this namespace
when the user passes in a NumPy array (mentioned before).  The user
must get back a NumPy array after all.  Maybe that is just a decorator,
but it seems important.

2. `np.result_type` special cases array-scalars (the current PR), NEP
47 promises it will not.  The PR could attempt to work around that
using `arr.dtype` int `result_type`, I expect there are more details to
fight with there, but I am not sure.

3. For all other function, the same problem applies. You don't actually
have anything to fix NumPy promotion rules.  You could bake your own
cake here for numeric types, but I am not sure, you might also need NEP
43 in all its promotion power to pull it off.

4. Now that I looked at the above, I do not feel its reasonable to
limit this functionality to numeric dtypes.  If someone uses a NumPy
rational-dtype, why should a SciPy function currently implemented in
pure NumPy reject that?  In other words, I think this is the point
where trying to be "minimal" is counterproductive.

4. The PR makes no attempt at handling binary operators in any way
aside from greedily coercing the other operand.

5. What happens with a mix of array-likes or even array subclasses like
`astropy.quantity`?

6. Is there any provision on how to deal with mixed array-like inputs?
CuPy+numpy, etc.?


I don't think we have to figure out everything up-front, but I do think
there are a few very fundamental questions still open, at least for me
personally.

Cheers,

Sebastian


On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> Hi all,
> 
> Here is a NEP, written together with Stephan Hoyer and Aaron Meurer,
> for
> discussion on adoption of the array API standard (
> https://data-apis.github.io/array-api/latest/). This will add a new
> numpy.array_api submodule containing that standardized API. The main
> purpose of this API is to be able to write code that is portable to
> other
> array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask, and
> MXNet.
> 
> We expect this NEP to remain in draft state for quite a while, while
> we're
> gaining experience with using it in downstream libraries, discuss
> adding it
> to other array libraries, and finishing some of the loose ends (e.g.,
> specifications for linear algebra functions that aren't merged yet,
> see
> https://github.com/data-apis/array-api/pulls) in the API standard
> itself.
> 
> See
> https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html
> for an initial discussion about this topic.
> 
> Please keep high-level discussion here and detailed comments on
> https://github.com/numpy/numpy/pull/18456. Also, you can access a
> rendered
> version of the NEP from that PR (see PR description for how), which
> may be
> helpful.
> Cheers,
> Ralf
> 
> 
> Abstract
> --------
> 
> We propose to adopt the `Python array API standard`_, developed by
> the
> `Consortium for Python Data API Standards`_. Implementing this as a
> separate
> new namespace in NumPy will allow authors of libraries which depend
> on NumPy
> as well as end users to write code that is portable between NumPy and
> all
> other array/tensor libraries that adopt this standard.
> 
> .. note::
> 
> ??? We expect that this NEP will remain in a draft state for quite a
> while.
> ??? Given the large scope we don't expect to propose it for
> acceptance any
> ??? time soon; instead, we want to solicit feedback on both the high-
> level
> ??? design and implementation, and learn what needs describing better
> in
> this
> ??? NEP or changing in either the implementation or the array API
> standard
> ??? itself.
> 
> 
> Motivation and Scope
> --------------------
> 
> Python users have a wealth of choice for libraries and frameworks for
> numerical computing, data science, machine learning, and deep
> learning. New
> frameworks pushing forward the state of the art in these fields are
> appearing
> every year. One unintended consequence of all this activity and
> creativity
> has been fragmentation in multidimensional array (a.k.a. tensor)
> libraries -
> which are the fundamental data structure for these fields. Choices
> include
> NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
> 
> The APIs of each of these libraries are largely similar, but with
> enough
> differences that it?s quite difficult to write code that works with
> multiple
> (or all) of these libraries. The array API standard aims to address
> that
> issue, by specifying an API for the most common ways arrays are
> constructed
> and used. The proposed API is quite similar to NumPy's API, and
> deviates
> mainly
> in places where (a) NumPy made design choices that are inherently not
> portable
> to other implementations, and (b) where other libraries consistently
> deviated
> from NumPy on purpose because NumPy's design turned out to have
> issues or
> unnecessary complexity.
> 
> For a longer discussion on the purpose of the array API standard we
> refer to
> the `Purpose and Scope section of the array API standard <
> https://data-apis.github.io/array-api/latest/purpose_and_scope.html>`
> __
> and the two blog posts announcing the formation of the Consortium
> [1]_ and
> the release of the first draft version of the standard for community
> review
> [2]_.
> 
> The scope of this NEP includes:
> 
> - Adopting the 2021 version of the array API standard
> - Adding a separate namespace, tentatively named ``numpy.array_api``
> - Changes needed/desired outside of the new namespace, for example
> new
> dunder
> ? methods on the ``ndarray`` object
> - Implementation choices, and differences between functions in the
> new
> ? namespace with those in the main ``numpy`` namespace
> - A new array object conforming to the array API standard
> - Maintenance effort and testing strategy
> - Impact on NumPy's total exposed API surface and on other future and
> ? under-discussion design choices
> - Relation to existing and proposed NumPy array protocols
> ? (``__array_ufunc__``, ``__array_function__``,
> ``__array_module__``).
> - Required improvements to existing NumPy functionality
> 
> Out of scope for this NEP are:
> 
> - Changes in the array API standard itself. Those are likely to come
> up
> ? during review of this NEP, but should be upstreamed as needed and
> this NEP
> ? subsequently updated.
> 
> 
> Usage and Impact
> ----------------
> 
> *This section will be fleshed out later, for now we refer to the use
> cases
> given
> in* `the array API standard Use Cases section <
> https://data-apis.github.io/array-api/latest/use_cases.html>`__
> 
> In addition to those use cases, the new namespace contains
> functionality
> that
> is widely used and supported by many array libraries. As such, it is
> a good
> set of functions to teach to newcomers to NumPy and recommend as
> "best
> practice". That contrasts with NumPy's main namespace, which contains
> many
> functions and objects that have been superceded or we consider
> mistakes -
> but
> that we can't remove because of backwards compatibility reasons.
> 
> The usage of the ``numpy.array_api`` namespace by downstream
> libraries is
> intended to enable them to consume multiple kinds of arrays, *without
> having
> to have a hard dependency on all of those array libraries*:
> 
> .. image:: _static/nep-0047-library-dependencies.png
> 
> Adoption in downstream libraries
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The prototype implementation of the ``array_api`` namespace will be
> used
> with
> SciPy, scikit-learn and other libraries of interest that depend on
> NumPy, in
> order to get more experience with the design and find out if any
> important
> parts are missing.
> 
> The pattern to support multiple array libraries is intended to be
> something
> like::
> 
> ??? def somefunc(x, y):
> ??????? # Retrieves standard namespace. Raises if x and y have
> different
> ??????? # namespaces.? See Appendix for possible get_namespace
> implementation
> ??????? xp = get_namespace(x, y)
> ??????? out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
> ??????? return out
> 
> The ``get_namespace`` call is effectively the library author opting
> in to
> using the standard API namespace, and thereby explicitly supporting
> all conforming array libraries.
> 
> 
> The ``asarray`` / ``asanyarray`` pattern
> ````````````````````````````````````````
> 
> Many existing libraries use the same ``asarray`` (or ``asanyarray``)
> pattern
> as NumPy itself does; accepting any object that can be coerced into a
> ``np.ndarray``.
> We consider this design pattern problematic - keeping in mind the Zen
> of
> Python, *"explicit is better than implicit"*, as well as the pattern
> being
> historically problematic in the SciPy ecosystem for ``ndarray``
> subclasses
> and with over-eager object creation. All other array/tensor libraries
> are
> more strict, and that works out fine in practice. We would advise
> authors of
> new libraries to avoid the ``asarray`` pattern. Instead they should
> either
> accept just NumPy arrays or, if they want to support multiple kinds
> of
> arrays, check if the incoming array object supports the array API
> standard
> by checking for ``__array_namespace__`` as shown in the example
> above.
> 
> Existing libraries can do such a check as well, and only call
> ``asarray`` if
> the check fails. This is very similar to the ``__duckarray__`` idea
> in
> :ref:`NEP30`.
> 
> 
> .. _adoption-application-code:
> 
> Adoption in application code
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The new namespace can be seen by end users as a cleaned up and
> slimmed down
> version of NumPy's main namespace. Encouraging end users to use this
> namespace like::
> 
> ??? import numpy.array_api as xp
> 
> ??? x = xp.linspace(0, 2*xp.pi, num=100)
> ??? y = xp.cos(x)
> 
> seems perfectly reasonable, and potentially beneficial - users get
> offered
> only
> one function for each purpose (the one we consider best-practice),
> and they
> then write code that is more easily portable to other libraries.
> 
> 
> Backward compatibility
> ----------------------
> 
> No deprecations or removals of existing NumPy APIs or other backwards
> incompatible changes are proposed.
> 
> 
> High-level design
> -----------------
> 
> The array API standard consists of approximately 120 objects, all of
> which
> have a direct NumPy equivalent. This figure shows what is included at
> a
> high level:
> 
> .. image:: _static/nep-0047-scope-of-array-API.png
> 
> The most important changes compared to what NumPy currently offers
> are:
> 
> - A new array object which:
> 
> ??? - conforms to the casting rules and indexing behaviour specified
> by the
> ????? standard,
> ??? - does not have methods other than dunder methods,
> ??? - does not support the full range of NumPy indexing behaviour.
> Advanced
> ????? indexing with integers is not supported. Only boolean indexing
> ????? with a single (possibly multi-dimensional) boolean array is
> supported.
> ????? An indexing expression that selects a single element returns a
> 0-D
> array
> ????? rather than a scalar.
> 
> - Functions in the ``array_api`` namespace:
> 
> ??? - do not accept ``array_like`` inputs, only NumPy arrays and
> Python
> scalars
> ??? - do not support ``__array_ufunc__`` and ``__array_function__``,
> ??? - use positional-only and keyword-only parameters in their
> signatures,
> ??? - have inline type annotations,
> ??? - may have minor changes to signatures and semantics of
> individual
> ????? functions compared to their equivalents already present in
> NumPy,
> ??? - only support dtype literals, not format strings or other ways
> of
> ????? specifying dtypes
> 
> - DLPack_ support will be added to NumPy,
> - New syntax for "device support" will be added, through a
> ``.device``
> ? attribute on the new array object, and ``device=`` keywords in
> array
> creation
> ? functions in the ``array_api`` namespace,
> - Casting rules that differ from those NumPy currently has. Output
> dtypes
> can
> ? be derived from input dtypes (i.e. no value-based casting), and 0-D
> arrays
> ? are treated like >=1-D arrays.
> - Not all dtypes NumPy has are part of the standard. Only boolean,
> signed
> and
> ? unsigned integers, and floating-point dtypes up to ``float64`` are
> supported.
> ? Complex dtypes are expected to be added in the next version of the
> standard.
> ? Extended precision, string, void, object and datetime dtypes, as
> well as
> ? structured dtypes, are not included.
> 
> Improvements to existing NumPy functionality that are needed include:
> 
> - Add support for stacks of matrices to some functions in
> ``numpy.linalg``
> ? that are currently missing such support.
> - Add the ``keepdims`` keyword to ``np.argmin`` and ``np.argmax``.
> - Add a "never copy" mode to ``np.asarray``.
> 
> 
> Functions in the ``array_api`` namespace
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Let's start with an example of a function implementation that shows
> the most
> important differences with the equivalent function in the main
> namespace::
> 
> ??? def max(x: array, /, *,
> ??????????? axis: Optional[Union[int, Tuple[int, ...]]] = None,
> ??????????? keepdims: bool = False
> ??????? ) -> array:
> ??????? """
> ??????? Array API compatible wrapper for :py:func:`np.max
> <numpy.max>`.
> ??????? """
> ??????? return np.max._implementation(x, axis=axis,
> keepdims=keepdims)
> 
> This function does not accept ``array_like`` inputs, only
> ``ndarray``. There
> are multiple reasons for this. Other array libraries all work like
> this.
> Letting the user do coercion of lists, generators, or other foreign
> objects
> separately results in a cleaner design with less unexpected
> behaviour.
> It's higher-performance - less overhead from ``asarray`` calls.
> Static
> typing
> is easier. Subclasses will work as expected. And the slight increase
> in
> verbosity
> because users have to explicitly coerce to ``ndarray`` on rare
> occasions
> seems like a small price to pay.
> 
> This function does not support ``__array_ufunc__`` nor
> ``__array_function__``.
> These protocols serve a similar purpose as the array API standard
> module
> itself,
> but through a different mechanisms. Because only ``ndarray``
> instances are
> accepted,
> dispatching via one of these protocols isn't useful anymore.
> 
> This function uses positional-only parameters in its signature. This
> makes
> code
> more portable - writing ``max(x=x, ...)`` is no longer valid, hence
> if other
> libraries call the first parameter ``input`` rather than ``x``, that
> is
> fine.
> The rationale for keyword-only parameters (not shown in the above
> example)
> is
> two-fold: clarity of end user code, and it being easier to extend the
> signature
> in the future with keywords in the desired order.
> 
> This function has inline type annotations. Inline annotations are far
> easier to
> maintain than separate stub files. And because the types are simple,
> this
> will
> not result in a large amount of clutter with type aliases or unions
> like in
> the
> current stub files NumPy has.
> 
> 
> DLPack support for zero-copy data interchange
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The ability to convert one kind of array into another kind is
> valuable, and
> indeed necessary when downstream libraries want to support multiple
> kinds of
> arrays. This requires a well-specified data exchange protocol. NumPy
> already
> supports two of these, namely the buffer protocol (i.e., PEP 3118),
> and
> the ``__array_interface__`` (Python side) / ``__array_struct__`` (C
> side)
> protocol. Both work similarly, letting the "producer" describe how
> the data
> is laid out in memory so the "consumer" can construct its own kind of
> array
> with a view on that data.
> 
> DLPack works in a very similar way. The main reasons to prefer DLPack
> over
> the options already present in NumPy are:
> 
> 1. DLPack is the only protocol with device support (e.g., GPUs using
> CUDA or
> ?? ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other
> array
> ?? libraries are not. Having one protocol per device isn't tenable,
> hence
> ?? device support is a must.
> 2. Widespread support. DLPack has the widest adoption of all
> protocols, only
> ?? NumPy is missing support. And the experiences of other libraries
> with it
> ?? are positive. This contrasts with the protocols NumPy does
> support, which
> ?? are used very little - when other libraries want to interoperate
> with
> ?? NumPy, they typically use the (more limited, and NumPy-specific)
> ?? ``__array__`` protocol.
> 
> Adding support for DLPack to NumPy entails:
> 
> - Adding a ``ndarray.__dlpack__`` method
> - Adding a ``from_dlpack`` function, which takes as input an object
> ? supporting ``__dlpack__``, and returns an ``ndarray``.
> 
> DLPack is currently a ~200 LoC header, and is meant to be included
> directly, so
> no external dependency is needed. Implementation should be
> straightforward.
> 
> 
> Syntax for device support
> ~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> NumPy itself is CPU-only, so it clearly doesn't have a need for
> device
> support.
> However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> support
> multiple types of devices: CPU, GPU, TPU, and more exotic hardware.
> To write portable code on systems with multiple devices, it's often
> necessary
> to create new arrays on the same device as some other array, or check
> that
> two arrays live on the same device. Hence syntax for that is needed.
> 
> The array object will have a ``.device`` attribute which enables
> comparing
> devices of different arrays (they only should compare equal if both
> arrays
> are
> from the same library and it's the same hardware device).
> Furthermore,
> ``device=`` keywords in array creation functions are needed. For
> example::
> 
> ??? def empty(shape: Union[int, Tuple[int, ...]], /, *,
> ????????????? dtype: Optional[dtype] = None,
> ????????????? device: Optional[device] = None) -> array:
> ??????? """
> ??????? Array API compatible wrapper for :py:func:`np.empty
> <numpy.empty>`.
> ??????? """
> ??????? return np.empty(shape, dtype=dtype, device=device)
> 
> The implementation for NumPy may be as simple as setting the device
> attribute to
> the string ``'cpu'`` and raising an exception if array creation
> functions
> encounter any other value.
> 
> 
> Dtypes and casting rules
> ~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The supported dtypes in this namespace are boolean, 8/16/32/64-bit
> signed
> and
> unsigned integer, and 32/64-bit floating-point dtypes. These will be
> added
> to
> the namespace as dtype literals with the expected names (e.g.,
> ``bool``,
> ``uint16``, ``float64``).
> 
> The most obvious omissions are the complex dtypes. The rationale for
> the
> lack
> of complex support in the first version of the array API standard is
> that
> several
> libraries (PyTorch, MXNet) are still in the process of adding support
> for
> complex dtypes. The next version of the standard is expected to
> include
> ``complex64``
> and ``complex128`` (see `this issue <
> https://github.com/data-apis/array-api/issues/102>`__
> for more details).
> 
> Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is
> expected
> to only use the dtype literals. Format strings, Python builtin
> dtypes, or
> string representations of the dtype literals are not accepted - this
> will
> improve readability and portability of code at little cost.
> 
> Casting rules are only defined between different dtypes of the same
> kind.
> The
> rationale for this is that mixed-kind (e.g., integer to floating-
> point)
> casting behavior differs between libraries. NumPy's mixed-kind
> casting
> behavior doesn't need to be changed or restricted, it only needs to
> be
> documented that if users use mixed-kind casting, their code may not
> be
> portable.
> 
> .. image:: _static/nep-0047-casting-rules-lattice.png
> 
> *Type promotion diagram. Promotion between any two types is given by
> their
> join on this lattice. Only the types of participating arrays matter,
> not
> their values. Dashed lines indicate that behaviour for Python scalars
> is
> undefined on overflow. Boolean, integer and floating-point dtypes are
> not
> connected, indicating mixed-kind promotion is undefined.*
> 
> The most important difference between the casting rules in NumPy and
> in the
> array API standard is how scalars and 0-dimensional arrays are
> handled. In
> the standard, array scalars do not exist and 0-dimensional arrays
> follow the
> same casting rules as higher-dimensional arrays.
> 
> See the `Type Promotion Rules section of the array API standard <
> https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html
> > `__
> for more details.
> 
> .. note::
> 
> ??? It is not clear what the best way is to support the different
> casting
> rules
> ??? for 0-dimensional arrays and no value-based casting. One option
> may be
> to
> ??? implement this second set of casting rules, keep them private,
> mark the
> ??? array API functions with a private attribute that says they
> adhere to
> ??? these different rules, and let the casting machinery check
> whether for
> ??? that attribute.
> 
> ??? This needs discussion.
> 
> 
> Indexing
> ~~~~~~~~
> 
> An indexing expression that would return a scalar with ``ndarray``,
> e.g.
> ``arr_2d[0, 0]``, will return a 0-D array with the new array object.
> There
> are
> several reasons for that: array scalars are largely considered a
> design
> mistake
> which no other array library copied; it works better for non-CPU
> libraries
> (typically arrays can live on the device, scalars live on the host);
> and
> it's
> simply a consistent design. To get a Python scalar out of a 0-D
> array, one
> can
> simply use the builtin for the type, e.g. ``float(arr_0d)``.
> 
> The other `indexing modes in the standard <
> https://data-apis.github.io/array-api/latest/API_specification/indexing.html
> > `__
> do work largely the same as they do for ``numpy.ndarray``. One
> noteworthy
> difference is that clipping in slice indexing (e.g., ``a[:n]`` where
> ``n``
> is
> larger than the size of the first axis) is unspecified behaviour,
> because
> that kind of check can be expensive on accelerators.
> 
> The lack of advanced indexing, and boolean indexing being limited to
> a
> single
> n-D boolean array, is due to those indexing modes not being suitable
> for all
> types of arrays or JIT compilation. Their absence does not seem to be
> problematic; if a user or library author wants to use them, they can
> do so
> through zero-copy conversion to ``numpy.ndarray``. This will signal
> correctly
> to whomever reads the code that it is then NumPy-specific rather than
> portable
> to all conforming array types.
> 
> 
> 
> The array object
> ~~~~~~~~~~~~~~~~
> 
> The array object in the standard does not have methods other than
> dunder
> methods. The rationale for that is that not all array libraries have
> methods
> on their array object (e.g., TensorFlow does not). It also provides
> only a
> single way of doing something, rather than have functions and methods
> that
> are effectively duplicate.
> 
> Mixing operations that may produce views (e.g., indexing,
> ``nonzero``)
> in combination with mutation (e.g., item or slice assignment) is
> `explicitly documented in the standard to not be supported <
> https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html
> > `__.
> This cannot easily be prohibited in the array object itself; instead
> this
> will
> be guidance to the user via documentation.
> 
> The standard current does not prescribe a name for the array object
> itself.
> We propose to simply name it ``ndarray``. This is the most obvious
> name, and
> because of the separate namespace should not clash with
> ``numpy.ndarray``.
> 
> 
> Implementation
> --------------
> 
> .. note::
> 
> ??? This section needs a lot more detail, which will gradually be
> added when
> ??? the implementation progresses.
> 
> A prototype of the ``array_api`` namespace can be found in
> https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
> The docstring in its ``__init__.py`` has notes on completeness of the
> implementation. The code for the wrapper functions also contains ``#
> Note:``
> comments everywhere there is a difference with the NumPy API.
> Two important parts that are not implemented yet are the new array
> object
> and
> DLPack support. Functions may need changes to ensure the changed
> casting
> rules
> are respected.
> 
> The array object
> ~~~~~~~~~~~~~~~~
> 
> Regarding the array object implementation, we plan to start with a
> regular
> Python class that wraps a ``numpy.ndarray`` instance. Attributes and
> methods
> can forward to that wrapped instance, applying input validation and
> implementing changed behaviour as needed.
> 
> The casting rules are probably the most challenging part. The in-
> progress
> dtype system refactor (NEPs 40-43) should make implementing the
> correct
> casting
> behaviour easier - it is already moving away from value-based casting
> for
> example.
> 
> 
> The dtype objects
> ~~~~~~~~~~~~~~~~~
> 
> We must be able to compare dtypes for equality, and expressions like
> these
> must
> be possible::
> 
> ??? np.array_api.some_func(..., dtype=x.dtype)
> 
> The above implies it would be nice to have ``np.array_api.float32 ==
> np.array_api.ndarray(...).dtype``.
> 
> Dtypes should not be assumed to have a class hierarchy by users,
> however we
> are
> free to implement it with a class hierarchy if that's convenient. We
> considered
> the following options to implement dtype objects:
> 
> 1. Alias dtypes to those in the main namespace. E.g.,
> ``np.array_api.float32 =
> ?? np.float32``.
> 2. Make the dtypes instances of ``np.dtype``. E.g.,
> ``np.array_api.float32 =
> ?? np.dtype(np.float32)``.
> 3. Create new singleton classes with only the required
> methods/attributes
> ?? (currently just ``__eq__``).
> 
> It seems like (2) would be easiest from the perspective of
> interacting with
> functions outside the main namespace. And (3) would adhere best to
> the
> standard.
> 
> TBD: the standard does not yet have a good way to inspect properties
> of a
> dtype, to ask questions like "is this an integer dtype?". Perhaps
> this is
> easy
> enough to do for users, like so::
> 
> ??? def _get_dtype(dt_or_arr):
> ??????? return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else
> dt_or_arr
> 
> ??? def is_floating(dtype_or_array):
> ??????? dtype = _get_dtype(dtype_or_array)
> ??????? return dtype in (float32, float64)
> 
> ??? def is_integer(dtype_or_array):
> ??????? dtype = _get_dtype(dtype_or_array)
> ??????? return dtype in (uint8, uint16, uint32, uint64, int8, int16,
> int32,
> int64)
> 
> However it could make sense to add to the standard. Note that NumPy
> itself
> currently does not have a great for asking such questions, see
> `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210310/6fcb9476/attachment-0001.sig>

From asmeurer at gmail.com  Wed Mar 10 15:44:47 2021
From: asmeurer at gmail.com (Aaron Meurer)
Date: Wed, 10 Mar 2021 13:44:47 -0700
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
References: <CABL7CQhBcxRsba4hu2YXC66Rm-j6AFZXBy+P+9bhTZ8h3YR89w@mail.gmail.com>
 <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
Message-ID: <CAKgW=6+LJpuexf6XW5Sn=xgLgaN_nayMwQFdHVheSS3kR0CpuQ@mail.gmail.com>

On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg
<sebastian at sipsolutions.net> wrote:
>
> Top Posting, to discuss post specific questions about NEP 47 and
> partially the start on implementing it in:
>
>     https://github.com/numpy/numpy/pull/18585
>
> There are probably many more that will crop up. But for me, each of
> these is a pretty major difficulty without a clear answer as of now.
>
> 1. I still need clarity how a library is supposed to use this namespace
> when the user passes in a NumPy array (mentioned before).  The user
> must get back a NumPy array after all.  Maybe that is just a decorator,
> but it seems important.
>
> 2. `np.result_type` special cases array-scalars (the current PR), NEP
> 47 promises it will not.  The PR could attempt to work around that
> using `arr.dtype` int `result_type`, I expect there are more details to
> fight with there, but I am not sure.

The idea is to work around it everywhere, so that it follows the rules
in the spec (no array scalars, no value-based casting). I haven't
started it yet, though, so I don't know yet how hard it will be. If it
ends up being too hard we could put it in the same camp as device
support and dlpack support where it needs some basic implementation in
numpy itself first before we can properly do it in the array API
namespace.

>
> 3. For all other function, the same problem applies. You don't actually
> have anything to fix NumPy promotion rules.  You could bake your own
> cake here for numeric types, but I am not sure, you might also need NEP
> 43 in all its promotion power to pull it off.
>
> 4. Now that I looked at the above, I do not feel its reasonable to
> limit this functionality to numeric dtypes.  If someone uses a NumPy
> rational-dtype, why should a SciPy function currently implemented in
> pure NumPy reject that?  In other words, I think this is the point
> where trying to be "minimal" is counterproductive.

The idea of minimality is to make it so users can be sure they will be
able to use other libraries, once they also have array API compliant
namespaces. A rational-dtype wouldn't ever be implemented in those
other libraries, because it isn't part of the standard, so if a user
is using those, that is a sign they are using things that aren't in
the array API, so they can't expect to be able to swap out their
dtypes. If a user wants to use something that's only in NumPy, then
they should just use NumPy.

>
> 4. The PR makes no attempt at handling binary operators in any way
> aside from greedily coercing the other operand.
>
> 5. What happens with a mix of array-likes or even array subclasses like
> `astropy.quantity`?
>
> 6. Is there any provision on how to deal with mixed array-like inputs?
> CuPy+numpy, etc.?

Neither of these are defined in the spec. The spec only deals with
staying inside of the compliant namespace. It doesn't require any
behavior mixing things from other namespaces. That's generally
considered a much harder problem, and there is the data interchange
protocol to deal with it
(https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html).

Aaron Meurer

>
>
> I don't think we have to figure out everything up-front, but I do think
> there are a few very fundamental questions still open, at least for me
> personally.
>
> Cheers,
>
> Sebastian
>
>
>
> On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> > Hi all,
> >
> > Here is a NEP, written together with Stephan Hoyer and Aaron Meurer,
> > for
> > discussion on adoption of the array API standard (
> > https://data-apis.github.io/array-api/latest/). This will add a new
> > numpy.array_api submodule containing that standardized API. The main
> > purpose of this API is to be able to write code that is portable to
> > other
> > array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask, and
> > MXNet.
> >
> > We expect this NEP to remain in draft state for quite a while, while
> > we're
> > gaining experience with using it in downstream libraries, discuss
> > adding it
> > to other array libraries, and finishing some of the loose ends (e.g.,
> > specifications for linear algebra functions that aren't merged yet,
> > see
> > https://github.com/data-apis/array-api/pulls) in the API standard
> > itself.
> >
> > See
> > https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html
> > for an initial discussion about this topic.
> >
> > Please keep high-level discussion here and detailed comments on
> > https://github.com/numpy/numpy/pull/18456. Also, you can access a
> > rendered
> > version of the NEP from that PR (see PR description for how), which
> > may be
> > helpful.
> > Cheers,
> > Ralf
> >
> >
> > Abstract
> > --------
> >
> > We propose to adopt the `Python array API standard`_, developed by
> > the
> > `Consortium for Python Data API Standards`_. Implementing this as a
> > separate
> > new namespace in NumPy will allow authors of libraries which depend
> > on NumPy
> > as well as end users to write code that is portable between NumPy and
> > all
> > other array/tensor libraries that adopt this standard.
> >
> > .. note::
> >
> >     We expect that this NEP will remain in a draft state for quite a
> > while.
> >     Given the large scope we don't expect to propose it for
> > acceptance any
> >     time soon; instead, we want to solicit feedback on both the high-
> > level
> >     design and implementation, and learn what needs describing better
> > in
> > this
> >     NEP or changing in either the implementation or the array API
> > standard
> >     itself.
> >
> >
> > Motivation and Scope
> > --------------------
> >
> > Python users have a wealth of choice for libraries and frameworks for
> > numerical computing, data science, machine learning, and deep
> > learning. New
> > frameworks pushing forward the state of the art in these fields are
> > appearing
> > every year. One unintended consequence of all this activity and
> > creativity
> > has been fragmentation in multidimensional array (a.k.a. tensor)
> > libraries -
> > which are the fundamental data structure for these fields. Choices
> > include
> > NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
> >
> > The APIs of each of these libraries are largely similar, but with
> > enough
> > differences that it?s quite difficult to write code that works with
> > multiple
> > (or all) of these libraries. The array API standard aims to address
> > that
> > issue, by specifying an API for the most common ways arrays are
> > constructed
> > and used. The proposed API is quite similar to NumPy's API, and
> > deviates
> > mainly
> > in places where (a) NumPy made design choices that are inherently not
> > portable
> > to other implementations, and (b) where other libraries consistently
> > deviated
> > from NumPy on purpose because NumPy's design turned out to have
> > issues or
> > unnecessary complexity.
> >
> > For a longer discussion on the purpose of the array API standard we
> > refer to
> > the `Purpose and Scope section of the array API standard <
> > https://data-apis.github.io/array-api/latest/purpose_and_scope.html>`
> > __
> > and the two blog posts announcing the formation of the Consortium
> > [1]_ and
> > the release of the first draft version of the standard for community
> > review
> > [2]_.
> >
> > The scope of this NEP includes:
> >
> > - Adopting the 2021 version of the array API standard
> > - Adding a separate namespace, tentatively named ``numpy.array_api``
> > - Changes needed/desired outside of the new namespace, for example
> > new
> > dunder
> >   methods on the ``ndarray`` object
> > - Implementation choices, and differences between functions in the
> > new
> >   namespace with those in the main ``numpy`` namespace
> > - A new array object conforming to the array API standard
> > - Maintenance effort and testing strategy
> > - Impact on NumPy's total exposed API surface and on other future and
> >   under-discussion design choices
> > - Relation to existing and proposed NumPy array protocols
> >   (``__array_ufunc__``, ``__array_function__``,
> > ``__array_module__``).
> > - Required improvements to existing NumPy functionality
> >
> > Out of scope for this NEP are:
> >
> > - Changes in the array API standard itself. Those are likely to come
> > up
> >   during review of this NEP, but should be upstreamed as needed and
> > this NEP
> >   subsequently updated.
> >
> >
> > Usage and Impact
> > ----------------
> >
> > *This section will be fleshed out later, for now we refer to the use
> > cases
> > given
> > in* `the array API standard Use Cases section <
> > https://data-apis.github.io/array-api/latest/use_cases.html>`__
> >
> > In addition to those use cases, the new namespace contains
> > functionality
> > that
> > is widely used and supported by many array libraries. As such, it is
> > a good
> > set of functions to teach to newcomers to NumPy and recommend as
> > "best
> > practice". That contrasts with NumPy's main namespace, which contains
> > many
> > functions and objects that have been superceded or we consider
> > mistakes -
> > but
> > that we can't remove because of backwards compatibility reasons.
> >
> > The usage of the ``numpy.array_api`` namespace by downstream
> > libraries is
> > intended to enable them to consume multiple kinds of arrays, *without
> > having
> > to have a hard dependency on all of those array libraries*:
> >
> > .. image:: _static/nep-0047-library-dependencies.png
> >
> > Adoption in downstream libraries
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The prototype implementation of the ``array_api`` namespace will be
> > used
> > with
> > SciPy, scikit-learn and other libraries of interest that depend on
> > NumPy, in
> > order to get more experience with the design and find out if any
> > important
> > parts are missing.
> >
> > The pattern to support multiple array libraries is intended to be
> > something
> > like::
> >
> >     def somefunc(x, y):
> >         # Retrieves standard namespace. Raises if x and y have
> > different
> >         # namespaces.  See Appendix for possible get_namespace
> > implementation
> >         xp = get_namespace(x, y)
> >         out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
> >         return out
> >
> > The ``get_namespace`` call is effectively the library author opting
> > in to
> > using the standard API namespace, and thereby explicitly supporting
> > all conforming array libraries.
> >
> >
> > The ``asarray`` / ``asanyarray`` pattern
> > ````````````````````````````````````````
> >
> > Many existing libraries use the same ``asarray`` (or ``asanyarray``)
> > pattern
> > as NumPy itself does; accepting any object that can be coerced into a
> > ``np.ndarray``.
> > We consider this design pattern problematic - keeping in mind the Zen
> > of
> > Python, *"explicit is better than implicit"*, as well as the pattern
> > being
> > historically problematic in the SciPy ecosystem for ``ndarray``
> > subclasses
> > and with over-eager object creation. All other array/tensor libraries
> > are
> > more strict, and that works out fine in practice. We would advise
> > authors of
> > new libraries to avoid the ``asarray`` pattern. Instead they should
> > either
> > accept just NumPy arrays or, if they want to support multiple kinds
> > of
> > arrays, check if the incoming array object supports the array API
> > standard
> > by checking for ``__array_namespace__`` as shown in the example
> > above.
> >
> > Existing libraries can do such a check as well, and only call
> > ``asarray`` if
> > the check fails. This is very similar to the ``__duckarray__`` idea
> > in
> > :ref:`NEP30`.
> >
> >
> > .. _adoption-application-code:
> >
> > Adoption in application code
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The new namespace can be seen by end users as a cleaned up and
> > slimmed down
> > version of NumPy's main namespace. Encouraging end users to use this
> > namespace like::
> >
> >     import numpy.array_api as xp
> >
> >     x = xp.linspace(0, 2*xp.pi, num=100)
> >     y = xp.cos(x)
> >
> > seems perfectly reasonable, and potentially beneficial - users get
> > offered
> > only
> > one function for each purpose (the one we consider best-practice),
> > and they
> > then write code that is more easily portable to other libraries.
> >
> >
> > Backward compatibility
> > ----------------------
> >
> > No deprecations or removals of existing NumPy APIs or other backwards
> > incompatible changes are proposed.
> >
> >
> > High-level design
> > -----------------
> >
> > The array API standard consists of approximately 120 objects, all of
> > which
> > have a direct NumPy equivalent. This figure shows what is included at
> > a
> > high level:
> >
> > .. image:: _static/nep-0047-scope-of-array-API.png
> >
> > The most important changes compared to what NumPy currently offers
> > are:
> >
> > - A new array object which:
> >
> >     - conforms to the casting rules and indexing behaviour specified
> > by the
> >       standard,
> >     - does not have methods other than dunder methods,
> >     - does not support the full range of NumPy indexing behaviour.
> > Advanced
> >       indexing with integers is not supported. Only boolean indexing
> >       with a single (possibly multi-dimensional) boolean array is
> > supported.
> >       An indexing expression that selects a single element returns a
> > 0-D
> > array
> >       rather than a scalar.
> >
> > - Functions in the ``array_api`` namespace:
> >
> >     - do not accept ``array_like`` inputs, only NumPy arrays and
> > Python
> > scalars
> >     - do not support ``__array_ufunc__`` and ``__array_function__``,
> >     - use positional-only and keyword-only parameters in their
> > signatures,
> >     - have inline type annotations,
> >     - may have minor changes to signatures and semantics of
> > individual
> >       functions compared to their equivalents already present in
> > NumPy,
> >     - only support dtype literals, not format strings or other ways
> > of
> >       specifying dtypes
> >
> > - DLPack_ support will be added to NumPy,
> > - New syntax for "device support" will be added, through a
> > ``.device``
> >   attribute on the new array object, and ``device=`` keywords in
> > array
> > creation
> >   functions in the ``array_api`` namespace,
> > - Casting rules that differ from those NumPy currently has. Output
> > dtypes
> > can
> >   be derived from input dtypes (i.e. no value-based casting), and 0-D
> > arrays
> >   are treated like >=1-D arrays.
> > - Not all dtypes NumPy has are part of the standard. Only boolean,
> > signed
> > and
> >   unsigned integers, and floating-point dtypes up to ``float64`` are
> > supported.
> >   Complex dtypes are expected to be added in the next version of the
> > standard.
> >   Extended precision, string, void, object and datetime dtypes, as
> > well as
> >   structured dtypes, are not included.
> >
> > Improvements to existing NumPy functionality that are needed include:
> >
> > - Add support for stacks of matrices to some functions in
> > ``numpy.linalg``
> >   that are currently missing such support.
> > - Add the ``keepdims`` keyword to ``np.argmin`` and ``np.argmax``.
> > - Add a "never copy" mode to ``np.asarray``.
> >
> >
> > Functions in the ``array_api`` namespace
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Let's start with an example of a function implementation that shows
> > the most
> > important differences with the equivalent function in the main
> > namespace::
> >
> >     def max(x: array, /, *,
> >             axis: Optional[Union[int, Tuple[int, ...]]] = None,
> >             keepdims: bool = False
> >         ) -> array:
> >         """
> >         Array API compatible wrapper for :py:func:`np.max
> > <numpy.max>`.
> >         """
> >         return np.max._implementation(x, axis=axis,
> > keepdims=keepdims)
> >
> > This function does not accept ``array_like`` inputs, only
> > ``ndarray``. There
> > are multiple reasons for this. Other array libraries all work like
> > this.
> > Letting the user do coercion of lists, generators, or other foreign
> > objects
> > separately results in a cleaner design with less unexpected
> > behaviour.
> > It's higher-performance - less overhead from ``asarray`` calls.
> > Static
> > typing
> > is easier. Subclasses will work as expected. And the slight increase
> > in
> > verbosity
> > because users have to explicitly coerce to ``ndarray`` on rare
> > occasions
> > seems like a small price to pay.
> >
> > This function does not support ``__array_ufunc__`` nor
> > ``__array_function__``.
> > These protocols serve a similar purpose as the array API standard
> > module
> > itself,
> > but through a different mechanisms. Because only ``ndarray``
> > instances are
> > accepted,
> > dispatching via one of these protocols isn't useful anymore.
> >
> > This function uses positional-only parameters in its signature. This
> > makes
> > code
> > more portable - writing ``max(x=x, ...)`` is no longer valid, hence
> > if other
> > libraries call the first parameter ``input`` rather than ``x``, that
> > is
> > fine.
> > The rationale for keyword-only parameters (not shown in the above
> > example)
> > is
> > two-fold: clarity of end user code, and it being easier to extend the
> > signature
> > in the future with keywords in the desired order.
> >
> > This function has inline type annotations. Inline annotations are far
> > easier to
> > maintain than separate stub files. And because the types are simple,
> > this
> > will
> > not result in a large amount of clutter with type aliases or unions
> > like in
> > the
> > current stub files NumPy has.
> >
> >
> > DLPack support for zero-copy data interchange
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The ability to convert one kind of array into another kind is
> > valuable, and
> > indeed necessary when downstream libraries want to support multiple
> > kinds of
> > arrays. This requires a well-specified data exchange protocol. NumPy
> > already
> > supports two of these, namely the buffer protocol (i.e., PEP 3118),
> > and
> > the ``__array_interface__`` (Python side) / ``__array_struct__`` (C
> > side)
> > protocol. Both work similarly, letting the "producer" describe how
> > the data
> > is laid out in memory so the "consumer" can construct its own kind of
> > array
> > with a view on that data.
> >
> > DLPack works in a very similar way. The main reasons to prefer DLPack
> > over
> > the options already present in NumPy are:
> >
> > 1. DLPack is the only protocol with device support (e.g., GPUs using
> > CUDA or
> >    ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other
> > array
> >    libraries are not. Having one protocol per device isn't tenable,
> > hence
> >    device support is a must.
> > 2. Widespread support. DLPack has the widest adoption of all
> > protocols, only
> >    NumPy is missing support. And the experiences of other libraries
> > with it
> >    are positive. This contrasts with the protocols NumPy does
> > support, which
> >    are used very little - when other libraries want to interoperate
> > with
> >    NumPy, they typically use the (more limited, and NumPy-specific)
> >    ``__array__`` protocol.
> >
> > Adding support for DLPack to NumPy entails:
> >
> > - Adding a ``ndarray.__dlpack__`` method
> > - Adding a ``from_dlpack`` function, which takes as input an object
> >   supporting ``__dlpack__``, and returns an ``ndarray``.
> >
> > DLPack is currently a ~200 LoC header, and is meant to be included
> > directly, so
> > no external dependency is needed. Implementation should be
> > straightforward.
> >
> >
> > Syntax for device support
> > ~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > NumPy itself is CPU-only, so it clearly doesn't have a need for
> > device
> > support.
> > However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> > support
> > multiple types of devices: CPU, GPU, TPU, and more exotic hardware.
> > To write portable code on systems with multiple devices, it's often
> > necessary
> > to create new arrays on the same device as some other array, or check
> > that
> > two arrays live on the same device. Hence syntax for that is needed.
> >
> > The array object will have a ``.device`` attribute which enables
> > comparing
> > devices of different arrays (they only should compare equal if both
> > arrays
> > are
> > from the same library and it's the same hardware device).
> > Furthermore,
> > ``device=`` keywords in array creation functions are needed. For
> > example::
> >
> >     def empty(shape: Union[int, Tuple[int, ...]], /, *,
> >               dtype: Optional[dtype] = None,
> >               device: Optional[device] = None) -> array:
> >         """
> >         Array API compatible wrapper for :py:func:`np.empty
> > <numpy.empty>`.
> >         """
> >         return np.empty(shape, dtype=dtype, device=device)
> >
> > The implementation for NumPy may be as simple as setting the device
> > attribute to
> > the string ``'cpu'`` and raising an exception if array creation
> > functions
> > encounter any other value.
> >
> >
> > Dtypes and casting rules
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The supported dtypes in this namespace are boolean, 8/16/32/64-bit
> > signed
> > and
> > unsigned integer, and 32/64-bit floating-point dtypes. These will be
> > added
> > to
> > the namespace as dtype literals with the expected names (e.g.,
> > ``bool``,
> > ``uint16``, ``float64``).
> >
> > The most obvious omissions are the complex dtypes. The rationale for
> > the
> > lack
> > of complex support in the first version of the array API standard is
> > that
> > several
> > libraries (PyTorch, MXNet) are still in the process of adding support
> > for
> > complex dtypes. The next version of the standard is expected to
> > include
> > ``complex64``
> > and ``complex128`` (see `this issue <
> > https://github.com/data-apis/array-api/issues/102>`__
> > for more details).
> >
> > Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is
> > expected
> > to only use the dtype literals. Format strings, Python builtin
> > dtypes, or
> > string representations of the dtype literals are not accepted - this
> > will
> > improve readability and portability of code at little cost.
> >
> > Casting rules are only defined between different dtypes of the same
> > kind.
> > The
> > rationale for this is that mixed-kind (e.g., integer to floating-
> > point)
> > casting behavior differs between libraries. NumPy's mixed-kind
> > casting
> > behavior doesn't need to be changed or restricted, it only needs to
> > be
> > documented that if users use mixed-kind casting, their code may not
> > be
> > portable.
> >
> > .. image:: _static/nep-0047-casting-rules-lattice.png
> >
> > *Type promotion diagram. Promotion between any two types is given by
> > their
> > join on this lattice. Only the types of participating arrays matter,
> > not
> > their values. Dashed lines indicate that behaviour for Python scalars
> > is
> > undefined on overflow. Boolean, integer and floating-point dtypes are
> > not
> > connected, indicating mixed-kind promotion is undefined.*
> >
> > The most important difference between the casting rules in NumPy and
> > in the
> > array API standard is how scalars and 0-dimensional arrays are
> > handled. In
> > the standard, array scalars do not exist and 0-dimensional arrays
> > follow the
> > same casting rules as higher-dimensional arrays.
> >
> > See the `Type Promotion Rules section of the array API standard <
> > https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html
> > > `__
> > for more details.
> >
> > .. note::
> >
> >     It is not clear what the best way is to support the different
> > casting
> > rules
> >     for 0-dimensional arrays and no value-based casting. One option
> > may be
> > to
> >     implement this second set of casting rules, keep them private,
> > mark the
> >     array API functions with a private attribute that says they
> > adhere to
> >     these different rules, and let the casting machinery check
> > whether for
> >     that attribute.
> >
> >     This needs discussion.
> >
> >
> > Indexing
> > ~~~~~~~~
> >
> > An indexing expression that would return a scalar with ``ndarray``,
> > e.g.
> > ``arr_2d[0, 0]``, will return a 0-D array with the new array object.
> > There
> > are
> > several reasons for that: array scalars are largely considered a
> > design
> > mistake
> > which no other array library copied; it works better for non-CPU
> > libraries
> > (typically arrays can live on the device, scalars live on the host);
> > and
> > it's
> > simply a consistent design. To get a Python scalar out of a 0-D
> > array, one
> > can
> > simply use the builtin for the type, e.g. ``float(arr_0d)``.
> >
> > The other `indexing modes in the standard <
> > https://data-apis.github.io/array-api/latest/API_specification/indexing.html
> > > `__
> > do work largely the same as they do for ``numpy.ndarray``. One
> > noteworthy
> > difference is that clipping in slice indexing (e.g., ``a[:n]`` where
> > ``n``
> > is
> > larger than the size of the first axis) is unspecified behaviour,
> > because
> > that kind of check can be expensive on accelerators.
> >
> > The lack of advanced indexing, and boolean indexing being limited to
> > a
> > single
> > n-D boolean array, is due to those indexing modes not being suitable
> > for all
> > types of arrays or JIT compilation. Their absence does not seem to be
> > problematic; if a user or library author wants to use them, they can
> > do so
> > through zero-copy conversion to ``numpy.ndarray``. This will signal
> > correctly
> > to whomever reads the code that it is then NumPy-specific rather than
> > portable
> > to all conforming array types.
> >
> >
> >
> > The array object
> > ~~~~~~~~~~~~~~~~
> >
> > The array object in the standard does not have methods other than
> > dunder
> > methods. The rationale for that is that not all array libraries have
> > methods
> > on their array object (e.g., TensorFlow does not). It also provides
> > only a
> > single way of doing something, rather than have functions and methods
> > that
> > are effectively duplicate.
> >
> > Mixing operations that may produce views (e.g., indexing,
> > ``nonzero``)
> > in combination with mutation (e.g., item or slice assignment) is
> > `explicitly documented in the standard to not be supported <
> > https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html
> > > `__.
> > This cannot easily be prohibited in the array object itself; instead
> > this
> > will
> > be guidance to the user via documentation.
> >
> > The standard current does not prescribe a name for the array object
> > itself.
> > We propose to simply name it ``ndarray``. This is the most obvious
> > name, and
> > because of the separate namespace should not clash with
> > ``numpy.ndarray``.
> >
> >
> > Implementation
> > --------------
> >
> > .. note::
> >
> >     This section needs a lot more detail, which will gradually be
> > added when
> >     the implementation progresses.
> >
> > A prototype of the ``array_api`` namespace can be found in
> > https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
> > The docstring in its ``__init__.py`` has notes on completeness of the
> > implementation. The code for the wrapper functions also contains ``#
> > Note:``
> > comments everywhere there is a difference with the NumPy API.
> > Two important parts that are not implemented yet are the new array
> > object
> > and
> > DLPack support. Functions may need changes to ensure the changed
> > casting
> > rules
> > are respected.
> >
> > The array object
> > ~~~~~~~~~~~~~~~~
> >
> > Regarding the array object implementation, we plan to start with a
> > regular
> > Python class that wraps a ``numpy.ndarray`` instance. Attributes and
> > methods
> > can forward to that wrapped instance, applying input validation and
> > implementing changed behaviour as needed.
> >
> > The casting rules are probably the most challenging part. The in-
> > progress
> > dtype system refactor (NEPs 40-43) should make implementing the
> > correct
> > casting
> > behaviour easier - it is already moving away from value-based casting
> > for
> > example.
> >
> >
> > The dtype objects
> > ~~~~~~~~~~~~~~~~~
> >
> > We must be able to compare dtypes for equality, and expressions like
> > these
> > must
> > be possible::
> >
> >     np.array_api.some_func(..., dtype=x.dtype)
> >
> > The above implies it would be nice to have ``np.array_api.float32 ==
> > np.array_api.ndarray(...).dtype``.
> >
> > Dtypes should not be assumed to have a class hierarchy by users,
> > however we
> > are
> > free to implement it with a class hierarchy if that's convenient. We
> > considered
> > the following options to implement dtype objects:
> >
> > 1. Alias dtypes to those in the main namespace. E.g.,
> > ``np.array_api.float32 =
> >    np.float32``.
> > 2. Make the dtypes instances of ``np.dtype``. E.g.,
> > ``np.array_api.float32 =
> >    np.dtype(np.float32)``.
> > 3. Create new singleton classes with only the required
> > methods/attributes
> >    (currently just ``__eq__``).
> >
> > It seems like (2) would be easiest from the perspective of
> > interacting with
> > functions outside the main namespace. And (3) would adhere best to
> > the
> > standard.
> >
> > TBD: the standard does not yet have a good way to inspect properties
> > of a
> > dtype, to ask questions like "is this an integer dtype?". Perhaps
> > this is
> > easy
> > enough to do for users, like so::
> >
> >     def _get_dtype(dt_or_arr):
> >         return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else
> > dt_or_arr
> >
> >     def is_floating(dtype_or_array):
> >         dtype = _get_dtype(dtype_or_array)
> >         return dtype in (float32, float64)
> >
> >     def is_integer(dtype_or_array):
> >         dtype = _get_dtype(dtype_or_array)
> >         return dtype in (uint8, uint16, uint32, uint64, int8, int16,
> > int32,
> > int64)
> >
> > However it could make sense to add to the standard. Note that NumPy
> > itself
> > currently does not have a great for asking such questions, see
> > `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From sebastian at sipsolutions.net  Wed Mar 10 17:35:49 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 10 Mar 2021 16:35:49 -0600
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <CAKgW=6+LJpuexf6XW5Sn=xgLgaN_nayMwQFdHVheSS3kR0CpuQ@mail.gmail.com>
References: <CABL7CQhBcxRsba4hu2YXC66Rm-j6AFZXBy+P+9bhTZ8h3YR89w@mail.gmail.com>
 <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
 <CAKgW=6+LJpuexf6XW5Sn=xgLgaN_nayMwQFdHVheSS3kR0CpuQ@mail.gmail.com>
Message-ID: <01b5e7193b6e9b261befb4e62c5b94f39debf69f.camel@sipsolutions.net>

On Wed, 2021-03-10 at 13:44 -0700, Aaron Meurer wrote:
> On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> > 
> > Top Posting, to discuss post specific questions about NEP 47 and
> > partially the start on implementing it in:
> > 
> > ??? https://github.com/numpy/numpy/pull/18585
> > 
> > There are probably many more that will crop up. But for me, each of
> > these is a pretty major difficulty without a clear answer as of
> > now.
> > 
> > 1. I still need clarity how a library is supposed to use this
> > namespace
> > when the user passes in a NumPy array (mentioned before).? The user
> > must get back a NumPy array after all.? Maybe that is just a
> > decorator,
> > but it seems important.
> > 
> > 2. `np.result_type` special cases array-scalars (the current PR),
> > NEP
> > 47 promises it will not.? The PR could attempt to work around that
> > using `arr.dtype` int `result_type`, I expect there are more
> > details to
> > fight with there, but I am not sure.
> 
> The idea is to work around it everywhere, so that it follows the
> rules
> in the spec (no array scalars, no value-based casting). I haven't
> started it yet, though, so I don't know yet how hard it will be. If
> it
> ends up being too hard we could put it in the same camp as device
> support and dlpack support where it needs some basic implementation
> in
> numpy itself first before we can properly do it in the array API
> namespace.

Quite frankly. If you really want to implement a minimal API, it may be
best to just write it yourself and ditch NumPy. (Of course I currently
doubt that the NEP 47 implementation should be minimal.)

About doing promotion yourself  ("promotion" as in what ufuncs do; I
call `np.result_type` "common DType", because it is used e.g. in
`concatenate`):

Ufuncs have at least one more rule for true-division, plus there may be
mixed float-int loops, etc.  Since the standard is very limited and you
only have numeric dtypes that might be all though.

In any case, my point is: If NumPy does strange things (and it does
with 0-D arrays currently).  You could cook your own soup there also,
and implement it in NumPy by using `signature=...` in the ufunc call.


> 
> > 
> > 3. For all other function, the same problem applies. You don't
> > actually
> > have anything to fix NumPy promotion rules.? You could bake your
> > own
> > cake here for numeric types, but I am not sure, you might also need
> > NEP
> > 43 in all its promotion power to pull it off.
> > 
> > 4. Now that I looked at the above, I do not feel its reasonable to
> > limit this functionality to numeric dtypes.? If someone uses a
> > NumPy
> > rational-dtype, why should a SciPy function currently implemented
> > in
> > pure NumPy reject that?? In other words, I think this is the point
> > where trying to be "minimal" is counterproductive.
> 
> The idea of minimality is to make it so users can be sure they will
> be
> able to use other libraries, once they also have array API compliant
> namespaces. A rational-dtype wouldn't ever be implemented in those
> other libraries, because it isn't part of the standard, so if a user
> is using those, that is a sign they are using things that aren't in
> the array API, so they can't expect to be able to swap out their
> dtypes. If a user wants to use something that's only in NumPy, then
> they should just use NumPy.
> 

This is not about the "user", in your scenario the end-user does use
NumPy.  The way I understand this is not a prerequisite.  If it is, a
lot of things will be simpler though, and most of my doubts will go
away (but be replaced with uncertainty about the usefulness).


The problem is that SciPy as the "library author" wants to to use NEP
47 without limiting the end-user (or the end-user even noticing!).
The distinction between end-user and library author (someone who writes
a function that should work with numpy, pytorch, etc.) is very
important here and too all of these "protocol" discussions.


I assume that SciPy should be able to have the cake and eat it to:

* Uses the limited array-api and make sure to only rely on the minimal
  subset.
* Not artificially limit end-users who pass in NumPy arrays.

The second point can also be read as: SciPy would be able to support
practically all current NumPy array use cases without jumping through
any additional hoops (or well, maybe a bit of churn, but churn that is
made easy by as of now undefined API). 

> > 
> > 4. The PR makes no attempt at handling binary operators in any way
> > aside from greedily coercing the other operand.
> > 
> > 5. What happens with a mix of array-likes or even array subclasses
> > like
> > `astropy.quantity`?
> > 
> > 6. Is there any provision on how to deal with mixed array-like
> > inputs?
> > CuPy+numpy, etc.?
> 
> Neither of these are defined in the spec. The spec only deals with
> staying inside of the compliant namespace. It doesn't require any
> behavior mixing things from other namespaces. That's generally
> considered a much harder problem, and there is the data interchange
> protocol to deal with it
> ( 
> https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html
> ).
> 

OK, maybe you can get away with it, since the current proposal seems to
be that `get_namespace()` raises on mixed input. Still seems like
something that should probably raise an error rather than coerce to
NumPy when calling: `nep47_array_object + dask_array`.

Cheers,

Sebastian


> Aaron Meurer
> 
> > 
> > 
> > I don't think we have to figure out everything up-front, but I do
> > think
> > there are a few very fundamental questions still open, at least for
> > me
> > personally.
> > 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > 
> > On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> > > Hi all,
> > > 
> > > Here is a NEP, written together with Stephan Hoyer and Aaron
> > > Meurer,
> > > for
> > > discussion on adoption of the array API standard (
> > > https://data-apis.github.io/array-api/latest/). This will add a
> > > new
> > > numpy.array_api submodule containing that standardized API. The
> > > main
> > > purpose of this API is to be able to write code that is portable
> > > to
> > > other
> > > array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask,
> > > and
> > > MXNet.
> > > 
> > > We expect this NEP to remain in draft state for quite a while,
> > > while
> > > we're
> > > gaining experience with using it in downstream libraries, discuss
> > > adding it
> > > to other array libraries, and finishing some of the loose ends
> > > (e.g.,
> > > specifications for linear algebra functions that aren't merged
> > > yet,
> > > see
> > > https://github.com/data-apis/array-api/pulls) in the API standard
> > > itself.
> > > 
> > > See
> > >  
> > > https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html
> > > for an initial discussion about this topic.
> > > 
> > > Please keep high-level discussion here and detailed comments on
> > > https://github.com/numpy/numpy/pull/18456. Also, you can access a
> > > rendered
> > > version of the NEP from that PR (see PR description for how),
> > > which
> > > may be
> > > helpful.
> > > Cheers,
> > > Ralf
> > > 
> > > 
> > > Abstract
> > > --------
> > > 
> > > We propose to adopt the `Python array API standard`_, developed
> > > by
> > > the
> > > `Consortium for Python Data API Standards`_. Implementing this as
> > > a
> > > separate
> > > new namespace in NumPy will allow authors of libraries which
> > > depend
> > > on NumPy
> > > as well as end users to write code that is portable between NumPy
> > > and
> > > all
> > > other array/tensor libraries that adopt this standard.
> > > 
> > > .. note::
> > > 
> > > ??? We expect that this NEP will remain in a draft state for
> > > quite a
> > > while.
> > > ??? Given the large scope we don't expect to propose it for
> > > acceptance any
> > > ??? time soon; instead, we want to solicit feedback on both the
> > > high-
> > > level
> > > ??? design and implementation, and learn what needs describing
> > > better
> > > in
> > > this
> > > ??? NEP or changing in either the implementation or the array API
> > > standard
> > > ??? itself.
> > > 
> > > 
> > > Motivation and Scope
> > > --------------------
> > > 
> > > Python users have a wealth of choice for libraries and frameworks
> > > for
> > > numerical computing, data science, machine learning, and deep
> > > learning. New
> > > frameworks pushing forward the state of the art in these fields
> > > are
> > > appearing
> > > every year. One unintended consequence of all this activity and
> > > creativity
> > > has been fragmentation in multidimensional array (a.k.a. tensor)
> > > libraries -
> > > which are the fundamental data structure for these fields.
> > > Choices
> > > include
> > > NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
> > > 
> > > The APIs of each of these libraries are largely similar, but with
> > > enough
> > > differences that it?s quite difficult to write code that works
> > > with
> > > multiple
> > > (or all) of these libraries. The array API standard aims to
> > > address
> > > that
> > > issue, by specifying an API for the most common ways arrays are
> > > constructed
> > > and used. The proposed API is quite similar to NumPy's API, and
> > > deviates
> > > mainly
> > > in places where (a) NumPy made design choices that are inherently
> > > not
> > > portable
> > > to other implementations, and (b) where other libraries
> > > consistently
> > > deviated
> > > from NumPy on purpose because NumPy's design turned out to have
> > > issues or
> > > unnecessary complexity.
> > > 
> > > For a longer discussion on the purpose of the array API standard
> > > we
> > > refer to
> > > the `Purpose and Scope section of the array API standard <
> > >  
> > > https://data-apis.github.io/array-api/latest/purpose_and_scope.html
> > > >`
> > > __
> > > and the two blog posts announcing the formation of the Consortium
> > > [1]_ and
> > > the release of the first draft version of the standard for
> > > community
> > > review
> > > [2]_.
> > > 
> > > The scope of this NEP includes:
> > > 
> > > - Adopting the 2021 version of the array API standard
> > > - Adding a separate namespace, tentatively named
> > > ``numpy.array_api``
> > > - Changes needed/desired outside of the new namespace, for
> > > example
> > > new
> > > dunder
> > > ? methods on the ``ndarray`` object
> > > - Implementation choices, and differences between functions in
> > > the
> > > new
> > > ? namespace with those in the main ``numpy`` namespace
> > > - A new array object conforming to the array API standard
> > > - Maintenance effort and testing strategy
> > > - Impact on NumPy's total exposed API surface and on other future
> > > and
> > > ? under-discussion design choices
> > > - Relation to existing and proposed NumPy array protocols
> > > ? (``__array_ufunc__``, ``__array_function__``,
> > > ``__array_module__``).
> > > - Required improvements to existing NumPy functionality
> > > 
> > > Out of scope for this NEP are:
> > > 
> > > - Changes in the array API standard itself. Those are likely to
> > > come
> > > up
> > > ? during review of this NEP, but should be upstreamed as needed
> > > and
> > > this NEP
> > > ? subsequently updated.
> > > 
> > > 
> > > Usage and Impact
> > > ----------------
> > > 
> > > *This section will be fleshed out later, for now we refer to the
> > > use
> > > cases
> > > given
> > > in* `the array API standard Use Cases section <
> > > https://data-apis.github.io/array-api/latest/use_cases.html>`__
> > > 
> > > In addition to those use cases, the new namespace contains
> > > functionality
> > > that
> > > is widely used and supported by many array libraries. As such, it
> > > is
> > > a good
> > > set of functions to teach to newcomers to NumPy and recommend as
> > > "best
> > > practice". That contrasts with NumPy's main namespace, which
> > > contains
> > > many
> > > functions and objects that have been superceded or we consider
> > > mistakes -
> > > but
> > > that we can't remove because of backwards compatibility reasons.
> > > 
> > > The usage of the ``numpy.array_api`` namespace by downstream
> > > libraries is
> > > intended to enable them to consume multiple kinds of arrays,
> > > *without
> > > having
> > > to have a hard dependency on all of those array libraries*:
> > > 
> > > .. image:: _static/nep-0047-library-dependencies.png
> > > 
> > > Adoption in downstream libraries
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > The prototype implementation of the ``array_api`` namespace will
> > > be
> > > used
> > > with
> > > SciPy, scikit-learn and other libraries of interest that depend
> > > on
> > > NumPy, in
> > > order to get more experience with the design and find out if any
> > > important
> > > parts are missing.
> > > 
> > > The pattern to support multiple array libraries is intended to be
> > > something
> > > like::
> > > 
> > > ??? def somefunc(x, y):
> > > ??????? # Retrieves standard namespace. Raises if x and y have
> > > different
> > > ??????? # namespaces.? See Appendix for possible get_namespace
> > > implementation
> > > ??????? xp = get_namespace(x, y)
> > > ??????? out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
> > > ??????? return out
> > > 
> > > The ``get_namespace`` call is effectively the library author
> > > opting
> > > in to
> > > using the standard API namespace, and thereby explicitly
> > > supporting
> > > all conforming array libraries.
> > > 
> > > 
> > > The ``asarray`` / ``asanyarray`` pattern
> > > ````````````````````````````````````````
> > > 
> > > Many existing libraries use the same ``asarray`` (or
> > > ``asanyarray``)
> > > pattern
> > > as NumPy itself does; accepting any object that can be coerced
> > > into a
> > > ``np.ndarray``.
> > > We consider this design pattern problematic - keeping in mind the
> > > Zen
> > > of
> > > Python, *"explicit is better than implicit"*, as well as the
> > > pattern
> > > being
> > > historically problematic in the SciPy ecosystem for ``ndarray``
> > > subclasses
> > > and with over-eager object creation. All other array/tensor
> > > libraries
> > > are
> > > more strict, and that works out fine in practice. We would advise
> > > authors of
> > > new libraries to avoid the ``asarray`` pattern. Instead they
> > > should
> > > either
> > > accept just NumPy arrays or, if they want to support multiple
> > > kinds
> > > of
> > > arrays, check if the incoming array object supports the array API
> > > standard
> > > by checking for ``__array_namespace__`` as shown in the example
> > > above.
> > > 
> > > Existing libraries can do such a check as well, and only call
> > > ``asarray`` if
> > > the check fails. This is very similar to the ``__duckarray__``
> > > idea
> > > in
> > > :ref:`NEP30`.
> > > 
> > > 
> > > .. _adoption-application-code:
> > > 
> > > Adoption in application code
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > The new namespace can be seen by end users as a cleaned up and
> > > slimmed down
> > > version of NumPy's main namespace. Encouraging end users to use
> > > this
> > > namespace like::
> > > 
> > > ??? import numpy.array_api as xp
> > > 
> > > ??? x = xp.linspace(0, 2*xp.pi, num=100)
> > > ??? y = xp.cos(x)
> > > 
> > > seems perfectly reasonable, and potentially beneficial - users
> > > get
> > > offered
> > > only
> > > one function for each purpose (the one we consider best-
> > > practice),
> > > and they
> > > then write code that is more easily portable to other libraries.
> > > 
> > > 
> > > Backward compatibility
> > > ----------------------
> > > 
> > > No deprecations or removals of existing NumPy APIs or other
> > > backwards
> > > incompatible changes are proposed.
> > > 
> > > 
> > > High-level design
> > > -----------------
> > > 
> > > The array API standard consists of approximately 120 objects, all
> > > of
> > > which
> > > have a direct NumPy equivalent. This figure shows what is
> > > included at
> > > a
> > > high level:
> > > 
> > > .. image:: _static/nep-0047-scope-of-array-API.png
> > > 
> > > The most important changes compared to what NumPy currently
> > > offers
> > > are:
> > > 
> > > - A new array object which:
> > > 
> > > ??? - conforms to the casting rules and indexing behaviour
> > > specified
> > > by the
> > > ????? standard,
> > > ??? - does not have methods other than dunder methods,
> > > ??? - does not support the full range of NumPy indexing
> > > behaviour.
> > > Advanced
> > > ????? indexing with integers is not supported. Only boolean
> > > indexing
> > > ????? with a single (possibly multi-dimensional) boolean array is
> > > supported.
> > > ????? An indexing expression that selects a single element
> > > returns a
> > > 0-D
> > > array
> > > ????? rather than a scalar.
> > > 
> > > - Functions in the ``array_api`` namespace:
> > > 
> > > ??? - do not accept ``array_like`` inputs, only NumPy arrays and
> > > Python
> > > scalars
> > > ??? - do not support ``__array_ufunc__`` and
> > > ``__array_function__``,
> > > ??? - use positional-only and keyword-only parameters in their
> > > signatures,
> > > ??? - have inline type annotations,
> > > ??? - may have minor changes to signatures and semantics of
> > > individual
> > > ????? functions compared to their equivalents already present in
> > > NumPy,
> > > ??? - only support dtype literals, not format strings or other
> > > ways
> > > of
> > > ????? specifying dtypes
> > > 
> > > - DLPack_ support will be added to NumPy,
> > > - New syntax for "device support" will be added, through a
> > > ``.device``
> > > ? attribute on the new array object, and ``device=`` keywords in
> > > array
> > > creation
> > > ? functions in the ``array_api`` namespace,
> > > - Casting rules that differ from those NumPy currently has.
> > > Output
> > > dtypes
> > > can
> > > ? be derived from input dtypes (i.e. no value-based casting), and
> > > 0-D
> > > arrays
> > > ? are treated like >=1-D arrays.
> > > - Not all dtypes NumPy has are part of the standard. Only
> > > boolean,
> > > signed
> > > and
> > > ? unsigned integers, and floating-point dtypes up to ``float64``
> > > are
> > > supported.
> > > ? Complex dtypes are expected to be added in the next version of
> > > the
> > > standard.
> > > ? Extended precision, string, void, object and datetime dtypes,
> > > as
> > > well as
> > > ? structured dtypes, are not included.
> > > 
> > > Improvements to existing NumPy functionality that are needed
> > > include:
> > > 
> > > - Add support for stacks of matrices to some functions in
> > > ``numpy.linalg``
> > > ? that are currently missing such support.
> > > - Add the ``keepdims`` keyword to ``np.argmin`` and
> > > ``np.argmax``.
> > > - Add a "never copy" mode to ``np.asarray``.
> > > 
> > > 
> > > Functions in the ``array_api`` namespace
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > Let's start with an example of a function implementation that
> > > shows
> > > the most
> > > important differences with the equivalent function in the main
> > > namespace::
> > > 
> > > ??? def max(x: array, /, *,
> > > ??????????? axis: Optional[Union[int, Tuple[int, ...]]] = None,
> > > ??????????? keepdims: bool = False
> > > ??????? ) -> array:
> > > ??????? """
> > > ??????? Array API compatible wrapper for :py:func:`np.max
> > > <numpy.max>`.
> > > ??????? """
> > > ??????? return np.max._implementation(x, axis=axis,
> > > keepdims=keepdims)
> > > 
> > > This function does not accept ``array_like`` inputs, only
> > > ``ndarray``. There
> > > are multiple reasons for this. Other array libraries all work
> > > like
> > > this.
> > > Letting the user do coercion of lists, generators, or other
> > > foreign
> > > objects
> > > separately results in a cleaner design with less unexpected
> > > behaviour.
> > > It's higher-performance - less overhead from ``asarray`` calls.
> > > Static
> > > typing
> > > is easier. Subclasses will work as expected. And the slight
> > > increase
> > > in
> > > verbosity
> > > because users have to explicitly coerce to ``ndarray`` on rare
> > > occasions
> > > seems like a small price to pay.
> > > 
> > > This function does not support ``__array_ufunc__`` nor
> > > ``__array_function__``.
> > > These protocols serve a similar purpose as the array API standard
> > > module
> > > itself,
> > > but through a different mechanisms. Because only ``ndarray``
> > > instances are
> > > accepted,
> > > dispatching via one of these protocols isn't useful anymore.
> > > 
> > > This function uses positional-only parameters in its signature.
> > > This
> > > makes
> > > code
> > > more portable - writing ``max(x=x, ...)`` is no longer valid,
> > > hence
> > > if other
> > > libraries call the first parameter ``input`` rather than ``x``,
> > > that
> > > is
> > > fine.
> > > The rationale for keyword-only parameters (not shown in the above
> > > example)
> > > is
> > > two-fold: clarity of end user code, and it being easier to extend
> > > the
> > > signature
> > > in the future with keywords in the desired order.
> > > 
> > > This function has inline type annotations. Inline annotations are
> > > far
> > > easier to
> > > maintain than separate stub files. And because the types are
> > > simple,
> > > this
> > > will
> > > not result in a large amount of clutter with type aliases or
> > > unions
> > > like in
> > > the
> > > current stub files NumPy has.
> > > 
> > > 
> > > DLPack support for zero-copy data interchange
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > The ability to convert one kind of array into another kind is
> > > valuable, and
> > > indeed necessary when downstream libraries want to support
> > > multiple
> > > kinds of
> > > arrays. This requires a well-specified data exchange protocol.
> > > NumPy
> > > already
> > > supports two of these, namely the buffer protocol (i.e., PEP
> > > 3118),
> > > and
> > > the ``__array_interface__`` (Python side) / ``__array_struct__``
> > > (C
> > > side)
> > > protocol. Both work similarly, letting the "producer" describe
> > > how
> > > the data
> > > is laid out in memory so the "consumer" can construct its own
> > > kind of
> > > array
> > > with a view on that data.
> > > 
> > > DLPack works in a very similar way. The main reasons to prefer
> > > DLPack
> > > over
> > > the options already present in NumPy are:
> > > 
> > > 1. DLPack is the only protocol with device support (e.g., GPUs
> > > using
> > > CUDA or
> > > ?? ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other
> > > array
> > > ?? libraries are not. Having one protocol per device isn't
> > > tenable,
> > > hence
> > > ?? device support is a must.
> > > 2. Widespread support. DLPack has the widest adoption of all
> > > protocols, only
> > > ?? NumPy is missing support. And the experiences of other
> > > libraries
> > > with it
> > > ?? are positive. This contrasts with the protocols NumPy does
> > > support, which
> > > ?? are used very little - when other libraries want to
> > > interoperate
> > > with
> > > ?? NumPy, they typically use the (more limited, and NumPy-
> > > specific)
> > > ?? ``__array__`` protocol.
> > > 
> > > Adding support for DLPack to NumPy entails:
> > > 
> > > - Adding a ``ndarray.__dlpack__`` method
> > > - Adding a ``from_dlpack`` function, which takes as input an
> > > object
> > > ? supporting ``__dlpack__``, and returns an ``ndarray``.
> > > 
> > > DLPack is currently a ~200 LoC header, and is meant to be
> > > included
> > > directly, so
> > > no external dependency is needed. Implementation should be
> > > straightforward.
> > > 
> > > 
> > > Syntax for device support
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > NumPy itself is CPU-only, so it clearly doesn't have a need for
> > > device
> > > support.
> > > However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet)
> > > support
> > > multiple types of devices: CPU, GPU, TPU, and more exotic
> > > hardware.
> > > To write portable code on systems with multiple devices, it's
> > > often
> > > necessary
> > > to create new arrays on the same device as some other array, or
> > > check
> > > that
> > > two arrays live on the same device. Hence syntax for that is
> > > needed.
> > > 
> > > The array object will have a ``.device`` attribute which enables
> > > comparing
> > > devices of different arrays (they only should compare equal if
> > > both
> > > arrays
> > > are
> > > from the same library and it's the same hardware device).
> > > Furthermore,
> > > ``device=`` keywords in array creation functions are needed. For
> > > example::
> > > 
> > > ??? def empty(shape: Union[int, Tuple[int, ...]], /, *,
> > > ????????????? dtype: Optional[dtype] = None,
> > > ????????????? device: Optional[device] = None) -> array:
> > > ??????? """
> > > ??????? Array API compatible wrapper for :py:func:`np.empty
> > > <numpy.empty>`.
> > > ??????? """
> > > ??????? return np.empty(shape, dtype=dtype, device=device)
> > > 
> > > The implementation for NumPy may be as simple as setting the
> > > device
> > > attribute to
> > > the string ``'cpu'`` and raising an exception if array creation
> > > functions
> > > encounter any other value.
> > > 
> > > 
> > > Dtypes and casting rules
> > > ~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > The supported dtypes in this namespace are boolean, 8/16/32/64-
> > > bit
> > > signed
> > > and
> > > unsigned integer, and 32/64-bit floating-point dtypes. These will
> > > be
> > > added
> > > to
> > > the namespace as dtype literals with the expected names (e.g.,
> > > ``bool``,
> > > ``uint16``, ``float64``).
> > > 
> > > The most obvious omissions are the complex dtypes. The rationale
> > > for
> > > the
> > > lack
> > > of complex support in the first version of the array API standard
> > > is
> > > that
> > > several
> > > libraries (PyTorch, MXNet) are still in the process of adding
> > > support
> > > for
> > > complex dtypes. The next version of the standard is expected to
> > > include
> > > ``complex64``
> > > and ``complex128`` (see `this issue <
> > > https://github.com/data-apis/array-api/issues/102>`__
> > > for more details).
> > > 
> > > Specifying dtypes to functions, e.g. via the ``dtype=`` keyword,
> > > is
> > > expected
> > > to only use the dtype literals. Format strings, Python builtin
> > > dtypes, or
> > > string representations of the dtype literals are not accepted -
> > > this
> > > will
> > > improve readability and portability of code at little cost.
> > > 
> > > Casting rules are only defined between different dtypes of the
> > > same
> > > kind.
> > > The
> > > rationale for this is that mixed-kind (e.g., integer to floating-
> > > point)
> > > casting behavior differs between libraries. NumPy's mixed-kind
> > > casting
> > > behavior doesn't need to be changed or restricted, it only needs
> > > to
> > > be
> > > documented that if users use mixed-kind casting, their code may
> > > not
> > > be
> > > portable.
> > > 
> > > .. image:: _static/nep-0047-casting-rules-lattice.png
> > > 
> > > *Type promotion diagram. Promotion between any two types is given
> > > by
> > > their
> > > join on this lattice. Only the types of participating arrays
> > > matter,
> > > not
> > > their values. Dashed lines indicate that behaviour for Python
> > > scalars
> > > is
> > > undefined on overflow. Boolean, integer and floating-point dtypes
> > > are
> > > not
> > > connected, indicating mixed-kind promotion is undefined.*
> > > 
> > > The most important difference between the casting rules in NumPy
> > > and
> > > in the
> > > array API standard is how scalars and 0-dimensional arrays are
> > > handled. In
> > > the standard, array scalars do not exist and 0-dimensional arrays
> > > follow the
> > > same casting rules as higher-dimensional arrays.
> > > 
> > > See the `Type Promotion Rules section of the array API standard <
> > >  
> > > https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html
> > > > `__
> > > for more details.
> > > 
> > > .. note::
> > > 
> > > ??? It is not clear what the best way is to support the different
> > > casting
> > > rules
> > > ??? for 0-dimensional arrays and no value-based casting. One
> > > option
> > > may be
> > > to
> > > ??? implement this second set of casting rules, keep them
> > > private,
> > > mark the
> > > ??? array API functions with a private attribute that says they
> > > adhere to
> > > ??? these different rules, and let the casting machinery check
> > > whether for
> > > ??? that attribute.
> > > 
> > > ??? This needs discussion.
> > > 
> > > 
> > > Indexing
> > > ~~~~~~~~
> > > 
> > > An indexing expression that would return a scalar with
> > > ``ndarray``,
> > > e.g.
> > > ``arr_2d[0, 0]``, will return a 0-D array with the new array
> > > object.
> > > There
> > > are
> > > several reasons for that: array scalars are largely considered a
> > > design
> > > mistake
> > > which no other array library copied; it works better for non-CPU
> > > libraries
> > > (typically arrays can live on the device, scalars live on the
> > > host);
> > > and
> > > it's
> > > simply a consistent design. To get a Python scalar out of a 0-D
> > > array, one
> > > can
> > > simply use the builtin for the type, e.g. ``float(arr_0d)``.
> > > 
> > > The other `indexing modes in the standard <
> > >  
> > > https://data-apis.github.io/array-api/latest/API_specification/indexing.html
> > > > `__
> > > do work largely the same as they do for ``numpy.ndarray``. One
> > > noteworthy
> > > difference is that clipping in slice indexing (e.g., ``a[:n]``
> > > where
> > > ``n``
> > > is
> > > larger than the size of the first axis) is unspecified behaviour,
> > > because
> > > that kind of check can be expensive on accelerators.
> > > 
> > > The lack of advanced indexing, and boolean indexing being limited
> > > to
> > > a
> > > single
> > > n-D boolean array, is due to those indexing modes not being
> > > suitable
> > > for all
> > > types of arrays or JIT compilation. Their absence does not seem
> > > to be
> > > problematic; if a user or library author wants to use them, they
> > > can
> > > do so
> > > through zero-copy conversion to ``numpy.ndarray``. This will
> > > signal
> > > correctly
> > > to whomever reads the code that it is then NumPy-specific rather
> > > than
> > > portable
> > > to all conforming array types.
> > > 
> > > 
> > > 
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > > 
> > > The array object in the standard does not have methods other than
> > > dunder
> > > methods. The rationale for that is that not all array libraries
> > > have
> > > methods
> > > on their array object (e.g., TensorFlow does not). It also
> > > provides
> > > only a
> > > single way of doing something, rather than have functions and
> > > methods
> > > that
> > > are effectively duplicate.
> > > 
> > > Mixing operations that may produce views (e.g., indexing,
> > > ``nonzero``)
> > > in combination with mutation (e.g., item or slice assignment) is
> > > `explicitly documented in the standard to not be supported <
> > >  
> > > https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html
> > > > `__.
> > > This cannot easily be prohibited in the array object itself;
> > > instead
> > > this
> > > will
> > > be guidance to the user via documentation.
> > > 
> > > The standard current does not prescribe a name for the array
> > > object
> > > itself.
> > > We propose to simply name it ``ndarray``. This is the most
> > > obvious
> > > name, and
> > > because of the separate namespace should not clash with
> > > ``numpy.ndarray``.
> > > 
> > > 
> > > Implementation
> > > --------------
> > > 
> > > .. note::
> > > 
> > > ??? This section needs a lot more detail, which will gradually be
> > > added when
> > > ??? the implementation progresses.
> > > 
> > > A prototype of the ``array_api`` namespace can be found in
> > >  
> > > https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api
> > > .
> > > The docstring in its ``__init__.py`` has notes on completeness of
> > > the
> > > implementation. The code for the wrapper functions also contains
> > > ``#
> > > Note:``
> > > comments everywhere there is a difference with the NumPy API.
> > > Two important parts that are not implemented yet are the new
> > > array
> > > object
> > > and
> > > DLPack support. Functions may need changes to ensure the changed
> > > casting
> > > rules
> > > are respected.
> > > 
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > > 
> > > Regarding the array object implementation, we plan to start with
> > > a
> > > regular
> > > Python class that wraps a ``numpy.ndarray`` instance. Attributes
> > > and
> > > methods
> > > can forward to that wrapped instance, applying input validation
> > > and
> > > implementing changed behaviour as needed.
> > > 
> > > The casting rules are probably the most challenging part. The in-
> > > progress
> > > dtype system refactor (NEPs 40-43) should make implementing the
> > > correct
> > > casting
> > > behaviour easier - it is already moving away from value-based
> > > casting
> > > for
> > > example.
> > > 
> > > 
> > > The dtype objects
> > > ~~~~~~~~~~~~~~~~~
> > > 
> > > We must be able to compare dtypes for equality, and expressions
> > > like
> > > these
> > > must
> > > be possible::
> > > 
> > > ??? np.array_api.some_func(..., dtype=x.dtype)
> > > 
> > > The above implies it would be nice to have ``np.array_api.float32
> > > ==
> > > np.array_api.ndarray(...).dtype``.
> > > 
> > > Dtypes should not be assumed to have a class hierarchy by users,
> > > however we
> > > are
> > > free to implement it with a class hierarchy if that's convenient.
> > > We
> > > considered
> > > the following options to implement dtype objects:
> > > 
> > > 1. Alias dtypes to those in the main namespace. E.g.,
> > > ``np.array_api.float32 =
> > > ?? np.float32``.
> > > 2. Make the dtypes instances of ``np.dtype``. E.g.,
> > > ``np.array_api.float32 =
> > > ?? np.dtype(np.float32)``.
> > > 3. Create new singleton classes with only the required
> > > methods/attributes
> > > ?? (currently just ``__eq__``).
> > > 
> > > It seems like (2) would be easiest from the perspective of
> > > interacting with
> > > functions outside the main namespace. And (3) would adhere best
> > > to
> > > the
> > > standard.
> > > 
> > > TBD: the standard does not yet have a good way to inspect
> > > properties
> > > of a
> > > dtype, to ask questions like "is this an integer dtype?". Perhaps
> > > this is
> > > easy
> > > enough to do for users, like so::
> > > 
> > > ??? def _get_dtype(dt_or_arr):
> > > ??????? return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype')
> > > else
> > > dt_or_arr
> > > 
> > > ??? def is_floating(dtype_or_array):
> > > ??????? dtype = _get_dtype(dtype_or_array)
> > > ??????? return dtype in (float32, float64)
> > > 
> > > ??? def is_integer(dtype_or_array):
> > > ??????? dtype = _get_dtype(dtype_or_array)
> > > ??????? return dtype in (uint8, uint16, uint32, uint64, int8,
> > > int16,
> > > int32,
> > > int64)
> > > 
> > > However it could make sense to add to the standard. Note that
> > > NumPy
> > > itself
> > > currently does not have a great for asking such questions, see
> > > `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210310/373e7d0b/attachment-0001.sig>

From ralf.gommers at gmail.com  Thu Mar 11 06:37:04 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 11 Mar 2021 12:37:04 +0100
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
References: <CABL7CQhBcxRsba4hu2YXC66Rm-j6AFZXBy+P+9bhTZ8h3YR89w@mail.gmail.com>
 <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
Message-ID: <CABL7CQjGe7TPQKVFsZ=eoRyN5NHj5tZq+Hu+3R4rZm7_+3RxHQ@mail.gmail.com>

On Wed, Mar 10, 2021 at 6:41 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Top Posting, to discuss post specific questions about NEP 47 and
> partially the start on implementing it in:
>
>     https://github.com/numpy/numpy/pull/18585
>
> There are probably many more that will crop up. But for me, each of
> these is a pretty major difficulty without a clear answer as of now.
>

All great questions, that Sebastian. Let me reply to the questions that
Aaron didn't reply to inline below.


> 1. I still need clarity how a library is supposed to use this namespace
> when the user passes in a NumPy array (mentioned before).  The user
> must get back a NumPy array after all.  Maybe that is just a decorator,
> but it seems important.
>

I agree that it will be a common pattern that libraries will accept all
standard-compliant array types plus numpy.ndarray. And the output array
type should match the input type. In Aaron's implementation the new array
object has a numpy.ndarray as private attribute, so that's the instance
that should be returned. A decorator seems like a sensible way to handle
that. Or a simple utility function, something like `return
correct_arraytype(out)`.

Either way, that pattern should be added to NEP 47. I don't see a
fundamental problem here, we just need to find the nicest UX for it.

3. For all other function, the same problem applies. You don't actually
> have anything to fix NumPy promotion rules.  You could bake your own
> cake here for numeric types, but I am not sure, you might also need NEP
> 43 in all its promotion power to pull it off.
>

This is probably the single most difficult question implementation-wise.
Note that there are only numerical dtypes (plus boolean), so dealing with
string, datetime, object or third-party dtypes is a non-issue.

4. The PR makes no attempt at handling binary operators in any way
> aside from greedily coercing the other operand.
>

Agreed. This is the same point as (3) I think - how to handle dtype
promotion is the main open question.


> 5. What happens with a mix of array-likes or even array subclasses like
> `astropy.quantity`?
>

Array-likes (e.g. list) should raise an exception, the NEP clearly says "do
not accept array_like dtypes". This is what every other array/tensor
library already does.

Array subclasses should work as expected, assuming they're valid subclasses
and not things like np.matrix. Using Mypy will help avoid writing more
subclasses that break the Liskov substitution principle. More comments in
https://numpy.org/neps/nep-0047-array-api-standard.html#the-asarray-asanyarray-pattern

Mixing two different types of arrays into a single function call should
raise an exception. A design goal is: enable writing functions
`somefunc(x1, x2)` that work for any type of array where `x1, x2` come from
the same library = so they're either the same type, or two types for which
the library itself knows how to mix them. If x1 and x2 are from different
libraries, this will raise an exception.

To be clear, it is not intended that `np.array_api.somefunc(x_cupy)` works
- this will raise an exception.

Cheers,
Ralf


>
> I don't think we have to figure out everything up-front, but I do think
> there are a few very fundamental questions still open, at least for me
> personally.
>
> Cheers,
>
> Sebastian
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210311/e4ad1ec9/attachment.html>

From ralf.gommers at gmail.com  Thu Mar 11 07:49:33 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 11 Mar 2021 13:49:33 +0100
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <01b5e7193b6e9b261befb4e62c5b94f39debf69f.camel@sipsolutions.net>
References: <CABL7CQhBcxRsba4hu2YXC66Rm-j6AFZXBy+P+9bhTZ8h3YR89w@mail.gmail.com>
 <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
 <CAKgW=6+LJpuexf6XW5Sn=xgLgaN_nayMwQFdHVheSS3kR0CpuQ@mail.gmail.com>
 <01b5e7193b6e9b261befb4e62c5b94f39debf69f.camel@sipsolutions.net>
Message-ID: <CABL7CQgQBC0vqz20P656JN8kHL8dKOEDgsfG3Ngc3GSkzxFuew@mail.gmail.com>

On Wed, Mar 10, 2021 at 11:41 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Wed, 2021-03-10 at 13:44 -0700, Aaron Meurer wrote:
> > On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg
> > <sebastian at sipsolutions.net> wrote:
>
> > >
> > > 2. `np.result_type` special cases array-scalars (the current PR),
> > > NEP
> > > 47 promises it will not.  The PR could attempt to work around that
> > > using `arr.dtype` int `result_type`, I expect there are more
> > > details to
> > > fight with there, but I am not sure.
> >
> > The idea is to work around it everywhere, so that it follows the
> > rules
> > in the spec (no array scalars, no value-based casting). I haven't
> > started it yet, though, so I don't know yet how hard it will be. If
> > it
> > ends up being too hard we could put it in the same camp as device
> > support and dlpack support where it needs some basic implementation
> > in
> > numpy itself first before we can properly do it in the array API
> > namespace.
>
> Quite frankly. If you really want to implement a minimal API, it may be
> best to just write it yourself and ditch NumPy. (Of course I currently
> doubt that the NEP 47 implementation should be minimal.)
>

I'm not really sure what to say other than that I don't think anyone will
be served by "ditching NumPy".

The goal for this "minimal" part is to provide an API that you can write
code against that will work portably across other array libraries. That
seems like a valuable goal, right? And if you want NumPy-specific things
that other libraries don't commonly (or at all) implement and are not
supported by array_api, then you don't use this API but the existing main
numpy namespace.


> About doing promotion yourself  ("promotion" as in what ufuncs do; I
> call `np.result_type` "common DType", because it is used e.g. in
> `concatenate`):
>
> Ufuncs have at least one more rule for true-division, plus there may be
> mixed float-int loops, etc.  Since the standard is very limited and you
> only have numeric dtypes that might be all though.
>
> In any case, my point is: If NumPy does strange things (and it does
> with 0-D arrays currently).  You could cook your own soup there also,
> and implement it in NumPy by using `signature=...` in the ufunc call.
>

Interesting idea.


> > > 4. Now that I looked at the above, I do not feel its reasonable to
> > > limit this functionality to numeric dtypes.  If someone uses a
> > > NumPy
> > > rational-dtype, why should a SciPy function currently implemented
> > > in
> > > pure NumPy reject that?  In other words, I think this is the point
> > > where trying to be "minimal" is counterproductive.
>

SciPy would still be free to implement *both* a portable code path and a
numpy-specific path (if that makes sense, which I doubt in many cases).
There's just no way those two code paths can be 100% common, because no
other library implements a rational dtype.

>
> > The idea of minimality is to make it so users can be sure they will
> > be
> > able to use other libraries, once they also have array API compliant
> > namespaces. A rational-dtype wouldn't ever be implemented in those
> > other libraries, because it isn't part of the standard, so if a user
> > is using those, that is a sign they are using things that aren't in
> > the array API, so they can't expect to be able to swap out their
> > dtypes. If a user wants to use something that's only in NumPy, then
> > they should just use NumPy.
> >
>
> This is not about the "user", in your scenario the end-user does use
> NumPy.  The way I understand this is not a prerequisite.  If it is, a
> lot of things will be simpler though, and most of my doubts will go
> away (but be replaced with uncertainty about the usefulness).
>

> The problem is that SciPy as the "library author" wants to to use NEP
> 47 without limiting the end-user (or the end-user even noticing!).
> The distinction between end-user and library author (someone who writes
> a function that should work with numpy, pytorch, etc.) is very
> important here and too all of these "protocol" discussions.
>

The example feels a little forced. >99% of end user code written against
libraries like SciPy uses standard numerical dtypes. Things like a rational
dtype are very niche. A rationale dtype works with most NumPy functions,
but is not at all guaranteed to work with SciPy functions - and if it does
it's accidental, untested and may break if SciPy would change its
implementation (e.g. move from pure Python + NumPy to Cython or C++).


>
> I assume that SciPy should be able to have the cake and eat it to:
>
> * Uses the limited array-api and make sure to only rely on the minimal
>   subset.
> * Not artificially limit end-users who pass in NumPy arrays.
>
> The second point can also be read as: SciPy would be able to support
> practically all current NumPy array use cases without jumping through
> any additional hoops (or well, maybe a bit of churn, but churn that is
> made easy by as of now undefined API).
>

I suspect you have things in mind that are not actually supported by SciPy
today. The rational dtype is one example, but so are ndarray subclasses.
Take masked arrays as an example - these are not supported today, except
for scipy.stats.mstats functionality - where support is intentional,
special-cases and tested.

For masked arrays as well as other arbitrary fancy subclasses, there's some
not-well-defined subset of functionality that may work today, but that is
fragile, untested and can break without warning in any release. Only
Liskov-substitutable ndarray subclasses are not fragile - those are simply
coerced to ndarray via the ubiquitous `np.asarray` pattern, and ndarrays
are returned. That must and will remain working.

This is a complex topic, and it's possible that I'm missing other use cases
you have in mind, so I thought I'd make a diagram to explain the difference
between the custom dtypes & subclasses that are supported by NumPy itself
but not by downstream libraries:
https://github.com/rgommers/numpy/blob/numpy-scipy-custom-inputs/doc/neps/_static/nep-0047-numpy-scipy-custominputs.png


>
> > > 6. Is there any provision on how to deal with mixed array-like
> > > inputs?
> > > CuPy+numpy, etc.?
> >
> > Neither of these are defined in the spec. The spec only deals with
> > staying inside of the compliant namespace. It doesn't require any
> > behavior mixing things from other namespaces. That's generally
> > considered a much harder problem, and there is the data interchange
> > protocol to deal with it
> > (
> >
> https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html
> > ).
> >
>
> OK, maybe you can get away with it, since the current proposal seems to
> be that `get_namespace()` raises on mixed input. Still seems like
> something that should probably raise an error rather than coerce to
> NumPy when calling: `nep47_array_object + dask_array`.
>

Agreed, this must raise too.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210311/39cbef40/attachment-0001.html>

From sebastian at sipsolutions.net  Thu Mar 11 12:07:42 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 11 Mar 2021 11:07:42 -0600
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <CABL7CQjGe7TPQKVFsZ=eoRyN5NHj5tZq+Hu+3R4rZm7_+3RxHQ@mail.gmail.com>
References: <CABL7CQhBcxRsba4hu2YXC66Rm-j6AFZXBy+P+9bhTZ8h3YR89w@mail.gmail.com>
 <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
 <CABL7CQjGe7TPQKVFsZ=eoRyN5NHj5tZq+Hu+3R4rZm7_+3RxHQ@mail.gmail.com>
Message-ID: <3ba55e0fe50da814b486a73855da35770e50303b.camel@sipsolutions.net>

On Thu, 2021-03-11 at 12:37 +0100, Ralf Gommers wrote:
> On Wed, Mar 10, 2021 at 6:41 PM Sebastian Berg < 
> sebastian at sipsolutions.net>
> wrote:
> 
> > Top Posting, to discuss post specific questions about NEP 47 and
> > partially the start on implementing it in:
> > 
> > ??? https://github.com/numpy/numpy/pull/18585
> > 
> > There are probably many more that will crop up. But for me, each of
> > these is a pretty major difficulty without a clear answer as of
> > now.
> > 
> 
> All great questions, that Sebastian. Let me reply to the questions
> that
> Aaron didn't reply to inline below.
> 

To be clear, I do not expect complete answers to these questions right
now.  (Although being unsure about some of them does make me slightly
reluctant to merge the work-in-progress into NumPy proper as opposed to
a separate repo.)

Also, yes, most/all questions are hopefully are just trivialities to
check of (or no more than seeds for thought).  Or even just a starting
point for making NEP 47's "Usage and Impact" section more complete
including them as either "example usage patterns" or "limitations".


My second takeaway from the questions is that I have doubts the
"minimal" version will pan out, it feels like many of the questions
might disappear if you drop that part. So, from my current thinking,
the minimal implementation may not be a good "NEP 47" implementation.

That does _not_ mean that I think you should pause and reconsider or
even worry about pleasing me with good answers!  Just continue under
whatever assumption you prefer and if it turns out that "minimal" won't
work for NEP 47: no harm done!  We need a "minimal implementation" in
any case.

Cheers,

Sebastian


[1] If SciPy needs an additional NumPy code path to keep support
`object` arrays or other dtypes ? right now even complex ?, then the
reader needs to be aware of that to make a decision if NEP 47 will
actually help for their library.
Will AstroPy have to reimplement `astropy.units.Quantity` to be
"standard conform" (is that even possible!?) before it can easily adopt
it any of its API that currently works with `astropy.units.Quantity`?


> 
> > 1. I still need clarity how a library is supposed to use this
> > namespace
> > when the user passes in a NumPy array (mentioned before).? The user
> > must get back a NumPy array after all.? Maybe that is just a
> > decorator,
> > but it seems important.
> > 
> 
> I agree that it will be a common pattern that libraries will accept
> all
> standard-compliant array types plus numpy.ndarray. And the output
> array
> type should match the input type. In Aaron's implementation the new
> array
> object has a numpy.ndarray as private attribute, so that's the
> instance
> that should be returned. A decorator seems like a sensible way to
> handle
> that. Or a simple utility function, something like `return
> correct_arraytype(out)`.
> 
> Either way, that pattern should be added to NEP 47. I don't see a
> fundamental problem here, we just need to find the nicest UX for it.
> 
> 3. For all other function, the same problem applies. You don't
> actually
> > have anything to fix NumPy promotion rules.? You could bake your
> > own
> > cake here for numeric types, but I am not sure, you might also need
> > NEP
> > 43 in all its promotion power to pull it off.
> > 
> 
> This is probably the single most difficult question implementation-
> wise.
> Note that there are only numerical dtypes (plus boolean), so dealing
> with
> string, datetime, object or third-party dtypes is a non-issue.
> 
> 4. The PR makes no attempt at handling binary operators in any way
> > aside from greedily coercing the other operand.
> > 
> 
> Agreed. This is the same point as (3) I think - how to handle dtype
> promotion is the main open question.
>
> 
> > 5. What happens with a mix of array-likes or even array subclasses
> > like
> > `astropy.quantity`?
> > 
> 
> Array-likes (e.g. list) should raise an exception, the NEP clearly
> says "do
> not accept array_like dtypes". This is what every other array/tensor
> library already does.
> 
> Array subclasses should work as expected, assuming they're valid
> subclasses
> and not things like np.matrix. Using Mypy will help avoid writing
> more
> subclasses that break the Liskov substitution principle. More
> comments in
> https://numpy.org/neps/nep-0047-array-api-standard.html#the-asarray-asanyarray-pattern
> 
> Mixing two different types of arrays into a single function call
> should
> raise an exception. A design goal is: enable writing functions
> `somefunc(x1, x2)` that work for any type of array where `x1, x2`
> come from
> the same library = so they're either the same type, or two types for
> which
> the library itself knows how to mix them. If x1 and x2 are from
> different
> libraries, this will raise an exception.
> 
> To be clear, it is not intended that `np.array_api.somefunc(x_cupy)`
> works
> - this will raise an exception.
> 
> Cheers,
> Ralf
> 
> 
> 
> > 
> > I don't think we have to figure out everything up-front, but I do
> > think
> > there are a few very fundamental questions still open, at least for
> > me
> > personally.
> > 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210311/62776a42/attachment.sig>

From pierre.augier at univ-grenoble-alpes.fr  Fri Mar 12 15:36:20 2021
From: pierre.augier at univ-grenoble-alpes.fr (PIERRE AUGIER)
Date: Fri, 12 Mar 2021 21:36:20 +0100 (CET)
Subject: [Numpy-discussion] Looking for a difference between Numpy 0.19.5
 and 0.20 explaining a perf regression with Pythran
Message-ID: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>

Hi,

I'm looking for a difference between Numpy 0.19.5 and 0.20 which could explain a performance regression (~15 %) with Pythran.

I observe this regression with the script https://github.com/paugier/nbabel/blob/master/py/bench.py

Pythran reimplements Numpy so it is not about Numpy code for computation. However, Pythran of course uses the native array contained in a Numpy array. I'm quite sure that something has changed between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?) since I don't get the same performance with Numpy 0.20. I checked that the values in the arrays are the same and that the flags characterizing the arrays are also the same.

Good news, I'm now able to obtain the performance difference just with Numpy 0.19.5. In this code, I load the data with Pandas and need to prepare contiguous Numpy arrays to give them to Pythran. With Numpy 0.19.5, if I use np.copy I get better performance that with np.ascontiguousarray. With Numpy 0.20, both functions create array giving the same performance with Pythran (again, less good that with Numpy 0.19.5).

Note that this code is very efficient (more that 100 times faster than using Numpy), so I guess that things like alignment or memory location can lead to such difference.

More details in this issue https://github.com/serge-sans-paille/pythran/issues/1735

Any help to understand what has changed would be greatly appreciated!

Cheers,
Pierre

From melissawm at gmail.com  Fri Mar 12 16:27:23 2021
From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=)
Date: Fri, 12 Mar 2021 18:27:23 -0300
Subject: [Numpy-discussion] Documentation Team meeting - Monday March 15
 (Beware of Daylight Saving Time!)
In-Reply-To: <CAC7J6VYzL4+hxf7=1NfxHx7tVyfQci2iNNr7=gCRgJkMVXuiyQ@mail.gmail.com>
References: <CAC7J6Vbkch+=wQcEbJgELAphM2d2552Ywr2txrbinUR-khTR=g@mail.gmail.com>
 <CAC7J6VY9k_EGL-adWpeHLvFOEG6EBy3wauhD8cZ5vmL-JDxA0Q@mail.gmail.com>
 <CAC7J6VaNwZDbtt4FP6mX_-+iD_d=_4VVcjdGiaJj=q7eWCHewA@mail.gmail.com>
 <CAC7J6VZ2D8q0s7+rpKv7HYh9oBqwOQnUP_HjwKCjMtAnfi60Og@mail.gmail.com>
 <CAC7J6VY=vgKXySL1Euy30ZNzqkqvcr4Za1L04JZv-GCk87c1AQ@mail.gmail.com>
 <CAC7J6Vbq0vtd5JH6MXvcSj4A4aZjg+x=QdHvPyQGeGcmpp8JYg@mail.gmail.com>
 <CAC7J6VZ_+uh967zGc7qA6+KWKsP=gZ8bZecrJ9Hq9oG0JTzsOg@mail.gmail.com>
 <CAC7J6VZkder9wiEmE+Cm5Lxg0zt0t_a-KwfuvcCXZG91Wk+7kA@mail.gmail.com>
 <CAC7J6VYyt4c2qcp92iE8_BMF70cUC3-ZX5E5mqNa0zVAFryvmQ@mail.gmail.com>
 <CAC7J6VZq1DLDk--F+S6F50DHMMXOaWo6cbpSau7APHe22X_c9w@mail.gmail.com>
 <CAC7J6VaegMEY9dvMeRSe2YY=AUEJYkXNNF-KTy89zyE6LgHo5A@mail.gmail.com>
 <CAC7J6VY4_X-MyXajy0KryML03KoryKR9n5dsf5bupgv7noukUg@mail.gmail.com>
 <CAC7J6Va3n9Hdym+RmscOkDyf1ukH+DDtCv7N9D8y0cX1a+uC-A@mail.gmail.com>
 <CAC7J6VZX57n+_nWL-dKe4YzqY=Nc1=LYRXkNv8+JcP=q3Vyqog@mail.gmail.com>
 <CAC7J6Vb5uGp-gEMudbbiWSBKo1gyMsq03M4hz9mnNYir9GcqXw@mail.gmail.com>
 <CAC7J6Va3g7Pqrs=Hxtgg6GKHhZwErz4PPiKLw0YL-R2HUkWprQ@mail.gmail.com>
 <CAC7J6Vay+j5SsbXsJkCh2K5aeR6a8HkrkFpzDWoPkOfH=a9OkA@mail.gmail.com>
 <CAC7J6VYzL4+hxf7=1NfxHx7tVyfQci2iNNr7=gCRgJkMVXuiyQ@mail.gmail.com>
Message-ID: <CAC7J6VaBg=C-J4TgiUOSeHmE5qRM0O_Fzxsfyyozgczqsihv+g@mail.gmail.com>

Hi all!

Our next Documentation Team meeting will be on *Monday, March 15* at ***4PM
UTC*** (This has probably changed for you if you have recently gone through
a DST change).

All are welcome - you don't need to already be a contributor to join. If
you have questions or are curious about what we're doing, we'll be happy to
meet you!

If you wish to join on Zoom, use this link:

https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success

Here's the permanent hackmd document with the meeting notes (still being
updated in the next few days!):

https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg
<https://www.google.com/url?q=https%3A%2F%2Fhackmd.io%2FoB_boakvRqKR-_2jRV-Qjg&sa=D&usd=2&usg=AFQjCNGIOzVwlfDFd6YAgBwVUjmjQKWRSw>

Hope to see you around!

** You can click this link to get the correct time at your timezone:
https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210315T16&p1=1440&ah=1

- Melissa
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210312/33914f3a/attachment.html>

From sebastian at sipsolutions.net  Fri Mar 12 16:50:24 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 12 Mar 2021 15:50:24 -0600
Subject: [Numpy-discussion] Looking for a difference between Numpy
 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
Message-ID: <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>

On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> Hi,
> 
> I'm looking for a difference between Numpy 0.19.5 and 0.20 which
> could explain a performance regression (~15 %) with Pythran.
> 
> I observe this regression with the script 
> https://github.com/paugier/nbabel/blob/master/py/bench.py
> 
> Pythran reimplements Numpy so it is not about Numpy code for
> computation. However, Pythran of course uses the native array
> contained in a Numpy array. I'm quite sure that something has changed
> between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?)
> since I don't get the same performance with Numpy 0.20. I checked
> that the values in the arrays are the same and that the flags
> characterizing the arrays are also the same.
> 
> Good news, I'm now able to obtain the performance difference just
> with Numpy 0.19.5. In this code, I load the data with Pandas and need
> to prepare contiguous Numpy arrays to give them to Pythran. With
> Numpy 0.19.5, if I use np.copy I get better performance that with
> np.ascontiguousarray. With Numpy 0.20, both functions create array
> giving the same performance with Pythran (again, less good that with
> Numpy 0.19.5).
> 
> Note that this code is very efficient (more that 100 times faster
> than using Numpy), so I guess that things like alignment or memory
> location can lead to such difference.
> 
> More details in this issue 
> https://github.com/serge-sans-paille/pythran/issues/1735
> 
> Any help to understand what has changed would be greatly appreciated!
> 

If you want to really dig into this, it would be good to do profiling
to find out at where the differences are.

Without that, I don't have much appetite to investigate personally. The
reason is that fluctuations of ~30% (or even much more) when running
the NumPy benchmarks are very common.

I am not aware of an immediate change in NumPy, especially since you
are talking pythran, and only the memory space or the interface code
should matter.
As to the interface code... I would expect it to be quite a bit faster,
not slower.
There was no change around data allocation, so at best what you are
seeing is a different pattern in how the "small array cache" ends up
being used.


Unfortunately, getting stable benchmarks that reflect code changes
exactly is tough...  Here is a nice blog post from Victor Stinner where
he had to go as far as using "profile guided compilation" to avoid
fluctuations:

https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html

I somewhat hope that this is also the reason for the huge fluctuations
we see in the NumPy benchmarks due to absolutely unrelated code
changes.
But I did not have the energy to try it (and a probably fixed bug in
gcc makes it a bit harder right now).

Cheers,

Sebastian


> Cheers,
> Pierre
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210312/888edf57/attachment.sig>

From pierre.augier at univ-grenoble-alpes.fr  Fri Mar 12 18:33:42 2021
From: pierre.augier at univ-grenoble-alpes.fr (PIERRE AUGIER)
Date: Sat, 13 Mar 2021 00:33:42 +0100 (CET)
Subject: [Numpy-discussion] Looking for a difference between Numpy
 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
Message-ID: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>

Hi,

I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy --force-reinstall` and I can reproduce the regression.

Good news, I was able to reproduce the difference with only Numpy 1.20.1. 

Arrays prepared with (`df` is a Pandas dataframe)

arr = df.values.copy()

or 

arr = np.ascontiguousarray(df.values)

lead to "slow" execution while arrays prepared with

arr = np.copy(df.values)

lead to faster execution.

arr.copy() or np.copy(arr) do not give the same result, with arr obtained from a Pandas dataframe with arr = df.values. It's strange because type(df.values) gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to give exactly the same result.

Note that I think I'm doing quite serious and reproducible benchmarks. I also checked that this regression is reproducible on another computer.

Cheers,

Pierre

----- Mail original -----
> De: "Sebastian Berg" <sebastian at sipsolutions.net>
> ?: "numpy-discussion" <numpy-discussion at python.org>
> Envoy?: Vendredi 12 Mars 2021 22:50:24
> Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with
> Pythran

> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
>> Hi,
>> 
>> I'm looking for a difference between Numpy 0.19.5 and 0.20 which
>> could explain a performance regression (~15 %) with Pythran.
>> 
>> I observe this regression with the script
>> https://github.com/paugier/nbabel/blob/master/py/bench.py
>> 
>> Pythran reimplements Numpy so it is not about Numpy code for
>> computation. However, Pythran of course uses the native array
>> contained in a Numpy array. I'm quite sure that something has changed
>> between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?)
>> since I don't get the same performance with Numpy 0.20. I checked
>> that the values in the arrays are the same and that the flags
>> characterizing the arrays are also the same.
>> 
>> Good news, I'm now able to obtain the performance difference just
>> with Numpy 0.19.5. In this code, I load the data with Pandas and need
>> to prepare contiguous Numpy arrays to give them to Pythran. With
>> Numpy 0.19.5, if I use np.copy I get better performance that with
>> np.ascontiguousarray. With Numpy 0.20, both functions create array
>> giving the same performance with Pythran (again, less good that with
>> Numpy 0.19.5).
>> 
>> Note that this code is very efficient (more that 100 times faster
>> than using Numpy), so I guess that things like alignment or memory
>> location can lead to such difference.
>> 
>> More details in this issue
>> https://github.com/serge-sans-paille/pythran/issues/1735
>> 
>> Any help to understand what has changed would be greatly appreciated!
>> 
> 
> If you want to really dig into this, it would be good to do profiling
> to find out at where the differences are.
> 
> Without that, I don't have much appetite to investigate personally. The
> reason is that fluctuations of ~30% (or even much more) when running
> the NumPy benchmarks are very common.
> 
> I am not aware of an immediate change in NumPy, especially since you
> are talking pythran, and only the memory space or the interface code
> should matter.
> As to the interface code... I would expect it to be quite a bit faster,
> not slower.
> There was no change around data allocation, so at best what you are
> seeing is a different pattern in how the "small array cache" ends up
> being used.
> 
> 
> Unfortunately, getting stable benchmarks that reflect code changes
> exactly is tough...  Here is a nice blog post from Victor Stinner where
> he had to go as far as using "profile guided compilation" to avoid
> fluctuations:
> 
> https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> 
> I somewhat hope that this is also the reason for the huge fluctuations
> we see in the NumPy benchmarks due to absolutely unrelated code
> changes.
> But I did not have the energy to try it (and a probably fixed bug in
> gcc makes it a bit harder right now).
> 
> Cheers,
> 
> Sebastian
> 
> 
> 
> 
>> Cheers,
>> Pierre
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>> 
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From efiring at hawaii.edu  Fri Mar 12 18:53:23 2021
From: efiring at hawaii.edu (Eric Firing)
Date: Fri, 12 Mar 2021 13:53:23 -1000
Subject: [Numpy-discussion] Looking for a difference between Numpy
 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
 <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
Message-ID: <9ce63b83-8e61-e415-aef6-e5385e3a0649@hawaii.edu>

On 2021/03/12 1:33 PM, PIERRE AUGIER wrote:
> arr.copy() or np.copy(arr) do not give the same result, with arr obtained from a Pandas dataframe with arr = df.values. It's strange because type(df.values) gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to give exactly the same result.

According to the docstrings for numpy.copy and arr.copy, the function 
and the method have different defaults for the memory layout.  np.copy() 
tries to maintain the order of the original while arr.copy() defaults to 
C order.

Eric

From diagonaldevice at gmail.com  Fri Mar 12 19:24:08 2021
From: diagonaldevice at gmail.com (Michael Lamparski)
Date: Fri, 12 Mar 2021 19:24:08 -0500
Subject: [Numpy-discussion] Programmatically contracting multiple tensors
Message-ID: <CAGfQJGHxGgh+XgnoSif6sUXoxfJEL35DQQL-6itipD5vm2QeDw@mail.gmail.com>

Greetings,

I have something in my code where I can receive an array M of unknown
dimensionality and a list of "labels" for each axis.  E.g. perhaps I might
get an array of shape (2, 47, 3, 47, 3) with labels ['spin', 'atom',
'coord', 'atom', 'coord'].

For every axis that is labeled "coord", I want to multiply in some rotation
matrix R.  So, for the above example, this could be done with the following
handwritten line:

return np.einsum('Cc,Ee,abcde->abCdE', R, R, M)

But since I want to do this programmatically, I find myself in the awkward
situation of having to construct this string (and e.g. having to
arbitrarily limit the number of axes to 26 or something like that).  Is
there a more idiomatic way to do this that would let me supply integer
labels for summation indices?  Or should I just bite the bullet and start
generating strings?

---
Michael
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210312/da7e99e0/attachment.html>

From wieser.eric+numpy at gmail.com  Fri Mar 12 19:32:01 2021
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Sat, 13 Mar 2021 00:32:01 +0000
Subject: [Numpy-discussion] Programmatically contracting multiple tensors
In-Reply-To: <CAGfQJGHxGgh+XgnoSif6sUXoxfJEL35DQQL-6itipD5vm2QeDw@mail.gmail.com>
References: <CAGfQJGHxGgh+XgnoSif6sUXoxfJEL35DQQL-6itipD5vm2QeDw@mail.gmail.com>
Message-ID: <CAL1kJvCu7feKE4VvjCuB+A7_a8VRTaMarAo37CNdxwjB=Jf_EA@mail.gmail.com>

Einsum has a secret integer argument format that appears in the Examples
section of the `np.einsum` docs, but appears not to be mentioned at all in
the parameter listing.

Eric

On Sat, 13 Mar 2021 at 00:25, Michael Lamparski <diagonaldevice at gmail.com>
wrote:

> Greetings,
>
> I have something in my code where I can receive an array M of unknown
> dimensionality and a list of "labels" for each axis.  E.g. perhaps I might
> get an array of shape (2, 47, 3, 47, 3) with labels ['spin', 'atom',
> 'coord', 'atom', 'coord'].
>
> For every axis that is labeled "coord", I want to multiply in some
> rotation matrix R.  So, for the above example, this could be done with the
> following handwritten line:
>
> return np.einsum('Cc,Ee,abcde->abCdE', R, R, M)
>
> But since I want to do this programmatically, I find myself in the awkward
> situation of having to construct this string (and e.g. having to
> arbitrarily limit the number of axes to 26 or something like that).  Is
> there a more idiomatic way to do this that would let me supply integer
> labels for summation indices?  Or should I just bite the bullet and start
> generating strings?
>
> ---
> Michael
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210313/543d6200/attachment.html>

From deak.andris at gmail.com  Fri Mar 12 20:09:02 2021
From: deak.andris at gmail.com (Andras Deak)
Date: Sat, 13 Mar 2021 02:09:02 +0100
Subject: [Numpy-discussion] Programmatically contracting multiple tensors
In-Reply-To: <CAL1kJvCu7feKE4VvjCuB+A7_a8VRTaMarAo37CNdxwjB=Jf_EA@mail.gmail.com>
References: <CAGfQJGHxGgh+XgnoSif6sUXoxfJEL35DQQL-6itipD5vm2QeDw@mail.gmail.com>
 <CAL1kJvCu7feKE4VvjCuB+A7_a8VRTaMarAo37CNdxwjB=Jf_EA@mail.gmail.com>
Message-ID: <CAMEWA4O13436d+uJm3NS01vpMA+POQ88A+KtREyr90CsKMxwgA@mail.gmail.com>

On Sat, Mar 13, 2021 at 1:32 AM Eric Wieser <wieser.eric+numpy at gmail.com> wrote:
>
> Einsum has a secret integer argument format that appears in the Examples section of the `np.einsum` docs, but appears not to be mentioned at all in the parameter listing.

It's mentioned (albeit somewhat cryptically) sooner in the Notes:

"einsum also provides an alternative way to provide the subscripts and
operands as einsum(op0, sublist0, op1, sublist1, ..., [sublistout]).
If the output shape is not provided in this format einsum will be
calculated in implicit mode, otherwise it will be performed
explicitly. The examples below have corresponding einsum calls with
the two parameter methods.

New in version 1.10.0."

Not that this helps much, because I definitely wouldn't understand
this API without the examples.
But I'm not sure _where_ this could be highlighted among the
parameters; after all this is all covered by the *operands parameter.

Andr?s


> Eric
>
> On Sat, 13 Mar 2021 at 00:25, Michael Lamparski <diagonaldevice at gmail.com> wrote:
>>
>> Greetings,
>>
>> I have something in my code where I can receive an array M of unknown dimensionality and a list of "labels" for each axis.  E.g. perhaps I might get an array of shape (2, 47, 3, 47, 3) with labels ['spin', 'atom', 'coord', 'atom', 'coord'].
>>
>> For every axis that is labeled "coord", I want to multiply in some rotation matrix R.  So, for the above example, this could be done with the following handwritten line:
>>
>> return np.einsum('Cc,Ee,abcde->abCdE', R, R, M)
>>
>> But since I want to do this programmatically, I find myself in the awkward situation of having to construct this string (and e.g. having to arbitrarily limit the number of axes to 26 or something like that).  Is there a more idiomatic way to do this that would let me supply integer labels for summation indices?  Or should I just bite the bullet and start generating strings?
>>
>> ---
>> Michael
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From sebastian at sipsolutions.net  Fri Mar 12 20:24:33 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 12 Mar 2021 19:24:33 -0600
Subject: [Numpy-discussion] Looking for a difference between Numpy
 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
 <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
Message-ID: <f446a327b684d25c26357fdb39d1edc56eec4c77.camel@sipsolutions.net>

On Sat, 2021-03-13 at 00:33 +0100, PIERRE AUGIER wrote:
> Hi,
> 
> I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary
> numpy --force-reinstall` and I can reproduce the regression.
> 
> Good news, I was able to reproduce the difference with only Numpy
> 1.20.1. 
> 
> Arrays prepared with (`df` is a Pandas dataframe)
> 
> arr = df.values.copy()
> 
> or 
> 
> arr = np.ascontiguousarray(df.values)
> 
> lead to "slow" execution while arrays prepared with
> 
> arr = np.copy(df.values)
> 
> lead to faster execution.
> 
> arr.copy() or np.copy(arr) do not give the same result, with arr
> obtained from a Pandas dataframe with arr = df.values. It's strange
> because type(df.values) gives <class 'numpy.ndarray'> so I would
> expect arr.copy() and np.copy(arr) to give exactly the same result.

The only thing that can change would be the arrays flags and
`arr.strides`, but they should not have cahnged.  And there is no
change in NumPy that I can even remotely think of.  Array data is just
allocated with `malloc`.

That is: as I understand it, you are *not* timing `np.copy` or
`np.ascontiguouscopy` itself, but just operating on the array returned.
NumPy only ever uses `malloc` for allocating array content. 

> 
> Note that I think I'm doing quite serious and reproducible
> benchmarks. I also checked that this regression is reproducible on
> another computer.

I absolutely trust the benchmark results. I was hoping you might also
be running a profiler (as in analyze the running program) to find out
where the difference originate on the C side.  That would allow to say
with certainty either what changed or that there was no actual related
code change.

E.g. I have seen huge speed differences in the same `memcpy` or similar
calls, due to whatever reasons (maybe due to compiler changes, or due
to address space changes... or maybe the former causing the latter, I
don't know.).

Cheers,

Sebastian


> 
> Cheers,
> 
> Pierre
> 
> ----- Mail original -----
> > De: "Sebastian Berg" <sebastian at sipsolutions.net>
> > ?: "numpy-discussion" <numpy-discussion at python.org>
> > Envoy?: Vendredi 12 Mars 2021 22:50:24
> > Objet: Re: [Numpy-discussion] Looking for a difference between
> > Numpy 0.19.5 and 0.20 explaining a perf regression with
> > Pythran
> 
> > On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> > > Hi,
> > > 
> > > I'm looking for a difference between Numpy 0.19.5 and 0.20 which
> > > could explain a performance regression (~15 %) with Pythran.
> > > 
> > > I observe this regression with the script
> > > https://github.com/paugier/nbabel/blob/master/py/bench.py
> > > 
> > > Pythran reimplements Numpy so it is not about Numpy code for
> > > computation. However, Pythran of course uses the native array
> > > contained in a Numpy array. I'm quite sure that something has
> > > changed
> > > between Numpy 0.19.5 and 0.20 (or between the corresponding
> > > wheels?)
> > > since I don't get the same performance with Numpy 0.20. I checked
> > > that the values in the arrays are the same and that the flags
> > > characterizing the arrays are also the same.
> > > 
> > > Good news, I'm now able to obtain the performance difference just
> > > with Numpy 0.19.5. In this code, I load the data with Pandas and
> > > need
> > > to prepare contiguous Numpy arrays to give them to Pythran. With
> > > Numpy 0.19.5, if I use np.copy I get better performance that with
> > > np.ascontiguousarray. With Numpy 0.20, both functions create
> > > array
> > > giving the same performance with Pythran (again, less good that
> > > with
> > > Numpy 0.19.5).
> > > 
> > > Note that this code is very efficient (more that 100 times faster
> > > than using Numpy), so I guess that things like alignment or
> > > memory
> > > location can lead to such difference.
> > > 
> > > More details in this issue
> > > https://github.com/serge-sans-paille/pythran/issues/1735
> > > 
> > > Any help to understand what has changed would be greatly
> > > appreciated!
> > > 
> > 
> > If you want to really dig into this, it would be good to do
> > profiling
> > to find out at where the differences are.
> > 
> > Without that, I don't have much appetite to investigate personally.
> > The
> > reason is that fluctuations of ~30% (or even much more) when
> > running
> > the NumPy benchmarks are very common.
> > 
> > I am not aware of an immediate change in NumPy, especially since
> > you
> > are talking pythran, and only the memory space or the interface
> > code
> > should matter.
> > As to the interface code... I would expect it to be quite a bit
> > faster,
> > not slower.
> > There was no change around data allocation, so at best what you are
> > seeing is a different pattern in how the "small array cache" ends
> > up
> > being used.
> > 
> > 
> > Unfortunately, getting stable benchmarks that reflect code changes
> > exactly is tough...? Here is a nice blog post from Victor Stinner
> > where
> > he had to go as far as using "profile guided compilation" to avoid
> > fluctuations:
> > 
> >  
> > https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> > 
> > I somewhat hope that this is also the reason for the huge
> > fluctuations
> > we see in the NumPy benchmarks due to absolutely unrelated code
> > changes.
> > But I did not have the energy to try it (and a probably fixed bug
> > in
> > gcc makes it a bit harder right now).
> > 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > 
> > 
> > > Cheers,
> > > Pierre
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210312/284d9a42/attachment-0001.sig>

From klark--kent at yandex.ru  Sat Mar 13 15:55:17 2021
From: klark--kent at yandex.ru (klark--kent at yandex.ru)
Date: Sat, 13 Mar 2021 23:55:17 +0300
Subject: [Numpy-discussion] size of arrays
In-Reply-To: <f446a327b684d25c26357fdb39d1edc56eec4c77.camel@sipsolutions.net>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
 <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <f446a327b684d25c26357fdb39d1edc56eec4c77.camel@sipsolutions.net>
Message-ID: <526711615664631@mail.yandex.ru>

An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210313/a0863768/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 23474 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210313/a0863768/attachment-0001.png>

From toddrjen at gmail.com  Sat Mar 13 16:05:11 2021
From: toddrjen at gmail.com (Todd)
Date: Sat, 13 Mar 2021 16:05:11 -0500
Subject: [Numpy-discussion] size of arrays
In-Reply-To: <526711615664631@mail.yandex.ru>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
 <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <f446a327b684d25c26357fdb39d1edc56eec4c77.camel@sipsolutions.net>
 <526711615664631@mail.yandex.ru>
Message-ID: <CAFpSVpJw3X2UsAadbOqPK5K-CGQg5MAnN0Q=u-pdFWhc6+MGgQ@mail.gmail.com>

Ideally float64 uses 64 bits for each number while float16 uses 16 bits.
64/16=4.  However, there is some additional overhead.  This overhead makes
up a large portion of small arrays, but becomes negligible as the array
gets bigger.

On Sat, Mar 13, 2021, 16:01 <klark--kent at yandex.ru> wrote:

> Dear colleagues!
>
> Size of np.float16(1) is 26
> Size of np.float64(1) is 32
> 32 / 26 = 1.23
>
> Since memory is limited I have a question after this code:
>
>    import numpy as np
>    import sys
>
>    a1 = np.ones(1, dtype='float16')
>    b1 = np.ones(1, dtype='float64')
>    div_1 = sys.getsizeof(b1) / sys.getsizeof(a1)
>    # div_1 = 1.06
>
>    a2 = np.ones(10, dtype='float16')
>    b2 = np.ones(10, dtype='float64')
>    div_2 = sys.getsizeof(b2) / sys.getsizeof(a2)
>    # div_2 = 1.51
>
>    a3 = np.ones(100, dtype='float16')
>    b3 = np.ones(100, dtype='float64')
>    div_3 = sys.getsizeof(b3) / sys.getsizeof(a3)
>    # div_3 = 3.0
> Size of np.float64 numpy arrays is four times more than for np.float16.
> Is it possible to minimize the difference close to 1.23?
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210313/88edb3c8/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 23474 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210313/88edb3c8/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 23474 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210313/88edb3c8/attachment-0003.png>

From robert.kern at gmail.com  Sat Mar 13 16:15:15 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 13 Mar 2021 16:15:15 -0500
Subject: [Numpy-discussion] size of arrays
In-Reply-To: <526711615664631@mail.yandex.ru>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
 <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <f446a327b684d25c26357fdb39d1edc56eec4c77.camel@sipsolutions.net>
 <526711615664631@mail.yandex.ru>
Message-ID: <CAF6FJisxP64fMGPK8vW4gmbw9BTENGOHRHmS0Zz5L=MkhC9vGg@mail.gmail.com>

On Sat, Mar 13, 2021 at 4:02 PM <klark--kent at yandex.ru> wrote:

> Dear colleagues!
>
> Size of np.float16(1) is 26
> Size of np.float64(1) is 32
> 32 / 26 = 1.23
>

Note that `sys.getsizeof()` is returning the size of the given Python
object in bytes. `np.float16(1)` and `np.float64(1)` are so-called "numpy
scalar objects" that wrap up the raw `float16` (2 bytes) and `float64` (8
bytes) values with the necessary information to make them Python objects.
The extra 24 bytes for each is _not_ present for each value when you have
`float16` and `float64` arrays of larger lengths. There is still some
overhead to make the array of numbers into a Python object, but this does
not increase with the number of array elements. This is what you are seeing
below when you compute the sizes of the Python objects that are the arrays.
The fixed overhead does not increase when you increase the sizes of the
arrays. They eventually approach the ideal ratio of 4: `float64` values
take up 4 times as many bytes as `float16` values, as the names suggest.
The ratio of 1.23 that you get from comparing the scalar objects reflects
that the overhead for making a single value into a Python object takes up
significantly more memory than the actual single number itself.

-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210313/9e5b86c5/attachment.html>

From klark--kent at yandex.ru  Sat Mar 13 16:17:08 2021
From: klark--kent at yandex.ru (klark--kent at yandex.ru)
Date: Sun, 14 Mar 2021 00:17:08 +0300
Subject: [Numpy-discussion] size of arrays
In-Reply-To: <CAFpSVpJw3X2UsAadbOqPK5K-CGQg5MAnN0Q=u-pdFWhc6+MGgQ@mail.gmail.com>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
 <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <f446a327b684d25c26357fdb39d1edc56eec4c77.camel@sipsolutions.net>
 <526711615664631@mail.yandex.ru>
 <CAFpSVpJw3X2UsAadbOqPK5K-CGQg5MAnN0Q=u-pdFWhc6+MGgQ@mail.gmail.com>
Message-ID: <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>

An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210314/793bbfb4/attachment.html>

From robert.kern at gmail.com  Sat Mar 13 16:21:30 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 13 Mar 2021 16:21:30 -0500
Subject: [Numpy-discussion] size of arrays
In-Reply-To: <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
 <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <f446a327b684d25c26357fdb39d1edc56eec4c77.camel@sipsolutions.net>
 <526711615664631@mail.yandex.ru>
 <CAFpSVpJw3X2UsAadbOqPK5K-CGQg5MAnN0Q=u-pdFWhc6+MGgQ@mail.gmail.com>
 <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>
Message-ID: <CAF6FJivKs0eG5sw2jfS7PXpHx6g+nSF0k0u3SD-EGbf4PjVjSg@mail.gmail.com>

On Sat, Mar 13, 2021 at 4:18 PM <klark--kent at yandex.ru> wrote:

> So is it right that 100 arrays of one element is smaller than one array
> with size of 100 elements?
>

No, typically the opposite is true.

-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210313/28ae6670/attachment.html>

From toddrjen at gmail.com  Sat Mar 13 18:27:23 2021
From: toddrjen at gmail.com (Todd)
Date: Sat, 13 Mar 2021 18:27:23 -0500
Subject: [Numpy-discussion] size of arrays
In-Reply-To: <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net>
 <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <f446a327b684d25c26357fdb39d1edc56eec4c77.camel@sipsolutions.net>
 <526711615664631@mail.yandex.ru>
 <CAFpSVpJw3X2UsAadbOqPK5K-CGQg5MAnN0Q=u-pdFWhc6+MGgQ@mail.gmail.com>
 <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>
Message-ID: <CAFpSVpKFDA0yucSrkLmcaeU1BqN2=2mC0o0xauAvcANx_qHh4Q@mail.gmail.com>

No, because the array of 100 elements will only have the overhead once,
while the 100 arrays will each have the overhead repeated.


Think about the overhead like a book cover on a book. It takes additional
space, but provides storage for the book, information to help you find it,
etc. Each book only needs one cover. So a single 100 page book only needs
one cover, while a hundred 1 page books needs 100 covers. Also, as the book
gets more pages the cover takes a smaller portion of the total size of the
book.

On Sat, Mar 13, 2021, 16:17 <klark--kent at yandex.ru> wrote:

> So is it right that 100 arrays of one element is smaller than one array
> with size of 100 elements?
>
> 14.03.2021, 00:06, "Todd" <toddrjen at gmail.com>:
>
> Ideally float64 uses 64 bits for each number while float16 uses 16 bits.
> 64/16=4.  However, there is some additional overhead.  This overhead makes
> up a large portion of small arrays, but becomes negligible as the array
> gets bigger.
>
> On Sat, Mar 13, 2021, 16:01 <klark--kent at yandex.ru> wrote:
>
> Dear colleagues!
>
> Size of np.float16(1) is 26
> Size of np.float64(1) is 32
> 32 / 26 = 1.23
>
> Since memory is limited I have a question after this code:
>
>    import numpy as np
>    import sys
>
>    a1 = np.ones(1, dtype='float16')
>    b1 = np.ones(1, dtype='float64')
>    div_1 = sys.getsizeof(b1) / sys.getsizeof(a1)
>    # div_1 = 1.06
>
>    a2 = np.ones(10, dtype='float16')
>    b2 = np.ones(10, dtype='float64')
>    div_2 = sys.getsizeof(b2) / sys.getsizeof(a2)
>    # div_2 = 1.51
>
>    a3 = np.ones(100, dtype='float16')
>    b3 = np.ones(100, dtype='float64')
>    div_3 = sys.getsizeof(b3) / sys.getsizeof(a3)
>    # div_3 = 3.0
> Size of np.float64 numpy arrays is four times more than for np.float16.
> Is it possible to minimize the difference close to 1.23?
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210313/0b6ada38/attachment-0001.html>

From dan_patterson at outlook.com  Sat Mar 13 23:12:58 2021
From: dan_patterson at outlook.com (dan_patterson)
Date: Sat, 13 Mar 2021 21:12:58 -0700 (MST)
Subject: [Numpy-discussion] Numpy 1.20.1 availability
Message-ID: <1615695178675-0.post@n7.nabble.com>

Any idea why the most recent version isn't available on the main anaconda
channel.  conda-forge and building are not options for a number of reasons.  
I posted a package request there but double digit days have gone by it just
got a thumbs up and package-request tag
https://github.com/ContinuumIO/anaconda-issues/issues/12309
I realize it could be the "times" or maybe no one is aware of its absence.


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From jni at fastmail.com  Sun Mar 14 01:15:39 2021
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Sun, 14 Mar 2021 17:15:39 +1100
Subject: [Numpy-discussion] Looking for a difference between Numpy
 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
References: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
Message-ID: <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>

Hi Pierre,

If you?re able to compile NumPy locally and you have reliable benchmarks, you can write a script that tests the runtime of your benchmark and reports it as a test pass/fail. You can then use ?git bisect run? to automatically find the commit that caused the issue. That will help narrow down the discussion before it gets completely derailed a second time. ?

https://lwn.net/Articles/317154/

Juan. 

> On 13 Mar 2021, at 10:34 am, PIERRE AUGIER <pierre.augier at univ-grenoble-alpes.fr> wrote:
> 
> ?Hi,
> 
> I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy --force-reinstall` and I can reproduce the regression.
> 
> Good news, I was able to reproduce the difference with only Numpy 1.20.1. 
> 
> Arrays prepared with (`df` is a Pandas dataframe)
> 
> arr = df.values.copy()
> 
> or 
> 
> arr = np.ascontiguousarray(df.values)
> 
> lead to "slow" execution while arrays prepared with
> 
> arr = np.copy(df.values)
> 
> lead to faster execution.
> 
> arr.copy() or np.copy(arr) do not give the same result, with arr obtained from a Pandas dataframe with arr = df.values. It's strange because type(df.values) gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to give exactly the same result.
> 
> Note that I think I'm doing quite serious and reproducible benchmarks. I also checked that this regression is reproducible on another computer.
> 
> Cheers,
> 
> Pierre
> 
> ----- Mail original -----
>> De: "Sebastian Berg" <sebastian at sipsolutions.net>
>> ?: "numpy-discussion" <numpy-discussion at python.org>
>> Envoy?: Vendredi 12 Mars 2021 22:50:24
>> Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with
>> Pythran
> 
>>> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
>>> Hi,
>>> 
>>> I'm looking for a difference between Numpy 0.19.5 and 0.20 which
>>> could explain a performance regression (~15 %) with Pythran.
>>> 
>>> I observe this regression with the script
>>> https://github.com/paugier/nbabel/blob/master/py/bench.py
>>> 
>>> Pythran reimplements Numpy so it is not about Numpy code for
>>> computation. However, Pythran of course uses the native array
>>> contained in a Numpy array. I'm quite sure that something has changed
>>> between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?)
>>> since I don't get the same performance with Numpy 0.20. I checked
>>> that the values in the arrays are the same and that the flags
>>> characterizing the arrays are also the same.
>>> 
>>> Good news, I'm now able to obtain the performance difference just
>>> with Numpy 0.19.5. In this code, I load the data with Pandas and need
>>> to prepare contiguous Numpy arrays to give them to Pythran. With
>>> Numpy 0.19.5, if I use np.copy I get better performance that with
>>> np.ascontiguousarray. With Numpy 0.20, both functions create array
>>> giving the same performance with Pythran (again, less good that with
>>> Numpy 0.19.5).
>>> 
>>> Note that this code is very efficient (more that 100 times faster
>>> than using Numpy), so I guess that things like alignment or memory
>>> location can lead to such difference.
>>> 
>>> More details in this issue
>>> https://github.com/serge-sans-paille/pythran/issues/1735
>>> 
>>> Any help to understand what has changed would be greatly appreciated!
>>> 
>> 
>> If you want to really dig into this, it would be good to do profiling
>> to find out at where the differences are.
>> 
>> Without that, I don't have much appetite to investigate personally. The
>> reason is that fluctuations of ~30% (or even much more) when running
>> the NumPy benchmarks are very common.
>> 
>> I am not aware of an immediate change in NumPy, especially since you
>> are talking pythran, and only the memory space or the interface code
>> should matter.
>> As to the interface code... I would expect it to be quite a bit faster,
>> not slower.
>> There was no change around data allocation, so at best what you are
>> seeing is a different pattern in how the "small array cache" ends up
>> being used.
>> 
>> 
>> Unfortunately, getting stable benchmarks that reflect code changes
>> exactly is tough...  Here is a nice blog post from Victor Stinner where
>> he had to go as far as using "profile guided compilation" to avoid
>> fluctuations:
>> 
>> https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
>> 
>> I somewhat hope that this is also the reason for the huge fluctuations
>> we see in the NumPy benchmarks due to absolutely unrelated code
>> changes.
>> But I did not have the energy to try it (and a probably fixed bug in
>> gcc makes it a bit harder right now).
>> 
>> Cheers,
>> 
>> Sebastian
>> 
>> 
>> 
>> 
>>> Cheers,
>>> Pierre
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>> 
>> 
>> 
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210314/b0165e0b/attachment.html>

From matti.picus at gmail.com  Sun Mar 14 04:05:50 2021
From: matti.picus at gmail.com (Matti Picus)
Date: Sun, 14 Mar 2021 10:05:50 +0200
Subject: [Numpy-discussion] Numpy 1.20.1 availability
In-Reply-To: <1615695178675-0.post@n7.nabble.com>
References: <1615695178675-0.post@n7.nabble.com>
Message-ID: <d825481a-903b-62c5-6a57-d181fcf57f59@gmail.com>

An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210314/a1af991e/attachment-0001.html>

From p.j.a.cock at googlemail.com  Sun Mar 14 06:14:13 2021
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 14 Mar 2021 10:14:13 +0000
Subject: [Numpy-discussion] Numpy 1.20.1 availability
In-Reply-To: <d825481a-903b-62c5-6a57-d181fcf57f59@gmail.com>
References: <1615695178675-0.post@n7.nabble.com>
 <d825481a-903b-62c5-6a57-d181fcf57f59@gmail.com>
Message-ID: <CAKVJ-_5fvSe8WjWF6GHPmk5_ttnHtv5o2S+jENTZFrymw8DyiQ@mail.gmail.com>

I would recommend using the community run conda-forge as one of your
default conda channels. They have a very slick largely automated system
to update recipes when upstream makes a release. The default Anaconda
channel from Anaconda, Inc. (formerly Continuum Analytics, Inc.) is
comparatively slow.

You may recognise some of the maintainers of the conda-forge numpy
recipe? https://github.com/conda-forge/numpy-feedstock/

I'm impressed to see 17 million conda-forge numpy downloads, vs
'just' 2.5 million downloads of the default channel's package:

https://anaconda.org/conda-forge/numpy
https://anaconda.org/anaconda/numpy

Regards,

Peter

On Sun, Mar 14, 2021 at 8:06 AM Matti Picus <matti.picus at gmail.com> wrote:
>
> On 3/14/21 6:12 AM, dan_patterson wrote:
>
> Any idea why the most recent version isn't available on the main anaconda
> channel.  conda-forge and building are not options for a number of reasons.
> I posted a package request there but double digit days have gone by it just
> got a thumbs up and package-request tag
> https://github.com/ContinuumIO/anaconda-issues/issues/12309
> I realize it could be the "times" or maybe no one is aware of its absence.
>
>
> NumPy does not control the packages on the main anaconda channel, so a request here is likely to go unanswered. The package has been updated in the conda-forge channel.
>
>
> Matti
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From dan_patterson at outlook.com  Sun Mar 14 07:43:09 2021
From: dan_patterson at outlook.com (dan_patterson)
Date: Sun, 14 Mar 2021 04:43:09 -0700 (MST)
Subject: [Numpy-discussion] Numpy 1.20.1 availability
In-Reply-To: <CAKVJ-_5fvSe8WjWF6GHPmk5_ttnHtv5o2S+jENTZFrymw8DyiQ@mail.gmail.com>
References: <1615695178675-0.post@n7.nabble.com>
 <d825481a-903b-62c5-6a57-d181fcf57f59@gmail.com>
 <CAKVJ-_5fvSe8WjWF6GHPmk5_ttnHtv5o2S+jENTZFrymw8DyiQ@mail.gmail.com>
Message-ID: <1615722189832-0.post@n7.nabble.com>

Thanks, glad to hear that people are aware of the delay.
As I said, there are other reasons beyond my control, for the limitations.
The wait is on.


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From ralf.gommers at gmail.com  Sun Mar 14 07:45:17 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 14 Mar 2021 12:45:17 +0100
Subject: [Numpy-discussion] Numpy 1.20.1 availability
In-Reply-To: <CAKVJ-_5fvSe8WjWF6GHPmk5_ttnHtv5o2S+jENTZFrymw8DyiQ@mail.gmail.com>
References: <1615695178675-0.post@n7.nabble.com>
 <d825481a-903b-62c5-6a57-d181fcf57f59@gmail.com>
 <CAKVJ-_5fvSe8WjWF6GHPmk5_ttnHtv5o2S+jENTZFrymw8DyiQ@mail.gmail.com>
Message-ID: <CABL7CQh=Gp4V119_GCMyvgO5+-uZhDNz-6oEyAhxSH6dbeyghg@mail.gmail.com>

On Sun, Mar 14, 2021 at 11:14 AM Peter Cock <p.j.a.cock at googlemail.com>
wrote:

> I would recommend using the community run conda-forge as one of your
> default conda channels. They have a very slick largely automated system
> to update recipes when upstream makes a release. The default Anaconda
> channel from Anaconda, Inc. (formerly Continuum Analytics, Inc.) is
> comparatively slow.
>

Agreed. I know the goal of the maintainers of the defaults channel is to
make the latest version available quickly. However, `defaults` requires
more integration testing than conda-forge/PyPI, and work tends to happen in
batches - in the past we've seen update times ranging from days to several
months.

We have some guidance at https://numpy.org/install/. Basically the two main
reasons to use `defaults`: for beginning users with modest needs, the
easiest thing to get started is just installing the Anaconda distribution
(which gives you `defaults`). Or you have corporate policies to use
`defaults` - you can pay Anaconda and it does come with things companies
and institutions may need, like guarantees around uptime and security.

Cheers,
Ralf


> You may recognise some of the maintainers of the conda-forge numpy
> recipe? https://github.com/conda-forge/numpy-feedstock/
>
> I'm impressed to see 17 million conda-forge numpy downloads, vs
> 'just' 2.5 million downloads of the default channel's package:
>
> https://anaconda.org/conda-forge/numpy
> https://anaconda.org/anaconda/numpy
>
> Regards,
>
> Peter
>
> On Sun, Mar 14, 2021 at 8:06 AM Matti Picus <matti.picus at gmail.com> wrote:
> >
> > On 3/14/21 6:12 AM, dan_patterson wrote:
> >
> > Any idea why the most recent version isn't available on the main anaconda
> > channel.  conda-forge and building are not options for a number of
> reasons.
> > I posted a package request there but double digit days have gone by it
> just
> > got a thumbs up and package-request tag
> > https://github.com/ContinuumIO/anaconda-issues/issues/12309
> > I realize it could be the "times" or maybe no one is aware of its
> absence.
> >
> >
> > NumPy does not control the packages on the main anaconda channel, so a
> request here is likely to go unanswered. The package has been updated in
> the conda-forge channel.
> >
> >
> > Matti
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210314/5f676e53/attachment.html>

From ndbecker2 at gmail.com  Sun Mar 14 10:52:58 2021
From: ndbecker2 at gmail.com (Neal Becker)
Date: Sun, 14 Mar 2021 10:52:58 -0400
Subject: [Numpy-discussion] Pi day easter egg
Message-ID: <CAG3t+pGjz8qUipCUYkYPEMm4-wYGOWkyyWNtbAEgV4Gx5mZktA@mail.gmail.com>

There's a little pi day easter egg for all math fans.  Google for pi to
find it.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210314/59a29474/attachment.html>

From sebastian at sipsolutions.net  Sun Mar 14 12:48:13 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 14 Mar 2021 11:48:13 -0500
Subject: [Numpy-discussion] Looking for a difference between Numpy
 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
References: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
Message-ID: <f6703a9bac75f8e6f804cc0252ae8d149c7f8b8c.camel@sipsolutions.net>

On Sun, 2021-03-14 at 17:15 +1100, Juan Nunez-Iglesias wrote:
> Hi Pierre,
> 
> If you?re able to compile NumPy locally and you have reliable
> benchmarks, you can write a script that tests the runtime of your
> benchmark and reports it as a test pass/fail. You can then use ?git
> bisect run? to automatically find the commit that caused the issue.
> That will help narrow down the discussion before it gets completely
> derailed a second time. ?
> 
> https://lwn.net/Articles/317154/


Let me share this partial benchmark result for branch I just worked on
in NumPy:

       before           after         ratio
     [c5de5b5c]       [2d9e11ea]
     <main>           <splitup-faster-argparsing>
+     2.12?0.01?s      3.69?0.02?s     1.74  bench_io.Copy.time_cont_assign('float32')
+     22.6?0.08?s       36.0?0.2?s     1.59  bench_io.CopyTo.time_copyto_sparse
+      49.4?0.8?s       55.2?0.1?s     1.12  bench_io.CopyTo.time_copyto_8_sparse
-     7.40?0.06?s      4.11?0.01?s     0.56  bench_io.CopyTo.time_copyto_dense
-     6.99?0.05?s         3.77?0?s     0.54  bench_io.Copy.time_cont_assign('float64')
-     6.94?0.02?s      3.73?0.01?s     0.54  bench_io.Copy.time_cont_assign('complex64')


That looks weird!  The benchmark sometimes speeds up by a factor of
almost 2, and sometimes the (de-facto) same code slows down by just as
much? (Focus on the `time_cont_assign` with float64 vs. float32).

Even better: I know 100% that no related code is touched!  The core of
that benchmark is just:

     arrya[...] = 1

and I did not even come close to any code related to that operation.


I have, as I did before, tried quite a few things (not as much as in
Victor Stinner's blog when it comes to compiler flags).
Such as enabling/disabling huge-pages, disabling address-space-
randomization (and disabling the NumPy small-array cache).

Note that the results are *stable*, as in: On this branch, I get
extremely reliable results for the benchmark [1]!


As you noticed, I have also seen these (or similar) changes "toggle"
e.g. when copying the array multiple times.  And I have dug down into
profiling one instance on the instruction level with `perf` so I know
for a fact that it is memory access speed.  (Which is a no-brainer
here, the operations are obviously memory or even cache speed bound.)


The point I was hoping to make is: Its complicated, and I am not
holding my breath that you can find an answer without digging much
deeper.
The blog post from Victor Stinner gave me the thought that profile-
guided-optimization *might* be a way to avoid some random fluctuations,
but I have not checked that the inner-loop for the code actually
compiles to different byte-code.


I would hope that someone comes along and "just knows" what is going
on. But, I don't know where to ask or what to google for.

My best bets right now (they may be terrible!) are:

* Profiler guided optimization might help (as in stabilize compiler
output due to *random* changes in code).  Which probably is involved in
some way or another.  But Victor Stinner's timed Python and that may
not have any massively memory bound operations (which are the "big"
things here).

* Maybe try to make the NumPy allocator align all its allocation to
much larger boundaries, such as the CPU cache-line size.  But I think I
tried to check whether alignment seems to matter, and it didn't.  Also,
the arrays feel large enough that it shouldn't matter?

* CPU caching L1/L2 uses a lot of fancy heuristics these days. Maybe to
really understand whats going on, you would have to drill into what the
CPU caches are doing here?


The only thing I do know for sure currently, is that it is a rabbit
hole that I would love to understand, but don't really want to spend
days just to get nowhere.

Cheers,

Sebastian


[1] That run above is without address space randomization, it feels
even more stable than the others.  But that doesn't matter, since we
average in any case, so ASR is probably useless and maybe even
detrimental.


> 
> Juan. 
> 
> > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER <
> > pierre.augier at univ-grenoble-alpes.fr> wrote:
> > 
> > ?Hi,
> > 
> > I tried to compile Numpy with `pip install numpy==1.20.1 --no-
> > binary numpy --force-reinstall` and I can reproduce the regression.
> > 
> > Good news, I was able to reproduce the difference with only Numpy
> > 1.20.1. 
> > 
> > Arrays prepared with (`df` is a Pandas dataframe)
> > 
> > arr = df.values.copy()
> > 
> > or 
> > 
> > arr = np.ascontiguousarray(df.values)
> > 
> > lead to "slow" execution while arrays prepared with
> > 
> > arr = np.copy(df.values)
> > 
> > lead to faster execution.
> > 
> > arr.copy() or np.copy(arr) do not give the same result, with arr
> > obtained from a Pandas dataframe with arr = df.values. It's strange
> > because type(df.values) gives <class 'numpy.ndarray'> so I would
> > expect arr.copy() and np.copy(arr) to give exactly the same result.
> > 
> > Note that I think I'm doing quite serious and reproducible
> > benchmarks. I also checked that this regression is reproducible on
> > another computer.
> > 
> > Cheers,
> > 
> > Pierre
> > 
> > ----- Mail original -----
> > > De: "Sebastian Berg" <sebastian at sipsolutions.net>
> > > ?: "numpy-discussion" <numpy-discussion at python.org>
> > > Envoy?: Vendredi 12 Mars 2021 22:50:24
> > > Objet: Re: [Numpy-discussion] Looking for a difference between
> > > Numpy 0.19.5 and 0.20 explaining a perf regression with
> > > Pythran
> > 
> > > > On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> > > > Hi,
> > > > 
> > > > I'm looking for a difference between Numpy 0.19.5 and 0.20
> > > > which
> > > > could explain a performance regression (~15 %) with Pythran.
> > > > 
> > > > I observe this regression with the script
> > > > https://github.com/paugier/nbabel/blob/master/py/bench.py
> > > > 
> > > > Pythran reimplements Numpy so it is not about Numpy code for
> > > > computation. However, Pythran of course uses the native array
> > > > contained in a Numpy array. I'm quite sure that something has
> > > > changed
> > > > between Numpy 0.19.5 and 0.20 (or between the corresponding
> > > > wheels?)
> > > > since I don't get the same performance with Numpy 0.20. I
> > > > checked
> > > > that the values in the arrays are the same and that the flags
> > > > characterizing the arrays are also the same.
> > > > 
> > > > Good news, I'm now able to obtain the performance difference
> > > > just
> > > > with Numpy 0.19.5. In this code, I load the data with Pandas
> > > > and need
> > > > to prepare contiguous Numpy arrays to give them to Pythran.
> > > > With
> > > > Numpy 0.19.5, if I use np.copy I get better performance that
> > > > with
> > > > np.ascontiguousarray. With Numpy 0.20, both functions create
> > > > array
> > > > giving the same performance with Pythran (again, less good that
> > > > with
> > > > Numpy 0.19.5).
> > > > 
> > > > Note that this code is very efficient (more that 100 times
> > > > faster
> > > > than using Numpy), so I guess that things like alignment or
> > > > memory
> > > > location can lead to such difference.
> > > > 
> > > > More details in this issue
> > > > https://github.com/serge-sans-paille/pythran/issues/1735
> > > > 
> > > > Any help to understand what has changed would be greatly
> > > > appreciated!
> > > > 
> > > 
> > > If you want to really dig into this, it would be good to do
> > > profiling
> > > to find out at where the differences are.
> > > 
> > > Without that, I don't have much appetite to investigate
> > > personally. The
> > > reason is that fluctuations of ~30% (or even much more) when
> > > running
> > > the NumPy benchmarks are very common.
> > > 
> > > I am not aware of an immediate change in NumPy, especially since
> > > you
> > > are talking pythran, and only the memory space or the interface
> > > code
> > > should matter.
> > > As to the interface code... I would expect it to be quite a bit
> > > faster,
> > > not slower.
> > > There was no change around data allocation, so at best what you
> > > are
> > > seeing is a different pattern in how the "small array cache" ends
> > > up
> > > being used.
> > > 
> > > 
> > > Unfortunately, getting stable benchmarks that reflect code
> > > changes
> > > exactly is tough...? Here is a nice blog post from Victor Stinner
> > > where
> > > he had to go as far as using "profile guided compilation" to
> > > avoid
> > > fluctuations:
> > > 
> > > https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> > > 
> > > I somewhat hope that this is also the reason for the huge
> > > fluctuations
> > > we see in the NumPy benchmarks due to absolutely unrelated code
> > > changes.
> > > But I did not have the energy to try it (and a probably fixed bug
> > > in
> > > gcc makes it a bit harder right now).
> > > 
> > > Cheers,
> > > 
> > > Sebastian
> > > 
> > > 
> > > 
> > > 
> > > > Cheers,
> > > > Pierre
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > 
> > > 
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210314/dcd5e25f/attachment.sig>

From sheikholeslam.ali at gmail.com  Sun Mar 14 15:04:58 2021
From: sheikholeslam.ali at gmail.com (Ali Sheikholeslam)
Date: Sun, 14 Mar 2021 22:34:58 +0330
Subject: [Numpy-discussion] How to get Boolean matrix for similar lists in
 two different-size numpy arrays of lists
Message-ID: <CAHEdPti+aNVtTiiyAoMcZt-rqncMjVPdMTaQbabdT1RCvBhEgg@mail.gmail.com>

I have written a question in:
https://stackoverflow.com/questions/66623145/how-to-get-boolean-matrix-for-similar-lists-in-two-different-size-numpy-arrays-o
It was recommended by numpy to send this subject to the mailing lists.

The question is as follows. I would be appreciated if you could advise me
to solve the problem:

At first, I write a small example of to lists:

F = [[1,2,3],[3,2,7],[4,4,1],[5,6,3],[1,3,7]]          # (1*5)     5 lists
S = [[1,3,7],[6,8,1],[3,2,7]]                          # (1*3)     3 lists

I want to get Boolean matrix for the same 'list's in two F and S:

[False, True, False, False, True]                      #  (1*5)    5
Booleans for 5 lists of F

By using IM = reduce(np.in1d, (F, S)) it gives results for each number in
each lists of F:

[ True  True  True  True  True  True False False  True False  True  True
  True  True  True]       # (1*15)

By using IM = reduce(np.isin, (F, S)) it gives results for each number in
each lists of F, too, but in another shape:

[[ True  True  True]
 [ True  True  True]
 [False False  True]
 [False  True  True]
 [ True  True  True]]           # (5*3)

The true result will be achieved by code IM = [i in S for i in F] for the
example lists, but when I'm using this code for my two main bigger numpy
arrays of lists:

https://drive.google.com/file/d/1YUUdqxRu__9-fhE1542xqei-rjB3HOxX/view?usp=sharing

numpy array: 3036 lists

https://drive.google.com/file/d/1FrggAa-JoxxoRqRs8NVV_F69DdVdiq_m/view?usp=sharing

numpy array: 300 lists

It gives wrong answer. For the main files it must give 3036 Boolean, in
which 'True' is only 300 numbers. I didn't understand why this get wrong
answers?? It seems it applied only on the 3rd characters in each lists of
F. It is preferred to use reduce function by the two functions, np.in1d and
np.isin, instead of the last method. How could to solve each of the three
above methods??
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210314/78a7d24b/attachment-0001.html>

From robert.kern at gmail.com  Sun Mar 14 15:35:09 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Sun, 14 Mar 2021 15:35:09 -0400
Subject: [Numpy-discussion] How to get Boolean matrix for similar lists
 in two different-size numpy arrays of lists
In-Reply-To: <CAHEdPti+aNVtTiiyAoMcZt-rqncMjVPdMTaQbabdT1RCvBhEgg@mail.gmail.com>
References: <CAHEdPti+aNVtTiiyAoMcZt-rqncMjVPdMTaQbabdT1RCvBhEgg@mail.gmail.com>
Message-ID: <CAF6FJisf9hG6WimrT6KNrqWWNn1e4cjG003NkoE0T6VVLqBAiA@mail.gmail.com>

On Sun, Mar 14, 2021 at 3:06 PM Ali Sheikholeslam <
sheikholeslam.ali at gmail.com> wrote:

> I have written a question in:
>
> https://stackoverflow.com/questions/66623145/how-to-get-boolean-matrix-for-similar-lists-in-two-different-size-numpy-arrays-o
> It was recommended by numpy to send this subject to the mailing lists.
>
> The question is as follows. I would be appreciated if you could advise me
> to solve the problem:
>
> At first, I write a small example of to lists:
>
> F = [[1,2,3],[3,2,7],[4,4,1],[5,6,3],[1,3,7]]          # (1*5)     5 lists
> S = [[1,3,7],[6,8,1],[3,2,7]]                          # (1*3)     3 lists
>
> I want to get Boolean matrix for the same 'list's in two F and S:
>
> [False, True, False, False, True]                      #  (1*5)    5 Booleans for 5 lists of F
>
> By using IM = reduce(np.in1d, (F, S)) it gives results for each number in
> each lists of F:
>
> [ True  True  True  True  True  True False False  True False  True  True
>   True  True  True]       # (1*15)
>
> By using IM = reduce(np.isin, (F, S)) it gives results for each number in
> each lists of F, too, but in another shape:
>
> [[ True  True  True]
>  [ True  True  True]
>  [False False  True]
>  [False  True  True]
>  [ True  True  True]]           # (5*3)
>
> The true result will be achieved by code IM = [i in S for i in F] for the
> example lists, but when I'm using this code for my two main bigger numpy
> arrays of lists:
>
>
> https://drive.google.com/file/d/1YUUdqxRu__9-fhE1542xqei-rjB3HOxX/view?usp=sharing
>
> numpy array: 3036 lists
>
>
> https://drive.google.com/file/d/1FrggAa-JoxxoRqRs8NVV_F69DdVdiq_m/view?usp=sharing
>
> numpy array: 300 lists
>
> It gives wrong answer. For the main files it must give 3036 Boolean, in
> which 'True' is only 300 numbers. I didn't understand why this get wrong
> answers?? It seems it applied only on the 3rd characters in each lists of
> F. It is preferred to use reduce function by the two functions, np.in1d and
> np.isin, instead of the last method. How could to solve each of the three
> above methods??
>

Thank you for providing the data. Can you show a complete, runnable code
sample that fails? There are several things that could go wrong here, and
we can't be sure which is which without the exact code that you ran.

In general, you may well have problems with the floating point data that
you are not seeing with your integer examples.

FWIW, I would continue to use something like the `IM = [i in S for i in F]`
list comprehension for data of this size. You aren't getting any benefit
trying to convert to arrays and using our array set operations. They are
written for 1D arrays of numbers, not 2D arrays (attempting to treat them
as 1D arrays of lists) and won't really work on your data.

-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210314/a78105cd/attachment-0001.html>

From deak.andris at gmail.com  Sun Mar 14 15:45:58 2021
From: deak.andris at gmail.com (Andras Deak)
Date: Sun, 14 Mar 2021 20:45:58 +0100
Subject: [Numpy-discussion] How to get Boolean matrix for similar lists
 in two different-size numpy arrays of lists
In-Reply-To: <CAF6FJisf9hG6WimrT6KNrqWWNn1e4cjG003NkoE0T6VVLqBAiA@mail.gmail.com>
References: <CAHEdPti+aNVtTiiyAoMcZt-rqncMjVPdMTaQbabdT1RCvBhEgg@mail.gmail.com>
 <CAF6FJisf9hG6WimrT6KNrqWWNn1e4cjG003NkoE0T6VVLqBAiA@mail.gmail.com>
Message-ID: <CAMEWA4Pv93nR_tphgmjQRe3BaZ_JR-jmhKZTrcp_sKKbB=KjMQ@mail.gmail.com>

On Sun, Mar 14, 2021 at 8:35 PM Robert Kern <robert.kern at gmail.com> wrote:
>
> On Sun, Mar 14, 2021 at 3:06 PM Ali Sheikholeslam <sheikholeslam.ali at gmail.com> wrote:
>>
>> I have written a question in:
>> https://stackoverflow.com/questions/66623145/how-to-get-boolean-matrix-for-similar-lists-in-two-different-size-numpy-arrays-o
>> It was recommended by numpy to send this subject to the mailing lists.
>>
>> The question is as follows. I would be appreciated if you could advise me to solve the problem:
>>
>> At first, I write a small example of to lists:
>>
>> F = [[1,2,3],[3,2,7],[4,4,1],[5,6,3],[1,3,7]]          # (1*5)     5 lists
>> S = [[1,3,7],[6,8,1],[3,2,7]]                          # (1*3)     3 lists
>>
>> I want to get Boolean matrix for the same 'list's in two F and S:
>>
>> [False, True, False, False, True]                      #  (1*5)    5 Booleans for 5 lists of F
>>
>> By using IM = reduce(np.in1d, (F, S)) it gives results for each number in each lists of F:
>>
>> [ True  True  True  True  True  True False False  True False  True  True
>>   True  True  True]       # (1*15)
>>
>> By using IM = reduce(np.isin, (F, S)) it gives results for each number in each lists of F, too, but in another shape:
>>
>> [[ True  True  True]
>>  [ True  True  True]
>>  [False False  True]
>>  [False  True  True]
>>  [ True  True  True]]           # (5*3)
>>
>> The true result will be achieved by code IM = [i in S for i in F] for the example lists, but when I'm using this code for my two main bigger numpy arrays of lists:
>>
>> https://drive.google.com/file/d/1YUUdqxRu__9-fhE1542xqei-rjB3HOxX/view?usp=sharing
>>
>> numpy array: 3036 lists
>>
>> https://drive.google.com/file/d/1FrggAa-JoxxoRqRs8NVV_F69DdVdiq_m/view?usp=sharing
>>
>> numpy array: 300 lists
>>
>> It gives wrong answer. For the main files it must give 3036 Boolean, in which 'True' is only 300 numbers. I didn't understand why this get wrong answers?? It seems it applied only on the 3rd characters in each lists of F. It is preferred to use reduce function by the two functions, np.in1d and np.isin, instead of the last method. How could to solve each of the three above methods??
>
>
> Thank you for providing the data. Can you show a complete, runnable code sample that fails? There are several things that could go wrong here, and we can't be sure which is which without the exact code that you ran.
>
> In general, you may well have problems with the floating point data that you are not seeing with your integer examples.
>
> FWIW, I would continue to use something like the `IM = [i in S for i in F]` list comprehension for data of this size.

Although somewhat off-topic for the numpy aspect, for completeness'
sake let me add that you'll probably want to first turn your list of
lists `S` into a set of tuples, and then look up each list in `F`
converted to a tuple (`[tuple(lst) in setified_S for lst in F]`). That
would probably be a lot faster for large lists.

Andr?s


You aren't getting any benefit trying to convert to arrays and using
our array set operations. They are written for 1D arrays of numbers,
not 2D arrays (attempting to treat them as 1D arrays of lists) and
won't really work on your data.
>
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From blkzol001 at myuct.ac.za  Sun Mar 14 16:17:40 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Sun, 14 Mar 2021 13:17:40 -0700 (MST)
Subject: [Numpy-discussion] How to get Boolean matrix for similar lists
 in two different-size numpy arrays of lists
In-Reply-To: <CAHEdPti+aNVtTiiyAoMcZt-rqncMjVPdMTaQbabdT1RCvBhEgg@mail.gmail.com>
References: <CAHEdPti+aNVtTiiyAoMcZt-rqncMjVPdMTaQbabdT1RCvBhEgg@mail.gmail.com>
Message-ID: <1615753060752-0.post@n7.nabble.com>

The following seems to produce what you want using the data provided

```
In [31]: dF = np.genfromtxt('/home/F.csv', delimiter=',').tolist()

In [32]: dS = np.genfromtxt('/home/S.csv', delimiter=',').tolist()

In [33]: r =  [True if i in lS else False for i in dF]

In [34]: sum(r)

Out[34]: 300
```

I hope this helps.


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From Jerome.Kieffer at esrf.fr  Mon Mar 15 03:58:14 2021
From: Jerome.Kieffer at esrf.fr (Jerome Kieffer)
Date: Mon, 15 Mar 2021 08:58:14 +0100
Subject: [Numpy-discussion] Numpy 1.20.1 availability
In-Reply-To: <CAKVJ-_5fvSe8WjWF6GHPmk5_ttnHtv5o2S+jENTZFrymw8DyiQ@mail.gmail.com>
References: <1615695178675-0.post@n7.nabble.com>
 <d825481a-903b-62c5-6a57-d181fcf57f59@gmail.com>
 <CAKVJ-_5fvSe8WjWF6GHPmk5_ttnHtv5o2S+jENTZFrymw8DyiQ@mail.gmail.com>
Message-ID: <20210315085814.41637650@antarctica.fournet.lan>

On Sun, 14 Mar 2021 10:14:13 +0000
Peter Cock <p.j.a.cock at googlemail.com> wrote:

> I'm impressed to see 17 million conda-forge numpy downloads, vs
> 'just' 2.5 million downloads of the default channel's package:

I doubt the download figures from conda are correct ... 

A couple of days after my software package has entered "conda-forge" its 
metric was already 2 orders of magnitude larger than any other distribution route:
pip, debian packages, ... Since I know the approximate size of the
community, I have some doubts on the figures.

I suspect downloads for CI are all accounted and none cached, ...

Cheers,

Jerome

From pierre.augier at univ-grenoble-alpes.fr  Mon Mar 15 07:29:02 2021
From: pierre.augier at univ-grenoble-alpes.fr (PIERRE AUGIER)
Date: Mon, 15 Mar 2021 12:29:02 +0100 (CET)
Subject: [Numpy-discussion] Perf regression with Pythran between Numpy
 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f,
 ENH: implement NEP-35's `like=` argument)
In-Reply-To: <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
References: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
Message-ID: <137390869.547851.1615807742248.JavaMail.zimbra@univ-grenoble-alpes.fr>


----- Mail original -----
> De: "Juan Nunez-Iglesias" <jni at fastmail.com>
> ?: "numpy-discussion" <numpy-discussion at python.org>
> Envoy?: Dimanche 14 Mars 2021 07:15:39
> Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with
> Pythran

> Hi Pierre,
> 
> If you?re able to compile NumPy locally and you have reliable benchmarks, you
> can write a script that tests the runtime of your benchmark and reports it as a
> test pass/fail. You can then use ?git bisect run? to automatically find the
> commit that caused the issue. That will help narrow down the discussion before
> it gets completely derailed a second time. ?
> 
> [ https://lwn.net/Articles/317154/ | https://lwn.net/Articles/317154/ ]
> 
> Juan.

Thanks a lot for this advice Juan! I wasn't able to use Git but with `hg bisect` I managed to find that the first "bad" commit is

https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f   ENH: implement NEP-35's `like=` argument (gh-16935)

>From the point of view of my benchmark, this commit changes the behavior of arr.copy() (the resulting arrays do not give to the same performance). This makes sense because it is indeed about the array creation.

I haven't yet studied in details this commit (which is quite big and not simple) and I'm not sure I'm going to be able to understand it and in particular understand why it leads to such performance regression!

Cheers,

Pierre

> 
> 
> On 13 Mar 2021, at 10:34 am, PIERRE AUGIER
> <pierre.augier at univ-grenoble-alpes.fr> wrote:
> 
> 
> 
> 
> Hi,
> 
> I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy
> --force-reinstall` and I can reproduce the regression.
> 
> Good news, I was able to reproduce the difference with only Numpy 1.20.1.
> 
> Arrays prepared with (`df` is a Pandas dataframe)
> 
> arr = df.values.copy()
> 
> or
> 
> arr = np.ascontiguousarray(df.values)
> 
> lead to "slow" execution while arrays prepared with
> 
> arr = np.copy(df.values)
> 
> lead to faster execution.
> 
> arr.copy() or np.copy(arr) do not give the same result, with arr obtained from a
> Pandas dataframe with arr = df.values. It's strange because type(df.values)
> gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to
> give exactly the same result.
> 
> Note that I think I'm doing quite serious and reproducible benchmarks. I also
> checked that this regression is reproducible on another computer.
> 
> Cheers,
> 
> Pierre
> 
> ----- Mail original -----
> 
> 
> De: "Sebastian Berg" <sebastian at sipsolutions.net>
> 
> 
> ?: "numpy-discussion" <numpy-discussion at python.org>
> 
> 
> Envoy?: Vendredi 12 Mars 2021 22:50:24
> 
> 
> Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and
> 0.20 explaining a perf regression with
> 
> 
> Pythran
> 
> 
> 
> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> 
> 
> 
> 
> Hi,
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I'm looking for a difference between Numpy 0.19.5 and 0.20 which
> 
> 
> 
> 
> could explain a performance regression (~15 %) with Pythran.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I observe this regression with the script
> 
> 
> 
> 
> https://github.com/paugier/nbabel/blob/master/py/bench.py
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Pythran reimplements Numpy so it is not about Numpy code for
> 
> 
> 
> 
> computation. However, Pythran of course uses the native array
> 
> 
> 
> 
> contained in a Numpy array. I'm quite sure that something has changed
> 
> 
> 
> 
> between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?)
> 
> 
> 
> 
> since I don't get the same performance with Numpy 0.20. I checked
> 
> 
> 
> 
> that the values in the arrays are the same and that the flags
> 
> 
> 
> 
> characterizing the arrays are also the same.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Good news, I'm now able to obtain the performance difference just
> 
> 
> 
> 
> with Numpy 0.19.5. In this code, I load the data with Pandas and need
> 
> 
> 
> 
> to prepare contiguous Numpy arrays to give them to Pythran. With
> 
> 
> 
> 
> Numpy 0.19.5, if I use np.copy I get better performance that with
> 
> 
> 
> 
> np.ascontiguousarray. With Numpy 0.20, both functions create array
> 
> 
> 
> 
> giving the same performance with Pythran (again, less good that with
> 
> 
> 
> 
> Numpy 0.19.5).
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Note that this code is very efficient (more that 100 times faster
> 
> 
> 
> 
> than using Numpy), so I guess that things like alignment or memory
> 
> 
> 
> 
> location can lead to such difference.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> More details in this issue
> 
> 
> 
> 
> https://github.com/serge-sans-paille/pythran/issues/1735
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Any help to understand what has changed would be greatly appreciated!
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> If you want to really dig into this, it would be good to do profiling
> 
> 
> to find out at where the differences are.
> 
> 
> 
> 
> 
> Without that, I don't have much appetite to investigate personally. The
> 
> 
> reason is that fluctuations of ~30% (or even much more) when running
> 
> 
> the NumPy benchmarks are very common.
> 
> 
> 
> 
> 
> I am not aware of an immediate change in NumPy, especially since you
> 
> 
> are talking pythran, and only the memory space or the interface code
> 
> 
> should matter.
> 
> 
> As to the interface code... I would expect it to be quite a bit faster,
> 
> 
> not slower.
> 
> 
> There was no change around data allocation, so at best what you are
> 
> 
> seeing is a different pattern in how the "small array cache" ends up
> 
> 
> being used.
> 
> 
> 
> 
> 
> 
> 
> 
> Unfortunately, getting stable benchmarks that reflect code changes
> 
> 
> exactly is tough... Here is a nice blog post from Victor Stinner where
> 
> 
> he had to go as far as using "profile guided compilation" to avoid
> 
> 
> fluctuations:
> 
> 
> 
> 
> 
> https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> 
> 
> 
> 
> 
> I somewhat hope that this is also the reason for the huge fluctuations
> 
> 
> we see in the NumPy benchmarks due to absolutely unrelated code
> 
> 
> changes.
> 
> 
> But I did not have the energy to try it (and a probably fixed bug in
> 
> 
> gcc makes it a bit harder right now).
> 
> 
> 
> 
> 
> Cheers,
> 
> 
> 
> 
> 
> Sebastian
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Cheers,
> 
> 
> 
> 
> Pierre
> 
> 
> 
> 
> _______________________________________________
> 
> 
> 
> 
> NumPy-Discussion mailing list
> 
> 
> 
> 
> NumPy-Discussion at python.org
> 
> 
> 
> 
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> 
> 
> NumPy-Discussion mailing list
> 
> 
> NumPy-Discussion at python.org
> 
> 
> https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From peter at entschev.com  Mon Mar 15 09:59:21 2021
From: peter at entschev.com (Peter Andreas Entschev)
Date: Mon, 15 Mar 2021 14:59:21 +0100
Subject: [Numpy-discussion] Perf regression with Pythran between Numpy
 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f,
 ENH: implement NEP-35's `like=` argument)
In-Reply-To: <137390869.547851.1615807742248.JavaMail.zimbra@univ-grenoble-alpes.fr>
References: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
 <137390869.547851.1615807742248.JavaMail.zimbra@univ-grenoble-alpes.fr>
Message-ID: <CANxROpw6rNy6s+nBxd+Jb2z9mAfLQ24ML7HhgzUO8mBucRdnyA@mail.gmail.com>

Hi Pierre,

Thanks for pinging me. To put it in the simplest way possible, that PR
adds a new `like` kwarg that will dispatch to downstream libraries
using `__array_function__` when specified, otherwise fallback to the
default behavior of NumPy. While that introduces an extra check on the
C side, that should have minimal impact for use cases that don't use
the `like` kwarg.

Is there a simple reproducer with NumPy only? I assume your case with
Pandas is much more complex (unfortunately I'm not very experienced
with DataFrames), but curiously I see NumPy 1.20.1 being considerably
faster for small arrays and mildly-faster with large arrays (results
in https://gist.github.com/pentschev/add38b5aee61da87b4b70a1c4649861f)
.

Best,
Peter


On Mon, Mar 15, 2021 at 12:29 PM PIERRE AUGIER
<pierre.augier at univ-grenoble-alpes.fr> wrote:
>
>
> ----- Mail original -----
> > De: "Juan Nunez-Iglesias" <jni at fastmail.com>
> > ?: "numpy-discussion" <numpy-discussion at python.org>
> > Envoy?: Dimanche 14 Mars 2021 07:15:39
> > Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with
> > Pythran
>
> > Hi Pierre,
> >
> > If you?re able to compile NumPy locally and you have reliable benchmarks, you
> > can write a script that tests the runtime of your benchmark and reports it as a
> > test pass/fail. You can then use ?git bisect run? to automatically find the
> > commit that caused the issue. That will help narrow down the discussion before
> > it gets completely derailed a second time.
> >
> > [ https://lwn.net/Articles/317154/ | https://lwn.net/Articles/317154/ ]
> >
> > Juan.
>
> Thanks a lot for this advice Juan! I wasn't able to use Git but with `hg bisect` I managed to find that the first "bad" commit is
>
> https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f   ENH: implement NEP-35's `like=` argument (gh-16935)
>
> From the point of view of my benchmark, this commit changes the behavior of arr.copy() (the resulting arrays do not give to the same performance). This makes sense because it is indeed about the array creation.
>
> I haven't yet studied in details this commit (which is quite big and not simple) and I'm not sure I'm going to be able to understand it and in particular understand why it leads to such performance regression!
>
> Cheers,
>
> Pierre
>
> >
> >
> > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER
> > <pierre.augier at univ-grenoble-alpes.fr> wrote:
> >
> >
> >
> >
> > Hi,
> >
> > I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy
> > --force-reinstall` and I can reproduce the regression.
> >
> > Good news, I was able to reproduce the difference with only Numpy 1.20.1.
> >
> > Arrays prepared with (`df` is a Pandas dataframe)
> >
> > arr = df.values.copy()
> >
> > or
> >
> > arr = np.ascontiguousarray(df.values)
> >
> > lead to "slow" execution while arrays prepared with
> >
> > arr = np.copy(df.values)
> >
> > lead to faster execution.
> >
> > arr.copy() or np.copy(arr) do not give the same result, with arr obtained from a
> > Pandas dataframe with arr = df.values. It's strange because type(df.values)
> > gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to
> > give exactly the same result.
> >
> > Note that I think I'm doing quite serious and reproducible benchmarks. I also
> > checked that this regression is reproducible on another computer.
> >
> > Cheers,
> >
> > Pierre
> >
> > ----- Mail original -----
> >
> >
> > De: "Sebastian Berg" <sebastian at sipsolutions.net>
> >
> >
> > ?: "numpy-discussion" <numpy-discussion at python.org>
> >
> >
> > Envoy?: Vendredi 12 Mars 2021 22:50:24
> >
> >
> > Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and
> > 0.20 explaining a perf regression with
> >
> >
> > Pythran
> >
> >
> >
> > On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> >
> >
> >
> >
> > Hi,
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > I'm looking for a difference between Numpy 0.19.5 and 0.20 which
> >
> >
> >
> >
> > could explain a performance regression (~15 %) with Pythran.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > I observe this regression with the script
> >
> >
> >
> >
> > https://github.com/paugier/nbabel/blob/master/py/bench.py
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Pythran reimplements Numpy so it is not about Numpy code for
> >
> >
> >
> >
> > computation. However, Pythran of course uses the native array
> >
> >
> >
> >
> > contained in a Numpy array. I'm quite sure that something has changed
> >
> >
> >
> >
> > between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?)
> >
> >
> >
> >
> > since I don't get the same performance with Numpy 0.20. I checked
> >
> >
> >
> >
> > that the values in the arrays are the same and that the flags
> >
> >
> >
> >
> > characterizing the arrays are also the same.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Good news, I'm now able to obtain the performance difference just
> >
> >
> >
> >
> > with Numpy 0.19.5. In this code, I load the data with Pandas and need
> >
> >
> >
> >
> > to prepare contiguous Numpy arrays to give them to Pythran. With
> >
> >
> >
> >
> > Numpy 0.19.5, if I use np.copy I get better performance that with
> >
> >
> >
> >
> > np.ascontiguousarray. With Numpy 0.20, both functions create array
> >
> >
> >
> >
> > giving the same performance with Pythran (again, less good that with
> >
> >
> >
> >
> > Numpy 0.19.5).
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Note that this code is very efficient (more that 100 times faster
> >
> >
> >
> >
> > than using Numpy), so I guess that things like alignment or memory
> >
> >
> >
> >
> > location can lead to such difference.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > More details in this issue
> >
> >
> >
> >
> > https://github.com/serge-sans-paille/pythran/issues/1735
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Any help to understand what has changed would be greatly appreciated!
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > If you want to really dig into this, it would be good to do profiling
> >
> >
> > to find out at where the differences are.
> >
> >
> >
> >
> >
> > Without that, I don't have much appetite to investigate personally. The
> >
> >
> > reason is that fluctuations of ~30% (or even much more) when running
> >
> >
> > the NumPy benchmarks are very common.
> >
> >
> >
> >
> >
> > I am not aware of an immediate change in NumPy, especially since you
> >
> >
> > are talking pythran, and only the memory space or the interface code
> >
> >
> > should matter.
> >
> >
> > As to the interface code... I would expect it to be quite a bit faster,
> >
> >
> > not slower.
> >
> >
> > There was no change around data allocation, so at best what you are
> >
> >
> > seeing is a different pattern in how the "small array cache" ends up
> >
> >
> > being used.
> >
> >
> >
> >
> >
> >
> >
> >
> > Unfortunately, getting stable benchmarks that reflect code changes
> >
> >
> > exactly is tough... Here is a nice blog post from Victor Stinner where
> >
> >
> > he had to go as far as using "profile guided compilation" to avoid
> >
> >
> > fluctuations:
> >
> >
> >
> >
> >
> > https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> >
> >
> >
> >
> >
> > I somewhat hope that this is also the reason for the huge fluctuations
> >
> >
> > we see in the NumPy benchmarks due to absolutely unrelated code
> >
> >
> > changes.
> >
> >
> > But I did not have the energy to try it (and a probably fixed bug in
> >
> >
> > gcc makes it a bit harder right now).
> >
> >
> >
> >
> >
> > Cheers,
> >
> >
> >
> >
> >
> > Sebastian
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Cheers,
> >
> >
> >
> >
> > Pierre
> >
> >
> >
> >
> > _______________________________________________
> >
> >
> >
> >
> > NumPy-Discussion mailing list
> >
> >
> >
> >
> > NumPy-Discussion at python.org
> >
> >
> >
> >
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> >
> >
> > NumPy-Discussion mailing list
> >
> >
> > NumPy-Discussion at python.org
> >
> >
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion

From sebastian at sipsolutions.net  Mon Mar 15 11:11:20 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 15 Mar 2021 10:11:20 -0500
Subject: [Numpy-discussion] Perf regression with Pythran between Numpy
 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f,
 ENH: implement NEP-35's `like=` argument)
In-Reply-To: <CANxROpw6rNy6s+nBxd+Jb2z9mAfLQ24ML7HhgzUO8mBucRdnyA@mail.gmail.com>
References: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
 <137390869.547851.1615807742248.JavaMail.zimbra@univ-grenoble-alpes.fr>
 <CANxROpw6rNy6s+nBxd+Jb2z9mAfLQ24ML7HhgzUO8mBucRdnyA@mail.gmail.com>
Message-ID: <4891f157d73671aabdf983b11a405df0f63146d2.camel@sipsolutions.net>

On Mon, 2021-03-15 at 14:59 +0100, Peter Andreas Entschev wrote:
> Hi Pierre,
> 
> Thanks for pinging me. To put it in the simplest way possible, that
> PR
> adds a new `like` kwarg that will dispatch to downstream libraries
> using `__array_function__` when specified, otherwise fallback to the
> default behavior of NumPy. While that introduces an extra check on
> the
> C side, that should have minimal impact for use cases that don't use
> the `like` kwarg.
> 
> Is there a simple reproducer with NumPy only? I assume your case with
> Pandas is much more complex (unfortunately I'm not very experienced
> with DataFrames), but curiously I see NumPy 1.20.1 being considerably
> faster for small arrays and mildly-faster with large arrays (results
> in
> https://gist.github.com/pentschev/add38b5aee61da87b4b70a1c4649861f)
> .


1.20.1 should have some small overhead reductions there since array the
array-object life-cycle is probably around 30% faster (deleting an
array is faster).  But the array-object life-cycle is pretty
insignificant aside from creating views.
There are also many performance improvements around SIMD, which will
affect certain math operations.

The changes on that PR may add additional overhead to array creation
(something that should go away again in 1.21 and end up being much
faster when https://github.com/numpy/numpy/pull/15270 goes in).  But
that is all.


As much as I would love to have an answer, looking for changes in the
NumPy code seems to me unlikely to get you anywhere. Another example,
check out this benchmark from the NumPy benchmarks:


https://pv.github.io/numpy-bench/index.html#bench_reduce.AddReduceSeparate.time_reduce?cpu=Intel(R)%20Core(TM)%20i7%20CPU%20920%20%40%202.67GHz&machine=i7&os=Linux&ram=16416652&Cython=0.29.21&p-axis=1&p-type='int16'&p-type='int32'

It keeps jumping back and forth around 30% for the 'int16' version, but
the 'int32' one is pretty much stable, so its unlikely to be just bad
benchmarking.

Right now, I am willing to get that if you repeat that whole thing with
a different commit range, you will find another random bad commit.

Cheers,

Sebastian


> 
> Best,
> Peter
> 
> 
> 
> On Mon, Mar 15, 2021 at 12:29 PM PIERRE AUGIER
> <pierre.augier at univ-grenoble-alpes.fr> wrote:
> > 
> > 
> > ----- Mail original -----
> > > De: "Juan Nunez-Iglesias" <jni at fastmail.com>
> > > ?: "numpy-discussion" <numpy-discussion at python.org>
> > > Envoy?: Dimanche 14 Mars 2021 07:15:39
> > > Objet: Re: [Numpy-discussion] Looking for a difference between
> > > Numpy 0.19.5 and 0.20 explaining a perf regression with
> > > Pythran
> > 
> > > Hi Pierre,
> > > 
> > > If you?re able to compile NumPy locally and you have reliable
> > > benchmarks, you
> > > can write a script that tests the runtime of your benchmark and
> > > reports it as a
> > > test pass/fail. You can then use ?git bisect run? to
> > > automatically find the
> > > commit that caused the issue. That will help narrow down the
> > > discussion before
> > > it gets completely derailed a second time.
> > > 
> > > [ https://lwn.net/Articles/317154/?| 
> > > https://lwn.net/Articles/317154/?]
> > > 
> > > Juan.
> > 
> > Thanks a lot for this advice Juan! I wasn't able to use Git but
> > with `hg bisect` I managed to find that the first "bad" commit is
> > 
> > https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f
> > ?? ENH: implement NEP-35's `like=` argument (gh-16935)
> > 
> > From the point of view of my benchmark, this commit changes the
> > behavior of arr.copy() (the resulting arrays do not give to the
> > same performance). This makes sense because it is indeed about the
> > array creation.
> > 
> > I haven't yet studied in details this commit (which is quite big
> > and not simple) and I'm not sure I'm going to be able to understand
> > it and in particular understand why it leads to such performance
> > regression!
> > 
> > Cheers,
> > 
> > Pierre
> > 
> > > 
> > > 
> > > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER
> > > <pierre.augier at univ-grenoble-alpes.fr> wrote:
> > > 
> > > 
> > > 
> > > 
> > > Hi,
> > > 
> > > I tried to compile Numpy with `pip install numpy==1.20.1 --no-
> > > binary numpy
> > > --force-reinstall` and I can reproduce the regression.
> > > 
> > > Good news, I was able to reproduce the difference with only Numpy
> > > 1.20.1.
> > > 
> > > Arrays prepared with (`df` is a Pandas dataframe)
> > > 
> > > arr = df.values.copy()
> > > 
> > > or
> > > 
> > > arr = np.ascontiguousarray(df.values)
> > > 
> > > lead to "slow" execution while arrays prepared with
> > > 
> > > arr = np.copy(df.values)
> > > 
> > > lead to faster execution.
> > > 
> > > arr.copy() or np.copy(arr) do not give the same result, with arr
> > > obtained from a
> > > Pandas dataframe with arr = df.values. It's strange because
> > > type(df.values)
> > > gives <class 'numpy.ndarray'> so I would expect arr.copy() and
> > > np.copy(arr) to
> > > give exactly the same result.
> > > 
> > > Note that I think I'm doing quite serious and reproducible
> > > benchmarks. I also
> > > checked that this regression is reproducible on another computer.
> > > 
> > > Cheers,
> > > 
> > > Pierre
> > > 
> > > ----- Mail original -----
> > > 
> > > 
> > > De: "Sebastian Berg" <sebastian at sipsolutions.net>
> > > 
> > > 
> > > ?: "numpy-discussion" <numpy-discussion at python.org>
> > > 
> > > 
> > > Envoy?: Vendredi 12 Mars 2021 22:50:24
> > > 
> > > 
> > > Objet: Re: [Numpy-discussion] Looking for a difference between
> > > Numpy 0.19.5 and
> > > 0.20 explaining a perf regression with
> > > 
> > > 
> > > Pythran
> > > 
> > > 
> > > 
> > > On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> > > 
> > > 
> > > 
> > > 
> > > Hi,
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > I'm looking for a difference between Numpy 0.19.5 and 0.20 which
> > > 
> > > 
> > > 
> > > 
> > > could explain a performance regression (~15 %) with Pythran.
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > I observe this regression with the script
> > > 
> > > 
> > > 
> > > 
> > > https://github.com/paugier/nbabel/blob/master/py/bench.py
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Pythran reimplements Numpy so it is not about Numpy code for
> > > 
> > > 
> > > 
> > > 
> > > computation. However, Pythran of course uses the native array
> > > 
> > > 
> > > 
> > > 
> > > contained in a Numpy array. I'm quite sure that something has
> > > changed
> > > 
> > > 
> > > 
> > > 
> > > between Numpy 0.19.5 and 0.20 (or between the corresponding
> > > wheels?)
> > > 
> > > 
> > > 
> > > 
> > > since I don't get the same performance with Numpy 0.20. I checked
> > > 
> > > 
> > > 
> > > 
> > > that the values in the arrays are the same and that the flags
> > > 
> > > 
> > > 
> > > 
> > > characterizing the arrays are also the same.
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Good news, I'm now able to obtain the performance difference just
> > > 
> > > 
> > > 
> > > 
> > > with Numpy 0.19.5. In this code, I load the data with Pandas and
> > > need
> > > 
> > > 
> > > 
> > > 
> > > to prepare contiguous Numpy arrays to give them to Pythran. With
> > > 
> > > 
> > > 
> > > 
> > > Numpy 0.19.5, if I use np.copy I get better performance that with
> > > 
> > > 
> > > 
> > > 
> > > np.ascontiguousarray. With Numpy 0.20, both functions create
> > > array
> > > 
> > > 
> > > 
> > > 
> > > giving the same performance with Pythran (again, less good that
> > > with
> > > 
> > > 
> > > 
> > > 
> > > Numpy 0.19.5).
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Note that this code is very efficient (more that 100 times faster
> > > 
> > > 
> > > 
> > > 
> > > than using Numpy), so I guess that things like alignment or
> > > memory
> > > 
> > > 
> > > 
> > > 
> > > location can lead to such difference.
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > More details in this issue
> > > 
> > > 
> > > 
> > > 
> > > https://github.com/serge-sans-paille/pythran/issues/1735
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Any help to understand what has changed would be greatly
> > > appreciated!
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > If you want to really dig into this, it would be good to do
> > > profiling
> > > 
> > > 
> > > to find out at where the differences are.
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Without that, I don't have much appetite to investigate
> > > personally. The
> > > 
> > > 
> > > reason is that fluctuations of ~30% (or even much more) when
> > > running
> > > 
> > > 
> > > the NumPy benchmarks are very common.
> > > 
> > > 
> > > 
> > > 
> > > 
> > > I am not aware of an immediate change in NumPy, especially since
> > > you
> > > 
> > > 
> > > are talking pythran, and only the memory space or the interface
> > > code
> > > 
> > > 
> > > should matter.
> > > 
> > > 
> > > As to the interface code... I would expect it to be quite a bit
> > > faster,
> > > 
> > > 
> > > not slower.
> > > 
> > > 
> > > There was no change around data allocation, so at best what you
> > > are
> > > 
> > > 
> > > seeing is a different pattern in how the "small array cache" ends
> > > up
> > > 
> > > 
> > > being used.
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Unfortunately, getting stable benchmarks that reflect code
> > > changes
> > > 
> > > 
> > > exactly is tough... Here is a nice blog post from Victor Stinner
> > > where
> > > 
> > > 
> > > he had to go as far as using "profile guided compilation" to
> > > avoid
> > > 
> > > 
> > > fluctuations:
> > > 
> > > 
> > > 
> > > 
> > > 
> > > https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> > > 
> > > 
> > > 
> > > 
> > > 
> > > I somewhat hope that this is also the reason for the huge
> > > fluctuations
> > > 
> > > 
> > > we see in the NumPy benchmarks due to absolutely unrelated code
> > > 
> > > 
> > > changes.
> > > 
> > > 
> > > But I did not have the energy to try it (and a probably fixed bug
> > > in
> > > 
> > > 
> > > gcc makes it a bit harder right now).
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Cheers,
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Sebastian
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Cheers,
> > > 
> > > 
> > > 
> > > 
> > > Pierre
> > > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > 
> > > 
> > > 
> > > 
> > > NumPy-Discussion mailing list
> > > 
> > > 
> > > 
> > > 
> > > NumPy-Discussion at python.org
> > > 
> > > 
> > > 
> > > 
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > 
> > > 
> > > NumPy-Discussion mailing list
> > > 
> > > 
> > > NumPy-Discussion at python.org
> > > 
> > > 
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210315/46f68aef/attachment-0001.sig>

From ralf.gommers at gmail.com  Mon Mar 15 16:14:00 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 15 Mar 2021 21:14:00 +0100
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <3ba55e0fe50da814b486a73855da35770e50303b.camel@sipsolutions.net>
References: <CABL7CQhBcxRsba4hu2YXC66Rm-j6AFZXBy+P+9bhTZ8h3YR89w@mail.gmail.com>
 <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
 <CABL7CQjGe7TPQKVFsZ=eoRyN5NHj5tZq+Hu+3R4rZm7_+3RxHQ@mail.gmail.com>
 <3ba55e0fe50da814b486a73855da35770e50303b.camel@sipsolutions.net>
Message-ID: <CABL7CQjNs7hLFqP-Cy_dCiv9wxQPABTJ3HxuuVo37MYv1nzOaQ@mail.gmail.com>

On Thu, Mar 11, 2021 at 6:08 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Thu, 2021-03-11 at 12:37 +0100, Ralf Gommers wrote:
> > On Wed, Mar 10, 2021 at 6:41 PM Sebastian Berg <
> > sebastian at sipsolutions.net>
> > wrote:
> >
> > > Top Posting, to discuss post specific questions about NEP 47 and
> > > partially the start on implementing it in:
> > >
> > >     https://github.com/numpy/numpy/pull/18585
> > >
> > > There are probably many more that will crop up. But for me, each of
> > > these is a pretty major difficulty without a clear answer as of
> > > now.
> > >
> >
> > All great questions, that Sebastian. Let me reply to the questions
> > that
> > Aaron didn't reply to inline below.
> >
>
> To be clear, I do not expect complete answers to these questions right
> now.  (Although being unsure about some of them does make me slightly
> reluctant to merge the work-in-progress into NumPy proper as opposed to
> a separate repo.)
>
> Also, yes, most/all questions are hopefully are just trivialities to
> check of (or no more than seeds for thought).  Or even just a starting
> point for making NEP 47's "Usage and Impact" section more complete
> including them as either "example usage patterns" or "limitations".
>

Yes, those are always good to have more of.


>
> My second takeaway from the questions is that I have doubts the
> "minimal" version will pan out, it feels like many of the questions
> might disappear if you drop that part.


My impression is that a strictly compliant (or "minimal") version is *more*
useful than something that's a mix between portable and non-portable
functionality. The reason to add more than the minimum required
functionality would be that it's too hard to hide the numpy-specific
extras. E.g., if we'd do `np.array_api.int32 = np.int32` then that dtype
would have methods and behavior that's NumPy-specific. But it'd be hard to
hide, so we'd accept it.

It's maybe easier to discuss in a call, I've put it on the community
meeting agenda.


> So, from my current thinking,
> the minimal implementation may not be a good "NEP 47" implementation.
>
> That does _not_ mean that I think you should pause and reconsider or
> even worry about pleasing me with good answers!  Just continue under
> whatever assumption you prefer and if it turns out that "minimal" won't
> work for NEP 47: no harm done!  We need a "minimal implementation" in
> any case.
>

Yes, I agree.


> Cheers,
>
> Sebastian
>
>
>
> [1] If SciPy needs an additional NumPy code path to keep support
> `object` arrays or other dtypes ? right now even complex ?, then the
> reader needs to be aware of that to make a decision if NEP 47 will
> actually help for their library.
>

Clearly. This is why we'd like to have some WIP PRs for other libraries,
actual code to review will be more helpful than only a proposal.


> Will AstroPy have to reimplement `astropy.units.Quantity` to be
> "standard conform" (is that even possible!?) before it can easily adopt
> it any of its API that currently works with `astropy.units.Quantity`?
>

I'm not sure if the question is well-defined, so let me answer both cases:

1. If the APIs in question require units, then there's no other
array/tensor types that have unit support, so those APIs accept *only*
Quantity. Adopting the standard isn't possible.

2. If the units are unnecessary/optional, then Quantity is not special and
can be treated exactly the same as a `numpy.ndarray`. We don't intend to
make any changes to how ndarray subclasses work, so if ndarray works with
that API after adoption of the standard then Quantity works too.

Cheers,
Ralf


>
> >
> > > 1. I still need clarity how a library is supposed to use this
> > > namespace
> > > when the user passes in a NumPy array (mentioned before).  The user
> > > must get back a NumPy array after all.  Maybe that is just a
> > > decorator,
> > > but it seems important.
> > >
> >
> > I agree that it will be a common pattern that libraries will accept
> > all
> > standard-compliant array types plus numpy.ndarray. And the output
> > array
> > type should match the input type. In Aaron's implementation the new
> > array
> > object has a numpy.ndarray as private attribute, so that's the
> > instance
> > that should be returned. A decorator seems like a sensible way to
> > handle
> > that. Or a simple utility function, something like `return
> > correct_arraytype(out)`.
> >
> > Either way, that pattern should be added to NEP 47. I don't see a
> > fundamental problem here, we just need to find the nicest UX for it.
> >
> > 3. For all other function, the same problem applies. You don't
> > actually
> > > have anything to fix NumPy promotion rules.  You could bake your
> > > own
> > > cake here for numeric types, but I am not sure, you might also need
> > > NEP
> > > 43 in all its promotion power to pull it off.
> > >
> >
> > This is probably the single most difficult question implementation-
> > wise.
> > Note that there are only numerical dtypes (plus boolean), so dealing
> > with
> > string, datetime, object or third-party dtypes is a non-issue.
> >
> > 4. The PR makes no attempt at handling binary operators in any way
> > > aside from greedily coercing the other operand.
> > >
> >
> > Agreed. This is the same point as (3) I think - how to handle dtype
> > promotion is the main open question.
> >
> >
> > > 5. What happens with a mix of array-likes or even array subclasses
> > > like
> > > `astropy.quantity`?
> > >
> >
> > Array-likes (e.g. list) should raise an exception, the NEP clearly
> > says "do
> > not accept array_like dtypes". This is what every other array/tensor
> > library already does.
> >
> > Array subclasses should work as expected, assuming they're valid
> > subclasses
> > and not things like np.matrix. Using Mypy will help avoid writing
> > more
> > subclasses that break the Liskov substitution principle. More
> > comments in
> >
> https://numpy.org/neps/nep-0047-array-api-standard.html#the-asarray-asanyarray-pattern
> >
> > Mixing two different types of arrays into a single function call
> > should
> > raise an exception. A design goal is: enable writing functions
> > `somefunc(x1, x2)` that work for any type of array where `x1, x2`
> > come from
> > the same library = so they're either the same type, or two types for
> > which
> > the library itself knows how to mix them. If x1 and x2 are from
> > different
> > libraries, this will raise an exception.
> >
> > To be clear, it is not intended that `np.array_api.somefunc(x_cupy)`
> > works
> > - this will raise an exception.
> >
> > Cheers,
> > Ralf
> >
> >
> >
> > >
> > > I don't think we have to figure out everything up-front, but I do
> > > think
> > > there are a few very fundamental questions still open, at least for
> > > me
> > > personally.
> > >
> > > Cheers,
> > >
> > > Sebastian
> > >
> > >
> > >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210315/936ae9ae/attachment.html>

From lee.johnston.100 at gmail.com  Tue Mar 16 14:17:42 2021
From: lee.johnston.100 at gmail.com (Lee Johnston)
Date: Tue, 16 Mar 2021 13:17:42 -0500
Subject: [Numpy-discussion] NEP 42 status
Message-ID: <CAO3muZTyVb4BStoEQe55oYabqChW8GN=74f+JmCKxa-qiRoXOg@mail.gmail.com>

Is the work on NEP 42 custom DTypes far enough along to experiment with?

Lee
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210316/1f05ef24/attachment.html>

From sebastian at sipsolutions.net  Tue Mar 16 17:10:47 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 16 Mar 2021 16:10:47 -0500
Subject: [Numpy-discussion] NEP 42 status
In-Reply-To: <CAO3muZTyVb4BStoEQe55oYabqChW8GN=74f+JmCKxa-qiRoXOg@mail.gmail.com>
References: <CAO3muZTyVb4BStoEQe55oYabqChW8GN=74f+JmCKxa-qiRoXOg@mail.gmail.com>
Message-ID: <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>

On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote:
> Is the work on NEP 42 custom DTypes far enough along to experiment
> with?
> 

TL;DR:  Its not quite ready, but if we work together I think we could
experiment a fair bit.  Mainly ufuncs are still limited (though not
quite completely missing).  The main problem is that we need to find a
way to expose the currently private API.

I would be happy to discuss this also in a call.


** The long story: **

There is one more PR related to casting, for which merge should be
around the corner. And which would bring a lot bang to such an
experiment:

https://github.com/numpy/numpy/pull/18398


At that point, the new machinery supports (or is used for):

* Array-coercion: `np.array([your_scalar])` or
  `np.array([1], dtype=your_dtype)`.

* Casting (practically full support).

* UFuncs do not quite work. But short of writing `np.add(arr1, arr2)`
  with your DType involved, you can try a whole lot. (see below)

* Promotion `np.result_type` should work very soon, but probably isn't
  is not very relevant anyway until ufuncs are fully implemented.

That should allow you to do a lot of good experimentation, but due to
the ufunc limitation, maybe not well on "existing" python code.


The long story about limitations is:

We are missing exposure of the new public API.  I think I should be
able to provide a solution for this pretty quickly, but it might
require working of a NumPy branch.  (I will write another email about
it, hopefully we can find a better solution.)


Limitations for UFuncs:  UFuncs are the next big project, so to try it
fully you will need some patience, unfortunately.

But, there is some good news!  You can write most of the "ufunc"
already, you just can't "register" it.
So what I can already offer you is a "DType-specific UFunc", e.g.:

   unit_dtype_multiply(np.array([1.], dtype=Float64UnitDType("m")),
                       np.array([2.], dtype=Float64UnitDtype("s")))

And get out `np.array([2.], dtype=Float64UnitDtype("m s"))`.

But you can't write `np.multiple(arr1, arr2)` or `arr1 * arr2` yet. 
Both registration and "promotion" logic are missing.

I admit promotion may be one of the trickiest things, but trying this a
bit might help with getting a clearer picture for promotion as well.


The main last limitation is that I did not replace or create "fallback"
solutions and/or replacement for the legacy `dtype->f-><slots>` yet. 
This is not a serious limitation for experimentation, though.  It might
even make sense to keep some of them around and replace them slowly.


And of course, all the small issues/limitations that are not fixed
because nobody tried yet...


I hope this doesn't scare you away, or at least not for long :/.  It
could be very useful to start experimentation soon to push things
forward a bit quicker.  And I really want to have at least an
experimental version in NumPy 1.21.

Cheers,

Sebastian


> Lee
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210316/7f598332/attachment.sig>

From sebastian at sipsolutions.net  Tue Mar 16 17:38:49 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 16 Mar 2021 16:38:49 -0500
Subject: [Numpy-discussion] Exposing experimental C-API for DTypes
Message-ID: <5facef5705009f1f1fed40fd5579bc0a828aff63.camel@sipsolutions.net>

Hi all,

For DTypes, it may soon make sense to expose API publicly for
testing/experimentation.  But, right now I don't really want to get
roped into discussing API details too much and slowing down potential
revamps.
Do we have any idea for exposing an "experimental" API?


The first option would be "symbols" with an underscore in the name and
an understanding that using them might just break if you don't use the
exact version you wrote the code for and compiled with (i.e. no API/ABI
guarantee).
My current expectation is that everyone will be appalled by such a
plan...


For a single, simple project which would end up as a test, similar to
the `rational` tests, we could work in NumPy itself. That is fine, but
fairly strictly constrained...


Of course I can make a "branch" of NumPy that exports more API, but
that doesn't feel great either, it seems a bit clunky.


The last idea I have right now is a bit convoluted but safe: We add a
private python function:

    np.core._multiarray_umath.get_new_dtype_api(api_version)

and a corresponding header (potentially outside of NumPy).  The header
would include an:

    import_new_dtype_api()

macro/function that leverages the private "Python" function.  To import
the API (much like `import_array()` works).
Since it would use its own header, it could do strict version checks.
And since it would have to "ask numpy", NumPy could require an
environment variable to be set and/or print out a warning.


Am I missing some obvious solution?  Aside from "be patient and get it
right the first time"?

Cheers,

Sebastian  
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210316/51759b85/attachment.sig>

From sebastian at sipsolutions.net  Tue Mar 16 18:15:04 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 16 Mar 2021 17:15:04 -0500
Subject: [Numpy-discussion] NumPy Community Meeting Wednesday (no DST: for
 those in the US, one hour later)
Message-ID: <bbc7b8eb70a39fb634e3434a272f230ef182b87a.camel@sipsolutions.net>

Hi all,

There will be a NumPy Community meeting Wednesday March 3rd at
20:00 UTC. Everyone is invited and encouraged to
join in and edit the work-in-progress meeting topics and notes at:

https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian


PS: As the subject says, we will stay on UTC 20:00, so for those in the
US and anyone else who had daylight saving times switches the time will
have shifted.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210316/d8da95ea/attachment.sig>

From melissawm at gmail.com  Tue Mar 16 20:17:24 2021
From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=)
Date: Tue, 16 Mar 2021 21:17:24 -0300
Subject: [Numpy-discussion] Google Season of Docs 2021
Message-ID: <CAC7J6VaX6hz2Ka2di8M-eKDgXrZ3Xefrti8-TijVbymGfSoq6w@mail.gmail.com>

Hello, folks!

NumPy is hoping to participate again in Google Season of Docs this year,
and we have a couple of project ideas listed here:
https://github.com/numpy/numpy/wiki/Google-Season-of-Docs-2021-Project-Ideas

This year, GSoD has a different structure: we must choose only one project
idea (ideally, the one prospective technical writers are most interested
in) and submit a sort of grant proposal (details are here:
https://developers.google.com/season-of-docs/docs/admin-guide). If we are
selected, we can hire up to 2 technical writers to work on our project,
depending on the budget allocated to us by Google.

The final proposal must be submitted by March 26 (you can find the complete
timeline here https://developers.google.com/season-of-docs/docs/timeline).

Feedback and input is appreciated, and please feel free to share with
technical writers who may be interested.

Cheers,

- Melissa
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210316/01be38c8/attachment-0001.html>

From lee.johnston.100 at gmail.com  Wed Mar 17 08:56:41 2021
From: lee.johnston.100 at gmail.com (Lee Johnston)
Date: Wed, 17 Mar 2021 07:56:41 -0500
Subject: [Numpy-discussion] NEP 42 status
In-Reply-To: <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>
References: <CAO3muZTyVb4BStoEQe55oYabqChW8GN=74f+JmCKxa-qiRoXOg@mail.gmail.com>
 <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>
Message-ID: <CAO3muZSyB0sDZ8eME7yRXL9mU1MOmW_oGR42o79HL-=gSno-2Q@mail.gmail.com>

I am willing to wait for PR #18398 as I am mainly interested at this point
in the process of developing a new DType and then array coercion and
casting.

Does _rational_tests.c.src
<https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/_rational_tests.c.src>
illustrate
the new DType?

On Tue, Mar 16, 2021 at 4:11 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote:
> > Is the work on NEP 42 custom DTypes far enough along to experiment
> > with?
> >
>
> TL;DR:  Its not quite ready, but if we work together I think we could
> experiment a fair bit.  Mainly ufuncs are still limited (though not
> quite completely missing).  The main problem is that we need to find a
> way to expose the currently private API.
>
> I would be happy to discuss this also in a call.
>
>
> ** The long story: **
>
> There is one more PR related to casting, for which merge should be
> around the corner. And which would bring a lot bang to such an
> experiment:
>
> https://github.com/numpy/numpy/pull/18398
>
>
> At that point, the new machinery supports (or is used for):
>
> * Array-coercion: `np.array([your_scalar])` or
>   `np.array([1], dtype=your_dtype)`.
>
> * Casting (practically full support).
>
> * UFuncs do not quite work. But short of writing `np.add(arr1, arr2)`
>   with your DType involved, you can try a whole lot. (see below)
>
> * Promotion `np.result_type` should work very soon, but probably isn't
>   is not very relevant anyway until ufuncs are fully implemented.
>
> That should allow you to do a lot of good experimentation, but due to
> the ufunc limitation, maybe not well on "existing" python code.
>
>
> The long story about limitations is:
>
> We are missing exposure of the new public API.  I think I should be
> able to provide a solution for this pretty quickly, but it might
> require working of a NumPy branch.  (I will write another email about
> it, hopefully we can find a better solution.)
>
>
> Limitations for UFuncs:  UFuncs are the next big project, so to try it
> fully you will need some patience, unfortunately.
>
> But, there is some good news!  You can write most of the "ufunc"
> already, you just can't "register" it.
> So what I can already offer you is a "DType-specific UFunc", e.g.:
>
>    unit_dtype_multiply(np.array([1.], dtype=Float64UnitDType("m")),
>                        np.array([2.], dtype=Float64UnitDtype("s")))
>
> And get out `np.array([2.], dtype=Float64UnitDtype("m s"))`.
>
> But you can't write `np.multiple(arr1, arr2)` or `arr1 * arr2` yet.
> Both registration and "promotion" logic are missing.
>
> I admit promotion may be one of the trickiest things, but trying this a
> bit might help with getting a clearer picture for promotion as well.
>
>
> The main last limitation is that I did not replace or create "fallback"
> solutions and/or replacement for the legacy `dtype->f-><slots>` yet.
> This is not a serious limitation for experimentation, though.  It might
> even make sense to keep some of them around and replace them slowly.
>
>
> And of course, all the small issues/limitations that are not fixed
> because nobody tried yet...
>
>
>
> I hope this doesn't scare you away, or at least not for long :/.  It
> could be very useful to start experimentation soon to push things
> forward a bit quicker.  And I really want to have at least an
> experimental version in NumPy 1.21.
>
> Cheers,
>
> Sebastian
>
>
> > Lee
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210317/51418483/attachment.html>

From sebastian at sipsolutions.net  Wed Mar 17 18:12:32 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 17 Mar 2021 17:12:32 -0500
Subject: [Numpy-discussion] NEP 42 status
In-Reply-To: <CAO3muZSyB0sDZ8eME7yRXL9mU1MOmW_oGR42o79HL-=gSno-2Q@mail.gmail.com>
References: <CAO3muZTyVb4BStoEQe55oYabqChW8GN=74f+JmCKxa-qiRoXOg@mail.gmail.com>
 <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>
 <CAO3muZSyB0sDZ8eME7yRXL9mU1MOmW_oGR42o79HL-=gSno-2Q@mail.gmail.com>
Message-ID: <2e803e9c3091d77f819036fd1ffb8d395ef51d71.camel@sipsolutions.net>

On Wed, 2021-03-17 at 07:56 -0500, Lee Johnston wrote:
> I am willing to wait for PR #18398 as I am mainly interested at this
> point
> in the process of developing a new DType and then array coercion and
> casting.
> 
> Does _rational_tests.c.src
> < 
> https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/_rational_tests.c.src
> >
> illustrate
> the new DType?
> 

Thanks for joining the communit call!

The `rational_tests` are still using the old API and unfortunately
there is no great example of the new API, because the API is not public
yet and dealing with "old dtypes" in NumPy obfuscates it a bit.


Let me try to summarize my take-away from discussion and next steps:

As discussed, I think we agreed on the idea of exposing the new API
"experimentally" with the following mechanism:

1. We add a new header, distinct from the normal NumPy headers.

2. This header will use private Python API to achieve:
   - Strict version ABI/API requirements. If the code is updated in
     NumPy we will increase this version. Possible very often.
     A mismatch will cause a strict failure requiring the user to
     "keep up" with the NumPy development.
   - NumPy will prohibit exporting the public API unless a
     `NUMPY_EXPERIMENTAL_DTYPE_API=1` environment variable is set.
     This will hopefully prevent the use in production code even if we
     make a release.

3. In parallel, I will create a small "toy" DType based on that
   experimental API.  Probably in a separate repo (in the NumPy
   organization?).


Anyone using the API, should expect bugs, crashes and changes for a
while.  But hopefully will only require small code modifications when
the API becomes public.

My personal plan for a toy example is currently a "scaled integer".
E.g. a uint8 where you can set a range `[min_double, max_double]` that
it maps to (which makes the DType "parametric").
We discussed some other examples, such as a "modernized" rational
DType, that could be nice as well, lets see...

Units would be a great experiment, but seem a bit complex to me (I
don't know units well though). So to keep it baby steps :) I would aim
for doing the above and then we can experiment on Units together!


Since it came up:  I agree that a Python API would be great to have. It
is something I firmly kept on the back-burner...  It should not be very
hard (if rudimentary), but unless it would help experiments a lot, I
would tend to leave it on the back-burner for now.

Cheers,

Sebastian


[1]  Maybe a `uint8` storage that maps to evenly spaced values on a
parametric range `[double_min, double_max]`.  That seems like a good
trade-off in complexity.


> On Tue, Mar 16, 2021 at 4:11 PM Sebastian Berg <
> sebastian at sipsolutions.net>
> wrote:
> 
> > On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote:
> > > Is the work on NEP 42 custom DTypes far enough along to
> > > experiment
> > > with?
> > > 
> > 
> > TL;DR:? Its not quite ready, but if we work together I think we
> > could
> > experiment a fair bit.? Mainly ufuncs are still limited (though not
> > quite completely missing).? The main problem is that we need to
> > find a
> > way to expose the currently private API.
> > 
> > I would be happy to discuss this also in a call.
> > 
> > 
> > ** The long story: **
> > 
> > There is one more PR related to casting, for which merge should be
> > around the corner. And which would bring a lot bang to such an
> > experiment:
> > 
> > https://github.com/numpy/numpy/pull/18398
> > 
> > 
> > At that point, the new machinery supports (or is used for):
> > 
> > * Array-coercion: `np.array([your_scalar])` or
> > ? `np.array([1], dtype=your_dtype)`.
> > 
> > * Casting (practically full support).
> > 
> > * UFuncs do not quite work. But short of writing `np.add(arr1,
> > arr2)`
> > ? with your DType involved, you can try a whole lot. (see below)
> > 
> > * Promotion `np.result_type` should work very soon, but probably
> > isn't
> > ? is not very relevant anyway until ufuncs are fully implemented.
> > 
> > That should allow you to do a lot of good experimentation, but due
> > to
> > the ufunc limitation, maybe not well on "existing" python code.
> > 
> > 
> > The long story about limitations is:
> > 
> > We are missing exposure of the new public API.? I think I should be
> > able to provide a solution for this pretty quickly, but it might
> > require working of a NumPy branch.? (I will write another email
> > about
> > it, hopefully we can find a better solution.)
> > 
> > 
> > Limitations for UFuncs:? UFuncs are the next big project, so to try
> > it
> > fully you will need some patience, unfortunately.
> > 
> > But, there is some good news!? You can write most of the "ufunc"
> > already, you just can't "register" it.
> > So what I can already offer you is a "DType-specific UFunc", e.g.:
> > 
> > ?? unit_dtype_multiply(np.array([1.], dtype=Float64UnitDType("m")),
> > ?????????????????????? np.array([2.], dtype=Float64UnitDtype("s")))
> > 
> > And get out `np.array([2.], dtype=Float64UnitDtype("m s"))`.
> > 
> > But you can't write `np.multiple(arr1, arr2)` or `arr1 * arr2` yet.
> > Both registration and "promotion" logic are missing.
> > 
> > I admit promotion may be one of the trickiest things, but trying
> > this a
> > bit might help with getting a clearer picture for promotion as
> > well.
> > 
> > 
> > The main last limitation is that I did not replace or create
> > "fallback"
> > solutions and/or replacement for the legacy `dtype->f-><slots>`
> > yet.
> > This is not a serious limitation for experimentation, though.? It
> > might
> > even make sense to keep some of them around and replace them
> > slowly.
> > 
> > 
> > And of course, all the small issues/limitations that are not fixed
> > because nobody tried yet...
> > 
> > 
> > 
> > I hope this doesn't scare you away, or at least not for long :/.?
> > It
> > could be very useful to start experimentation soon to push things
> > forward a bit quicker.? And I really want to have at least an
> > experimental version in NumPy 1.21.
> > 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > > Lee
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210317/df6e2cb4/attachment.sig>

From hameerabbasi at yahoo.com  Fri Mar 19 07:23:49 2021
From: hameerabbasi at yahoo.com (Hameer Abbasi)
Date: Fri, 19 Mar 2021 12:23:49 +0100
Subject: [Numpy-discussion] PyData/Sparse 0.12.0 Release Announcement
References: <12f1adcc-fe1e-408e-a2d0-784426599f31.ref@Canary>
Message-ID: <12f1adcc-fe1e-408e-a2d0-784426599f31@Canary>

Apologies in advance for the cross-post!

I?m happy to announce the release of PyData/Sparse 0.12.0!

PyData/Sparse provides sparse arrays with a NumPy-like API for the PyData ecosystem.

This is a large release with GCXS support, preliminary CSR/CSC support and extensions to DOK, as well as bugfixes.

Changelog: https://sparse.pydata.org/en/stable/changelog.html
Documentation: https://sparse.pydata.org/
Source: https://github.com/pydata/sparse/

Best regards,
Hameer Abbasi

--
Sent from Canary (https://canarymail.io)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210319/f3d14a96/attachment.html>

From sebastian at sipsolutions.net  Tue Mar 23 23:57:08 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 23 Mar 2021 22:57:08 -0500
Subject: [Numpy-discussion] NumPy Development Meeting Wednesday - Triage
 Focus
Message-ID: <df24317d354e04db81e293b5c736655cb71e6bf2.camel@sipsolutions.net>

Hi all,

Our bi-weekly triage-focused NumPy development meeting is Wednesday,
March 10th at 11 am Pacific Time (18:00 UTC).
Everyone is invited to join in and edit the work-in-progress meeting
topics and notes:
https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

I encourage everyone to notify us of issues or PRs that you feel should
be prioritized, discussed, or reviewed.

Best regards

Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210323/86d9f2a7/attachment.sig>

From diagonaldevice at gmail.com  Wed Mar 24 20:57:17 2021
From: diagonaldevice at gmail.com (Michael Lamparski)
Date: Wed, 24 Mar 2021 20:57:17 -0400
Subject: [Numpy-discussion] Programmatically contracting multiple tensors
In-Reply-To: <CAMEWA4O13436d+uJm3NS01vpMA+POQ88A+KtREyr90CsKMxwgA@mail.gmail.com>
References: <CAGfQJGHxGgh+XgnoSif6sUXoxfJEL35DQQL-6itipD5vm2QeDw@mail.gmail.com>
 <CAL1kJvCu7feKE4VvjCuB+A7_a8VRTaMarAo37CNdxwjB=Jf_EA@mail.gmail.com>
 <CAMEWA4O13436d+uJm3NS01vpMA+POQ88A+KtREyr90CsKMxwgA@mail.gmail.com>
Message-ID: <CAGfQJGHf_5S-45QUGdETrRScy4sndpS2BAAUWDtuZty49ALrTg@mail.gmail.com>

Hi, I must thank y'all for the exceptionally fast responses (and apologize
for my own tragically slow response!)

On Sat, Mar 13, 2021 at 1:32 AM Eric Wieser <wieser.eric+numpy at gmail.com>
wrote:
> Einsum has a secret integer argument format that appears in the Examples
section of the
> `np.einsum` docs, but appears not to be mentioned at all in the parameter
listing.

Ah, yes, this is precisely the sort of API I was hoping for!  I found it
pretty easy to use, but here's a snippet that solves my original problem
for those wondering:

https://github.com/exphp-share/gpaw-raman-script/blob/f98fe14cd6/script/symmetry.py#L442-L471


On Fri, Mar 12, 2021 at 8:09 PM Andras Deak <deak.andris at gmail.com> wrote:
> But I'm not sure _where_ this could be highlighted among the
> parameters; after all this is all covered by the *operands parameter.

The parameter list is definitely one of the places I checked most
closely, and having something there would have helped.  I'd say that,
technically, this also overlaps with the subscripts argument, which now
holds the first array, and I feel like that may be the best place to put
something.  For instance, a short paragraph could be added to the end of
'subscripts':

"einsum also has an alternative interface that uses integer labels for
axes, in which case the subscripts argument is not present.  This is
documented below." (with a link)

(the idea I'm trying to capture here is to avoid creating any specific (or
potentially wrong) picture of how the arguments look in the alternate
signature, more or less forcing the reader to follow a link to where it is
more easily described somewhere outside the constraints of parameter-based
documentation)

---
Michael
<https://github.com/exphp-share/gpaw-raman-script/blob/f98fe14cd60d78dbffd8f07f078696d3aa96ab07/script/symmetry.py#L442-L471>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210324/49d045cc/attachment.html>

From tyler.je.reddy at gmail.com  Wed Mar 24 22:13:19 2021
From: tyler.je.reddy at gmail.com (Tyler Reddy)
Date: Wed, 24 Mar 2021 20:13:19 -0600
Subject: [Numpy-discussion] ANN: SciPy 1.6.2
Message-ID: <CAHPuU_bbFev0NzsrVefg5WEW2kTaUdBYJuAPRBSyQyOVoV1_zQ@mail.gmail.com>

Hi all,

On behalf of the SciPy development team I'm pleased to announce
the release of SciPy 1.6.2, which is a bug fix release.

Sources and binary wheels can be found at:
https://pypi.org/project/scipy/
and at: https://github.com/scipy/scipy/releases/tag/v1.6.2
<https://github.com/scipy/scipy/releases/tag/v1.6.1>

One of a few ways to install this release with pip:

pip install scipy==1.6.2

=====================
SciPy 1.6.2 Release Notes
=====================

SciPy 1.6.2 is a bug-fix release with no new features
compared to 1.6.1. This is also the first SciPy release
to place upper bounds on some dependencies to improve
the long-term repeatability of source builds.

Authors
======

* Pradipta Ghosh +
* Tyler Reddy
* Ralf Gommers
* Martin K. Scherer +
* Robert Uhl
* Warren Weckesser

A total of 6 people contributed to this release.
People with a "+" by their names contributed a patch for the first time.
This list of names is automatically generated, and may not be fully
complete.

Issues closed for 1.6.2
------------------------------

* `#13512 <https://github.com/scipy/scipy/issues/13512>`__:
\`stats.gaussian_kde.evaluate\` broken on S390X
* `#13584 <https://github.com/scipy/scipy/issues/13584>`__:
rotation._compute_euler_from_matrix() creates an array with negative...
* `#13585 <https://github.com/scipy/scipy/issues/13585>`__: Behavior change
in coo_matrix when dtype=None
* `#13686 <https://github.com/scipy/scipy/issues/13686>`__: delta0 argument
of scipy.odr.ODR() ignored

Pull requests for 1.6.2
------------------------------

* `#12862 <https://github.com/scipy/scipy/pull/12862>`__: REL: put upper
bounds on versions of dependencies
* `#13575 <https://github.com/scipy/scipy/pull/13575>`__: BUG: fix
\`gaussian_kernel_estimate\` on S390X
* `#13586 <https://github.com/scipy/scipy/pull/13586>`__: BUG: sparse:
Create a utility function \`getdata\`
* `#13598 <https://github.com/scipy/scipy/pull/13598>`__: MAINT, BUG:
enforce contiguous layout for output array in Rotation.as_euler
* `#13687 <https://github.com/scipy/scipy/pull/13687>`__: BUG: fix
scipy.odr to consider given delta0 argument

Checksums
=========

MD5
~~~

fc81d43879a28270d593aaea37c74ff8
 scipy-1.6.2-cp37-cp37m-macosx_10_9_x86_64.whl
9213533bfd3c2f1563d169009c39825c  scipy-1.6.2-cp37-cp37m-manylinux1_i686.whl
2ddd03b89efdb1619fa995da7b83aa6f
 scipy-1.6.2-cp37-cp37m-manylinux1_x86_64.whl
d378f725958bd6a83db7ef23e8659762
 scipy-1.6.2-cp37-cp37m-manylinux2014_aarch64.whl
87bc2771b8a8ab1f10168b1563300415  scipy-1.6.2-cp37-cp37m-win32.whl
861dab18fe41e82c08c8f585f2710545  scipy-1.6.2-cp37-cp37m-win_amd64.whl
d2e2002b526adeebf94489aa95031f54
 scipy-1.6.2-cp38-cp38-macosx_10_9_x86_64.whl
2dc36bfbe3938c492533604aba002c17  scipy-1.6.2-cp38-cp38-manylinux1_i686.whl
0114de2118d41f9440cf86fdd67434fc
 scipy-1.6.2-cp38-cp38-manylinux1_x86_64.whl
ede6db56b1bf0a7fed0c75acac7dcb85
 scipy-1.6.2-cp38-cp38-manylinux2014_aarch64.whl
191636ac3276da0ee9fd263b47927b73  scipy-1.6.2-cp38-cp38-win32.whl
8bdf7ab041b9115b379f043bb02d905f  scipy-1.6.2-cp38-cp38-win_amd64.whl
608c82b227b6077d9a7871ac6278e64d
 scipy-1.6.2-cp39-cp39-macosx_10_9_x86_64.whl
4c0313b2cccc85666b858ffd692a3c87  scipy-1.6.2-cp39-cp39-manylinux1_i686.whl
92da8ffe165034dbbe5f098d0ed58aec
 scipy-1.6.2-cp39-cp39-manylinux1_x86_64.whl
b4b225fb1deeaaf0eda909fdd3bd6ca6
 scipy-1.6.2-cp39-cp39-manylinux2014_aarch64.whl
662969220eadbb6efec99030e4d00268  scipy-1.6.2-cp39-cp39-win32.whl
f19186d6d91c7e37000e9f6ccd9b9b60  scipy-1.6.2-cp39-cp39-win_amd64.whl
cbcb9b39bd9d877ad3deeccc7c37bb7f  scipy-1.6.2.tar.gz
b56e705c653ad808a9725dfe840d1258  scipy-1.6.2.tar.xz
6f615549670cd3d312dc9e4359d2436a  scipy-1.6.2.zip

SHA256
~~~~~~

77f7a057724545b7e097bfdca5c6006bed8580768cd6621bb1330aedf49afba5
 scipy-1.6.2-cp37-cp37m-macosx_10_9_x86_64.whl
e547f84cd52343ac2d56df0ab08d3e9cc202338e7d09fafe286d6c069ddacb31
 scipy-1.6.2-cp37-cp37m-manylinux1_i686.whl
bc52d4d70863141bb7e2f8fd4d98e41d77375606cde50af65f1243ce2d7853e8
 scipy-1.6.2-cp37-cp37m-manylinux1_x86_64.whl
adf7cee8e5c92b05f2252af498f77c7214a2296d009fc5478fc432c2f8fb953b
 scipy-1.6.2-cp37-cp37m-manylinux2014_aarch64.whl
e3e9742bad925c421d39e699daa8d396c57535582cba90017d17f926b61c1552
 scipy-1.6.2-cp37-cp37m-win32.whl
ffdfb09315896c6e9ac739bb6e13a19255b698c24e6b28314426fd40a1180822
 scipy-1.6.2-cp37-cp37m-win_amd64.whl
6ca1058cb5bd45388041a7c3c11c4b2bd58867ac9db71db912501df77be2c4a4
 scipy-1.6.2-cp38-cp38-macosx_10_9_x86_64.whl
993c86513272bc84c451349b10ee4376652ab21f312b0554fdee831d593b6c02
 scipy-1.6.2-cp38-cp38-manylinux1_i686.whl
37f4c2fb904c0ba54163e03993ce3544c9c5cde104bcf90614f17d85bdfbb431
 scipy-1.6.2-cp38-cp38-manylinux1_x86_64.whl
96620240b393d155097618bcd6935d7578e85959e55e3105490bbbf2f594c7ad
 scipy-1.6.2-cp38-cp38-manylinux2014_aarch64.whl
03f1fd3574d544456325dae502facdf5c9f81cbfe12808a5e67a737613b7ba8c
 scipy-1.6.2-cp38-cp38-win32.whl
0c81ea1a95b4c9e0a8424cf9484b7b8fa7ef57169d7bcc0dfcfc23e3d7c81a12
 scipy-1.6.2-cp38-cp38-win_amd64.whl
c1d3f771c19af00e1a36f749bd0a0690cc64632783383bc68f77587358feb5a4
 scipy-1.6.2-cp39-cp39-macosx_10_9_x86_64.whl
50e5bcd9d45262725e652611bb104ac0919fd25ecb78c22f5282afabd0b2e189
 scipy-1.6.2-cp39-cp39-manylinux1_i686.whl
816951e73d253a41fa2fd5f956f8e8d9ac94148a9a2039e7db56994520582bf2
 scipy-1.6.2-cp39-cp39-manylinux1_x86_64.whl
1fba8a214c89b995e3721670e66f7053da82e7e5d0fe6b31d8e4b19922a9315e
 scipy-1.6.2-cp39-cp39-manylinux2014_aarch64.whl
e89091e6a8e211269e23f049473b2fde0c0e5ae0dd5bd276c3fc91b97da83480
 scipy-1.6.2-cp39-cp39-win32.whl
d744657c27c128e357de2f0fd532c09c84cd6e4933e8232895a872e67059ac37
 scipy-1.6.2-cp39-cp39-win_amd64.whl
e9da33e21c9bc1b92c20b5328adb13e5f193b924c9b969cd700c8908f315aa59
 scipy-1.6.2.tar.gz
8fadc443044396283c48191d48e4e07a3c3b6e2ae320b1a56e76bb42929e84d2
 scipy-1.6.2.tar.xz
2af283054d91865336b4579aa91f9e59d648d436cf561f96d4692008f795c750
 scipy-1.6.2.zip
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210324/71d8be5c/attachment.html>

From sebastian at sipsolutions.net  Thu Mar 25 18:27:06 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 25 Mar 2021 17:27:06 -0500
Subject: [Numpy-discussion] 
 =?utf-8?q?NEP_42_status_=E2=80=93_Store_quant?=
 =?utf-8?q?ity_in_a_NumPy_array_and_convert_it_=3A=29?=
In-Reply-To: <2e803e9c3091d77f819036fd1ffb8d395ef51d71.camel@sipsolutions.net>
References: <CAO3muZTyVb4BStoEQe55oYabqChW8GN=74f+JmCKxa-qiRoXOg@mail.gmail.com>
 <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>
 <CAO3muZSyB0sDZ8eME7yRXL9mU1MOmW_oGR42o79HL-=gSno-2Q@mail.gmail.com>
 <2e803e9c3091d77f819036fd1ffb8d395ef51d71.camel@sipsolutions.net>
Message-ID: <9f63a292b2811bed6f0a6c27346879ae62b7432e.camel@sipsolutions.net>

On Wed, 2021-03-17 at 17:12 -0500, Sebastian Berg wrote:
> On Wed, 2021-03-17 at 07:56 -0500, Lee Johnston wrote:

<snip>

> 3. In parallel, I will create a small "toy" DType based on that
> ?? experimental API.? Probably in a separate repo (in the NumPy
> ?? organization?).
> 

So this is started. What you need to do right now if you want to try is
work of this branch in NumPy:

     https://github.com/numpy/numpy/compare/main...seberg:experimental-dtype-api

Install NumPy with `NPY_USE_NEW_CASTINGIMPL=1 python -mpip install .`
or your favorite alternative.
(The `NPY_USE_NEW_CASTINGIMPL=1` should be unnecessary very soon,
working of a branch and not "main" will hopefully also be unnecessary
soon.)


Then fetch: https://github.com/seberg/experimental_user_dtypes
and install it as well in the same environment.


After that, you can jump through the hoop of setting:

    NUMPY_EXPERIMENTAL_DTYPE_API=1

And you can enjoy these type of examples (while expecting hard crashes
when going too far beyond!):

    from experimental_user_dtypes import float64unit as u
    import numpy as np

    F = np.array([u.Quantity(70., "Fahrenheit")])
    C = F.astype(u.Float64UnitDType("Celsius"))
    print(repr(C))
    # array([21.11111111111115 ?C], dtype='Float64UnitDType(degC)')

    m = np.array([u.Quantity(5., "m")])
    m_squared = u.multiply(m, m)
    print(repr(m_squared))
    # array([25.0 m**2], dtype='Float64UnitDType(m**2)')

    # Or conversion to SI the long route:
    pc = np.arange(5., dtype="float64").view(u.Float64UnitDType("pc"))
    pc.astype(pc.dtype.si())
    # array([0.0 m, 3.085677580962325e+16 m, 6.17135516192465e+16 m,
    #        9.257032742886974e+16 m, 1.23427103238493e+17 m],
    #       dtype='Float64UnitDType(m)')


Yes, the code has some horrible hacks around creating the DType, but
the basic mechanism i.e. "functions you need to implement" are not
expected to change lot.

Right now, it forces you to use and implement the scalar `u.Quantity`
and the code sample uses it. But you can also do:

    np.arange(3.).view(u.Float64UnitDType("m"))

I do have plans to "not have a scalar" so the 0-D result would still be
an array.  But that option doesn't exist yet (and right now the scalar
is used for printing).


(There is also a `string_equal` "ufunc-like" that works on "S" dtypes.)

Cheers,

Sebastian


PS: I need to figure out some details about how to create DTypes and
DType instances with regards to our stable ABI.  The current "solution"
is some weird subclassing hoops which are probably not good.

That is painful unfortunately and any ideas would be great :). 
Unfortunately, it requires a grasp around the C-API and metaclassing...


> 
> Anyone using the API, should expect bugs, crashes and changes for a
> while.? But hopefully will only require small code modifications when
> the API becomes public.
> 
> My personal plan for a toy example is currently a "scaled integer".
> E.g. a uint8 where you can set a range `[min_double, max_double]`
> that
> it maps to (which makes the DType "parametric").
> We discussed some other examples, such as a "modernized" rational
> DType, that could be nice as well, lets see...
> 
> Units would be a great experiment, but seem a bit complex to me (I
> don't know units well though). So to keep it baby steps :) I would
> aim
> for doing the above and then we can experiment on Units together!
> 
> 
> Since it came up:? I agree that a Python API would be great to have.
> It
> is something I firmly kept on the back-burner...? It should not be
> very
> hard (if rudimentary), but unless it would help experiments a lot, I
> would tend to leave it on the back-burner for now.
> 
> Cheers,
> 
> Sebastian
> 
> 
> [1]? Maybe a `uint8` storage that maps to evenly spaced values on a
> parametric range `[double_min, double_max]`.? That seems like a good
> trade-off in complexity.
> 
> 
> 
> > On Tue, Mar 16, 2021 at 4:11 PM Sebastian Berg <
> > sebastian at sipsolutions.net>
> > wrote:
> > 
> > > On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote:
> > > > Is the work on NEP 42 custom DTypes far enough along to
> > > > experiment
> > > > with?
> > > > 
> > > 
> > > TL;DR:? Its not quite ready, but if we work together I think we
> > > could
> > > experiment a fair bit.? Mainly ufuncs are still limited (though
> > > not
> > > quite completely missing).? The main problem is that we need to
> > > find a
> > > way to expose the currently private API.
> > > 
> > > I would be happy to discuss this also in a call.
> > > 
> > > 
> > > ** The long story: **
> > > 
> > > There is one more PR related to casting, for which merge should
> > > be
> > > around the corner. And which would bring a lot bang to such an
> > > experiment:
> > > 
> > > https://github.com/numpy/numpy/pull/18398
> > > 
> > > 
> > > At that point, the new machinery supports (or is used for):
> > > 
> > > * Array-coercion: `np.array([your_scalar])` or
> > > ? `np.array([1], dtype=your_dtype)`.
> > > 
> > > * Casting (practically full support).
> > > 
> > > * UFuncs do not quite work. But short of writing `np.add(arr1,
> > > arr2)`
> > > ? with your DType involved, you can try a whole lot. (see below)
> > > 
> > > * Promotion `np.result_type` should work very soon, but probably
> > > isn't
> > > ? is not very relevant anyway until ufuncs are fully implemented.
> > > 
> > > That should allow you to do a lot of good experimentation, but
> > > due
> > > to
> > > the ufunc limitation, maybe not well on "existing" python code.
> > > 
> > > 
> > > The long story about limitations is:
> > > 
> > > We are missing exposure of the new public API.? I think I should
> > > be
> > > able to provide a solution for this pretty quickly, but it might
> > > require working of a NumPy branch.? (I will write another email
> > > about
> > > it, hopefully we can find a better solution.)
> > > 
> > > 
> > > Limitations for UFuncs:? UFuncs are the next big project, so to
> > > try
> > > it
> > > fully you will need some patience, unfortunately.
> > > 
> > > But, there is some good news!? You can write most of the "ufunc"
> > > already, you just can't "register" it.
> > > So what I can already offer you is a "DType-specific UFunc",
> > > e.g.:
> > > 
> > > ?? unit_dtype_multiply(np.array([1.],
> > > dtype=Float64UnitDType("m")),
> > > ?????????????????????? np.array([2.],
> > > dtype=Float64UnitDtype("s")))
> > > 
> > > And get out `np.array([2.], dtype=Float64UnitDtype("m s"))`.
> > > 
> > > But you can't write `np.multiple(arr1, arr2)` or `arr1 * arr2`
> > > yet.
> > > Both registration and "promotion" logic are missing.
> > > 
> > > I admit promotion may be one of the trickiest things, but trying
> > > this a
> > > bit might help with getting a clearer picture for promotion as
> > > well.
> > > 
> > > 
> > > The main last limitation is that I did not replace or create
> > > "fallback"
> > > solutions and/or replacement for the legacy `dtype->f-><slots>`
> > > yet.
> > > This is not a serious limitation for experimentation, though.? It
> > > might
> > > even make sense to keep some of them around and replace them
> > > slowly.
> > > 
> > > 
> > > And of course, all the small issues/limitations that are not
> > > fixed
> > > because nobody tried yet...
> > > 
> > > 
> > > 
> > > I hope this doesn't scare you away, or at least not for long :/.?
> > > It
> > > could be very useful to start experimentation soon to push things
> > > forward a bit quicker.? And I really want to have at least an
> > > experimental version in NumPy 1.21.
> > > 
> > > Cheers,
> > > 
> > > Sebastian
> > > 
> > > 
> > > > Lee
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210325/ce28ade1/attachment.sig>

From lee.johnston.100 at gmail.com  Fri Mar 26 10:44:42 2021
From: lee.johnston.100 at gmail.com (Lee Johnston)
Date: Fri, 26 Mar 2021 09:44:42 -0500
Subject: [Numpy-discussion] 
	=?utf-8?q?NEP_42_status_=E2=80=93_Store_quant?=
	=?utf-8?q?ity_in_a_NumPy_array_and_convert_it_=3A=29?=
In-Reply-To: <9f63a292b2811bed6f0a6c27346879ae62b7432e.camel@sipsolutions.net>
References: <CAO3muZTyVb4BStoEQe55oYabqChW8GN=74f+JmCKxa-qiRoXOg@mail.gmail.com>
 <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>
 <CAO3muZSyB0sDZ8eME7yRXL9mU1MOmW_oGR42o79HL-=gSno-2Q@mail.gmail.com>
 <2e803e9c3091d77f819036fd1ffb8d395ef51d71.camel@sipsolutions.net>
 <9f63a292b2811bed6f0a6c27346879ae62b7432e.camel@sipsolutions.net>
Message-ID: <CAO3muZSiFBHXxJHLM=HU+gqpd48o5zqHh1+QCTTLRC04-AqSFg@mail.gmail.com>

Thanks Sebastian, I have your example running and will start experimenting
with DType.

Lee

On Thu, Mar 25, 2021 at 5:32 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Wed, 2021-03-17 at 17:12 -0500, Sebastian Berg wrote:
> > On Wed, 2021-03-17 at 07:56 -0500, Lee Johnston wrote:
>
> <snip>
>
> > 3. In parallel, I will create a small "toy" DType based on that
> >    experimental API.  Probably in a separate repo (in the NumPy
> >    organization?).
> >
>
> So this is started. What you need to do right now if you want to try is
> work of this branch in NumPy:
>
>
> https://github.com/numpy/numpy/compare/main...seberg:experimental-dtype-api
>
> Install NumPy with `NPY_USE_NEW_CASTINGIMPL=1 python -mpip install .`
> or your favorite alternative.
> (The `NPY_USE_NEW_CASTINGIMPL=1` should be unnecessary very soon,
> working of a branch and not "main" will hopefully also be unnecessary
> soon.)
>
>
> Then fetch: https://github.com/seberg/experimental_user_dtypes
> and install it as well in the same environment.
>
>
> After that, you can jump through the hoop of setting:
>
>     NUMPY_EXPERIMENTAL_DTYPE_API=1
>
> And you can enjoy these type of examples (while expecting hard crashes
> when going too far beyond!):
>
>     from experimental_user_dtypes import float64unit as u
>     import numpy as np
>
>     F = np.array([u.Quantity(70., "Fahrenheit")])
>     C = F.astype(u.Float64UnitDType("Celsius"))
>     print(repr(C))
>     # array([21.11111111111115 ?C], dtype='Float64UnitDType(degC)')
>
>     m = np.array([u.Quantity(5., "m")])
>     m_squared = u.multiply(m, m)
>     print(repr(m_squared))
>     # array([25.0 m**2], dtype='Float64UnitDType(m**2)')
>
>     # Or conversion to SI the long route:
>     pc = np.arange(5., dtype="float64").view(u.Float64UnitDType("pc"))
>     pc.astype(pc.dtype.si())
>     # array([0.0 m, 3.085677580962325e+16 m, 6.17135516192465e+16 m,
>     #        9.257032742886974e+16 m, 1.23427103238493e+17 m],
>     #       dtype='Float64UnitDType(m)')
>
>
> Yes, the code has some horrible hacks around creating the DType, but
> the basic mechanism i.e. "functions you need to implement" are not
> expected to change lot.
>
> Right now, it forces you to use and implement the scalar `u.Quantity`
> and the code sample uses it. But you can also do:
>
>     np.arange(3.).view(u.Float64UnitDType("m"))
>
> I do have plans to "not have a scalar" so the 0-D result would still be
> an array.  But that option doesn't exist yet (and right now the scalar
> is used for printing).
>
>
> (There is also a `string_equal` "ufunc-like" that works on "S" dtypes.)
>
> Cheers,
>
> Sebastian
>
>
>
> PS: I need to figure out some details about how to create DTypes and
> DType instances with regards to our stable ABI.  The current "solution"
> is some weird subclassing hoops which are probably not good.
>
> That is painful unfortunately and any ideas would be great :).
> Unfortunately, it requires a grasp around the C-API and metaclassing...
>
>
>
> >
> > Anyone using the API, should expect bugs, crashes and changes for a
> > while.  But hopefully will only require small code modifications when
> > the API becomes public.
> >
> > My personal plan for a toy example is currently a "scaled integer".
> > E.g. a uint8 where you can set a range `[min_double, max_double]`
> > that
> > it maps to (which makes the DType "parametric").
> > We discussed some other examples, such as a "modernized" rational
> > DType, that could be nice as well, lets see...
> >
> > Units would be a great experiment, but seem a bit complex to me (I
> > don't know units well though). So to keep it baby steps :) I would
> > aim
> > for doing the above and then we can experiment on Units together!
> >
> >
> > Since it came up:  I agree that a Python API would be great to have.
> > It
> > is something I firmly kept on the back-burner...  It should not be
> > very
> > hard (if rudimentary), but unless it would help experiments a lot, I
> > would tend to leave it on the back-burner for now.
> >
> > Cheers,
> >
> > Sebastian
> >
> >
> > [1]  Maybe a `uint8` storage that maps to evenly spaced values on a
> > parametric range `[double_min, double_max]`.  That seems like a good
> > trade-off in complexity.
> >
> >
> >
> > > On Tue, Mar 16, 2021 at 4:11 PM Sebastian Berg <
> > > sebastian at sipsolutions.net>
> > > wrote:
> > >
> > > > On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote:
> > > > > Is the work on NEP 42 custom DTypes far enough along to
> > > > > experiment
> > > > > with?
> > > > >
> > > >
> > > > TL;DR:  Its not quite ready, but if we work together I think we
> > > > could
> > > > experiment a fair bit.  Mainly ufuncs are still limited (though
> > > > not
> > > > quite completely missing).  The main problem is that we need to
> > > > find a
> > > > way to expose the currently private API.
> > > >
> > > > I would be happy to discuss this also in a call.
> > > >
> > > >
> > > > ** The long story: **
> > > >
> > > > There is one more PR related to casting, for which merge should
> > > > be
> > > > around the corner. And which would bring a lot bang to such an
> > > > experiment:
> > > >
> > > > https://github.com/numpy/numpy/pull/18398
> > > >
> > > >
> > > > At that point, the new machinery supports (or is used for):
> > > >
> > > > * Array-coercion: `np.array([your_scalar])` or
> > > >   `np.array([1], dtype=your_dtype)`.
> > > >
> > > > * Casting (practically full support).
> > > >
> > > > * UFuncs do not quite work. But short of writing `np.add(arr1,
> > > > arr2)`
> > > >   with your DType involved, you can try a whole lot. (see below)
> > > >
> > > > * Promotion `np.result_type` should work very soon, but probably
> > > > isn't
> > > >   is not very relevant anyway until ufuncs are fully implemented.
> > > >
> > > > That should allow you to do a lot of good experimentation, but
> > > > due
> > > > to
> > > > the ufunc limitation, maybe not well on "existing" python code.
> > > >
> > > >
> > > > The long story about limitations is:
> > > >
> > > > We are missing exposure of the new public API.  I think I should
> > > > be
> > > > able to provide a solution for this pretty quickly, but it might
> > > > require working of a NumPy branch.  (I will write another email
> > > > about
> > > > it, hopefully we can find a better solution.)
> > > >
> > > >
> > > > Limitations for UFuncs:  UFuncs are the next big project, so to
> > > > try
> > > > it
> > > > fully you will need some patience, unfortunately.
> > > >
> > > > But, there is some good news!  You can write most of the "ufunc"
> > > > already, you just can't "register" it.
> > > > So what I can already offer you is a "DType-specific UFunc",
> > > > e.g.:
> > > >
> > > >    unit_dtype_multiply(np.array([1.],
> > > > dtype=Float64UnitDType("m")),
> > > >                        np.array([2.],
> > > > dtype=Float64UnitDtype("s")))
> > > >
> > > > And get out `np.array([2.], dtype=Float64UnitDtype("m s"))`.
> > > >
> > > > But you can't write `np.multiple(arr1, arr2)` or `arr1 * arr2`
> > > > yet.
> > > > Both registration and "promotion" logic are missing.
> > > >
> > > > I admit promotion may be one of the trickiest things, but trying
> > > > this a
> > > > bit might help with getting a clearer picture for promotion as
> > > > well.
> > > >
> > > >
> > > > The main last limitation is that I did not replace or create
> > > > "fallback"
> > > > solutions and/or replacement for the legacy `dtype->f-><slots>`
> > > > yet.
> > > > This is not a serious limitation for experimentation, though.  It
> > > > might
> > > > even make sense to keep some of them around and replace them
> > > > slowly.
> > > >
> > > >
> > > > And of course, all the small issues/limitations that are not
> > > > fixed
> > > > because nobody tried yet...
> > > >
> > > >
> > > >
> > > > I hope this doesn't scare you away, or at least not for long :/.
> > > > It
> > > > could be very useful to start experimentation soon to push things
> > > > forward a bit quicker.  And I really want to have at least an
> > > > experimental version in NumPy 1.21.
> > > >
> > > > Cheers,
> > > >
> > > > Sebastian
> > > >
> > > >
> > > > > Lee
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion at python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > >
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210326/dd26728b/attachment-0001.html>

From melissawm at gmail.com  Sat Mar 27 15:35:05 2021
From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=)
Date: Sat, 27 Mar 2021 16:35:05 -0300
Subject: [Numpy-discussion] Documentation Team meeting - Monday March 29
In-Reply-To: <CAC7J6VaBg=C-J4TgiUOSeHmE5qRM0O_Fzxsfyyozgczqsihv+g@mail.gmail.com>
References: <CAC7J6Vbkch+=wQcEbJgELAphM2d2552Ywr2txrbinUR-khTR=g@mail.gmail.com>
 <CAC7J6VY9k_EGL-adWpeHLvFOEG6EBy3wauhD8cZ5vmL-JDxA0Q@mail.gmail.com>
 <CAC7J6VaNwZDbtt4FP6mX_-+iD_d=_4VVcjdGiaJj=q7eWCHewA@mail.gmail.com>
 <CAC7J6VZ2D8q0s7+rpKv7HYh9oBqwOQnUP_HjwKCjMtAnfi60Og@mail.gmail.com>
 <CAC7J6VY=vgKXySL1Euy30ZNzqkqvcr4Za1L04JZv-GCk87c1AQ@mail.gmail.com>
 <CAC7J6Vbq0vtd5JH6MXvcSj4A4aZjg+x=QdHvPyQGeGcmpp8JYg@mail.gmail.com>
 <CAC7J6VZ_+uh967zGc7qA6+KWKsP=gZ8bZecrJ9Hq9oG0JTzsOg@mail.gmail.com>
 <CAC7J6VZkder9wiEmE+Cm5Lxg0zt0t_a-KwfuvcCXZG91Wk+7kA@mail.gmail.com>
 <CAC7J6VYyt4c2qcp92iE8_BMF70cUC3-ZX5E5mqNa0zVAFryvmQ@mail.gmail.com>
 <CAC7J6VZq1DLDk--F+S6F50DHMMXOaWo6cbpSau7APHe22X_c9w@mail.gmail.com>
 <CAC7J6VaegMEY9dvMeRSe2YY=AUEJYkXNNF-KTy89zyE6LgHo5A@mail.gmail.com>
 <CAC7J6VY4_X-MyXajy0KryML03KoryKR9n5dsf5bupgv7noukUg@mail.gmail.com>
 <CAC7J6Va3n9Hdym+RmscOkDyf1ukH+DDtCv7N9D8y0cX1a+uC-A@mail.gmail.com>
 <CAC7J6VZX57n+_nWL-dKe4YzqY=Nc1=LYRXkNv8+JcP=q3Vyqog@mail.gmail.com>
 <CAC7J6Vb5uGp-gEMudbbiWSBKo1gyMsq03M4hz9mnNYir9GcqXw@mail.gmail.com>
 <CAC7J6Va3g7Pqrs=Hxtgg6GKHhZwErz4PPiKLw0YL-R2HUkWprQ@mail.gmail.com>
 <CAC7J6Vay+j5SsbXsJkCh2K5aeR6a8HkrkFpzDWoPkOfH=a9OkA@mail.gmail.com>
 <CAC7J6VYzL4+hxf7=1NfxHx7tVyfQci2iNNr7=gCRgJkMVXuiyQ@mail.gmail.com>
 <CAC7J6VaBg=C-J4TgiUOSeHmE5qRM0O_Fzxsfyyozgczqsihv+g@mail.gmail.com>
Message-ID: <CAC7J6VaCrBzE5SsxmmkOLKukPjJCc_mU_7JZS+=2VQ7vE8RVsQ@mail.gmail.com>

Hi all!

Our next Documentation Team meeting will be on *Monday, March 29* at ***4PM
UTC***.

All are welcome - you don't need to already be a contributor to join. If
you have questions or are curious about what we're doing, we'll be happy to
meet you!

If you wish to join on Zoom, use this link:

https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success

Here's the permanent hackmd document with the meeting notes (still being
updated in the next few days!):

https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg
<https://www.google.com/url?q=https%3A%2F%2Fhackmd.io%2FoB_boakvRqKR-_2jRV-Qjg&sa=D&usd=2&usg=AFQjCNGIOzVwlfDFd6YAgBwVUjmjQKWRSw>

Hope to see you around!

** You can click this link to get the correct time at your timezone:
https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210329T16&p1=1440&ah=1

*** You can add the NumPy community calendar to your google calendar by
clicking this link: https://calendar.google.com/calendar
/r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20

- Melissa
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210327/6671af08/attachment.html>

From charlesr.harris at gmail.com  Sat Mar 27 19:45:18 2021
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 27 Mar 2021 17:45:18 -0600
Subject: [Numpy-discussion] NumPy 1.20.2 released.
Message-ID: <CAB6mnxKU0_b__S_b3gDkAnGmQ6=9uC1VghopLM_e9iuu8Ui5KA@mail.gmail.com>

Charles R Harris <charlesr.harris at gmail.com>
Sun, Feb 7, 2:23 PM
to numpy-discussion, SciPy, SciPy-User, bcc: python-announce-list
Hi All,

On behalf of the NumPy team I am pleased to announce the release of NumPy
1.20.2. NumPy 1,20.2 is a bugfix release containing several fixes merged to
the main branch after the NumPy 1.20.1 release. The Python versions
supported for this release are 3.7-3.9. Wheels can be downloaded from PyPI
<https://pypi.org/project/numpy/1.20.2/>; source archives, release notes,
and wheel hashes are available on Github
<https://github.com/numpy/numpy/releases/tag/v1.20.2>. Linux users will
need pip >= 0.19.3 in  order to install manylinux2010 and manylinux2014
wheels.


*Contributors*

A total of 7 people contributed to this release.  People with a "+" by their
names contributed a patch for the first time.

   - Allan Haldane
   - Bas van Beek
   - Charles Harris
   - Christoph Gohlke
   - Mateusz Sok?? +
   - Michael Lamparski
   - Sebastian Berg


*Pull requests merged*
A total of 20 pull requests were merged for this release.

   - #18382: MAINT: Update f2py from master.
   - #18459: BUG: ``diagflat`` could overflow on windows or 32-bit platforms
   - #18460: BUG: Fix refcount leak in f2py ``complex_double_from_pyobj``.
   - #18461: BUG: Fix tiny memory leaks when ``like=`` overrides are used
   - #18462: BUG: Remove temporary change of descr/flags in VOID functions
   - #18469: BUG: Segfault in nditer buffer dealloc for Object arrays
   - #18485: BUG: Remove suspicious type casting
   - #18486: BUG: remove nonsensical comparison of pointer < 0
   - #18487: BUG: verify pointer against NULL before using it
   - #18488: BUG: check if PyArray_malloc succeeded
   - #18546: BUG: incorrect error fallthrough in nditer
   - #18559: CI: Backport CI fixes from main.
   - #18599: MAINT: Add annotations for `dtype.__getitem__`, `__mul__`
   and...
   - #18611: BUG: NameError in numpy.distutils.fcompiler.compaq
   - #18612: BUG: Fixed ``where`` keyword for ``np.mean`` & ``np.var``
   methods
   - #18617: CI: Update apt package list before Python install
   - #18636: MAINT: Ensure that re-exported sub-modules are properly
   annotated
   - #18638: BUG: Fix ma coercion list-of-ma-arrays if they do not cast
   to...
   - #18661: BUG: Fix small valgrind-found issues
   - #18671: BUG: Fix small issues found with pytest-leaks

Cheers,
Charles Harris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210327/59eefeef/attachment.html>

From ralf.gommers at gmail.com  Sun Mar 28 08:26:25 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 28 Mar 2021 14:26:25 +0200
Subject: [Numpy-discussion] Japanese translation of numpy.org complete -
 proofreader help wanted
Message-ID: <CABL7CQg-8k4Z_CNBbPcvywzbpNyspWnZ2oTbNsF0iCbPy3gzFQ@mail.gmail.com>

Hi all,

We have our first complete translation of the numpy.org content, Japanese,
thanks to Atsushi Sakai. It would be really helpful if someone else who
speaks Japanese can proofread the translations. Then it's ready to deploy I
think - we just need to enable the language switcher widget, the code for
that was tested already.

Proofreading is done in Crowdin, which is a friendly interface for
translators see -
https://github.com/numpy/numpy/wiki/Translations-of-the-NumPy-website. This
should not take more than an hour or two I think, and would be a very
valuable contribution. If you'd like to do this and want help, or report
that it all looks good, you can reply here, comment on the PR that the
crowdin bot opened (https://github.com/numpy/numpy.org/pull/385), or use
the Discussions tab in Crowdin.

The Brazilian Portuguese translation is also close to complete (72%). If
anyone feels motivated to complete that, that'd be great too - it could be
taken along in the initial launch then. There were also some discussions
about correct Portuguese terminology I believe, Melissa could you point to
those in case that is relevant for completion?

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210328/60f382e1/attachment-0001.html>

From friedrichromstedt at gmail.com  Mon Mar 29 03:52:34 2021
From: friedrichromstedt at gmail.com (Friedrich Romstedt)
Date: Mon, 29 Mar 2021 09:52:34 +0200
Subject: [Numpy-discussion] Unreliable crash when converting using
 numpy.asarray via C buffer interface
In-Reply-To: <CAE_ApoBHS+J=HorXoBoHPuEbjyuYSGk4+Hrx_B-VZaGn3xMzuA@mail.gmail.com>
References: <CAN06=CzuKftoPkz8TJmbTfC18dH9-WX6rE4WQxxpQUxVpbtHMQ@mail.gmail.com>
 <CAN06=CxrUr3Qv5MhWzBaouVZ7YNxQJoXaSWu_NELTgCbYjh4Yg@mail.gmail.com>
 <fdec1ced-997a-abdf-becd-1e885ba961ff@gmail.com>
 <CAN06=Cx8MXtCTY2N60oHxBd0z5gk7ySibf_bmt33VWWErFYn=Q@mail.gmail.com>
 <CAN06=CwkFrRv5TW7bnd4+_mKT8BHzEBQQYjBPm3fZqQnqedHsA@mail.gmail.com>
 <baa8ac6390532698ce6caccbc2528d5111ab4a9a.camel@sipsolutions.net>
 <CAE_ApoBHS+J=HorXoBoHPuEbjyuYSGk4+Hrx_B-VZaGn3xMzuA@mail.gmail.com>
Message-ID: <CAN06=Cwj6eaW5zKENCN2KmTDfR95aGqbbCyhrj7nty8pUnw0HQ@mail.gmail.com>

Hi Matti, Sebastian and Lev,

Am Mo., 15. Feb. 2021 um 18:50 Uhr schrieb Lev Maximov <lev.maximov at gmail.com>:
>
> Try adding
>     view->suboffsets = NULL;
>     view->internal = NULL;
> to Image_getbuffer

finally I got it working easily using Lev's pointer cited above.  I
didn't follow the valgrind approach furthermore, since I found it
likely that it'd produce the same finding.

This is just to let you know; I applied the fix several weeks ago.

Many thanks,
Friedrich

From lev.maximov at gmail.com  Mon Mar 29 04:10:34 2021
From: lev.maximov at gmail.com (Lev Maximov)
Date: Mon, 29 Mar 2021 15:10:34 +0700
Subject: [Numpy-discussion] Unreliable crash when converting using
 numpy.asarray via C buffer interface
In-Reply-To: <CAN06=Cwj6eaW5zKENCN2KmTDfR95aGqbbCyhrj7nty8pUnw0HQ@mail.gmail.com>
References: <CAN06=CzuKftoPkz8TJmbTfC18dH9-WX6rE4WQxxpQUxVpbtHMQ@mail.gmail.com>
 <CAN06=CxrUr3Qv5MhWzBaouVZ7YNxQJoXaSWu_NELTgCbYjh4Yg@mail.gmail.com>
 <fdec1ced-997a-abdf-becd-1e885ba961ff@gmail.com>
 <CAN06=Cx8MXtCTY2N60oHxBd0z5gk7ySibf_bmt33VWWErFYn=Q@mail.gmail.com>
 <CAN06=CwkFrRv5TW7bnd4+_mKT8BHzEBQQYjBPm3fZqQnqedHsA@mail.gmail.com>
 <baa8ac6390532698ce6caccbc2528d5111ab4a9a.camel@sipsolutions.net>
 <CAE_ApoBHS+J=HorXoBoHPuEbjyuYSGk4+Hrx_B-VZaGn3xMzuA@mail.gmail.com>
 <CAN06=Cwj6eaW5zKENCN2KmTDfR95aGqbbCyhrj7nty8pUnw0HQ@mail.gmail.com>
Message-ID: <CAE_ApoDaj27A6sAO9inTZUGqYYeEo1M_AJY8NYr2itOd4mURjw@mail.gmail.com>

I'm glad you sorted it out as the subject line sounded quite horrifying )

Best regards,
Lev

On Mon, Mar 29, 2021 at 2:54 PM Friedrich Romstedt <
friedrichromstedt at gmail.com> wrote:

> Hi Matti, Sebastian and Lev,
>
> Am Mo., 15. Feb. 2021 um 18:50 Uhr schrieb Lev Maximov <
> lev.maximov at gmail.com>:
> >
> > Try adding
> >     view->suboffsets = NULL;
> >     view->internal = NULL;
> > to Image_getbuffer
>
> finally I got it working easily using Lev's pointer cited above.  I
> didn't follow the valgrind approach furthermore, since I found it
> likely that it'd produce the same finding.
>
> This is just to let you know; I applied the fix several weeks ago.
>
> Many thanks,
> Friedrich
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210329/96a64028/attachment.html>

From meissner at hawaii.edu  Mon Mar 29 13:21:20 2021
From: meissner at hawaii.edu (Gunter Meissner)
Date: Mon, 29 Mar 2021 07:21:20 -1000
Subject: [Numpy-discussion] Unreliable crash when converting using
 numpy.asarray via C buffer interface
In-Reply-To: <baa8ac6390532698ce6caccbc2528d5111ab4a9a.camel@sipsolutions.net>
References: <CAN06=CzuKftoPkz8TJmbTfC18dH9-WX6rE4WQxxpQUxVpbtHMQ@mail.gmail.com>
 <CAN06=CxrUr3Qv5MhWzBaouVZ7YNxQJoXaSWu_NELTgCbYjh4Yg@mail.gmail.com>
 <fdec1ced-997a-abdf-becd-1e885ba961ff@gmail.com>
 <CAN06=Cx8MXtCTY2N60oHxBd0z5gk7ySibf_bmt33VWWErFYn=Q@mail.gmail.com>
 <CAN06=CwkFrRv5TW7bnd4+_mKT8BHzEBQQYjBPm3fZqQnqedHsA@mail.gmail.com>
 <baa8ac6390532698ce6caccbc2528d5111ab4a9a.camel@sipsolutions.net>
Message-ID: <CADH9zKP=2ZKs81Agffc4dcLp=OgjfkfsXg+FrxUbxJ2FQYB=RQ@mail.gmail.com>

Aloha Numpy Community,
I am just writing a book on "How to Cheat in Statistics - And get Away with
It".
I noticed there is no built-in syntax for the 'Adjusted R-squared' in any
library (do correct me if I am wrong)
I think it would be a good idea to program it. The math is straight
forward, I can
provide it if desired.
Thank you,
Gunter


On Mon, Feb 15, 2021 at 5:56 AM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Mon, 2021-02-15 at 10:12 +0100, Friedrich Romstedt wrote:
> > Hi,
> >
> > Am Do., 4. Feb. 2021 um 09:07 Uhr schrieb Friedrich Romstedt
> > <friedrichromstedt at gmail.com>:
> > > Am Mo., 1. Feb. 2021 um 09:46 Uhr schrieb Matti Picus <
> > > matti.picus at gmail.com>:
> > > > Typically, one would create a complete example and then pointing
> > > > to the
> > > > code (as repo or pastebin, not as an attachment to a mail here).
> > >
> > > https://github.com/friedrichromstedt/bughunting-01
> >
> > Last week I updated my example code to be more slim.  There now
> > exists
> > a single-file extension module:
> >
> https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bugIhuntingfrmod/bughuntingfrmod.cpp
> <https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bughuntingfrmod/bughuntingfrmod.cpp>
> > .
> > The corresponding test program
> >
> https://github.com/friedrichromstedt/bughunting-01/blob/master/test/2021-02-11_0909.py
> > crashes "properly" both on Windows 10 (Python 3.8.2, numpy 1.19.2) as
> > well as on Arch Linux (Python 3.9.1, numpy 1.20.0), when the
> > ``print``
> > statement contained in the test file is commented out.
> >
> > My hope to be able to fix my error myself by reducing the code to
> > reproduce the problem has not been fulfillled.  I feel that the
> > abovementioned test code is short enough to ask for help with it
> > here.
> > Any hint on how I could solve my problem would be appreciated very
> > much.
>
> I have tried it out, and can confirm that using debugging tools (namely
> valgrind), will allow you track down the issue (valgrind reports it
> from within python, running a python without debug symbols may
> obfuscate the actual problem; if that is the limiting you, I can post
> my valgrind output).
> Since you are running a linux system, I am confident that you can run
> it in valgrind to find it yourself.  (There may be other ways.)
>
> Just remember to run valgrind with `PYTHONMALLOC=malloc valgrind` and
> ignore some errors e.g. when importing NumPy.
>
> Cheers,
>
> Sebastian
>
>
> >
> > There are some points which were not clarified yet; I am citing them
> > below.
> >
> > So far,
> > Friedrich
> >
> > > > - There are tools out there to analyze refcount problems. Python
> > > > has
> > > > some built-in tools for switching allocation strategies.
> > >
> > > Can you give me some pointer about this?
> > >
> > > > - numpy.asarray has a number of strategies to convert instances,
> > > > which
> > > > one is it using?
> > >
> > > I've tried to read about this, but couldn't find anything.  What
> > > are
> > > these different strategies?
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


-- 

Gunter Meissner, PhD

University of Hawaii

Adjunct Professor of MathFinance at Columbia University and NYU

President of Derivatives Software www.dersoft.com

CEO Cassandra Capital Management www.cassandracm.com

CV: www.dersoft.com/cv.pdf

Email: meissner at hawaii.edu

Tel: USA (808) 779 3660
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210329/f8bc6b12/attachment.html>

From rashiqazhan at gmail.com  Mon Mar 29 14:26:00 2021
From: rashiqazhan at gmail.com (Rashiq Azhan)
Date: Mon, 29 Mar 2021 18:26:00 +0000
Subject: [Numpy-discussion] Expanding the scope of numpy.unpackbits and
 numpy.packbits to include more than uint8 type
Message-ID: <CAH6KDcwMd1hHiY8vkWVh=bv1q3R=_Z=xY_OfG5s3=2bKksmRzw@mail.gmail.com>

I would like this feature to be added since I think it can very useful
when there is a need to process data that cannot be included in uint8.
One of my personal requirements is modifying a 10-bit, per channel,
images held in a NumPy array but I cannot do that using the specified
functions. They are eloquent solution and works well with the with
NumPy functions as long as the data is uint8.

From jfoxrabinovitz at gmail.com  Mon Mar 29 15:10:32 2021
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Mon, 29 Mar 2021 15:10:32 -0400
Subject: [Numpy-discussion] Expanding the scope of numpy.unpackbits and
 numpy.packbits to include more than uint8 type
In-Reply-To: <CAH6KDcwMd1hHiY8vkWVh=bv1q3R=_Z=xY_OfG5s3=2bKksmRzw@mail.gmail.com>
References: <CAH6KDcwMd1hHiY8vkWVh=bv1q3R=_Z=xY_OfG5s3=2bKksmRzw@mail.gmail.com>
Message-ID: <CAAa1KPZiynU-UujvC4FHjWvhj-v+XLZcKk2dUNebbeiv2dbkRg@mail.gmail.com>

You can view any array as uint8


On Mon, Mar 29, 2021, 14:27 Rashiq Azhan <rashiqazhan at gmail.com> wrote:

> I would like this feature to be added since I think it can very useful
> when there is a need to process data that cannot be included in uint8.
> One of my personal requirements is modifying a 10-bit, per channel,
> images held in a NumPy array but I cannot do that using the specified
> functions. They are eloquent solution and works well with the with
> NumPy functions as long as the data is uint8.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210329/c510fab7/attachment.html>

From cimrman3 at ntc.zcu.cz  Mon Mar 29 17:38:55 2021
From: cimrman3 at ntc.zcu.cz (Robert Cimrman)
Date: Mon, 29 Mar 2021 23:38:55 +0200
Subject: [Numpy-discussion] ANN: SfePy 2021.1
Message-ID: <dec40ca3-954f-f88b-698d-cd9a2195e880@ntc.zcu.cz>

I am pleased to announce the release of SfePy 2021.1.

Description
-----------

SfePy (simple finite elements in Python) is a software for solving systems of
coupled partial differential equations by finite element methods. It is
distributed under the new BSD license.

Home page: https://sfepy.org
Mailing list: https://mail.python.org/mm3/mailman3/lists/sfepy.python.org/
Git (source) repository, issue tracker: https://github.com/sfepy/sfepy

Highlights of this release
--------------------------

- non-square homogenized coefficient matrices
- new implementation of multi-linear terms
- improved handling of Dirichlet and periodic boundary conditions in common
   nodes
- terms in the term table document linked to examples

For full release notes see [1].

Cheers,
Robert Cimrman

[1] http://docs.sfepy.org/doc/release_notes.html#id1

---

Contributors to this release in alphabetical order:

Robert Cimrman
Antony Kamp
Vladimir Lukes


From klark--kent at yandex.ru  Tue Mar 30 08:41:40 2021
From: klark--kent at yandex.ru (klark--kent at yandex.ru)
Date: Tue, 30 Mar 2021 15:41:40 +0300
Subject: [Numpy-discussion] Expanding the scope of numpy.unpackbits and
 numpy.packbits to include more than uint8 type
In-Reply-To: <CAAa1KPZiynU-UujvC4FHjWvhj-v+XLZcKk2dUNebbeiv2dbkRg@mail.gmail.com>
References: <CAH6KDcwMd1hHiY8vkWVh=bv1q3R=_Z=xY_OfG5s3=2bKksmRzw@mail.gmail.com>
 <CAAa1KPZiynU-UujvC4FHjWvhj-v+XLZcKk2dUNebbeiv2dbkRg@mail.gmail.com>
Message-ID: <496321617107476@mail.yandex.ru>

An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210330/da37a80c/attachment.html>

From guillaume.bethouart at eshard.com  Tue Mar 30 20:34:19 2021
From: guillaume.bethouart at eshard.com (Guillaume Bethouart)
Date: Wed, 31 Mar 2021 02:34:19 +0200
Subject: [Numpy-discussion] Dot + add operation
Message-ID: <CAEWdsxS_9kMnK4SVeCMO2EOuDiCkvXLyW_NvkoDBE32yboSDZg@mail.gmail.com>

Is it possible to add a method to perform a dot product and add the result
to an existing matrix in a single operation ?

Like C = dot_add(A, B, C) equivalent to C += A @ B.This behavior is
natively proposed by the Blas *gemm primitive.

The goal is to reduce the peak memory consumption. Indeed, during the
computation of C += A @ B, the maximum allocated memory is twice the size
of C.Using *gemm to add directly the result , the maximum memory
consumption is less than 1.5x the size of C.
This difference is significant for large matrices.

Any people interested in it ?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210331/6eba68e5/attachment.html>

From sebastian at sipsolutions.net  Tue Mar 30 22:35:41 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 30 Mar 2021 21:35:41 -0500
Subject: [Numpy-discussion] NumPy Community Meeting Wednesday (no DST: for
 those e.g. in the EU)
Message-ID: <c06139be376428ee58181a3da20737bdf724f347.camel@sipsolutions.net>

Hi all,

There will be a NumPy Community meeting Wednesday March 3rd at
20:00 UTC. Everyone is invited and encouraged to
join in and edit the work-in-progress meeting topics and notes at:

https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian


PS: As the subject says, we will stay on UTC 20:00, so for those in the
EU and anyone else who had daylight saving times switches the time will
have shifted.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210330/c54ec22a/attachment.sig>

From ralf.gommers at gmail.com  Wed Mar 31 06:18:49 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 31 Mar 2021 12:18:49 +0200
Subject: [Numpy-discussion] Steering Council membership updates
Message-ID: <CABL7CQjVzTwvkKJBsu5-TVBnp28KvwUD_iNzmxbNsAU7Xs-tAg@mail.gmail.com>

Hi all,

On behalf of the NumPy Steering Council (SC) I have a number of membership
changes to announce.

We're excited to welcome Inessa Pawson and Melissa Mendon?a as new SC
members. Inessa has been contributing for close to two years, and has been
a driving force behind the new website, the user survey, and other content
and community initiatives. Melissa has been contributing since the start of
last year, she leads the documentation team, co-maintains f2py, does a lot
of mentoring, and is the PI on our current CZI grant.

A number of people are moving to "emeritus SC member" status: Nathaniel
Smith, Pauli Virtanen, Julian Taylor, Allan Haldane, and Jaime Fern?ndez
del R?o. They have been in low (or no) activity mode for a while, and this
membership update reflects that. They all still have commit rights, and if
they get more active we'll of course welcome them back with open arms. With
these changes the SC now consists of the people who have been most active
across the projects' activities and decision-making.

A PR with changes to the governance/people page is up at
https://github.com/numpy/numpy/pull/18705.

While the list of SC members is shrinking, it does feel like the NumPy
project itself is growing. There are a lot of people getting involved, from
new maintainers focusing on technical topics like type annotations and SIMD
acceleration to tutorial writers and people working on website content and
accessibility. Which is awesome to see.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210331/61826a7f/attachment.html>

From ralf.gommers at gmail.com  Wed Mar 31 07:34:51 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 31 Mar 2021 13:34:51 +0200
Subject: [Numpy-discussion] Dot + add operation
In-Reply-To: <CAEWdsxS_9kMnK4SVeCMO2EOuDiCkvXLyW_NvkoDBE32yboSDZg@mail.gmail.com>
References: <CAEWdsxS_9kMnK4SVeCMO2EOuDiCkvXLyW_NvkoDBE32yboSDZg@mail.gmail.com>
Message-ID: <CABL7CQiEJHF0RonLsbQ5fOd+1S-iXxEERz1h6a=Dq2wOTaHwRw@mail.gmail.com>

On Wed, Mar 31, 2021 at 2:35 AM Guillaume Bethouart <
guillaume.bethouart at eshard.com> wrote:

> Is it possible to add a method to perform a dot product and add the result
> to an existing matrix in a single operation ?
>
> Like C = dot_add(A, B, C) equivalent to C += A @ B.This behavior is
> natively proposed by the Blas *gemm primitive.
>
> The goal is to reduce the peak memory consumption. Indeed, during the
> computation of C += A @ B, the maximum allocated memory is twice the size
> of C.Using *gemm to add directly the result , the maximum memory
> consumption is less than 1.5x the size of C.
> This difference is significant for large matrices.
>
> Any people interested in it ?
>

Hi Guillaume, such fused operations cannot easily be done with NumPy alone,
and it does not make sense to add separate APIs for that purpose because
there are so many combinations of function calls that one might want to
fuse.

Instead, Numba, Pythran or numexpr can add this to some extent for numpy
code. E.g. search for "loop fusion" in the Numba docs.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210331/80a8150e/attachment.html>

From kevin.k.sheppard at gmail.com  Wed Mar 31 07:43:02 2021
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Wed, 31 Mar 2021 12:43:02 +0100
Subject: [Numpy-discussion] Dot + add operation
In-Reply-To: <CABL7CQiEJHF0RonLsbQ5fOd+1S-iXxEERz1h6a=Dq2wOTaHwRw@mail.gmail.com>
References: <CAEWdsxS_9kMnK4SVeCMO2EOuDiCkvXLyW_NvkoDBE32yboSDZg@mail.gmail.com>
 <CABL7CQiEJHF0RonLsbQ5fOd+1S-iXxEERz1h6a=Dq2wOTaHwRw@mail.gmail.com>
Message-ID: <CAE8Ss2E-c6TG--PZWa36AiNwP4zKuXnt2kHa0VaC8GLpoh5z=g@mail.gmail.com>

Or just use SciPy's  get_blas_funcs
<https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.blas.get_blas_funcs.html#scipy.linalg.blas.get_blas_funcs>
to access *gemm, which directly exposes this function:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.blas.dgemm.html

Kevin


On Wed, Mar 31, 2021 at 12:35 PM Ralf Gommers <ralf.gommers at gmail.com>
wrote:

>
>
> On Wed, Mar 31, 2021 at 2:35 AM Guillaume Bethouart <
> guillaume.bethouart at eshard.com> wrote:
>
>> Is it possible to add a method to perform a dot product and add the
>> result to an existing matrix in a single operation ?
>>
>> Like C = dot_add(A, B, C) equivalent to C += A @ B.This behavior is
>> natively proposed by the Blas *gemm primitive.
>>
>> The goal is to reduce the peak memory consumption. Indeed, during the
>> computation of C += A @ B, the maximum allocated memory is twice the size
>> of C.Using *gemm to add directly the result , the maximum memory
>> consumption is less than 1.5x the size of C.
>> This difference is significant for large matrices.
>>
>> Any people interested in it ?
>>
>
> Hi Guillaume, such fused operations cannot easily be done with NumPy
> alone, and it does not make sense to add separate APIs for that purpose
> because there are so many combinations of function calls that one might
> want to fuse.
>
> Instead, Numba, Pythran or numexpr can add this to some extent for numpy
> code. E.g. search for "loop fusion" in the Numba docs.
>
> Cheers,
> Ralf
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210331/55575bc3/attachment-0001.html>

From guillaume.bethouart at eshard.com  Wed Mar 31 08:36:47 2021
From: guillaume.bethouart at eshard.com (Guillaume Bethouart)
Date: Wed, 31 Mar 2021 05:36:47 -0700 (MST)
Subject: [Numpy-discussion] Dot + add operation
In-Reply-To: <CAE8Ss2E-c6TG--PZWa36AiNwP4zKuXnt2kHa0VaC8GLpoh5z=g@mail.gmail.com>
References: <CAEWdsxS_9kMnK4SVeCMO2EOuDiCkvXLyW_NvkoDBE32yboSDZg@mail.gmail.com>
 <CABL7CQiEJHF0RonLsbQ5fOd+1S-iXxEERz1h6a=Dq2wOTaHwRw@mail.gmail.com>
 <CAE8Ss2E-c6TG--PZWa36AiNwP4zKuXnt2kHa0VaC8GLpoh5z=g@mail.gmail.com>
Message-ID: <1617194207104-0.post@n7.nabble.com>

Thanks for the quick reply.

A was not aware of the fact that this kind of "fuse" function is not
compatible with Numpy API. I understand the point.
FYI, numba is not able to simplify this kind of calculus: C += A @ B. Nor
numexpr which is not compatible with dot product. I did not test pythran.

Thus, the only solution is to use the Blas functions through scipy, as
recalled by Kevin.
I'll play a bit with transposition and alignment issues ...

Regards,


--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From stefanv at berkeley.edu  Wed Mar 31 16:56:20 2021
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Wed, 31 Mar 2021 13:56:20 -0700
Subject: [Numpy-discussion] Steering Council membership updates
In-Reply-To: <CABL7CQjVzTwvkKJBsu5-TVBnp28KvwUD_iNzmxbNsAU7Xs-tAg@mail.gmail.com>
References: <CABL7CQjVzTwvkKJBsu5-TVBnp28KvwUD_iNzmxbNsAU7Xs-tAg@mail.gmail.com>
Message-ID: <0b7fbf11-e3dc-4e79-9f9d-f771fca175ce@www.fastmail.com>

On Wed, Mar 31, 2021, at 03:18, Ralf Gommers wrote:
> We're excited to welcome Inessa Pawson and Melissa Mendon?a as new SC members. Inessa has been contributing for close to two years, and has been a driving force behind the new website, the user survey, and other content and community initiatives. Melissa has been contributing since the start of last year, she leads the documentation team, co-maintains f2py, does a lot of mentoring, and is the PI on our current CZI grant.

Thank you for your service, Inessa and Melissa!  Welcome to the steering council.

Best regards,
St?fan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/numpy-discussion/attachments/20210331/7992fe44/attachment.html>