From ralf.gommers at gmail.com Mon Mar 1 04:53:11 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 1 Mar 2021 10:53:11 +0100
Subject: [Numpy-discussion] guide for downstream package authors & setting version constraints
Message-ID:

Hi all,

We now have a guide for downstream package authors, talking about API and ABI stability, NumPy versioning, testing against NumPy master, and how to add build time and runtime dependencies for numpy: https://numpy.org/devdocs/user/depending_on_numpy.html

Getting the version constraints right - especially setting correct upper bounds for install_requires - is important: almost no packages do this correctly (or at all, really). If your package depends on NumPy and you deal with packaging it, please check it out!

And for even more practical details on release process steps for a downstream package, see http://scipy.github.io/devdocs/dev/core-dev/index.html#updating-upper-bounds-of-dependencies

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sebastian at sipsolutions.net Wed Mar 3 10:44:44 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 03 Mar 2021 09:44:44 -0600
Subject: [Numpy-discussion] NumPy Community Meeting Wednesday (Today)
Message-ID: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>

Hi all,

There will be a NumPy Community meeting Wednesday March 3rd at 12pm Pacific Time (20:00 UTC). Everyone is invited and encouraged to join in and edit the work-in-progress meeting topics and notes at: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian

PS: Sorry for the late reminder.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL:

From robbmcleod at gmail.com Wed Mar 3 13:30:12 2021
From: robbmcleod at gmail.com (Robert McLeod)
Date: Wed, 3 Mar 2021 10:30:12 -0800
Subject: [Numpy-discussion] ANN: NumExpr 2.7.3
Message-ID:

========================
Announcing NumExpr 2.7.3
========================

Hi everyone,

This is a maintenance release to make use of the oldest supported NumPy version when building wheels, in an effort to alleviate issues seen on Windows machines that do not have the latest Windows MSVC runtime installed. It also adds wheels built via GitHub Actions for ARMv8 platforms.

Project documentation is available at: http://numexpr.readthedocs.io/

Changes from 2.7.2 to 2.7.3
---------------------------

- Pinned NumPy versions to the minimum supported version, in an effort to alleviate issues seen on Windows machines that do not have the same Windows SDK installed as was used to build the wheels.
- ARMv8 wheels are now available, thanks to `odidev` for the pull request.

What's Numexpr?
---------------

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors.
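[A minimal usage sketch of the kind of expression this accelerates - illustrative values only:]

    import numpy as np
    import numexpr as ne

    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)

    # Evaluated in one multi-threaded pass, without materializing the
    # "3*a" and "4*b" temporaries that plain NumPy would allocate.
    result = ne.evaluate("3*a + 4*b")
    np.testing.assert_allclose(result, 3*a + 4*b)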
Look here for some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

Where can I find Numexpr?
-------------------------

The project is hosted at GitHub in: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Documentation is hosted at: http://numexpr.readthedocs.io/en/latest/

Share your experience
---------------------

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

--
Robert McLeod
robbmcleod at gmail.com
robert.mcleod at hitachi-hhtc.ca
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu Thu Mar 4 02:08:59 2021
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Wed, 03 Mar 2021 23:08:59 -0800
Subject: [Numpy-discussion] Development branches renamed
Message-ID: <68d4ea36-941e-4232-b679-dbf1ff8f4469@www.fastmail.com>

Hi everyone,

The development branches of most of the repositories on github.com/numpy have been renamed to `main` (this is the GitHub default for newly created repositories). The move has not yet been made for sub-projects such as `numpydoc` or `numpy.org`, but those should follow soon.

We were able to preserve all PRs, other than those for which the original branches have been deleted.

You can update your locally cloned repository to have a `main` branch as follows:

    git branch -m master main
    git fetch
    git branch -u <YOUR_UPSTREAM_REMOTE>/main main

(where YOUR_UPSTREAM_REMOTE is typically called `upstream` or `origin`)

If you have any trouble, let us know.

Best regards,
Stéfan

From ralf.gommers at gmail.com Thu Mar 4 03:32:52 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 4 Mar 2021 09:32:52 +0100
Subject: [Numpy-discussion] Development branches renamed
In-Reply-To: <68d4ea36-941e-4232-b679-dbf1ff8f4469@www.fastmail.com>
References: <68d4ea36-941e-4232-b679-dbf1ff8f4469@www.fastmail.com>
Message-ID:

On Thu, Mar 4, 2021 at 8:09 AM Stefan van der Walt wrote:

> Hi everyone,
>
> The development branches of most of the repositories on github.com/numpy
> have been renamed to `main` (this is the GitHub default for newly created
> repositories). The move has not yet been made for sub-projects such as
> `numpydoc` or `numpy.org`, but those should follow soon.
>

Thanks for working on this Stéfan!

Cheers,
Ralf

> We were able to preserve all PRs, other than those for which the original
> branches have been deleted.
>
> You can update your locally cloned repository to have a `main` branch as
> follows:
>
>     git branch -m master main
>     git fetch
>     git branch -u <YOUR_UPSTREAM_REMOTE>/main main
>
> (where YOUR_UPSTREAM_REMOTE is typically called `upstream` or `origin`)
>
> If you have any trouble, let us know.
>
> Best regards,
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From klark--kent at yandex.ru Thu Mar 4 17:37:40 2021
From: klark--kent at yandex.ru (klark--kent at yandex.ru)
Date: Fri, 05 Mar 2021 01:37:40 +0300
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net>
Message-ID: <2911614896816@mail.yandex.ru>

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: IMG-20201122-WA0001.jpg
Type: image/jpeg
Size: 80498 bytes
Desc: not available
URL:

From shoyer at gmail.com Thu Mar 4 19:54:07 2021
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 4 Mar 2021 16:54:07 -0800
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To: <2911614896816@mail.yandex.ru>
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net> <2911614896816@mail.yandex.ru>
Message-ID:

I love your mittens!

NumPy really should be in the NumFOCUS store, but it currently isn't:
https://shop.spreadshirt.com/numfocus/all

I'll make some inquiries to see if we can sort that out :).

On Thu, Mar 4, 2021 at 2:43 PM wrote:

> Hello. I was looking for a T-shirt with Numpy logo but didn't find
> anywhere. Anybody knows if there's a merchandise with Numpy? So I have to
> knit mittens with Numpy logo for myself.
>
> Best regards!
> Konstantin
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jni at fastmail.com Thu Mar 4 19:57:42 2021
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Fri, 5 Mar 2021 11:57:42 +1100
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To:
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net> <2911614896816@mail.yandex.ru>
Message-ID:

Yeah I desperately want those mitts!

> On 5 Mar 2021, at 11:54 am, Stephan Hoyer wrote:
>
> I love your mittens!
>
> NumPy really should be in the NumFOCUS store, but it currently isn't:
> https://shop.spreadshirt.com/numfocus/all
>
> I'll make some inquiries to see if we can sort that out :).
>
> On Thu, Mar 4, 2021 at 2:43 PM > wrote:
> Hello. I was looking for a T-shirt with Numpy logo but didn't find anywhere. Anybody knows if there's a merchandise with Numpy? So I have to knit mittens with Numpy logo for myself.
>
> Best regards!
> Konstantin
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From melissawm at gmail.com Fri Mar 5 07:58:03 2021
From: melissawm at gmail.com (Melissa Mendonça)
Date: Fri, 5 Mar 2021 09:58:03 -0300
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To:
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net> <2911614896816@mail.yandex.ru>
Message-ID:

Hi Konstantin,

Would you mind open-sourcing that recipe? :D

Cheers,

Melissa

On Thu, Mar 4, 2021, 21:59, Juan Nunez-Iglesias wrote:

> Yeah I desperately want those mitts!
>
> On 5 Mar 2021, at 11:54 am, Stephan Hoyer wrote:
>
> I love your mittens!
>
> NumPy really should be in the NumFOCUS store, but it currently isn't:
> https://shop.spreadshirt.com/numfocus/all
>
> I'll make some inquiries to see if we can sort that out :).
>
> On Thu, Mar 4, 2021 at 2:43 PM wrote:
>
>> Hello. I was looking for a T-shirt with Numpy logo but didn't find
>> anywhere. Anybody knows if there's a merchandise with Numpy? So I have to
>> knit mittens with Numpy logo for myself.
>>
>> Best regards!
>> Konstantin
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From klark--kent at yandex.ru Fri Mar 5 17:02:02 2021
From: klark--kent at yandex.ru (klark--kent at yandex.ru)
Date: Sat, 06 Mar 2021 01:02:02 +0300
Subject: [Numpy-discussion] NumPy logo merchandise
In-Reply-To:
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net> <2911614896816@mail.yandex.ru>
Message-ID: <214931614981686@mail.yandex.ru>

An HTML attachment was scrubbed...
URL:

From toddrjen at gmail.com Sat Mar 6 09:44:17 2021
From: toddrjen at gmail.com (Todd)
Date: Sat, 6 Mar 2021 09:44:17 -0500
Subject: [Numpy-discussion] String accessor methods
Message-ID:

Currently, working with strings in numpy is not very convenient. You have to use a separate set of functions in a separate namespace, and those functions are relatively limited and poorly-documented.

A solution several other projects, including pandas [0] and xarray [1], have found is string accessor methods. These are a set of methods attached to a `str` attribute of the class. These have the advantage that they are always available and have a well-defined object they operate on. On non-str dtypes, it would raise an exception. This would also provide a standardized set of methods and behaviors that are part of the numpy api that other classes could depend on.

An example would be something like this:

    >>> mystr = np.array(["test first", "test second", "test third"])
    >>> mystr.str.title()
    array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')

From blkzol001 at myuct.ac.za Sat Mar 6 12:36:35 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Sat, 6 Mar 2021 10:36:35 -0700 (MST)
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler
Message-ID: <1615052195691-0.post@n7.nabble.com>

Hi All,

I noticed that the transformed rejection method for generating Poisson random variables used in numpy makes use of the `random_loggam` function which directly calculates the log-gamma function. It appears that a log-factorial lookup table was added a few years back which could be used in place of random_loggam since the input is always an integer. Is there a reason for not using this table instead? See link below for the line of code:

https://github.com/numpy/numpy/blob/6222e283fa0b8fb9ba562dabf6ca9ea7ed65be39/numpy/random/src/distributions/distributions.c#L572

Regards
Zolisa

--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
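[For reference, the identity behind such a table is log(k!) = lgamma(k + 1); a minimal sketch of the lookup idea, illustrative only - not the table NumPy actually ships:]

    import numpy as np
    from math import lgamma

    # table[k - 1] = log(k!) for k >= 1; a lookup replaces a loggamma call
    table = np.cumsum(np.log(np.arange(1, 127)))

    def logfactorial(k):
        return 0.0 if k < 2 else table[k - 1]

    assert abs(logfactorial(100) - lgamma(101)) < 1e-9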
From blkzol001 at myuct.ac.za Sat Mar 6 12:52:26 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Sat, 6 Mar 2021 10:52:26 -0700 (MST)
Subject: [Numpy-discussion] guide for downstream package authors & setting version constraints
In-Reply-To:
References:
Message-ID: <1615053146181-0.post@n7.nabble.com>

Thank you, this looks very informative. Is there a best practice guide somewhere in the docs on how to correctly expose C-level code to third parties via .pxd files, similarly to how one can access the c_distributions of numpy via cython? I tried this previously and failed miserably. It seemed like symbols for some C functions I tried to expose to the user via cython declaration could not be found. I know I did something wrong, but I'm not sure what (I linked the header files and everything). The Cython docs were not very helpful. Maybe scipy/numpy devs could shed some light on how this is properly done?

--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From dan_patterson at outlook.com Sat Mar 6 12:57:04 2021
From: dan_patterson at outlook.com (dan_patterson)
Date: Sat, 6 Mar 2021 10:57:04 -0700 (MST)
Subject: [Numpy-discussion] String accessor methods
In-Reply-To:
References:
Message-ID: <1615053424270-0.post@n7.nabble.com>

They are in np.char

    mystr = np.array(["test first", "test second", "test third"])

    np.char.title(mystr)
    array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')

From warren.weckesser at gmail.com (Warren Weckesser)
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler
In-Reply-To: <1615052195691-0.post@n7.nabble.com>
References: <1615052195691-0.post@n7.nabble.com>
Message-ID:

On 3/6/21, zoj613 wrote:
> Hi All,
>
> I noticed that the transformed rejection method for generating Poisson
> random variables used in numpy makes use of the `random_loggam` function
> which directly calculates the log-gamma function. It appears that a
> log-factorial lookup table was added a few years back which could be used
> in
> place of random_loggam since the input is always an integer. Is there a
> reason for not using this table instead? See link below for the line of
> code:
>
> https://github.com/numpy/numpy/blob/6222e283fa0b8fb9ba562dabf6ca9ea7ed65be39/numpy/random/src/distributions/distributions.c#L572
>
> Regards
> Zolisa
>

Hi Zolisa,

In the pull request where the C function logfactorial was added (https://github.com/numpy/numpy/pull/13761), I originally modified the Poisson code to use logfactorial as you suggest, but Kevin (@bashtage on github) pointed out that the change could potentially alter the random stream for the legacy version. Making the change requires creating separate C functions, one for the legacy code that remains unchanged, and one for the newer Generator class that would use logfactorial. You can see the comments here (click on "Show resolved"): https://github.com/numpy/numpy/pull/13761#pullrequestreview-249973405

At the time, making that change was not a high priority, so I didn't pursue it. It does make sense to use the logfactorial function there, and I'd be happy to see it updated, but be aware that making the change is more work than changing just the function call.

Warren
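[A small illustration of the constraint described above - the values are arbitrary; the point is which streams are allowed to change under NEP 19:]

    import numpy as np

    # Legacy: this exact sequence is frozen by NEP 19, so the RandomState
    # Poisson sampler must keep using the original loggam-based code path.
    legacy = np.random.RandomState(12345).poisson(100.0, size=3)

    # Modern: Generator streams may change in feature releases, so only
    # this code path could be switched over to the logfactorial table.
    modern = np.random.default_rng(12345).poisson(100.0, size=3)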
>
> --
> Sent from: http://numpy-discussion.10968.n7.nabble.com/
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>

From blkzol001 at myuct.ac.za Sat Mar 6 14:05:14 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Sat, 6 Mar 2021 12:05:14 -0700 (MST)
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler
In-Reply-To:
References: <1615052195691-0.post@n7.nabble.com>
Message-ID: <1615057514907-0.post@n7.nabble.com>

Ah, I had a suspicion that it was to preserve the random stream but wasn't too sure. Thanks for the clarification.

--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From robert.kern at gmail.com Sat Mar 6 21:39:38 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 6 Mar 2021 21:39:38 -0500
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler
In-Reply-To:
References: <1615052195691-0.post@n7.nabble.com>
Message-ID:

On Sat, Mar 6, 2021 at 1:45 PM Warren Weckesser wrote:

> At the time, making that change was not a high priority, so I didn't
> pursue it. It does make sense to use the logfactorial function there,
> and I'd be happy to see it updated, but be aware that making the
> change is more work than changing just the function call.
>

Does it make a big difference? Per NEP 19, even in `Generator`, we do weigh the cost of changing the stream reasonably highly. Improved accuracy is likely worthwhile, but a minor performance improvement is probably not.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From matti.picus at gmail.com Sun Mar 7 00:23:36 2021
From: matti.picus at gmail.com (Matti Picus)
Date: Sun, 7 Mar 2021 07:23:36 +0200
Subject: [Numpy-discussion] guide for downstream package authors & setting version constraints
In-Reply-To: <1615053146181-0.post@n7.nabble.com>
References: <1615053146181-0.post@n7.nabble.com>
Message-ID:

This is a different topic altogether. I think you would get better results asking on the cython-users mailing list with a concrete example of something that didn't work.

Matti

On 3/6/21 7:52 PM, zoj613 wrote:
> Is there a best practice guide
> somewhere in the docs on how to correctly expose C-level code to third
> parties via .pxd files, similarly to how one can access the c_distributions
> of numpy via cython?

From toddrjen at gmail.com Sun Mar 7 00:30:56 2021
From: toddrjen at gmail.com (Todd)
Date: Sun, 7 Mar 2021 00:30:56 -0500
Subject: [Numpy-discussion] String accessor methods
In-Reply-To: <1615053424270-0.post@n7.nabble.com>
References: <1615053424270-0.post@n7.nabble.com>
Message-ID:

On Sat, Mar 6, 2021 at 12:57 PM dan_patterson wrote:

> They are in np.char
>
> mystr = np.array(["test first", "test second", "test third"])
>
> np.char.title(mystr)
> array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')

I mentioned those in my email, but they are far less convenient to use than class methods, nor do they relate well to how built-in strings are used in Python. That is why other projects have started using accessor methods and why Python removed all the separate string functions in Python 3. The functions in np.char are also limited in their capabilities, and fairly poorly documented in my opinion. Some of those limitations are impossible to overcome, for example they inherently can never support operators, addition or multiplication, or slicing like Python strings can, while an accessor could.

However, putting them as top-level methods for ndarray would pollute the methods too much. That is why I am suggesting numpy do the same thing that pandas, xarray, etc. are doing and putting those as methods under a 'str' attribute for ndarrays rather than as separate functions.
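[A standalone sketch of the proposed idea, written as a plain wrapper class since `ndarray.str` itself does not exist - illustration only:]

    import numpy as np

    class StrAccessor:
        # Hypothetical stand-in for the proposed `ndarray.str` attribute:
        # validate the dtype once, then forward to the np.char functions.
        def __init__(self, arr):
            if arr.dtype.kind not in "US":
                raise AttributeError("str accessor requires a string dtype")
            self._arr = arr

        def title(self):
            return np.char.title(self._arr)

        def upper(self):
            return np.char.upper(self._arr)

    mystr = np.array(["test first", "test second", "test third"])
    print(StrAccessor(mystr).title())
    # ['Test First' 'Test Second' 'Test Third']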
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From kevin.k.sheppard at gmail.com Sun Mar 7 04:34:29 2021
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Sun, 7 Mar 2021 09:34:29 +0000
Subject: [Numpy-discussion] String accessor methods
In-Reply-To:
References: <1615053424270-0.post@n7.nabble.com>
Message-ID:

I think that any string functions that are exposed from an ndarray would have to be guaranteed to work in-place. Requiring casting to objects to use the methods feels more like syntactic sugar than an essential case. I think most of the ones mentioned are low performance and can't take advantage of the storage as a blob of int8 (ascii) or int32 (utf32) that underlies Numpy string arrays.

I also think the existence of these in pandas reduces the case for them being in Numpy.

On Sun, Mar 7, 2021, 05:32 Todd wrote:

> On Sat, Mar 6, 2021 at 12:57 PM dan_patterson <
> dan_patterson at outlook.com>
> wrote:
>
>> They are in np.char
>>
>> mystr = np.array(["test first", "test second", "test third"])
>>
>> np.char.title(mystr)
>> array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')
>
> I mentioned those in my email, but they are far less convenient to use
> than class methods, nor do they relate well to how built-in strings are
> used in Python. That is why other projects have started using accessor
> methods and why Python removed all the separate string functions in Python
> 3. The functions in np.char are also limited in their capabilities, and
> fairly poorly documented in my opinion. Some of those limitations are
> impossible to overcome, for example they inherently can never support
> operators, addition or multiplication, or slicing like Python strings can,
> while an accessor could.
>
> However, putting them as top-level methods for ndarray would pollute the
> methods too much. That is why I am suggesting numpy do the same thing that
> pandas, xarray, etc. are doing and putting those as methods under a 'str'
> attribute for ndarrays rather than as separate functions.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sebastian at sipsolutions.net Sun Mar 7 11:16:36 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 07 Mar 2021 10:16:36 -0600
Subject: [Numpy-discussion] String accessor methods
In-Reply-To:
References: <1615053424270-0.post@n7.nabble.com>
Message-ID: <85bb80fe21d2666a68a78c7a8af792c7bd2d5b7d.camel@sipsolutions.net>

On Sun, 2021-03-07 at 09:34 +0000, Kevin Sheppard wrote:
> I think that any string functions that are exposed from an ndarray
> would have to be guaranteed to work in-place. Requiring casting to objects
> to use the methods feels more like syntactic sugar than an essential case.
> I think most of the ones mentioned are low performance and can't take
> advantage of the storage as a blob of int8 (ascii) or int32 (utf32)
> that underlies Numpy string arrays.
>
> I also think the existence of these in pandas reduces the case for
> them being in Numpy.

I agree with this, the need seems much lower in NumPy. And NumPy's currently somewhat weird strings, at least for me, make it even less appealing to expose more string utilities of any kind at this time.

In general, there is probably something to be said about such "accessors", in the sense of having a place to put methods which are specific to the array's dtype. Other examples are datetime/timedelta or Units and probably many potential DTypes [1]. It is one advantage that the `astropy.units.Quantity` subclass has over a DType based solution: `methods` can be added very transparently.

Basically: The current `np.char` functions are a bit weird and I would need quite a bit more convincing to expose them at this time. But, I would be delighted if we can think of a solution that goes beyond `str` [2]. Probably not adding `ndarray.str` at all, or only if the array has a string DType. But do it in a way that generalizes! That could be a DType specific mixin class, or I had previously played with the thought of a "generic" accessor: `ndarray.elementwise.` But those go beyond the original string request and need some smart idea/thoughts!

An interesting aside is that `arr.imag` and `arr.real` fall into the same category. But they are narrow enough that we can just have a specific solution for them.

Cheers,

Sebastian

[1] Datetimes/timedelta might have some use of basic timezone handling (not sure if relevant to NumPy's naive datetimes). `astropy.units.Quantity` has a few extra methods/properties:

* `.cgs`, `.si`, `.decompose()`, `.to()`: cast to different unit.
* `.unit`
* `.value`: get a value array view without any unit.
* `.to_value()` method that returns a copy, not a view.

Of course we can spell those using DTypes, but I think it might be long: `arr.astype(arr.dtype.cgs)`, or `arr.view(arr.dtype.unitless)`. Utility functions similar to `np.char` also can simplify all of this, but methods do have merit. Other user DTypes could very well have more compelling use-cases.

[2] But it probably won't reach my serious thinking cycles for a while. For starters, dedicated utility functions seem decent enough...

> On Sun, Mar 7, 2021, 05:32 Todd wrote:
>
> > On Sat, Mar 6, 2021 at 12:57 PM dan_patterson <
> > dan_patterson at outlook.com>
> > wrote:
> >
> > > They are in np.char
> > >
> > > mystr = np.array(["test first", "test second", "test third"])
> > >
> > > np.char.title(mystr)
> > > array(['Test First', 'Test Second', 'Test Third'], dtype='<U11')
> >
> > I mentioned those in my email, but they are far less convenient to
> > use than class methods, nor do they relate well to how built-in strings
> > are used in Python. That is why other projects have started using
> > accessor methods and why Python removed all the separate string functions
> > in Python 3. The functions in np.char are also limited in their
> > capabilities, and fairly poorly documented in my opinion. Some of those
> > limitations are impossible to overcome, for example they inherently can
> > never support operators, addition or multiplication, or slicing like
> > Python strings can, while an accessor could.
> >
> > However, putting them as top-level methods for ndarray would
> > pollute the methods too much.
> > That is why I am suggesting numpy do the same thing that
> > pandas, xarray, etc. are doing and putting those as methods under a
> > 'str' attribute for ndarrays rather than as separate functions.
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL:

From blkzol001 at myuct.ac.za Mon Mar 8 04:51:51 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Mon, 8 Mar 2021 02:51:51 -0700 (MST)
Subject: [Numpy-discussion] guide for downstream package authors & setting version constraints
In-Reply-To:
References: <1615053146181-0.post@n7.nabble.com>
Message-ID: <1615197111584-0.post@n7.nabble.com>

Thanks for the suggestion. However, I was able to solve the issue I had by just creating inline wrapper functions in cython for the C functions, so I don't have to link them when importing in other 3rd party cython modules.

--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From kevin.k.sheppard at gmail.com Mon Mar 8 11:43:26 2021
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Mon, 8 Mar 2021 16:43:26 +0000
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler
In-Reply-To:
References: <1615052195691-0.post@n7.nabble.com>
Message-ID:

I did a quick test and using random_loggam was about 6% faster than using logfactorial (on Windows).

Kevin

On Sun, Mar 7, 2021 at 2:40 AM Robert Kern wrote:

> On Sat, Mar 6, 2021 at 1:45 PM Warren Weckesser <
> warren.weckesser at gmail.com> wrote:
>
>> At the time, making that change was not a high priority, so I didn't
>> pursue it. It does make sense to use the logfactorial function there,
>> and I'd be happy to see it updated, but be aware that making the
>> change is more work than changing just the function call.
>>
>
> Does it make a big difference? Per NEP 19, even in `Generator`, we do
> weigh the cost of changing the stream reasonably highly. Improved accuracy
> is likely worthwhile, but a minor performance improvement is probably not.
>
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From blkzol001 at myuct.ac.za Mon Mar 8 12:05:48 2021
From: blkzol001 at myuct.ac.za (zoj613)
Date: Mon, 8 Mar 2021 10:05:48 -0700 (MST)
Subject: [Numpy-discussion] Using logfactorial instead of loggamma in random_poisson sampler
In-Reply-To:
References: <1615052195691-0.post@n7.nabble.com>
Message-ID: <1615223148767-0.post@n7.nabble.com>

What do you think is the explanation for that? I had assumed that using a lookup table would be faster considering that the loggam implementation has loops and makes calls to elementary functions in it.
--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From sebastian at sipsolutions.net Mon Mar 8 15:15:01 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 08 Mar 2021 14:15:01 -0600
Subject: [Numpy-discussion] NumPy logo merchandise available at spreadshirt NumFOCUS shop
In-Reply-To:
References: <20a5ea10481cbd64672b8c43bfff9109d7c0e68a.camel@sipsolutions.net> <2911614896816@mail.yandex.ru>
Message-ID:

On Thu, 2021-03-04 at 16:54 -0800, Stephan Hoyer wrote:
> I love your mittens!
>
> NumPy really should be in the NumFOCUS store, but it currently isn't:
> https://shop.spreadshirt.com/numfocus/all
>

Various items with the NumPy logo (text below cube) are now available from the NumFOCUS spreadshirt shop [1] here:

https://shop.spreadshirt.com/numfocus/numpy?idea=604683f7998267255de40bcc

If there is popular demand to add something else that should be no problem :). Unfortunately, I doubt we can add those amazing mittens there!

Cheers,

Sebastian

[1] https://shop.spreadshirt.com/numfocus

> I'll make some inquiries to see if we can sort that out :).
>
> On Thu, Mar 4, 2021 at 2:43 PM wrote:
>
> > Hello. I was looking for a T-shirt with Numpy logo but didn't find
> > anywhere. Anybody knows if there's a merchandise with Numpy? So I
> > have to knit mittens with Numpy logo for myself.
> >
> > Best regards!
> > Konstantin
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL:

From sebastian at sipsolutions.net Tue Mar 9 22:57:25 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 09 Mar 2021 21:57:25 -0600
Subject: [Numpy-discussion] NumPy Development Meeting Wednesday - Triage Focus
Message-ID: <6b3a8d4f360a7449ebd63fa5ef280dc55a63f5c5.camel@sipsolutions.net>

Hi all,

Our bi-weekly triage-focused NumPy development meeting is Wednesday, March 10th at 11 am Pacific Time (19:00 UTC). Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

I encourage everyone to notify us of issues or PRs that you feel should be prioritized, discussed, or reviewed.

Best regards

Sebastian

PS: We will probably schedule the community meeting in UTC next week, to avoid shifting it by one hour. That means the time will shift for whoever has daylight saving time changes (which is this Sunday, e.g., in the US).
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL:

From sebastian at sipsolutions.net Wed Mar 10 12:36:22 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 10 Mar 2021 11:36:22 -0600
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To:
References:
Message-ID: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>

Top-posting, to discuss specific questions about NEP 47 and, partially, the start on implementing it in: https://github.com/numpy/numpy/pull/18585

There are probably many more that will crop up. But for me, each of these is a pretty major difficulty without a clear answer as of now.

1. I still need clarity how a library is supposed to use this namespace when the user passes in a NumPy array (mentioned before). The user must get back a NumPy array after all. Maybe that is just a decorator, but it seems important.

2. `np.result_type` special cases array-scalars (the current PR), NEP 47 promises it will not. The PR could attempt to work around that using `arr.dtype` in `result_type`; I expect there are more details to fight with there, but I am not sure.

3. For all other functions, the same problem applies. You don't actually have anything to fix NumPy promotion rules. You could bake your own cake here for numeric types, but I am not sure; you might also need NEP 43 in all its promotion power to pull it off.

4. Now that I looked at the above, I do not feel it's reasonable to limit this functionality to numeric dtypes. If someone uses a NumPy rational-dtype, why should a SciPy function currently implemented in pure NumPy reject that? In other words, I think this is the point where trying to be "minimal" is counterproductive.

5. The PR makes no attempt at handling binary operators in any way aside from greedily coercing the other operand.

6. What happens with a mix of array-likes or even array subclasses like `astropy.quantity`?

7. Is there any provision on how to deal with mixed array-like inputs? CuPy+numpy, etc.?

I don't think we have to figure out everything up-front, but I do think there are a few very fundamental questions still open, at least for me personally.

Cheers,

Sebastian
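[A concrete example of the value-based promotion at issue in points 2 and 3, as NumPy behaves at the time of writing - the array API standard drops this in favour of dtype-only rules:]

    import numpy as np

    # The *value* of a scalar/0-d operand influences the result dtype:
    print((np.int8(1) + 1).dtype)     # int8  - the Python int 1 fits in int8
    print((np.int8(1) + 1000).dtype)  # int16 - 1000 does not fit, so upcast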
On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> Hi all,
>
> Here is a NEP, written together with Stephan Hoyer and Aaron Meurer, for discussion on adoption of the array API standard (https://data-apis.github.io/array-api/latest/). This will add a new numpy.array_api submodule containing that standardized API. The main purpose of this API is to be able to write code that is portable to other array/tensor libraries like CuPy, PyTorch, JAX, TensorFlow, Dask, and MXNet.
>
> We expect this NEP to remain in draft state for quite a while, while we're gaining experience with using it in downstream libraries, discuss adding it to other array libraries, and finishing some of the loose ends (e.g., specifications for linear algebra functions that aren't merged yet, see https://github.com/data-apis/array-api/pulls) in the API standard itself.
>
> See https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html for an initial discussion about this topic.
>
> Please keep high-level discussion here and detailed comments on https://github.com/numpy/numpy/pull/18456. Also, you can access a rendered version of the NEP from that PR (see PR description for how), which may be helpful.
>
> Cheers,
> Ralf
>
>
> Abstract
> --------
>
> We propose to adopt the `Python array API standard`_, developed by the `Consortium for Python Data API Standards`_. Implementing this as a separate new namespace in NumPy will allow authors of libraries which depend on NumPy as well as end users to write code that is portable between NumPy and all other array/tensor libraries that adopt this standard.
>
> .. note::
>
>     We expect that this NEP will remain in a draft state for quite a while. Given the large scope we don't expect to propose it for acceptance any time soon; instead, we want to solicit feedback on both the high-level design and implementation, and learn what needs describing better in this NEP or changing in either the implementation or the array API standard itself.
>
>
> Motivation and Scope
> --------------------
>
> Python users have a wealth of choice for libraries and frameworks for numerical computing, data science, machine learning, and deep learning. New frameworks pushing forward the state of the art in these fields are appearing every year. One unintended consequence of all this activity and creativity has been fragmentation in multidimensional array (a.k.a. tensor) libraries - which are the fundamental data structure for these fields. Choices include NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others.
>
> The APIs of each of these libraries are largely similar, but with enough differences that it's quite difficult to write code that works with multiple (or all) of these libraries. The array API standard aims to address that issue, by specifying an API for the most common ways arrays are constructed and used. The proposed API is quite similar to NumPy's API, and deviates mainly in places where (a) NumPy made design choices that are inherently not portable to other implementations, and (b) where other libraries consistently deviated from NumPy on purpose because NumPy's design turned out to have issues or unnecessary complexity.
>
> For a longer discussion on the purpose of the array API standard we refer to the `Purpose and Scope section of the array API standard <https://data-apis.github.io/array-api/latest/purpose_and_scope.html>`__ and the two blog posts announcing the formation of the Consortium [1]_ and the release of the first draft version of the standard for community review [2]_.
>
> The scope of this NEP includes:
>
> - Adopting the 2021 version of the array API standard
> - Adding a separate namespace, tentatively named ``numpy.array_api``
> - Changes needed/desired outside of the new namespace, for example new dunder methods on the ``ndarray`` object
> - Implementation choices, and differences between functions in the new namespace with those in the main ``numpy`` namespace
> - A new array object conforming to the array API standard
> - Maintenance effort and testing strategy
> - Impact on NumPy's total exposed API surface and on other future and under-discussion design choices
> - Relation to existing and proposed NumPy array protocols (``__array_ufunc__``, ``__array_function__``, ``__array_module__``).
> - Required improvements to existing NumPy functionality
>
> Out of scope for this NEP are:
>
> - Changes in the array API standard itself. Those are likely to come up during review of this NEP, but should be upstreamed as needed and this NEP subsequently updated.
>
>
> Usage and Impact
> ----------------
>
> *This section will be fleshed out later, for now we refer to the use cases given in* `the array API standard Use Cases section <https://data-apis.github.io/array-api/latest/use_cases.html>`__
>
> In addition to those use cases, the new namespace contains functionality that is widely used and supported by many array libraries. As such, it is a good set of functions to teach to newcomers to NumPy and recommend as "best practice". That contrasts with NumPy's main namespace, which contains many functions and objects that have been superseded or we consider mistakes - but that we can't remove because of backwards compatibility reasons.
>
> The usage of the ``numpy.array_api`` namespace by downstream libraries is intended to enable them to consume multiple kinds of arrays, *without having to have a hard dependency on all of those array libraries*:
>
> .. image:: _static/nep-0047-library-dependencies.png
>
> Adoption in downstream libraries
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The prototype implementation of the ``array_api`` namespace will be used with SciPy, scikit-learn and other libraries of interest that depend on NumPy, in order to get more experience with the design and find out if any important parts are missing.
>
> The pattern to support multiple array libraries is intended to be something like::
>
>     def somefunc(x, y):
>         # Retrieves standard namespace. Raises if x and y have different
>         # namespaces. See Appendix for possible get_namespace implementation
>         xp = get_namespace(x, y)
>         out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
>         return out
>
> The ``get_namespace`` call is effectively the library author opting in to using the standard API namespace, and thereby explicitly supporting all conforming array libraries.
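[A possible ``get_namespace`` along these lines - an illustrative sketch only, not part of the quoted NEP, which defers the real implementation to an appendix:]

    def get_namespace(*xs):
        # Each conforming array advertises its namespace via
        # __array_namespace__; mixing namespaces is an error.
        namespaces = {x.__array_namespace__()
                      for x in xs if hasattr(x, "__array_namespace__")}
        if len(namespaces) != 1:
            raise ValueError("expected arrays from exactly one array API namespace")
        return namespaces.pop()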
>
> The ``asarray`` / ``asanyarray`` pattern
> ````````````````````````````````````````
>
> Many existing libraries use the same ``asarray`` (or ``asanyarray``) pattern as NumPy itself does; accepting any object that can be coerced into a ``np.ndarray``. We consider this design pattern problematic - keeping in mind the Zen of Python, *"explicit is better than implicit"*, as well as the pattern being historically problematic in the SciPy ecosystem for ``ndarray`` subclasses and with over-eager object creation. All other array/tensor libraries are more strict, and that works out fine in practice. We would advise authors of new libraries to avoid the ``asarray`` pattern. Instead they should either accept just NumPy arrays or, if they want to support multiple kinds of arrays, check if the incoming array object supports the array API standard by checking for ``__array_namespace__`` as shown in the example above.
>
> Existing libraries can do such a check as well, and only call ``asarray`` if the check fails. This is very similar to the ``__duckarray__`` idea in :ref:`NEP30`.
>
>
> .. _adoption-application-code:
>
> Adoption in application code
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The new namespace can be seen by end users as a cleaned up and slimmed down version of NumPy's main namespace. Encouraging end users to use this namespace like::
>
>     import numpy.array_api as xp
>
>     x = xp.linspace(0, 2*xp.pi, num=100)
>     y = xp.cos(x)
>
> seems perfectly reasonable, and potentially beneficial - users get offered only one function for each purpose (the one we consider best-practice), and they then write code that is more easily portable to other libraries.
>
>
> Backward compatibility
> ----------------------
>
> No deprecations or removals of existing NumPy APIs or other backwards incompatible changes are proposed.
>
>
> High-level design
> -----------------
>
> The array API standard consists of approximately 120 objects, all of which have a direct NumPy equivalent. This figure shows what is included at a high level:
>
> .. image:: _static/nep-0047-scope-of-array-API.png
>
> The most important changes compared to what NumPy currently offers are:
>
> - A new array object which:
>
>     - conforms to the casting rules and indexing behaviour specified by the standard,
>     - does not have methods other than dunder methods,
>     - does not support the full range of NumPy indexing behaviour. Advanced indexing with integers is not supported. Only boolean indexing with a single (possibly multi-dimensional) boolean array is supported. An indexing expression that selects a single element returns a 0-D array rather than a scalar.
>
> - Functions in the ``array_api`` namespace:
>
>     - do not accept ``array_like`` inputs, only NumPy arrays and Python scalars
>     - do not support ``__array_ufunc__`` and ``__array_function__``,
>     - use positional-only and keyword-only parameters in their signatures,
>     - have inline type annotations,
>     - may have minor changes to signatures and semantics of individual functions compared to their equivalents already present in NumPy,
>     - only support dtype literals, not format strings or other ways of specifying dtypes
>
> - DLPack_ support will be added to NumPy,
> - New syntax for "device support" will be added, through a ``.device`` attribute on the new array object, and ``device=`` keywords in array creation functions in the ``array_api`` namespace,
> - Casting rules that differ from those NumPy currently has. Output dtypes can be derived from input dtypes (i.e. no value-based casting), and 0-D arrays are treated like >=1-D arrays.
> - Not all dtypes NumPy has are part of the standard. Only boolean, signed and unsigned integers, and floating-point dtypes up to ``float64`` are supported. Complex dtypes are expected to be added in the next version of the standard. Extended precision, string, void, object and datetime dtypes, as well as structured dtypes, are not included.
>
> Improvements to existing NumPy functionality that are needed include:
>
> - Add support for stacks of matrices to some functions in ``numpy.linalg`` that are currently missing such support.
> - Add the ``keepdims`` keyword to ``np.argmin`` and ``np.argmax``.
> - Add a "never copy" mode to ``np.asarray``.
>
>
> Functions in the ``array_api`` namespace
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Let's start with an example of a function implementation that shows the most important differences with the equivalent function in the main namespace::
>
>     def max(x: array, /, *,
>             axis: Optional[Union[int, Tuple[int, ...]]] = None,
>             keepdims: bool = False
>         ) -> array:
>         """
>         Array API compatible wrapper for :py:func:`np.max <numpy.max>`.
>         """
>         return np.max._implementation(x, axis=axis, keepdims=keepdims)
>
> This function does not accept ``array_like`` inputs, only ``ndarray``. There are multiple reasons for this. Other array libraries all work like this. Letting the user do coercion of lists, generators, or other foreign objects separately results in a cleaner design with less unexpected behaviour. It's higher-performance - less overhead from ``asarray`` calls. Static typing is easier. Subclasses will work as expected. And the slight increase in verbosity because users have to explicitly coerce to ``ndarray`` on rare occasions seems like a small price to pay.
>
> This function does not support ``__array_ufunc__`` nor ``__array_function__``. These protocols serve a similar purpose as the array API standard module itself, but through a different mechanism. Because only ``ndarray`` instances are accepted, dispatching via one of these protocols isn't useful anymore.
>
> This function uses positional-only parameters in its signature. This makes code more portable - writing ``max(x=x, ...)`` is no longer valid, hence if other libraries call the first parameter ``input`` rather than ``x``, that is fine. The rationale for keyword-only parameters (not shown in the above example) is two-fold: clarity of end user code, and it being easier to extend the signature in the future with keywords in the desired order.
>
> This function has inline type annotations. Inline annotations are far easier to maintain than separate stub files. And because the types are simple, this will not result in a large amount of clutter with type aliases or unions like in the current stub files NumPy has.
>
>
> DLPack support for zero-copy data interchange
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The ability to convert one kind of array into another kind is valuable, and indeed necessary when downstream libraries want to support multiple kinds of arrays. This requires a well-specified data exchange protocol. NumPy already supports two of these, namely the buffer protocol (i.e., PEP 3118), and the ``__array_interface__`` (Python side) / ``__array_struct__`` (C side) protocol. Both work similarly, letting the "producer" describe how the data is laid out in memory so the "consumer" can construct its own kind of array with a view on that data.
>
> DLPack works in a very similar way. The main reasons to prefer DLPack over the options already present in NumPy are:
>
> 1. DLPack is the only protocol with device support (e.g., GPUs using CUDA or ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other array libraries are not. Having one protocol per device isn't tenable, hence device support is a must.
> 2. Widespread support. DLPack has the widest adoption of all protocols, only NumPy is missing support. And the experiences of other libraries with it are positive. This contrasts with the protocols NumPy does support, which are used very little - when other libraries want to interoperate with NumPy, they typically use the (more limited, and NumPy-specific) ``__array__`` protocol.
>
> Adding support for DLPack to NumPy entails:
>
> - Adding a ``ndarray.__dlpack__`` method
> - Adding a ``from_dlpack`` function, which takes as input an object supporting ``__dlpack__``, and returns an ``ndarray``.
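[An illustrative consumer-side sketch, not part of the quoted NEP - it assumes the proposed ``from_dlpack`` lands as described:]

    import numpy as np

    def as_numpy(x):
        # Prefer the zero-copy DLPack route when the producer supports it;
        # np.from_dlpack is the function the NEP proposes to add.
        if hasattr(x, "__dlpack__"):
            return np.from_dlpack(x)
        return np.asarray(x)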
>
> DLPack is currently a ~200 LoC header, and is meant to be included directly, so no external dependency is needed. Implementation should be straightforward.
>
>
> Syntax for device support
> ~~~~~~~~~~~~~~~~~~~~~~~~~
>
> NumPy itself is CPU-only, so it clearly doesn't have a need for device support. However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet) support multiple types of devices: CPU, GPU, TPU, and more exotic hardware. To write portable code on systems with multiple devices, it's often necessary to create new arrays on the same device as some other array, or check that two arrays live on the same device. Hence syntax for that is needed.
>
> The array object will have a ``.device`` attribute which enables comparing devices of different arrays (they only should compare equal if both arrays are from the same library and it's the same hardware device). Furthermore, ``device=`` keywords in array creation functions are needed. For example::
>
>     def empty(shape: Union[int, Tuple[int, ...]], /, *,
>               dtype: Optional[dtype] = None,
>               device: Optional[device] = None) -> array:
>         """
>         Array API compatible wrapper for :py:func:`np.empty <numpy.empty>`.
>         """
>         return np.empty(shape, dtype=dtype, device=device)
>
> The implementation for NumPy may be as simple as setting the device attribute to the string ``'cpu'`` and raising an exception if array creation functions encounter any other value.
>
>
> Dtypes and casting rules
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> The supported dtypes in this namespace are boolean, 8/16/32/64-bit signed and unsigned integer, and 32/64-bit floating-point dtypes. These will be added to the namespace as dtype literals with the expected names (e.g., ``bool``, ``uint16``, ``float64``).
>
> The most obvious omissions are the complex dtypes. The rationale for the lack of complex support in the first version of the array API standard is that several libraries (PyTorch, MXNet) are still in the process of adding support for complex dtypes. The next version of the standard is expected to include ``complex64`` and ``complex128`` (see `this issue <https://github.com/data-apis/array-api/issues/102>`__ for more details).
>
> Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is expected to only use the dtype literals. Format strings, Python builtin dtypes, or string representations of the dtype literals are not accepted - this will improve readability and portability of code at little cost.
>
> Casting rules are only defined between different dtypes of the same kind. The rationale for this is that mixed-kind (e.g., integer to floating-point) casting behavior differs between libraries. NumPy's mixed-kind casting behavior doesn't need to be changed or restricted, it only needs to be documented that if users use mixed-kind casting, their code may not be portable.
>
> .. image:: _static/nep-0047-casting-rules-lattice.png
>
> *Type promotion diagram. Promotion between any two types is given by their join on this lattice. Only the types of participating arrays matter, not their values. Dashed lines indicate that behaviour for Python scalars is undefined on overflow.
> Boolean, integer and floating-point dtypes are not connected, indicating mixed-kind promotion is undefined.*
>
> The most important difference between the casting rules in NumPy and in the array API standard is how scalars and 0-dimensional arrays are handled. In the standard, array scalars do not exist and 0-dimensional arrays follow the same casting rules as higher-dimensional arrays.
>
> See the `Type Promotion Rules section of the array API standard <https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html>`__ for more details.
>
> .. note::
>
>     It is not clear what the best way is to support the different casting rules for 0-dimensional arrays and no value-based casting. One option may be to implement this second set of casting rules, keep them private, mark the array API functions with a private attribute that says they adhere to these different rules, and let the casting machinery check for that attribute.
>
>     This needs discussion.
>
>
> Indexing
> ~~~~~~~~
>
> An indexing expression that would return a scalar with ``ndarray``, e.g. ``arr_2d[0, 0]``, will return a 0-D array with the new array object. There are several reasons for that: array scalars are largely considered a design mistake which no other array library copied; it works better for non-CPU libraries (typically arrays can live on the device, scalars live on the host); and it's simply a consistent design. To get a Python scalar out of a 0-D array, one can simply use the builtin for the type, e.g. ``float(arr_0d)``.
>
> The other `indexing modes in the standard <https://data-apis.github.io/array-api/latest/API_specification/indexing.html>`__ do work largely the same as they do for ``numpy.ndarray``. One noteworthy difference is that clipping in slice indexing (e.g., ``a[:n]`` where ``n`` is larger than the size of the first axis) is unspecified behaviour, because that kind of check can be expensive on accelerators.
>
> The lack of advanced indexing, and boolean indexing being limited to a single n-D boolean array, is due to those indexing modes not being suitable for all types of arrays or JIT compilation. Their absence does not seem to be problematic; if a user or library author wants to use them, they can do so through zero-copy conversion to ``numpy.ndarray``. This will signal correctly to whoever reads the code that it is then NumPy-specific rather than portable to all conforming array types.
>
>
>
> The array object
> ~~~~~~~~~~~~~~~~
>
> The array object in the standard does not have methods other than dunder methods. The rationale for that is that not all array libraries have methods on their array object (e.g., TensorFlow does not). It also provides only a single way of doing something, rather than have functions and methods that are effectively duplicate.
>
> Mixing operations that may produce views (e.g., indexing, ``nonzero``) in combination with mutation (e.g., item or slice assignment) is `explicitly documented in the standard to not be supported <https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html>`__. This cannot easily be prohibited in the array object itself; instead this will be guidance to the user via documentation.
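[An illustrative sketch of the view/mutation mix being ruled out - not part of the quoted NEP; shown with plain NumPy, where slicing happens to return a view:]

    import numpy as np

    x = np.asarray([1, 2, 3, 4])
    y = x[:3]      # NumPy gives a view here; another library may give a copy
    y[0] = 42
    print(x[0])    # prints 42 with NumPy - but portable code must not rely
                   # on whether x sees the change, which is why the standard
                   # leaves this combination unspecified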
>
> .. note::
>
>     It is not clear what the best way is to support the different
>     casting rules for 0-dimensional arrays and no value-based casting.
>     One option may be to implement this second set of casting rules,
>     keep them private, mark the array API functions with a private
>     attribute that says they adhere to these different rules, and let
>     the casting machinery check for that attribute.
>
>     This needs discussion.
>
>
> Indexing
> ~~~~~~~~
>
> An indexing expression that would return a scalar with ``ndarray``, e.g.
> ``arr_2d[0, 0]``, will return a 0-D array with the new array object.
> There are several reasons for that: array scalars are largely considered
> a design mistake which no other array library copied; it works better
> for non-CPU libraries (typically arrays can live on the device, scalars
> live on the host); and it's simply a consistent design. To get a Python
> scalar out of a 0-D array, one can simply use the builtin for the type,
> e.g. ``float(arr_0d)``.
>
> The other `indexing modes in the standard
> <https://data-apis.github.io/array-api/latest/API_specification/indexing.html>`__
> do work largely the same as they do for ``numpy.ndarray``. One
> noteworthy difference is that clipping in slice indexing (e.g.,
> ``a[:n]`` where ``n`` is larger than the size of the first axis) is
> unspecified behaviour, because that kind of check can be expensive on
> accelerators.
>
> The lack of advanced indexing, and boolean indexing being limited to a
> single n-D boolean array, is due to those indexing modes not being
> suitable for all types of arrays or JIT compilation. Their absence does
> not seem to be problematic; if a user or library author wants to use
> them, they can do so through zero-copy conversion to ``numpy.ndarray``.
> This will signal correctly to whoever reads the code that it is then
> NumPy-specific rather than portable to all conforming array types.
>
>
> The array object
> ~~~~~~~~~~~~~~~~
>
> The array object in the standard does not have methods other than dunder
> methods. The rationale for that is that not all array libraries have
> methods on their array object (e.g., TensorFlow does not). It also
> provides only a single way of doing something, rather than having
> functions and methods that are effectively duplicates.
>
> Mixing operations that may produce views (e.g., indexing, ``nonzero``)
> in combination with mutation (e.g., item or slice assignment) is
> `explicitly documented in the standard to not be supported
> <https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html>`__.
> This cannot easily be prohibited in the array object itself; instead
> this will be guidance to the user via documentation.
>
> The standard currently does not prescribe a name for the array object
> itself. We propose to simply name it ``ndarray``. This is the most
> obvious name, and because of the separate namespace it should not clash
> with ``numpy.ndarray``.
>
>
> Implementation
> --------------
>
> .. note::
>
>     This section needs a lot more detail, which will gradually be added
>     as the implementation progresses.
>
> A prototype of the ``array_api`` namespace can be found in
> https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api.
> The docstring in its ``__init__.py`` has notes on completeness of the
> implementation. The code for the wrapper functions also contains
> ``# Note:`` comments everywhere there is a difference with the NumPy
> API. Two important parts that are not implemented yet are the new array
> object and DLPack support. Functions may need changes to ensure the
> changed casting rules are respected.
>
> The array object
> ~~~~~~~~~~~~~~~~
>
> Regarding the array object implementation, we plan to start with a
> regular Python class that wraps a ``numpy.ndarray`` instance. Attributes
> and methods can forward to that wrapped instance, applying input
> validation and implementing changed behaviour as needed.
>
> The casting rules are probably the most challenging part. The
> in-progress dtype system refactor (NEPs 40-43) should make implementing
> the correct casting behaviour easier - for example, it is already moving
> away from value-based casting.
>
>
> The dtype objects
> ~~~~~~~~~~~~~~~~~
>
> We must be able to compare dtypes for equality, and expressions like
> these must be possible::
>
>     np.array_api.some_func(..., dtype=x.dtype)
>
> The above implies it would be nice to have ``np.array_api.float32 ==
> np.array_api.ndarray(...).dtype``.
>
> Dtypes should not be assumed to have a class hierarchy by users, however
> we are free to implement it with a class hierarchy if that's convenient.
> We considered the following options to implement dtype objects:
>
> 1. Alias dtypes to those in the main namespace. E.g.,
>    ``np.array_api.float32 = np.float32``.
> 2. Make the dtypes instances of ``np.dtype``. E.g.,
>    ``np.array_api.float32 = np.dtype(np.float32)``.
> 3. Create new singleton classes with only the required
>    methods/attributes (currently just ``__eq__``); a sketch follows at
>    the end of this section.
>
> It seems like (2) would be easiest from the perspective of interacting
> with functions outside the main namespace. And (3) would adhere best to
> the standard.
>
> TBD: the standard does not yet have a good way to inspect properties of
> a dtype, to ask questions like "is this an integer dtype?". Perhaps this
> is easy enough to do for users, like so::
>
>     def _get_dtype(dt_or_arr):
>         return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else dt_or_arr
>
>     def is_floating(dtype_or_array):
>         dtype = _get_dtype(dtype_or_array)
>         return dtype in (float32, float64)
>
>     def is_integer(dtype_or_array):
>         dtype = _get_dtype(dtype_or_array)
>         return dtype in (uint8, uint16, uint32, uint64,
>                          int8, int16, int32, int64)
>
> However it could make sense to add this to the standard. Note that NumPy
> itself currently does not have a great way of asking such questions, see
> `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.
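> As a sketch of option (3) - hypothetical, for illustration only, and
> assuming ``import numpy as np``::
>
>     class _DTypeStub:
>         """Minimal singleton-style dtype object with just ``__eq__``."""
>
>         def __init__(self, name):
>             self._np_dtype = np.dtype(name)
>
>         def __eq__(self, other):
>             # Only equal to a stub wrapping the same underlying dtype.
>             return (isinstance(other, _DTypeStub)
>                     and self._np_dtype == other._np_dtype)
>
>         def __hash__(self):
>             return hash(self._np_dtype)
>
>         def __repr__(self):
>             return self._np_dtype.name
>
>     float32 = _DTypeStub('float32')
>     float64 = _DTypeStub('float64')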
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL:

From asmeurer at gmail.com  Wed Mar 10 15:44:47 2021
From: asmeurer at gmail.com (Aaron Meurer)
Date: Wed, 10 Mar 2021 13:44:47 -0700
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
References: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
Message-ID:

On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg wrote:
>
> Top Posting, to discuss post-specific questions about NEP 47 and
> partially the start of implementing it in:
>
>     https://github.com/numpy/numpy/pull/18585
>
> There are probably many more that will crop up. But for me, each of
> these is a pretty major difficulty without a clear answer as of now.
>
> 1. I still need clarity on how a library is supposed to use this
> namespace when the user passes in a NumPy array (mentioned before).
> The user must get back a NumPy array after all. Maybe that is just a
> decorator, but it seems important.
>
> 2. `np.result_type` special cases array-scalars (the current PR), NEP
> 47 promises it will not. The PR could attempt to work around that
> using `arr.dtype` in `result_type`; I expect there are more details to
> fight with there, but I am not sure.

The idea is to work around it everywhere, so that it follows the rules
in the spec (no array scalars, no value-based casting). I haven't
started it yet, though, so I don't know yet how hard it will be. If it
ends up being too hard we could put it in the same camp as device
support and dlpack support, where it needs some basic implementation in
numpy itself first before we can properly do it in the array API
namespace.
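For instance, a first rough cut of a spec-compliant ``result_type``
might just strip everything down to dtypes before calling NumPy (an
untested sketch, not what the PR currently does):

    import numpy as np

    def result_type(*arrays_and_dtypes):
        # The spec only allows arrays and dtypes as inputs. Converting
        # arrays (including 0-D arrays and array scalars) to their
        # dtypes first means np.result_type never sees any values, so
        # value-based casting can't kick in.
        dtypes = [x.dtype if hasattr(x, 'dtype') else np.dtype(x)
                  for x in arrays_and_dtypes]
        return np.result_type(*dtypes)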
> 3. For all other functions, the same problem applies. You don't
> actually have anything to fix NumPy promotion rules. You could bake
> your own cake here for numeric types, but I am not sure, you might
> also need NEP 43 in all its promotion power to pull it off.
>
> 4. Now that I looked at the above, I do not feel it's reasonable to
> limit this functionality to numeric dtypes. If someone uses a NumPy
> rational-dtype, why should a SciPy function currently implemented in
> pure NumPy reject that? In other words, I think this is the point
> where trying to be "minimal" is counterproductive.

The idea of minimality is to make it so users can be sure they will be
able to use other libraries, once they also have array API compliant
namespaces. A rational-dtype wouldn't ever be implemented in those
other libraries, because it isn't part of the standard, so if a user
is using those, that is a sign they are using things that aren't in
the array API, so they can't expect to be able to swap out their
dtypes. If a user wants to use something that's only in NumPy, then
they should just use NumPy.

> 4. The PR makes no attempt at handling binary operators in any way
> aside from greedily coercing the other operand.
>
> 5. What happens with a mix of array-likes or even array subclasses
> like `astropy.quantity`?
>
> 6. Is there any provision on how to deal with mixed array-like
> inputs? CuPy+numpy, etc.?

Neither of these is defined in the spec. The spec only deals with
staying inside of the compliant namespace. It doesn't require any
behavior mixing things from other namespaces. That's generally
considered a much harder problem, and there is the data interchange
protocol to deal with it
(https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html).

Aaron Meurer

> I don't think we have to figure out everything up-front, but I do
> think there are a few very fundamental questions still open, at least
> for me personally.
>
> Cheers,
>
> Sebastian
>
> On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> > Hi all,
> >
> > [... full text of NEP 47 snipped ...]
From sebastian at sipsolutions.net  Wed Mar 10 17:35:49 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 10 Mar 2021 16:35:49 -0600
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To:
References: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net>
Message-ID: <01b5e7193b6e9b261befb4e62c5b94f39debf69f.camel@sipsolutions.net>

On Wed, 2021-03-10 at 13:44 -0700, Aaron Meurer wrote:
> On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg wrote:
> >
> > Top Posting, to discuss post-specific questions about NEP 47 and
> > partially the start of implementing it in:
> >
> >     https://github.com/numpy/numpy/pull/18585
> >
> > There are probably many more that will crop up. But for me, each of
> > these is a pretty major difficulty without a clear answer as of now.
> >
> > 1. I still need clarity on how a library is supposed to use this
> > namespace when the user passes in a NumPy array (mentioned before).
> > The user must get back a NumPy array after all. Maybe that is just a
> > decorator, but it seems important.
> >
> > 2. `np.result_type` special cases array-scalars (the current PR),
> > NEP 47 promises it will not. The PR could attempt to work around
> > that using `arr.dtype` in `result_type`; I expect there are more
> > details to fight with there, but I am not sure.
>
> The idea is to work around it everywhere, so that it follows the rules
> in the spec (no array scalars, no value-based casting). I haven't
> started it yet, though, so I don't know yet how hard it will be. If it
> ends up being too hard we could put it in the same camp as device
> support and dlpack support, where it needs some basic implementation
> in numpy itself first before we can properly do it in the array API
> namespace.

Quite frankly, if you really want to implement a minimal API, it may be
best to just write it yourself and ditch NumPy. (Of course I currently
doubt that the NEP 47 implementation should be minimal.)

About doing promotion yourself ("promotion" as in what ufuncs do; I
call `np.result_type` "common DType", because it is used e.g. in
`concatenate`): ufuncs have at least one more rule for true-division,
plus there may be mixed float-int loops, etc. Since the standard is
very limited and only has numeric dtypes, that might be all of it,
though.

In any case, my point is: if NumPy does strange things (and it
currently does with 0-D arrays), you can cook your own soup there, and
implement it in NumPy by using `signature=...` in the ufunc call.
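A rough example of what I mean, with current NumPy:

    import numpy as np

    x = np.ones(3, dtype=np.float32)

    (x + 1.0).dtype                            # float32 (value-based casting)
    np.add(x, 1.0, signature='dd->d').dtype    # float64, loop picked explicitly

i.e. you pick the exact ufunc loop yourself instead of letting NumPy's
promotion decide.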
> > 3. For all other functions, the same problem applies. You don't
> > actually have anything to fix NumPy promotion rules. You could bake
> > your own cake here for numeric types, but I am not sure, you might
> > also need NEP 43 in all its promotion power to pull it off.
> >
> > 4. Now that I looked at the above, I do not feel it's reasonable to
> > limit this functionality to numeric dtypes. If someone uses a NumPy
> > rational-dtype, why should a SciPy function currently implemented in
> > pure NumPy reject that? In other words, I think this is the point
> > where trying to be "minimal" is counterproductive.
>
> The idea of minimality is to make it so users can be sure they will be
> able to use other libraries, once they also have array API compliant
> namespaces. A rational-dtype wouldn't ever be implemented in those
> other libraries, because it isn't part of the standard, so if a user
> is using those, that is a sign they are using things that aren't in
> the array API, so they can't expect to be able to swap out their
> dtypes. If a user wants to use something that's only in NumPy, then
> they should just use NumPy.

This is not about the "user"; in your scenario the end-user does use
NumPy. The way I understand it, that is not a prerequisite. If it is, a
lot of things will be simpler, though, and most of my doubts will go
away (but be replaced with uncertainty about the usefulness).

The problem is that SciPy as the "library author" wants to use NEP 47
without limiting the end-user (or the end-user even noticing!). The
distinction between end-user and library author (someone who writes a
function that should work with numpy, pytorch, etc.) is very important
here and to all of these "protocol" discussions.

I assume that SciPy should be able to have the cake and eat it too:

* Use the limited array API and make sure to only rely on the minimal
  subset.
* Not artificially limit end-users who pass in NumPy arrays.

The second point can also be read as: SciPy would be able to support
practically all current NumPy array use cases without jumping through
any additional hoops (or well, maybe a bit of churn, but churn that is
made easy by the as of now undefined API).

> > 4. The PR makes no attempt at handling binary operators in any way
> > aside from greedily coercing the other operand.
> >
> > 5. What happens with a mix of array-likes or even array subclasses
> > like `astropy.quantity`?
> >
> > 6. Is there any provision on how to deal with mixed array-like
> > inputs? CuPy+numpy, etc.?
>
> Neither of these is defined in the spec. The spec only deals with
> staying inside of the compliant namespace. It doesn't require any
> behavior mixing things from other namespaces. That's generally
> considered a much harder problem, and there is the data interchange
> protocol to deal with it
> (https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html).

OK, maybe you can get away with it, since the current proposal seems to
be that `get_namespace()` raises on mixed input. Still, it seems like
something that should probably raise an error rather than coerce to
NumPy when calling `nep47_array_object + dask_array`.

Cheers,

Sebastian

> Aaron Meurer
>
> > I don't think we have to figure out everything up-front, but I do
> > think there are a few very fundamental questions still open, at
> > least for me personally.
> >
> > Cheers,
> >
> > Sebastian
> >
> > On Sun, 2021-02-21 at 17:30 +0100, Ralf Gommers wrote:
> > > Hi all,
> > >
> > > Here is a NEP, written together with Stephan Hoyer and Aaron
> > > Meurer, for discussion on adoption of the array API standard (
> > > https://data-apis.github.io/array-api/latest/). This will add a
> > > new numpy.array_api submodule containing that standardized API.
> > > The main purpose of this API is to be able to write code that is
> > > portable to other array/tensor libraries like CuPy, PyTorch, JAX,
> > > TensorFlow, Dask, and MXNet.
> > > > > > We expect this NEP to remain in draft state for quite a while, > > > while > > > we're > > > gaining experience with using it in downstream libraries, discuss > > > adding it > > > to other array libraries, and finishing some of the loose ends > > > (e.g., > > > specifications for linear algebra functions that aren't merged > > > yet, > > > see > > > https://github.com/data-apis/array-api/pulls) in the API standard > > > itself. > > > > > > See > > > > > > https://mail.python.org/pipermail/numpy-discussion/2020-November/081181.html > > > for an initial discussion about this topic. > > > > > > Please keep high-level discussion here and detailed comments on > > > https://github.com/numpy/numpy/pull/18456. Also, you can access a > > > rendered > > > version of the NEP from that PR (see PR description for how), > > > which > > > may be > > > helpful. > > > Cheers, > > > Ralf > > > > > > > > > Abstract > > > -------- > > > > > > We propose to adopt the `Python array API standard`_, developed > > > by > > > the > > > `Consortium for Python Data API Standards`_. Implementing this as > > > a > > > separate > > > new namespace in NumPy will allow authors of libraries which > > > depend > > > on NumPy > > > as well as end users to write code that is portable between NumPy > > > and > > > all > > > other array/tensor libraries that adopt this standard. > > > > > > .. note:: > > > > > > ??? We expect that this NEP will remain in a draft state for > > > quite a > > > while. > > > ??? Given the large scope we don't expect to propose it for > > > acceptance any > > > ??? time soon; instead, we want to solicit feedback on both the > > > high- > > > level > > > ??? design and implementation, and learn what needs describing > > > better > > > in > > > this > > > ??? NEP or changing in either the implementation or the array API > > > standard > > > ??? itself. > > > > > > > > > Motivation and Scope > > > -------------------- > > > > > > Python users have a wealth of choice for libraries and frameworks > > > for > > > numerical computing, data science, machine learning, and deep > > > learning. New > > > frameworks pushing forward the state of the art in these fields > > > are > > > appearing > > > every year. One unintended consequence of all this activity and > > > creativity > > > has been fragmentation in multidimensional array (a.k.a. tensor) > > > libraries - > > > which are the fundamental data structure for these fields. > > > Choices > > > include > > > NumPy, Tensorflow, PyTorch, Dask, JAX, CuPy, MXNet, and others. > > > > > > The APIs of each of these libraries are largely similar, but with > > > enough > > > differences that it?s quite difficult to write code that works > > > with > > > multiple > > > (or all) of these libraries. The array API standard aims to > > > address > > > that > > > issue, by specifying an API for the most common ways arrays are > > > constructed > > > and used. The proposed API is quite similar to NumPy's API, and > > > deviates > > > mainly > > > in places where (a) NumPy made design choices that are inherently > > > not > > > portable > > > to other implementations, and (b) where other libraries > > > consistently > > > deviated > > > from NumPy on purpose because NumPy's design turned out to have > > > issues or > > > unnecessary complexity. 
> > > > > > For a longer discussion on the purpose of the array API standard > > > we > > > refer to > > > the `Purpose and Scope section of the array API standard < > > > > > > https://data-apis.github.io/array-api/latest/purpose_and_scope.html > > > >` > > > __ > > > and the two blog posts announcing the formation of the Consortium > > > [1]_ and > > > the release of the first draft version of the standard for > > > community > > > review > > > [2]_. > > > > > > The scope of this NEP includes: > > > > > > - Adopting the 2021 version of the array API standard > > > - Adding a separate namespace, tentatively named > > > ``numpy.array_api`` > > > - Changes needed/desired outside of the new namespace, for > > > example > > > new > > > dunder > > > ? methods on the ``ndarray`` object > > > - Implementation choices, and differences between functions in > > > the > > > new > > > ? namespace with those in the main ``numpy`` namespace > > > - A new array object conforming to the array API standard > > > - Maintenance effort and testing strategy > > > - Impact on NumPy's total exposed API surface and on other future > > > and > > > ? under-discussion design choices > > > - Relation to existing and proposed NumPy array protocols > > > ? (``__array_ufunc__``, ``__array_function__``, > > > ``__array_module__``). > > > - Required improvements to existing NumPy functionality > > > > > > Out of scope for this NEP are: > > > > > > - Changes in the array API standard itself. Those are likely to > > > come > > > up > > > ? during review of this NEP, but should be upstreamed as needed > > > and > > > this NEP > > > ? subsequently updated. > > > > > > > > > Usage and Impact > > > ---------------- > > > > > > *This section will be fleshed out later, for now we refer to the > > > use > > > cases > > > given > > > in* `the array API standard Use Cases section < > > > https://data-apis.github.io/array-api/latest/use_cases.html>`__ > > > > > > In addition to those use cases, the new namespace contains > > > functionality > > > that > > > is widely used and supported by many array libraries. As such, it > > > is > > > a good > > > set of functions to teach to newcomers to NumPy and recommend as > > > "best > > > practice". That contrasts with NumPy's main namespace, which > > > contains > > > many > > > functions and objects that have been superceded or we consider > > > mistakes - > > > but > > > that we can't remove because of backwards compatibility reasons. > > > > > > The usage of the ``numpy.array_api`` namespace by downstream > > > libraries is > > > intended to enable them to consume multiple kinds of arrays, > > > *without > > > having > > > to have a hard dependency on all of those array libraries*: > > > > > > .. image:: _static/nep-0047-library-dependencies.png > > > > > > Adoption in downstream libraries > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > The prototype implementation of the ``array_api`` namespace will > > > be > > > used > > > with > > > SciPy, scikit-learn and other libraries of interest that depend > > > on > > > NumPy, in > > > order to get more experience with the design and find out if any > > > important > > > parts are missing. > > > > > > The pattern to support multiple array libraries is intended to be > > > something > > > like:: > > > > > > ??? def somefunc(x, y): > > > ??????? # Retrieves standard namespace. Raises if x and y have > > > different > > > ??????? # namespaces.? See Appendix for possible get_namespace > > > implementation > > > ??????? 
xp = get_namespace(x, y) > > > ??????? out = xp.mean(x, axis=0) + 2*xp.std(y, axis=0) > > > ??????? return out > > > > > > The ``get_namespace`` call is effectively the library author > > > opting > > > in to > > > using the standard API namespace, and thereby explicitly > > > supporting > > > all conforming array libraries. > > > > > > > > > The ``asarray`` / ``asanyarray`` pattern > > > ```````````````````````````````````````` > > > > > > Many existing libraries use the same ``asarray`` (or > > > ``asanyarray``) > > > pattern > > > as NumPy itself does; accepting any object that can be coerced > > > into a > > > ``np.ndarray``. > > > We consider this design pattern problematic - keeping in mind the > > > Zen > > > of > > > Python, *"explicit is better than implicit"*, as well as the > > > pattern > > > being > > > historically problematic in the SciPy ecosystem for ``ndarray`` > > > subclasses > > > and with over-eager object creation. All other array/tensor > > > libraries > > > are > > > more strict, and that works out fine in practice. We would advise > > > authors of > > > new libraries to avoid the ``asarray`` pattern. Instead they > > > should > > > either > > > accept just NumPy arrays or, if they want to support multiple > > > kinds > > > of > > > arrays, check if the incoming array object supports the array API > > > standard > > > by checking for ``__array_namespace__`` as shown in the example > > > above. > > > > > > Existing libraries can do such a check as well, and only call > > > ``asarray`` if > > > the check fails. This is very similar to the ``__duckarray__`` > > > idea > > > in > > > :ref:`NEP30`. > > > > > > > > > .. _adoption-application-code: > > > > > > Adoption in application code > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > The new namespace can be seen by end users as a cleaned up and > > > slimmed down > > > version of NumPy's main namespace. Encouraging end users to use > > > this > > > namespace like:: > > > > > > ??? import numpy.array_api as xp > > > > > > ??? x = xp.linspace(0, 2*xp.pi, num=100) > > > ??? y = xp.cos(x) > > > > > > seems perfectly reasonable, and potentially beneficial - users > > > get > > > offered > > > only > > > one function for each purpose (the one we consider best- > > > practice), > > > and they > > > then write code that is more easily portable to other libraries. > > > > > > > > > Backward compatibility > > > ---------------------- > > > > > > No deprecations or removals of existing NumPy APIs or other > > > backwards > > > incompatible changes are proposed. > > > > > > > > > High-level design > > > ----------------- > > > > > > The array API standard consists of approximately 120 objects, all > > > of > > > which > > > have a direct NumPy equivalent. This figure shows what is > > > included at > > > a > > > high level: > > > > > > .. image:: _static/nep-0047-scope-of-array-API.png > > > > > > The most important changes compared to what NumPy currently > > > offers > > > are: > > > > > > - A new array object which: > > > > > > ??? - conforms to the casting rules and indexing behaviour > > > specified > > > by the > > > ????? standard, > > > ??? - does not have methods other than dunder methods, > > > ??? - does not support the full range of NumPy indexing > > > behaviour. > > > Advanced > > > ????? indexing with integers is not supported. Only boolean > > > indexing > > > ????? with a single (possibly multi-dimensional) boolean array is > > > supported. > > > ????? 
An indexing expression that selects a single element > > > returns a > > > 0-D > > > array > > > ????? rather than a scalar. > > > > > > - Functions in the ``array_api`` namespace: > > > > > > ??? - do not accept ``array_like`` inputs, only NumPy arrays and > > > Python > > > scalars > > > ??? - do not support ``__array_ufunc__`` and > > > ``__array_function__``, > > > ??? - use positional-only and keyword-only parameters in their > > > signatures, > > > ??? - have inline type annotations, > > > ??? - may have minor changes to signatures and semantics of > > > individual > > > ????? functions compared to their equivalents already present in > > > NumPy, > > > ??? - only support dtype literals, not format strings or other > > > ways > > > of > > > ????? specifying dtypes > > > > > > - DLPack_ support will be added to NumPy, > > > - New syntax for "device support" will be added, through a > > > ``.device`` > > > ? attribute on the new array object, and ``device=`` keywords in > > > array > > > creation > > > ? functions in the ``array_api`` namespace, > > > - Casting rules that differ from those NumPy currently has. > > > Output > > > dtypes > > > can > > > ? be derived from input dtypes (i.e. no value-based casting), and > > > 0-D > > > arrays > > > ? are treated like >=1-D arrays. > > > - Not all dtypes NumPy has are part of the standard. Only > > > boolean, > > > signed > > > and > > > ? unsigned integers, and floating-point dtypes up to ``float64`` > > > are > > > supported. > > > ? Complex dtypes are expected to be added in the next version of > > > the > > > standard. > > > ? Extended precision, string, void, object and datetime dtypes, > > > as > > > well as > > > ? structured dtypes, are not included. > > > > > > Improvements to existing NumPy functionality that are needed > > > include: > > > > > > - Add support for stacks of matrices to some functions in > > > ``numpy.linalg`` > > > ? that are currently missing such support. > > > - Add the ``keepdims`` keyword to ``np.argmin`` and > > > ``np.argmax``. > > > - Add a "never copy" mode to ``np.asarray``. > > > > > > > > > Functions in the ``array_api`` namespace > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > Let's start with an example of a function implementation that > > > shows > > > the most > > > important differences with the equivalent function in the main > > > namespace:: > > > > > > ??? def max(x: array, /, *, > > > ??????????? axis: Optional[Union[int, Tuple[int, ...]]] = None, > > > ??????????? keepdims: bool = False > > > ??????? ) -> array: > > > ??????? """ > > > ??????? Array API compatible wrapper for :py:func:`np.max > > > `. > > > ??????? """ > > > ??????? return np.max._implementation(x, axis=axis, > > > keepdims=keepdims) > > > > > > This function does not accept ``array_like`` inputs, only > > > ``ndarray``. There > > > are multiple reasons for this. Other array libraries all work > > > like > > > this. > > > Letting the user do coercion of lists, generators, or other > > > foreign > > > objects > > > separately results in a cleaner design with less unexpected > > > behaviour. > > > It's higher-performance - less overhead from ``asarray`` calls. > > > Static > > > typing > > > is easier. Subclasses will work as expected. And the slight > > > increase > > > in > > > verbosity > > > because users have to explicitly coerce to ``ndarray`` on rare > > > occasions > > > seems like a small price to pay. > > > > > > This function does not support ``__array_ufunc__`` nor > > > ``__array_function__``. 
> > > These protocols serve a similar purpose as the array API standard > > > module > > > itself, > > > but through a different mechanisms. Because only ``ndarray`` > > > instances are > > > accepted, > > > dispatching via one of these protocols isn't useful anymore. > > > > > > This function uses positional-only parameters in its signature. > > > This > > > makes > > > code > > > more portable - writing ``max(x=x, ...)`` is no longer valid, > > > hence > > > if other > > > libraries call the first parameter ``input`` rather than ``x``, > > > that > > > is > > > fine. > > > The rationale for keyword-only parameters (not shown in the above > > > example) > > > is > > > two-fold: clarity of end user code, and it being easier to extend > > > the > > > signature > > > in the future with keywords in the desired order. > > > > > > This function has inline type annotations. Inline annotations are > > > far > > > easier to > > > maintain than separate stub files. And because the types are > > > simple, > > > this > > > will > > > not result in a large amount of clutter with type aliases or > > > unions > > > like in > > > the > > > current stub files NumPy has. > > > > > > > > > DLPack support for zero-copy data interchange > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > The ability to convert one kind of array into another kind is > > > valuable, and > > > indeed necessary when downstream libraries want to support > > > multiple > > > kinds of > > > arrays. This requires a well-specified data exchange protocol. > > > NumPy > > > already > > > supports two of these, namely the buffer protocol (i.e., PEP > > > 3118), > > > and > > > the ``__array_interface__`` (Python side) / ``__array_struct__`` > > > (C > > > side) > > > protocol. Both work similarly, letting the "producer" describe > > > how > > > the data > > > is laid out in memory so the "consumer" can construct its own > > > kind of > > > array > > > with a view on that data. > > > > > > DLPack works in a very similar way. The main reasons to prefer > > > DLPack > > > over > > > the options already present in NumPy are: > > > > > > 1. DLPack is the only protocol with device support (e.g., GPUs > > > using > > > CUDA or > > > ?? ROCm drivers, or OpenCL devices). NumPy is CPU-only, but other > > > array > > > ?? libraries are not. Having one protocol per device isn't > > > tenable, > > > hence > > > ?? device support is a must. > > > 2. Widespread support. DLPack has the widest adoption of all > > > protocols, only > > > ?? NumPy is missing support. And the experiences of other > > > libraries > > > with it > > > ?? are positive. This contrasts with the protocols NumPy does > > > support, which > > > ?? are used very little - when other libraries want to > > > interoperate > > > with > > > ?? NumPy, they typically use the (more limited, and NumPy- > > > specific) > > > ?? ``__array__`` protocol. > > > > > > Adding support for DLPack to NumPy entails: > > > > > > - Adding a ``ndarray.__dlpack__`` method > > > - Adding a ``from_dlpack`` function, which takes as input an > > > object > > > ? supporting ``__dlpack__``, and returns an ``ndarray``. > > > > > > DLPack is currently a ~200 LoC header, and is meant to be > > > included > > > directly, so > > > no external dependency is needed. Implementation should be > > > straightforward. > > > > > > > > > Syntax for device support > > > ~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > > > NumPy itself is CPU-only, so it clearly doesn't have a need for > > > device > > > support. 
> > > Syntax for device support
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > NumPy itself is CPU-only, so it clearly doesn't have a need for device support. However, other libraries (e.g. TensorFlow, PyTorch, JAX, MXNet) support multiple types of devices: CPU, GPU, TPU, and more exotic hardware. To write portable code on systems with multiple devices, it's often necessary to create new arrays on the same device as some other array, or check that two arrays live on the same device. Hence syntax for that is needed.
> > >
> > > The array object will have a ``.device`` attribute which enables comparing devices of different arrays (they should only compare equal if both arrays are from the same library and it's the same hardware device). Furthermore, ``device=`` keywords in array creation functions are needed. For example::
> > >
> > >     def empty(shape: Union[int, Tuple[int, ...]], /, *,
> > >               dtype: Optional[dtype] = None,
> > >               device: Optional[device] = None) -> array:
> > >         """
> > >         Array API compatible wrapper for :py:func:`np.empty <numpy.empty>`.
> > >         """
> > >         return np.empty(shape, dtype=dtype, device=device)
> > >
> > > The implementation for NumPy may be as simple as setting the device attribute to the string ``'cpu'`` and raising an exception if array creation functions encounter any other value.
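That last remark amounts to very little code; a sketch of what the NumPy-side check could look like (the helper name is invented for illustration):

    def _check_device(device):
        # NumPy arrays always live on the CPU, so anything else is an error.
        if device not in (None, 'cpu'):
            raise ValueError("Unsupported device for NumPy: %r" % (device,))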
> > > Dtypes and casting rules
> > > ~~~~~~~~~~~~~~~~~~~~~~~~
> > >
> > > The supported dtypes in this namespace are boolean, 8/16/32/64-bit signed and unsigned integer, and 32/64-bit floating-point dtypes. These will be added to the namespace as dtype literals with the expected names (e.g., ``bool``, ``uint16``, ``float64``).
> > >
> > > The most obvious omissions are the complex dtypes. The rationale for the lack of complex support in the first version of the array API standard is that several libraries (PyTorch, MXNet) are still in the process of adding support for complex dtypes. The next version of the standard is expected to include ``complex64`` and ``complex128`` (see `this issue <https://github.com/data-apis/array-api/issues/102>`__ for more details).
> > >
> > > Specifying dtypes to functions, e.g. via the ``dtype=`` keyword, is expected to only use the dtype literals. Format strings, Python builtin dtypes, or string representations of the dtype literals are not accepted - this will improve readability and portability of code at little cost.
> > >
> > > Casting rules are only defined between different dtypes of the same kind. The rationale for this is that mixed-kind (e.g., integer to floating-point) casting behavior differs between libraries. NumPy's mixed-kind casting behavior doesn't need to be changed or restricted, it only needs to be documented that if users use mixed-kind casting, their code may not be portable.
> > >
> > > .. image:: _static/nep-0047-casting-rules-lattice.png
> > >
> > > *Type promotion diagram. Promotion between any two types is given by their join on this lattice. Only the types of participating arrays matter, not their values. Dashed lines indicate that behaviour for Python scalars is undefined on overflow. Boolean, integer and floating-point dtypes are not connected, indicating mixed-kind promotion is undefined.*
> > >
> > > The most important difference between the casting rules in NumPy and in the array API standard is how scalars and 0-dimensional arrays are handled. In the standard, array scalars do not exist and 0-dimensional arrays follow the same casting rules as higher-dimensional arrays.
> > >
> > > See the `Type Promotion Rules section of the array API standard <https://data-apis.github.io/array-api/latest/API_specification/type_promotion.html>`__ for more details.
> > >
> > > .. note::
> > >
> > >     It is not clear what the best way is to support the different casting rules for 0-dimensional arrays and no value-based casting. One option may be to implement this second set of casting rules, keep them private, mark the array API functions with a private attribute that says they adhere to these different rules, and let the casting machinery check for that attribute.
> > >
> > >     This needs discussion.
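The 0-D difference is easy to see with current NumPy (illustrative snippet; behaviour as of NumPy 1.20):

    import numpy as np

    x = np.ones(3, dtype=np.float32)
    y = np.array(2.0)            # 0-D float64 array
    print((x + y).dtype)         # float32 - NumPy inspects y's value
    # Under the standard, float32 + float64 must give float64,
    # regardless of whether an operand is 0-D.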
> > > Indexing
> > > ~~~~~~~~
> > >
> > > An indexing expression that would return a scalar with ``ndarray``, e.g. ``arr_2d[0, 0]``, will return a 0-D array with the new array object. There are several reasons for that: array scalars are largely considered a design mistake which no other array library copied; it works better for non-CPU libraries (typically arrays can live on the device, scalars live on the host); and it's simply a consistent design. To get a Python scalar out of a 0-D array, one can simply use the builtin for the type, e.g. ``float(arr_0d)``.
> > >
> > > The other `indexing modes in the standard <https://data-apis.github.io/array-api/latest/API_specification/indexing.html>`__ do work largely the same as they do for ``numpy.ndarray``. One noteworthy difference is that clipping in slice indexing (e.g., ``a[:n]`` where ``n`` is larger than the size of the first axis) is unspecified behaviour, because that kind of check can be expensive on accelerators.
> > >
> > > The lack of advanced indexing, and boolean indexing being limited to a single n-D boolean array, is due to those indexing modes not being suitable for all types of arrays or JIT compilation. Their absence does not seem to be problematic; if a user or library author wants to use them, they can do so through zero-copy conversion to ``numpy.ndarray``. This will signal correctly to whomever reads the code that it is then NumPy-specific rather than portable to all conforming array types.
> > >
> > >
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > >
> > > The array object in the standard does not have methods other than dunder methods. The rationale for that is that not all array libraries have methods on their array object (e.g., TensorFlow does not). It also provides only a single way of doing something, rather than have functions and methods that are effectively duplicate.
> > >
> > > Mixing operations that may produce views (e.g., indexing, ``nonzero``) in combination with mutation (e.g., item or slice assignment) is `explicitly documented in the standard to not be supported <https://data-apis.github.io/array-api/latest/design_topics/copies_views_and_mutation.html>`__. This cannot easily be prohibited in the array object itself; instead this will be guidance to the user via documentation.
> > >
> > > The standard currently does not prescribe a name for the array object itself. We propose to simply name it ``ndarray``. This is the most obvious name, and because of the separate namespace should not clash with ``numpy.ndarray``.
> > >
> > >
> > > Implementation
> > > --------------
> > >
> > > .. note::
> > >
> > >     This section needs a lot more detail, which will gradually be added when the implementation progresses.
> > >
> > > A prototype of the ``array_api`` namespace can be found in https://github.com/data-apis/numpy/tree/array-api/numpy/_array_api. The docstring in its ``__init__.py`` has notes on completeness of the implementation. The code for the wrapper functions also contains ``# Note:`` comments everywhere there is a difference with the NumPy API. Two important parts that are not implemented yet are the new array object and DLPack support. Functions may need changes to ensure the changed casting rules are respected.
> > >
> > > The array object
> > > ~~~~~~~~~~~~~~~~
> > >
> > > Regarding the array object implementation, we plan to start with a regular Python class that wraps a ``numpy.ndarray`` instance. Attributes and methods can forward to that wrapped instance, applying input validation and implementing changed behaviour as needed.
> > >
> > > The casting rules are probably the most challenging part. The in-progress dtype system refactor (NEPs 40-43) should make implementing the correct casting behaviour easier - it is already moving away from value-based casting for example.
> > >
> > >
> > > The dtype objects
> > > ~~~~~~~~~~~~~~~~~
> > >
> > > We must be able to compare dtypes for equality, and expressions like these must be possible::
> > >
> > >     np.array_api.some_func(..., dtype=x.dtype)
> > >
> > > The above implies it would be nice to have ``np.array_api.float32 == np.array_api.ndarray(...).dtype``.
> > >
> > > Dtypes should not be assumed to have a class hierarchy by users, however we are free to implement it with a class hierarchy if that's convenient. We considered the following options to implement dtype objects:
> > >
> > > 1. Alias dtypes to those in the main namespace. E.g., ``np.array_api.float32 = np.float32``.
> > > 2. Make the dtypes instances of ``np.dtype``. E.g., ``np.array_api.float32 = np.dtype(np.float32)``.
> > > 3. Create new singleton classes with only the required methods/attributes (currently just ``__eq__``).
> > >
> > > It seems like (2) would be easiest from the perspective of interacting with functions outside the main namespace. And (3) would adhere best to the standard.
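For concreteness, option (3) could look roughly like this sketch (class and attribute names invented for illustration):

    import numpy as np

    class _DTypeLiteral:
        def __init__(self, name):
            self._name = name
        def __eq__(self, other):
            if isinstance(other, _DTypeLiteral):
                return self._name == other._name
            # also compare equal to the matching np.dtype, so that
            # float32 == some_array.dtype works
            return np.dtype(self._name) == other
        def __hash__(self):
            return hash(self._name)
        def __repr__(self):
            return self._name

    float32 = _DTypeLiteral('float32')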
> > > TBD: the standard does not yet have a good way to inspect properties of a dtype, to ask questions like "is this an integer dtype?". Perhaps this is easy enough to do for users, like so::
> > >
> > >     def _get_dtype(dt_or_arr):
> > >         return dt_or_arr.dtype if hasattr(dt_or_arr, 'dtype') else dt_or_arr
> > >
> > >     def is_floating(dtype_or_array):
> > >         dtype = _get_dtype(dtype_or_array)
> > >         return dtype in (float32, float64)
> > >
> > >     def is_integer(dtype_or_array):
> > >         dtype = _get_dtype(dtype_or_array)
> > >         return dtype in (uint8, uint16, uint32, uint64, int8, int16, int32, int64)
> > >
> > > However it could make sense to add this to the standard. Note that NumPy itself currently does not have a great way for asking such questions, see `gh-17325 <https://github.com/numpy/numpy/issues/17325>`__.

From ralf.gommers at gmail.com Thu Mar 11 06:37:04 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 11 Mar 2021 12:37:04 +0100
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel at sipsolutions.net>
References: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel at sipsolutions.net>
Message-ID:

On Wed, Mar 10, 2021 at 6:41 PM Sebastian Berg wrote:

> Top Posting, to discuss post specific questions about NEP 47 and partially the start on implementing it in:
>
>     https://github.com/numpy/numpy/pull/18585
>
> There are probably many more that will crop up. But for me, each of these is a pretty major difficulty without a clear answer as of now.

All great questions, thanks Sebastian. Let me reply to the questions that Aaron didn't reply to inline below.

> 1. I still need clarity how a library is supposed to use this namespace when the user passes in a NumPy array (mentioned before). The user must get back a NumPy array after all. Maybe that is just a decorator, but it seems important.

I agree that it will be a common pattern that libraries will accept all standard-compliant array types plus numpy.ndarray. And the output array type should match the input type. In Aaron's implementation the new array object has a numpy.ndarray as private attribute, so that's the instance that should be returned. A decorator seems like a sensible way to handle that. Or a simple utility function, something like `return correct_arraytype(out)`.

Either way, that pattern should be added to NEP 47. I don't see a fundamental problem here, we just need to find the nicest UX for it.
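A rough sketch of the decorator Ralf describes, assuming the private attribute on the new array object is called ``._array`` as in Aaron's prototype (the decorator name is made up for illustration):

    import functools

    def restore_input_type(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            out = func(*args, **kwargs)
            # Unwrap the array-API object back to a plain numpy.ndarray.
            return out._array if hasattr(out, '_array') else out
        return wrapper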
> 3. For all other functions, the same problem applies. You don't actually have anything to fix NumPy promotion rules. You could bake your own cake here for numeric types, but I am not sure, you might also need NEP 43 in all its promotion power to pull it off.

This is probably the single most difficult question implementation-wise. Note that there are only numerical dtypes (plus boolean), so dealing with string, datetime, object or third-party dtypes is a non-issue.

> 4. The PR makes no attempt at handling binary operators in any way aside from greedily coercing the other operand.

Agreed. This is the same point as (3) I think - how to handle dtype promotion is the main open question.

> 5. What happens with a mix of array-likes or even array subclasses like `astropy.quantity`?

Array-likes (e.g. list) should raise an exception, the NEP clearly says "do not accept array_like dtypes". This is what every other array/tensor library already does.

Array subclasses should work as expected, assuming they're valid subclasses and not things like np.matrix. Using Mypy will help avoid writing more subclasses that break the Liskov substitution principle. More comments in https://numpy.org/neps/nep-0047-array-api-standard.html#the-asarray-asanyarray-pattern

Mixing two different types of arrays into a single function call should raise an exception. A design goal is: enable writing functions `somefunc(x1, x2)` that work for any type of array where `x1, x2` come from the same library - so they're either the same type, or two types for which the library itself knows how to mix them. If x1 and x2 are from different libraries, this will raise an exception.

To be clear, it is not intended that `np.array_api.somefunc(x_cupy)` works - this will raise an exception.

Cheers,
Ralf

> I don't think we have to figure out everything up-front, but I do think there are a few very fundamental questions still open, at least for me personally.
>
> Cheers,
>
> Sebastian
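The "same library or raise" design goal can be made concrete with a small sketch; `get_namespace` is the helper name used later in this thread, and this toy version only knows about NumPy:

    import numpy as np

    def get_namespace(*arrays):
        # The real helper would return the array-API namespace matching
        # the input arrays, raising on mixed array types.
        if not all(isinstance(a, np.ndarray) for a in arrays):
            raise TypeError("mixed or unsupported array types")
        return np

    def somefunc(x1, x2):
        xp = get_namespace(x1, x2)
        return xp.mean(x1) + xp.sum(x2)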
From ralf.gommers at gmail.com Thu Mar 11 07:49:33 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 11 Mar 2021 13:49:33 +0100
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <01b5e7193b6e9b261befb4e62c5b94f39debf69f.camel at sipsolutions.net>
References: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel at sipsolutions.net> <01b5e7193b6e9b261befb4e62c5b94f39debf69f.camel at sipsolutions.net>
Message-ID:

On Wed, Mar 10, 2021 at 11:41 PM Sebastian Berg wrote:

> On Wed, 2021-03-10 at 13:44 -0700, Aaron Meurer wrote:
> > On Wed, Mar 10, 2021 at 10:42 AM Sebastian Berg wrote:
> > >
> > > 2. `np.result_type` special cases array-scalars (the current PR), NEP 47 promises it will not. The PR could attempt to work around that using `arr.dtype` in `result_type`, I expect there are more details to fight with there, but I am not sure.
> >
> > The idea is to work around it everywhere, so that it follows the rules in the spec (no array scalars, no value-based casting). I haven't started it yet, though, so I don't know yet how hard it will be. If it ends up being too hard we could put it in the same camp as device support and dlpack support where it needs some basic implementation in numpy itself first before we can properly do it in the array API namespace.
>
> Quite frankly: if you really want to implement a minimal API, it may be best to just write it yourself and ditch NumPy. (Of course I currently doubt that the NEP 47 implementation should be minimal.)

I'm not really sure what to say other than that I don't think anyone will be served by "ditching NumPy". The goal for this "minimal" part is to provide an API that you can write code against that will work portably across other array libraries. That seems like a valuable goal, right? And if you want NumPy-specific things that other libraries don't commonly (or at all) implement and are not supported by array_api, then you don't use this API but the existing main numpy namespace.

> About doing promotion yourself ("promotion" as in what ufuncs do; I call `np.result_type` "common DType", because it is used e.g. in `concatenate`):
>
> Ufuncs have at least one more rule for true-division, plus there may be mixed float-int loops, etc. Since the standard is very limited and you only have numeric dtypes that might be all though.
>
> In any case, my point is: if NumPy does strange things (and it does with 0-D arrays currently), you could cook your own soup there also, and implement it in NumPy by using `signature=...` in the ufunc call.

Interesting idea.

> > > 4. Now that I looked at the above, I do not feel it's reasonable to limit this functionality to numeric dtypes. If someone uses a NumPy rational-dtype, why should a SciPy function currently implemented in pure NumPy reject that? In other words, I think this is the point where trying to be "minimal" is counterproductive.

SciPy would still be free to implement *both* a portable code path and a numpy-specific path (if that makes sense, which I doubt in many cases). There's just no way those two code paths can be 100% common, because no other library implements a rational dtype.

> > The idea of minimality is to make it so users can be sure they will be able to use other libraries, once they also have array API compliant namespaces. A rational-dtype wouldn't ever be implemented in those other libraries, because it isn't part of the standard, so if a user is using those, that is a sign they are using things that aren't in the array API, so they can't expect to be able to swap out their dtypes. If a user wants to use something that's only in NumPy, then they should just use NumPy.
>
> This is not about the "user", in your scenario the end-user does use NumPy. The way I understand it, this is not a prerequisite. If it is, a lot of things will be simpler though, and most of my doubts will go away (but be replaced with uncertainty about the usefulness).
>
> The problem is that SciPy as the "library author" wants to use NEP 47 without limiting the end-user (or the end-user even noticing!). The distinction between end-user and library author (someone who writes a function that should work with numpy, pytorch, etc.) is very important here and to all of these "protocol" discussions.

The example feels a little forced. >99% of end user code written against libraries like SciPy uses standard numerical dtypes. Things like a rational dtype are very niche. A rational dtype works with most NumPy functions, but is not at all guaranteed to work with SciPy functions - and if it does it's accidental, untested and may break if SciPy would change its implementation (e.g. move from pure Python + NumPy to Cython or C++).

> I assume that SciPy should be able to have the cake and eat it too:
>
> * Use the limited array-api and make sure to only rely on the minimal subset.
> * Not artificially limit end-users who pass in NumPy arrays.
> The second point can also be read as: SciPy would be able to support practically all current NumPy array use cases without jumping through any additional hoops (or well, maybe a bit of churn, but churn that is made easy by as of now undefined API).

I suspect you have things in mind that are not actually supported by SciPy today. The rational dtype is one example, but so are ndarray subclasses. Take masked arrays as an example - these are not supported today, except for scipy.stats.mstats functionality, where support is intentional, special-cased and tested. For masked arrays as well as other arbitrary fancy subclasses, there's some not-well-defined subset of functionality that may work today, but that is fragile, untested and can break without warning in any release. Only Liskov-substitutable ndarray subclasses are not fragile - those are simply coerced to ndarray via the ubiquitous `np.asarray` pattern, and ndarrays are returned. That must and will remain working.

This is a complex topic, and it's possible that I'm missing other use cases you have in mind, so I thought I'd make a diagram to explain the difference between the custom dtypes & subclasses that are supported by NumPy itself but not by downstream libraries: https://github.com/rgommers/numpy/blob/numpy-scipy-custom-inputs/doc/neps/_static/nep-0047-numpy-scipy-custominputs.png

> > > 6. Is there any provision on how to deal with mixed array-like inputs? CuPy+numpy, etc.?
> >
> > Neither of these are defined in the spec. The spec only deals with staying inside of the compliant namespace. It doesn't require any behavior mixing things from other namespaces. That's generally considered a much harder problem, and there is the data interchange protocol to deal with it (https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html).
>
> OK, maybe you can get away with it, since the current proposal seems to be that `get_namespace()` raises on mixed input. Still seems like something that should probably raise an error rather than coerce to NumPy when calling: `nep47_array_object + dask_array`.

Agreed, this must raise too.

Cheers,
Ralf

From sebastian at sipsolutions.net Thu Mar 11 12:07:42 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 11 Mar 2021 11:07:42 -0600
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To:
References:
Message-ID: <3ba55e0fe50da814b486a73855da35770e50303b.camel at sipsolutions.net>

On Thu, 2021-03-11 at 12:37 +0100, Ralf Gommers wrote:
> On Wed, Mar 10, 2021 at 6:41 PM Sebastian Berg <sebastian at sipsolutions.net> wrote:
>
> > Top Posting, to discuss post specific questions about NEP 47 and partially the start on implementing it in:
> >
> >     https://github.com/numpy/numpy/pull/18585
> >
> > There are probably many more that will crop up. But for me, each of these is a pretty major difficulty without a clear answer as of now.
>
> All great questions, thanks Sebastian. Let me reply to the questions that Aaron didn't reply to inline below.

To be clear, I do not expect complete answers to these questions right now. (Although being unsure about some of them does make me slightly reluctant to merge the work-in-progress into NumPy proper as opposed to a separate repo.)
Also, yes, most/all questions are hopefully just trivialities to check off (or no more than seeds for thought). Or even just a starting point for making NEP 47's "Usage and Impact" section more complete, including them as either "example usage patterns" or "limitations".

My second takeaway from the questions is that I have doubts the "minimal" version will pan out; it feels like many of the questions might disappear if you drop that part. So, from my current thinking, the minimal implementation may not be a good "NEP 47" implementation. That does _not_ mean that I think you should pause and reconsider or even worry about pleasing me with good answers! Just continue under whatever assumption you prefer and if it turns out that "minimal" won't work for NEP 47: no harm done! We need a "minimal implementation" in any case.

Cheers,

Sebastian

[1] If SciPy needs an additional NumPy code path to keep supporting `object` arrays or other dtypes (right now even complex), then the reader needs to be aware of that to make a decision if NEP 47 will actually help for their library. Will AstroPy have to reimplement `astropy.units.Quantity` to be "standard conform" (is that even possible!?) before it can easily adopt it in any of its API that currently works with `astropy.units.Quantity`?

> > 1. I still need clarity how a library is supposed to use this namespace when the user passes in a NumPy array (mentioned before). The user must get back a NumPy array after all. Maybe that is just a decorator, but it seems important.
>
> I agree that it will be a common pattern that libraries will accept all standard-compliant array types plus numpy.ndarray. And the output array type should match the input type. In Aaron's implementation the new array object has a numpy.ndarray as private attribute, so that's the instance that should be returned. A decorator seems like a sensible way to handle that. Or a simple utility function, something like `return correct_arraytype(out)`.
>
> Either way, that pattern should be added to NEP 47. I don't see a fundamental problem here, we just need to find the nicest UX for it.
>
> > 3. For all other functions, the same problem applies. You don't actually have anything to fix NumPy promotion rules. You could bake your own cake here for numeric types, but I am not sure, you might also need NEP 43 in all its promotion power to pull it off.
>
> This is probably the single most difficult question implementation-wise. Note that there are only numerical dtypes (plus boolean), so dealing with string, datetime, object or third-party dtypes is a non-issue.
>
> > 4. The PR makes no attempt at handling binary operators in any way aside from greedily coercing the other operand.
>
> Agreed. This is the same point as (3) I think - how to handle dtype promotion is the main open question.
>
> > 5. What happens with a mix of array-likes or even array subclasses like `astropy.quantity`?
>
> Array-likes (e.g. list) should raise an exception, the NEP clearly says "do not accept array_like dtypes". This is what every other array/tensor library already does.
>
> Array subclasses should work as expected, assuming they're valid subclasses and not things like np.matrix. Using Mypy will help avoid writing more subclasses that break the Liskov substitution principle.
> More comments in https://numpy.org/neps/nep-0047-array-api-standard.html#the-asarray-asanyarray-pattern
>
> Mixing two different types of arrays into a single function call should raise an exception. A design goal is: enable writing functions `somefunc(x1, x2)` that work for any type of array where `x1, x2` come from the same library - so they're either the same type, or two types for which the library itself knows how to mix them. If x1 and x2 are from different libraries, this will raise an exception.
>
> To be clear, it is not intended that `np.array_api.somefunc(x_cupy)` works - this will raise an exception.
>
> Cheers,
> Ralf
>
> > I don't think we have to figure out everything up-front, but I do think there are a few very fundamental questions still open, at least for me personally.
> >
> > Cheers,
> >
> > Sebastian

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

From pierre.augier at univ-grenoble-alpes.fr Fri Mar 12 15:36:20 2021
From: pierre.augier at univ-grenoble-alpes.fr (PIERRE AUGIER)
Date: Fri, 12 Mar 2021 21:36:20 +0100 (CET)
Subject: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
Message-ID: <1824891914.8256126.1615581380025.JavaMail.zimbra at univ-grenoble-alpes.fr>

Hi,

I'm looking for a difference between Numpy 0.19.5 and 0.20 which could explain a performance regression (~15 %) with Pythran.

I observe this regression with the script https://github.com/paugier/nbabel/blob/master/py/bench.py

Pythran reimplements Numpy so it is not about Numpy code for computation. However, Pythran of course uses the native array contained in a Numpy array. I'm quite sure that something has changed between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?) since I don't get the same performance with Numpy 0.20. I checked that the values in the arrays are the same and that the flags characterizing the arrays are also the same.

Good news, I'm now able to obtain the performance difference just with Numpy 0.19.5. In this code, I load the data with Pandas and need to prepare contiguous Numpy arrays to give them to Pythran. With Numpy 0.19.5, if I use np.copy I get better performance than with np.ascontiguousarray. With Numpy 0.20, both functions create arrays giving the same performance with Pythran (again, less good than with Numpy 0.19.5).

Note that this code is very efficient (more than 100 times faster than using Numpy), so I guess that things like alignment or memory location can lead to such differences.

More details in this issue https://github.com/serge-sans-paille/pythran/issues/1735

Any help to understand what has changed would be greatly appreciated!

Cheers,
Pierre

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

From melissawm at gmail.com Fri Mar 12 16:27:23 2021
From: melissawm at gmail.com (Melissa Mendonça)
Date: Fri, 12 Mar 2021 18:27:23 -0300
Subject: [Numpy-discussion] Documentation Team meeting - Monday March 15 (Beware of Daylight Saving Time!)
In-Reply-To:
References:
Message-ID:

Hi all!

Our next Documentation Team meeting will be on *Monday, March 15* at ***4PM UTC*** (This has probably changed for you if you have recently gone through a DST change).
All are welcome - you don't need to already be a contributor to join. If you have questions or are curious about what we're doing, we'll be happy to meet you! If you wish to join on Zoom, use this link: https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success Here's the permanent hackmd document with the meeting notes (still being updated in the next few days!): https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg Hope to see you around! ** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210315T16&p1=1440&ah=1 - Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Mar 12 16:50:24 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 12 Mar 2021 15:50:24 -0600 Subject: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran In-Reply-To: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr> References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr> Message-ID: <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote: > Hi, > > I'm looking for a difference between Numpy 0.19.5 and 0.20 which > could explain a performance regression (~15 %) with Pythran. > > I observe this regression with the script > https://github.com/paugier/nbabel/blob/master/py/bench.py > > Pythran reimplements Numpy so it is not about Numpy code for > computation. However, Pythran of course uses the native array > contained in a Numpy array. I'm quite sure that something has changed > between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?) > since I don't get the same performance with Numpy 0.20. I checked > that the values in the arrays are the same and that the flags > characterizing the arrays are also the same. > > Good news, I'm now able to obtain the performance difference just > with Numpy 0.19.5. In this code, I load the data with Pandas and need > to prepare contiguous Numpy arrays to give them to Pythran. With > Numpy 0.19.5, if I use np.copy I get better performance that with > np.ascontiguousarray. With Numpy 0.20, both functions create array > giving the same performance with Pythran (again, less good that with > Numpy 0.19.5). > > Note that this code is very efficient (more that 100 times faster > than using Numpy), so I guess that things like alignment or memory > location can lead to such difference. > > More details in this issue > https://github.com/serge-sans-paille/pythran/issues/1735 > > Any help to understand what has changed would be greatly appreciated! > If you want to really dig into this, it would be good to do profiling to find out at where the differences are. Without that, I don't have much appetite to investigate personally. The reason is that fluctuations of ~30% (or even much more) when running the NumPy benchmarks are very common. I am not aware of an immediate change in NumPy, especially since you are talking pythran, and only the memory space or the interface code should matter. As to the interface code... I would expect it to be quite a bit faster, not slower. There was no change around data allocation, so at best what you are seeing is a different pattern in how the "small array cache" ends up being used. Unfortunately, getting stable benchmarks that reflect code changes exactly is tough... 
Here is a nice blog post from Victor Stinner where he had to go as far as using "profile guided compilation" to avoid fluctuations:

https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html

I somewhat hope that this is also the reason for the huge fluctuations we see in the NumPy benchmarks due to absolutely unrelated code changes. But I did not have the energy to try it (and a probably fixed bug in gcc makes it a bit harder right now).

Cheers,

Sebastian

> Cheers,
> Pierre

From pierre.augier at univ-grenoble-alpes.fr Fri Mar 12 18:33:42 2021
From: pierre.augier at univ-grenoble-alpes.fr (PIERRE AUGIER)
Date: Sat, 13 Mar 2021 00:33:42 +0100 (CET)
Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel at sipsolutions.net>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra at univ-grenoble-alpes.fr> <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel at sipsolutions.net>
Message-ID: <1201918649.8270125.1615592022686.JavaMail.zimbra at univ-grenoble-alpes.fr>

Hi,

I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy --force-reinstall` and I can reproduce the regression.

Good news, I was able to reproduce the difference with only Numpy 1.20.1.

Arrays prepared with (`df` is a Pandas dataframe)

    arr = df.values.copy()

or

    arr = np.ascontiguousarray(df.values)

lead to "slow" execution while arrays prepared with

    arr = np.copy(df.values)

lead to faster execution.

arr.copy() or np.copy(arr) do not give the same result, with arr obtained from a Pandas dataframe with arr = df.values. It's strange because type(df.values) gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to give exactly the same result.

Note that I think I'm doing quite serious and reproducible benchmarks. I also checked that this regression is reproducible on another computer.

Cheers,

Pierre
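The difference Pierre observes comes from the memory order of the two copies, as the next reply explains; a quick way to see it without Pandas (illustrative, with a Fortran-ordered array standing in for `df.values`):

    import numpy as np

    a = np.ones((1000, 3), order='F')        # stand-in for df.values
    print(a.copy().flags['C_CONTIGUOUS'])    # True:  ndarray.copy() defaults to order='C'
    print(np.copy(a).flags['C_CONTIGUOUS'])  # False: np.copy() defaults to order='K',
    print(np.copy(a).flags['F_CONTIGUOUS'])  # True   which preserves the Fortran layout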
----- Mail original -----
> De: "Sebastian Berg"
> À: "numpy-discussion"
> Envoyé: Vendredi 12 Mars 2021 22:50:24
> Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
>> Hi,
>>
>> I'm looking for a difference between Numpy 0.19.5 and 0.20 which could explain a performance regression (~15 %) with Pythran.
>>
>> I observe this regression with the script https://github.com/paugier/nbabel/blob/master/py/bench.py
>>
>> Pythran reimplements Numpy so it is not about Numpy code for computation. However, Pythran of course uses the native array contained in a Numpy array. I'm quite sure that something has changed between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?) since I don't get the same performance with Numpy 0.20. I checked that the values in the arrays are the same and that the flags characterizing the arrays are also the same.
>>
>> Good news, I'm now able to obtain the performance difference just with Numpy 0.19.5. In this code, I load the data with Pandas and need to prepare contiguous Numpy arrays to give them to Pythran. With Numpy 0.19.5, if I use np.copy I get better performance than with np.ascontiguousarray. With Numpy 0.20, both functions create arrays giving the same performance with Pythran (again, less good than with Numpy 0.19.5).
>>
>> Note that this code is very efficient (more than 100 times faster than using Numpy), so I guess that things like alignment or memory location can lead to such differences.
>>
>> More details in this issue https://github.com/serge-sans-paille/pythran/issues/1735
>>
>> Any help to understand what has changed would be greatly appreciated!
>
> If you want to really dig into this, it would be good to do profiling to find out at where the differences are.
>
> Without that, I don't have much appetite to investigate personally. The reason is that fluctuations of ~30% (or even much more) when running the NumPy benchmarks are very common.
>
> I am not aware of an immediate change in NumPy, especially since you are talking pythran, and only the memory space or the interface code should matter. As to the interface code... I would expect it to be quite a bit faster, not slower. There was no change around data allocation, so at best what you are seeing is a different pattern in how the "small array cache" ends up being used.
>
> Unfortunately, getting stable benchmarks that reflect code changes exactly is tough... Here is a nice blog post from Victor Stinner where he had to go as far as using "profile guided compilation" to avoid fluctuations:
>
> https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
>
> I somewhat hope that this is also the reason for the huge fluctuations we see in the NumPy benchmarks due to absolutely unrelated code changes. But I did not have the energy to try it (and a probably fixed bug in gcc makes it a bit harder right now).
>
> Cheers,
>
> Sebastian

From efiring at hawaii.edu Fri Mar 12 18:53:23 2021
From: efiring at hawaii.edu (Eric Firing)
Date: Fri, 12 Mar 2021 13:53:23 -1000
Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <1201918649.8270125.1615592022686.JavaMail.zimbra at univ-grenoble-alpes.fr>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra at univ-grenoble-alpes.fr> <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel at sipsolutions.net> <1201918649.8270125.1615592022686.JavaMail.zimbra at univ-grenoble-alpes.fr>
Message-ID: <9ce63b83-8e61-e415-aef6-e5385e3a0649 at hawaii.edu>

On 2021/03/12 1:33 PM, PIERRE AUGIER wrote:
> arr.copy() or np.copy(arr) do not give the same result, with arr obtained from a Pandas dataframe with arr = df.values. It's strange because type(df.values) gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to give exactly the same result.

According to the docstrings for numpy.copy and arr.copy, the function and the method have different defaults for the memory layout. np.copy() tries to maintain the order of the original while arr.copy() defaults to C order.
Eric From diagonaldevice at gmail.com Fri Mar 12 19:24:08 2021 From: diagonaldevice at gmail.com (Michael Lamparski) Date: Fri, 12 Mar 2021 19:24:08 -0500 Subject: [Numpy-discussion] Programmatically contracting multiple tensors Message-ID: Greetings, I have something in my code where I can receive an array M of unknown dimensionality and a list of "labels" for each axis. E.g. perhaps I might get an array of shape (2, 47, 3, 47, 3) with labels ['spin', 'atom', 'coord', 'atom', 'coord']. For every axis that is labeled "coord", I want to multiply in some rotation matrix R. So, for the above example, this could be done with the following handwritten line: return np.einsum('Cc,Ee,abcde->abCdE', R, R, M) But since I want to do this programmatically, I find myself in the awkward situation of having to construct this string (and e.g. having to arbitrarily limit the number of axes to 26 or something like that). Is there a more idiomatic way to do this that would let me supply integer labels for summation indices? Or should I just bite the bullet and start generating strings? --- Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Fri Mar 12 19:32:01 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sat, 13 Mar 2021 00:32:01 +0000 Subject: [Numpy-discussion] Programmatically contracting multiple tensors In-Reply-To: References: Message-ID: Einsum has a secret integer argument format that appears in the Examples section of the `np.einsum` docs, but appears not to be mentioned at all in the parameter listing. Eric On Sat, 13 Mar 2021 at 00:25, Michael Lamparski wrote: > Greetings, > > I have something in my code where I can receive an array M of unknown > dimensionality and a list of "labels" for each axis. E.g. perhaps I might > get an array of shape (2, 47, 3, 47, 3) with labels ['spin', 'atom', > 'coord', 'atom', 'coord']. > > For every axis that is labeled "coord", I want to multiply in some > rotation matrix R. So, for the above example, this could be done with the > following handwritten line: > > return np.einsum('Cc,Ee,abcde->abCdE', R, R, M) > > But since I want to do this programmatically, I find myself in the awkward > situation of having to construct this string (and e.g. having to > arbitrarily limit the number of axes to 26 or something like that). Is > there a more idiomatic way to do this that would let me supply integer > labels for summation indices? Or should I just bite the bullet and start > generating strings? > > --- > Michael > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Fri Mar 12 20:09:02 2021 From: deak.andris at gmail.com (Andras Deak) Date: Sat, 13 Mar 2021 02:09:02 +0100 Subject: [Numpy-discussion] Programmatically contracting multiple tensors In-Reply-To: References: Message-ID: On Sat, Mar 13, 2021 at 1:32 AM Eric Wieser wrote: > > Einsum has a secret integer argument format that appears in the Examples section of the `np.einsum` docs, but appears not to be mentioned at all in the parameter listing. It's mentioned (albeit somewhat cryptically) sooner in the Notes: "einsum also provides an alternative way to provide the subscripts and operands as einsum(op0, sublist0, op1, sublist1, ..., [sublistout]). 
If the output shape is not provided in this format einsum will be calculated in implicit mode, otherwise it will be performed explicitly. The examples below have corresponding einsum calls with the two parameter methods. New in version 1.10.0."

Not that this helps much, because I definitely wouldn't understand this API without the examples. But I'm not sure _where_ this could be highlighted among the parameters; after all this is all covered by the *operands parameter.

Andrés

> Eric
>
> On Sat, 13 Mar 2021 at 00:25, Michael Lamparski wrote:
>>
>> Greetings,
>>
>> I have something in my code where I can receive an array M of unknown dimensionality and a list of "labels" for each axis. E.g. perhaps I might get an array of shape (2, 47, 3, 47, 3) with labels ['spin', 'atom', 'coord', 'atom', 'coord'].
>>
>> For every axis that is labeled "coord", I want to multiply in some rotation matrix R. So, for the above example, this could be done with the following handwritten line:
>>
>> return np.einsum('Cc,Ee,abcde->abCdE', R, R, M)
>>
>> But since I want to do this programmatically, I find myself in the awkward situation of having to construct this string (and e.g. having to arbitrarily limit the number of axes to 26 or something like that). Is there a more idiomatic way to do this that would let me supply integer labels for summation indices? Or should I just bite the bullet and start generating strings?
>>
>> ---
>> Michael

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
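To make the integer-sublist form concrete for Michael's case (illustrative; the label values 10 and 11 are arbitrary integers):

    import numpy as np

    M = np.random.rand(2, 47, 3, 47, 3)   # labels: spin, atom, coord, atom, coord
    R = np.eye(3)                          # stand-in rotation matrix

    # Equivalent to np.einsum('Cc,Ee,abcde->abCdE', R, R, M), but with integer
    # axis labels, so it can be generated for any number of 'coord' axes:
    # a=0, b=1, c=2, d=3, e=4, C=10, E=11
    out = np.einsum(R, [10, 2], R, [11, 4], M, [0, 1, 2, 3, 4], [0, 1, 10, 3, 11])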
From sebastian at sipsolutions.net Fri Mar 12 20:24:33 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 12 Mar 2021 19:24:33 -0600
Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <1201918649.8270125.1615592022686.JavaMail.zimbra at univ-grenoble-alpes.fr>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra at univ-grenoble-alpes.fr> <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel at sipsolutions.net> <1201918649.8270125.1615592022686.JavaMail.zimbra at univ-grenoble-alpes.fr>
Message-ID:

On Sat, 2021-03-13 at 00:33 +0100, PIERRE AUGIER wrote:
> Hi,
>
> I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy --force-reinstall` and I can reproduce the regression.
>
> Good news, I was able to reproduce the difference with only Numpy 1.20.1.
>
> Arrays prepared with (`df` is a Pandas dataframe)
>
>     arr = df.values.copy()
>
> or
>
>     arr = np.ascontiguousarray(df.values)
>
> lead to "slow" execution while arrays prepared with
>
>     arr = np.copy(df.values)
>
> lead to faster execution.
>
> arr.copy() or np.copy(arr) do not give the same result, with arr obtained from a Pandas dataframe with arr = df.values. It's strange because type(df.values) gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to give exactly the same result.

The only thing that can change would be the array's flags and `arr.strides`, but they should not have changed. And there is no change in NumPy that I can even remotely think of. Array data is just allocated with `malloc`. That is: as I understand it, you are *not* timing `np.copy` or `np.ascontiguousarray` itself, but just operating on the array returned. NumPy only ever uses `malloc` for allocating array content.

> Note that I think I'm doing quite serious and reproducible benchmarks. I also checked that this regression is reproducible on another computer.

I absolutely trust the benchmark results. I was hoping you might also be running a profiler (as in analyzing the running program) to find out where the differences originate on the C side. That would allow us to say with certainty either what changed or that there was no actual related code change. E.g. I have seen huge speed differences in the same `memcpy` or similar calls, due to whatever reasons (maybe due to compiler changes, or due to address space changes... or maybe the former causing the latter, I don't know.).

Cheers,

Sebastian

> Cheers,
> Pierre
From klark--kent at yandex.ru Sat Mar 13 15:55:17 2021
From: klark--kent at yandex.ru (klark--kent at yandex.ru)
Date: Sat, 13 Mar 2021 23:55:17 +0300
Subject: [Numpy-discussion] size of arrays
In-Reply-To:
References: <1824891914.8256126.1615581380025.JavaMail.zimbra at univ-grenoble-alpes.fr> <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel at sipsolutions.net> <1201918649.8270125.1615592022686.JavaMail.zimbra at univ-grenoble-alpes.fr>
Message-ID: <526711615664631 at mail.yandex.ru>

-------------- next part --------------
An HTML attachment was scrubbed...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 23474 bytes
Desc: not available

From toddrjen at gmail.com Sat Mar 13 16:05:11 2021
From: toddrjen at gmail.com (Todd)
Date: Sat, 13 Mar 2021 16:05:11 -0500
Subject: [Numpy-discussion] size of arrays
In-Reply-To: <526711615664631 at mail.yandex.ru>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra at univ-grenoble-alpes.fr> <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel at sipsolutions.net> <1201918649.8270125.1615592022686.JavaMail.zimbra at univ-grenoble-alpes.fr> <526711615664631 at mail.yandex.ru>
Message-ID:
> > Size of np.float16(1) is 26 > Size of np.float64(1) is 32 > 32 / 26 = 1.23 > > Since memory is limited I have a question after this code: > > import numpy as np > import sys > > a1 = np.ones(1, dtype='float16') > b1 = np.ones(1, dtype='float64') > div_1 = sys.getsizeof(b1) / sys.getsizeof(a1) > # div_1 = 1.06 > > a2 = np.ones(10, dtype='float16') > b2 = np.ones(10, dtype='float64') > div_2 = sys.getsizeof(b2) / sys.getsizeof(a2) > # div_2 = 1.51 > > a3 = np.ones(100, dtype='float16') > b3 = np.ones(100, dtype='float64') > div_3 = sys.getsizeof(b3) / sys.getsizeof(a3) > # div_3 = 3.0 > Size of np.float64 numpy arrays is four times more than for np.float16. > Is it possible to minimize the difference close to 1.23? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 23474 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 23474 bytes Desc: not available URL: From robert.kern at gmail.com Sat Mar 13 16:15:15 2021 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 13 Mar 2021 16:15:15 -0500 Subject: [Numpy-discussion] size of arrays In-Reply-To: <526711615664631@mail.yandex.ru> References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr> <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net> <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr> <526711615664631@mail.yandex.ru> Message-ID: On Sat, Mar 13, 2021 at 4:02 PM wrote: > Dear colleagues! > > Size of np.float16(1) is 26 > Size of np.float64(1) is 32 > 32 / 26 = 1.23 > Note that `sys.getsizeof()` is returning the size of the given Python object in bytes. `np.float16(1)` and `np.float64(1)` are so-called "numpy scalar objects" that wrap up the raw `float16` (2 bytes) and `float64` (8 bytes) values with the necessary information to make them Python objects. The extra 24 bytes for each is _not_ present for each value when you have `float16` and `float64` arrays of larger lengths. There is still some overhead to make the array of numbers into a Python object, but this does not increase with the number of array elements. This is what you are seeing below when you compute the sizes of the Python objects that are the arrays. The fixed overhead does not increase when you increase the sizes of the arrays. They eventually approach the ideal ratio of 4: `float64` values take up 4 times as many bytes as `float16` values, as the names suggest. The ratio of 1.23 that you get from comparing the scalar objects reflects that the overhead for making a single value into a Python object takes up significantly more memory than the actual single number itself. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
From klark--kent at yandex.ru Sat Mar 13 16:17:08 2021
From: klark--kent at yandex.ru (klark--kent at yandex.ru)
Date: Sun, 14 Mar 2021 00:17:08 +0300
Subject: [Numpy-discussion] size of arrays
In-Reply-To: 
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr> <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net> <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr> <526711615664631@mail.yandex.ru>
Message-ID: <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>

An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Sat Mar 13 16:21:30 2021
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 13 Mar 2021 16:21:30 -0500
Subject: [Numpy-discussion] size of arrays
In-Reply-To: <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr> <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net> <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr> <526711615664631@mail.yandex.ru> <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>
Message-ID: 

On Sat, Mar 13, 2021 at 4:18 PM wrote:

> So is it right that 100 arrays of one element is smaller than one array
> with size of 100 elements?

No, typically the opposite is true.

-- 
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From toddrjen at gmail.com Sat Mar 13 18:27:23 2021
From: toddrjen at gmail.com (Todd)
Date: Sat, 13 Mar 2021 18:27:23 -0500
Subject: [Numpy-discussion] size of arrays
In-Reply-To: <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>
References: <1824891914.8256126.1615581380025.JavaMail.zimbra@univ-grenoble-alpes.fr> <58c6b9734617461daaac6780bd6e0c3268bbf9c9.camel@sipsolutions.net> <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr> <526711615664631@mail.yandex.ru> <1604271615670228@myt4-52e7f804d1cd.qloud-c.yandex.net>
Message-ID: 

No, because the array of 100 elements will only have the overhead once, while the 100 arrays will each have the overhead repeated.

Think about the overhead like a book cover on a book. It takes additional space, but provides storage for the book, information to help you find it, etc. Each book only needs one cover. So a single 100-page book only needs one cover, while a hundred 1-page books need 100 covers. Also, as the book gets more pages, the cover takes a smaller portion of the total size of the book.

On Sat, Mar 13, 2021, 16:17 wrote:

> So is it right that 100 arrays of one element is smaller than one array
> with size of 100 elements?
>
> 14.03.2021, 00:06, "Todd" :
>
> Ideally float64 uses 64 bits for each number while float16 uses 16 bits.
> 64/16=4. However, there is some additional overhead. This overhead makes
> up a large portion of small arrays, but becomes negligible as the array
> gets bigger.
>
> On Sat, Mar 13, 2021, 16:01 wrote:
>
> > Dear colleagues!
> > Size of np.float16(1) is 26
> > Size of np.float64(1) is 32
> > 32 / 26 = 1.23
> >
> > Since memory is limited, I have a question about this code:
> >
> > import numpy as np
> > import sys
> >
> > a1 = np.ones(1, dtype='float16')
> > b1 = np.ones(1, dtype='float64')
> > div_1 = sys.getsizeof(b1) / sys.getsizeof(a1)
> > # div_1 = 1.06
> >
> > a2 = np.ones(10, dtype='float16')
> > b2 = np.ones(10, dtype='float64')
> > div_2 = sys.getsizeof(b2) / sys.getsizeof(a2)
> > # div_2 = 1.51
> >
> > a3 = np.ones(100, dtype='float16')
> > b3 = np.ones(100, dtype='float64')
> > div_3 = sys.getsizeof(b3) / sys.getsizeof(a3)
> > # div_3 = 3.0
> >
> > The size of np.float64 numpy arrays is four times larger than for np.float16.
> > Is it possible to bring the ratio down, closer to 1.23?
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 23474 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 23474 bytes
Desc: not available
URL: 

From dan_patterson at outlook.com Sat Mar 13 23:12:58 2021
From: dan_patterson at outlook.com (dan_patterson)
Date: Sat, 13 Mar 2021 21:12:58 -0700 (MST)
Subject: [Numpy-discussion] Numpy 1.20.1 availability
Message-ID: <1615695178675-0.post@n7.nabble.com>

Any idea why the most recent version isn't available on the main anaconda
channel? conda-forge and building are not options for a number of reasons.
I posted a package request there, but double-digit days have gone by and it
just got a thumbs up and a package-request tag:
https://github.com/ContinuumIO/anaconda-issues/issues/12309
I realize it could be the "times" or maybe no one is aware of its absence.

-- 
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From jni at fastmail.com Sun Mar 14 01:15:39 2021
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Sun, 14 Mar 2021 17:15:39 +1100
Subject: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
References: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr>
Message-ID: <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>

Hi Pierre,

If you're able to compile NumPy locally and you have reliable benchmarks,
you can write a script that tests the runtime of your benchmark and
reports it as a test pass/fail. You can then use `git bisect run` to
automatically find the commit that caused the issue. That will help
narrow down the discussion before it gets completely derailed a second
time.

https://lwn.net/Articles/317154/

Juan.

> On 13 Mar 2021, at 10:34 am, PIERRE AUGIER wrote:
>
> Hi,
>
> I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy --force-reinstall` and I can reproduce the regression.
>
> Good news, I was able to reproduce the difference with only Numpy 1.20.1.
>
> Arrays prepared with (`df` is a Pandas dataframe)
>
> arr = df.values.copy()
>
> or
>
> arr = np.ascontiguousarray(df.values)
>
> lead to "slow" execution while arrays prepared with
>
> arr = np.copy(df.values)
>
> lead to faster execution.
> > arr.copy() or np.copy(arr) do not give the same result, with arr
> > obtained from a Pandas dataframe with arr = df.values. It's strange
> > because type(df.values) gives <class 'numpy.ndarray'>, so I would
> > expect arr.copy() and np.copy(arr) to give exactly the same result.
> >
> > Note that I think I'm doing quite serious and reproducible
> > benchmarks. I also checked that this regression is reproducible on
> > another computer.
> >
> > Cheers,
> >
> > Pierre
> >
> > ----- Mail original -----
> >> De: "Sebastian Berg"
> >> À: "numpy-discussion"
> >> Envoyé: Vendredi 12 Mars 2021 22:50:24
> >> Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
>
>>> On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
>>> Hi,
>>>
>>> I'm looking for a difference between Numpy 0.19.5 and 0.20 which
>>> could explain a performance regression (~15 %) with Pythran.
>>>
>>> I observe this regression with the script
>>> https://github.com/paugier/nbabel/blob/master/py/bench.py
>>>
>>> Pythran reimplements Numpy so it is not about Numpy code for
>>> computation. However, Pythran of course uses the native array
>>> contained in a Numpy array. I'm quite sure that something has changed
>>> between Numpy 0.19.5 and 0.20 (or between the corresponding wheels?)
>>> since I don't get the same performance with Numpy 0.20. I checked
>>> that the values in the arrays are the same and that the flags
>>> characterizing the arrays are also the same.
>>>
>>> Good news, I'm now able to obtain the performance difference just
>>> with Numpy 0.19.5. In this code, I load the data with Pandas and need
>>> to prepare contiguous Numpy arrays to give them to Pythran. With
>>> Numpy 0.19.5, if I use np.copy I get better performance that with
>>> np.ascontiguousarray. With Numpy 0.20, both functions create array
>>> giving the same performance with Pythran (again, less good that with
>>> Numpy 0.19.5).
>>>
>>> Note that this code is very efficient (more that 100 times faster
>>> than using Numpy), so I guess that things like alignment or memory
>>> location can lead to such difference.
>>>
>>> More details in this issue
>>> https://github.com/serge-sans-paille/pythran/issues/1735
>>>
>>> Any help to understand what has changed would be greatly appreciated!
>>>
>>
>> If you want to really dig into this, it would be good to do profiling
>> to find out at where the differences are.
>>
>> Without that, I don't have much appetite to investigate personally. The
>> reason is that fluctuations of ~30% (or even much more) when running
>> the NumPy benchmarks are very common.
>>
>> I am not aware of an immediate change in NumPy, especially since you
>> are talking pythran, and only the memory space or the interface code
>> should matter.
>> As to the interface code... I would expect it to be quite a bit faster,
>> not slower.
>> There was no change around data allocation, so at best what you are
>> seeing is a different pattern in how the "small array cache" ends up
>> being used.
>>
>> Unfortunately, getting stable benchmarks that reflect code changes
>> exactly is tough... Here is a nice blog post from Victor Stinner where
>> he had to go as far as using "profile guided compilation" to avoid
>> fluctuations:
>>
>> https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
>>
>> I somewhat hope that this is also the reason for the huge fluctuations
>> we see in the NumPy benchmarks due to absolutely unrelated code
>> changes.
>> But I did not have the energy to try it (and a probably fixed bug in >> gcc makes it a bit harder right now). >> >> Cheers, >> >> Sebastian >> >> >> >> >>> Cheers, >>> Pierre >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Sun Mar 14 04:05:50 2021 From: matti.picus at gmail.com (Matti Picus) Date: Sun, 14 Mar 2021 10:05:50 +0200 Subject: [Numpy-discussion] Numpy 1.20.1 availability In-Reply-To: <1615695178675-0.post@n7.nabble.com> References: <1615695178675-0.post@n7.nabble.com> Message-ID: An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Sun Mar 14 06:14:13 2021 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 14 Mar 2021 10:14:13 +0000 Subject: [Numpy-discussion] Numpy 1.20.1 availability In-Reply-To: References: <1615695178675-0.post@n7.nabble.com> Message-ID: I would recommend using the community run conda-forge as one of your default conda channels. They have a very slick largely automated system to update recipes when upstream makes a release. The default Anaconda channel from Anaconda, Inc. (formerly Continuum Analytics, Inc.) is comparatively slow. You may recognise some of the maintainers of the conda-forge numpy recipe? https://github.com/conda-forge/numpy-feedstock/ I'm impressed to see 17 million conda-forge numpy downloads, vs 'just' 2.5 million downloads of the default channel's package: https://anaconda.org/conda-forge/numpy https://anaconda.org/anaconda/numpy Regards, Peter On Sun, Mar 14, 2021 at 8:06 AM Matti Picus wrote: > > On 3/14/21 6:12 AM, dan_patterson wrote: > > Any idea why the most recent version isn't available on the main anaconda > channel. conda-forge and building are not options for a number of reasons. > I posted a package request there but double digit days have gone by it just > got a thumbs up and package-request tag > https://github.com/ContinuumIO/anaconda-issues/issues/12309 > I realize it could be the "times" or maybe no one is aware of its absence. > > > NumPy does not control the packages on the main anaconda channel, so a request here is likely to go unanswered. The package has been updated in the conda-forge channel. > > > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From dan_patterson at outlook.com Sun Mar 14 07:43:09 2021 From: dan_patterson at outlook.com (dan_patterson) Date: Sun, 14 Mar 2021 04:43:09 -0700 (MST) Subject: [Numpy-discussion] Numpy 1.20.1 availability In-Reply-To: References: <1615695178675-0.post@n7.nabble.com> Message-ID: <1615722189832-0.post@n7.nabble.com> Thanks, glad to hear that people are aware of the delay. As I said, there are other reasons beyond my control, for the limitations. The wait is on. 
-- Sent from: http://numpy-discussion.10968.n7.nabble.com/ From ralf.gommers at gmail.com Sun Mar 14 07:45:17 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 14 Mar 2021 12:45:17 +0100 Subject: [Numpy-discussion] Numpy 1.20.1 availability In-Reply-To: References: <1615695178675-0.post@n7.nabble.com> Message-ID: On Sun, Mar 14, 2021 at 11:14 AM Peter Cock wrote: > I would recommend using the community run conda-forge as one of your > default conda channels. They have a very slick largely automated system > to update recipes when upstream makes a release. The default Anaconda > channel from Anaconda, Inc. (formerly Continuum Analytics, Inc.) is > comparatively slow. > Agreed. I know the goal of the maintainers of the defaults channel is to make the latest version available quickly. However, `defaults` requires more integration testing than conda-forge/PyPI, and work tends to happen in batches - in the past we've seen update times ranging from days to several months. We have some guidance at https://numpy.org/install/. Basically the two main reasons to use `defaults`: for beginning users with modest needs, the easiest thing to get started is just installing the Anaconda distribution (which gives you `defaults`). Or you have corporate policies to use `defaults` - you can pay Anaconda and it does come with things companies and institutions may need, like guarantees around uptime and security. Cheers, Ralf > You may recognise some of the maintainers of the conda-forge numpy > recipe? https://github.com/conda-forge/numpy-feedstock/ > > I'm impressed to see 17 million conda-forge numpy downloads, vs > 'just' 2.5 million downloads of the default channel's package: > > https://anaconda.org/conda-forge/numpy > https://anaconda.org/anaconda/numpy > > Regards, > > Peter > > On Sun, Mar 14, 2021 at 8:06 AM Matti Picus wrote: > > > > On 3/14/21 6:12 AM, dan_patterson wrote: > > > > Any idea why the most recent version isn't available on the main anaconda > > channel. conda-forge and building are not options for a number of > reasons. > > I posted a package request there but double digit days have gone by it > just > > got a thumbs up and package-request tag > > https://github.com/ContinuumIO/anaconda-issues/issues/12309 > > I realize it could be the "times" or maybe no one is aware of its > absence. > > > > > > NumPy does not control the packages on the main anaconda channel, so a > request here is likely to go unanswered. The package has been updated in > the conda-forge channel. > > > > > > Matti > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Sun Mar 14 10:52:58 2021 From: ndbecker2 at gmail.com (Neal Becker) Date: Sun, 14 Mar 2021 10:52:58 -0400 Subject: [Numpy-discussion] Pi day easter egg Message-ID: There's a little pi day easter egg for all math fans. Google for pi to find it. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From sebastian at sipsolutions.net Sun Mar 14 12:48:13 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 14 Mar 2021 11:48:13 -0500
Subject: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
In-Reply-To: <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
References: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr> <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
Message-ID: 

On Sun, 2021-03-14 at 17:15 +1100, Juan Nunez-Iglesias wrote:
> Hi Pierre,
>
> If you're able to compile NumPy locally and you have reliable
> benchmarks, you can write a script that tests the runtime of your
> benchmark and reports it as a test pass/fail. You can then use
> `git bisect run` to automatically find the commit that caused the
> issue. That will help narrow down the discussion before it gets
> completely derailed a second time.
>
> https://lwn.net/Articles/317154/

Let me share this partial benchmark result for a branch I just worked
on in NumPy:

       before           after         ratio
     [c5de5b5c]       [2d9e11ea]
+      2.12±0.01μs      3.69±0.02μs     1.74  bench_io.Copy.time_cont_assign('float32')
+      22.6±0.08μs       36.0±0.2μs     1.59  bench_io.CopyTo.time_copyto_sparse
+       49.4±0.8μs       55.2±0.1μs     1.12  bench_io.CopyTo.time_copyto_8_sparse
-      7.40±0.06μs      4.11±0.01μs     0.56  bench_io.CopyTo.time_copyto_dense
-      6.99±0.05μs         3.77±0μs     0.54  bench_io.Copy.time_cont_assign('float64')
-      6.94±0.02μs      3.73±0.01μs     0.54  bench_io.Copy.time_cont_assign('complex64')

That looks weird! The benchmark sometimes speeds up by a factor of
almost 2, and sometimes the (de-facto) same code slows down by just as
much? (Focus on the `time_cont_assign` with float64 vs. float32.)
Even better: I know 100% that no related code is touched! The core of
that benchmark is just:

    array[...] = 1

and I did not even come close to any code related to that operation.

I have, as I did before, tried quite a few things (not as much as in
Victor Stinner's blog when it comes to compiler flags), such as
enabling/disabling huge-pages and disabling address-space-randomization
(and disabling the NumPy small-array cache).

Note that the results are *stable*, as in: on this branch, I get
extremely reliable results for the benchmark [1]! As you noticed, I
have also seen these (or similar) changes "toggle", e.g. when copying
the array multiple times. And I have dug down into profiling one
instance on the instruction level with `perf`, so I know for a fact
that it is memory access speed. (Which is a no-brainer here, the
operations are obviously memory or even cache speed bound.)

The point I was hoping to make is: it's complicated, and I am not
holding my breath that you can find an answer without digging much
deeper.

The blog post from Victor Stinner gave me the thought that
profile-guided-optimization *might* be a way to avoid some random
fluctuations, but I have not checked that the inner-loop for the code
actually compiles to different byte-code.

I would hope that someone comes along and "just knows" what is going
on. But I don't know where to ask or what to google for. My best bets
right now (they may be terrible!) are:

* Profile-guided optimization might help (as in, stabilize compiler
  output against *random* changes in code), and is probably involved in
  some way or another. But Victor Stinner timed Python, which may not
  have any massively memory-bound operations (which are the "big"
  things here).
* Maybe try to make the NumPy allocator align all its allocations to
  much larger boundaries, such as the CPU cache-line size. But I think
  I tried to check whether alignment seems to matter, and it didn't.
  Also, the arrays feel large enough that it shouldn't matter?
* CPU caching L1/L2 uses a lot of fancy heuristics these days. Maybe
  to really understand what's going on, you would have to drill into
  what the CPU caches are doing here?

The only thing I do know for sure currently is that it is a rabbit
hole that I would love to understand, but don't really want to spend
days on just to get nowhere.

Cheers,

Sebastian

[1] That run above is without address space randomization; it feels
even more stable than the others. But that doesn't matter, since we
average in any case, so ASR is probably useless and maybe even
detrimental.

>
> Juan.
>
> > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER wrote:
> > [...]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From sheikholeslam.ali at gmail.com Sun Mar 14 15:04:58 2021
From: sheikholeslam.ali at gmail.com (Ali Sheikholeslam)
Date: Sun, 14 Mar 2021 22:34:58 +0330
Subject: [Numpy-discussion] How to get Boolean matrix for similar lists in two different-size numpy arrays of lists
Message-ID: 

I have written a question in:
https://stackoverflow.com/questions/66623145/how-to-get-boolean-matrix-for-similar-lists-in-two-different-size-numpy-arrays-o
It was recommended by numpy to send this subject to the mailing lists.

The question is as follows.
I would appreciate it if you could advise me on how to solve the problem.

First, a small example with two lists:

F = [[1,2,3],[3,2,7],[4,4,1],[5,6,3],[1,3,7]] # (1*5) 5 lists
S = [[1,3,7],[6,8,1],[3,2,7]] # (1*3) 3 lists

I want to get a Boolean matrix marking which lists of F also occur in S:

[False, True, False, False, True] # (1*5) 5 Booleans for the 5 lists of F

Using IM = reduce(np.in1d, (F, S)) gives a result for each number in each
list of F:

[ True True True True True True False False True False True True
True True True] # (1*15)

Using IM = reduce(np.isin, (F, S)) also gives a result for each number in
each list of F, but in another shape:

[[ True True True]
[ True True True]
[False False True]
[False True True]
[ True True True]] # (5*3)

The true result is achieved by the code IM = [i in S for i in F] for the
example lists, but when I use this code for my two main, bigger numpy
arrays of lists:

https://drive.google.com/file/d/1YUUdqxRu__9-fhE1542xqei-rjB3HOxX/view?usp=sharing

numpy array: 3036 lists

https://drive.google.com/file/d/1FrggAa-JoxxoRqRs8NVV_F69DdVdiq_m/view?usp=sharing

numpy array: 300 lists

it gives a wrong answer.
For the main files it must give 3036 Boolean, in > which 'True' is only 300 numbers. I didn't understand why this get wrong > answers?? It seems it applied only on the 3rd characters in each lists of > F. It is preferred to use reduce function by the two functions, np.in1d and > np.isin, instead of the last method. How could to solve each of the three > above methods?? > Thank you for providing the data. Can you show a complete, runnable code sample that fails? There are several things that could go wrong here, and we can't be sure which is which without the exact code that you ran. In general, you may well have problems with the floating point data that you are not seeing with your integer examples. FWIW, I would continue to use something like the `IM = [i in S for i in F]` list comprehension for data of this size. You aren't getting any benefit trying to convert to arrays and using our array set operations. They are written for 1D arrays of numbers, not 2D arrays (attempting to treat them as 1D arrays of lists) and won't really work on your data. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Sun Mar 14 15:45:58 2021 From: deak.andris at gmail.com (Andras Deak) Date: Sun, 14 Mar 2021 20:45:58 +0100 Subject: [Numpy-discussion] How to get Boolean matrix for similar lists in two different-size numpy arrays of lists In-Reply-To: References: Message-ID: On Sun, Mar 14, 2021 at 8:35 PM Robert Kern wrote: > > On Sun, Mar 14, 2021 at 3:06 PM Ali Sheikholeslam wrote: >> >> I have written a question in: >> https://stackoverflow.com/questions/66623145/how-to-get-boolean-matrix-for-similar-lists-in-two-different-size-numpy-arrays-o >> It was recommended by numpy to send this subject to the mailing lists. >> >> The question is as follows. I would be appreciated if you could advise me to solve the problem: >> >> At first, I write a small example of to lists: >> >> F = [[1,2,3],[3,2,7],[4,4,1],[5,6,3],[1,3,7]] # (1*5) 5 lists >> S = [[1,3,7],[6,8,1],[3,2,7]] # (1*3) 3 lists >> >> I want to get Boolean matrix for the same 'list's in two F and S: >> >> [False, True, False, False, True] # (1*5) 5 Booleans for 5 lists of F >> >> By using IM = reduce(np.in1d, (F, S)) it gives results for each number in each lists of F: >> >> [ True True True True True True False False True False True True >> True True True] # (1*15) >> >> By using IM = reduce(np.isin, (F, S)) it gives results for each number in each lists of F, too, but in another shape: >> >> [[ True True True] >> [ True True True] >> [False False True] >> [False True True] >> [ True True True]] # (5*3) >> >> The true result will be achieved by code IM = [i in S for i in F] for the example lists, but when I'm using this code for my two main bigger numpy arrays of lists: >> >> https://drive.google.com/file/d/1YUUdqxRu__9-fhE1542xqei-rjB3HOxX/view?usp=sharing >> >> numpy array: 3036 lists >> >> https://drive.google.com/file/d/1FrggAa-JoxxoRqRs8NVV_F69DdVdiq_m/view?usp=sharing >> >> numpy array: 300 lists >> >> It gives wrong answer. For the main files it must give 3036 Boolean, in which 'True' is only 300 numbers. I didn't understand why this get wrong answers?? It seems it applied only on the 3rd characters in each lists of F. It is preferred to use reduce function by the two functions, np.in1d and np.isin, instead of the last method. How could to solve each of the three above methods?? > > > Thank you for providing the data. 
Can you show a complete, runnable code sample that fails? There are several things that could go wrong here, and we can't be sure which is which without the exact code that you ran. > > In general, you may well have problems with the floating point data that you are not seeing with your integer examples. > > FWIW, I would continue to use something like the `IM = [i in S for i in F]` list comprehension for data of this size. Although somewhat off-topic for the numpy aspect, for completeness' sake let me add that you'll probably want to first turn your list of lists `S` into a set of tuples, and then look up each list in `F` converted to a tuple (`[tuple(lst) in setified_S for lst in F]`). That would probably be a lot faster for large lists. Andr?s You aren't getting any benefit trying to convert to arrays and using our array set operations. They are written for 1D arrays of numbers, not 2D arrays (attempting to treat them as 1D arrays of lists) and won't really work on your data. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From blkzol001 at myuct.ac.za Sun Mar 14 16:17:40 2021 From: blkzol001 at myuct.ac.za (zoj613) Date: Sun, 14 Mar 2021 13:17:40 -0700 (MST) Subject: [Numpy-discussion] How to get Boolean matrix for similar lists in two different-size numpy arrays of lists In-Reply-To: References: Message-ID: <1615753060752-0.post@n7.nabble.com> The following seems to produce what you want using the data provided ``` In [31]: dF = np.genfromtxt('/home/F.csv', delimiter=',').tolist() In [32]: dS = np.genfromtxt('/home/S.csv', delimiter=',').tolist() In [33]: r = [True if i in lS else False for i in dF] In [34]: sum(r) Out[34]: 300 ``` I hope this helps. -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ From Jerome.Kieffer at esrf.fr Mon Mar 15 03:58:14 2021 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Mon, 15 Mar 2021 08:58:14 +0100 Subject: [Numpy-discussion] Numpy 1.20.1 availability In-Reply-To: References: <1615695178675-0.post@n7.nabble.com> Message-ID: <20210315085814.41637650@antarctica.fournet.lan> On Sun, 14 Mar 2021 10:14:13 +0000 Peter Cock wrote: > I'm impressed to see 17 million conda-forge numpy downloads, vs > 'just' 2.5 million downloads of the default channel's package: I doubt the download figures from conda are correct ... A couple of days after my software package has entered "conda-forge" its metric was already 2 orders of magnitude larger than any other distribution route: pip, debian packages, ... Since I know the approximate size of the community, I have some doubts on the figures. I suspect downloads for CI are all accounted and none cached, ... 
Cheers,

Jerome

From pierre.augier at univ-grenoble-alpes.fr Mon Mar 15 07:29:02 2021
From: pierre.augier at univ-grenoble-alpes.fr (PIERRE AUGIER)
Date: Mon, 15 Mar 2021 12:29:02 +0100 (CET)
Subject: [Numpy-discussion] Perf regression with Pythran between Numpy 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f, ENH: implement NEP-35's `like=` argument)
In-Reply-To: <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
References: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr> <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com>
Message-ID: <137390869.547851.1615807742248.JavaMail.zimbra@univ-grenoble-alpes.fr>

----- Mail original -----
> De: "Juan Nunez-Iglesias"
> À: "numpy-discussion"
> Envoyé: Dimanche 14 Mars 2021 07:15:39
> Objet: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran

> Hi Pierre,
>
> If you're able to compile NumPy locally and you have reliable benchmarks, you
> can write a script that tests the runtime of your benchmark and reports it as a
> test pass/fail. You can then use `git bisect run` to automatically find the
> commit that caused the issue. That will help narrow down the discussion before
> it gets completely derailed a second time.
>
> https://lwn.net/Articles/317154/
>
> Juan.

Thanks a lot for this advice Juan! I wasn't able to use Git but with `hg bisect` I managed to find that the first "bad" commit is

https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f ENH: implement NEP-35's `like=` argument (gh-16935)

From the point of view of my benchmark, this commit changes the behavior of arr.copy() (the resulting arrays do not give the same performance). This makes sense because it is indeed about array creation.

I haven't yet studied this commit (which is quite big and not simple) in detail, and I'm not sure I'm going to be able to understand it, and in particular to understand why it leads to such a performance regression!

Cheers,

Pierre

> > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER wrote:
> > [...]
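[For reference, a concrete sketch of the pass/fail wrapper Juan described; the script, array sizes and threshold here are hypothetical, not from the thread. After marking a good and a bad revision, `git bisect run python check_perf.py` calls it on every candidate commit:]

```
# check_perf.py -- hypothetical pass/fail wrapper for `git bisect run`.
# Usage (after `git bisect start`, `git bisect bad`, `git bisect good <rev>`):
#     git bisect run python check_perf.py
import sys
import timeit

import numpy as np

a = np.ones((1000, 3))
# Best-of-several timing of the operation under suspicion.
best = min(timeit.repeat(lambda: np.copy(a), number=10_000, repeat=5))

THRESHOLD = 0.05  # seconds; calibrate on a known-good build first
sys.exit(0 if best < THRESHOLD else 1)  # non-zero exit marks the commit "bad"
```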
From peter at entschev.com Mon Mar 15 09:59:21 2021
From: peter at entschev.com (Peter Andreas Entschev)
Date: Mon, 15 Mar 2021 14:59:21 +0100
Subject: [Numpy-discussion] Perf regression with Pythran between Numpy 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f, ENH: implement NEP-35's `like=` argument)
In-Reply-To: 
References: 
Message-ID: 

Hi Pierre,

Thanks for pinging me. To put it in the simplest way possible, that PR
adds a new `like` kwarg that will dispatch to downstream libraries
using `__array_function__` when specified, and otherwise fall back to
the default behavior of NumPy. While that introduces an extra check on
the C side, that should have minimal impact for use cases that don't
use the `like` kwarg.

Is there a simple reproducer with NumPy only? I assume your case with
Pandas is much more complex (unfortunately I'm not very experienced
with DataFrames), but curiously I see NumPy 1.20.1 being considerably
faster for small arrays and mildly faster with large arrays (results in
https://gist.github.com/pentschev/add38b5aee61da87b4b70a1c4649861f).

Best,
Peter

On Mon, Mar 15, 2021 at 12:29 PM PIERRE AUGIER wrote:
> [...]
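[A minimal sketch of what the `like=` dispatch looks like from user code; the Dask part is indicative only and assumes a Dask version that implements `__array_function__` for the creation routines:]

```
import numpy as np

# Without `like=`, nothing changes: a plain ndarray is returned.
x = np.asarray([1, 2, 3])
print(type(x))  # <class 'numpy.ndarray'>

# With `like=`, creation is dispatched through the reference object's
# __array_function__, so a downstream array type comes back instead, e.g.:
#     import dask.array as da
#     y = np.asarray([1, 2, 3], like=da.ones(3))  # a Dask array
# For plain NumPy calls, the added cost is just the check for this kwarg.
```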
From sebastian at sipsolutions.net Mon Mar 15 11:11:20 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 15 Mar 2021 10:11:20 -0500
Subject: [Numpy-discussion] Perf regression with Pythran between Numpy 0.19.5 and 0.20 (commit 4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f, ENH: implement NEP-35's `like=` argument)
In-Reply-To: 
References: <1201918649.8270125.1615592022686.JavaMail.zimbra@univ-grenoble-alpes.fr> <7BDC12C1-00E2-4DF8-9C46-DF695751F1CA@fastmail.com> <137390869.547851.1615807742248.JavaMail.zimbra@univ-grenoble-alpes.fr>
Message-ID: <4891f157d73671aabdf983b11a405df0f63146d2.camel@sipsolutions.net>

On Mon, 2021-03-15 at 14:59 +0100, Peter Andreas Entschev wrote:
> Hi Pierre,
>
> Thanks for pinging me. To put it in the simplest way possible, that
> PR adds a new `like` kwarg that will dispatch to downstream libraries
> using `__array_function__` when specified, and otherwise fall back to
> the default behavior of NumPy.
> [...]

1.20.1 should have some small overhead reductions there, since the
array-object life-cycle is probably around 30% faster (deleting an
array is faster). But the array-object life-cycle is pretty
insignificant aside from creating views. There are also many
performance improvements around SIMD, which will affect certain math
operations.

The changes on that PR may add additional overhead to array creation
(something that should go away again in 1.21 and end up being much
faster when https://github.com/numpy/numpy/pull/15270 goes in). But
that is all.

As much as I would love to have an answer, looking for changes in the
NumPy code seems to me unlikely to get you anywhere.
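[An illustrative micro-benchmark, not from the thread, for the array-object life-cycle cost mentioned above; the size and repeat count are arbitrary:]

```
import timeit
import numpy as np

# Creating and immediately destroying a tiny array is dominated by the
# Python-object life-cycle (allocation and deallocation), not by the data.
t = timeit.timeit("np.empty(4)", globals={"np": np}, number=1_000_000)
print(f"~{t * 1e9 / 1e6:.0f} ns per create/destroy")
```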
Another example: check out this benchmark from the NumPy benchmarks:

https://pv.github.io/numpy-bench/index.html#bench_reduce.AddReduceSeparate.time_reduce?cpu=Intel(R)%20Core(TM)%20i7%20CPU%20920%20%40%202.67GHz&machine=i7&os=Linux&ram=16416652&Cython=0.29.21&p-axis=1&p-type='int16'&p-type='int32'

It keeps jumping back and forth by around 30% for the 'int16' version,
but the 'int32' one is pretty much stable, so it's unlikely to be just
bad benchmarking. Right now, I am willing to bet that if you repeat
that whole thing with a different commit range, you will find another
random bad commit.

Cheers,

Sebastian

>
> Best,
> Peter
>
> > [...]
> > > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER wrote:
> > > >
> > > > Hi,
> > > >
> > > > I tried to compile Numpy with `pip install numpy==1.20.1
> > > > --no-binary numpy --force-reinstall` and I can reproduce the
> > > > regression.
> > > >
> > > > Good news, I was able to reproduce the difference with only
> > > > Numpy 1.20.1.
> > > >
> > > > Arrays prepared with (`df` is a Pandas dataframe)
> > > >
> > > > arr = df.values.copy()
> > > >
> > > > or
> > > >
> > > > arr = np.ascontiguousarray(df.values)
> > > >
> > > > lead to "slow" execution while arrays prepared with
> > > >
> > > > arr = np.copy(df.values)
> > > >
> > > > lead to faster execution.
> > > >
> > > > arr.copy() or np.copy(arr) do not give the same result, with arr
> > > > obtained from a Pandas dataframe with arr = df.values. It's
> > > > strange because type(df.values) gives <class 'numpy.ndarray'>,
> > > > so I would expect arr.copy() and np.copy(arr) to give exactly
> > > > the same result.
> > > >
> > > > Note that I think I'm doing quite serious and reproducible
> > > > benchmarks. I also checked that this regression is reproducible
> > > > on another computer.
> > > >
> > > > Cheers,
> > > > Pierre
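A quick way to see whether the different preparation routes really
produce different buffers is to compare flags and pointer alignment
directly (a diagnostic sketch; checking against a 64-byte cache-line
boundary is just one plausible thing to look at):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1000, 4))
candidates = {
    "df.values.copy()": df.values.copy(),
    "np.ascontiguousarray(df.values)": np.ascontiguousarray(df.values),
    "np.copy(df.values)": np.copy(df.values),
}
for name, arr in candidates.items():
    # contiguity, and where the data pointer falls relative to 64 bytes
    print(name, arr.flags["C_CONTIGUOUS"], arr.ctypes.data % 64)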
> > > >
> > > > ----- Original Message -----
> > > > > From: "Sebastian Berg"
> > > > > To: "numpy-discussion"
> > > > > Sent: Friday, 12 March 2021 22:50:24
> > > > > Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
> > > > >
> > > > > On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I'm looking for a difference between Numpy 0.19.5 and 0.20
> > > > > > which could explain a performance regression (~15 %) with
> > > > > > Pythran.
> > > > > >
> > > > > > I observe this regression with the script
> > > > > > https://github.com/paugier/nbabel/blob/master/py/bench.py
> > > > > >
> > > > > > Pythran reimplements Numpy so it is not about Numpy code for
> > > > > > computation. However, Pythran of course uses the native array
> > > > > > contained in a Numpy array. I'm quite sure that something has
> > > > > > changed between Numpy 0.19.5 and 0.20 (or between the
> > > > > > corresponding wheels?) since I don't get the same performance
> > > > > > with Numpy 0.20. I checked that the values in the arrays are
> > > > > > the same and that the flags characterizing the arrays are
> > > > > > also the same.
> > > > > >
> > > > > > Good news, I'm now able to obtain the performance difference
> > > > > > just with Numpy 0.19.5. In this code, I load the data with
> > > > > > Pandas and need to prepare contiguous Numpy arrays to give
> > > > > > them to Pythran. With Numpy 0.19.5, if I use np.copy I get
> > > > > > better performance than with np.ascontiguousarray. With
> > > > > > Numpy 0.20, both functions create arrays giving the same
> > > > > > performance with Pythran (again, less good than with
> > > > > > Numpy 0.19.5).
> > > > > >
> > > > > > Note that this code is very efficient (more than 100 times
> > > > > > faster than using Numpy), so I guess that things like
> > > > > > alignment or memory location can lead to such differences.
> > > > > >
> > > > > > More details in this issue
> > > > > > https://github.com/serge-sans-paille/pythran/issues/1735
> > > > > >
> > > > > > Any help to understand what has changed would be greatly
> > > > > > appreciated!
> > > > >
> > > > > If you want to really dig into this, it would be good to do
> > > > > profiling to find out where the differences are.
> > > > >
> > > > > Without that, I don't have much appetite to investigate
> > > > > personally. The reason is that fluctuations of ~30% (or even
> > > > > much more) when running the NumPy benchmarks are very common.
> > > > >
> > > > > I am not aware of an immediate change in NumPy, especially
> > > > > since you are talking pythran, and only the memory space or
> > > > > the interface code should matter.
> > > > > As to the interface code... I would expect it to be quite a
> > > > > bit faster, not slower.
> > > > > There was no change around data allocation, so at best what
> > > > > you are seeing is a different pattern in how the "small array
> > > > > cache" ends up being used.
> > > > >
> > > > > Unfortunately, getting stable benchmarks that reflect code
> > > > > changes exactly is tough... Here is a nice blog post from
> > > > > Victor Stinner where he had to go as far as using "profile
> > > > > guided compilation" to avoid fluctuations:
> > > > >
> > > > > https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> > > > >
> > > > > I somewhat hope that this is also the reason for the huge
> > > > > fluctuations we see in the NumPy benchmarks due to absolutely
> > > > > unrelated code changes.
> > > > > But I did not have the energy to try it (and a probably fixed
> > > > > bug in gcc makes it a bit harder right now).
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Sebastian
> > > > > >
> > > > > > Cheers,
> > > > > > Pierre
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From ralf.gommers at gmail.com Mon Mar 15 16:14:00 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 15 Mar 2021 21:14:00 +0100
Subject: [Numpy-discussion] NEP: array API standard adoption (NEP 47)
In-Reply-To: <3ba55e0fe50da814b486a73855da35770e50303b.camel@sipsolutions.net>
References: <93e3ab801c49ea1331172bcbbb4d651ee3213994.camel@sipsolutions.net> <3ba55e0fe50da814b486a73855da35770e50303b.camel@sipsolutions.net>
Message-ID: 

On Thu, Mar 11, 2021 at 6:08 PM Sebastian Berg wrote:
> On Thu, 2021-03-11 at 12:37 +0100, Ralf Gommers wrote:
> > On Wed, Mar 10, 2021 at 6:41 PM Sebastian Berg
> > <sebastian at sipsolutions.net> wrote:
> > >
> > > Top Posting, to discuss post specific questions about NEP 47 and
> > > partially the start on implementing it in:
> > >
> > > https://github.com/numpy/numpy/pull/18585
> > >
> > > There are probably many more that will crop up. But for me, each
> > > of these is a pretty major difficulty without a clear answer as
> > > of now.
> >
> > All great questions, thanks Sebastian. Let me reply to the questions
> > that Aaron didn't reply to inline below.
>
> To be clear, I do not expect complete answers to these questions right
> now. (Although being unsure about some of them does make me slightly
> reluctant to merge the work-in-progress into NumPy proper as opposed
> to a separate repo.)
>
> Also, yes, most/all questions are hopefully just trivialities to check
> off (or no more than seeds for thought). Or even just a starting point
> for making NEP 47's "Usage and Impact" section more complete,
> including them as either "example usage patterns" or "limitations".

Yes, those are always good to have more of.

> My second takeaway from the questions is that I have doubts the
> "minimal" version will pan out, it feels like many of the questions
> might disappear if you drop that part.

My impression is that a strictly compliant (or "minimal") version is
*more* useful than something that's a mix between portable and
non-portable functionality. The reason to add more than the minimum
required functionality would be that it's too hard to hide the
numpy-specific extras. E.g., if we'd do `np.array_api.int32 = np.int32`
then that dtype would have methods and behavior that's NumPy-specific.
But it'd be hard to hide, so we'd accept it.

It's maybe easier to discuss in a call, I've put it on the community
meeting agenda.

> So, from my current thinking, the minimal implementation may not be a
> good "NEP 47" implementation.
>
> That does _not_ mean that I think you should pause and reconsider or
> even worry about pleasing me with good answers! Just continue under
> whatever assumption you prefer and if it turns out that "minimal"
> won't work for NEP 47: no harm done! We need a "minimal
> implementation" in any case.

Yes, I agree.

> Cheers,
>
> Sebastian
>
> [1] If SciPy needs an additional NumPy code path to keep supporting
> `object` arrays or other dtypes (right now even complex), then the
> reader needs to be aware of that to make a decision if NEP 47 will
> actually help for their library.

Clearly. This is why we'd like to have some WIP PRs for other libraries,
actual code to review will be more helpful than only a proposal.

> Will AstroPy have to reimplement `astropy.units.Quantity` to be
> "standard conform" (is that even possible!?)
> before it can easily adopt it in any of its API that currently works
> with `astropy.units.Quantity`?

I'm not sure if the question is well-defined, so let me answer both
cases:

1. If the APIs in question require units, then there's no other
array/tensor types that have unit support, so those APIs accept *only*
Quantity. Adopting the standard isn't possible.
2. If the units are unnecessary/optional, then Quantity is not special
and can be treated exactly the same as a `numpy.ndarray`. We don't
intend to make any changes to how ndarray subclasses work, so if ndarray
works with that API after adoption of the standard then Quantity works
too.

Cheers,
Ralf

> > > 1. I still need clarity how a library is supposed to use this
> > > namespace when the user passes in a NumPy array (mentioned
> > > before). The user must get back a NumPy array after all. Maybe
> > > that is just a decorator, but it seems important.
> >
> > I agree that it will be a common pattern that libraries will accept
> > all standard-compliant array types plus numpy.ndarray. And the
> > output array type should match the input type. In Aaron's
> > implementation the new array object has a numpy.ndarray as a private
> > attribute, so that's the instance that should be returned. A
> > decorator seems like a sensible way to handle that. Or a simple
> > utility function, something like `return correct_arraytype(out)`
> > (a rough sketch of this pattern is shown below).
> >
> > Either way, that pattern should be added to NEP 47. I don't see a
> > fundamental problem here, we just need to find the nicest UX for it.
> >
> > > 3. For all other functions, the same problem applies. You don't
> > > actually have anything to fix NumPy promotion rules. You could
> > > bake your own cake here for numeric types, but I am not sure, you
> > > might also need NEP 43 in all its promotion power to pull it off.
> >
> > This is probably the single most difficult question
> > implementation-wise. Note that there are only numerical dtypes (plus
> > boolean), so dealing with string, datetime, object or third-party
> > dtypes is a non-issue.
> >
> > > 4. The PR makes no attempt at handling binary operators in any way
> > > aside from greedily coercing the other operand.
> >
> > Agreed. This is the same point as (3) I think - how to handle dtype
> > promotion is the main open question.
> >
> > > 5. What happens with a mix of array-likes or even array subclasses
> > > like `astropy.quantity`?
> >
> > Array-likes (e.g. list) should raise an exception, the NEP clearly
> > says "do not accept array_like dtypes". This is what every other
> > array/tensor library already does.
> >
> > Array subclasses should work as expected, assuming they're valid
> > subclasses and not things like np.matrix. Using Mypy will help avoid
> > writing more subclasses that break the Liskov substitution
> > principle. More comments in
> > https://numpy.org/neps/nep-0047-array-api-standard.html#the-asarray-asanyarray-pattern
> >
> > Mixing two different types of arrays into a single function call
> > should raise an exception. A design goal is: enable writing
> > functions `somefunc(x1, x2)` that work for any type of array where
> > `x1, x2` come from the same library, i.e. they're either the same
> > type, or two types for which the library itself knows how to mix
> > them. If x1 and x2 are from different libraries, this will raise an
> > exception.
> >
> > To be clear, it is not intended that `np.array_api.somefunc(x_cupy)`
> > works - this will raise an exception.
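To make that wrapping pattern concrete, here is a rough sketch (the
names `StandardArray` and `returns_matching_type` are made up for
illustration; `StandardArray` only stands in for the array object of the
new namespace, and none of this is settled API):

import numpy as np

class StandardArray:
    # stand-in for the array object in the array API namespace, which
    # holds a numpy.ndarray as a private attribute
    def __init__(self, data):
        self._array = np.asarray(data)

def returns_matching_type(func):
    # accept a plain ndarray, compute via the standard array object,
    # and hand a plain ndarray back to the caller
    def wrapper(x):
        unwrap = isinstance(x, np.ndarray)
        out = func(StandardArray(x) if unwrap else x)
        return out._array if unwrap else out
    return wrapper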
> > Cheers,
> > Ralf
>
> > > I don't think we have to figure out everything up-front, but I do
> > > think there are a few very fundamental questions still open, at
> > > least for me personally.
> > >
> > > Cheers,
> > >
> > > Sebastian

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lee.johnston.100 at gmail.com Tue Mar 16 14:17:42 2021
From: lee.johnston.100 at gmail.com (Lee Johnston)
Date: Tue, 16 Mar 2021 13:17:42 -0500
Subject: [Numpy-discussion] NEP 42 status
Message-ID: 

Is the work on NEP 42 custom DTypes far enough along to experiment with?

Lee
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net Tue Mar 16 17:10:47 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 16 Mar 2021 16:10:47 -0500
Subject: [Numpy-discussion] NEP 42 status
In-Reply-To: 
References: 
Message-ID: <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>

On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote:
> Is the work on NEP 42 custom DTypes far enough along to experiment
> with?

TL;DR: It's not quite ready, but if we work together I think we could
experiment a fair bit. Mainly ufuncs are still limited (though not
quite completely missing). The main problem is that we need to find a
way to expose the currently private API.

I would be happy to discuss this also in a call.

** The long story: **

There is one more PR related to casting, for which merge should be
around the corner, and which would bring a lot of bang to such an
experiment:

https://github.com/numpy/numpy/pull/18398

At that point, the new machinery supports (or is used for):

* Array-coercion: `np.array([your_scalar])` or
  `np.array([1], dtype=your_dtype)`.

* Casting (practically full support).

* UFuncs do not quite work. But short of writing `np.add(arr1, arr2)`
  with your DType involved, you can try a whole lot. (see below)

* Promotion: `np.result_type` should work very soon, but is probably
  not very relevant anyway until ufuncs are fully implemented.

That should allow you to do a lot of good experimentation, but due to
the ufunc limitation, maybe not well on "existing" python code.

The long story about limitations is:

We are missing exposure of the new public API. I think I should be able
to provide a solution for this pretty quickly, but it might require
working off a NumPy branch. (I will write another email about it,
hopefully we can find a better solution.)

Limitations for UFuncs: UFuncs are the next big project, so to try it
fully you will need some patience, unfortunately.

But, there is some good news! You can write most of the "ufunc"
already, you just can't "register" it.
So what I can already offer you is a "DType-specific UFunc", e.g.:

   unit_dtype_multiply(np.array([1.], dtype=Float64UnitDType("m")),
                       np.array([2.], dtype=Float64UnitDType("s")))

And get out `np.array([2.], dtype=Float64UnitDType("m s"))`.

But you can't write `np.multiply(arr1, arr2)` or `arr1 * arr2` yet.
Both registration and "promotion" logic are missing.

I admit promotion may be one of the trickiest things, but trying this a
bit might help with getting a clearer picture for promotion as well.

The last major limitation is that I did not replace or create "fallback"
solutions and/or replacements for the legacy `dtype->f->` slots yet.
This is not a serious limitation for experimentation, though. It might
even make sense to keep some of them around and replace them slowly.

And of course, all the small issues/limitations that are not fixed
because nobody tried yet...

I hope this doesn't scare you away, or at least not for long :/. It
could be very useful to start experimentation soon to push things
forward a bit quicker. And I really want to have at least an
experimental version in NumPy 1.21.

Cheers,

Sebastian

> Lee
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From sebastian at sipsolutions.net Tue Mar 16 17:38:49 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 16 Mar 2021 16:38:49 -0500
Subject: [Numpy-discussion] Exposing experimental C-API for DTypes
Message-ID: <5facef5705009f1f1fed40fd5579bc0a828aff63.camel@sipsolutions.net>

Hi all,

For DTypes, it may soon make sense to expose API publicly for
testing/experimentation. But, right now I don't really want to get
roped into discussing API details too much and slowing down potential
revamps.

Do we have any idea for exposing an "experimental" API?

The first option would be "symbols" with an underscore in the name and
an understanding that using them might just break if you don't use the
exact version you wrote the code for and compiled with (i.e. no API/ABI
guarantee). My current expectation is that everyone will be appalled by
such a plan...

For a single, simple project which would end up as a test, similar to
the `rational` tests, we could work in NumPy itself. That is fine, but
fairly strictly constrained...

Of course I can make a "branch" of NumPy that exports more API, but
that doesn't feel great either, it seems a bit clunky.

The last idea I have right now is a bit convoluted but safe: We add a
private python function:

    np.core._multiarray_umath.get_new_dtype_api(api_version)

and a corresponding header (potentially outside of NumPy). The header
would include an `import_new_dtype_api()` macro/function that leverages
the private Python function to import the API (much like
`import_array()` works).
Since it would use its own header, it could do strict version checks.
And since it would have to "ask numpy", NumPy could require an
environment variable to be set and/or print out a warning.

Am I missing some obvious solution? Aside from "be patient and get it
right the first time"?

Cheers,

Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From sebastian at sipsolutions.net Tue Mar 16 18:15:04 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 16 Mar 2021 17:15:04 -0500
Subject: [Numpy-discussion] NumPy Community Meeting Wednesday (no DST: for those in the US, one hour later)
Message-ID: 

Hi all,

There will be a NumPy Community meeting Wednesday March 17th at 20:00
UTC. Everyone is invited and encouraged to join in and edit the
work-in-progress meeting topics and notes at:

https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian

PS: As the subject says, we will stay on UTC 20:00, so for those in the
US and anyone else who had daylight saving time switches the time will
have shifted.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From melissawm at gmail.com Tue Mar 16 20:17:24 2021
From: melissawm at gmail.com (Melissa Mendonça)
Date: Tue, 16 Mar 2021 21:17:24 -0300
Subject: [Numpy-discussion] Google Season of Docs 2021
Message-ID: 

Hello, folks!

NumPy is hoping to participate again in Google Season of Docs this year,
and we have a couple of project ideas listed here:
https://github.com/numpy/numpy/wiki/Google-Season-of-Docs-2021-Project-Ideas

This year, GSoD has a different structure: we must choose only one
project idea (ideally, the one prospective technical writers are most
interested in) and submit a sort of grant proposal (details are here:
https://developers.google.com/season-of-docs/docs/admin-guide). If we
are selected, we can hire up to 2 technical writers to work on our
project, depending on the budget allocated to us by Google.

The final proposal must be submitted by March 26 (you can find the
complete timeline here
https://developers.google.com/season-of-docs/docs/timeline).

Feedback and input is appreciated, and please feel free to share with
technical writers who may be interested.

Cheers,

- Melissa
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lee.johnston.100 at gmail.com Wed Mar 17 08:56:41 2021
From: lee.johnston.100 at gmail.com (Lee Johnston)
Date: Wed, 17 Mar 2021 07:56:41 -0500
Subject: [Numpy-discussion] NEP 42 status
In-Reply-To: <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>
References: <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>
Message-ID: 

I am willing to wait for PR #18398 as I am mainly interested at this
point in the process of developing a new DType and then array coercion
and casting.

Does _rational_tests.c.src
(https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/_rational_tests.c.src)
illustrate the new DType?

Lee
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net Wed Mar 17 18:12:32 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 17 Mar 2021 17:12:32 -0500
Subject: [Numpy-discussion] NEP 42 status
In-Reply-To: 
References: <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net>
Message-ID: <2e803e9c3091d77f819036fd1ffb8d395ef51d71.camel@sipsolutions.net>

On Wed, 2021-03-17 at 07:56 -0500, Lee Johnston wrote:
> I am willing to wait for PR #18398 as I am mainly interested at this
> point in the process of developing a new DType and then array
> coercion and casting.
> Does _rational_tests.c.src illustrate the new DType?

Thanks for joining the community call!

The `rational_tests` are still using the old API and unfortunately there
is no great example of the new API, because the API is not public yet
and dealing with "old dtypes" in NumPy obfuscates it a bit.

Let me try to summarize my take-away from discussion and next steps:

As discussed, I think we agreed on the idea of exposing the new API
"experimentally" with the following mechanism:

1. We add a new header, distinct from the normal NumPy headers.
2. This header will use private Python API to achieve:
   - Strict version ABI/API requirements. If the code is updated in
     NumPy we will increase this version. Possibly very often. A
     mismatch will cause a strict failure requiring the user to "keep
     up" with the NumPy development.
   - NumPy will prohibit exporting the public API unless a
     `NUMPY_EXPERIMENTAL_DTYPE_API=1` environment variable is set. This
     will hopefully prevent the use in production code even if we make
     a release.
3. In parallel, I will create a small "toy" DType based on that
   experimental API. Probably in a separate repo (in the NumPy
   organization?).

Anyone using the API should expect bugs, crashes and changes for a
while. But hopefully it will only require small code modifications when
the API becomes public.

My personal plan for a toy example is currently a "scaled integer".
E.g. a uint8 where you can set a range `[min_double, max_double]` that
it maps to (which makes the DType "parametric").
We discussed some other examples, such as a "modernized" rational
DType, that could be nice as well, let's see...

Units would be a great experiment, but seem a bit complex to me (I
don't know units well though). So to keep it baby steps :) I would aim
for doing the above and then we can experiment on Units together!

Since it came up: I agree that a Python API would be great to have. It
is something I firmly kept on the back-burner... It should not be very
hard (if rudimentary), but unless it would help experiments a lot, I
would tend to leave it on the back-burner for now.

Cheers,

Sebastian

[1] Maybe a `uint8` storage that maps to evenly spaced values on a
parametric range `[double_min, double_max]`. That seems like a good
trade-off in complexity.
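To make the "scaled integer" idea concrete, the mapping itself would
look something like this pure-Python sketch (an illustration of the idea
only; the actual DType API is still private and looks nothing like a
Python class):

import numpy as np

class ScaledUint8:
    def __init__(self, vmin, vmax):
        self.vmin, self.vmax = float(vmin), float(vmax)

    def encode(self, values):
        # map physical values on [vmin, vmax] to uint8 storage
        scaled = (np.asarray(values) - self.vmin) / (self.vmax - self.vmin)
        return np.round(scaled * 255).astype(np.uint8)

    def decode(self, stored):
        # map uint8 storage back to evenly spaced physical values
        return self.vmin + stored / 255 * (self.vmax - self.vmin)

s = ScaledUint8(-1.0, 1.0)
print(s.decode(s.encode([-1.0, 0.0, 1.0])))  # approximately [-1., 0., 1.]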
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From hameerabbasi at yahoo.com Fri Mar 19 07:23:49 2021
From: hameerabbasi at yahoo.com (Hameer Abbasi)
Date: Fri, 19 Mar 2021 12:23:49 +0100
Subject: [Numpy-discussion] PyData/Sparse 0.12.0 Release Announcement
References: <12f1adcc-fe1e-408e-a2d0-784426599f31.ref@Canary>
Message-ID: <12f1adcc-fe1e-408e-a2d0-784426599f31@Canary>
This is a large release with GCXS support, preliminary CSR/CSC support and extensions to DOK, as well as bugfixes. Changelog: https://sparse.pydata.org/en/stable/changelog.html Documentation: https://sparse.pydata.org/ Source: https://github.com/pydata/sparse/ Best regards, Hameer Abbasi -- Sent from Canary (https://canarymail.io) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Mar 23 23:57:08 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 23 Mar 2021 22:57:08 -0500 Subject: [Numpy-discussion] NumPy Development Meeting Wednesday - Triage Focus Message-ID: Hi all, Our bi-weekly triage-focused NumPy development meeting is Wednesday, March 10th at 11 am Pacific Time (18:00 UTC). Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg I encourage everyone to notify us of issues or PRs that you feel should be prioritized, discussed, or reviewed. Best regards Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From diagonaldevice at gmail.com Wed Mar 24 20:57:17 2021 From: diagonaldevice at gmail.com (Michael Lamparski) Date: Wed, 24 Mar 2021 20:57:17 -0400 Subject: [Numpy-discussion] Programmatically contracting multiple tensors In-Reply-To: References: Message-ID: Hi, I must thank y'all for the exceptionally fast responses (and apologize for my own tragically slow response!) On Sat, Mar 13, 2021 at 1:32 AM Eric Wieser wrote: > Einsum has a secret integer argument format that appears in the Examples section of the > `np.einsum` docs, but appears not to be mentioned at all in the parameter listing. Ah, yes, this is precisely the sort of API I was hoping for! I found it pretty easy to use, but here's a snippet that solves my original problem for those wondering: https://github.com/exphp-share/gpaw-raman-script/blob/f98fe14cd6/script/symmetry.py#L442-L471 On Fri, Mar 12, 2021 at 8:09 PM Andras Deak wrote: > But I'm not sure _where_ this could be highlighted among the > parameters; after all this is all covered by the *operands parameter. The parameter list is definitely one of the places I checked most closely, and having something there would have helped. I'd say that, technically, this also overlaps with the subscripts argument, which now holds the first array, and I feel like that may be the best place to put something. For instance, a short paragraph could be added to the end of 'subscripts': "einsum also has an alternative interface that uses integer labels for axes, in which case the subscripts argument is not present. This is documented below." (with a link) (the idea I'm trying to capture here is to avoid creating any specific (or potentially wrong) picture of how the arguments look in the alternate signature, more or less forcing the reader to follow a link to where it is more easily described somewhere outside the constraints of parameter-based documentation) --- Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From tyler.je.reddy at gmail.com Wed Mar 24 22:13:19 2021 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Wed, 24 Mar 2021 20:13:19 -0600 Subject: [Numpy-discussion] ANN: SciPy 1.6.2 Message-ID: Hi all, On behalf of the SciPy development team I'm pleased to announce the release of SciPy 1.6.2, which is a bug fix release. 
--- Michael
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tyler.je.reddy at gmail.com Wed Mar 24 22:13:19 2021
From: tyler.je.reddy at gmail.com (Tyler Reddy)
Date: Wed, 24 Mar 2021 20:13:19 -0600
Subject: [Numpy-discussion] ANN: SciPy 1.6.2
Message-ID: 

Hi all,

On behalf of the SciPy development team I'm pleased to announce the
release of SciPy 1.6.2, which is a bug fix release.

Sources and binary wheels can be found at:
https://pypi.org/project/scipy/ and at:
https://github.com/scipy/scipy/releases/tag/v1.6.2

One of a few ways to install this release with pip:

pip install scipy==1.6.2

=====================
SciPy 1.6.2 Release Notes
=====================

SciPy 1.6.2 is a bug-fix release with no new features compared to 1.6.1.
This is also the first SciPy release to place upper bounds on some
dependencies to improve the long-term repeatability of source builds.

Authors
======

* Pradipta Ghosh +
* Tyler Reddy
* Ralf Gommers
* Martin K. Scherer +
* Robert Uhl
* Warren Weckesser

A total of 6 people contributed to this release.
People with a "+" by their names contributed a patch for the first time.
This list of names is automatically generated, and may not be fully
complete.

Issues closed for 1.6.2
------------------------------

* #13512: `stats.gaussian_kde.evaluate` broken on S390X
* #13584: rotation._compute_euler_from_matrix() creates an array with negative...
* #13585: Behavior change in coo_matrix when dtype=None
* #13686: delta0 argument of scipy.odr.ODR() ignored

Pull requests for 1.6.2
------------------------------

* #12862: REL: put upper bounds on versions of dependencies
* #13575: BUG: fix `gaussian_kernel_estimate` on S390X
* #13586: BUG: sparse: Create a utility function `getdata`
* #13598: MAINT, BUG: enforce contiguous layout for output array in Rotation.as_euler
* #13687: BUG: fix scipy.odr to consider given delta0 argument

Checksums
=========

MD5
~~~

fc81d43879a28270d593aaea37c74ff8 scipy-1.6.2-cp37-cp37m-macosx_10_9_x86_64.whl
9213533bfd3c2f1563d169009c39825c scipy-1.6.2-cp37-cp37m-manylinux1_i686.whl
2ddd03b89efdb1619fa995da7b83aa6f scipy-1.6.2-cp37-cp37m-manylinux1_x86_64.whl
d378f725958bd6a83db7ef23e8659762 scipy-1.6.2-cp37-cp37m-manylinux2014_aarch64.whl
87bc2771b8a8ab1f10168b1563300415 scipy-1.6.2-cp37-cp37m-win32.whl
861dab18fe41e82c08c8f585f2710545 scipy-1.6.2-cp37-cp37m-win_amd64.whl
d2e2002b526adeebf94489aa95031f54 scipy-1.6.2-cp38-cp38-macosx_10_9_x86_64.whl
2dc36bfbe3938c492533604aba002c17 scipy-1.6.2-cp38-cp38-manylinux1_i686.whl
0114de2118d41f9440cf86fdd67434fc scipy-1.6.2-cp38-cp38-manylinux1_x86_64.whl
ede6db56b1bf0a7fed0c75acac7dcb85 scipy-1.6.2-cp38-cp38-manylinux2014_aarch64.whl
191636ac3276da0ee9fd263b47927b73 scipy-1.6.2-cp38-cp38-win32.whl
8bdf7ab041b9115b379f043bb02d905f scipy-1.6.2-cp38-cp38-win_amd64.whl
608c82b227b6077d9a7871ac6278e64d scipy-1.6.2-cp39-cp39-macosx_10_9_x86_64.whl
4c0313b2cccc85666b858ffd692a3c87 scipy-1.6.2-cp39-cp39-manylinux1_i686.whl
92da8ffe165034dbbe5f098d0ed58aec scipy-1.6.2-cp39-cp39-manylinux1_x86_64.whl
b4b225fb1deeaaf0eda909fdd3bd6ca6 scipy-1.6.2-cp39-cp39-manylinux2014_aarch64.whl
662969220eadbb6efec99030e4d00268 scipy-1.6.2-cp39-cp39-win32.whl
f19186d6d91c7e37000e9f6ccd9b9b60 scipy-1.6.2-cp39-cp39-win_amd64.whl
cbcb9b39bd9d877ad3deeccc7c37bb7f scipy-1.6.2.tar.gz
b56e705c653ad808a9725dfe840d1258 scipy-1.6.2.tar.xz
6f615549670cd3d312dc9e4359d2436a scipy-1.6.2.zip

SHA256
~~~~~~

77f7a057724545b7e097bfdca5c6006bed8580768cd6621bb1330aedf49afba5 scipy-1.6.2-cp37-cp37m-macosx_10_9_x86_64.whl
e547f84cd52343ac2d56df0ab08d3e9cc202338e7d09fafe286d6c069ddacb31 scipy-1.6.2-cp37-cp37m-manylinux1_i686.whl
bc52d4d70863141bb7e2f8fd4d98e41d77375606cde50af65f1243ce2d7853e8 scipy-1.6.2-cp37-cp37m-manylinux1_x86_64.whl
adf7cee8e5c92b05f2252af498f77c7214a2296d009fc5478fc432c2f8fb953b scipy-1.6.2-cp37-cp37m-manylinux2014_aarch64.whl
e3e9742bad925c421d39e699daa8d396c57535582cba90017d17f926b61c1552 scipy-1.6.2-cp37-cp37m-win32.whl
ffdfb09315896c6e9ac739bb6e13a19255b698c24e6b28314426fd40a1180822 scipy-1.6.2-cp37-cp37m-win_amd64.whl
6ca1058cb5bd45388041a7c3c11c4b2bd58867ac9db71db912501df77be2c4a4 scipy-1.6.2-cp38-cp38-macosx_10_9_x86_64.whl
993c86513272bc84c451349b10ee4376652ab21f312b0554fdee831d593b6c02 scipy-1.6.2-cp38-cp38-manylinux1_i686.whl
37f4c2fb904c0ba54163e03993ce3544c9c5cde104bcf90614f17d85bdfbb431 scipy-1.6.2-cp38-cp38-manylinux1_x86_64.whl
96620240b393d155097618bcd6935d7578e85959e55e3105490bbbf2f594c7ad scipy-1.6.2-cp38-cp38-manylinux2014_aarch64.whl
03f1fd3574d544456325dae502facdf5c9f81cbfe12808a5e67a737613b7ba8c scipy-1.6.2-cp38-cp38-win32.whl
0c81ea1a95b4c9e0a8424cf9484b7b8fa7ef57169d7bcc0dfcfc23e3d7c81a12 scipy-1.6.2-cp38-cp38-win_amd64.whl
c1d3f771c19af00e1a36f749bd0a0690cc64632783383bc68f77587358feb5a4 scipy-1.6.2-cp39-cp39-macosx_10_9_x86_64.whl
50e5bcd9d45262725e652611bb104ac0919fd25ecb78c22f5282afabd0b2e189 scipy-1.6.2-cp39-cp39-manylinux1_i686.whl
816951e73d253a41fa2fd5f956f8e8d9ac94148a9a2039e7db56994520582bf2 scipy-1.6.2-cp39-cp39-manylinux1_x86_64.whl
1fba8a214c89b995e3721670e66f7053da82e7e5d0fe6b31d8e4b19922a9315e scipy-1.6.2-cp39-cp39-manylinux2014_aarch64.whl
e89091e6a8e211269e23f049473b2fde0c0e5ae0dd5bd276c3fc91b97da83480 scipy-1.6.2-cp39-cp39-win32.whl
d744657c27c128e357de2f0fd532c09c84cd6e4933e8232895a872e67059ac37 scipy-1.6.2-cp39-cp39-win_amd64.whl
e9da33e21c9bc1b92c20b5328adb13e5f193b924c9b969cd700c8908f315aa59 scipy-1.6.2.tar.gz
8fadc443044396283c48191d48e4e07a3c3b6e2ae320b1a56e76bb42929e84d2 scipy-1.6.2.tar.xz
2af283054d91865336b4579aa91f9e59d648d436cf561f96d4692008f795c750 scipy-1.6.2.zip
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net Thu Mar 25 18:27:06 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 25 Mar 2021 17:27:06 -0500
Subject: [Numpy-discussion] NEP 42 status - Store quantity in a NumPy array and convert it :)
In-Reply-To: <2e803e9c3091d77f819036fd1ffb8d395ef51d71.camel@sipsolutions.net>
References: <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net> <2e803e9c3091d77f819036fd1ffb8d395ef51d71.camel@sipsolutions.net>
Message-ID: <9f63a292b2811bed6f0a6c27346879ae62b7432e.camel@sipsolutions.net>

On Wed, 2021-03-17 at 17:12 -0500, Sebastian Berg wrote:
> On Wed, 2021-03-17 at 07:56 -0500, Lee Johnston wrote:
>
> 3. In parallel, I will create a small "toy" DType based on that
>    experimental API. Probably in a separate repo (in the NumPy
>    organization?).

So this is started. What you need to do right now if you want to try is
work off this branch in NumPy:

https://github.com/numpy/numpy/compare/main...seberg:experimental-dtype-api

Install NumPy with `NPY_USE_NEW_CASTINGIMPL=1 python -mpip install .`
or your favorite alternative.
(The `NPY_USE_NEW_CASTINGIMPL=1` should be unnecessary very soon;
working off a branch and not "main" will hopefully also be unnecessary
soon.)

Then fetch: https://github.com/seberg/experimental_user_dtypes
and install it as well in the same environment.
After that, you can jump through the hoop of setting:

    NUMPY_EXPERIMENTAL_DTYPE_API=1

And you can enjoy these types of examples (while expecting hard crashes
when going too far beyond!):

from experimental_user_dtypes import float64unit as u
import numpy as np

F = np.array([u.Quantity(70., "Fahrenheit")])
C = F.astype(u.Float64UnitDType("Celsius"))
print(repr(C))
# array([21.11111111111115 °C], dtype='Float64UnitDType(degC)')

m = np.array([u.Quantity(5., "m")])
m_squared = u.multiply(m, m)
print(repr(m_squared))
# array([25.0 m**2], dtype='Float64UnitDType(m**2)')

# Or conversion to SI the long route:
pc = np.arange(5., dtype="float64").view(u.Float64UnitDType("pc"))
pc.astype(pc.dtype.si())
# array([0.0 m, 3.085677580962325e+16 m, 6.17135516192465e+16 m,
#        9.257032742886974e+16 m, 1.23427103238493e+17 m],
#       dtype='Float64UnitDType(m)')

Yes, the code has some horrible hacks around creating the DType, but
the basic mechanism, i.e. the "functions you need to implement", is not
expected to change a lot.

Right now, it forces you to use and implement the scalar `u.Quantity`
and the code sample uses it. But you can also do:

np.arange(3.).view(u.Float64UnitDType("m"))

I do have plans to "not have a scalar" so the 0-D result would still be
an array. But that option doesn't exist yet (and right now the scalar
is used for printing).

(There is also a `string_equal` "ufunc-like" that works on "S" dtypes.)

Cheers,

Sebastian

PS: I need to figure out some details about how to create DTypes and
DType instances with regards to our stable ABI. The current "solution"
is some weird subclassing hoops which are probably not good.

That is painful unfortunately and any ideas would be great :).
Unfortunately, it requires a grasp around the C-API and metaclassing...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From lee.johnston.100 at gmail.com Fri Mar 26 10:44:42 2021
From: lee.johnston.100 at gmail.com (Lee Johnston)
Date: Fri, 26 Mar 2021 09:44:42 -0500
Subject: [Numpy-discussion] NEP 42 status - Store quantity in a NumPy array and convert it :)
In-Reply-To: <9f63a292b2811bed6f0a6c27346879ae62b7432e.camel@sipsolutions.net>
References: <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net> <2e803e9c3091d77f819036fd1ffb8d395ef51d71.camel@sipsolutions.net> <9f63a292b2811bed6f0a6c27346879ae62b7432e.camel@sipsolutions.net>
Message-ID: 

Thanks Sebastian, I have your example running and will start
experimenting with DType.

Lee
> > > After that, you can jump through the hoop of setting: > > NUMPY_EXPERIMENTAL_DTYPE_API=1 > > And you can enjoy these type of examples (while expecting hard crashes > when going too far beyond!): > > from experimental_user_dtypes import float64unit as u > import numpy as np > > F = np.array([u.Quantity(70., "Fahrenheit")]) > C = F.astype(u.Float64UnitDType("Celsius")) > print(repr(C)) > # array([21.11111111111115 ?C], dtype='Float64UnitDType(degC)') > > m = np.array([u.Quantity(5., "m")]) > m_squared = u.multiply(m, m) > print(repr(m_squared)) > # array([25.0 m**2], dtype='Float64UnitDType(m**2)') > > # Or conversion to SI the long route: > pc = np.arange(5., dtype="float64").view(u.Float64UnitDType("pc")) > pc.astype(pc.dtype.si()) > # array([0.0 m, 3.085677580962325e+16 m, 6.17135516192465e+16 m, > # 9.257032742886974e+16 m, 1.23427103238493e+17 m], > # dtype='Float64UnitDType(m)') > > > Yes, the code has some horrible hacks around creating the DType, but > the basic mechanism i.e. "functions you need to implement" are not > expected to change lot. > > Right now, it forces you to use and implement the scalar `u.Quantity` > and the code sample uses it. But you can also do: > > np.arange(3.).view(u.Float64UnitDType("m")) > > I do have plans to "not have a scalar" so the 0-D result would still be > an array. But that option doesn't exist yet (and right now the scalar > is used for printing). > > > (There is also a `string_equal` "ufunc-like" that works on "S" dtypes.) > > Cheers, > > Sebastian > > > > PS: I need to figure out some details about how to create DTypes and > DType instances with regards to our stable ABI. The current "solution" > is some weird subclassing hoops which are probably not good. > > That is painful unfortunately and any ideas would be great :). > Unfortunately, it requires a grasp around the C-API and metaclassing... > > > > > > > Anyone using the API, should expect bugs, crashes and changes for a > > while. But hopefully will only require small code modifications when > > the API becomes public. > > > > My personal plan for a toy example is currently a "scaled integer". > > E.g. a uint8 where you can set a range `[min_double, max_double]` > > that > > it maps to (which makes the DType "parametric"). > > We discussed some other examples, such as a "modernized" rational > > DType, that could be nice as well, lets see... > > > > Units would be a great experiment, but seem a bit complex to me (I > > don't know units well though). So to keep it baby steps :) I would > > aim > > for doing the above and then we can experiment on Units together! > > > > > > Since it came up: I agree that a Python API would be great to have. > > It > > is something I firmly kept on the back-burner... It should not be > > very > > hard (if rudimentary), but unless it would help experiments a lot, I > > would tend to leave it on the back-burner for now. > > > > Cheers, > > > > Sebastian > > > > > > [1] Maybe a `uint8` storage that maps to evenly spaced values on a > > parametric range `[double_min, double_max]`. That seems like a good > > trade-off in complexity. > > > > > > > > > On Tue, Mar 16, 2021 at 4:11 PM Sebastian Berg < > > > sebastian at sipsolutions.net> > > > wrote: > > > > > > > On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote: > > > > > Is the work on NEP 42 custom DTypes far enough along to > > > > > experiment > > > > > with? > > > > > > > > > > > > > TL;DR: Its not quite ready, but if we work together I think we > > > > could > > > > experiment a fair bit. 
Mainly ufuncs are still limited (though > > > > not > > > > quite completely missing). The main problem is that we need to > > > > find a > > > > way to expose the currently private API. > > > > > > > > I would be happy to discuss this also in a call. > > > > > > > > > > > > ** The long story: ** > > > > > > > > There is one more PR related to casting, for which merge should > > > > be > > > > around the corner. And which would bring a lot bang to such an > > > > experiment: > > > > > > > > https://github.com/numpy/numpy/pull/18398 > > > > > > > > > > > > At that point, the new machinery supports (or is used for): > > > > > > > > * Array-coercion: `np.array([your_scalar])` or > > > > `np.array([1], dtype=your_dtype)`. > > > > > > > > * Casting (practically full support). > > > > > > > > * UFuncs do not quite work. But short of writing `np.add(arr1, > > > > arr2)` > > > > with your DType involved, you can try a whole lot. (see below) > > > > > > > > * Promotion `np.result_type` should work very soon, but probably > > > > isn't > > > > is not very relevant anyway until ufuncs are fully implemented. > > > > > > > > That should allow you to do a lot of good experimentation, but > > > > due > > > > to > > > > the ufunc limitation, maybe not well on "existing" python code. > > > > > > > > > > > > The long story about limitations is: > > > > > > > > We are missing exposure of the new public API. I think I should > > > > be > > > > able to provide a solution for this pretty quickly, but it might > > > > require working of a NumPy branch. (I will write another email > > > > about > > > > it, hopefully we can find a better solution.) > > > > > > > > > > > > Limitations for UFuncs: UFuncs are the next big project, so to > > > > try > > > > it > > > > fully you will need some patience, unfortunately. > > > > > > > > But, there is some good news! You can write most of the "ufunc" > > > > already, you just can't "register" it. > > > > So what I can already offer you is a "DType-specific UFunc", > > > > e.g.: > > > > > > > > unit_dtype_multiply(np.array([1.], > > > > dtype=Float64UnitDType("m")), > > > > np.array([2.], > > > > dtype=Float64UnitDtype("s"))) > > > > > > > > And get out `np.array([2.], dtype=Float64UnitDtype("m s"))`. > > > > > > > > But you can't write `np.multiple(arr1, arr2)` or `arr1 * arr2` > > > > yet. > > > > Both registration and "promotion" logic are missing. > > > > > > > > I admit promotion may be one of the trickiest things, but trying > > > > this a > > > > bit might help with getting a clearer picture for promotion as > > > > well. > > > > > > > > > > > > The main last limitation is that I did not replace or create > > > > "fallback" > > > > solutions and/or replacement for the legacy `dtype->f->` > > > > yet. > > > > This is not a serious limitation for experimentation, though. It > > > > might > > > > even make sense to keep some of them around and replace them > > > > slowly. > > > > > > > > > > > > And of course, all the small issues/limitations that are not > > > > fixed > > > > because nobody tried yet... > > > > > > > > > > > > > > > > I hope this doesn't scare you away, or at least not for long :/. > > > > It > > > > could be very useful to start experimentation soon to push things > > > > forward a bit quicker. And I really want to have at least an > > > > experimental version in NumPy 1.21. 
> > > > Cheers,
> > > >
> > > > Sebastian
> > > >
> > > > > Lee

From melissawm at gmail.com  Sat Mar 27 15:35:05 2021
From: melissawm at gmail.com (Melissa Mendonça)
Date: Sat, 27 Mar 2021 16:35:05 -0300
Subject: [Numpy-discussion] Documentation Team meeting - Monday March 29

Hi all!

Our next Documentation Team meeting will be on *Monday, March 29* at
***4PM UTC***. All are welcome - you don't need to already be a
contributor to join. If you have questions or are curious about what
we're doing, we'll be happy to meet you!

If you wish to join on Zoom, use this link:
https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success

Here's the permanent hackmd document with the meeting notes (still
being updated in the next few days!):
https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg

Hope to see you around!

** You can click this link to get the correct time in your timezone:
https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210329T16&p1=1440&ah=1

*** You can add the NumPy community calendar to your Google calendar by
clicking this link:
https://calendar.google.com/calendar/r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20

- Melissa

From charlesr.harris at gmail.com  Sat Mar 27 19:45:18 2021
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 27 Mar 2021 17:45:18 -0600
Subject: [Numpy-discussion] NumPy 1.20.2 released.

Hi All,

On behalf of the NumPy team I am pleased to announce the release of
NumPy 1.20.2. NumPy 1.20.2 is a bugfix release containing several fixes
merged to the main branch after the NumPy 1.20.1 release. The Python
versions supported for this release are 3.7-3.9. Wheels can be
downloaded from PyPI; source archives, release notes, and wheel hashes
are available on GitHub. Linux users will need pip >= 19.3 in order to
install manylinux2010 and manylinux2014 wheels.

*Contributors*

A total of 7 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

- Allan Haldane
- Bas van Beek
- Charles Harris
- Christoph Gohlke
- Mateusz Sokół
+
- Michael Lamparski
- Sebastian Berg

*Pull requests merged*

A total of 20 pull requests were merged for this release.

- #18382: MAINT: Update f2py from master.
- #18459: BUG: ``diagflat`` could overflow on windows or 32-bit platforms
- #18460: BUG: Fix refcount leak in f2py ``complex_double_from_pyobj``.
- #18461: BUG: Fix tiny memory leaks when ``like=`` overrides are used
- #18462: BUG: Remove temporary change of descr/flags in VOID functions
- #18469: BUG: Segfault in nditer buffer dealloc for Object arrays
- #18485: BUG: Remove suspicious type casting
- #18486: BUG: remove nonsensical comparison of pointer < 0
- #18487: BUG: verify pointer against NULL before using it
- #18488: BUG: check if PyArray_malloc succeeded
- #18546: BUG: incorrect error fallthrough in nditer
- #18559: CI: Backport CI fixes from main.
- #18599: MAINT: Add annotations for `dtype.__getitem__`, `__mul__` and...
- #18611: BUG: NameError in numpy.distutils.fcompiler.compaq
- #18612: BUG: Fixed ``where`` keyword for ``np.mean`` & ``np.var`` methods
- #18617: CI: Update apt package list before Python install
- #18636: MAINT: Ensure that re-exported sub-modules are properly annotated
- #18638: BUG: Fix ma coercion list-of-ma-arrays if they do not cast to...
- #18661: BUG: Fix small valgrind-found issues
- #18671: BUG: Fix small issues found with pytest-leaks

Cheers,

Charles Harris

From ralf.gommers at gmail.com  Sun Mar 28 08:26:25 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 28 Mar 2021 14:26:25 +0200
Subject: [Numpy-discussion] Japanese translation of numpy.org complete -
 proofreader help wanted

Hi all,

We have our first complete translation of the numpy.org content,
Japanese, thanks to Atsushi Sakai. It would be really helpful if
someone else who speaks Japanese could proofread the translations.
Then it's ready to deploy, I think - we just need to enable the
language switcher widget; the code for that was tested already.

Proofreading is done in Crowdin, which is a friendly interface for
translators - see
https://github.com/numpy/numpy/wiki/Translations-of-the-NumPy-website.
This should not take more than an hour or two, I think, and would be a
very valuable contribution. If you'd like to do this and want help, or
report that it all looks good, you can reply here, comment on the PR
that the Crowdin bot opened (https://github.com/numpy/numpy.org/pull/385),
or use the Discussions tab in Crowdin.

The Brazilian Portuguese translation is also close to complete (72%).
If anyone feels motivated to complete that, that'd be great too - it
could then be taken along in the initial launch. There were also some
discussions about correct Portuguese terminology, I believe; Melissa,
could you point to those in case that is relevant for completion?

Cheers,
Ralf

From friedrichromstedt at gmail.com  Mon Mar 29 03:52:34 2021
From: friedrichromstedt at gmail.com (Friedrich Romstedt)
Date: Mon, 29 Mar 2021 09:52:34 +0200
Subject: [Numpy-discussion] Unreliable crash when converting using
 numpy.asarray via C buffer interface

Hi Matti, Sebastian and Lev,

Am Mo., 15. Feb. 2021 um 18:50 Uhr schrieb Lev Maximov:
>
> Try adding
>     view->suboffsets = NULL;
>     view->internal = NULL;
> to Image_getbuffer

Finally I got it working easily using Lev's pointer cited above.
I didn't pursue the valgrind approach any further, since I found it
likely that it'd produce the same finding.

This is just to let you know; I applied the fix several weeks ago.

Many thanks,
Friedrich

From lev.maximov at gmail.com  Mon Mar 29 04:10:34 2021
From: lev.maximov at gmail.com (Lev Maximov)
Date: Mon, 29 Mar 2021 15:10:34 +0700
Subject: [Numpy-discussion] Unreliable crash when converting using
 numpy.asarray via C buffer interface

I'm glad you sorted it out, as the subject line sounded quite horrifying )

Best regards,
Lev

On Mon, Mar 29, 2021 at 2:54 PM Friedrich Romstedt
<friedrichromstedt at gmail.com> wrote:

> Hi Matti, Sebastian and Lev,
>
> Am Mo., 15. Feb. 2021 um 18:50 Uhr schrieb Lev Maximov
> <lev.maximov at gmail.com>:
> >
> > Try adding
> >     view->suboffsets = NULL;
> >     view->internal = NULL;
> > to Image_getbuffer
>
> Finally I got it working easily using Lev's pointer cited above. I
> didn't pursue the valgrind approach any further, since I found it
> likely that it'd produce the same finding.
>
> This is just to let you know; I applied the fix several weeks ago.
>
> Many thanks,
> Friedrich

From meissner at hawaii.edu  Mon Mar 29 13:21:20 2021
From: meissner at hawaii.edu (Gunter Meissner)
Date: Mon, 29 Mar 2021 07:21:20 -1000
Subject: [Numpy-discussion] Unreliable crash when converting using
 numpy.asarray via C buffer interface

Aloha Numpy Community,

I am just writing a book on "How to Cheat in Statistics - And Get Away
with It". I noticed there is no built-in function for the adjusted
R-squared in any library (do correct me if I am wrong), so I think it
would be a good idea to program it. The math is straightforward -
Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), where n is the
number of observations and p the number of predictors - and I can
provide more details if desired.

Thank you,
Gunter

On Mon, Feb 15, 2021 at 5:56 AM Sebastian Berg wrote:

> On Mon, 2021-02-15 at 10:12 +0100, Friedrich Romstedt wrote:
> > Hi,
> >
> > Am Do., 4. Feb. 2021 um 09:07 Uhr schrieb Friedrich Romstedt:
> > > Am Mo., 1. Feb. 2021 um 09:46 Uhr schrieb Matti Picus
> > > <matti.picus at gmail.com>:
> > > > Typically, one would create a complete example and then point
> > > > to the code (as repo or pastebin, not as an attachment to a
> > > > mail here).
> > >
> > > https://github.com/friedrichromstedt/bughunting-01
> >
> > Last week I updated my example code to be more slim. There now
> > exists a single-file extension module:
> > https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bughuntingfrmod/bughuntingfrmod.cpp.
> > The corresponding test program
> > https://github.com/friedrichromstedt/bughunting-01/blob/master/test/2021-02-11_0909.py
> > crashes "properly" both on Windows 10 (Python 3.8.2, numpy 1.19.2)
> > as well as on Arch Linux (Python 3.9.1, numpy 1.20.0), when the
> > ``print`` statement contained in the test file is commented out.
> >
> > My hope to be able to fix my error myself by reducing the code to
> > reproduce the problem has not been fulfilled. I feel that the
> > abovementioned test code is short enough to ask for help with it
> > here. Any hint on how I could solve my problem would be appreciated
> > very much.
>
> I have tried it out, and can confirm that using debugging tools
> (namely valgrind) will allow you to track down the issue (valgrind
> reports it from within Python; running a Python without debug symbols
> may obfuscate the actual problem; if that is limiting you, I can post
> my valgrind output).
> Since you are running a Linux system, I am confident that you can run
> it in valgrind to find it yourself. (There may be other ways.)
>
> Just remember to run valgrind with `PYTHONMALLOC=malloc valgrind` and
> ignore some errors, e.g. when importing NumPy.
>
> Cheers,
>
> Sebastian
>
> > There are some points which were not clarified yet; I am citing
> > them below.
> >
> > So far,
> > Friedrich
> >
> > > > - There are tools out there to analyze refcount problems.
> > > >   Python has some built-in tools for switching allocation
> > > >   strategies.
> > >
> > > Can you give me some pointer about this?
> > >
> > > > - numpy.asarray has a number of strategies to convert
> > > >   instances, which one is it using?
> > >
> > > I've tried to read about this, but couldn't find anything. What
> > > are these different strategies?

--
Gunter Meissner, PhD
University of Hawaii
Adjunct Professor of MathFinance at Columbia University and NYU
President of Derivatives Software www.dersoft.com
CEO Cassandra Capital Management www.cassandracm.com
CV: www.dersoft.com/cv.pdf
Email: meissner at hawaii.edu
Tel: USA (808) 779 3660

From rashiqazhan at gmail.com  Mon Mar 29 14:26:00 2021
From: rashiqazhan at gmail.com (Rashiq Azhan)
Date: Mon, 29 Mar 2021 18:26:00 +0000
Subject: [Numpy-discussion] Expanding the scope of numpy.unpackbits and
 numpy.packbits to include more than uint8 type

I would like this feature to be added since I think it can be very
useful when there is a need to process data that cannot be held in
uint8. One of my personal requirements is modifying 10-bit-per-channel
images held in a NumPy array, but I cannot do that using the specified
functions. They are an elegant solution and work well with NumPy
functions as long as the data is uint8.

From jfoxrabinovitz at gmail.com  Mon Mar 29 15:10:32 2021
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Mon, 29 Mar 2021 15:10:32 -0400
Subject: [Numpy-discussion] Expanding the scope of numpy.unpackbits and
 numpy.packbits to include more than uint8 type

You can view any array as uint8 - see the sketch below the quote.

On Mon, Mar 29, 2021, 14:27 Rashiq Azhan <rashiqazhan at gmail.com> wrote:

> I would like this feature to be added since I think it can be very
> useful when there is a need to process data that cannot be held in
> uint8. One of my personal requirements is modifying
> 10-bit-per-channel images held in a NumPy array, but I cannot do that
> using the specified functions. They are an elegant solution and work
> well with NumPy functions as long as the data is uint8.
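For instance, 10-bit samples stored in little-endian uint16 can go
through a uint8 view - a rough sketch with made-up values (the
`bitorder` argument needs NumPy >= 1.17):

    import numpy as np

    # three 10-bit samples (values < 1024), stored as little-endian uint16
    a = np.array([1023, 512, 3], dtype='<u2')

    # view the raw bytes, then unpack to one bit per entry;
    # each row of `bits` holds the 16 bits of one sample, LSB first
    bits = np.unpackbits(a.view(np.uint8), bitorder='little').reshape(-1, 16)
    ten_bit = bits[:, :10]  # the 10 significant bits of each sample

    # repacking the rows recovers the original values
    restored = np.packbits(bits, bitorder='little').view('<u2')
    assert (restored == a).all()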
From cimrman3 at ntc.zcu.cz  Mon Mar 29 17:38:55 2021
From: cimrman3 at ntc.zcu.cz (Robert Cimrman)
Date: Mon, 29 Mar 2021 23:38:55 +0200
Subject: [Numpy-discussion] ANN: SfePy 2021.1

I am pleased to announce the release of SfePy 2021.1.

Description
-----------

SfePy (simple finite elements in Python) is software for solving
systems of coupled partial differential equations by finite element
methods. It is distributed under the new BSD license.

Home page: https://sfepy.org
Mailing list: https://mail.python.org/mm3/mailman3/lists/sfepy.python.org/
Git (source) repository, issue tracker: https://github.com/sfepy/sfepy

Highlights of this release
--------------------------

- non-square homogenized coefficient matrices
- new implementation of multi-linear terms
- improved handling of Dirichlet and periodic boundary conditions in
  common nodes
- terms in the term table document linked to examples

For full release notes see [1].

Cheers,
Robert Cimrman

[1] http://docs.sfepy.org/doc/release_notes.html#id1

---

Contributors to this release in alphabetical order:

Robert Cimrman
Antony Kamp
Vladimir Lukes

From guillaume.bethouart at eshard.com  Tue Mar 30 20:34:19 2021
From: guillaume.bethouart at eshard.com (Guillaume Bethouart)
Date: Wed, 31 Mar 2021 02:34:19 +0200
Subject: [Numpy-discussion] Dot + add operation

Is it possible to add a method to perform a dot product and add the
result to an existing matrix in a single operation, like
C = dot_add(A, B, C), equivalent to C += A @ B? This behavior is
natively provided by the BLAS *gemm primitive.

The goal is to reduce peak memory consumption. Indeed, during the
computation of C += A @ B, the maximum allocated memory is twice the
size of C. Using *gemm to add the result directly, the maximum memory
consumption is less than 1.5x the size of C. This difference is
significant for large matrices.

Is anyone interested in it?

From sebastian at sipsolutions.net  Tue Mar 30 22:35:41 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 30 Mar 2021 21:35:41 -0500
Subject: [Numpy-discussion] NumPy Community Meeting Wednesday (no DST:
 for those e.g. in the EU)

Hi all,

There will be a NumPy Community meeting Wednesday March 31st at 20:00
UTC. Everyone is invited and encouraged to join in and edit the
work-in-progress meeting topics and notes at:

https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian

PS: As the subject says, we will stay on UTC 20:00, so for those in the
EU and anyone else who had a daylight saving time switch, the time will
have shifted.
From ralf.gommers at gmail.com  Wed Mar 31 06:18:49 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 31 Mar 2021 12:18:49 +0200
Subject: [Numpy-discussion] Steering Council membership updates

Hi all,

On behalf of the NumPy Steering Council (SC) I have a number of
membership changes to announce.

We're excited to welcome Inessa Pawson and Melissa Mendonça as new SC
members. Inessa has been contributing for close to two years, and has
been a driving force behind the new website, the user survey, and
other content and community initiatives. Melissa has been contributing
since the start of last year; she leads the documentation team,
co-maintains f2py, does a lot of mentoring, and is the PI on our
current CZI grant.

A number of people are moving to "emeritus SC member" status:
Nathaniel Smith, Pauli Virtanen, Julian Taylor, Allan Haldane, and
Jaime Fernández del Río. They have been in low (or no) activity mode
for a while, and this membership update reflects that. They all still
have commit rights, and if they get more active we'll of course
welcome them back with open arms.

With these changes the SC now consists of the people who have been
most active across the project's activities and decision-making. A PR
with changes to the governance/people page is up at
https://github.com/numpy/numpy/pull/18705.

While the list of SC members is shrinking, it does feel like the NumPy
project itself is growing. There are a lot of people getting involved,
from new maintainers focusing on technical topics like type
annotations and SIMD acceleration to tutorial writers and people
working on website content and accessibility, which is awesome to see.

Cheers,
Ralf

From ralf.gommers at gmail.com  Wed Mar 31 07:34:51 2021
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 31 Mar 2021 13:34:51 +0200
Subject: [Numpy-discussion] Dot + add operation

On Wed, Mar 31, 2021 at 2:35 AM Guillaume Bethouart
<guillaume.bethouart at eshard.com> wrote:

> Is it possible to add a method to perform a dot product and add the
> result to an existing matrix in a single operation, like
> C = dot_add(A, B, C), equivalent to C += A @ B? This behavior is
> natively provided by the BLAS *gemm primitive.
>
> The goal is to reduce peak memory consumption. Indeed, during the
> computation of C += A @ B, the maximum allocated memory is twice the
> size of C. Using *gemm to add the result directly, the maximum memory
> consumption is less than 1.5x the size of C. This difference is
> significant for large matrices.
>
> Is anyone interested in it?

Hi Guillaume, such fused operations cannot easily be done with NumPy
alone, and it does not make sense to add separate APIs for that
purpose because there are so many combinations of function calls that
one might want to fuse.

Instead, Numba, Pythran or numexpr can add this to some extent for
numpy code. E.g. search for "loop fusion" in the Numba docs.

Cheers,
Ralf
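PS: if you want the in-place gemm behavior today, SciPy already exposes
the raw BLAS routine. A minimal sketch, not a NumPy API (note that
`overwrite_c` only takes effect when `c` is Fortran-ordered and of the
matching dtype; C-ordered inputs get copied to column-major order
internally):

    import numpy as np
    from scipy.linalg.blas import dgemm

    n = 2000
    # column-major (Fortran) order avoids internal copies in the wrapper
    A = np.asfortranarray(np.random.rand(n, n))
    B = np.asfortranarray(np.random.rand(n, n))
    C = np.asfortranarray(np.random.rand(n, n))

    # C <- 1.0 * (A @ B) + 1.0 * C, with no temporary for the product
    out = dgemm(alpha=1.0, a=A, b=B, beta=1.0, c=C, overwrite_c=True)
    print(np.shares_memory(out, C))  # True: C was updated in place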
From kevin.k.sheppard at gmail.com  Wed Mar 31 07:43:02 2021
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Wed, 31 Mar 2021 12:43:02 +0100
Subject: [Numpy-discussion] Dot + add operation

Or just use SciPy's get_blas_funcs to access *gemm, which directly
exposes this functionality:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.blas.dgemm.html

Kevin

On Wed, Mar 31, 2021 at 12:35 PM Ralf Gommers
<ralf.gommers at gmail.com> wrote:

> On Wed, Mar 31, 2021 at 2:35 AM Guillaume Bethouart
> <guillaume.bethouart at eshard.com> wrote:
>
> > Is it possible to add a method to perform a dot product and add the
> > result to an existing matrix in a single operation, like
> > C = dot_add(A, B, C), equivalent to C += A @ B? This behavior is
> > natively provided by the BLAS *gemm primitive.
> >
> > The goal is to reduce peak memory consumption. Indeed, during the
> > computation of C += A @ B, the maximum allocated memory is twice
> > the size of C. Using *gemm to add the result directly, the maximum
> > memory consumption is less than 1.5x the size of C. This difference
> > is significant for large matrices.
> >
> > Is anyone interested in it?
>
> Hi Guillaume, such fused operations cannot easily be done with NumPy
> alone, and it does not make sense to add separate APIs for that
> purpose because there are so many combinations of function calls that
> one might want to fuse.
>
> Instead, Numba, Pythran or numexpr can add this to some extent for
> numpy code. E.g. search for "loop fusion" in the Numba docs.
>
> Cheers,
> Ralf

From guillaume.bethouart at eshard.com  Wed Mar 31 08:36:47 2021
From: guillaume.bethouart at eshard.com (Guillaume Bethouart)
Date: Wed, 31 Mar 2021 05:36:47 -0700 (MST)
Subject: [Numpy-discussion] Dot + add operation
Message-ID: <1617194207104-0.post at n7.nabble.com>

Thanks for the quick reply. I was not aware that this kind of "fused"
function does not fit the NumPy API. I understand the point.

FYI, Numba is not able to simplify this kind of computation, C += A @ B.
Nor can numexpr, which does not support the dot product. I did not test
Pythran.

Thus, the only solution is to use the BLAS functions through SciPy, as
Kevin pointed out. I'll play a bit with transposition and alignment
issues ...

Regards,

--
Sent from: http://numpy-discussion.10968.n7.nabble.com/

From stefanv at berkeley.edu  Wed Mar 31 16:56:20 2021
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Wed, 31 Mar 2021 13:56:20 -0700
Subject: [Numpy-discussion] Steering Council membership updates

On Wed, Mar 31, 2021, at 03:18, Ralf Gommers wrote:

> We're excited to welcome Inessa Pawson and Melissa Mendonça as new SC
> members. Inessa has been contributing for close to two years, and has
> been a driving force behind the new website, the user survey, and
> other content and community initiatives. Melissa has been
> contributing since the start of last year; she leads the
> documentation team, co-maintains f2py, does a lot of mentoring, and
> is the PI on our current CZI grant.

Thank you for your service, Inessa and Melissa! Welcome to the
steering council.
Best regards,
Stéfan