From melissawm at gmail.com Sun Aug 1 18:14:44 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Sun, 1 Aug 2021 19:14:44 -0300 Subject: [Numpy-discussion] Documentation Team meeting - Monday August 2 In-Reply-To: References: Message-ID: Hi all! Sorry for the late notice - our next Documentation Team meeting will be tomorrow - *Monday, August 2nd* at ***4PM UTC***. All are welcome - you don't need to already be a contributor to join. If you have questions or are curious about what we're doing, we'll be happy to meet you! If you wish to join on Zoom, use this link: https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success Here's the permanent hackmd document with the meeting notes (still being updated in the next few days!): https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg Hope to see you around! ** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210802T16&p1=1440&ah=1 *** You can add the NumPy community calendar to your google calendar by clicking this link: https://calendar.google.com/calendar /r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 - Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From tyler.je.reddy at gmail.com Sun Aug 1 22:30:31 2021 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Sun, 1 Aug 2021 20:30:31 -0600 Subject: [Numpy-discussion] ANN: SciPy 1.7.1 Message-ID: Hi all, On behalf of the SciPy development team I'm pleased to announce the release of SciPy 1.7.1, which is a bug fix release. 
Sources and binary wheels can be found at: https://pypi.org/project/scipy/ and at: https://github.com/scipy/scipy/releases/tag/v1.7.1 One of a few ways to install this release with pip: pip install scipy==1.7.1 ===================== SciPy 1.7.1 Release Notes ===================== SciPy 1.7.1 is a bug-fix release with no new features compared to 1.7.0. Authors ======= * Peter Bell * Evgeni Burovski * Justin Charlong + * Ralf Gommers * Matti Picus * Tyler Reddy * Pamphile Roy * Sebastian Wallk?tter * Arthur Volant A total of 9 people contributed to this release. People with a "+" by their names contributed a patch for the first time. This list of names is automatically generated, and may not be fully complete. Issues closed for 1.7.1 ------------------------------ * `#14074 `__: Segmentation fault when building cKDTree with Scipy 1.6.3. * `#14271 `__: scipy.io.loadmat failure in 1.7.0 * `#14273 `__: \`scipy.signal.{medfilt,medfilt2d}\` hit "Windows fatal exception:... * `#14282 `__: DOC, CI: stats skewtest refguide failure * `#14363 `__: Huge stack allocation in _sobol.pyx may cause stack overvflow * `#14382 `__: Memory leak in \`scipy.spatial.distance\` for \`cdist\` * `#14396 `__: BUG: Sphinx 4.1 breaks the banner's logo * `#14444 `__: DOC/FEAT Rotation.from_rotvec documents a degrees argument which... Pull requests for 1.7.1 ------------------------------ * `#14178 `__: DEV: Update Boschloo Exact test * `#14264 `__: REL: prepare for SciPy 1.7.1 * `#14283 `__: BUG: fix refguide-check namedtuple handling * `#14303 `__: FIX: Check for None before calling str methods * `#14327 `__: BUG: medfilt can access beyond the end of an array * `#14355 `__: BUG: KDTree balanced_tree is unbalanced for degenerate data * `#14368 `__: BUG: avoid large cython global variable in function * `#14384 `__: BUG: Reference count leak in distance_pybind * `#14397 `__: DOC/CI: do not allow sphinx 4.1. 
* `#14417 `__: DOC/CI: pin sphinx to !=4.1.0 * `#14460 `__: DOC: add required scipy version to kwarg * `#14466 `__: MAINT: 1.7.1 backports (round 1) * `#14508 `__: MAINT: bump scipy-mathjax * `#14509 `__: MAINT: 1.7.1 backports (round 2) Checksums ========= MD5 ~~~ ef8b44a175818183de28f2c3dacf4b74 scipy-1.7.1-cp37-cp37m-macosx_10_9_x86_64.whl 4f717b62946a6306bba88696b4992c73 scipy-1.7.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl decf6837d0a28bdeb911e6e2d18b777c scipy-1.7.1-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.whl 6449932605e3284f731744eb207e5612 scipy-1.7.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl fb94deaf9b43bf18890b0bd12fc26fda scipy-1.7.1-cp37-cp37m-win32.whl c5894e5811278243d7f4abeb1a5f230f scipy-1.7.1-cp37-cp37m-win_amd64.whl 3aec592a699f835319cbb4649f30df71 scipy-1.7.1-cp38-cp38-macosx_10_9_x86_64.whl 774a74a6c81d40c9a305523707c024f4 scipy-1.7.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl 30206a19a96549f665bd608fe6bf2761 scipy-1.7.1-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl eb58d9f3797d47866bfe571d5df3b827 scipy-1.7.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl 808a907d994b98fd6dbe2050a48b8c69 scipy-1.7.1-cp38-cp38-win32.whl 688921def6681ee5abe8543aca8383c2 scipy-1.7.1-cp38-cp38-win_amd64.whl 2fe4e958cb14d0b071c494b9faee0c98 scipy-1.7.1-cp39-cp39-macosx_10_9_x86_64.whl dd5b4db9cf83a0594e0b651e198b16a4 scipy-1.7.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl 67c5d75378d0ba2803c1b93fe670563b scipy-1.7.1-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl c1ea8eec1dd6dc9c1b3eae24e3b3a34a scipy-1.7.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl 93efd36f2c52dadbe7c9a377ad8c5be2 scipy-1.7.1-cp39-cp39-win32.whl f8d0f87aaa8929f059fcf840db345310 scipy-1.7.1-cp39-cp39-win_amd64.whl 8ac74369cdcabc097f602682c951197c scipy-1.7.1.tar.gz deb130f3959e5623fafb4a262c28183b scipy-1.7.1.tar.xz 5dd4ab895eaa141cb01b954ecae2ddfc scipy-1.7.1.zip SHA256 ~~~~~~ 
2a0eeaab01258e0870c4022a6cd329aef3b7c6c2b606bd7cf7bb2ba9820ae561 scipy-1.7.1-cp37-cp37m-macosx_10_9_x86_64.whl 3f52470e0548cdb74fb8ddf06773ffdcca7c97550f903b1c51312ec19243a7f7 scipy-1.7.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl 787749110a23502031fb1643c55a2236c99c6b989cca703ea2114d65e21728ef scipy-1.7.1-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.whl 3304bd5bc32e00954ac4b3f4cc382ca8824719bf348aacbec6347337d6b125fe scipy-1.7.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl d1388fbac9dd591ea630da75c455f4cc637a7ca5ecb31a6b6cef430914749cde scipy-1.7.1-cp37-cp37m-win32.whl d648aa85dd5074b1ed83008ae987c3fbb53d68af619fce1dee231f4d8bd40e2f scipy-1.7.1-cp37-cp37m-win_amd64.whl bc61e3e5ff92d2f32bb263621d54a9cff5e3f7c420af3d1fa122ce2529de2bd9 scipy-1.7.1-cp38-cp38-macosx_10_9_x86_64.whl a496b42dbcd04ea9924f5e92be63af3d8e0f43a274b769bfaca0a297327d54ee scipy-1.7.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl d13f31457f2216e5705304d9f28e2826edf75487410a57aa99263fa4ffd792c2 scipy-1.7.1-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl 90c07ba5f34f33299a428b0d4fa24c30d2ceba44d63f8385b2b05be460819fcb scipy-1.7.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl efdd3825d54c58df2cc394366ca4b9166cf940a0ebddeb87b6c10053deb625ea scipy-1.7.1-cp38-cp38-win32.whl 71cfc96297617eab911e22216e8a8597703202e95636d9406df9af5c2ac99a2b scipy-1.7.1-cp38-cp38-win_amd64.whl 4ee952f39a4a4c7ba775a32b664b1f4b74818548b65f765987adc14bb78f5802 scipy-1.7.1-cp39-cp39-macosx_10_9_x86_64.whl 611f9cb459d0707dd8e4de0c96f86e93f61aac7475fcb225e9ec71fecdc5cebf scipy-1.7.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl e101bceeb9e65a90dadbc5ca31283403a2d4667b9c178db29109750568e8d112 scipy-1.7.1-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl 4729b41a4cdaf4cd011aeac816b532f990bdf97710cef59149d3e293115cf467 scipy-1.7.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl c9951e3746b68974125e5e3445008a4163dd6d20ae0bbdae22b38cb8951dc11b 
scipy-1.7.1-cp39-cp39-win32.whl da9c6b336e540def0b7fd65603da8abeb306c5fc9a5f4238665cbbb5ff95cf58 scipy-1.7.1-cp39-cp39-win_amd64.whl 6b47d5fa7ea651054362561a28b1ccc8da9368a39514c1bbf6c0977a1c376764 scipy-1.7.1.tar.gz fdfe1d1eb1569846e331bd8d72106a8c446dafb2192c00adbb5376b02a0a1104 scipy-1.7.1.tar.xz 0dbea8556cb3770656770e5ed02e451dd3069c2f2f70ac885dea65f679a23afe scipy-1.7.1.zip -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Aug 2 10:53:15 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Aug 2021 08:53:15 -0600 Subject: [Numpy-discussion] ANN: SciPy 1.7.1 In-Reply-To: References: Message-ID: On Sun, Aug 1, 2021 at 8:31 PM Tyler Reddy wrote: > Hi all, > > On behalf of the SciPy development team I'm pleased to announce > the release of SciPy 1.7.1, which is a bug fix release. > > Sources and binary wheels can be found at: > https://pypi.org/project/scipy/ > and at: https://github.com/scipy/scipy/releases/tag/v1.7.1 > > > > One of a few ways to install this release with pip: > > pip install scipy==1.7.1 > > ===================== > SciPy 1.7.1 Release Notes > ===================== > > SciPy 1.7.1 is a bug-fix release with no new features > compared to 1.7.0. > > Authors > ======= > > * Peter Bell > * Evgeni Burovski > * Justin Charlong + > * Ralf Gommers > * Matti Picus > * Tyler Reddy > * Pamphile Roy > * Sebastian Wallk?tter > * Arthur Volant > > A total of 9 people contributed to this release. > People with a "+" by their names contributed a patch for the first time. > This list of names is automatically generated, and may not be fully > complete. > > Issues closed for 1.7.1 > ------------------------------ > > * `#14074 `__: Segmentation > fault when building cKDTree with Scipy 1.6.3. > * `#14271 `__: > scipy.io.loadmat failure in 1.7.0 > * `#14273 `__: > \`scipy.signal.{medfilt,medfilt2d}\` hit "Windows fatal exception:... 
> * `#14282 `__: DOC, CI: > stats skewtest refguide failure > * `#14363 `__: Huge stack > allocation in _sobol.pyx may cause stack overvflow > * `#14382 `__: Memory leak > in \`scipy.spatial.distance\` for \`cdist\` > * `#14396 `__: BUG: Sphinx > 4.1 breaks the banner's logo > * `#14444 `__: DOC/FEAT > Rotation.from_rotvec documents a degrees argument which... > > Pull requests for 1.7.1 > ------------------------------ > > * `#14178 `__: DEV: Update > Boschloo Exact test > * `#14264 `__: REL: prepare > for SciPy 1.7.1 > * `#14283 `__: BUG: fix > refguide-check namedtuple handling > * `#14303 `__: FIX: Check for > None before calling str methods > * `#14327 `__: BUG: medfilt > can access beyond the end of an array > * `#14355 `__: BUG: KDTree > balanced_tree is unbalanced for degenerate data > * `#14368 `__: BUG: avoid > large cython global variable in function > * `#14384 `__: BUG: Reference > count leak in distance_pybind > * `#14397 `__: DOC/CI: do not > allow sphinx 4.1. > * `#14417 `__: DOC/CI: pin > sphinx to !=4.1.0 > * `#14460 `__: DOC: add > required scipy version to kwarg > * `#14466 `__: MAINT: 1.7.1 > backports (round 1) > * `#14508 `__: MAINT: bump > scipy-mathjax > * `#14509 `__: MAINT: 1.7.1 > backports (round 2) > > > Checksums > ========= > > MD5 > ~~~ > > ef8b44a175818183de28f2c3dacf4b74 > scipy-1.7.1-cp37-cp37m-macosx_10_9_x86_64.whl > 4f717b62946a6306bba88696b4992c73 > scipy-1.7.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl > decf6837d0a28bdeb911e6e2d18b777c > scipy-1.7.1-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.whl > 6449932605e3284f731744eb207e5612 > scipy-1.7.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl > fb94deaf9b43bf18890b0bd12fc26fda scipy-1.7.1-cp37-cp37m-win32.whl > c5894e5811278243d7f4abeb1a5f230f scipy-1.7.1-cp37-cp37m-win_amd64.whl > 3aec592a699f835319cbb4649f30df71 > scipy-1.7.1-cp38-cp38-macosx_10_9_x86_64.whl > 774a74a6c81d40c9a305523707c024f4 > 
scipy-1.7.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl > 30206a19a96549f665bd608fe6bf2761 > scipy-1.7.1-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl > eb58d9f3797d47866bfe571d5df3b827 > scipy-1.7.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl > 808a907d994b98fd6dbe2050a48b8c69 scipy-1.7.1-cp38-cp38-win32.whl > 688921def6681ee5abe8543aca8383c2 scipy-1.7.1-cp38-cp38-win_amd64.whl > 2fe4e958cb14d0b071c494b9faee0c98 > scipy-1.7.1-cp39-cp39-macosx_10_9_x86_64.whl > dd5b4db9cf83a0594e0b651e198b16a4 > scipy-1.7.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl > 67c5d75378d0ba2803c1b93fe670563b > scipy-1.7.1-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl > c1ea8eec1dd6dc9c1b3eae24e3b3a34a > scipy-1.7.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl > 93efd36f2c52dadbe7c9a377ad8c5be2 scipy-1.7.1-cp39-cp39-win32.whl > f8d0f87aaa8929f059fcf840db345310 scipy-1.7.1-cp39-cp39-win_amd64.whl > 8ac74369cdcabc097f602682c951197c scipy-1.7.1.tar.gz > deb130f3959e5623fafb4a262c28183b scipy-1.7.1.tar.xz > 5dd4ab895eaa141cb01b954ecae2ddfc scipy-1.7.1.zip > > SHA256 > ~~~~~~ > > 2a0eeaab01258e0870c4022a6cd329aef3b7c6c2b606bd7cf7bb2ba9820ae561 > scipy-1.7.1-cp37-cp37m-macosx_10_9_x86_64.whl > 3f52470e0548cdb74fb8ddf06773ffdcca7c97550f903b1c51312ec19243a7f7 > scipy-1.7.1-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl > 787749110a23502031fb1643c55a2236c99c6b989cca703ea2114d65e21728ef > scipy-1.7.1-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.whl > 3304bd5bc32e00954ac4b3f4cc382ca8824719bf348aacbec6347337d6b125fe > scipy-1.7.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl > d1388fbac9dd591ea630da75c455f4cc637a7ca5ecb31a6b6cef430914749cde > scipy-1.7.1-cp37-cp37m-win32.whl > d648aa85dd5074b1ed83008ae987c3fbb53d68af619fce1dee231f4d8bd40e2f > scipy-1.7.1-cp37-cp37m-win_amd64.whl > bc61e3e5ff92d2f32bb263621d54a9cff5e3f7c420af3d1fa122ce2529de2bd9 > scipy-1.7.1-cp38-cp38-macosx_10_9_x86_64.whl > 
a496b42dbcd04ea9924f5e92be63af3d8e0f43a274b769bfaca0a297327d54ee > scipy-1.7.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl > d13f31457f2216e5705304d9f28e2826edf75487410a57aa99263fa4ffd792c2 > scipy-1.7.1-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.whl > 90c07ba5f34f33299a428b0d4fa24c30d2ceba44d63f8385b2b05be460819fcb > scipy-1.7.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl > efdd3825d54c58df2cc394366ca4b9166cf940a0ebddeb87b6c10053deb625ea > scipy-1.7.1-cp38-cp38-win32.whl > 71cfc96297617eab911e22216e8a8597703202e95636d9406df9af5c2ac99a2b > scipy-1.7.1-cp38-cp38-win_amd64.whl > 4ee952f39a4a4c7ba775a32b664b1f4b74818548b65f765987adc14bb78f5802 > scipy-1.7.1-cp39-cp39-macosx_10_9_x86_64.whl > 611f9cb459d0707dd8e4de0c96f86e93f61aac7475fcb225e9ec71fecdc5cebf > scipy-1.7.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl > e101bceeb9e65a90dadbc5ca31283403a2d4667b9c178db29109750568e8d112 > scipy-1.7.1-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.whl > 4729b41a4cdaf4cd011aeac816b532f990bdf97710cef59149d3e293115cf467 > scipy-1.7.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl > c9951e3746b68974125e5e3445008a4163dd6d20ae0bbdae22b38cb8951dc11b > scipy-1.7.1-cp39-cp39-win32.whl > da9c6b336e540def0b7fd65603da8abeb306c5fc9a5f4238665cbbb5ff95cf58 > scipy-1.7.1-cp39-cp39-win_amd64.whl > 6b47d5fa7ea651054362561a28b1ccc8da9368a39514c1bbf6c0977a1c376764 > scipy-1.7.1.tar.gz > fdfe1d1eb1569846e331bd8d72106a8c446dafb2192c00adbb5376b02a0a1104 > scipy-1.7.1.tar.xz > 0dbea8556cb3770656770e5ed02e451dd3069c2f2f70ac885dea65f679a23afe > scipy-1.7.1.zip > > Thanks Tyler... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Aug 2 13:03:32 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 02 Aug 2021 12:03:32 -0500 Subject: [Numpy-discussion] Revert the return of a single NaN for `np.unique` with floating point numbers? 
Message-ID: <4ba1cf41013a05c6ec255a850668ce3a3410b0c4.camel@sipsolutions.net> Hi all, In NumPy 1.21, the output of `np.unique` changed in the presence of multiple NaNs. Previously, all NaNs were returned (each NaN was considered unique), whereas we now return only one: a = np.array([1, 1, np.nan, np.nan, np.nan]) Before 1.21: >>> np.unique(a) array([ 1., nan, nan, nan]) After 1.21: array([ 1., nan]) This change was requested in an old issue: https://github.com/numpy/numpy/issues/2111 And happened here: https://github.com/numpy/numpy/pull/18070 While it has a release note, I am not sure the change got the attention it deserved. This would be especially worrying if it is a regression for anyone? Cheers, Sebastian PS: One additional note is that this does not work for object arrays (it reasonably cannot): >>> np.unique(a.astype(object)) array([1.0, nan, nan, nan], dtype=object) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From mukulikapahari at gmail.com Mon Aug 2 13:14:36 2021 From: mukulikapahari at gmail.com (Mukulika Pahari) Date: Mon, 2 Aug 2021 22:44:36 +0530 Subject: [Numpy-discussion] Plan for a new Indexing how-to document Message-ID: Hi, all! I'm planning to write a how-to doc for ndarray indexing as per discussion in https://github.com/numpy/numpy/pull/19407. I have opened a discussion issue (https://github.com/numpy/numpy/issues/19586) with a few ideas and would love to know your opinions on them and other use-cases you'd like to see in the doc. Thank you! Mukulika Pahari -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Aug 2 13:49:08 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 2 Aug 2021 19:49:08 +0200 Subject: [Numpy-discussion] Revert the return of a single NaN for `np.unique` with floating point numbers?
In-Reply-To: <4ba1cf41013a05c6ec255a850668ce3a3410b0c4.camel@sipsolutions.net> References: <4ba1cf41013a05c6ec255a850668ce3a3410b0c4.camel@sipsolutions.net> Message-ID: On Mon, Aug 2, 2021 at 7:04 PM Sebastian Berg wrote: > Hi all, > > In NumPy 1.21, the output of `np.unique` changed in the presence of > multiple NaNs. Previously, all NaNs were returned when we now only > return one (all NaNs were considered unique): > > a = np.array([1, 1, np.nan, np.nan, np.nan]) > > Before 1.21: > > >>> np.unique(a) > array([ 1., nan, nan, nan]) > > After 1.21: > > array([ 1., nan]) > > > This change was requested in an old issue: > > https://github.com/numpy/numpy/issues/2111 > > And happened here: > > https://github.com/numpy/numpy/pull/18070 > > While, it has a release note. I am not sure the change got the > attention it deserved. This would be especially worrying if it is a > regression for anyone? > I think it's now the expected answer, not a regression. `unique` is not an elementwise function that needs to adhere to IEEE-754 where nan != nan. I can't remember reviewing this change, but it makes perfect sense to me. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Aug 2 13:50:14 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 02 Aug 2021 12:50:14 -0500 Subject: [Numpy-discussion] Proposal for adding bit_count In-Reply-To: References: Message-ID: On Thu, 2021-07-29 at 21:46 +0530, Ganesh Kathiresan wrote: > Hi All, > > > > I am working on a new > UFunc, ` > bit_count ` (popcount > in > other languages) that aims to count the number of 1-bits in > the absolute value of an Integer. > Thanks for the proposal! Since `int.bit_count()` is now a Python builtin (as a method on integers), I feel it is a good idea to add it to NumPy. It is requested fairly commonly as well. 
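For intuition, here is a rough Python sketch of the "parallel" (SWAR) popcount technique the proposal references for the C inner loop — purely illustrative, with a made-up helper name, not the proposed NumPy API:

```python
def popcount32(x):
    """Count the 1-bits of a 32-bit unsigned integer in a fixed number of ops.

    This mirrors the CountBitsSetParallel bit-hack: sum adjacent bit pairs,
    then nibbles, then bytes, then combine the four byte counts with a
    multiply. Masks keep Python's unbounded ints within 32 bits.
    """
    x = x - ((x >> 1) & 0x55555555)                   # 2-bit sums of bit pairs
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)    # 4-bit sums per nibble
    x = (x + (x >> 4)) & 0x0F0F0F0F                   # 8-bit sums per byte
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24      # add the 4 byte counts

assert popcount32(1023) == 10  # same answer as int(1023).bit_count() on 3.10+
```

The maximum result for 32-bit input is 32, which is why the return-dtype question below (same dtype as input vs. a small/default integer) is mostly a matter of convention rather than range.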
Aside from comments about the inclusion in NumPy, I had two questions where I would love input: * Should `np.ndarray.bit_count()` exist? I tend against this; but we should have it on (integer) scalars to mirror the Python `int`. * The return value is currently the same type as the input. That means that: `np.bit_count(uint8)` returns the count as `uint8` while `np.bit_count(int32)` returns it as `int32`, etc. I think `bit_count` is different from typical math functions, so I am not quite sure this is what we want? The main alternative I see right now would be returning the default integer (usually int64) ? unless otherwise specified. As an aside, I am not sure what is returned for booleans right now int8, uint8, or boolean? (Returning boolean for a count seems an oversight though). Cheers, Sebastian > > Implementation > ---------------------------------- > > The primary reference for the implementation is CountBitsSetParallel > < > http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel > >. > Here we take 12 operations to achieve the result which is the same as > the > lookup table method but does not suffer from memory issues or cache > misses. > > The implementation is aimed at unsigned integers, absolute value of > signed > integers and objects that support the operation. > > > Usage > -------------- > > ??? >>> np.bit_count(1023) > > ??? 10 > > ??? >>> a = np.array([2**i - 1 for i in range(16)]) > > ??? >>> np.bit_count(a) > > ??? array([ 0,? 1,? 2,? 3,? 4,? 5,? 6,? 7,? 8,? 9, 10, 11, 12, 13, > 14, 15]) > > ??? >>> np.int32(1023).bit_count() > > ??? 10 > > > Notes > ------------- > > 1. Python has included this method here > < > https://github.com/python/cpython/commit/8bd216dfede9cb2d5bedb67f20a30c99844dbfb8 > > > ?(3.10+). Tracking issue > > 2.? NumPy tracking issue > > > 3.? Interesting read > < > https://archive.org/details/dr_dobbs_journal_vol_08/page/n185/mode/2up > > on > how we get the magic number. 
Needed a bit of digging :) > > > Please let us know what you think about the implementation and where > we can > improve in terms of performance or interface. > > Regards, > Ganesh > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From stefanv at berkeley.edu Mon Aug 2 16:10:00 2021 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Mon, 02 Aug 2021 13:10:00 -0700 Subject: [Numpy-discussion] Proposal for adding bit_count In-Reply-To: References: Message-ID: On Mon, Aug 2, 2021, at 10:50, Sebastian Berg wrote: > * Should `np.ndarray.bit_count()` exist? I tend against this; > but we should have it on (integer) scalars to mirror the > Python `int`. Should `np.bit_count` exist? Having it on the int* types may be sufficient. > * The return value is currently the same type as the input. That > means that: `np.bit_count(uint8)` returns the count as `uint8` > while `np.bit_count(int32)` returns it as `int32`, etc. What is the max value of the count? 64? If so it can go in a uint8. St?fan From matti.picus at gmail.com Tue Aug 3 04:44:20 2021 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 3 Aug 2021 11:44:20 +0300 Subject: [Numpy-discussion] Revert the return of a single NaN for `np.unique` with floating point numbers? In-Reply-To: References: <4ba1cf41013a05c6ec255a850668ce3a3410b0c4.camel@sipsolutions.net> Message-ID: <7dc05090-1f26-7253-df35-4e6e6b7b20c7@gmail.com> On 2/8/21 8:49 pm, Ralf Gommers wrote: > > > On Mon, Aug 2, 2021 at 7:04 PM Sebastian Berg > > wrote: > > Hi all, > > In NumPy 1.21, the output of `np.unique` changed in the presence of > multiple NaNs.? 
Previously, all NaNs were returned when we now only > return one (all NaNs were considered unique): > > ? ? a = np.array([1, 1, np.nan, np.nan, np.nan]) > > Before 1.21: > > ? ? >>> np.unique(a) > ? ? array([ 1., nan, nan, nan]) > > After 1.21: > > ? ? array([ 1., nan]) > > > This change was requested in an old issue: > > https://github.com/numpy/numpy/issues/2111 > > > And happened here: > > https://github.com/numpy/numpy/pull/18070 > > > While, it has a release note.? I am not sure the change got the > attention it deserved.? This would be especially worrying if it is a > regression for anyone? > > > I think it's now the expected answer, not a regression. `unique` is > not an elementwise function that needs to adhere to IEEE-754 where nan > != nan. I can't remember reviewing this change, but it makes perfect > sense to me. > > Cheers, > Ralf > We were discussing this today (me and Matthew) and came up with an edge case when using set(a), it will return the old value. We should add this as a documented "feature" Matti From sebastian at sipsolutions.net Tue Aug 3 11:47:59 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 03 Aug 2021 10:47:59 -0500 Subject: [Numpy-discussion] Proposal for adding bit_count In-Reply-To: References: Message-ID: <0f7e9395a9de68c85479e3b69ae96f68d259b87f.camel@sipsolutions.net> On Mon, 2021-08-02 at 13:10 -0700, Stefan van der Walt wrote: > On Mon, Aug 2, 2021, at 10:50, Sebastian Berg wrote: > > * Should `np.ndarray.bit_count()` exist?? I tend against this; > > ? but we should have it on (integer) scalars to mirror the > > ? Python `int`. > > Should `np.bit_count` exist?? Having it on the int* types may be > sufficient. Right, we could add it only to the integer scalars mostly for Python compatibility. The PR suggests to create a ufunc to make the feature available to typical NumPy code (allow using it with arrays). > > > * The return value is currently the same type as the input.? That > > ? 
means that: `np.bit_count(uint8)` returns the count as `uint8` > > ? while `np.bit_count(int32)` returns it as `int32`, etc. > > What is the max value of the count?? 64?? If so it can go in a uint8. Yes, uint8 would even work for 128 bit integers. I was a bit unsure about this, since we rarely create non-default integer arrays unless prompted, but it is a good option as well. Cheers, Sebastian > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Tue Aug 3 20:41:20 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 03 Aug 2021 19:41:20 -0500 Subject: [Numpy-discussion] NumPy Community Meeting Wednesday Message-ID: Hi all, There will be a NumPy Community meeting Wednesday August 4th at 20:00 UTC. Everyone is invited and encouraged to join in and edit the work-in-progress meeting topics and notes at: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both Best wishes Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From ganesh3597 at gmail.com Wed Aug 4 12:40:44 2021 From: ganesh3597 at gmail.com (Ganesh Kathiresan) Date: Wed, 4 Aug 2021 22:10:44 +0530 Subject: [Numpy-discussion] Proposal for adding bit_count In-Reply-To: <0f7e9395a9de68c85479e3b69ae96f68d259b87f.camel@sipsolutions.net> References: <0f7e9395a9de68c85479e3b69ae96f68d259b87f.camel@sipsolutions.net> Message-ID: > > Should `np.ndarray.bit_count()` exist? I tend against this; Thanks for the info Sebastian, I agree with this as we can stick to what Python offers. 
Should `np.bit_count` exist? Having it on the int* types may be sufficient. Hey Stephan, regarding this, I felt we could support it in the same lines NumPy Mathematical Functions , something like GCD perhaps, where we do not have `np.ndarray.gcd` but do have an `np.gcd` What is the max value of the count? 64? If so it can go in a uint8. This makes sense yeah, will make this change, thanks for the suggestion. Also, an interesting future proposal can be to club all the bitwise functions into a single "namespace" of sorts and have np.bits.*. This has already been suggested in this comment and I feel this would be a clean addition and we can support other useful functions as well. Regards, Ganesh On Tue, Aug 3, 2021 at 9:18 PM Sebastian Berg wrote: > On Mon, 2021-08-02 at 13:10 -0700, Stefan van der Walt wrote: > > On Mon, Aug 2, 2021, at 10:50, Sebastian Berg wrote: > > > * Should `np.ndarray.bit_count()` exist? I tend against this; > > > but we should have it on (integer) scalars to mirror the > > > Python `int`. > > > > Should `np.bit_count` exist? Having it on the int* types may be > > sufficient. > > Right, we could add it only to the integer scalars mostly for Python > compatibility. The PR suggests to create a ufunc to make the feature > available to typical NumPy code (allow using it with arrays). > > > > > > * The return value is currently the same type as the input. That > > > means that: `np.bit_count(uint8)` returns the count as `uint8` > > > while `np.bit_count(int32)` returns it as `int32`, etc. > > > > What is the max value of the count? 64? If so it can go in a uint8. > > Yes, uint8 would even work for 128 bit integers. I was a bit unsure > about this, since we rarely create non-default integer arrays unless > prompted, but it is a good option as well. 
> > Cheers, > > Sebastian > > > > > > St?fan > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Mon Aug 9 09:50:47 2021 From: matti.picus at gmail.com (Matti Picus) Date: Mon, 9 Aug 2021 16:50:47 +0300 Subject: [Numpy-discussion] reducing effort spent on wheel builds? In-Reply-To: References: <65a423a1-c126-2dc9-426d-44c3292683f5@gmail.com> Message-ID: <065ed152-60a6-bab4-fed6-e83262c0ef09@gmail.com> On 16/7/21 9:11 pm, Chris Barker wrote: > Just a note on: > > > For the record, I am +1 on removing sdists from PyPI until pip changes > its default to --only-binary :all: [1] > > I agree that the defaults for pip are unfortunate (and indeed the > legacy of pip doing, well, a lot, (i.e. building and installing and > package managing and dependencies, and ...) with one interface. > > However, There's a long tradition of sdists on PyPi -- and PyPi is > used, for the most part, as the source of sdists?for other systems > (conda-forge for example). I did just check, and numpy is an exception > -- it's pointing to gitHub: > > source: > ? ? url: https://github.com/numpy/numpy/releases/download/v{{ > version > }}/numpy-{{ version }}.tar.gz > > But others may be counting on sdists?on PyPi. > > Also, an sdist is not always the same as a gitHub release -- there is > some "magic" in building?it -- it's not just a copy of the repo. > Again, numpy may be building its releases as an sdist (or it just > doesn't. matter), but something to keep in mind. 
> Another thought is to only support platforms that have a > committed maintainer -- I think that's how Python itself does it. The > more obscure platforms are only supported if someone steps up to > support them (I suppose that's technically true for all platforms, but > not hard to find someone on the existing core dev team to support the > majors). This can be a bit tricky, as the users of a platform may not > have the skills to maintain the builds, but it seems fair enough to > only support platforms that someone cares enough about to do the work. > > -CHB > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov Just an empty response since this ended up in my spam filter, and I am probably not the only one. Matti From tcaswell at gmail.com Mon Aug 9 11:33:19 2021 From: tcaswell at gmail.com (Thomas Caswell) Date: Mon, 9 Aug 2021 11:33:19 -0400 Subject: [Numpy-discussion] reducing effort spent on wheel builds? In-Reply-To: <065ed152-60a6-bab4-fed6-e83262c0ef09@gmail.com> References: <65a423a1-c126-2dc9-426d-44c3292683f5@gmail.com> <065ed152-60a6-bab4-fed6-e83262c0ef09@gmail.com> Message-ID: I am pretty -1 on removing sdists. At least for Matplotlib we have a number of users on AIX who rely on the sdists (I know they exist because when our build breaks they send us patches to un-break it) for their installation. I strongly suspect that numpy also has a fair number of users who are willing and able to compile from source on niche systems and it does not seem like a good idea to me to break them.
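For readers of the archive: the pip behavior under discussion can already be opted into explicitly today. A sketch using standard pip options (this changes nothing about pip's default, and `numpy` here is just an example package):

```shell
# Refuse sdists for a single install; pip errors out instead of silently
# attempting a source build if no wheel is available for your platform.
pip install --only-binary :all: numpy

# The same policy can be made the default for every pip invocation via an
# environment variable.
export PIP_ONLY_BINARY=":all:"
pip install numpy
```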
I like Ralf's proposal, but would amend it with "unless someone stands up and puts their name on ensuring the wheels exist for platform " with the understanding that if they have to step away support for that platform stops until someone else is willing to put their name on it. This may be a good candidate for the new SPEC process to handle? Just like supported versions of Python, we should have consistent platform support (and processes for how we decide on / provide that support) across the community (thinking with my Matplotlib and h5py hats here). Tom On Mon, Aug 9, 2021 at 9:51 AM Matti Picus wrote: > > On 16/7/21 9:11 pm, Chris Barker wrote: > > Just a note on: > > > > > For the record, I am +1 on removing sdists from PyPI until pip changes > > its default to --only-binary :all: [1] > > > > I agree that the defaults for pip are unfortunate (and indeed the > > legacy of pip doing, well, a lot, (i.e. building and installing and > > package managing and dependencies, and ...) with one interface. > > > > However, There's a long tradition of sdists on PyPi -- and PyPi is > > used, for the most part, as the source of sdists for other systems > > (conda-forge for example). I did just check, and numpy is an exception > > -- it's pointing to gitHub: > > > > source: > > url: https://github.com/numpy/numpy/releases/download/v{{ > > version > > }}/numpy-{{ version }}.tar.gz > > > > But others may be counting on sdists on PyPi. > > > > Also, an sdist is not always the same as a gitHub release -- there is > > some "magic" in building it -- it's not just a copy of the repo. > > Again, numpy may be building its releases as an sdist (or it just > > doesn't. matter), but something to keep in mind. > > > > Another thought is to only support platforms that have a > > committed maintainer -- I think that's how Python itself does it. 
The > > more obscure platforms are only supported if someone steps up to > > support them (I suppose that's technically true for all platforms, but > > not hard to find someone on the existing core dev team to support the > > majors). This can be a bit tricky, as the users of a platform may not > > have the skills to maintain the builds, but it seems fair enough to > > only support platforms that someone cares enough about to do the work. > > > > -CHB > > > > > > -- > > > > Christopher Barker, Ph.D. > > Oceanographer > > > > Emergency Response Division > > NOAA/NOS/OR&R (206) 526-6959 voice > > 7600 Sand Point Way NE (206) 526-6329 fax > > Seattle, WA 98115 (206) 526-6317 main reception > > > > Chris.Barker at noaa.gov > > > Just an empty response since this ended up in my spam filter, and I am > probably not the only one. > > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Thomas Caswell tcaswell at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Aug 10 14:06:18 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 10 Aug 2021 13:06:18 -0500 Subject: [Numpy-discussion] NumPy Development Meeting Wednesday - Triage Focus Message-ID: <3c11e7613f873f4d28d91a83dc8f76d023ac0b15.camel@sipsolutions.net> Hi all, Our bi-weekly triage-focused NumPy development meeting is Wednesday, August 11th at 9 am Pacific Time (16:00 UTC). Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg I encourage everyone to notify us of issues or PRs that you feel should be prioritized, discussed, or reviewed. Best regards Sebastian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Tue Aug 10 14:32:30 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 10 Aug 2021 12:32:30 -0600 Subject: [Numpy-discussion] NumPy Python 3.10.0rc1 wheels. Message-ID: Hi All, There are now NumPy Python 3.10.0rc1 wheels available for 64 bit Linux on Intel. You can install them from the nightly builds using "pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy". Test away. Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Tue Aug 10 19:00:11 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Tue, 10 Aug 2021 20:00:11 -0300 Subject: [Numpy-discussion] Newcomer's Meeting Aug 12: special accessibility mini-sprint! (Part 2) Message-ID: Hey folks! This week we are repeating our low-code accessibility mini-sprint on August 12, at 4pm UTC. Last time we had a few people participate - you can see the work they did here: https://github.com/melissawm/numpy/pull/27 If you are interested, we'd be happy to help you get started. As usual, all are welcome - feel free to join even if you just want to observe! To join on Zoom, use this link: https://zoom.us/j/6345425936 Hope to see you around! 
- Melissa ** You can click this link to get the correct time in your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Newcomer%27s+Meeting+-+Accessibility+Mini+Sprint+part+2&iso=20210812T16&p1=1440&ah=1 *** You can add the NumPy community calendar to your Google calendar by clicking this link: https://calendar.google.com/calendar/r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 ---------- Forwarded message --------- From: Melissa Mendonça Date: Mon, 26 Jul 2021 17:03 Subject: Newcomer's Meeting July 29: special accessibility mini-sprint! To: Discussion of Numerical Python Hello, folks! This week we have our planned Newcomer's meeting on July 29, at 8PM UTC. This time, we are teaming up with Tony Fast and Isabela Presedo-Floyd to propose a low-code mini-sprint focusing on accessibility. This is a great opportunity to make your first contribution to NumPy! We will be focusing on writing alt-text for images in our documentation. Alt-text provides a textual alternative to non-text content in web pages, and as cited in this WebAIM document [1]: "Alternative text serves several functions: - It is read by screen readers in place of images allowing the content and function of the image to be accessible to those with visual or certain cognitive disabilities. - It is displayed in place of the image in browsers if the image file is not loaded or when the user has chosen not to view images. - It provides a semantic meaning and description to images which can be read by search engines or be used to later determine the content of the image from page context alone." You can find some more information on how to add alt-text in [2]. As usual, all are welcome - feel free to join even if you just want to observe! To join on Zoom, use this link: https://zoom.us/j/6345425936 Hope to see you around!
- Melissa [1] https://webaim.org/techniques/alttext/ [2] https://www.w3.org/WAI/tutorials/images/decision-tree/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Aug 12 10:57:30 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 12 Aug 2021 08:57:30 -0600 Subject: [Numpy-discussion] Matti Picus and Sebastian Berg added to org-admin Message-ID: Hi All, Matti Picus and Sebastian Berg have been added to the org-admin team on github. That will allow them to help maintain the github site going forward. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Aug 12 21:41:12 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 12 Aug 2021 19:41:12 -0600 Subject: [Numpy-discussion] Dropping Python 3.7 for NumPy 1.22 Message-ID: Hi All, This is to propose dropping Python 3.7 for NumPy 1.22. Doing so will allow merging the array API standard (keyword only arguments), simplify removing import time compiles , allow making 64 bit pickles the default, and bring annotations closer to current. NEP 29 suggests Dec 26, 2021 for the drop date, which is close to the likely 1.22 release date, but given the advantages of dropping 3.7 I think there are good reasons to cheat by a week or two if needed. Downstream projects releasing after 1.22 will probably want to drop Python 3.7 anyway, as they will be past the deadline. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Aug 13 05:10:21 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 13 Aug 2021 11:10:21 +0200 Subject: [Numpy-discussion] Dropping Python 3.7 for NumPy 1.22 In-Reply-To: References: Message-ID: On Fri, Aug 13, 2021 at 3:41 AM Charles R Harris wrote: > Hi All, > > This is to propose dropping Python 3.7 for NumPy 1.22. 
Doing so will allow > merging the array API standard (keyword > only arguments), simplify removing import time compiles > , allow making 64 bit pickles > the default, and bring annotations closer to current. NEP 29 > suggests Dec > 26, 2021 for the drop date, which is close to the likely 1.22 release date, > but given the advantages of dropping 3.7 I think there are good reasons to > cheat by a week or two if needed. Downstream projects releasing after 1.22 > will probably want to drop Python 3.7 anyway, as they will be past the > deadline. > > Thoughts? > +1 from me. Python 3.10 is out by then, and supporting three Python releases for 1.22 seems fine. And the benefits are interesting. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From randallpittman at outlook.com Fri Aug 13 11:26:08 2021 From: randallpittman at outlook.com (Randall Pittman) Date: Fri, 13 Aug 2021 15:26:08 +0000 Subject: [Numpy-discussion] Idea: Simplify passing of shared memory info for ndarrays Message-ID: Hi! First time poster to the list, long time happy user of NumPy (thank you, devs!!) I created a bit of a wrapper for using `ndarray` with Py 3.8+ `multiprocessing.shared_memory`. I really like the ability to use `SharedMemory` with `ndarray` via the `buffer` arg. (as in https://docs.python.org/3/library/multiprocessing.shared_memory.html#multiprocessing.shared_memory.SharedMemory) However, it seemed a bit clunky to worry about passing the SharedMemory object or name, dtype, and shape to reconstruct the `ndarray` in other processes. I came up with this SharedNDArray class that encapsulates that information and provides an ephemeral `ndarray` interface: https://gitlab.com/osu-nrsg/shared-ndarray2. It's especially meant for use in a `SharedMemoryManager` context manager. I shared it on r/Python and someone suggested I share it here for your input (and maybe even integration into NumPy rather than the lone-wolf project it currently is). 
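For list readers who haven't tried the stdlib feature yet, here is a minimal sketch of the pattern SharedNDArray wraps -- the shape, dtype, and values below are purely illustrative; only `SharedMemory` and `ndarray` are real APIs:

```python
import numpy as np
from multiprocessing import shared_memory

# Allocate a shared block large enough for a (3, 4) float64 array,
# then view that block as an ndarray via the `buffer` argument.
shm = shared_memory.SharedMemory(create=True, size=3 * 4 * 8)
arr = np.ndarray((3, 4), dtype=np.float64, buffer=shm.buf)
arr[:] = 1.0

# A second process would reattach by name and rebuild the view --
# exactly the name/dtype/shape bookkeeping being wrapped here:
shm2 = shared_memory.SharedMemory(name=shm.name)
arr2 = np.ndarray((3, 4), dtype=np.float64, buffer=shm2.buf)
print(arr2.sum())  # 12.0 -- both views share one block of memory

# The ndarray views must be dropped before the block can be closed,
# since mmap refuses to close while buffers are still exported.
del arr, arr2
shm2.close()
shm.close()
shm.unlink()
```

The wrapper's job is essentially to carry that name/dtype/shape triple around in one object so consuming processes don't have to.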
I'm happy to receive any suggestions/input/criticism. As an aside, I discovered that it's not at all trivial and, at the moment, not technically possible to define all the possible typing overloads of ndarray `__setitem__` and `__getitem__`. Still, I enjoyed making use of np.typing.NDArray. Thanks for your consideration, Randy From melissawm at gmail.com Sat Aug 14 15:25:07 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Sat, 14 Aug 2021 16:25:07 -0300 Subject: [Numpy-discussion] Documentation Team meeting - Monday August 16 In-Reply-To: References: Message-ID: Hi all! Our next Documentation Team meeting will be tomorrow - *Monday, August 16* at ***4PM UTC***. All are welcome - you don't need to already be a contributor to join. If you have questions or are curious about what we're doing, we'll be happy to meet you! If you wish to join on Zoom, use this link: https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success Here's the permanent hackmd document with the meeting notes (still being updated in the next few days!): https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg Hope to see you around! ** You can click this link to get the correct time in your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210816T16&p1=1440&ah=1 *** You can add the NumPy community calendar to your Google calendar by clicking this link: https://calendar.google.com/calendar/r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 - Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Aug 14 16:26:47 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Aug 2021 14:26:47 -0600 Subject: [Numpy-discussion] Drop LGTM testing. Message-ID: Hi All, LGTM on GitHub uses Python 3.7, which causes a problem if we drop 3.7 support.
LGTM is nice for pointing to possible code improvements, but we mostly ignore it anyway. There are probably standalone code analysers that would serve our needs as well, so dropping it seems the easiest way forward. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sat Aug 14 16:34:12 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sat, 14 Aug 2021 21:34:12 +0100 Subject: [Numpy-discussion] Drop LGTM testing. In-Reply-To: References: Message-ID: This might be worth creating a github issue for simply so we can tag someone working at LGTM; they've been helpful in the past, and it's possible we just need to fiddle with some configuration to make it work. It's also worth noting that LGTM runs against C code too; so even if we disable it for python, it might be worth keeping around for C. Eric On Sat, 14 Aug 2021 at 21:27, Charles R Harris wrote: > Hi All, > > LGTM on github uses Python 3.7, which causes a problem if we drop 3.7 > support. LGTM is nice for pointing to possible code improvements, but we > mostly ignore it anyway. There are probably standalone code analysers that > would serve our needs as well, so dropping it seems the easiest way forward. > > Thoughts? > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Aug 14 17:15:40 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 14 Aug 2021 15:15:40 -0600 Subject: [Numpy-discussion] Drop LGTM testing. 
In-Reply-To: References: Message-ID: On Sat, Aug 14, 2021 at 2:35 PM Eric Wieser wrote: > This might be worth creating a GitHub issue for, simply so we can tag > someone working at LGTM; they've been helpful in the past, and it's > possible we just need to fiddle with some configuration to make it work. > > It's also worth noting that LGTM runs against C code too; so even if we > disable it for python, it might be worth keeping around for C. > It's the C code that causes problems, LGTM builds the code with `python3 setup.py` and setup.py has a check for the Python version. There is no method to disable the C checks from the GitHub app and no method to specify Python version beyond 2 or 3. I'd be happy to tag someone at LGTM, but I don't know who that would be. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Sat Aug 14 17:39:32 2021 From: matti.picus at gmail.com (Matti Picus) Date: Sun, 15 Aug 2021 00:39:32 +0300 Subject: [Numpy-discussion] Drop LGTM testing. In-Reply-To: References: Message-ID: <8bdbfea8-32cb-6098-2e57-87de8319b04c@gmail.com> On 15/8/21 12:15 am, Charles R Harris wrote: > > > On Sat, Aug 14, 2021 at 2:35 PM Eric Wieser > > > wrote: > > This might be worth creating a GitHub issue for, simply so we can > tag someone working at LGTM; they've been helpful in the past, and > it's possible we just need to fiddle with some configuration to > make it work. > > It's also worth noting that LGTM runs against C code too; so even > if we disable it for python, it might be worth keeping around for C. > > > It's the C code that causes problems, LGTM builds the code with > `python3 setup.py` and setup.py has a check for the Python version. > There is no method to disable the C checks from the GitHub app and no > method to specify Python version beyond 2 or 3. > > I'd be happy to tag someone at LGTM, but I don't know who that would be. > > Chuck > Personally, I would prefer we drop it.
I do not recall an instance in the last year when we acted on its results in a PR review, so for me it is just one more thing that breaks randomly. As it is, our PR review and CI support bandwidth is limited, so I would prefer to optimize for that bottleneck. Matti From wieser.eric+numpy at gmail.com Sat Aug 14 17:43:03 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sat, 14 Aug 2021 22:43:03 +0100 Subject: [Numpy-discussion] Drop LGTM testing. In-Reply-To: References: Message-ID: > I'd be happy to tag someone at LGTM, but I don't know who that would be. I've tagged @jhelie in the past with some success, although it's been a while so they may no longer be employed there! Eric On Sat, 14 Aug 2021 at 22:16, Charles R Harris wrote: > > > On Sat, Aug 14, 2021 at 2:35 PM Eric Wieser > wrote: > >> This might be worth creating a GitHub issue for, simply so we can tag >> someone working at LGTM; they've been helpful in the past, and it's >> possible we just need to fiddle with some configuration to make it work. >> >> It's also worth noting that LGTM runs against C code too; so even if we >> disable it for python, it might be worth keeping around for C. >> > > It's the C code that causes problems, LGTM builds the code with `python3 > setup.py` and setup.py has a check for the Python version. There is no > method to disable the C checks from the GitHub app and no method to specify > Python version beyond 2 or 3. > > I'd be happy to tag someone at LGTM, but I don't know who that would be. > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From charlesr.harris at gmail.com Sun Aug 15 16:41:02 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 15 Aug 2021 14:41:02 -0600 Subject: [Numpy-discussion] NumPy 1.21.2 released Message-ID: Hi All, On behalf of the NumPy team I am pleased to announce the release of NumPy 1.21.2. NumPy 1.21.2 is a maintenance release that fixes bugs discovered after 1.21.1. It also provides 64 bit manylinux Python 3.10.0rc1 wheels for downstream testing. Note that Python 3.10 is not yet final. There is also preliminary support for Windows on ARM64 builds, but there is no OpenBLAS for that platform and no wheels are available. The Python versions supported for this release are 3.7-3.9. The 1.21.2 release is compatible with Python 3.10.0rc1 and Python 3.10 will be officially supported after it is released. The previous problems with gcc-11.1 have been fixed by gcc-11.2; check your version if you are using gcc-11. Wheels can be downloaded from PyPI; source archives, release notes, and wheel hashes are available on GitHub. Linux users will need pip >= 19.3 in order to install manylinux2010 and manylinux2014 wheels. *Contributors* A total of 10 people contributed to this release. People with a "+" by their names contributed a patch for the first time. - Bas van Beek - Carl Johnsen + - Charles Harris - Gwyn Ciesla + - Matthieu Dartiailh - Matti Picus - Niyas Sait + - Ralf Gommers - Sayed Adel - Sebastian Berg *Pull requests merged* A total of 18 pull requests were merged for this release. - #19497: MAINT: set Python version for 1.21.x to ``<3.11`` - #19533: BUG: Fix an issue wherein importing ``numpy.typing`` could raise - #19646: MAINT: Update Cython version for Python 3.10. - #19648: TST: Bump the python 3.10 test version from beta4 to rc1 - #19651: TST: avoid distutils.sysconfig in runtests.py - #19652: MAINT: add missing dunder method to nditer type hints - #19656: BLD, SIMD: Fix testing extra checks when ``-Werror`` isn't applicable...
- #19657: BUG: Remove logical object ufuncs with bool output - #19658: MAINT: Include .coveragerc in source distributions to support... - #19659: BUG: Fix bad write in masked iterator output copy paths - #19660: ENH: Add support for windows on arm targets - #19661: BUG: add base to templated arguments for platlib - #19662: BUG,DEP: Non-default UFunc signature/dtype usage should be deprecated - #19666: MAINT: Add Python 3.10 to supported versions. - #19668: TST,BUG: Sanitize path-separators when running ``runtest.py`` - #19671: BLD: load extra flags when checking for libflame - #19676: BLD: update circleCI docker image - #19677: REL: Prepare for 1.21.2 release. Cheers Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Aug 16 12:47:11 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Aug 2021 10:47:11 -0600 Subject: [Numpy-discussion] OS X universal2 wheels Message-ID: Hi All, I note that the numpy universal wheels for Mac are marked "10_9" whereas the wheels for Arm64 are marked '11_0'. Does that need to be fixed? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Aug 16 13:02:41 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 16 Aug 2021 19:02:41 +0200 Subject: [Numpy-discussion] OS X universal2 wheels In-Reply-To: References: Message-ID: On Mon, Aug 16, 2021 at 6:47 PM Charles R Harris wrote: > Hi All, > > I note that the numpy universal wheels for Mac are marked "10_9" whereas > the wheels for Arm64 are marked '11_0'. Does that need to be fixed? > Arm64 hardware requires macOS >= 11.0 while Intel hardware works with 10.x and 11.x, so it looks fine to me as is. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Wed Aug 18 00:32:50 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 17 Aug 2021 23:32:50 -0500 Subject: [Numpy-discussion] NumPy Community Meeting Wednesday Message-ID: Hi all, There will be a NumPy Community meeting Wednesday August 18th at 20:00 UTC. Everyone is invited and encouraged to join in and edit the work-in-progress meeting topics and notes at: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both Best wishes Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From Jerome.Kieffer at esrf.fr Wed Aug 18 09:14:34 2021 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Wed, 18 Aug 2021 15:14:34 +0200 Subject: [Numpy-discussion] Floating point precision expectations in NumPy In-Reply-To: References: <3f6563789d2b4ef963c4407e187e1797866eeea5.camel@sipsolutions.net> Message-ID: <20210818151434.7ee432cb@lintaillefer.esrf.fr> I strongly agree with you, Gregor: * Best precision should remain the default. I lost months finding the compiler option (in ICC) which switched to LA mode and broke all my calculations. * I wonder how SVML behaves on non-Intel platforms? Sleef provides the same approach, but it also works on Power and ARM platforms (and is designed to be extended...). Cheers, Jerome On Wed, 28 Jul 2021 12:13:44 +0200 Gregor Thalhammer wrote: > > On 28.07.2021, at 01:50, Sebastian Berg wrote: > > > > Hi all, > > > > there is a proposal to add some Intel specific fast math routine to > > NumPy: > > > > https://github.com/numpy/numpy/pull/19478 > > Many years ago I wrote a package > https://github.com/geggo/uvml > that makes the VML, a fast implementation of transcendental math functions, available for numpy. Don't know if it still compiles. > It uses Intel VML, designed for processing arrays, not the SVML intrinsics.
This makes it less machine-dependent (optimized implementations are selected automatically depending on the availability of, e.g., SSE, AVX, or AVX512); you just link to a library. It compiles as an external module and can be activated at runtime. > > Different precision models can be selected at runtime (globally). I think Intel advocates using the LA (low accuracy) mode as a good compromise between performance and accuracy. Different people have strongly diverging opinions about what to expect. > > The speedups possibly gained by these approaches often vaporize in non-benchmark applications, as for those functions performance is often limited by memory bandwidth, unless all your data stays in CPU cache. By default I would go for high accuracy mode, with the option to switch to low accuracy if one urgently needs the better performance. But then one should use different approaches for speeding up numpy. > > Gregor > > > > > > part of numerical algorithms is that there is always a speed vs. > > precision trade-off, giving a more precise result is slower. > > > > So there is a question what the general precision expectation should be > > in NumPy. And how much is it acceptable to diverge in the > > precision/speed trade-off depending on CPU/system? > > > > I doubt we can formulate very clear rules here, but any input on what > > precision you would expect or trade-offs seem acceptable would be > > appreciated! > > > > > > Some more details > > ----------------- > > > > This is mainly interesting e.g. for functions like logarithms, > > trigonometric functions, or cubic roots. > > > > Some basic functions (multiplication, addition) are correct as per IEEE > > standard and give the best possible result, but these are typically > > only correct within very small numerical errors. > > > > This is typically measured as "ULP": > > > > https://en.wikipedia.org/wiki/Unit_in_the_last_place > > > > where 0.5 ULP would be the best possible result.
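For anyone following along, the ULP unit quoted above is easy to compute with NumPy itself: `np.spacing(x)` returns the gap between `x` and the next representable float, i.e. one ULP at `x`. A small sketch (the helper name and the `log` example are only for illustration):

```python
import numpy as np

def ulp_error(computed, reference):
    # Error of `computed` measured in units of the last place at
    # `reference`; 0.5 ULP is the best any rounded result can do.
    computed = np.float64(computed)
    reference = np.float64(reference)
    return abs(computed - reference) / np.spacing(reference)

# One ULP at 1.0 is the float64 machine epsilon, 2**-52:
print(np.spacing(np.float64(1.0)) == 2.0 ** -52)  # True

# A float32 log, measured in float64 ULPs, is off by a lot -- simply
# because float32 carries far fewer significand bits:
err = ulp_error(np.log(np.float32(2.5)), np.log(np.float64(2.5)))
print(err > 1.0)  # True
```

The glibc and Intel tables linked below report exactly this quantity, per function, for same-precision implementations.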
> > > > > > Merging the PR may mean relaxing the current precision slightly in some > > places. In general Intel advertises 4 ULP of precision (although the > > actual precision for most functions seems better). > > > > > > Here are two tables, one from glibc and one for the Intel functions: > > > > https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html > > (Mainly the LA column) https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vmperfdata/top/real-functions/measured-accuracy-of-all-real-vm-functions.html > > > > > > Different implementations give different accuracy, but formulating some > > guidelines/expectations (or referencing them) would be useful guidance. > > > > For basic > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -- Jérôme Kieffer tel +33 476 882 445 From mikofski at berkeley.edu Wed Aug 18 11:51:44 2021 From: mikofski at berkeley.edu (Dr. Mark Alexander Mikofski PhD) Date: Wed, 18 Aug 2021 08:51:44 -0700 Subject: [Numpy-discussion] [JOB] modeling engineer at Form Energy Message-ID: https://jobs.lever.co/formenergy/672c9b68-fc0e-46ff-b430-7a65e829ab8a Role Description We are looking for a Staff Modeling Engineer to join our systems team to build energy storage system performance models which will be used as the basis for designing our multi-day storage product.
The Staff Modeling Engineer will be responsible for engaging with the battery team, the systems team, and the market analytics team to develop modeling tools which capture the electrochemical performance of our product under real world operating conditions, equipping those teams with the tools they need to make trade decisions between cost, performance, and durability. This engineer is expected to collaborate with the engineers developing the mechanistic battery models but need not be an electrochemist. The engineer is expected to have experience with systems modelling at a high level and proven abilities for abstracting and simplifying complex systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jerry.morrison+numpy at gmail.com Thu Aug 19 03:12:42 2021 From: jerry.morrison+numpy at gmail.com (Jerry Morrison) Date: Thu, 19 Aug 2021 00:12:42 -0700 Subject: [Numpy-discussion] Floating point precision expectations in NumPy In-Reply-To: References: <3f6563789d2b4ef963c4407e187e1797866eeea5.camel@sipsolutions.net> Message-ID: On Fri, Jul 30, 2021 at 12:22 PM Sebastian Berg wrote: > On Fri, 2021-07-30 at 11:04 -0700, Jerry Morrison wrote: > > On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg < > > sebastian at sipsolutions.net> > > wrote: > > > > > Hi all, > > > > > > there is a proposal to add some Intel specific fast math routine to > > > NumPy: > > > > > > https://github.com/numpy/numpy/pull/19478 > > > > > > part of numerical algorithms is that there is always a speed vs. > > > precision trade-off, giving a more precise result is slower. > > > > > > > > "Close enough" depends on the application but non-linear models can > > get the > > "butterfly effect" where the results diverge if they aren't > > identical. > > > Right, so my hope was to gauge what the general expectation is. I take > it you expect a high accuracy. 
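(A concrete sketch of the quoted "butterfly effect", using the logistic map at r = 4, a standard chaotic toy model -- the map and starting point here are purely illustrative. Errors roughly double each iteration, so a 1e-15 perturbation grows to order-one differences within a few dozen steps:)

```python
def logistic_trajectory(x, steps, r=4.0):
    # x_{n+1} = r * x_n * (1 - x_n), chaotic on [0, 1] for r = 4.
    out = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        out.append(x)
    return out

t1 = logistic_trajectory(0.1, 100)
t2 = logistic_trajectory(0.1 + 1e-15, 100)

# Identical code, inputs differing by ~1 part in 1e15: the two
# trajectories nonetheless separate into visibly different orbits.
print(t1 == t2)  # False
print(abs(t1[-1] - t2[-1]))
```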
> > The error for the computations themselves seems low at first sight, but > of course they can explode quickly in non-linear settings... > (In the chaotic systems I worked with, the shadowing theorem would > usually alleviate such worries. And testing the integration would be > more important. But I am sure for certain questions things may be far > more tricky.) > I'll put forth an expectation that after installing a specific set of libraries, the floating point results would be identical across platforms and into the future. Ideally developers could install library updates (for hardware compatibility, security fixes, or other reasons) and still get identical results. That expectation is for reproducibility, not high accuracy. So it'd be fine to install different libraries [or maybe use those pip package options in brackets, whatever they do?] to trade accuracy for speed. Could any particular choice of accuracy still provide reproducible results across platforms and time? > > For a certain class of scientific programming applications, > > reproducibility > > is paramount. > > > > Development teams may use a variety of development laptops, > > workstations, > > scientific computing clusters, and cloud computing platforms. If the > > tests > > pass on your machine but fail in CI, you have a debugging problem. > > > > If your published scientific article links to source code that > > replicates > > your computation, scientists will expect to be able to run that code, > > now > > or in a couple decades, and replicate the same outputs. They'll be > > using > > different OS releases and maybe different CPU + accelerator > > architectures. > > > > Reproducible Science is good. Replicated Science is better. > > > > > > Clearly there are other applications where it's easy to trade > > reproducibility and some precision for speed.
> > Agreed, although there are so many factors, often out of our control, > that I am not sure that true replicability is achievable without > containers :(. > > It would be amazing if NumPy could have a "replicable" mode, but I am > not sure how that could be done, or if the "ground work" in the math > and linear algebra libraries even exists. > > > However, even if it is practically impossible to make things > replicable, there is an argument for improving reproducibility and > replicability, e.g. by choosing the high-accuracy version here. Even > if it is impossible to actually ensure. > Yes! Let's at least have reproducibility in mind and work on improving it, e.g. by removing failure modes. (Ditto for security :-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rockwizard2001 at gmail.com Thu Aug 19 11:15:17 2021 From: rockwizard2001 at gmail.com (Bhavay Malhotra) Date: Thu, 19 Aug 2021 20:45:17 +0530 Subject: [Numpy-discussion] Adding New Feature Message-ID: Dear Team, I'm thinking of adding a new feature in response to issue #19039. The feature is basically a function to check whether the data types of two NumPy arrays are the same. If the arrays have different data types the function returns False; otherwise it returns True. Please consider my feature and reply so that I can send my PR accordingly. Looking forward to a prompt reply. Thanking you. Regards, Bhavay -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Thu Aug 19 11:38:21 2021 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 19 Aug 2021 18:38:21 +0300 Subject: [Numpy-discussion] Adding New Feature In-Reply-To: References: Message-ID: <253cacb9-4d1c-614d-3fd9-595b9ddd4f31@gmail.com> On 19/8/21 6:15 pm, Bhavay Malhotra wrote: > Dear Team, > > I'm thinking of adding a new feature in response to issue #19039.
> > The feature is basically a function to check whether the data types of > two numpy arrays are the same. > > If the arrays have different data types, the function returns False; > otherwise it returns True. > > > Thanking You. > > Regards, > > Bhavay > As we discussed on the issue https://github.com/numpy/numpy/issues/19039, is there a use-case where `b.dtype == c.dtype` would not suffice? Matti From sseibert at anaconda.com Thu Aug 19 14:18:27 2021 From: sseibert at anaconda.com (Stanley Seibert) Date: Thu, 19 Aug 2021 13:18:27 -0500 Subject: [Numpy-discussion] Floating point precision expectations in NumPy In-Reply-To: References: <3f6563789d2b4ef963c4407e187e1797866eeea5.camel@sipsolutions.net> Message-ID: On Thu, Aug 19, 2021 at 2:13 AM Jerry Morrison < jerry.morrison+numpy at gmail.com> wrote: > > I'll put forth an expectation that after installing a specific set of > libraries, the floating point results would be identical across platforms > and into the future. Ideally developers could install library updates (for > hardware compatibility, security fixes, or other reasons) and still get > identical results. > > That expectation is for reproducibility, not high accuracy. So it'd be > fine to install different libraries [or maybe use those pip package options > in brackets, whatever they do?] to trade accuracy for speed. Could any > particular choice of accuracy still provide reproducible results across > platforms and time? > While this would be nice, in practice bit-identical results for floating point NumPy functions across different operating systems and future time are going to be impractical to achieve. IEEE-754 helps by specifying the result of basic floating point operations, but once you move into special math functions (like cos()) or other algorithms that can be implemented in several "mathematically equivalent" ways, bit-level stability basically becomes impossible without snapshotting your entire software stack.
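To see the "mathematically equivalent, but not bit-identical" effect concretely, here is a small pure-Python sketch (standard library only; the specific values are just the usual decimal-fraction examples):

```python
import math

# Floating-point addition is not associative: regrouping the very same
# terms changes the result in the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6

# Summation order matters the same way: a left-to-right accumulation
# differs from a correctly-rounded sum of the identical terms.
xs = [0.1] * 10
serial = sum(xs)          # plain accumulation -> 0.9999999999999999
accurate = math.fsum(xs)  # round-off compensated -> 1.0
print(serial == accurate)  # False
```

Both results are "correct" to within rounding; which one you get depends purely on the order of operations the implementation happens to choose.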
Many of these special math functions are provided by the operating system, which generally does not make such guarantees. Quick example: Suppose you want to implement sum() on a floating point array. If you start at the beginning of the array and iterate to the end, adding each element to an accumulator, you will get one answer. If you do mathematically equivalent pairwise summations (using a temporary array for storage), you will get a different, and probably more accurate, answer. Neither answer will (in general) be the same as summing those numbers together with infinite precision, then rounding to the closest floating point number at the end. We could decide to make the specification for sum() also specify the algorithm for computing sum() to ensure we make the same round-off errors every time. However, this kind of detailed specification might be harder to write for other functions, or might even lock the library into accuracy bugs that can't be fixed in the future. I think the most pragmatic thing you can hope for is: - Bit-identical results with containers that snapshot everything, including the system math library. - Libraries that specify their accuracy levels when possible, and disclose when algorithm changes will affect the bit-identicalness of results. On a meta-level, if analysis conclusions depend on getting bit-identical results from floating point operations, then you really want to use a higher precision float and/or an algorithm less sensitive to round-off error. Floating point numbers are not real numbers. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Tue Aug 24 18:07:34 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Tue, 24 Aug 2021 19:07:34 -0300 Subject: [Numpy-discussion] Newcomer's meeting: August 26, 8PM UTC In-Reply-To: References: Message-ID: Hi all!
Our next Newcomer's Meeting is on Thursday, *August 26, at 8pm UTC.* This is an informal meeting with no agenda to ask questions, get to know other people and (hopefully) figure out ways to contribute to NumPy. Feel free to join if you are lurking around but found it hard to start contributing - we'll do our best to support you. If you wish to join on Zoom, use this link: https://zoom.us/j/6345425936 Hope to see you around! ** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Newcomer%27s+Meeting&iso=20210826T20&p1=1440&ah=1 *** You can add the NumPy community calendar to your google calendar by clicking this link: https://calendar.google.com/calendar /r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 - Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Aug 25 00:03:50 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 24 Aug 2021 23:03:50 -0500 Subject: [Numpy-discussion] NumPy Development Meeting Wednesday - Triage Focus Message-ID: Hi all, Our bi-weekly triage-focused NumPy development meeting is Wednesday, August 25th at 9 am Pacific Time (16:00 UTC). Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg I encourage everyone to notify us of issues or PRs that you feel should be prioritized, discussed, or reviewed. Best regards Sebastian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From serge.guelton at telecom-bretagne.eu Wed Aug 25 11:48:28 2021 From: serge.guelton at telecom-bretagne.eu (Serge Guelton) Date: Wed, 25 Aug 2021 17:48:28 +0200 Subject: [Numpy-discussion] A bite of C++ Message-ID: <20210825154828.GA28076@sguelton.remote.csb> Hi folks, https://github.com/numpy/numpy/pull/19713 showcases what *could* be a first step toward getting rid of generated C code within numpy, in favor of some C++ code, coupled with a single macro trick. Basically, templated code is an easy and robust way to replace generated code (the C++ compiler becomes the code generator when instantiating code), and a single X-macro takes care of the glue with the C world. Some changes in distutils were needed to cope with C++-specific flags, and extensions that consist of mixed C and C++ code. I've kept the change as minimal as possible to ease the (potential) transition and keep the C++ code close to the C code. This led to less idiomatic C++ code, but I value a "correct first" approach. There's an on-going effort by seiko2plus to remove that C layer; I acknowledge this would bring even more C++ code, but that looks orthogonal to me (and a very good second step!) All lights are green for the PR, so let's assume it's solid ground for discussion :-) So, well, what do you think? Should we go forward? Potential follow-ups : - do we want to use -nostdlib, to be sure we don't bring any C++ runtime dep? - what about -fno-exception, -fno-rtti? - coding style?
- (I'm-not-a-farseer-I-don-t-know-all-topics) From sebastian at sipsolutions.net Wed Aug 25 18:50:49 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 25 Aug 2021 17:50:49 -0500 Subject: [Numpy-discussion] A bite of C++ In-Reply-To: <20210825154828.GA28076@sguelton.remote.csb> References: <20210825154828.GA28076@sguelton.remote.csb> Message-ID: On Wed, 2021-08-25 at 17:48 +0200, Serge Guelton wrote: > Hi folks, > > https://github.com/numpy/numpy/pull/19713 showcases what *could* be a > first step > toward getting rid of generated C code within numpy, in favor of some > C++ code, > coupled with a single macro trick. > > Basically, templated code is an easy and robust way to replace > generated code > (the C++ compiler becomes the code generator when instantiating > code), and a > single X-macro takes care of the glue with the C world. I am not a C++ expert, and really have to get used to this code. So I would prefer if some C++ experts could look at it and give feedback. This will be a bit harder to read for me than our `.c.src` code for a while. But on the up-side, I am frustrated by my IDE not being able to deal with the `c.src` templating. One reaction reading the X-macro trick is that I would be more comfortable with a positive list rather than block-listing. It just felt a bit like too much magic and I am not sure how good it is to assume we usually want to export everything (for one, datetimes are pretty special). Even if it is verbose, I would not mind if we just list everything, so long as we have short-hands for all-integers, all-float, all-inexact, etc. > > Some changes in distutils were needed to cope with C++-specific > flags, and > extensions that consist of mixed C and C++ code. > Potential follow-ups : > > - do we want to use -nostdlib, to be sure we don't bring any C++ > runtime dep? What does this mean for compatibility? It sounds reasonable to me for now if it increases systems we can run on, but I really don't know.
> - what about -fno-exception, -fno-rtti? How do C++ exceptions work at run-time? What if I store a C++ function pointer that raises an exception and use it from a C program? Does it add run-time overhead? Do we need that `no-exception` to define that our functions are actually C "calling convention" in this regard?! Run-time calling convention changes worry me, because I am not sure C++ exceptions have a place in the current or even future ABI. All our current APIs use a `-1` return value for exceptions. This is just like Python's type slots, so there must be "off-the-shelf" approaches for this? Embracing C++ exceptions seems a bit steep to me right now, unless I am missing something awesome? I will note that a lot of the functions that we want to template like this, are, and should be, accessible as public API (i.e. you can ask NumPy to give you the function pointer). Cheers, Sebastian > - coding style? > - (I'm-not-a-farseer-I-don-t-know-all-topics) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From ralf.gommers at gmail.com Thu Aug 26 07:37:42 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 26 Aug 2021 13:37:42 +0200 Subject: [Numpy-discussion] A bite of C++ In-Reply-To: References: <20210825154828.GA28076@sguelton.remote.csb> Message-ID: On Thu, Aug 26, 2021 at 12:51 AM Sebastian Berg wrote: > On Wed, 2021-08-25 at 17:48 +0200, Serge Guelton wrote: > > > Potential follow-ups : > > > > - do we want to use -nostdlib, to be sure we don't bring any C++ > > runtime dep? > > What does this mean for compatibility? It sounds reasonable to me for > now if it increases systems we can run on, but I really don't know.
> The only platform where we'd need to bundle in a runtime is Windows, I believe. Here's what we do for SciPy: https://github.com/MacPython/scipy-wheels/blob/72cb8ab580ed5ca1b95eb60243fef4284ccc52b0/LICENSE_win32.txt#L125 . That is indeed a bit of a pain and hard to test, so if we can get away with not doing that by adding `-nostdlib`, that sounds great. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From stigkorsnes at gmail.com Thu Aug 26 14:20:53 2021 From: stigkorsnes at gmail.com (Stig Korsnes) Date: Thu, 26 Aug 2021 20:20:53 +0200 Subject: [Numpy-discussion] SeedSequence.spawn() Message-ID: Hi, Is there a way to uniquely spawn child seeds? I'm doing Monte Carlo analysis, where I have n random processes, each with their own generator. All process models instantiate a generator with default_rng(), i.e. ss=SeedSequence(); cs=ss.spawn(n), using cs[i] for process i. Now, the problem I'm facing is that the results for an individual process depend on the order of process initialization and the number of processes used. However, if I could spawn children with a unique identifier, I would be able to reproduce my individual results without having to pickle/log states. For example, all my models have an id (tuple) field which is hashable. If I had the ability to SeedSequence(x).spawn([objects]) where objects support hash(object), I would have reproducibility for all my processes. I could do without the spawning, but then I would probably lose independence when I do multiproc? Is there a way to achieve my goal in the current version 1.21 of numpy? Best Stig -------------- next part -------------- An HTML attachment was scrubbed...
URL: From serge.guelton at telecom-bretagne.eu Thu Aug 26 16:44:19 2021 From: serge.guelton at telecom-bretagne.eu (Serge Guelton) Date: Thu, 26 Aug 2021 22:44:19 +0200 Subject: [Numpy-discussion] A bite of C++ In-Reply-To: References: <20210825154828.GA28076@sguelton.remote.csb> Message-ID: <20210826204419.GA27017@sguelton.remote.csb> On Wed, Aug 25, 2021 at 05:50:49PM -0500, Sebastian Berg wrote: > On Wed, 2021-08-25 at 17:48 +0200, Serge Guelton wrote: > > Hi folks, > > > > https://github.com/numpy/numpy/pull/19713 showcases what *could* be a > > first step > > toward getting rid of generated C code within numpy, in favor of some > > C++ code, > > coupled with a single macro trick. > > > > Basically, templated code is an easy and robust way to replace > > generated code > > (the C++ compiler becomes the code generator when instantiating > > code), and a > > single X-macro takes care of the glue with the C world. Hi Sebastian and thanks for the feedback. > I am not a C++ expert, and really have to get used to this code. So > I would prefer if some C++ experts could look at it and give feedback. I don't know if I'm a C++ expert, but I have a decent background with that language. I'll try to give as much clarification as I can. > This will be a bit harder to read for me than our `.c.src` code for a > while. But on the up-side, I am frustrated by my IDE not being able to > deal with the `c.src` templating. > One reaction reading the X-macro trick is that I would be more > comfortable with a positive list rather than block-listing. It just > felt a bit like too much magic and I am not sure how good it is to > assume we usually want to export everything (for one, datetimes are > pretty special). > > Even if it is verbose, I would not mind if we just list everything, so > long as we have short-hands for all-integers, all-float, all-inexact, etc. There have been similar comments on the PR, so I've reverted to an explicit listing.
> > > > Some changes in distutils were needed to cope with C++-specific > > flags, and > > extensions that consist of mixed C and C++ code. > > > > > Potential follow-ups : > > > > - do we want to use -nostdlib, to be sure we don't bring any C++ > > runtime dep? > > What does this mean for compatibility? It sounds reasonable to me for > now if it increases systems we can run on, but I really don't know. It basically means fewer packaging issues, as one doesn't need to link with the standard C++ library. It doesn't prevent using some headers, but it removes some aspects of the language. If numpy wants to use C++ as a preprocessor on steroids, that's fine. If Numpy wants to embrace more of C++, it's a bad idea (e.g. no new operator) > > - what about -fno-exception, -fno-rtti? > > How do C++ exceptions work at run-time? What if I store a C++ function > pointer that raises an exception and use it from a C program? Does it > add run-time overhead, do we need that `no-exception` to define that > our functions are actually C "calling convention" in this regard?! Exceptions add runtime overhead and imply larger binaries. If an exception is raised at the C++ level and not caught at the C++ level, it is going to unwind the whole C stack and then call a default handler that terminates the program. > Run-time calling convention changes worry me, because I am not sure C++ > exceptions have a place in the current or even future ABI. All our > current APIs use a `-1` return value for exceptions. > > This is just like Python's type slots, so there must be "off-the-shelf" > approaches for this? > > Embracing C++ exceptions seems a bit steep to me right now, unless I am > missing something awesome? I totally second your opinion. In the spirit of C++ as a preprocessor on steroids, I don't see why exceptions would be needed. > I will note that a lot of the functions that we want to template like > this, are, and should be, accessible as public API (i.e.
you can ask > NumPy to give you the function pointer). As of now, I've kept the current C symbol names, which requires a thin forwarding layer to the C++ implementation. I would be glad to remove those, but I think it's a nice second step, something that could be done once the custom preprocessor has been removed. From robert.kern at gmail.com Thu Aug 26 16:57:50 2021 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 26 Aug 2021 16:57:50 -0400 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: On Thu, Aug 26, 2021 at 2:22 PM Stig Korsnes wrote: > Hi, > Is there a way to uniquely spawn child seeds? > I`m doing monte carlo analysis, where I have n random processes, each with > their own generator. > All process models instantiate a generator with default_rng(). I.e > ss=SeedSequence() cs=ss.Spawn(n), and using cs[i] for process i. Now, the > problem I`m facing, is that results using individual process depends on > the order of the process initialization, and the number of processes used. > However, if I could spawn children with a unique identifier, I would be > able to reproduce my individual results without having to pickle/log > states. For example, all my models have an id (tuple) field which is > hashable. > If I had the ability to SeedSequence(x).Spawn([objects]) where objects > support hash(object), I would have reproducibility for all my processes. I > could do without the spawning, but then I would probably lose independence > when I do multiproc? Is there a way to achieve my goal in the current > version 1.21 of numpy? > I would probably not rely on `hash()` as it is only intended to be pretty good at getting distinct values from distinct inputs. If you can combine the tuple objects into a string of bytes in a reliable, collision-free way and use one of the cryptographic hashes to get them down to a 128-bit number, that'd be ideal. `int(joblib.hash(key), 16)` should do nicely.
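(For concreteness, here is a fully self-contained sketch of this idea, with `hashlib.md5` from the standard library standing in for `joblib.hash`; the key format and names are made up for illustration:)

```python
import hashlib

import numpy as np
from numpy.random import SeedSequence, default_rng

def stronghash(key):
    # Stable byte encoding + MD5 -> a non-negative 128-bit integer.
    # repr() is only adequate here because these illustrative keys are
    # tuples of ints and strings, whose reprs are deterministic.
    return int(hashlib.md5(repr(key).encode("utf-8")).hexdigest(), 16)

def stream_for(key, seed, n=3):
    # The child seed depends only on (key, seed), not on creation order.
    child_ss = SeedSequence([stronghash(key), seed])
    return default_rng(child_ss).random(n)

seed = 12345  # stand-in for the user-settable root seed
keys = [(1, "pump"), (2, "valve"), (3, "pipe")]  # made-up job identifiers

# Building the streams in two different orders gives identical
# per-key results.
first = {k: stream_for(k, seed) for k in keys}
second = {k: stream_for(k, seed) for k in reversed(keys)}
assert all(np.array_equal(first[k], second[k]) for k in keys)
```

Because each child `SeedSequence` depends only on the (key, root seed) pair, neither the number of jobs nor the order in which they are created affects any individual stream.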
You can combine that with your main process's seed easily. SeedSequence can take arbitrary amounts of integer data and smoosh them all together. The spawning functionality builds off of that, but you can also just manually pass in lists of integers. Let's call that function `stronghash()`. Let's call your main process's seed number `seed` (this is the thing that the user can set on the command-line or something you get from `secrets.randbits(128)` if you need a fresh one). Let's call the unique tuple `key`. You can build the `SeedSequence` for each job according to the `key` like so:

root_ss = SeedSequence(seed)
for key, data in jobs:
    child_ss = SeedSequence([stronghash(key), seed])
    submit_job(key, data, seed=child_ss)

Now each job will get its own unique stream regardless of the order the job is assigned. When the user reruns it with the same root `seed`, they will get the same results. When the user chooses a different `seed`, they will get another set of results (this is why you don't want to just use `SeedSequence(stronghash(key))` all by itself). I put the job-specific seed data ahead of the main program's seed to be on the super-safe side. The spawning mechanism will append integers to the end, so there's a super-tiny chance somewhere down a long line of `root_ss.spawn()`s that there would be a collision (and I mean super-extra-tiny). But best practices cost nothing. I hope that helps and is not too confusing! -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From stigkorsnes at gmail.com Fri Aug 27 05:03:04 2021 From: stigkorsnes at gmail.com (Stig Korsnes) Date: Fri, 27 Aug 2021 11:03:04 +0200 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: Thank you Robert! This scheme fits perfectly into what I`m trying to accomplish! :) The "smooshing" of ints by supplying a list of ints had eluded me. Thank you also for the pointer about built-in hash().
I would not be able to rely on it anyways, because it does not return strictly positive ints which SeedSequence requires. If you have a minute to spare: Could you briefly explain "int(joblib.hash(key) , 16)" , and would this always return non-negative integers? Thanks again! tor. 26. aug. 2021 kl. 22:59 skrev Robert Kern : > On Thu, Aug 26, 2021 at 2:22 PM Stig Korsnes > wrote: > >> Hi, >> Is there a way to uniquely spawn child seeds? >> I`m doing monte carlo analysis, where I have n random processes, each >> with their own generator. >> All process models instantiate a generator with default_rng(). I.e >> ss=SeedSequence() cs=ss.Spawn(n), and using cs[i] for process i. Now, the >> problem I`m facing, is that results using individual process depends on >> the order of the process initialization ,and the number of processes used. >> However, if I could spawn children with a unique identifier, I would be >> able to reproduce my individual results without having to pickle/log >> states. For example, all my models have an id (tuple) field which is >> hashable. >> If I had the ability to SeedSequence(x).Spawn([objects]) where objects >> support hash(object), I would have reproducibility for all my processes. I >> could do without the spawning, but then I would probably loose independence >> when I do multiproc? Is there a way to achieve my goal in the current >> version 1.21 of numpy? >> > > I would probably not rely on `hash()` as it is only intended to be pretty > good at getting distinct values from distinct inputs. If you can combine > the tuple objects into a string of bytes in a reliable, collision-free way > and use one of the cryptographic hashes to get them down to a 128bit > number, that'd be ideal. `int(joblib.hash(key) > , > 16)` should do nicely. You can combine that with your main process's seed > easily. SeedSequence can take arbitrary amounts of integer data and smoosh > them all together. 
The spawning functionality builds off of that, but you > can also just manually pass in lists of integers. > > Let's call that function `stronghash()`. Let's call your main process seed > number `seed` (this is the thing that the user can set on the command-line > or something you get from `secrets.randbits(128)` if you need a fresh one). > Let's call the unique tuple `key`. You can build the `SeedSequence` for > each job according to the `key` like so: > > root_ss = SeedSequence(seed) > for key, data in jobs: > child_ss = SeedSequence([stronghash(key), seed]) > submit_job(key, data, seed=child_ss) > > Now each job will get its own unique stream regardless of the order the > job is assigned. When the user reruns it with the same root `seed`, they > will get the same results. When the user chooses a different `seed`, they > will get another set of results (this is why you don't want to just use > `SeedSequence(stronghash(key))` all by itself). > > I put the job-specific seed data ahead of the main program's seed to be on > the super-safe side. The spawning mechanism will append integers to the > end, so there's a super-tiny chance somewhere down a long line of > `root_ss.spawn()`s that there would be a collision (and I mean > super-extra-tiny). But best practices cost nothing. > > I hope that helps and is not too confusing! > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Fri Aug 27 10:59:52 2021 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 27 Aug 2021 10:59:52 -0400 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: joblib is a library that uses clever caching of function call results to make the development of certain kinds of data-heavy computational pipelines easier. In order to derive the key to be used to check the cache, joblib has to look at the arguments passed to the function, which may involve usually-nonhashable things like large numpy arrays. https://joblib.readthedocs.io/en/latest/ So they constructed joblib.hash() which basically takes the arguments, pickles them into a bytestring (with some implementation details), then computes an MD5 hash on that. It's probably overkill for your keys, but it's easily available and quite generic. It returns a hex-encoded string of the 128-bit MD5 hash. `int(..., 16)` will convert that to a non-negative (almost-certainly positive!) integer that can be fed into SeedSequence. On Fri, Aug 27, 2021 at 5:03 AM Stig Korsnes wrote: > Thank you Robert! > This scheme fits perfectly into what I`m trying to accomplish! :) The > "smooshing" of ints by supplying a list of ints had eluded me. Thank you > also for the pointer about built-in hash(). I would not be able to rely on > it anyways, because it does not return strictly positive ints which > SeedSequence requires. If you have a minute to spare: Could you briefly > explain "int(joblib.hash(key) > , > 16)" , and would this always return non-negative integers? > Thanks again! > > tor. 26. aug. 2021 kl. 22:59 skrev Robert Kern : > >> On Thu, Aug 26, 2021 at 2:22 PM Stig Korsnes >> wrote: >> >>> Hi, >>> Is there a way to uniquely spawn child seeds? >>> I`m doing monte carlo analysis, where I have n random processes, each >>> with their own generator. >>> All process models instantiate a generator with default_rng(). 
I.e >>> ss=SeedSequence() cs=ss.Spawn(n), and using cs[i] for process i. Now, the >>> problem I`m facing, is that results using individual process depends on >>> the order of the process initialization ,and the number of processes used. >>> However, if I could spawn children with a unique identifier, I would be >>> able to reproduce my individual results without having to pickle/log >>> states. For example, all my models have an id (tuple) field which is >>> hashable. >>> If I had the ability to SeedSequence(x).Spawn([objects]) where objects >>> support hash(object), I would have reproducibility for all my processes. I >>> could do without the spawning, but then I would probably loose independence >>> when I do multiproc? Is there a way to achieve my goal in the current >>> version 1.21 of numpy? >>> >> >> I would probably not rely on `hash()` as it is only intended to be pretty >> good at getting distinct values from distinct inputs. If you can combine >> the tuple objects into a string of bytes in a reliable, collision-free way >> and use one of the cryptographic hashes to get them down to a 128bit >> number, that'd be ideal. `int(joblib.hash(key) >> , >> 16)` should do nicely. You can combine that with your main process's seed >> easily. SeedSequence can take arbitrary amounts of integer data and smoosh >> them all together. The spawning functionality builds off of that, but you >> can also just manually pass in lists of integers. >> >> Let's call that function `stronghash()`. Let's call your main process >> seed number `seed` (this is the thing that the user can set on the >> command-line or something you get from `secrets.randbits(128)` if you need >> a fresh one). Let's call the unique tuple `key`. 
You can build the >> `SeedSequence` for each job according to the `key` like so: >> >> root_ss = SeedSequence(seed) >> for key, data in jobs: >> child_ss = SeedSequence([stronghash(key), seed]) >> submit_job(key, data, seed=child_ss) >> >> Now each job will get its own unique stream regardless of the order the >> job is assigned. When the user reruns it with the same root `seed`, they >> will get the same results. When the user chooses a different `seed`, they >> will get another set of results (this is why you don't want to just use >> `SeedSequence(stronghash(key))` all by itself). >> >> I put the job-specific seed data ahead of the main program's seed to be >> on the super-safe side. The spawning mechanism will append integers to the >> end, so there's a super-tiny chance somewhere down a long line of >> `root_ss.spawn()`s that there would be a collision (and I mean >> super-extra-tiny). But best practices cost nothing. >> >> I hope that helps and is not too confusing! >> >> -- >> Robert Kern >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Fri Aug 27 18:41:06 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Fri, 27 Aug 2021 19:41:06 -0300 Subject: [Numpy-discussion] Documentation Team meeting - Monday August 30 In-Reply-To: References: Message-ID: Hi all! Our next Documentation Team meeting will be tomorrow - *Monday, August 30* at ***4PM UTC***. All are welcome - you don't need to already be a contributor to join. If you have questions or are curious about what we're doing, we'll be happy to meet you! 
If you wish to join on Zoom, use this link: https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success Here's the permanent hackmd document with the meeting notes (still being updated in the next few days!): https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg Hope to see you around! ** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210830T16&p1=1440&ah=1 *** You can add the NumPy community calendar to your google calendar by clicking this link: https://calendar.google.com/calendar/r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 - Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From stigkorsnes at gmail.com Sat Aug 28 05:55:26 2021 From: stigkorsnes at gmail.com (Stig Korsnes) Date: Sat, 28 Aug 2021 11:55:26 +0200 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: Thank you again Robert. I am using NamedTuple for my keys, which also are keys in a dictionary. Each key will be unique (tuple on distinct int and enum), so I am thinking maybe the risk of producing duplicate hashes is not present, but could as always be wrong :) For positive ints I followed this tip https://stackoverflow.com/questions/18766535/positive-integer-from-python-hash-function , and did:

def stronghash(key: ComponentId):
    return ctypes.c_size_t(hash(key)).value

Since I will be using each process/random sample several times, and keeping all of them in memory at once is not feasible (dimensionality) I did the following:

self._rng = default_rng(cs)
self._state = dict(self._rng.bit_generator.state)

def scenarios(self) -> npt.NDArray[np.float64]:
    self._rng.bit_generator.state = self._state
    ....
    return ....

Would you consider this bad practice, or an ok solution? In Norway we have a saying which directly translates to: "He asked for the finger...
and took the whole arm" . Best, Stig fre. 27. aug. 2021 kl. 17:01 skrev Robert Kern : > joblib is a library that uses clever caching of function call results to > make the development of certain kinds of data-heavy computational pipelines > easier. In order to derive the key to be used to check the cache, joblib > has to look at the arguments passed to the function, which may > involve usually-nonhashable things like large numpy arrays. > > https://joblib.readthedocs.io/en/latest/ > > So they constructed joblib.hash() which basically takes the arguments, > pickles them into a bytestring (with some implementation details), then > computes an MD5 hash on that. It's probably overkill for your keys, but > it's easily available and quite generic. It returns a hex-encoded string of > the 128-bit MD5 hash. `int(..., 16)` will convert that to a non-negative > (almost-certainly positive!) integer that can be fed into SeedSequence. > > On Fri, Aug 27, 2021 at 5:03 AM Stig Korsnes > wrote: > >> Thank you Robert! >> This scheme fits perfectly into what I`m trying to accomplish! :) The >> "smooshing" of ints by supplying a list of ints had eluded me. Thank you >> also for the pointer about built-in hash(). I would not be able to rely on >> it anyways, because it does not return strictly positive ints which >> SeedSequence requires. If you have a minute to spare: Could you briefly >> explain "int(joblib.hash(key) >> , >> 16)" , and would this always return non-negative integers? >> Thanks again! >> >> tor. 26. aug. 2021 kl. 22:59 skrev Robert Kern : >> >>> On Thu, Aug 26, 2021 at 2:22 PM Stig Korsnes >>> wrote: >>> >>>> Hi, >>>> Is there a way to uniquely spawn child seeds? >>>> I`m doing monte carlo analysis, where I have n random processes, each >>>> with their own generator. >>>> All process models instantiate a generator with default_rng(). I.e >>>> ss=SeedSequence() cs=ss.Spawn(n), and using cs[i] for process i. 
Now, the >>>> problem I'm facing is that results using an individual process depend on >>>> the order of the process initialization, and the number of processes used. >>>> However, if I could spawn children with a unique identifier, I would be >>>> able to reproduce my individual results without having to pickle/log >>>> states. For example, all my models have an id (tuple) field which is >>>> hashable. >>>> If I had the ability to SeedSequence(x).Spawn([objects]) where objects >>>> support hash(object), I would have reproducibility for all my processes. I >>>> could do without the spawning, but then I would probably lose independence >>>> when I do multiproc? Is there a way to achieve my goal in the current >>>> version 1.21 of numpy? >>>> >>> >>> I would probably not rely on `hash()` as it is only intended to be >>> pretty good at getting distinct values from distinct inputs. If you can >>> combine the tuple objects into a string of bytes in a reliable, >>> collision-free way and use one of the cryptographic hashes to get them down >>> to a 128-bit number, that'd be ideal. `int(joblib.hash(key), 16)` should >>> do nicely. You can combine that with your main process's seed >>> easily. SeedSequence can take arbitrary amounts of integer data and smoosh >>> them all together. The spawning functionality builds off of that, but you >>> can also just manually pass in lists of integers. >>> >>> Let's call that function `stronghash()`. Let's call your main process >>> seed number `seed` (this is the thing that the user can set on the >>> command-line or something you get from `secrets.randbits(128)` if you need >>> a fresh one). Let's call the unique tuple `key`.
You can build the >>> `SeedSequence` for each job according to the `key` like so: >>> >>> root_ss = SeedSequence(seed) >>> for key, data in jobs: >>> child_ss = SeedSequence([stronghash(key), seed]) >>> submit_job(key, data, seed=child_ss) >>> >>> Now each job will get its own unique stream regardless of the order the >>> job is assigned. When the user reruns it with the same root `seed`, they >>> will get the same results. When the user chooses a different `seed`, they >>> will get another set of results (this is why you don't want to just use >>> `SeedSequence(stronghash(key))` all by itself). >>> >>> I put the job-specific seed data ahead of the main program's seed to be >>> on the super-safe side. The spawning mechanism will append integers to the >>> end, so there's a super-tiny chance somewhere down a long line of >>> `root_ss.spawn()`s that there would be a collision (and I mean >>> super-extra-tiny). But best practices cost nothing. >>> >>> I hope that helps and is not too confusing! >>> >>> -- >>> Robert Kern >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Sat Aug 28 15:19:40 2021 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Sat, 28 Aug 2021 21:19:40 +0200 Subject: [Numpy-discussion] Announcing PyData/Sparse 0.13.0 Message-ID: Hello everyone. I?m happy to announce version 0.13.0 of PyData/Sparse. PyData/Sparse provides sparse arrays for the PyData ecosystem: It mimics the NumPy API but provides sparse storage. 
Version 0.13.0 was mainly a bugfix-centred release, fixing many bugs and regressions reported by users for versions 0.12.0 and 0.11.0. Python 3.6 was dropped for this release, and version 3.9 support was formally added. Some minor features were also added. Source Code: https://github.com/pydata/sparse/tree/0.13.0 Documentation: https://sparse.pydata.org/en/0.13.0/ Changelog: https://sparse.pydata.org/en/0.13.0/changelog.html Best regards, Hameer Abbasi -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From robert.kern at gmail.com Sat Aug 28 20:41:40 2021 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 28 Aug 2021 20:41:40 -0400 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: On Sat, Aug 28, 2021 at 5:56 AM Stig Korsnes wrote: > Thank you again Robert. > I am using NamedTuple for mye keys, which also are keys in a dictionary. > Each key will be unique (tuple on distinct int and enum), so I am thinking > maybe the risk of producing duplicate hash is not present, but could as > always be wrong :) > Present, but possibly ignorably small. 128-bit spaces give enough breathing room for me to be comfortable; 64-bit spaces like what hash() will use for its results makes me just a little claustrophobic. If the structure of the keys is pretty fixed, just these two integers (counting the enum as an integer), then I might just use both in the seeding material. 
def get_key_seed(key:ComponentId, root_seed:int): return np.random.SeedSequence([key.the_int, int(key.the_enum), root_seed]) > For positive ints i followed this tip > https://stackoverflow.com/questions/18766535/positive-integer-from-python-hash-function > , and did: > > def stronghash(key:ComponentId): > return ctypes.c_size_t(hash(key)).value > np.uint64(possibly_negative_integer) will also work for this purpose (somewhat more reliably). Since I will be using each process/random sample several times, and keeping > all of them in memory at once is not feasible (dimensionality) i did the > following: > > self._rng = default_rng(cs) > self._state = dict(self._rng.bit_generator.state) # > > def scenarios(self) -> npt.NDArray[np.float64]: > self._rng.bit_generator.state = self._state > .... > return .... > > Would you consider this bad practice, or an ok solution? > It's what that property is there for. No need to copy; `.state` creates a new dict each time. In a quick test, I measured a process with 1 million Generator instances to use ~1.5 GiB while 1 million state dicts ~1.0 GiB (including all of the other overhead of Python and numpy; not a scientific test). Storing just the BitGenerator is half-way in between. That's something, but not a huge win. If that is really crossing the border from feasible to infeasible, you may be about to run into your limits anyways for other reasons. So balance that out with the complications of swapping state in and out of a single instance. I Norway we have a saying which directly translates :" He asked for the > finger... and took the whole arm" . > Well, when I craft an overly-complicated system, I feel responsible to help shepherd people along in using it well. :-) -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stigkorsnes at gmail.com Sun Aug 29 06:56:33 2021 From: stigkorsnes at gmail.com (Stig Korsnes) Date: Sun, 29 Aug 2021 12:56:33 +0200 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: Thanks again Robert! Got rid of dict(state). Not sure I followed you completely on the test case. The "calculator" I am writing will, for the specific use case, depend on ~200-1000 processes. Each process object will return say 1m floats when its method scenario is called. If I am not mistaken, that would require 7-8 GiB just to keep these in memory. Furthermore, I would possibly have to add the size of the dependent calculation on these (but would likely aggregate outside of testing). A given object that depends on processes will calculate its results based on 1-4 of these processes (1-4 x 1m samples, non-multiproc), and will loop over objects with a process pool. So my reasoning is that running memory consumption would then be (1-4) x the size of 1m floats x processes + all the other overhead. Since sampling 1m normals is pretty fast, I can happily live with sampling (vs lookup in a presampled array), but since two objects might depend on the same process, they need the exact same array of samples. Hence the state. If I understood you correctly, another solution is to add another duplicate process with the same seed, instead of using one where I "reset" state. I promised that this could run on any laptop... søn. 29. aug. 2021 kl. 02:42 skrev Robert Kern : > On Sat, Aug 28, 2021 at 5:56 AM Stig Korsnes > wrote: > >> Thank you again Robert. >> I am using NamedTuple for my keys, which also are keys in a dictionary. >> Each key will be unique (tuple on distinct int and enum), so I am thinking >> maybe the risk of producing a duplicate hash is not present, but could as >> always be wrong :) >> > > Present, but possibly ignorably small.
128-bit spaces give enough > breathing room for me to be comfortable; 64-bit spaces like what hash() > will use for its results makes me just a little claustrophobic. > > If the structure of the keys is pretty fixed, just these two integers > (counting the enum as an integer), then I might just use both in the > seeding material. > > def get_key_seed(key:ComponentId, root_seed:int): > return np.random.SeedSequence([key.the_int, int(key.the_enum), > root_seed]) > > >> For positive ints i followed this tip >> https://stackoverflow.com/questions/18766535/positive-integer-from-python-hash-function >> , and did: >> >> def stronghash(key:ComponentId): >> return ctypes.c_size_t(hash(key)).value >> > > np.uint64(possibly_negative_integer) will also work for this purpose > (somewhat more reliably). > > Since I will be using each process/random sample several times, and >> keeping all of them in memory at once is not feasible (dimensionality) i >> did the following: >> >> self._rng = default_rng(cs) >> self._state = dict(self._rng.bit_generator.state) # >> >> def scenarios(self) -> npt.NDArray[np.float64]: >> self._rng.bit_generator.state = self._state >> .... >> return .... >> >> Would you consider this bad practice, or an ok solution? >> > > It's what that property is there for. No need to copy; `.state` creates a > new dict each time. > > In a quick test, I measured a process with 1 million Generator instances > to use ~1.5 GiB while 1 million state dicts ~1.0 GiB (including all of the > other overhead of Python and numpy; not a scientific test). Storing just > the BitGenerator is half-way in between. That's something, but not a huge > win. If that is really crossing the border from feasible to infeasible, you > may be about to run into your limits anyways for other reasons. So balance > that out with the complications of swapping state in and out of a single > instance. > > I Norway we have a saying which directly translates :" He asked for the >> finger... 
and took the whole arm" . >> > > Well, when I craft an overly-complicated system, I feel responsible to > help shepherd people along in using it well. :-) > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stigkorsnes at gmail.com Sun Aug 29 06:57:46 2021 From: stigkorsnes at gmail.com (Stig Korsnes) Date: Sun, 29 Aug 2021 12:57:46 +0200 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: And big kudos for building AND shepherding :) s?n. 29. aug. 2021 kl. 12:56 skrev Stig Korsnes : > Thanks again Robert! > Got rid of dict(state). > > Not sure I followed you completely on the test case. The "calculator" i am > writing , will for the specific use case depend on ~200-1000 processes. > Each process object will return say 1m floats when its method scenario is > called. If I am not mistaken, that would require 7-8GiB just to keep the > these in memory. Furthermore I would possibly have to add the size of the > dependent calculation on these (but would likely aggregate outside of > testing). A given object that depends on processes will calculate its > results based on 1-4 (1-4 *1m of these processes (non multiproc)), and > will loop over objects with processpool. So my reasoning is that running > memory consumption would then be (1-4)*size of 1m floats x processes + all > of other overhead. Since sampling 1m normals is pretty fast, I can happily > live with sampling (vs lookup in presampled array), but since two object > might depend on the same process they need the exact same array of samples. > Hence the state. If I understood you correctly, another solution is to add > another duplicate process with same seed, instead of using one where i > "reset" state. > > I promised that this could run on any laptop.. 
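As an aside to the ctypes trick quoted above: the same non-negative integer can be obtained by masking the hash to 64 bits, with no extra imports. A hypothetical variant of the thread's `stronghash`, combined with a root seed as recommended earlier (names and seed values are made up for illustration):

```python
from numpy.random import SeedSequence, default_rng

def stronghash(key):
    # Fold Python's possibly-negative hash() into [0, 2**64) -- the same
    # wrap-around that ctypes.c_size_t or np.uint64 performs on 64-bit builds.
    return hash(key) & 0xFFFF_FFFF_FFFF_FFFF

root_seed = 20210829   # the user-settable main seed (hypothetical value)
key = (17, 3)          # hypothetical (component id, enum value) pair

# Key digest first, then the root seed, per the advice in the thread.
child_ss = SeedSequence([stronghash(key), root_seed])
rng = default_rng(child_ss)
draws = rng.standard_normal(1000)  # can be regenerated identically on demand
```

Because the seed material depends only on `key` and `root_seed`, the same draws come back no matter how many other processes exist or in what order they were created.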
> > > > s?n. 29. aug. 2021 kl. 02:42 skrev Robert Kern : > >> On Sat, Aug 28, 2021 at 5:56 AM Stig Korsnes >> wrote: >> >>> Thank you again Robert. >>> I am using NamedTuple for mye keys, which also are keys in a dictionary. >>> Each key will be unique (tuple on distinct int and enum), so I am thinking >>> maybe the risk of producing duplicate hash is not present, but could as >>> always be wrong :) >>> >> >> Present, but possibly ignorably small. 128-bit spaces give enough >> breathing room for me to be comfortable; 64-bit spaces like what hash() >> will use for its results makes me just a little claustrophobic. >> >> If the structure of the keys is pretty fixed, just these two integers >> (counting the enum as an integer), then I might just use both in the >> seeding material. >> >> def get_key_seed(key:ComponentId, root_seed:int): >> return np.random.SeedSequence([key.the_int, int(key.the_enum), >> root_seed]) >> >> >>> For positive ints i followed this tip >>> https://stackoverflow.com/questions/18766535/positive-integer-from-python-hash-function >>> , and did: >>> >>> def stronghash(key:ComponentId): >>> return ctypes.c_size_t(hash(key)).value >>> >> >> np.uint64(possibly_negative_integer) will also work for this purpose >> (somewhat more reliably). >> >> Since I will be using each process/random sample several times, and >>> keeping all of them in memory at once is not feasible (dimensionality) i >>> did the following: >>> >>> self._rng = default_rng(cs) >>> self._state = dict(self._rng.bit_generator.state) # >>> >>> def scenarios(self) -> npt.NDArray[np.float64]: >>> self._rng.bit_generator.state = self._state >>> .... >>> return .... >>> >>> Would you consider this bad practice, or an ok solution? >>> >> >> It's what that property is there for. No need to copy; `.state` creates a >> new dict each time. 
>> >> In a quick test, I measured a process with 1 million Generator instances >> to use ~1.5 GiB while 1 million state dicts ~1.0 GiB (including all of the >> other overhead of Python and numpy; not a scientific test). Storing just >> the BitGenerator is half-way in between. That's something, but not a huge >> win. If that is really crossing the border from feasible to infeasible, you >> may be about to run into your limits anyways for other reasons. So balance >> that out with the complications of swapping state in and out of a single >> instance. >> >> I Norway we have a saying which directly translates :" He asked for the >>> finger... and took the whole arm" . >>> >> >> Well, when I craft an overly-complicated system, I feel responsible to >> help shepherd people along in using it well. :-) >> >> -- >> Robert Kern >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Aug 29 10:06:57 2021 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 29 Aug 2021 10:06:57 -0400 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: On Sun, Aug 29, 2021 at 6:58 AM Stig Korsnes wrote: > Thanks again Robert! > Got rid of dict(state). > > Not sure I followed you completely on the test case. > In the code that you showed, you were pulling out and storing the `.state` dict and then punching that back into a single `Generator` instance. Instead, you can just make the ~200-1000 `Generator` instances. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stigkorsnes at gmail.com Sun Aug 29 10:54:25 2021 From: stigkorsnes at gmail.com (Stig Korsnes) Date: Sun, 29 Aug 2021 16:54:25 +0200 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: I am indeed making ~200-1000 generator instances. As many as I have processes. Each process is an instance of a component class, which has a generator. Every time I ask this process for 1m numbers, I need the same 1m numbers. I could instead make a new generator with the same seed every time I ask for the 1m numbers, but presumed that this would be more computationally expensive than setting state on an existing generator. Thank you, Robert. Best, Stig søn. 29. aug. 2021 kl. 16:08 skrev Robert Kern : > On Sun, Aug 29, 2021 at 6:58 AM Stig Korsnes > wrote: > >> Thanks again Robert! >> Got rid of dict(state). >> >> Not sure I followed you completely on the test case. >> > > In the code that you showed, you were pulling out and storing the `.state` > dict and then punching that back into a single `Generator` instance. > Instead, you can just make the ~200-1000 `Generator` instances. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Aug 29 12:27:46 2021 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 29 Aug 2021 12:27:46 -0400 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: On Sun, Aug 29, 2021 at 10:55 AM Stig Korsnes wrote: > I am indeed making ~200-1000 generator instances. As many as I have > processes. Each process is an instance of a component class, which has a > generator. Every time I ask this process for 1m numbers, I need the same 1m > numbers.
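The trade-off under discussion — rewinding one Generator by restoring saved state, versus simply rebuilding a Generator from the same seed — can be seen in a few lines. A sketch with made-up seed material, not the poster's actual code:

```python
import numpy as np
from numpy.random import SeedSequence, default_rng

seed_material = [42, 987654321]  # e.g. [stronghash(key), root_seed]

# Option 1: keep one Generator and rewind it by restoring saved state.
rng = default_rng(SeedSequence(seed_material))
saved_state = rng.bit_generator.state   # .state returns a fresh dict
first = rng.standard_normal(5)
rng.bit_generator.state = saved_state   # rewind
again = rng.standard_normal(5)

# Option 2: rebuild the Generator from the same seed each time it's needed.
rebuilt = default_rng(SeedSequence(seed_material)).standard_normal(5)

assert np.array_equal(first, again)
assert np.array_equal(first, rebuilt)
```

Both options return identical draws; as the reply below notes, the cost of reconstructing the Generator is negligible next to drawing 1m samples, so rebuilding from the key is the simpler design.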
I could instead make a new generator with same seed every time I > ask for for the 1m numbers, but presumed that this would be more > computationally expensive than setting state on an existing generator. > Nominally, but it's overwhelmed by the actual computation. You will have less to juggle if you just compute it from the key each time. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From stigkorsnes at gmail.com Sun Aug 29 13:41:14 2021 From: stigkorsnes at gmail.com (Stig Korsnes) Date: Sun, 29 Aug 2021 19:41:14 +0200 Subject: [Numpy-discussion] SeedSequence.spawn() In-Reply-To: References: Message-ID: Agreed, I already have a flag on the class to toggle fixed "state". Could just set self._rng instead of its state. Will check it out. Must say, had not in my wildest dreams expected such help on any given Sunday. Have a great day and week, sir. Best, Stig s?n. 29. aug. 2021, 18:29 skrev Robert Kern : > On Sun, Aug 29, 2021 at 10:55 AM Stig Korsnes > wrote: > >> I am indeed making ~200-1000 generator instances.As many as I have >> processes. Each process is an instance of a component class , which has a >> generator. Every time i ask this process for 1m numbers, i need the same 1m >> numbers. I could instead make a new generator with same seed every time I >> ask for for the 1m numbers, but presumed that this would be more >> computationally expensive than setting state on an existing generator. >> > > Nominally, but it's overwhelmed by the actual computation. You will have > less to juggle if you just compute it from the key each time. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yashbsr3 at gmail.com Tue Aug 31 06:07:14 2021 From: yashbsr3 at gmail.com (Yash Tewatia) Date: Tue, 31 Aug 2021 15:37:14 +0530 Subject: [Numpy-discussion] New Feature added to rotate MeshGrid Message-ID: Hi, it is my first contribution to the open-source community, I have tried to fix issue #19315, which is to add a new feature of rotating mesh grid in NumPy. It would be great if I get improvements and suggestions for it. Added functionality of rotating mesh grid which fixes #19315 issue, adds a feature which works as follows Parameters ----------- xspan : Input_array range of values of x in the unrotated matrix. yspan : Input_array range of values of y in the unrotated matrix. angle : float or int Angle of rotation, positive for clockwise rotation, negative for anti-clockwise rotation. boolRad : bool True if the given angle is in Radians, False if given angle is in Degrees. Returns -------- out : ndarray A new nested array is generated by Einstein Summation after rotation is applied. Examples ---------- >>> xspan = np.linspace(-2*np.pi, 2*np.pi, 3) >>> yspan = np.linspace(-2*np.pi, 2*np.pi, 3) >>> arr = np.rotateMeshgrid(xspan,yspan,0.4) >>> arr array([[[-6.32689674, -0.04386455, 6.23916764], [-6.28303219, 0. , 6.28303219], [-6.23916764, 0.04386455, 6.32689674]], [[-6.23916764, -6.28303219, -6.32689674], [ 0.04386455, 0. , -0.04386455], [ 6.32689674, 6.28303219, 6.23916764]]]) >>> arr = np.rotateMeshgrid(xspan,yspan,80,False) >>> arr array([[[ 6.9383701 , 6.24478659, 5.55120308], [ 0.69358351, 0. , -0.69358351], [-5.55120308, -6.24478659, -6.9383701 ]], [[-5.55120308, 0.69358351, 6.9383701 ], [-6.24478659, 0. , 6.24478659], [-6.9383701 , -0.69358351, 5.55120308]]]) Code snippet: ( https://user-images.githubusercontent.com/60055574/131483810-05012ab1-8971-4a94-8b1c-8f11d13b7c02.png ) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Tue Aug 31 06:29:38 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 31 Aug 2021 12:29:38 +0200 Subject: [Numpy-discussion] New Feature added to rotate MeshGrid In-Reply-To: References: Message-ID: On Tue, Aug 31, 2021 at 12:07 PM Yash Tewatia wrote: > Hi, it is my first contribution to the open-source community, I have tried > to fix issue #19315, which is to add a new feature of rotating mesh grid in > NumPy. It would be great if I get improvements and suggestions for it. > Hi Yash, welcome! That issue is a feature request that did not get a response yet. It seems to me like we should reject it, because rotating a grid is a fairly easy thing to do, as the Stack Overflow discussion linked from https://github.com/numpy/numpy/issues/19315 shows. Cheers, Ralf > Added functionality of rotating mesh grid which fixes #19315 issue, adds a > feature which works as follows > Parameters > ----------- > xspan : Input_array > range of values of x in the unrotated matrix. > yspan : Input_array > range of values of y in the unrotated matrix. > angle : float or int > Angle of rotation, positive for clockwise rotation, negative for > anti-clockwise rotation. > boolRad : bool > True if the given angle is in Radians, False if given angle is > in Degrees. > Returns > -------- > out : ndarray > A new nested array is generated by Einstein Summation after > rotation is applied. > > Examples > ---------- > >>> xspan = np.linspace(-2*np.pi, 2*np.pi, 3) > >>> yspan = np.linspace(-2*np.pi, 2*np.pi, 3) > >>> arr = np.rotateMeshgrid(xspan,yspan,0.4) > >>> arr > array([[[-6.32689674, -0.04386455, 6.23916764], > [-6.28303219, 0. , 6.28303219], > [-6.23916764, 0.04386455, 6.32689674]], > > [[-6.23916764, -6.28303219, -6.32689674], > [ 0.04386455, 0. , -0.04386455], > [ 6.32689674, 6.28303219, 6.23916764]]]) > > >>> arr = np.rotateMeshgrid(xspan,yspan,80,False) > >>> arr > array([[[ 6.9383701 , 6.24478659, 5.55120308], > [ 0.69358351, 0. 
, -0.69358351], > [-5.55120308, -6.24478659, -6.9383701 ]], > > [[-5.55120308, 0.69358351, 6.9383701 ], > [-6.24478659, 0. , 6.24478659], > [-6.9383701 , -0.69358351, 5.55120308]]]) > > Code snippet: > ( > https://user-images.githubusercontent.com/60055574/131483810-05012ab1-8971-4a94-8b1c-8f11d13b7c02.png > ) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Tue Aug 31 13:27:55 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Tue, 31 Aug 2021 14:27:55 -0300 Subject: [Numpy-discussion] New CZI grant to support DEI initiatives in the scientific Python ecosystem Message-ID: We are happy to announce the Chan Zuckerberg Initiative has awarded a grant to support the onboarding, inclusion, and retention of people from historically marginalized groups on scientific Python projects, and to structurally improve the community dynamics for NumPy, SciPy, Matplotlib, and Pandas. As a part of CZI?s Essential Open Source Software for Science program [1], this Diversity & Inclusion supplemental grant [2] will support the creation of dedicated Contributor Experience Lead positions to identify, document, and implement practices to foster inclusive open-source communities. This project will be led by Melissa Mendon?a (NumPy), with additional mentorship and guidance provided by Ralf Gommers (NumPy, SciPy), Hannah Aizenman and Thomas Caswell (Matplotlib), Matt Haberland (SciPy), and Joris Van den Bossche (Pandas). This is an ambitious project aiming to discover and implement activities that should structurally improve the community dynamics of our projects. 
By establishing these new cross-project roles, we hope to introduce a new collaboration model to the Scientific Python communities, allowing community-building work within the ecosystem to be done more efficiently and with greater outcomes. We also expect to develop a clearer picture of what works and what doesn't in our projects to engage and retain new contributors, especially from historically underrepresented groups. Finally, we plan on producing detailed reports on the actions executed, explaining how they have impacted our projects in terms of representation and interaction with our communities. The two-year project is expected to start by November 2021, and we are excited to see the results from this work! You can read the full proposal on figshare [3] and see this announcement at the NumPy site [4]. Cheers! - Melissa [1] https://chanzuckerberg.com/eoss/ [2] https://cziscience.medium.com/advancing-diversity-and-inclusion-in-scientific-open-source-eaabe6a5488b [3] https://figshare.com/articles/online_resource/Advancing_an_inclusive_culture_in_the_scientific_Python_ecosystem/16548063 [4] https://numpy.org/news/#advancing-an-inclusive-culture-in-the-scientific-python-ecosystem -------------- next part -------------- An HTML attachment was scrubbed... URL: