From melissawm at gmail.com Sun May 2 09:34:41 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Sun, 2 May 2021 10:34:41 -0300 Subject: [Numpy-discussion] GSoD 2021 - Statement of Interest In-Reply-To: References: Message-ID: Hello, Mahesh While you work on this, it may be interesting to keep in mind that we are looking for high-impact work that can be done in the timeframe of the GSoD program - examples would be reorganizing content in a section of the documentation, creating new complete document pages on some subject or concept. It may be worth familiarizing yourself with the documentation (I won't suggest reading all of it as it's huge!) to get an idea for the different sections. One idea would be to focus on the user guide, which contains several sub-pages, and check for improvements that can be done there. Cheers, Melissa On Fri, Apr 30, 2021 at 9:28 AM Mahesh S wrote: > Hi there, > > The current documentation of NumPy is really good but a bit more > improvement can be made to it, which is the prime objective of my project. > The improvements which I mentioned in my brief proposal are strategies that > can be applied to every documentation to make it better. > > Apart from general improvements most the documentation related issues in > the NumPy's GitHub issue tracker > will > be addressed. . Some needs more technical information and help from the > community. Some are due to the lack of visual aids. Most of them will be > addressed and improvements will be made such that similar issues will not > be generated in future. > Rearranging of sections in the User Guide > can be done > after further discussions > > Some examples regarding the need of restructuring and duplication are > given in the attached document. Apologies in advance if my observations are > inaccurate or nitpicks. The given doc is just a very brief one. An in-depth > proposal with all the planned changes along with solutions to close as many > issues in the tracker will be prepared. I am getting familiar with the > community and NumPy. > > > On Thu, Apr 29, 2021 at 9:54 PM Melissa Mendon?a > wrote: > >> Hello, Mahesh >> >> Thank you for your proposal. One thing I would say is that some of the >> things you are suggesting are already there - for example, the How-tos and >> Explanations section - but I agree they can be improved! >> >> Another question I could pose is about duplication of content: can you >> give examples of duplicated content that you found in the docs? >> >> Cheers, >> >> Melissa >> >> On Wed, Apr 28, 2021 at 12:33 AM Mahesh S >> wrote: >> >>> Hello, >>> >>> I am Mahesh from India. I am interested in doing Google Season of Docs >>> with NumPy on the project HIGH-LEVEL RESTRUCTURING AND END-USER FOCUS - >>> NumPy >>> >>> I have past experience in documentation writing with WordPress and have >>> completed *Google Summer of Code 2018 *with KDE. I have been an open >>> source enthusiast and contributor since 2017.My past experience with coding >>> ,code documentation , understanding code bases ,will help achieve this task >>> without much input from your end. I have about four years experience >>> working with Open-Source and currently working as a *Quality >>> Assurance specialis*t . I have delivered technical talks at >>> international conferences (KDE-Akademy) and was *mentor of Google >>> Code-In* also. All these makes me the best candidate for the project. >>> >>> I am attaching a brief proposal for the work along with this mail. 
A >>> more in-depth A more in depth proposal with timelines , all planned >>> strategies , document structures will be submitted after more discussion >>> with the mentors and community. >>> I am ready to start the work as soon as possible and will be able work 7 >>> to 9 hours per day including weekends. I strongly wish to be part of the >>> NumPy community even after the project. >>> >>> >>> Hope to hear from you soon >>> >>> -- >>> Thank you >>> Mahesh S Nair >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > > -- > Thank you > Mahesh S Nair > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mahesh6947foss at gmail.com Sun May 2 13:06:37 2021 From: mahesh6947foss at gmail.com (Mahesh S) Date: Sun, 2 May 2021 22:36:37 +0530 Subject: [Numpy-discussion] GSoD 2021 - Statement of Interest In-Reply-To: References: Message-ID: Hello there Melissa, Thank You for your valuable suggestion.Actually , I am planning for a high-impact work. I have already started reading the User-Guide and working out small projects using NumPy to understand it further, to prepare an in-depth proposal, which will include changes which I mentioned in my brief proposal, reorganizing content, addressing issues in the GitHub tracker and all. One doubt from my side is regarding the timeline. Currently I am planning to prepare the proposal with the GSoD Timeline that is from June 16 2021 and ends on November 16th 2021 with four evaluation phases , Is there any specific timeline from the community or Am I free to follow the same? And as of now, I am planning to set an order in which the tasks needed to be done , which I will share in the first draft of my proposal which I will submit by this week, later changes can be done as per suggestions from the mentors and community. Thank You Mahesh On Sun, May 2, 2021 at 7:05 PM Melissa Mendon?a wrote: > Hello, Mahesh > > While you work on this, it may be interesting to keep in mind that we are > looking for high-impact work that can be done in the timeframe of the GSoD > program - examples would be reorganizing content in a section of the > documentation, creating new complete document pages on some subject or > concept. It may be worth familiarizing yourself with the documentation (I > won't suggest reading all of it as it's huge!) to get an idea for the > different sections. One idea would be to focus on the user guide, which > contains several sub-pages, and check for improvements that can be done > there. > > Cheers, > > Melissa > > On Fri, Apr 30, 2021 at 9:28 AM Mahesh S wrote: > >> Hi there, >> >> The current documentation of NumPy is really good but a bit more >> improvement can be made to it, which is the prime objective of my project. >> The improvements which I mentioned in my brief proposal are strategies that >> can be applied to every documentation to make it better. >> >> Apart from general improvements most the documentation related issues in >> the NumPy's GitHub issue tracker >> will >> be addressed. . 
Some needs more technical information and help from the >> community. Some are due to the lack of visual aids. Most of them will be >> addressed and improvements will be made such that similar issues will not >> be generated in future. >> Rearranging of sections in the User Guide >> can be done >> after further discussions >> >> Some examples regarding the need of restructuring and duplication are >> given in the attached document. Apologies in advance if my observations are >> inaccurate or nitpicks. The given doc is just a very brief one. An in-depth >> proposal with all the planned changes along with solutions to close as many >> issues in the tracker will be prepared. I am getting familiar with the >> community and NumPy. >> >> >> On Thu, Apr 29, 2021 at 9:54 PM Melissa Mendon?a >> wrote: >> >>> Hello, Mahesh >>> >>> Thank you for your proposal. One thing I would say is that some of the >>> things you are suggesting are already there - for example, the How-tos and >>> Explanations section - but I agree they can be improved! >>> >>> Another question I could pose is about duplication of content: can you >>> give examples of duplicated content that you found in the docs? >>> >>> Cheers, >>> >>> Melissa >>> >>> On Wed, Apr 28, 2021 at 12:33 AM Mahesh S >>> wrote: >>> >>>> Hello, >>>> >>>> I am Mahesh from India. I am interested in doing Google Season of Docs >>>> with NumPy on the project HIGH-LEVEL RESTRUCTURING AND END-USER FOCUS - >>>> NumPy >>>> >>>> I have past experience in documentation writing with WordPress and have >>>> completed *Google Summer of Code 2018 *with KDE. I have been an open >>>> source enthusiast and contributor since 2017.My past experience with coding >>>> ,code documentation , understanding code bases ,will help achieve this task >>>> without much input from your end. I have about four years experience >>>> working with Open-Source and currently working as a *Quality >>>> Assurance specialis*t . I have delivered technical talks at >>>> international conferences (KDE-Akademy) and was *mentor of Google >>>> Code-In* also. All these makes me the best candidate for the project. >>>> >>>> I am attaching a brief proposal for the work along with this mail. A >>>> more in-depth A more in depth proposal with timelines , all planned >>>> strategies , document structures will be submitted after more discussion >>>> with the mentors and community. >>>> I am ready to start the work as soon as possible and will be able work >>>> 7 to 9 hours per day including weekends. I strongly wish to be part of the >>>> NumPy community even after the project. 
>>>> >>>> >>>> Hope to hear from you soon >>>> >>>> -- >>>> Thank you >>>> Mahesh S Nair >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> >> -- >> Thank you >> Mahesh S Nair >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Thank you Mahesh S Nair -------------- next part -------------- An HTML attachment was scrubbed... URL: From ronniegandhi19999 at gmail.com Mon May 3 03:21:03 2021 From: ronniegandhi19999 at gmail.com (Ronnie Gandhi) Date: Mon, 3 May 2021 12:51:03 +0530 Subject: [Numpy-discussion] Prospective technical writer for GSoD 2021 under NumPy Message-ID: Hello, I am Ronnie Gandhi, a Computer Science undergrad from IIT Roorkee in my final year.I am interested in working on NumPy's documentation problem statement "HIGH-LEVEL RESTRUCTURING AND END-USER FOCUS - NumPy". I have successfully completed *GSoD2020 *(my display name is Krezhairo) under *LibreOffice*, I have done* GSoC 2019*, and *2020 *under CGAL, and I have been contributing to open-source development for quite a while. I have also done Data Science internships at *Microsoft*, Hyderabad in 2019 and 2020. I am very well versed with GitHub, git, and CLI tools. Also from my past GSoD2020 experience I know Sphinx, Markdown, reStructuredText, XML as well. In terms of technical writing experience, - Under the Season of Docs 2020 program, I was given an opportunity to populate the wiki pages for the Calc functions of LibreOffice. I updated and populated the wiki pages for 340 of Calc functions in many categories for which the documentation was quite shallow and incomplete on the wiki pages. - For some of the examples of previous technical writing that I have successfully completed in GSoD 2020 under LibreOffice where I edited/populated 340+ sparse wiki pages of Calc functions. Some of the examples are as follows: - https://wiki.documentfoundation.org/Documentation/Calc_Functions/YIELDDISC - https://wiki.documentfoundation.org/Documentation/Calc_Functions/CHIDIST - https://wiki.documentfoundation.org/Documentation/Calc_Functions/SIN - https://wiki.documentfoundation.org/Documentation/Calc_Functions/DAVERAGE - https://wiki.documentfoundation.org/Documentation/Calc_Functions/Guidelines If you want to look into more examples, you can find them here: https://wiki.documentfoundation.org/Documentation/Calc_Functions/List_of_Functions (this is the new link where the "Find Functions" page is shifted sorry for the blank page) - I have also had created the developer docs as a part of my GSoC2019 and GsoC2020 program updating my progress under CGAL org. (some screenshots to the work since it is behind login wall )(although no important data) For my c++ experience, - My GSoC2019 and GSoC2020 projects were completely in C++ and Qt5 (my display name is Krezhairo). For my python experience, - As a data scientist intern at Microsoft, a large part of my work was in python and jupyter notebook. 
I have had multiple occasions to use NumPy. My GitHub link: https://github.com/RonnieGandhi I am attaching my detailed resume for your reference. Looking forward to hearing from you soon. Kind regards, Ronnie -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: short_resume.pdf Type: application/pdf Size: 47044 bytes Desc: not available URL: From mukulikapahari at gmail.com Mon May 3 08:05:39 2021 From: mukulikapahari at gmail.com (Mukulika Pahari) Date: Mon, 3 May 2021 17:35:39 +0530 Subject: [Numpy-discussion] GSoD '21 Statement of Interest Message-ID: Hello everyone, I am attaching my Statement of Interest for GSoD 2021 below. I would love to know everyone's thoughts on it. Thank you! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GSoD '21 Statement of Interest - Mukulika Pahari.pdf Type: application/pdf Size: 117451 bytes Desc: not available URL: From khkiran01 at gmail.com Mon May 3 16:09:30 2021 From: khkiran01 at gmail.com (KiranKhanna) Date: Tue, 4 May 2021 01:39:30 +0530 Subject: [Numpy-discussion] NumPy Proposal Message-ID: <95FD18FC-5E72-4085-9413-8B479849202B@hxcore.ol> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GSOD 2021 Proposal [NumPy].pdf Type: application/pdf Size: 161881 bytes Desc: not available URL: From mrahtz at google.com Tue May 4 16:17:03 2021 From: mrahtz at google.com (Matthew Rahtz) Date: Tue, 4 May 2021 21:17:03 +0100 Subject: [Numpy-discussion] Catching array shape errors using the type checker Message-ID: Hi all, Tl;dr: come to our talk *Catching Tensor Shape Errors Using the Type Checker* at the PyCon 2021 Typing Summit next week! https://us.pycon.org/2021/summits/typing/ Longer version: over the past year a small group of us have been thinking about how to use Python's type system to make tensor/array shape information more visible, and to catch shape errors automatically using type checkers. In this talk we'll discuss our tentative solution, talking about a) how it works, b) some examples of what it looks like in practice, and c) what its limitations are. Hope to see you there! Matthew and Pradeep -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Tue May 4 16:35:53 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Tue, 4 May 2021 17:35:53 -0300 Subject: [Numpy-discussion] GSoD 2021 - Statement of Interest In-Reply-To: References: Message-ID: Hello, Mahesh Yes, the timeline looks fine. Cheers, Melissa On Sun, May 2, 2021 at 2:08 PM Mahesh S wrote: > Hello there Melissa, > > Thank You for your valuable suggestion.Actually , I am planning for a > high-impact work. I have already started reading the User-Guide and working > out small projects using NumPy to understand it further, to prepare an > in-depth proposal, which will include changes which I mentioned in my brief > proposal, reorganizing content, addressing issues in the GitHub tracker and > all. One doubt from my side is regarding the timeline. Currently I am > planning to prepare the proposal with the GSoD Timeline that is from June > 16 2021 and ends on November 16th 2021 with four evaluation phases , > Is there any specific timeline from the community or Am I free to follow > the same? 
> > And as of now, I am planning to set an order in which the tasks needed to > be done , which I will share in the first draft of my proposal which I will > submit by this week, later changes can be done as per suggestions from the > mentors and community. > > Thank You > Mahesh > > On Sun, May 2, 2021 at 7:05 PM Melissa Mendon?a > wrote: > >> Hello, Mahesh >> >> While you work on this, it may be interesting to keep in mind that we are >> looking for high-impact work that can be done in the timeframe of the GSoD >> program - examples would be reorganizing content in a section of the >> documentation, creating new complete document pages on some subject or >> concept. It may be worth familiarizing yourself with the documentation (I >> won't suggest reading all of it as it's huge!) to get an idea for the >> different sections. One idea would be to focus on the user guide, which >> contains several sub-pages, and check for improvements that can be done >> there. >> >> Cheers, >> >> Melissa >> >> On Fri, Apr 30, 2021 at 9:28 AM Mahesh S >> wrote: >> >>> Hi there, >>> >>> The current documentation of NumPy is really good but a bit more >>> improvement can be made to it, which is the prime objective of my project. >>> The improvements which I mentioned in my brief proposal are strategies that >>> can be applied to every documentation to make it better. >>> >>> Apart from general improvements most the documentation related issues in >>> the NumPy's GitHub issue tracker >>> will >>> be addressed. . Some needs more technical information and help from the >>> community. Some are due to the lack of visual aids. Most of them will be >>> addressed and improvements will be made such that similar issues will not >>> be generated in future. >>> Rearranging of sections in the User Guide >>> can be done >>> after further discussions >>> >>> Some examples regarding the need of restructuring and duplication are >>> given in the attached document. Apologies in advance if my observations are >>> inaccurate or nitpicks. The given doc is just a very brief one. An in-depth >>> proposal with all the planned changes along with solutions to close as many >>> issues in the tracker will be prepared. I am getting familiar with the >>> community and NumPy. >>> >>> >>> On Thu, Apr 29, 2021 at 9:54 PM Melissa Mendon?a >>> wrote: >>> >>>> Hello, Mahesh >>>> >>>> Thank you for your proposal. One thing I would say is that some of the >>>> things you are suggesting are already there - for example, the How-tos and >>>> Explanations section - but I agree they can be improved! >>>> >>>> Another question I could pose is about duplication of content: can you >>>> give examples of duplicated content that you found in the docs? >>>> >>>> Cheers, >>>> >>>> Melissa >>>> >>>> On Wed, Apr 28, 2021 at 12:33 AM Mahesh S >>>> wrote: >>>> >>>>> Hello, >>>>> >>>>> I am Mahesh from India. I am interested in doing Google Season of Docs >>>>> with NumPy on the project HIGH-LEVEL RESTRUCTURING AND END-USER FOCUS - >>>>> NumPy >>>>> >>>>> I have past experience in documentation writing with WordPress and >>>>> have completed *Google Summer of Code 2018 *with KDE. I have been an >>>>> open source enthusiast and contributor since 2017.My past experience with >>>>> coding ,code documentation , understanding code bases ,will help achieve >>>>> this task without much input from your end. I have about four years >>>>> experience working with Open-Source and currently working as a *Quality >>>>> Assurance specialis*t . 
I have delivered technical talks at >>>>> international conferences (KDE-Akademy) and was *mentor of Google >>>>> Code-In* also. All these makes me the best candidate for the project. >>>>> >>>>> I am attaching a brief proposal for the work along with this mail. A >>>>> more in-depth A more in depth proposal with timelines , all planned >>>>> strategies , document structures will be submitted after more discussion >>>>> with the mentors and community. >>>>> I am ready to start the work as soon as possible and will be able work >>>>> 7 to 9 hours per day including weekends. I strongly wish to be part of the >>>>> NumPy community even after the project. >>>>> >>>>> >>>>> Hope to hear from you soon >>>>> >>>>> -- >>>>> Thank you >>>>> Mahesh S Nair >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at python.org >>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> >>> >>> -- >>> Thank you >>> Mahesh S Nair >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > > -- > Thank you > Mahesh S Nair > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bas.vanbeek at hotmail.com Tue May 4 16:37:04 2021 From: bas.vanbeek at hotmail.com (bas van beek) Date: Tue, 4 May 2021 20:37:04 +0000 Subject: [Numpy-discussion] Catching array shape errors using the type checker In-Reply-To: References: Message-ID: Hi Matthew and Pradeep, Do you know if the typing summit will be recorded and if these will be made available later? While I?d love to attend the live event, this is unfortunately not possible due to other obligations that day. Regards, Bas From: NumPy-Discussion On Behalf Of Matthew Rahtz Sent: 04 May 2021 22:17 To: numpy-discussion at python.org Cc: Pradeep Kumar Srinivasan Subject: [Numpy-discussion] Catching array shape errors using the type checker Hi all, Tl;dr: come to our talk Catching Tensor Shape Errors Using the Type Checker at the PyCon 2021 Typing Summit next week! https://us.pycon.org/2021/summits/typing/ Longer version: over the past year a small group of us have been thinking about how to use Python's type system to make tensor/array shape information more visible, and to catch shape errors automatically using type checkers. In this talk we'll discuss our tentative solution, talking about a) how it works, b) some examples of what it looks like in practice, and c) what its limitations are. Hope to see you there! Matthew and Pradeep -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From melissawm at gmail.com Tue May 4 16:42:53 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Tue, 4 May 2021 17:42:53 -0300 Subject: [Numpy-discussion] GSoD '21 Statement of Interest In-Reply-To: References: Message-ID: Thank you, Mukilika - I really like your ideas for tutorials! The timeline also looks fine to me. Cheers, Melissa On Mon, May 3, 2021 at 9:07 AM Mukulika Pahari wrote: > Hello everyone, > I am attaching my Statement of Interest for GSoD 2021 below. I would love > to know everyone's thoughts on it. > > Thank you! > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Tue May 4 16:56:46 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Tue, 4 May 2021 17:56:46 -0300 Subject: [Numpy-discussion] NumPy Proposal In-Reply-To: <95FD18FC-5E72-4085-9413-8B479849202B@hxcore.ol> References: <95FD18FC-5E72-4085-9413-8B479849202B@hxcore.ol> Message-ID: Hello, Kiran, thank you for your proposal. It would be good to see examples of technical writing - remember that the Season of Docs program is meant for people with some experience in that area, so if you have anything to share, please do. Cheers, Melissa On Mon, May 3, 2021 at 5:11 PM KiranKhanna wrote: > Respected ma?am / sir > > > > I am Kiran Khanna , I am a B.Tech second year Computer > science with AI engineering student . I am very interested to contribute in > the NumPy organization . I have attached my proposal below , Guide me how I > can improve myself . > > > > Thanking You, > > Your faithfully , > > Kiran Khanna > > > > Sent from Mail for > Windows 10 > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Tue May 4 17:04:39 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Tue, 4 May 2021 18:04:39 -0300 Subject: [Numpy-discussion] Prospective technical writer for GSoD 2021 under NumPy In-Reply-To: References: Message-ID: Hello, Ronnie Thank you for your interest! Based on our projects page, do you have any specific ideas for working on NumPy? Because the deadline for hiring technical writers is so close (May 17) it might be worth thinking of writing your Statement of Interest as soon as possible. Let me know if you have further questions. Cheers, Melissa On Mon, May 3, 2021 at 4:21 AM Ronnie Gandhi wrote: > Hello, > > I am Ronnie Gandhi, a Computer Science undergrad from IIT Roorkee in my > final year.I am interested in working on NumPy's documentation problem > statement "HIGH-LEVEL RESTRUCTURING AND END-USER FOCUS - NumPy". > > I have successfully completed *GSoD2020 > *(my > display name is Krezhairo) under *LibreOffice*, I have done* GSoC 2019*, > and *2020 *under CGAL, and I have been contributing to open-source > development for quite a while. I have also done Data Science internships at > *Microsoft*, Hyderabad in 2019 and 2020. > > I am very well versed with GitHub, git, and CLI tools. Also from my past > GSoD2020 experience I know Sphinx, Markdown, reStructuredText, XML as well. 
> > In terms of technical writing experience, > > - Under the Season of Docs 2020 program, I was given an opportunity to > populate the wiki pages for the Calc functions of LibreOffice. I updated > and populated the wiki pages for 340 of Calc functions in many categories > for which the documentation was quite shallow and incomplete on the wiki > pages. > - For some of the examples of previous technical writing that I > have successfully completed in GSoD 2020 under LibreOffice where I > edited/populated 340+ sparse wiki pages of Calc functions. Some of the > examples are as follows: > - > https://wiki.documentfoundation.org/Documentation/Calc_Functions/YIELDDISC > > - > https://wiki.documentfoundation.org/Documentation/Calc_Functions/CHIDIST > > - > https://wiki.documentfoundation.org/Documentation/Calc_Functions/SIN > > - > https://wiki.documentfoundation.org/Documentation/Calc_Functions/DAVERAGE > - > https://wiki.documentfoundation.org/Documentation/Calc_Functions/Guidelines > If you want to look into more examples, you can find them here: > https://wiki.documentfoundation.org/Documentation/Calc_Functions/List_of_Functions (this > is the new link where the "Find Functions" page is shifted sorry for the > blank page) > > > > - I have also had created the developer docs as a part of my GSoC2019 > and GsoC2020 program updating my progress under CGAL org. (some > screenshots to the work since it is behind login wall > )(although > no important data) > > For my c++ experience, > > - My GSoC2019 > > and GSoC2020 > projects > were completely in C++ and Qt5 (my display name is Krezhairo). > > For my python experience, > > - As a data scientist intern at Microsoft, a large part of my work was > in python and jupyter notebook. I have had multiple occasions to use NumPy. > > My GitHub link: https://github.com/RonnieGandhi > > I am attaching my detailed resume for your reference. > > Looking forward to hearing from you soon. > > Kind regards, > Ronnie > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From khkiran01 at gmail.com Tue May 4 23:09:17 2021 From: khkiran01 at gmail.com (KiranKhanna) Date: Wed, 5 May 2021 08:39:17 +0530 Subject: [Numpy-discussion] NumPy Proposal In-Reply-To: References: <95FD18FC-5E72-4085-9413-8B479849202B@hxcore.ol>, Message-ID: <6FFB20DB-3094-481A-AE35-9D8925841802@hxcore.ol> An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed May 5 00:28:25 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 04 May 2021 23:28:25 -0500 Subject: [Numpy-discussion] NumPy Development Meeting Wednesday - Triage Focus Message-ID: Hi all, Our bi-weekly triage-focused NumPy development meeting is Wednesday, May 5th at 11 am Pacific Time (18:00 UTC). Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg I encourage everyone to notify us of issues or PRs that you feel should be prioritized, discussed, or reviewed. 
Best regards Sebastian From matti.picus at gmail.com Thu May 6 06:40:51 2021 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 6 May 2021 13:40:51 +0300 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies Message-ID: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> Here is the current rendering of the NEP:https://numpy.org/neps/nep-0049.html The mailing list discussion, started on April 20 did not bring up any objections to the proposal, nor were there objections in the discussion around the text of the NEP. There were questions around details of the implementation, thank you reviewers for carefully looking at them and suggesting improvements. If there are no substantive objections within 7 days from this email, then the NEP will be accepted; see NEP 0 for more details. Matti From wieser.eric+numpy at gmail.com Thu May 6 07:07:26 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 6 May 2021 12:07:26 +0100 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> Message-ID: The NEP looks good, but I worry the API isn't flexible enough. My two main concerns are: ### Stateful allocators Consider an allocator that aligns to `N` bytes, where `N` is configurable from a python call in someone else's extension module. Where do they store `N`? They can hide it in `PyDataMem_Handler::name` but that's obviously an abuse of the API. They can store it as a global variable, but then obviously the idea of tracking the allocator used to construct an array doesn't work, as the state ends up changing with the global allocator. The easy way out here would be to add a `void* context` field to the structure, and pass it into all the methods. This doesn't really solve the problem though, as now there's no way to cleanup any allocations used to populate `context`, or worse decrement references to python objects stored within `context`. I think we want to bundle `PyDataMem_Handler` in a `PyObject` somehow, either via a new C type, or by using the PyCapsule API which has the cleanup and state hooks we need. `PyDataMem_GetHandlerName` would then return this PyObject rather than an opaque name. For a more exotic case - consider a file-backed allocator, that is constructed from a python `mmap` object which manages blocks within that mmap. The allocator needs to keep a reference to the `mmap` object alive until all the arrays allocated within it are gone, but probably shouldn't leak a reference to it either. ### Thread and async-local allocators For tracing purposes, I expect it to be valuable to be able to configure the allocator within a single thread / coroutine. If we want to support this, we'd most likely want to work with the PEP567 ContextVar API rather than a half-baked thread_local solution that doesn't work for async code. This problem isn't as pressing as the statefulness problem. Fixing it would amount to extending the `PyDataMem_SetHandler` API, and would be unlikely to break any code written against the current version of the NEP; meaning it would be fine to leave as a follow-up. It might still be worth remarking upon as future work of some kind in the NEP. 
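For concreteness, here is a minimal sketch - not part of the NEP - of what a context-local lookup inside NumPy could look like on the C side using the PEP 567 C API. The function `get_current_handler`, the variables `current_handler_var` and `default_handler_capsule`, and the capsule name are all hypothetical; only the `PyDataMem_Handler` struct is taken from the NEP draft:

```C
#include <Python.h>

/* Hypothetical globals: `current_handler_var` would be created once with
 * PyContextVar_New() at module init, and `default_handler_capsule` would
 * be a PyCapsule wrapping the default PyDataMem_Handler. */
static PyObject *current_handler_var;
static PyObject *default_handler_capsule;

static PyDataMem_Handler *
get_current_handler(void)
{
    PyObject *capsule = NULL;
    /* Looks up whatever was installed via PyContextVar_Set() in the
     * *current* context (thread or coroutine), falling back to the
     * process-wide default handler otherwise. */
    if (PyContextVar_Get(current_handler_var,
                         default_handler_capsule, &capsule) < 0) {
        return NULL;  /* error already set */
    }
    /* "mem_handler" is a made-up capsule name for this sketch. */
    PyDataMem_Handler *handler = PyCapsule_GetPointer(capsule, "mem_handler");
    Py_DECREF(capsule);  /* PyContextVar_Get returned a new reference */
    return handler;
}
```

The same lookup would work whether the context variable holds a capsule, as here, or a future handler PyObject of the kind suggested above.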
Eric On Thu, 6 May 2021 at 11:41, Matti Picus wrote: > Here is the current rendering of the > NEP:https://numpy.org/neps/nep-0049.html > > > > The mailing list discussion, started on April 20 did not bring up any > objections to the proposal, nor were there objections in the discussion > around the text of the NEP. There were questions around details of the > implementation, thank you reviewers for carefully looking at them and > suggesting improvements. > > > If there are no substantive objections within 7 days from this email, > then the NEP will be accepted; see NEP 0 for more details. > > > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Thu May 6 07:43:21 2021 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 6 May 2021 14:43:21 +0300 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> Message-ID: On 6/5/21 2:07 pm, Eric Wieser wrote: > The NEP looks good, but I worry the API isn't flexible enough. My two > main concerns are: > > ### Stateful allocators > > Consider an allocator that aligns to `N` bytes, where `N` is > configurable from a python call in someone else's?extension module. > ... > > ### Thread and async-local allocators > > For tracing purposes, I expect it to be valuable to be able to > configure the allocator within a single thread / coroutine. > If we want to support this, we'd most likely want to work with the > PEP567 ContextVar API rather than a half-baked thread_local solution > that doesn't work for async code. > > This problem isn't as pressing as the statefulness problem. > Fixing it would amount to extending the `PyDataMem_SetHandler` API, > and would be unlikely to break any code written against the current > version of the NEP; meaning it would be fine to leave as a follow-up. > It might still be worth remarking upon as future work of some kind in > the NEP. > > I would prefer to leave both of these to a future extension for the NEP. Setting the alignment from a python-level call seems to be asking for trouble, and I would need to be convinced that the extra layer of flexibility is worth it. It might be worth mentioning that this NEP may be extended in the future, but truthfully I think that is the case for all NEPs. Matti From wieser.eric+numpy at gmail.com Thu May 6 08:06:08 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 6 May 2021 13:06:08 +0100 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> Message-ID: Another argument for supporting stateful allocators would be compatibility with the stateful C++11 allocator API, such as https://en.cppreference.com/w/cpp/memory/allocator_traits/allocate. Adding support for stateful allocators at a later date would almost certainly create an ABI breakage or lots of pain around avoiding one. I haven't thought very much about the PyCapsule approach (although it appears some other reviewers on github considered it at one point), but even building it from scratch, the overhead to support statefulness is not large. 
As I demonstrate on the github issue (18805), would amount to changing the API from: ```C // the version in the NEP typedef void *(PyDataMem_AllocFunc)(size_t size); typedef void *(PyDataMem_ZeroedAllocFunc)(size_t nelems, size_t elsize); typedef void (PyDataMem_FreeFunc)(void *ptr, size_t size); typedef void *(PyDataMem_ReallocFunc)(void *ptr, size_t size); typedef struct { char name[200]; PyDataMem_AllocFunc *alloc; PyDataMem_ZeroedAllocFunc *zeroed_alloc; PyDataMem_FreeFunc *free; PyDataMem_ReallocFunc *realloc; } PyDataMem_HandlerObject; const PyDataMem_Handler * PyDataMem_SetHandler(PyDataMem_Handler *handler); const char * PyDataMem_GetHandlerName(PyArrayObject *obj); ``` to ```C // proposed changes: a `PyObject *self` argument pointing to a `PyDataMem_HandlerObject` and a ` PyObject_HEAD` typedef void *(PyDataMem_AllocFunc)(PyObject *self, size_t size); typedef void *(PyDataMem_ZeroedAllocFunc)(PyObject *self, size_t nelems, size_t elsize); typedef void (PyDataMem_FreeFunc)(PyObject *self, void *ptr, size_t size); typedef void *(PyDataMem_ReallocFunc)(PyObject *self, void *ptr, size_t size); typedef struct { PyObject_HEAD PyDataMem_AllocFunc *alloc; PyDataMem_ZeroedAllocFunc *zeroed_alloc; PyDataMem_FreeFunc *free; PyDataMem_ReallocFunc *realloc; } PyDataMem_HandlerObject; // steals a reference to handler, caller is responsible for decrefing the result PyDataMem_Handler * PyDataMem_SetHandler(PyDataMem_Handler *handler); // borrowed reference PyDataMem_Handler * PyDataMem_GetHandler(PyArrayObject *obj); // some boilerplate that numpy is already full of and doesn't impact users of non-stateful allocators PyTypeObject PyDataMem_HandlerType = ...; ``` When constructing an array, the reference count of the handler would be incremented before storing it in the array struct Since the extra work now to support this is not awful, but the potential for ABI headaches down the road is, I think we should aim to support statefulness right from the start. The runtime overhead of the stateful approach above vs the NEP approach is negligible, and consists of: * Some overhead costs for setting up an allocator. This likely only happens near startup, so won't matter. * An extra incref on each array allocation * An extra pointer argument on the stack for each allocation and deallocation * Perhaps around 32 extra bytes per allocator objects. Since arrays just store pointers to allocators this doesn't matter. Eric On Thu, 6 May 2021 at 12:43, Matti Picus wrote: > > On 6/5/21 2:07 pm, Eric Wieser wrote: > > The NEP looks good, but I worry the API isn't flexible enough. My two > > main concerns are: > > > > ### Stateful allocators > > > > Consider an allocator that aligns to `N` bytes, where `N` is > > configurable from a python call in someone else's extension module. > > ... > > > > ### Thread and async-local allocators > > > > For tracing purposes, I expect it to be valuable to be able to > > configure the allocator within a single thread / coroutine. > > If we want to support this, we'd most likely want to work with the > > PEP567 ContextVar API rather than a half-baked thread_local solution > > that doesn't work for async code. > > > > This problem isn't as pressing as the statefulness problem. > > Fixing it would amount to extending the `PyDataMem_SetHandler` API, > > and would be unlikely to break any code written against the current > > version of the NEP; meaning it would be fine to leave as a follow-up. > > It might still be worth remarking upon as future work of some kind in > > the NEP. 
> > > > > I would prefer to leave both of these to a future extension for the NEP. > Setting the alignment from a python-level call seems to be asking for > trouble, and I would need to be convinced that the extra layer of > flexibility is worth it. > > > It might be worth mentioning that this NEP may be extended in the > future, but truthfully I think that is the case for all NEPs. > > > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Thu May 6 08:48:25 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Thu, 6 May 2021 09:48:25 -0300 Subject: [Numpy-discussion] Newcomer's Hour! Message-ID: Hello, folks! I've added a new event to the NumPy community calendar: the Newcomer's Hour. This is an event we've been running for a few months and is a bi-weekly informal meeting for newcomers to the NumPy community. This is a place to ask questions, technical or otherwise, meet other contributors, and (hopefully) figure out how to contribute :) We are having one today at 8pm UTC - here's the zoom link: https://zoom.us/j/6345425936. To follow all the events and meetings related to NumPy and add them to your own Google calendar, you can use this link: https://calendar.google.com/calendar/r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 Cheers! Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu May 6 10:20:18 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 6 May 2021 08:20:18 -0600 Subject: [Numpy-discussion] Accept NEP 35 as final Message-ID: Hi All, It is proposed to accept NEP 35 as final. It is discussed in issue #17075 . If there is no opposition I will put up a pull request in a week. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From legoff.ant at gmail.com Fri May 7 01:53:23 2021 From: legoff.ant at gmail.com (Anthony Le Goff) Date: Fri, 7 May 2021 07:53:23 +0200 Subject: [Numpy-discussion] Open for contribution Message-ID: Hi, I'm Anthony from Brest, Brittany, France. I've done my first PR for a documentation fix, and i'm still learning the workflow of github. I'm looking for to participate a open source project, i'm also a free software enthousiast. I discovered numpy while I try to find a solution in engineering computational science for hobbyist to replace MATLAB where i'm against the politic of licensing. I have a degree in mechanical & design engineering and i'm a self-taugh programmer. I'm still learning python and i know the language C. I get some knowledge in system administration in linux. I hope i can help to contribute on the code or the documentation. I never try to change a piece of code in a project, it would be my first time. I'm little bit confuse to do something wrong with a shitty code and broke something. I thing it is normal just try. My next step after python is learning data structure and algorithm then design pattern. I large program and i do that alone on my free time. I'm looking for a full time jobs also. I've seen numpy library is also coded in C. It's a good point to my knowledge. I was looking for some "first issue" but i'm not sure it's easy with 27 comments after 5 years on some issues. 
I'm open for large project to get experience in programming. Regards Anthony Le Goff -------------- next part -------------- An HTML attachment was scrubbed... URL: From verma16.ayush at gmail.com Fri May 7 09:07:16 2021 From: verma16.ayush at gmail.com (Ayush Verma) Date: Fri, 7 May 2021 18:37:16 +0530 Subject: [Numpy-discussion] GSoD'21- Submitting first draft of the Statement of Interest. Message-ID: Hello everyone, I have attached my Statement of Interest for GSoD'21 below. I will also be adding a couple more links to the statement within a day or two. Till then, I would love to get reviews from the community on the proposed changes. Cheers. Ayush. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GSoD '21_NumPy_R3.pdf Type: application/pdf Size: 499204 bytes Desc: not available URL: From melissawm at gmail.com Fri May 7 15:47:16 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Fri, 7 May 2021 16:47:16 -0300 Subject: [Numpy-discussion] Documentation Team meeting (and Google Season of Docs) - Monday May 10 In-Reply-To: References: Message-ID: Hi all! Our next Documentation Team meeting will be on *Monday, May 10* at ***4PM UTC***. All are welcome - you don't need to already be a contributor to join. If you have questions or are curious about what we're doing, we'll be happy to meet you! As a reminder, we are in the process of reading through Statements of Interest. Due to the deadline for hiring a technical writer (May 17), if you plan on submitting one you should do so as early as possible - and no later than May 10. If you wish to join on Zoom, use this link: https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success Here's the permanent hackmd document with the meeting notes (still being updated in the next few days!): https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg Hope to see you around! ** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210426T16&p1=1440&ah=1 *** You can add the NumPy community calendar to your google calendar by clicking this link: https://calendar.google.com/calendar /r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 - Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Fri May 7 15:48:33 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Fri, 7 May 2021 16:48:33 -0300 Subject: [Numpy-discussion] Documentation Team meeting (and Google Season of Docs) - Monday May 10 In-Reply-To: References: Message-ID: Sorry forgot to update the timezone link. Here's the right one: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210510T16&p1=%3A&ah=1 Cheers! On Fri, May 7, 2021 at 4:47 PM Melissa Mendon?a wrote: > Hi all! > > Our next Documentation Team meeting will be on *Monday, May 10* at ***4PM > UTC***. > > All are welcome - you don't need to already be a contributor to join. If > you have questions or are curious about what we're doing, we'll be happy to > meet you! > > As a reminder, we are in the process of reading through Statements of > Interest. Due to the deadline for hiring a technical writer (May 17), if > you plan on submitting one you should do so as early as possible - and no > later than May 10. 
> > If you wish to join on Zoom, use this link: > > https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success > > Here's the permanent hackmd document with the meeting notes (still being > updated in the next few days!): > > https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg > > > Hope to see you around! > > ** You can click this link to get the correct time at your timezone: > https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+Meeting&iso=20210426T16&p1=1440&ah=1 > > *** You can add the NumPy community calendar to your google calendar by > clicking this link: https://calendar.google.com/calendar > /r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 > > - Melissa > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mahesh6947foss at gmail.com Sat May 8 15:17:25 2021 From: mahesh6947foss at gmail.com (Mahesh S) Date: Sun, 9 May 2021 00:47:25 +0530 Subject: [Numpy-discussion] GSoD 2021 - Statement of Interest In-Reply-To: References: Message-ID: Hi there everyone, I am attaching my in depth proposal for Google Season of Docs 2021 with NumPy along with this mail. Kindly share your thoughts on it Thanks & Regards Mahesh S Nair On Wed, May 5, 2021 at 2:06 AM Melissa Mendon?a wrote: > Hello, Mahesh > > Yes, the timeline looks fine. > > Cheers, > > Melissa > > On Sun, May 2, 2021 at 2:08 PM Mahesh S wrote: > >> Hello there Melissa, >> >> Thank You for your valuable suggestion.Actually , I am planning for a >> high-impact work. I have already started reading the User-Guide and working >> out small projects using NumPy to understand it further, to prepare an >> in-depth proposal, which will include changes which I mentioned in my brief >> proposal, reorganizing content, addressing issues in the GitHub tracker and >> all. One doubt from my side is regarding the timeline. Currently I am >> planning to prepare the proposal with the GSoD Timeline that is from June >> 16 2021 and ends on November 16th 2021 with four evaluation phases , >> Is there any specific timeline from the community or Am I free to follow >> the same? >> >> And as of now, I am planning to set an order in which the tasks needed >> to be done , which I will share in the first draft of my proposal which I >> will submit by this week, later changes can be done as per suggestions from >> the mentors and community. >> >> Thank You >> Mahesh >> >> On Sun, May 2, 2021 at 7:05 PM Melissa Mendon?a >> wrote: >> >>> Hello, Mahesh >>> >>> While you work on this, it may be interesting to keep in mind that we >>> are looking for high-impact work that can be done in the timeframe of the >>> GSoD program - examples would be reorganizing content in a section of the >>> documentation, creating new complete document pages on some subject or >>> concept. It may be worth familiarizing yourself with the documentation (I >>> won't suggest reading all of it as it's huge!) to get an idea for the >>> different sections. One idea would be to focus on the user guide, which >>> contains several sub-pages, and check for improvements that can be done >>> there. >>> >>> Cheers, >>> >>> Melissa >>> >>> On Fri, Apr 30, 2021 at 9:28 AM Mahesh S >>> wrote: >>> >>>> Hi there, >>>> >>>> The current documentation of NumPy is really good but a bit more >>>> improvement can be made to it, which is the prime objective of my project. 
>>>> The improvements which I mentioned in my brief proposal are strategies that >>>> can be applied to every documentation to make it better. >>>> >>>> Apart from general improvements most the documentation related issues >>>> in the NumPy's GitHub issue tracker >>>> will >>>> be addressed. . Some needs more technical information and help from the >>>> community. Some are due to the lack of visual aids. Most of them will be >>>> addressed and improvements will be made such that similar issues will not >>>> be generated in future. >>>> Rearranging of sections in the User Guide >>>> can be done >>>> after further discussions >>>> >>>> Some examples regarding the need of restructuring and duplication are >>>> given in the attached document. Apologies in advance if my observations are >>>> inaccurate or nitpicks. The given doc is just a very brief one. An in-depth >>>> proposal with all the planned changes along with solutions to close as many >>>> issues in the tracker will be prepared. I am getting familiar with the >>>> community and NumPy. >>>> >>>> >>>> On Thu, Apr 29, 2021 at 9:54 PM Melissa Mendon?a >>>> wrote: >>>> >>>>> Hello, Mahesh >>>>> >>>>> Thank you for your proposal. One thing I would say is that some of the >>>>> things you are suggesting are already there - for example, the How-tos and >>>>> Explanations section - but I agree they can be improved! >>>>> >>>>> Another question I could pose is about duplication of content: can you >>>>> give examples of duplicated content that you found in the docs? >>>>> >>>>> Cheers, >>>>> >>>>> Melissa >>>>> >>>>> On Wed, Apr 28, 2021 at 12:33 AM Mahesh S >>>>> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I am Mahesh from India. I am interested in doing Google Season of >>>>>> Docs with NumPy on the project HIGH-LEVEL RESTRUCTURING AND END-USER FOCUS >>>>>> - NumPy >>>>>> >>>>>> I have past experience in documentation writing with WordPress and >>>>>> have completed *Google Summer of Code 2018 *with KDE. I have been an >>>>>> open source enthusiast and contributor since 2017.My past experience with >>>>>> coding ,code documentation , understanding code bases ,will help achieve >>>>>> this task without much input from your end. I have about four years >>>>>> experience working with Open-Source and currently working as a *Quality >>>>>> Assurance specialis*t . I have delivered technical talks at >>>>>> international conferences (KDE-Akademy) and was *mentor of Google >>>>>> Code-In* also. All these makes me the best candidate for the >>>>>> project. >>>>>> >>>>>> I am attaching a brief proposal for the work along with this mail. A >>>>>> more in-depth A more in depth proposal with timelines , all planned >>>>>> strategies , document structures will be submitted after more discussion >>>>>> with the mentors and community. >>>>>> I am ready to start the work as soon as possible and will be able >>>>>> work 7 to 9 hours per day including weekends. I strongly wish to be part of >>>>>> the NumPy community even after the project. 
>>>>>> >>>>>> >>>>>> Hope to hear from you soon >>>>>> >>>>>> -- >>>>>> Thank you >>>>>> Mahesh S Nair >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at python.org >>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at python.org >>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>>> >>>> >>>> >>>> -- >>>> Thank you >>>> Mahesh S Nair >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> >> -- >> Thank you >> Mahesh S Nair >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Thank you Mahesh S Nair -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Restructuring the NumPy Documentation-GSoD-2021-Proposal.pdf Type: application/pdf Size: 125345 bytes Desc: not available URL: From sebastian at sipsolutions.net Sun May 9 23:02:48 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 09 May 2021 20:02:48 -0700 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> Message-ID: <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net> On Thu, 2021-05-06 at 13:06 +0100, Eric Wieser wrote: > Another argument for supporting stateful allocators would be > compatibility > with the stateful C++11 allocator API, such as > https://en.cppreference.com/w/cpp/memory/allocator_traits/allocate. The Python version of this does have a `void *ctx`, but I am not sure if the use for this is actually valuable for the NumPy use-cases. (Honestly, beyond aligned allocation, or memory pinning, I am uncertain what those use-cases are). I had more written, but maybe just keep it short: While I like the `PyObject *` idea, I am also not sure that it helps much. If we want allocation specific state, the user should overallocate and save it before the actual allocation. I am sure there could be extensions in the future (although I don't know what exactly). I am not super worried about it, its fairly niche and we can probably figure out ways to deprecate an old way of registration and slowly replace it with a new way. But if we don't mind the churn it creates, the only serious idea I would have right now is using a `FromSpec` API. The only difference would be that we allocate the struct and (for now) return something that is fully opaque (we could allow get/set functions on it though). In fact, we could even keep the current struct largely unchanged but change it to be the main "spec", with no actual slots currently necessary (could even be a `void *slots` that is always NULL). 
(slots are a bit unfortunate, since they cast to `void *` making compile time type checking harder, but overall I think its OK and something we will be using more anyway for DTypes.) I am not sure it is worth it, but if there are no arguments why we cannot allocate the struct, that seems fine. If the return value is opaque, we even have the ability to turn it into a proper Python object if we want to. Cheers, Sebastian > Adding support for stateful allocators at a later date would almost > certainly create an ABI breakage or lots of pain around avoiding one. > > I haven't thought very much about the PyCapsule approach (although it > appears some other reviewers on github considered it at one point), > but > even building it from scratch, the overhead to support statefulness > is not > large. > As I demonstrate on the github issue (18805), would amount to > changing the > API from: > ```C > // the version in the NEP > typedef void *(PyDataMem_AllocFunc)(size_t size); > typedef void *(PyDataMem_ZeroedAllocFunc)(size_t nelems, size_t > elsize); > typedef void (PyDataMem_FreeFunc)(void *ptr, size_t size); > typedef void *(PyDataMem_ReallocFunc)(void *ptr, size_t size); > typedef struct { > ??? char name[200]; > ??? PyDataMem_AllocFunc *alloc; > ??? PyDataMem_ZeroedAllocFunc *zeroed_alloc; > ??? PyDataMem_FreeFunc *free; > ??? PyDataMem_ReallocFunc *realloc; > } PyDataMem_HandlerObject; > const PyDataMem_Handler * PyDataMem_SetHandler(PyDataMem_Handler > *handler); > const char * PyDataMem_GetHandlerName(PyArrayObject *obj); > ``` > to > ```C > // proposed changes: a `PyObject *self` argument pointing to a > `PyDataMem_HandlerObject` and a ` PyObject_HEAD` > typedef void *(PyDataMem_AllocFunc)(PyObject *self, size_t size); > typedef void *(PyDataMem_ZeroedAllocFunc)(PyObject *self, size_t > nelems, > size_t elsize); > typedef void (PyDataMem_FreeFunc)(PyObject *self, void *ptr, size_t > size); > typedef void *(PyDataMem_ReallocFunc)(PyObject *self, void *ptr, > size_t > size); > typedef struct { > ??? PyObject_HEAD > ??? PyDataMem_AllocFunc *alloc; > ??? PyDataMem_ZeroedAllocFunc *zeroed_alloc; > ??? PyDataMem_FreeFunc *free; > ??? PyDataMem_ReallocFunc *realloc; > } PyDataMem_HandlerObject; > // steals a reference to handler, caller is responsible for decrefing > the > result > PyDataMem_Handler * PyDataMem_SetHandler(PyDataMem_Handler *handler); > // borrowed reference > PyDataMem_Handler * PyDataMem_GetHandler(PyArrayObject *obj); > > // some boilerplate that numpy is already full of and doesn't impact > users > of non-stateful allocators > PyTypeObject PyDataMem_HandlerType = ...; > ``` > When constructing an array, the reference count of the handler would > be > incremented before storing it in the array struct > > Since the extra work now to support this is not awful, but the > potential > for ABI headaches down the road is, I think we should aim to support > statefulness right from the start. > The runtime overhead of the stateful approach above vs the NEP > approach is > negligible, and consists of: > * Some overhead costs for setting up an allocator. This likely only > happens > near startup, so won't matter. > * An extra incref on each array allocation > * An extra pointer argument on the stack for each allocation and > deallocation > * Perhaps around 32 extra bytes per allocator objects. Since arrays > just > store pointers to allocators this doesn't matter. 
> > Eric > > > On Thu, 6 May 2021 at 12:43, Matti Picus > wrote: > > > > > On 6/5/21 2:07 pm, Eric Wieser wrote: > > > The NEP looks good, but I worry the API isn't flexible enough. My > > > two > > > main concerns are: > > > > > > ### Stateful allocators > > > > > > Consider an allocator that aligns to `N` bytes, where `N` is > > > configurable from a python call in someone else's extension > > > module. > > > ... > > > > > > ### Thread and async-local allocators > > > > > > For tracing purposes, I expect it to be valuable to be able to > > > configure the allocator within a single thread / coroutine. > > > If we want to support this, we'd most likely want to work with > > > the > > > PEP567 ContextVar API rather than a half-baked thread_local > > > solution > > > that doesn't work for async code. > > > > > > This problem isn't as pressing as the statefulness problem. > > > Fixing it would amount to extending the `PyDataMem_SetHandler` > > > API, > > > and would be unlikely to break any code written against the > > > current > > > version of the NEP; meaning it would be fine to leave as a > > > follow-up. > > > It might still be worth remarking upon as future work of some > > > kind in > > > the NEP. > > > > > > > > I would prefer to leave both of these to a future extension for the > > NEP. > > Setting the alignment from a python-level call seems to be asking > > for > > trouble, and I would need to be convinced that the extra layer of > > flexibility is worth it. > > > > > > It might be worth mentioning that this NEP may be extended in the > > future, but truthfully I think that is the case for all NEPs. > > > > > > Matti > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From wieser.eric+numpy at gmail.com Mon May 10 05:01:16 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 10 May 2021 10:01:16 +0100 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net> References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net> Message-ID: > The Python version of this does have a `void *ctx`, but I am not sure if the use for this is actually valuable for the NumPy use-cases. Do you mean "the CPython version"? If so, can you link a reference? > While I like the `PyObject *` idea, I am also not sure that it helps much. If we want allocation specific state, the user should overallocate and save it before the actual allocation. I was talking about allocator- not allocation- specific state. I agree that the correct place to store the latter is by overallocating, but it doesn't make much sense to me to duplicate state about the allocator itself in each allocation. > But if we don't mind the churn it creates, the only serious idea I would have right now is using a `FromSpec` API. We could allow get/set functions on it though We don't even need to go as far as a flexible `FromSpec` API. Simply having a function to allocate (and free) the opaque struct and a handful of getters ought to be enough to let us change the allocator to be stateful in future. 
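Spelled out, that variant might look something like this (purely
illustrative; none of these functions exist in NumPy or in the NEP):

```C
/* Purely illustrative sketch, not actual NumPy or NEP API. */
typedef struct PyDataMem_Handler PyDataMem_Handler;   /* layout hidden */

/* NumPy allocates and frees the handler, so its layout can change later. */
PyDataMem_Handler *PyDataMem_HandlerNew(
    const char *name,
    PyDataMem_AllocFunc *alloc,
    PyDataMem_ZeroedAllocFunc *zeroed_alloc,
    PyDataMem_FreeFunc *free,
    PyDataMem_ReallocFunc *realloc);
void PyDataMem_HandlerFree(PyDataMem_Handler *handler);

/* a handful of getters */
const char *PyDataMem_HandlerGetName(const PyDataMem_Handler *handler);
PyDataMem_AllocFunc *PyDataMem_HandlerGetAlloc(const PyDataMem_Handler *handler);

/* Because callers never see the struct layout, a `ctx` field (or anything
 * else) can be added behind these functions later without an ABI break. */
```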
On the other hand, this is probably about as much work as just making it a PyObject in the first place. Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From verma16.ayush at gmail.com Mon May 10 07:24:19 2021 From: verma16.ayush at gmail.com (Ayush Verma) Date: Mon, 10 May 2021 16:54:19 +0530 Subject: [Numpy-discussion] GSoD'21- Submitting first draft of the Statement of Interest. In-Reply-To: References: Message-ID: I have updated the statement with links to a personal project's documentation following the Diataxis framework guidelines. I am also attaching the links below in the mail itself. Please find the updated document and attached links below. Github repo link of the project: www.github.com/verma16Ayush/profpoll Hosted documentation: profpoll.readthedocs.io/en/latest Thanks, Ayush On Fri, May 7, 2021 at 6:37 PM Ayush Verma wrote: > Hello everyone, > I have attached my Statement of Interest for GSoD'21 below. I will also be > adding a couple more links to the statement within a day or two. Till then, > I would love to get reviews from the community on the proposed changes. > Cheers. > Ayush. > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: GSoD '21_NumPy_R4.pdf Type: application/pdf Size: 501573 bytes Desc: not available URL: From melissawm at gmail.com Mon May 10 08:22:40 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Mon, 10 May 2021 09:22:40 -0300 Subject: [Numpy-discussion] GSoD'21- Submitting first draft of the Statement of Interest. In-Reply-To: References: Message-ID: Thank you, Ayush! We will take your project into consideration and let you all know of our decision shortly. Cheers! - Melissa On Mon, May 10, 2021 at 8:26 AM Ayush Verma wrote: > I have updated the statement with links to a personal project's > documentation following the Diataxis framework guidelines. I am also > attaching the links below in the mail itself. > Please find the updated document and attached links below. > Github repo link of the project: www.github.com/verma16Ayush/profpoll > Hosted documentation: profpoll.readthedocs.io/en/latest > Thanks, > Ayush > > On Fri, May 7, 2021 at 6:37 PM Ayush Verma > wrote: > >> Hello everyone, >> I have attached my Statement of Interest for GSoD'21 below. I will also >> be adding a couple more links to the statement within a day or two. Till >> then, I would love to get reviews from the community on the proposed >> changes. >> Cheers. >> Ayush. >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Mon May 10 08:23:18 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Mon, 10 May 2021 09:23:18 -0300 Subject: [Numpy-discussion] GSoD 2021 - Statement of Interest In-Reply-To: References: Message-ID: Hello, Mahesh! We will take your project into consideration and let you know of our decision shortly. Cheers! Melissa On Sat, May 8, 2021 at 4:19 PM Mahesh S wrote: > Hi there everyone, > > I am attaching my in depth proposal for Google Season of Docs 2021 with > NumPy along with this mail. 
Kindly share your thoughts on it > > Thanks & Regards > Mahesh S Nair > > On Wed, May 5, 2021 at 2:06 AM Melissa Mendon?a > wrote: > >> Hello, Mahesh >> >> Yes, the timeline looks fine. >> >> Cheers, >> >> Melissa >> >> On Sun, May 2, 2021 at 2:08 PM Mahesh S wrote: >> >>> Hello there Melissa, >>> >>> Thank You for your valuable suggestion.Actually , I am planning for a >>> high-impact work. I have already started reading the User-Guide and working >>> out small projects using NumPy to understand it further, to prepare an >>> in-depth proposal, which will include changes which I mentioned in my brief >>> proposal, reorganizing content, addressing issues in the GitHub tracker and >>> all. One doubt from my side is regarding the timeline. Currently I am >>> planning to prepare the proposal with the GSoD Timeline that is from June >>> 16 2021 and ends on November 16th 2021 with four evaluation phases , >>> Is there any specific timeline from the community or Am I free to follow >>> the same? >>> >>> And as of now, I am planning to set an order in which the tasks needed >>> to be done , which I will share in the first draft of my proposal which I >>> will submit by this week, later changes can be done as per suggestions from >>> the mentors and community. >>> >>> Thank You >>> Mahesh >>> >>> On Sun, May 2, 2021 at 7:05 PM Melissa Mendon?a >>> wrote: >>> >>>> Hello, Mahesh >>>> >>>> While you work on this, it may be interesting to keep in mind that we >>>> are looking for high-impact work that can be done in the timeframe of the >>>> GSoD program - examples would be reorganizing content in a section of the >>>> documentation, creating new complete document pages on some subject or >>>> concept. It may be worth familiarizing yourself with the documentation (I >>>> won't suggest reading all of it as it's huge!) to get an idea for the >>>> different sections. One idea would be to focus on the user guide, which >>>> contains several sub-pages, and check for improvements that can be done >>>> there. >>>> >>>> Cheers, >>>> >>>> Melissa >>>> >>>> On Fri, Apr 30, 2021 at 9:28 AM Mahesh S >>>> wrote: >>>> >>>>> Hi there, >>>>> >>>>> The current documentation of NumPy is really good but a bit more >>>>> improvement can be made to it, which is the prime objective of my project. >>>>> The improvements which I mentioned in my brief proposal are strategies that >>>>> can be applied to every documentation to make it better. >>>>> >>>>> Apart from general improvements most the documentation related issues >>>>> in the NumPy's GitHub issue tracker >>>>> will >>>>> be addressed. . Some needs more technical information and help from the >>>>> community. Some are due to the lack of visual aids. Most of them will be >>>>> addressed and improvements will be made such that similar issues will not >>>>> be generated in future. >>>>> Rearranging of sections in the User Guide >>>>> can be done >>>>> after further discussions >>>>> >>>>> Some examples regarding the need of restructuring and duplication are >>>>> given in the attached document. Apologies in advance if my observations are >>>>> inaccurate or nitpicks. The given doc is just a very brief one. An in-depth >>>>> proposal with all the planned changes along with solutions to close as many >>>>> issues in the tracker will be prepared. I am getting familiar with the >>>>> community and NumPy. >>>>> >>>>> >>>>> On Thu, Apr 29, 2021 at 9:54 PM Melissa Mendon?a >>>>> wrote: >>>>> >>>>>> Hello, Mahesh >>>>>> >>>>>> Thank you for your proposal. 
One thing I would say is that some of >>>>>> the things you are suggesting are already there - for example, the How-tos >>>>>> and Explanations section - but I agree they can be improved! >>>>>> >>>>>> Another question I could pose is about duplication of content: can >>>>>> you give examples of duplicated content that you found in the docs? >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Melissa >>>>>> >>>>>> On Wed, Apr 28, 2021 at 12:33 AM Mahesh S >>>>>> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I am Mahesh from India. I am interested in doing Google Season of >>>>>>> Docs with NumPy on the project HIGH-LEVEL RESTRUCTURING AND END-USER FOCUS >>>>>>> - NumPy >>>>>>> >>>>>>> I have past experience in documentation writing with WordPress and >>>>>>> have completed *Google Summer of Code 2018 *with KDE. I have been >>>>>>> an open source enthusiast and contributor since 2017.My past experience >>>>>>> with coding ,code documentation , understanding code bases ,will help >>>>>>> achieve this task without much input from your end. I have about four years >>>>>>> experience working with Open-Source and currently working as a *Quality >>>>>>> Assurance specialis*t . I have delivered technical talks at >>>>>>> international conferences (KDE-Akademy) and was *mentor of Google >>>>>>> Code-In* also. All these makes me the best candidate for the >>>>>>> project. >>>>>>> >>>>>>> I am attaching a brief proposal for the work along with this mail. A >>>>>>> more in-depth A more in depth proposal with timelines , all planned >>>>>>> strategies , document structures will be submitted after more discussion >>>>>>> with the mentors and community. >>>>>>> I am ready to start the work as soon as possible and will be able >>>>>>> work 7 to 9 hours per day including weekends. I strongly wish to be part of >>>>>>> the NumPy community even after the project. >>>>>>> >>>>>>> >>>>>>> Hope to hear from you soon >>>>>>> >>>>>>> -- >>>>>>> Thank you >>>>>>> Mahesh S Nair >>>>>>> >>>>>>> _______________________________________________ >>>>>>> NumPy-Discussion mailing list >>>>>>> NumPy-Discussion at python.org >>>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at python.org >>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>>>> >>>>> >>>>> >>>>> -- >>>>> Thank you >>>>> Mahesh S Nair >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at python.org >>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> >>> >>> -- >>> Thank you >>> Mahesh S Nair >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > > -- > Thank you > Mahesh S Nair > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon May 10 11:52:47 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 May 2021 09:52:47 -0600 Subject: [Numpy-discussion] NumPy 1.20.3 release Message-ID: Hi All, On behalf of the NumPy team I am pleased to announce the release of NumPy 1.20.3. NumPy 1,20.3 is a bugfix release containing several fixes merged to the main branch after the NumPy 1.20.2 release. The Python versions supported for this release are 3.7-3.9. Wheels can be downloaded from PyPI ; source archives, release notes, and wheel hashes are available on Github . Linux users will need pip >= 0.19.3 in order to install manylinux2010 and manylinux2014 wheels. *Contributors* A total of 7 people contributed to this release. People with a "+" by their names contributed a patch for the first time. - Anne Archibald - Bas van Beek - Charles Harris - Dong Keun Oh + - Kamil Choudhury + - Sayed Adel - Sebastian Berg *Pull requests merged* A total of 15 pull requests were merged for this release. - #18763: BUG: Correct ``datetime64`` missing type overload for ``datetime.date``... - #18764: MAINT: Remove ``__all__`` in favor of explicit re-exports - #18768: BLD: Strip extra newline when dumping gfortran version on MacOS - #18769: BUG: fix segfault in object/longdouble operations - #18794: MAINT: Use towncrier build explicitly - #18887: MAINT: Relax certain integer-type constraints - #18915: MAINT: Remove unsafe unions and ABCs from return-annotations - #18921: MAINT: Allow more recursion depth for scalar tests. - #18922: BUG: Initialize the full nditer buffer in case of error - #18923: BLD: remove unnecessary flag ``-faltivec`` on macOS - #18924: MAINT, CI: treats _SIMD module build warnings as errors through... - #18925: BUG: for MINGW, threads.h existence test requires GLIBC > 2.12 - #18941: BUG: Make changelog recognize gh- as a PR number prefix. - #18948: REL, DOC: Prepare for the NumPy 1.20.3 release. - #18953: BUG: Fix failing mypy test in 1.20.x. Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon May 10 13:43:16 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 10 May 2021 10:43:16 -0700 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net> Message-ID: <2f5624a12f2005a62f5c2549624fb3684009d41a.camel@sipsolutions.net> On Mon, 2021-05-10 at 10:01 +0100, Eric Wieser wrote: > > The Python version of this does have a `void *ctx`, but I am not > > sure if > the use for this is actually valuable for the NumPy use-cases. > > Do you mean "the CPython version"? If so, can you link a reference? Yes, sorry, had been a while since I had looked it up: https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx That all looks like it can be customized in theory. But I am not sure that it is practical, except for hooking and calling the previous one. (But we also have tracemalloc anyway?) I have to say it feels a bit like exposing things publicly, that are really mainly used internally, but not sure... Presumably Python uses the `ctx` for something though. > > > ?While I like the `PyObject *` idea, I am also not sure that it > > helps > much.? If we want allocation specific state, the user should > overallocate > and save it before the actual allocation. 
> I was talking about allocator- not allocation- specific state. I agree that
> the correct place to store the latter is by overallocating, but it doesn't
> make much sense to me to duplicate state about the allocator itself in each
> allocation.

Right, I don't really know a use-case right now. But I am fine with saying:
let's pass in some state anyway, to future-proof. Although if we ensure that
the API can be extended, even that is probably not really necessary, unless
we have a faint idea how it would be used?

(I guess the C++ similarity may be a reason, but I am not familiar with that.)

> > But if we don't mind the churn it creates, the only serious idea I would
> > have right now is using a `FromSpec` API. We could allow get/set functions
> > on it though
>
> We don't even need to go as far as a flexible `FromSpec` API. Simply having
> a function to allocate (and free) the opaque struct and a handful of getters
> ought to be enough to let us change the allocator to be stateful in future.
> On the other hand, this is probably about as much work as just making it a
> PyObject in the first place.

Yeah, if we don't expect things to grow often/much, we can just use what we
have now and either add a `NULL` argument at the end and/or just make a new
function when we need it.

The important part would be returning a new struct. I think even opaque is
not necessary! If we return the new struct, we can extend it freely and
return NULL to indicate an error (thus being able to deprecate if we have
to). Right now we don't even have getters in the proposal IIRC, so that part
probably just doesn't matter either. (If we want to allow falling back to
the previous allocator this would have to be expanded.)

I agree that `PyObject *` is probably just as well if you want the struct to
be freeable, since then you suddenly need reference counting or similar! But
right now the proposal says this is static, and I honestly don't see much
reason for it to be freeable? The current use-cases `cupy` or `pnumpy` don't
seem to need it.

If we return a new struct (I do not care if opaque or not), all of that can
still be expanded. Should we just do that? Or can we think of any downside
to that, or a use-case where this is clearly too limiting right now?

Cheers,

Sebastian


> Eric
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From matti.picus at gmail.com Mon May 10 23:58:01 2021
From: matti.picus at gmail.com (Matti Picus)
Date: Tue, 11 May 2021 06:58:01 +0300
Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies
In-Reply-To: <2f5624a12f2005a62f5c2549624fb3684009d41a.camel@sipsolutions.net>
References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com>
 <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net>
 <2f5624a12f2005a62f5c2549624fb3684009d41a.camel@sipsolutions.net>
Message-ID: <8c06b4f4-ebfb-e8d5-9460-4e700d70ca85@gmail.com>

On 10/5/21 8:43 pm, Sebastian Berg wrote:

> But right now the proposal says this is static, and I honestly don't
> see much reason for it to be freeable?

I think this is the crux of the issue. The current design is for a
singly-allocated struct to be passed around since it is just an aggregate of
functions. If someone wants a different strategy (i.e. different alignment)
they create a new policy: there are no additional parameters or data
associated with the struct.
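For instance, a 64-byte-aligned policy under the current design is just its
own set of functions with the alignment baked in (a sketch only, using the
struct fields from the draft quoted earlier in this thread):

```C
#include <stdlib.h>

/* Sketch: the alignment is hard-coded into the functions themselves,
 * so the handler carries no state beyond its function pointers. */
static void *align64_alloc(size_t size) {
    /* C11 aligned_alloc wants the size rounded up to the alignment */
    return aligned_alloc(64, (size + 63) & ~(size_t)63);
}

static void align64_free(void *ptr, size_t size) {
    free(ptr);
}

/* zeroed_alloc and realloc left out here; a real policy would fill them in */
static PyDataMem_Handler align64_handler = {
    "align64_handler",
    align64_alloc,
    NULL,           /* zeroed_alloc */
    align64_free,
    NULL,           /* realloc */
};
```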
I don't really see an ask from possible users for anything more, and so would prefer to remain with the simplest possible design. If the need arises in the future for additional data, which is doubtful, I am confident we can expand this as needed, and do not want to burden the current design with unneeded optional features. It would be nice to hear from some actual users if they need the flexibility. In any case I would like to resolve this quickly and get it into the next release, so if Eric is adamant that the advanced design is needed I will accept his proposal, since that seems easier than any of the alternatives so far. Matti From wieser.eric+numpy at gmail.com Tue May 11 04:54:37 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Tue, 11 May 2021 09:54:37 +0100 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: <8c06b4f4-ebfb-e8d5-9460-4e700d70ca85@gmail.com> References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net> <2f5624a12f2005a62f5c2549624fb3684009d41a.camel@sipsolutions.net> <8c06b4f4-ebfb-e8d5-9460-4e700d70ca85@gmail.com> Message-ID: > Yes, sorry, had been a while since I had looked it up: > > https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx That `PyMemAllocatorEx` looks almost exactly like one of the two variants I was proposing. Is there a reason for wanting to define our own structure vs just using that one? I think the NEP should at least offer a brief comparison to that structure, even if we ultimately end up not using it. > That all looks like it can be customized in theory. But I am not sure > that it is practical, except for hooking and calling the previous one. Is chaining allocators not likely something we want to support too? For instance, an allocator that is used for large arrays, but falls back to the previous one for small arrays? > I have to say it feels a bit > like exposing things publicly, that are really mainly used internally, > but not sure... Presumably Python uses the `ctx` for something though. I'd argue `ctx` / `baton` / `user_data` arguments are an essential part of any C callback API. I can't find any particularly good reference for this right now, but I have been bitten multiple times by C APIs that forget to add this argument. > If someone wants a different strategy (i.e. different alignment) they create a new policy The crux of the problem here is that without very nasty hacks, C and C++ do not allow new functions to be created at runtime. This makes it very awkward to write a parameterizable allocator. If you want to create two aligned allocators with different alignments, and you don't have a `ctx` argument to plumb through that alignment information, you're forced to write the entire thing twice. > I guess the C++ similarity may be a reason, but I am not familiar with that. Similarity isn't the only motivation - I was considering compatibility. Consider a user who's already written a shiny stateful C++ allocator, and wants to use it with numpy. I've made a gist at https://gist.github.com/eric-wieser/6d0fde53fc1ba7a2fa4ac208467f2ae5 which demonstrates how to hook an arbitrary C++ allocator into this new numpy allocator API, that compares both the NEP version and the version with an added `ctx` argument. The NEP version has a bug that is very hard to fix without duplicating the entire `numpy_handler_from_cpp_allocator` function. 
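The shape of the problem shows up even in plain C; here is a sketch under
the assumption that the hooks gain a `ctx` argument (nothing below is actual
NEP or NumPy API):

```C
#include <stdlib.h>

/* Hypothetical per-policy state. */
typedef struct {
    size_t alignment;   /* must be a power of two */
} aligned_ctx;

/* With a `ctx` argument, one wrapper function serves every alignment ... */
static void *ctx_alloc(void *ctx, size_t size) {
    size_t align = ((aligned_ctx *)ctx)->alignment;
    return aligned_alloc(align, (size + align - 1) & ~(align - 1));
}

/* ... so two policies only differ in the state they are registered with,
 * whereas without `ctx` the only places to put `alignment` are a global
 * variable or a separate hand-written copy of the function per value. */
static aligned_ctx ctx16 = {16};
static aligned_ctx ctx4096 = {4096};
```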
If compatibility with C++ seems too much of a stretch, the NEP API is not even compatible with `PyMemAllocatorEx`. > But right now the proposal says this is static, and I honestly don't > see much reason for it to be freeable? The current use-cases `cupy` or > `pnumpy` don't not seem to need it. I don't know much about either of these use cases, so the following is speculative. In cupy, presumably the application is to tie allocation to a specific GPU device. Presumably then, somewhere in the python code there is a handle to a GPU object, through which the allocators operate. If that handle is stored in the allocator, and the allocator is freeable, then it is possible to write code that automatically releases the GPU handle after the allocator has been restored to the default and the last array using it is cleaned up. If that cupy use-case seems somwhat plausible, then I think we should go with the PyObject approach. If it doesn't seem plausible, then I think the `ctx` approach is acceptable, and we should consider declaring our struct ```struct { PyMemAllocatorEx allocator; char const *name; }``` to reuse the existing python API unless there's a reason not to. Eric On Tue, 11 May 2021 at 04:58, Matti Picus wrote: > On 10/5/21 8:43 pm, Sebastian Berg wrote: > > > But right now the proposal says this is static, and I honestly don't > > see much reason for it to be freeable? > > > I think this is the crux of the issue. The current design is for a > singly-allocated struct to be passed around since it is just an > aggregate of functions. If someone wants a different strategy (i.e. > different alignment) they create a new policy: there are no additional > parameters or data associated with the struct. I don't really see an ask > from possible users for anything more, and so would prefer to remain > with the simplest possible design. If the need arises in the future for > additional data, which is doubtful, I am confident we can expand this as > needed, and do not want to burden the current design with unneeded > optional features. > > > It would be nice to hear from some actual users if they need the > flexibility. > > > In any case I would like to resolve this quickly and get it into the > next release, so if Eric is adamant that the advanced design is needed I > will accept his proposal, since that seems easier than any of the > alternatives so far. > > > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue May 11 17:46:19 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 11 May 2021 14:46:19 -0700 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net> <2f5624a12f2005a62f5c2549624fb3684009d41a.camel@sipsolutions.net> <8c06b4f4-ebfb-e8d5-9460-4e700d70ca85@gmail.com> Message-ID: <7c38007bcf0480a33ab4118d4bc1f3194a061b8a.camel@sipsolutions.net> On Tue, 2021-05-11 at 09:54 +0100, Eric Wieser wrote: > > Yes, sorry, had been a while since I had looked it up: > > > > https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx > > That `PyMemAllocatorEx` looks almost exactly like one of the two > variants I > was proposing. 
Is there a reason for wanting to define our own > structure vs > just using that one? > I think the NEP should at least offer a brief comparison to that > structure, > even if we ultimately end up not using it. > > > That all looks like it can be customized in theory. But I am not > > sure > > that it is practical, except for hooking and calling the previous > > one. > > Is chaining allocators not likely something we want to support too? > For > instance, an allocator that is used for large arrays, but falls back > to the > previous one for small arrays? > > > I have to say it feels a bit > > like exposing things publicly, that are really mainly used > > internally, > > but not sure...? Presumably Python uses the `ctx` for something > > though. > > I'd argue `ctx` / `baton` / `user_data` arguments are an essential > part of > any C callback API. > I can't find any particularly good reference for this right now, but > I have > been bitten multiple times by C APIs that forget to add this > argument. Can't argue with that :). I am personally still mostly a bit concerned that we have some way to modify/extend in the future (even clunky seems fine). Beyond that, I don't care all that much. Passing a context feels right to me, but neither do I know that we need it. Using PyObject still feels a bit much, but I am also not opposed. I guess for future extension, we would have to subclass ourselves and/or include an ABI version number (if just to avoid `PyObject_TypeCheck` calls to figure out which ABI version we got). Otherwise, either allocating the struct or including a version number (or reserved space) in the struct/PyObject is probably good enough to to ensure we have a path for modifying/extending the ABI. I hope that the actual end-users can chip in and clear it up a bit... Cheers, Sebastian > > > ?If someone wants a different strategy (i.e. different alignment) > > they > create a new policy > > The crux of the problem here is that without very nasty hacks, C and > C++ do > not allow new functions to be created at runtime. > This makes it very awkward to write a parameterizable allocator. If > you > want to create two aligned allocators with different alignments, and > you > don't have a `ctx` argument to plumb through that alignment > information, > you're forced to write the entire thing twice. > > > I guess the C++ similarity may be a reason, but I am not familiar > > with > that. > > Similarity isn't the only motivation - I was considering > compatibility. > Consider a user who's already written a shiny stateful C++ allocator, > and > wants to use it with numpy. > I've made a gist at > https://gist.github.com/eric-wieser/6d0fde53fc1ba7a2fa4ac208467f2ae5? > which > demonstrates how to hook an arbitrary C++ allocator into this new > numpy > allocator API, that compares both the NEP version and the version > with an > added `ctx` argument. > The NEP version has a bug that is very hard to fix without > duplicating the > entire `numpy_handler_from_cpp_allocator` function. > > If compatibility with C++ seems too much of a stretch, the NEP API is > not > even compatible with `PyMemAllocatorEx`. > > > But right now the proposal says this is static, and I honestly > > don't > > see much reason for it to be freeable?? The current use-cases > > `cupy` or > > `pnumpy` don't not seem to need it. > > I don't know much about either of these use cases, so the following > is > speculative. > In cupy, presumably the application is to tie allocation to a > specific GPU > device. 
> Presumably then, somewhere in the python code there is a handle to a > GPU > object, through which the allocators operate. > If that handle is stored in the allocator, and the allocator is > freeable, > then it is possible to write code that automatically releases the GPU > handle after the allocator has been restored to the default and the > last > array using it is cleaned up. > > If that cupy use-case seems somwhat plausible, then I think we should > go > with the PyObject approach. > If it doesn't seem plausible, then I think the `ctx` approach is > acceptable, and we should consider declaring our struct > ```struct { PyMemAllocatorEx allocator; char const *name; }``` to > reuse the > existing python API unless there's a reason not to. > > Eric > > > > > On Tue, 11 May 2021 at 04:58, Matti Picus > wrote: > > > On 10/5/21 8:43 pm, Sebastian Berg wrote: > > > > > But right now the proposal says this is static, and I honestly > > > don't > > > see much reason for it to be freeable? > > > > > > I think this is the crux of the issue. The current design is for a > > singly-allocated struct to be passed around since it is just an > > aggregate of functions. If someone wants a different strategy (i.e. > > different alignment) they create a new policy: there are no > > additional > > parameters or data associated with the struct. I don't really see > > an ask > > from possible users for anything more, and so would prefer to > > remain > > with the simplest possible design. If the need arises in the future > > for > > additional data, which is doubtful, I am confident we can expand > > this as > > needed, and do not want to burden the current design with unneeded > > optional features. > > > > > > It would be nice to hear from some actual users if they need the > > flexibility. > > > > > > In any case I would like to resolve this quickly and get it into > > the > > next release, so if Eric is adamant that the advanced design is > > needed I > > will accept his proposal, since that seems easier than any of the > > alternatives so far. > > > > > > Matti > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Tue May 11 23:06:30 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 11 May 2021 20:06:30 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_42_status_=E2=80=93_Store_quant?= =?utf-8?q?ity_in_a_NumPy_array_and_convert_it_=3A=29?= In-Reply-To: <9f63a292b2811bed6f0a6c27346879ae62b7432e.camel@sipsolutions.net> References: <27f75a86c08b0e7a3d64ea6f466e73cfe0c12e9c.camel@sipsolutions.net> <2e803e9c3091d77f819036fd1ffb8d395ef51d71.camel@sipsolutions.net> <9f63a292b2811bed6f0a6c27346879ae62b7432e.camel@sipsolutions.net> Message-ID: <671b318070a46b1bdcdb5373200bfcfbdbf960b3.camel@sipsolutions.net> On Thu, 2021-03-25 at 17:27 -0500, Sebastian Berg wrote: > On Wed, 2021-03-17 at 17:12 -0500, Sebastian Berg wrote: > > On Wed, 2021-03-17 at 07:56 -0500, Lee Johnston wrote: > > > > > 3. In parallel, I will create a small "toy" DType based on that > > ?? experimental API.? Probably in a separate repo (in the NumPy > > ?? organization?). > > As a small update to the experimental user DTypes. 
The branches now include the merge of the PR https://github.com/numpy/numpy/pull/18905 which implementes most of NEP 43 to refactor ufuncs and allow new user DTypes for them. (The PR does not cover reductions, so those are also missing here.) That means, the unit dtype's multiplication can be written as `np.multiply(unit_arr, unit_arr)` or just `unit_arr * unit_arr`. And after importing the right experimental module string comparison can use `np.equal` directly. With that, the most central parts of dtypes exists far enough to play around. (For units the simple re-use of existing math functions is missing, though). Cheers, Sebastian > > So this is started. What you need to do right now if you want to try > is > work of this branch in NumPy: > > ???? > https://github.com/numpy/numpy/compare/main...seberg:experimental-dtype-api > > Install NumPy with `NPY_USE_NEW_CASTINGIMPL=1 python -mpip install .` > or your favorite alternative. > (The `NPY_USE_NEW_CASTINGIMPL=1` should be unnecessary very soon, > working of a branch and not "main" will hopefully also be unnecessary > soon.) > > > Then fetch: https://github.com/seberg/experimental_user_dtypes > and install it as well in the same environment. > > > After that, you can jump through the hoop of setting: > > ??? NUMPY_EXPERIMENTAL_DTYPE_API=1 > > And you can enjoy these type of examples (while expecting hard > crashes > when going too far beyond!): > > ??? from experimental_user_dtypes import float64unit as u > ??? import numpy as np > > ??? F = np.array([u.Quantity(70., "Fahrenheit")]) > ??? C = F.astype(u.Float64UnitDType("Celsius")) > ??? print(repr(C)) > ??? # array([21.11111111111115 ?C], dtype='Float64UnitDType(degC)') > > ??? m = np.array([u.Quantity(5., "m")]) > ??? m_squared = u.multiply(m, m) > ??? print(repr(m_squared)) > ??? # array([25.0 m**2], dtype='Float64UnitDType(m**2)') > > ??? # Or conversion to SI the long route: > ??? pc = np.arange(5., > dtype="float64").view(u.Float64UnitDType("pc")) > ??? pc.astype(pc.dtype.si()) > ??? # array([0.0 m, 3.085677580962325e+16 m, 6.17135516192465e+16 m, > ??? #??????? 9.257032742886974e+16 m, 1.23427103238493e+17 m], > ??? #?????? dtype='Float64UnitDType(m)') > > > Yes, the code has some horrible hacks around creating the DType, but > the basic mechanism i.e. "functions you need to implement" are not > expected to change lot. > > Right now, it forces you to use and implement the scalar `u.Quantity` > and the code sample uses it. But you can also do: > > ??? np.arange(3.).view(u.Float64UnitDType("m")) > > I do have plans to "not have a scalar" so the 0-D result would still > be > an array.? But that option doesn't exist yet (and right now the > scalar > is used for printing). > > > (There is also a `string_equal` "ufunc-like" that works on "S" > dtypes.) > > Cheers, > > Sebastian > > > > PS: I need to figure out some details about how to create DTypes and > DType instances with regards to our stable ABI.? The current > "solution" > is some weird subclassing hoops which are probably not good. > > That is painful unfortunately and any ideas would be great :). > Unfortunately, it requires a grasp around the C-API and > metaclassing... > > > > > > > Anyone using the API, should expect bugs, crashes and changes for a > > while.? But hopefully will only require small code modifications > > when > > the API becomes public. > > > > My personal plan for a toy example is currently a "scaled integer". > > E.g. 
a uint8 where you can set a range `[min_double, max_double]` > > that > > it maps to (which makes the DType "parametric"). > > We discussed some other examples, such as a "modernized" rational > > DType, that could be nice as well, lets see... > > > > Units would be a great experiment, but seem a bit complex to me (I > > don't know units well though). So to keep it baby steps :) I would > > aim > > for doing the above and then we can experiment on Units together! > > > > > > Since it came up:? I agree that a Python API would be great to > > have. > > It > > is something I firmly kept on the back-burner...? It should not be > > very > > hard (if rudimentary), but unless it would help experiments a lot, > > I > > would tend to leave it on the back-burner for now. > > > > Cheers, > > > > Sebastian > > > > > > [1]? Maybe a `uint8` storage that maps to evenly spaced values on a > > parametric range `[double_min, double_max]`.? That seems like a > > good > > trade-off in complexity. > > > > > > > > > On Tue, Mar 16, 2021 at 4:11 PM Sebastian Berg < > > > sebastian at sipsolutions.net> > > > wrote: > > > > > > > On Tue, 2021-03-16 at 13:17 -0500, Lee Johnston wrote: > > > > > Is the work on NEP 42 custom DTypes far enough along to > > > > > experiment > > > > > with? > > > > > > > > > > > > > TL;DR:? Its not quite ready, but if we work together I think we > > > > could > > > > experiment a fair bit.? Mainly ufuncs are still limited (though > > > > not > > > > quite completely missing).? The main problem is that we need to > > > > find a > > > > way to expose the currently private API. > > > > > > > > I would be happy to discuss this also in a call. > > > > > > > > > > > > ** The long story: ** > > > > > > > > There is one more PR related to casting, for which merge should > > > > be > > > > around the corner. And which would bring a lot bang to such an > > > > experiment: > > > > > > > > https://github.com/numpy/numpy/pull/18398 > > > > > > > > > > > > At that point, the new machinery supports (or is used for): > > > > > > > > * Array-coercion: `np.array([your_scalar])` or > > > > ? `np.array([1], dtype=your_dtype)`. > > > > > > > > * Casting (practically full support). > > > > > > > > * UFuncs do not quite work. But short of writing `np.add(arr1, > > > > arr2)` > > > > ? with your DType involved, you can try a whole lot. (see > > > > below) > > > > > > > > * Promotion `np.result_type` should work very soon, but > > > > probably > > > > isn't > > > > ? is not very relevant anyway until ufuncs are fully > > > > implemented. > > > > > > > > That should allow you to do a lot of good experimentation, but > > > > due > > > > to > > > > the ufunc limitation, maybe not well on "existing" python code. > > > > > > > > > > > > The long story about limitations is: > > > > > > > > We are missing exposure of the new public API.? I think I > > > > should > > > > be > > > > able to provide a solution for this pretty quickly, but it > > > > might > > > > require working of a NumPy branch.? (I will write another email > > > > about > > > > it, hopefully we can find a better solution.) > > > > > > > > > > > > Limitations for UFuncs:? UFuncs are the next big project, so to > > > > try > > > > it > > > > fully you will need some patience, unfortunately. > > > > > > > > But, there is some good news!? You can write most of the > > > > "ufunc" > > > > already, you just can't "register" it. > > > > So what I can already offer you is a "DType-specific UFunc", > > > > e.g.: > > > > > > > > ?? 
unit_dtype_multiply(np.array([1.], > > > > dtype=Float64UnitDType("m")), > > > > ?????????????????????? np.array([2.], > > > > dtype=Float64UnitDtype("s"))) > > > > > > > > And get out `np.array([2.], dtype=Float64UnitDtype("m s"))`. > > > > > > > > But you can't write `np.multiple(arr1, arr2)` or `arr1 * arr2` > > > > yet. > > > > Both registration and "promotion" logic are missing. > > > > > > > > I admit promotion may be one of the trickiest things, but > > > > trying > > > > this a > > > > bit might help with getting a clearer picture for promotion as > > > > well. > > > > > > > > > > > > The main last limitation is that I did not replace or create > > > > "fallback" > > > > solutions and/or replacement for the legacy `dtype->f->` > > > > yet. > > > > This is not a serious limitation for experimentation, though.? > > > > It > > > > might > > > > even make sense to keep some of them around and replace them > > > > slowly. > > > > > > > > > > > > And of course, all the small issues/limitations that are not > > > > fixed > > > > because nobody tried yet... > > > > > > > > > > > > > > > > I hope this doesn't scare you away, or at least not for long > > > > :/.? > > > > It > > > > could be very useful to start experimentation soon to push > > > > things > > > > forward a bit quicker.? And I really want to have at least an > > > > experimental version in NumPy 1.21. > > > > > > > > Cheers, > > > > > > > > Sebastian > > > > > > > > > > > > > Lee > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Tue May 11 23:27:27 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 11 May 2021 20:27:27 -0700 Subject: [Numpy-discussion] NumPy Community Meeting Wednesday Message-ID: <08022a4886f95503d8fd972a70469b43d765df0f.camel@sipsolutions.net> Hi all, There will be a NumPy Community meeting Wednesday Mai 12th at 20:00 UTC. 
Everyone is invited and encouraged to join in and edit the work-in-progress
meeting topics and notes at:

https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian

From leo80042 at gmail.com Wed May 12 16:56:59 2021
From: leo80042 at gmail.com (leofang)
Date: Wed, 12 May 2021 13:56:59 -0700 (MST)
Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies
In-Reply-To: 
References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com>
 <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net>
 <2f5624a12f2005a62f5c2549624fb3684009d41a.camel@sipsolutions.net>
 <8c06b4f4-ebfb-e8d5-9460-4e700d70ca85@gmail.com>
Message-ID: <1620853019135-0.post@n7.nabble.com>

Eric Wieser wrote
>> Yes, sorry, had been a while since I had looked it up:
>>
>> https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx
>
> That `PyMemAllocatorEx` looks almost exactly like one of the two variants
> I was proposing. Is there a reason for wanting to define our own structure
> vs just using that one?
> I think the NEP should at least offer a brief comparison to that
> structure, even if we ultimately end up not using it.

Agreed.

Eric Wieser wrote
>> But right now the proposal says this is static, and I honestly don't
>> see much reason for it to be freeable? The current use-cases `cupy` or
>> `pnumpy` don't not seem to need it.
>
> I don't know much about either of these use cases, so the following is
> speculative.
> In cupy, presumably the application is to tie allocation to a specific GPU
> device.
> Presumably then, somewhere in the python code there is a handle to a GPU
> object, through which the allocators operate.
> If that handle is stored in the allocator, and the allocator is freeable,
> then it is possible to write code that automatically releases the GPU
> handle after the allocator has been restored to the default and the last
> array using it is cleaned up.
>
> If that cupy use-case seems somwhat plausible, then I think we should go
> with the PyObject approach.
> If it doesn't seem plausible, then I think the `ctx` approach is
> acceptable, and we should consider declaring our struct
> ```struct { PyMemAllocatorEx allocator; char const *name; }``` to reuse the
> existing python API unless there's a reason not to.

Coming as a CuPy contributor here. The discussion of using this new NEP is
not yet finalized in CuPy, so I am only speaking for the potential usage
that I conceived.

The original idea of using a custom NumPy allocator in CuPy (or any GPU
library) is to allocate pinned / page-locked memory, which lives on the host
(CPU). The idea is to exploit the fact that device-host transfer is faster
when pinned memory is in use. So, if I am calling arr_cpu =
cupy.asnumpy(arr_gpu) to create a NumPy array and make a D2H transfer, and
if I know arr_cpu's buffer is going to be reused several times, then it's
better for it to be backed by pinned memory from the beginning. While there
are tricks to achieve this, such a use pattern can be quite common in user
code, so it's much easier if the allocator is configurable, to avoid
repeating boilerplate.

An interesting note: this new custom allocator can be used to allocate
managed/unified memory from CUDA. This memory lives in a unified address
space so that both CPU and GPU can access it. I do not have much to say
about this use case, however.

Now, I am not fully sure we need `void* ctx` or even make it a `PyObject`.
My understanding (correct me if I am wrong!) is that the allocator state is
considered internal.
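For the simplest pinned-memory case, one that does not even need a pool, a
handler could look roughly like this (sketch only, error handling trimmed,
struct fields as in the draft quoted earlier in the thread):

```C
#include <cuda_runtime.h>

/* Sketch: pinned (page-locked) host memory via the CUDA runtime.
 * There is no per-handler state here, so nothing for a `ctx` to carry. */
static void *pinned_alloc(size_t size) {
    void *ptr = NULL;
    if (cudaHostAlloc(&ptr, size, cudaHostAllocDefault) != cudaSuccess) {
        return NULL;
    }
    return ptr;
}

static void pinned_free(void *ptr, size_t size) {
    cudaFreeHost(ptr);
}

static PyDataMem_Handler pinned_handler = {
    "cuda_pinned_handler",
    pinned_alloc,
    NULL,           /* zeroed_alloc, omitted in this sketch */
    pinned_free,
    NULL,           /* realloc, omitted in this sketch */
};
```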
Imagine I set `alloc` in `PyDataMem_Handler` to be `alloc_my_mempool`, which has access to the internal of a memory pool class that manage a pool of pinned memory. Then whatever information should just be kept inside my mempool (including alignment, pool size, etc). I could implement the pool as a C++ class, and expose the alloc/free/etc member functions to C with some hacks. If using Cython, I suppose it's less hacky to expose a method of a cdef class. On the other hand, for pure C code life is probably easier if ctx is there. One way or another someone must keep a unique instance of that struct or class alive, so I do not have strong opinion. Best, Leo -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ From sujalmaiti123456 at gmail.com Thu May 13 08:51:45 2021 From: sujalmaiti123456 at gmail.com (1DS19IS109_Sujal) Date: Thu, 13 May 2021 05:51:45 -0700 Subject: [Numpy-discussion] GSOD 2021 Message-ID: Can I still apply for GSOD'21? -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Thu May 13 11:17:19 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Thu, 13 May 2021 12:17:19 -0300 Subject: [Numpy-discussion] GSOD 2021 In-Reply-To: References: Message-ID: Hello, Unfortunately it is a bit late as the deadline for hiring a candidate is next Monday, May 17. We are close to reaching a decision already. Maybe next year? :) Cheers, Melissa On Thu, May 13, 2021 at 9:53 AM 1DS19IS109_Sujal wrote: > Can I still apply for GSOD'21? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elias.koromilas at gmail.com Thu May 13 12:06:02 2021 From: elias.koromilas at gmail.com (eliaskoromilas) Date: Thu, 13 May 2021 09:06:02 -0700 (MST) Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net> <2f5624a12f2005a62f5c2549624fb3684009d41a.camel@sipsolutions.net> <8c06b4f4-ebfb-e8d5-9460-4e700d70ca85@gmail.com> Message-ID: <1620921962860-0.post@n7.nabble.com> Eric Wieser wrote >> Yes, sorry, had been a while since I had looked it up: >> >> https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx > > That `PyMemAllocatorEx` looks almost exactly like one of the two variants > I > was proposing. Is there a reason for wanting to define our own structure > vs > just using that one? > I think the NEP should at least offer a brief comparison to that > structure, > even if we ultimately end up not using it. > >> I have to say it feels a bit >> like exposing things publicly, that are really mainly used internally, >> but not sure... Presumably Python uses the `ctx` for something though. > > I'd argue `ctx` / `baton` / `user_data` arguments are an essential part of > any C callback API. > I can't find any particularly good reference for this right now, but I > have > been bitten multiple times by C APIs that forget to add this argument. > >> If someone wants a different strategy (i.e. different alignment) they > create a new policy > > The crux of the problem here is that without very nasty hacks, C and C++ > do > not allow new functions to be created at runtime. > This makes it very awkward to write a parameterizable allocator. 
If you > want to create two aligned allocators with different alignments, and you > don't have a `ctx` argument to plumb through that alignment information, > you're forced to write the entire thing twice. The `PyMemAllocatorEx` memory API will allow (lambda) closure-like definition of the data mem routines. That's the main idea behind the `ctx` thing, it's huge and will enable every allocation scenario. In my opinion, the rest of the proposals (PyObjects, PyCapsules, etc.) are secondary and could be considered out-of-scope. I would suggest to let people use this before hiding it behind a strict API. Let me also give you an insight of how we plan to do it, since we are the first to integrate this in production code. Considering this NEP as a primitive API, I developed a new project to address our requirements: 1. Provide a Python-native way to define a new numpy allocator 2. Accept data mem routine symbols (function pointers) from open dynamic libraries 3. Allow local-scoped allocation, e.g. inside a `with` statement But since there was not much fun in these, I thought it would be nice if we could exploit `ctypes` callback functions, to allow developers hook into such routines natively (e.g. for debugging/monitoring), or even write them entirely in Python (of course there has to be an underlying memory allocation API). For example, the idea is to be able to define a page-aligned allocator in ~30 lines of Python code, like that: https://github.com/inaccel/numpy-allocator/blob/master/test/aligned_allocator.py --- While experimenting with this project I spotted the two following issues: 1. Thread-locality My biggest concern is the global scope of the numpy `current_allocator` variable. Currently, an allocator change is applied globally affecting every thread. This behavior breaks the local-scoped allocation promise of my project. Imagine for example the implications of allocating pinned (page-locked) memory (since you mention this use-case a lot) for random glue-code ndarrays in background threads. 2. Allocator context (already discussed) I found a bug, when I tried to use a Python callback (`ctypes.CFUNCTION`) for the `PyDataMem_FreeFunc` routine. Since there are cases in which the `free` routine is invoked after a PyErr has occurred (to clean up internal arrays for example), `ctypes` messes with the exception state badly. This problem can be resolved with the the use of a `ctx` (allocator context) that will allow the routines to run clean of errors, wrapping them like that: ``` static void wrapped_free(void *ptr, size_t size, void *ctx) { PyObject *type; PyObject *value; PyObject *traceback; PyErr_Fetch(&type, &value, &traceback); ((PyDataMem_Context *) ctx)->free(ptr, size); PyErr_Restore(type, value, traceback); } ``` Note: This bug doesn't affect `CDLL` members (CFuncPtr objects), since they are pure `dlsym` pointers. Of course, this is a simple case of how a `ctx` could be useful for an allocation policy. I guess people can become very creative with this in general. Elias -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ From isaac.gerg at gergltd.com Fri May 14 10:04:55 2021 From: isaac.gerg at gergltd.com (Isaac Gerg) Date: Fri, 14 May 2021 10:04:55 -0400 Subject: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though! Message-ID: I am using 1.19.5 on Windows 10 using Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]. I have two python processes running (i.e. 
no threads) which do independent processing jobs and NOT writing to the same directories. Each process runs for 5-10 hours and then writes out a ~900MB npz file containing 4 arrays. When I go back to read in the npz files, I will sporadically get bad CRC errors which are related to npz using ziplib. I cannot figure out why this is happening. Looking through online forums, other folks have had CRC problems but they seem to be isolated to specifically using ziblib, not numpy. I have found a few mentions though of ziplib causing headaches if the same file pointer is used across calls when one uses the file handle interface to ziblib as opposed to passing in a filename.' I have verified with 7zip that the files do in fact have a CRC error so its not an artifact of the ziblib. I have also used the file handle interface to np.load and still get the error. Aside from writing my own numpy storage file container, I am stumped as to how to fix this, or reproduce this in a consistent manner. Any suggestions would be greatly appreciated! Thank you, Isaac -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Fri May 14 10:15:22 2021 From: ben.v.root at gmail.com (Benjamin Root) Date: Fri, 14 May 2021 10:15:22 -0400 Subject: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though! In-Reply-To: References: Message-ID: Perhaps it is a similar bug as this one? https://github.com/scipy/scipy/issues/6999 Basically, it turned out that the CRC was getting computed on an unflushed buffer, or something like that. On Fri, May 14, 2021 at 10:05 AM Isaac Gerg wrote: > I am using 1.19.5 on Windows 10 using Python 3.8.6 (tags/v3.8.6:db45529, > Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]. > > I have two python processes running (i.e. no threads) which do > independent processing jobs and NOT writing to the same directories. Each > process runs for 5-10 hours and then writes out a ~900MB npz file > containing 4 arrays. > > When I go back to read in the npz files, I will sporadically get bad CRC > errors which are related to npz using ziplib. I cannot figure out why this > is happening. Looking through online forums, other folks have had CRC > problems but they seem to be isolated to specifically using ziblib, not > numpy. I have found a few mentions though of ziplib causing headaches if > the same file pointer is used across calls when one uses the file handle > interface to ziblib as opposed to passing in a filename.' > > I have verified with 7zip that the files do in fact have a CRC error so > its not an artifact of the ziblib. I have also used the file handle > interface to np.load and still get the error. > > Aside from writing my own numpy storage file container, I am stumped as to > how to fix this, or reproduce this in a consistent manner. Any suggestions > would be greatly appreciated! > > Thank you, > Isaac > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From isaac.gerg at gergltd.com Fri May 14 10:21:59 2021 From: isaac.gerg at gergltd.com (Isaac Gerg) Date: Fri, 14 May 2021 10:21:59 -0400 Subject: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though! In-Reply-To: References: Message-ID: Hi Ben, I am not sure. However, in looking at the dates, it looks like that was fixed in scipy as of 2019. 
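(A quick way to narrow this down is to check which member of an .npz fails its CRC using only the standard-library zipfile module -- a minimal sketch, with an assumed helper name; `testzip` itself is standard zipfile API:)

```python
import zipfile

def first_bad_member(npz_path):
    """Return the name of the first member whose CRC check fails,
    or None if every array in the .npz reads back cleanly."""
    with zipfile.ZipFile(npz_path) as zf:
        # testzip() reads every member and verifies headers and CRCs
        return zf.testzip()
```
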
Would you recommend using the scipy save interface as opposed to the numpy one? On Fri, May 14, 2021 at 10:16 AM Benjamin Root wrote: > Perhaps it is a similar bug as this one? > https://github.com/scipy/scipy/issues/6999 > > Basically, it turned out that the CRC was getting computed on an unflushed > buffer, or something like that. > > On Fri, May 14, 2021 at 10:05 AM Isaac Gerg > wrote: > >> I am using 1.19.5 on Windows 10 using Python 3.8.6 (tags/v3.8.6:db45529, >> Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]. >> >> I have two python processes running (i.e. no threads) which do >> independent processing jobs and NOT writing to the same directories. Each >> process runs for 5-10 hours and then writes out a ~900MB npz file >> containing 4 arrays. >> >> When I go back to read in the npz files, I will sporadically get bad CRC >> errors which are related to npz using ziplib. I cannot figure out why this >> is happening. Looking through online forums, other folks have had CRC >> problems but they seem to be isolated to specifically using ziblib, not >> numpy. I have found a few mentions though of ziplib causing headaches if >> the same file pointer is used across calls when one uses the file handle >> interface to ziblib as opposed to passing in a filename.' >> >> I have verified with 7zip that the files do in fact have a CRC error so >> its not an artifact of the ziblib. I have also used the file handle >> interface to np.load and still get the error. >> >> Aside from writing my own numpy storage file container, I am stumped as >> to how to fix this, or reproduce this in a consistent manner. >> Any suggestions would be greatly appreciated! >> >> Thank you, >> Isaac >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kevin.k.sheppard at gmail.com Fri May 14 10:28:31 2021 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Fri, 14 May 2021 15:28:31 +0100 Subject: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though! In-Reply-To: References: , Message-ID: <8E86AF0B-6E51-41FC-BE1F-F15A5C018EDC@hxcore.ol> An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Fri May 14 11:37:38 2021 From: ben.v.root at gmail.com (Benjamin Root) Date: Fri, 14 May 2021 11:37:38 -0400 Subject: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though! In-Reply-To: References: Message-ID: Isaac, What I mean is that your bug might be similar to the savemat() bug that was fixed in scipy in 2019. Completely different functions, but both functions need to properly interact with zlib in order to work properly. On Fri, May 14, 2021 at 10:22 AM Isaac Gerg wrote: > Hi Ben, I am not sure. However, in looking at the dates, it looks like > that was fixed in scipy as of 2019. > > Would you recommend using the scipy save interface as opposed to the numpy > one? > > On Fri, May 14, 2021 at 10:16 AM Benjamin Root > wrote: > >> Perhaps it is a similar bug as this one? >> https://github.com/scipy/scipy/issues/6999 >> >> Basically, it turned out that the CRC was getting computed on an >> unflushed buffer, or something like that. 
>> >> On Fri, May 14, 2021 at 10:05 AM Isaac Gerg >> wrote: >> >>> I am using 1.19.5 on Windows 10 using Python 3.8.6 (tags/v3.8.6:db45529, >>> Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]. >>> >>> I have two python processes running (i.e. no threads) which do >>> independent processing jobs and NOT writing to the same directories. Each >>> process runs for 5-10 hours and then writes out a ~900MB npz file >>> containing 4 arrays. >>> >>> When I go back to read in the npz files, I will sporadically get bad CRC >>> errors which are related to npz using ziplib. I cannot figure out why this >>> is happening. Looking through online forums, other folks have had CRC >>> problems but they seem to be isolated to specifically using ziblib, not >>> numpy. I have found a few mentions though of ziplib causing headaches if >>> the same file pointer is used across calls when one uses the file handle >>> interface to ziblib as opposed to passing in a filename.' >>> >>> I have verified with 7zip that the files do in fact have a CRC error so >>> its not an artifact of the ziblib. I have also used the file handle >>> interface to np.load and still get the error. >>> >>> Aside from writing my own numpy storage file container, I am stumped as >>> to how to fix this, or reproduce this in a consistent manner. >>> Any suggestions would be greatly appreciated! >>> >>> Thank you, >>> Isaac >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefano.miccoli at polimi.it Fri May 14 11:48:10 2021 From: stefano.miccoli at polimi.it (Stefano Miccoli) Date: Fri, 14 May 2021 15:48:10 +0000 Subject: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though! In-Reply-To: References: Message-ID: <6B68B988-06A6-4573-9377-D568CA8E285D@polimi.it> If changing the on-disk format is an option, I would suggest h5py which allows to save numpy arrays in HDF5 format. Stefano On 14 May 2021, at 16:22, numpy-discussion-request at python.org wrote: Aside from writing my own numpy storage file container, I am stumped as to how to fix this, or reproduce this in a consistent manner. Any suggestions would be greatly appreciated! Thank you, Isaac -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Fri May 14 14:41:56 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Fri, 14 May 2021 15:41:56 -0300 Subject: [Numpy-discussion] Google Season of Docs Message-ID: Hello, all! I am happy to announce we found our technical writer for the Google Season of Docs program. After looking at all submitted proposals we have decided to hire Mukulika. You can see her Statement of Interest here: https://mail.python.org/pipermail/numpy-discussion/2021-May/081746.html We are very glad to participate again this year and sincerely thank all the technical writers who took an interest in our project and that contacted us about it. 
If you want to keep contributing to NumPy, you are all most welcome! Mukulika has already started doing some contributions to NumPy and we wish her a successful project. Me and Ross will be the mentors for this proposal but the community is also encouraged to participate with suggestions and comments. You can check out the timeline and more details about GSoD here: https://developers.google.com/season-of-docs/docs/timeline Cheers, Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Fri May 14 14:47:02 2021 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 14 May 2021 11:47:02 -0700 Subject: [Numpy-discussion] Google Season of Docs In-Reply-To: References: Message-ID: On Fri, May 14, 2021, at 11:41, Melissa Mendon?a wrote: > I am happy to announce we found our technical writer for the Google Season of Docs program. After looking at all submitted proposals we have decided to hire Mukulika. You can see her Statement of Interest here: https://mail.python.org/pipermail/numpy-discussion/2021-May/081746.html A great big welcome to the team, Mukulika! We are excited to work with you. Thank you also to Melissa & Ross for mentoring, and to all the other applicants for their interest in the project. St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri May 14 16:37:12 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 14 May 2021 22:37:12 +0200 Subject: [Numpy-discussion] Google Season of Docs In-Reply-To: References: Message-ID: On Fri, May 14, 2021 at 8:47 PM Stefan van der Walt wrote: > On Fri, May 14, 2021, at 11:41, Melissa Mendon?a wrote: > > I am happy to announce we found our technical writer for the Google Season > of Docs program. After looking at all submitted proposals we have decided > to hire Mukulika. You can see her Statement of Interest here: > https://mail.python.org/pipermail/numpy-discussion/2021-May/081746.html > > > A great big welcome to the team, Mukulika! We are excited to work with > you. > Great to have you on the team Mukulika, welcome! > Thank you also to Melissa & Ross for mentoring, and to all the other > applicants for their interest in the project. > +10 Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From isaac.gerg at gergltd.com Fri May 14 16:46:19 2021 From: isaac.gerg at gergltd.com (Isaac Gerg) Date: Fri, 14 May 2021 16:46:19 -0400 Subject: [Numpy-discussion] bad CRC errors when using np.savez, only sometimes though! In-Reply-To: References: Message-ID: Is it zlib or zipfile? On Fri, May 14, 2021 at 11:38 AM Benjamin Root wrote: > Isaac, > > What I mean is that your bug might be similar to the savemat() bug that > was fixed in scipy in 2019. Completely different functions, but both > functions need to properly interact with zlib in order to work properly. > > On Fri, May 14, 2021 at 10:22 AM Isaac Gerg > wrote: > >> Hi Ben, I am not sure. However, in looking at the dates, it looks like >> that was fixed in scipy as of 2019. >> >> Would you recommend using the scipy save interface as opposed to the >> numpy one? >> >> On Fri, May 14, 2021 at 10:16 AM Benjamin Root >> wrote: >> >>> Perhaps it is a similar bug as this one? >>> https://github.com/scipy/scipy/issues/6999 >>> >>> Basically, it turned out that the CRC was getting computed on an >>> unflushed buffer, or something like that. 
>>> >>> On Fri, May 14, 2021 at 10:05 AM Isaac Gerg >>> wrote: >>> >>>> I am using 1.19.5 on Windows 10 using Python 3.8.6 >>>> (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]. >>>> >>>> I have two python processes running (i.e. no threads) which do >>>> independent processing jobs and NOT writing to the same directories. Each >>>> process runs for 5-10 hours and then writes out a ~900MB npz file >>>> containing 4 arrays. >>>> >>>> When I go back to read in the npz files, I will sporadically get bad >>>> CRC errors which are related to npz using ziplib. I cannot figure out why >>>> this is happening. Looking through online forums, other folks have had CRC >>>> problems but they seem to be isolated to specifically using ziblib, not >>>> numpy. I have found a few mentions though of ziplib causing headaches if >>>> the same file pointer is used across calls when one uses the file handle >>>> interface to ziblib as opposed to passing in a filename.' >>>> >>>> I have verified with 7zip that the files do in fact have a CRC error so >>>> its not an artifact of the ziblib. I have also used the file handle >>>> interface to np.load and still get the error. >>>> >>>> Aside from writing my own numpy storage file container, I am stumped as >>>> to how to fix this, or reproduce this in a consistent manner. >>>> Any suggestions would be greatly appreciated! >>>> >>>> Thank you, >>>> Isaac >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mukulikapahari at gmail.com Sat May 15 00:52:58 2021 From: mukulikapahari at gmail.com (Mukulika Pahari) Date: Sat, 15 May 2021 10:22:58 +0530 Subject: [Numpy-discussion] Google Season of Docs In-Reply-To: References: Message-ID: Thank you so much for this opportunity! I am so excited to get started with this project. I really appreciate all the help and encouragement from the community. It is going to be so much fun to work with you all! Cheers, Mukulika On Sat, May 15, 2021 at 2:18 AM wrote: > Send NumPy-Discussion mailing list submissions to > numpy-discussion at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at python.org > > You can reach the person managing the list at > numpy-discussion-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Google Season of Docs (Melissa Mendon?a) > 2. Re: Google Season of Docs (Stefan van der Walt) > 3. Re: Google Season of Docs (Ralf Gommers) > 4. Re: bad CRC errors when using np.savez, only sometimes > though! 
(Isaac Gerg) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 14 May 2021 15:41:56 -0300 > From: Melissa Mendon?a > To: Discussion of Numerical Python > Subject: [Numpy-discussion] Google Season of Docs > Message-ID: > < > CAC7J6VZZVKLuhC14F1emvo1Vh40y+QhXfEYL8whxk9-BPiNp0Q at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Hello, all! > > I am happy to announce we found our technical writer for the Google Season > of Docs program. After looking at all submitted proposals we have decided > to hire Mukulika. You can see her Statement of Interest here: > https://mail.python.org/pipermail/numpy-discussion/2021-May/081746.html > > We are very glad to participate again this year and sincerely thank all the > technical writers who took an interest in our project and that contacted us > about it. If you want to keep contributing to NumPy, you are all most > welcome! > > Mukulika has already started doing some contributions to NumPy and we wish > her a successful project. Me and Ross will be the mentors for this proposal > but the community is also encouraged to participate with suggestions and > comments. You can check out the timeline and more details about GSoD here: > https://developers.google.com/season-of-docs/docs/timeline > > Cheers, > > Melissa > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > https://mail.python.org/pipermail/numpy-discussion/attachments/20210514/1dce5a39/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Fri, 14 May 2021 11:47:02 -0700 > From: "Stefan van der Walt" > To: "Discussion of Numerical Python" > Subject: Re: [Numpy-discussion] Google Season of Docs > Message-ID: > Content-Type: text/plain; charset="utf-8" > > On Fri, May 14, 2021, at 11:41, Melissa Mendon?a wrote: > > I am happy to announce we found our technical writer for the Google > Season of Docs program. After looking at all submitted proposals we have > decided to hire Mukulika. You can see her Statement of Interest here: > https://mail.python.org/pipermail/numpy-discussion/2021-May/081746.html > > A great big welcome to the team, Mukulika! We are excited to work with > you. > > Thank you also to Melissa & Ross for mentoring, and to all the other > applicants for their interest in the project. > > St?fan > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > https://mail.python.org/pipermail/numpy-discussion/attachments/20210514/61720fa1/attachment-0001.html > > > > ------------------------------ > > Message: 3 > Date: Fri, 14 May 2021 22:37:12 +0200 > From: Ralf Gommers > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Google Season of Docs > Message-ID: > < > CABL7CQhoXQSBFPkSRK-tv6uP6Ozk1yiOTSC-31bpW7hftJ9WNQ at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > On Fri, May 14, 2021 at 8:47 PM Stefan van der Walt > wrote: > > > On Fri, May 14, 2021, at 11:41, Melissa Mendon?a wrote: > > > > I am happy to announce we found our technical writer for the Google > Season > > of Docs program. After looking at all submitted proposals we have decided > > to hire Mukulika. You can see her Statement of Interest here: > > https://mail.python.org/pipermail/numpy-discussion/2021-May/081746.html > > > > > > A great big welcome to the team, Mukulika! We are excited to work with > > you. > > > > Great to have you on the team Mukulika, welcome! 
> > > > Thank you also to Melissa & Ross for mentoring, and to all the other > > applicants for their interest in the project. > > > > +10 > > Cheers, > Ralf > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > https://mail.python.org/pipermail/numpy-discussion/attachments/20210514/4e16ab2c/attachment-0001.html > > > > ------------------------------ > > Message: 4 > Date: Fri, 14 May 2021 16:46:19 -0400 > From: Isaac Gerg > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] bad CRC errors when using np.savez, > only sometimes though! > Message-ID: > OUhtW0vW9UHsH74+WsgD1w at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Is it zlib or zipfile? > > On Fri, May 14, 2021 at 11:38 AM Benjamin Root > wrote: > > > Isaac, > > > > What I mean is that your bug might be similar to the savemat() bug that > > was fixed in scipy in 2019. Completely different functions, but both > > functions need to properly interact with zlib in order to work properly. > > > > On Fri, May 14, 2021 at 10:22 AM Isaac Gerg > > wrote: > > > >> Hi Ben, I am not sure. However, in looking at the dates, it looks like > >> that was fixed in scipy as of 2019. > >> > >> Would you recommend using the scipy save interface as opposed to the > >> numpy one? > >> > >> On Fri, May 14, 2021 at 10:16 AM Benjamin Root > >> wrote: > >> > >>> Perhaps it is a similar bug as this one? > >>> https://github.com/scipy/scipy/issues/6999 > >>> > >>> Basically, it turned out that the CRC was getting computed on an > >>> unflushed buffer, or something like that. > >>> > >>> On Fri, May 14, 2021 at 10:05 AM Isaac Gerg > >>> wrote: > >>> > >>>> I am using 1.19.5 on Windows 10 using Python 3.8.6 > >>>> (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit > (AMD64)]. > >>>> > >>>> I have two python processes running (i.e. no threads) which do > >>>> independent processing jobs and NOT writing to the same directories. > Each > >>>> process runs for 5-10 hours and then writes out a ~900MB npz file > >>>> containing 4 arrays. > >>>> > >>>> When I go back to read in the npz files, I will sporadically get bad > >>>> CRC errors which are related to npz using ziplib. I cannot figure > out why > >>>> this is happening. Looking through online forums, other folks have > had CRC > >>>> problems but they seem to be isolated to specifically using ziblib, > not > >>>> numpy. I have found a few mentions though of ziplib causing > headaches if > >>>> the same file pointer is used across calls when one uses the file > handle > >>>> interface to ziblib as opposed to passing in a filename.' > >>>> > >>>> I have verified with 7zip that the files do in fact have a CRC error > so > >>>> its not an artifact of the ziblib. I have also used the file handle > >>>> interface to np.load and still get the error. > >>>> > >>>> Aside from writing my own numpy storage file container, I am stumped > as > >>>> to how to fix this, or reproduce this in a consistent manner. > >>>> Any suggestions would be greatly appreciated! 
> >>>> > >>>> Thank you, > >>>> Isaac > >>>> _______________________________________________ > >>>> NumPy-Discussion mailing list > >>>> NumPy-Discussion at python.org > >>>> https://mail.python.org/mailman/listinfo/numpy-discussion > >>>> > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at python.org > >>> https://mail.python.org/mailman/listinfo/numpy-discussion > >>> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at python.org > >> https://mail.python.org/mailman/listinfo/numpy-discussion > >> > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > https://mail.python.org/pipermail/numpy-discussion/attachments/20210514/dede81ba/attachment.html > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > ------------------------------ > > End of NumPy-Discussion Digest, Vol 176, Issue 22 > ************************************************* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sat May 15 08:06:32 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sat, 15 May 2021 13:06:32 +0100 Subject: [Numpy-discussion] Proposal to accept NEP 49: Data allocation strategies In-Reply-To: <1620921962860-0.post@n7.nabble.com> References: <68067d32-4112-67e2-3b8b-08feb7c875ba@gmail.com> <7ee0aded6d0b6036710a683ed60c832d920ac6fa.camel@sipsolutions.net> <2f5624a12f2005a62f5c2549624fb3684009d41a.camel@sipsolutions.net> <8c06b4f4-ebfb-e8d5-9460-4e700d70ca85@gmail.com> <1620921962860-0.post@n7.nabble.com> Message-ID: Note that PEP-445 which introduced `PyMemAllocatorEx` specifically rejected omitting the `ctx` argument here: https://www.python.org/dev/peps/pep-0445/#id23, which is another argument in favor of having it. I'll try to give a more thorough justification for the pyobject / capsule suggestion in another message in the next few days. On Thu, 13 May 2021 at 17:06, eliaskoromilas wrote: > Eric Wieser wrote > >> Yes, sorry, had been a while since I had looked it up: > >> > >> https://docs.python.org/3/c-api/memory.html#c.PyMemAllocatorEx > > > > That `PyMemAllocatorEx` looks almost exactly like one of the two variants > > I > > was proposing. Is there a reason for wanting to define our own structure > > vs > > just using that one? > > I think the NEP should at least offer a brief comparison to that > > structure, > > even if we ultimately end up not using it. > > > >> I have to say it feels a bit > >> like exposing things publicly, that are really mainly used internally, > >> but not sure... Presumably Python uses the `ctx` for something though. > > > > I'd argue `ctx` / `baton` / `user_data` arguments are an essential part > of > > any C callback API. > > I can't find any particularly good reference for this right now, but I > > have > > been bitten multiple times by C APIs that forget to add this argument. > > > >> If someone wants a different strategy (i.e. 
different alignment) they > > create a new policy > > > > The crux of the problem here is that without very nasty hacks, C and C++ > > do > > not allow new functions to be created at runtime. > > This makes it very awkward to write a parameterizable allocator. If you > > want to create two aligned allocators with different alignments, and you > > don't have a `ctx` argument to plumb through that alignment information, > > you're forced to write the entire thing twice. > > The `PyMemAllocatorEx` memory API will allow (lambda) closure-like > definition of the data mem routines. That's the main idea behind the `ctx` > thing, it's huge and will enable every allocation scenario. > > In my opinion, the rest of the proposals (PyObjects, PyCapsules, etc.) are > secondary and could be considered out-of-scope. I would suggest to let > people use this before hiding it behind a strict API. > > Let me also give you an insight of how we plan to do it, since we are the > first to integrate this in production code. Considering this NEP as a > primitive API, I developed a new project to address our requirements: > > 1. Provide a Python-native way to define a new numpy allocator > 2. Accept data mem routine symbols (function pointers) from open dynamic > libraries > 3. Allow local-scoped allocation, e.g. inside a `with` statement > > But since there was not much fun in these, I thought it would be nice if we > could exploit `ctypes` callback functions, to allow developers hook into > such routines natively (e.g. for debugging/monitoring), or even write them > entirely in Python (of course there has to be an underlying memory > allocation API). > > For example, the idea is to be able to define a page-aligned allocator in > ~30 lines of Python code, like that: > > > https://github.com/inaccel/numpy-allocator/blob/master/test/aligned_allocator.py > > --- > > While experimenting with this project I spotted the two following issues: > > 1. Thread-locality > My biggest concern is the global scope of the numpy `current_allocator` > variable. Currently, an allocator change is applied globally affecting > every > thread. This behavior breaks the local-scoped allocation promise of my > project. Imagine for example the implications of allocating pinned > (page-locked) memory (since you mention this use-case a lot) for random > glue-code ndarrays in background threads. > > 2. Allocator context (already discussed) > I found a bug, when I tried to use a Python callback (`ctypes.CFUNCTION`) > for the `PyDataMem_FreeFunc` routine. Since there are cases in which the > `free` routine is invoked after a PyErr has occurred (to clean up internal > arrays for example), `ctypes` messes with the exception state badly. This > problem can be resolved with the the use of a `ctx` (allocator context) > that > will allow the routines to run clean of errors, wrapping them like that: > > ``` > static void wrapped_free(void *ptr, size_t size, void *ctx) { > PyObject *type; > PyObject *value; > PyObject *traceback; > PyErr_Fetch(&type, &value, &traceback); > ((PyDataMem_Context *) ctx)->free(ptr, size); > PyErr_Restore(type, value, traceback); > } > ``` > > Note: This bug doesn't affect `CDLL` members (CFuncPtr objects), since they > are pure `dlsym` pointers. > > Of course, this is a simple case of how a `ctx` could be useful for an > allocation policy. I guess people can become very creative with this in > general. 
> > Elias > > > > > -- > Sent from: http://numpy-discussion.10968.n7.nabble.com/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shashwatjaiswal2001 at gmail.com Sun May 16 05:13:25 2021 From: shashwatjaiswal2001 at gmail.com (Shashwat Jaiswal) Date: Sun, 16 May 2021 14:43:25 +0530 Subject: [Numpy-discussion] linalg.det for fractions Message-ID: How about having linalg.det returning a fraction object when passed a numpy matrix of fractions? -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Sun May 16 05:45:19 2021 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sun, 16 May 2021 10:45:19 +0100 Subject: [Numpy-discussion] linalg.det for fractions In-Reply-To: References: Message-ID: Numpy implements linalg.det by going through LAPACK, which only knows about f4, f8, c8, and c16 data types. Your request amounts to wanting an `O` dtype implementation. I think this is a totally reasonable request as we already have such an implementation for `np.matmul`; but it won't be particularly easy to implement or fast, especially as it won't be optimized for fractions specifically. Some other options for you would be to: * use sympy's matrix operations; fractions are really just "symbolics lite" * Extract a common denominator from your matrix, convert the numerators to float64, and hope you don't exceed 2**52 in the result. You could improve the second option a little by implementing (and PRing) an integer loop for `det`, which would be somewhat easier than implementing the object loop. Eric On Sun, May 16, 2021, 10:14 Shashwat Jaiswal wrote: > How about having linalg.det returning a fraction object when passed a > numpy matrix of fractions? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Sun May 16 15:16:11 2021 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sun, 16 May 2021 20:16:11 +0100 Subject: [Numpy-discussion] linalg.det for fractions In-Reply-To: References: Message-ID: On Sun, 16 May 2021 at 10:46, Eric Wieser wrote: > > Numpy implements linalg.det by going through LAPACK, which only knows about f4, f8, c8, and c16 data types. > > Your request amounts to wanting an `O` dtype implementation. I think this is a totally reasonable request as we already have such an implementation for `np.matmul`; but it won't be particularly easy to implement or fast, especially as it won't be optimized for fractions specifically. Computing determinants is somewhat different from matrix multiplication in the sense that the best algorithms depend on the nature of the "dtype" e.g. is the arithmetic exact or approximate? Also is division possible or desirable to use? There are division-free and fraction-free algorithms and different strategies for pivoting depending on whether you are trying to control bit growth or rounding error. What assumptions would an `O` dtype implementation of det make? Should it be allowed to use division? If so, what kind of properties can it assume about divisibility (e.g. true or floor division)? 
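(To make the division question concrete, here is a purely illustrative sketch -- not NumPy's implementation -- of an exact, division-based determinant over an object array of fractions.Fraction; a fraction-free Bareiss variant would avoid the true divisions at the cost of larger intermediate integers:)

```python
from fractions import Fraction
import numpy as np

def det_fraction(a):
    """Exact determinant of a square array of Fractions via Gaussian
    elimination with division.  O(n**3) Fraction operations; no attempt
    is made to control the growth of numerators and denominators."""
    m = np.array(a, dtype=object)          # work on an object-dtype copy
    n, ncols = m.shape
    assert n == ncols
    det = Fraction(1)
    for k in range(n):
        # find a non-zero pivot in column k
        pivots = [i for i in range(k, n) if m[i, k] != 0]
        if not pivots:
            return Fraction(0)
        p = pivots[0]
        if p != k:
            m[[k, p]] = m[[p, k]]          # row swap flips the sign
            det = -det
        det *= m[k, k]
        for i in range(k + 1, n):          # eliminate below the pivot
            m[i, k:] = m[i, k:] - (m[i, k] / m[k, k]) * m[k, k:]
    return det

# e.g. det_fraction([[Fraction(8, 3), Fraction(4)],
#                    [Fraction(-6, 5), Fraction(-9, 4)]]) == Fraction(-6, 5)
```
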
> Some other options for you would be to: > > * use sympy's matrix operations; fractions are really just "symbolics lite" > * Extract a common denominator from your matrix, convert the numerators to float64, and hope you don't exceed 2**52 in the result. Is the float64 algorithm exact for integer-valued float64? Doesn't look like it: In [33]: a = np.array([[1, 2], [3, 4]], np.float64) In [34]: np.linalg.det(a) Out[34]: -2.0000000000000004 Except for very small matrices and simple fractions the chances of overflow are actually quite high. Just extracting the common denominator could easily mean that the remaining numerators overflow. It is also possible that the inputs and outputs might be in range but intermediate calculations could have much larger magnitudes. > You could improve the second option a little by implementing (and PRing) an integer loop for `det`, which would be somewhat easier than implementing the object loop. For integers and other non-field PIDs the fraction-free Bareiss algorithm is typically used to control bit growth: https://en.wikipedia.org/wiki/Bareiss_algorithm If you're looking for an off-the-shelf library that already does this from Python then I suggest sympy. Example: In [15]: import sympy as sym, random In [16]: rand_fraction = lambda: sym.Rational(random.randint(-10, 10), random.randint(1, 10)) In [17]: M = sym.Matrix([[rand_fraction() for _ in range(5)] for _ in range(5)]) In [18]: M Out[18]: ?8/3 4 -1/3 -4/7 -6/5? ? ? ? 0 0 7/5 3 -6/5? ? ? ?-6/5 -9/4 -1/5 -1 9/4 ? ? ? ? 5 -9/10 -9/2 -3/2 -2 ? ? ? ? 1 -5/6 5/7 9/8 -2/3? In [19]: M.det() Out[19]: -201061211 ??????????? 2205000 Another option is python_flint which is not so easy to set up and use but wraps the flint library which is the fastest I know of for this sort of thing. Here is flint computing the exact determinant of a 1000x1000 matrix of small bit-count rationals: In [37]: import flint In [38]: rand_fraction = lambda: flint.fmpq(random.randint(-10, 10), random.randint(1, 10)) In [39]: M = flint.fmpq_mat([[rand_fraction() for _ in range(1000)] for _ in range(1000)]) In [40]: %time M.det() CPU times: user 25 s, sys: 224 ms, total: 25.2 s Wall time: 26 s Although the inputs are all small fractions like 3/7, 1/2, etc the repr of the output is too long to show in this email: In [41]: len(str(_)) Out[41]: 8440 -- Oscar From bas.vanbeek at hotmail.com Tue May 18 12:21:01 2021 From: bas.vanbeek at hotmail.com (bas van beek) Date: Tue, 18 May 2021 16:21:01 +0000 Subject: [Numpy-discussion] Deprecate four `ndarray.ctypes` implementation artifacts Message-ID: Hi all, The `ndarray.ctypes` object currently contains four (undocumented) methods that are essentially leftover implementation artifacts, kept around for the sake of backwards compatibility. As this was three years ago it is, in my opinion, a suitable time now to properly deprecate them, my question being if anyone would have objections to this. The methods in question are: * `get_data` (use the `data` property instead) * `get_shape` (use the `shape` property instead) * `get_strides` (use the `ctypes.strides` property instead) * `get_as_parameter` (use the `_as_parameter_` property instead) For those interested, a PR implementing their deprecation is currently up at https://github.com/numpy/numpy/pull/19031 Regards, Bas van Beek -------------- next part -------------- An HTML attachment was scrubbed... 
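(For reference, the switch in user code is mechanical -- a minimal sketch pairing each deprecated accessor with its property equivalent, as listed in the message above:)

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# deprecated method                    ->  property to use instead
ptr     = a.ctypes.get_data()          # a.ctypes.data
shape   = a.ctypes.get_shape()         # a.ctypes.shape
strides = a.ctypes.get_strides()       # a.ctypes.strides
asparam = a.ctypes.get_as_parameter()  # a.ctypes._as_parameter_
```
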
URL: From sebastian at sipsolutions.net Tue May 18 15:30:58 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 18 May 2021 12:30:58 -0700 Subject: [Numpy-discussion] NumPy Development Meeting Wednesday - Triage Focus Message-ID: <7dcfcaff7f4ee1e7ad2e338c8287774dfe23a1db.camel@sipsolutions.net> Hi all, Our bi-weekly triage-focused NumPy development meeting is Wednesday, May 19th at 11 am Pacific Time (18:00 UTC). Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg I encourage everyone to notify us of issues or PRs that you feel should be prioritized, discussed, or reviewed. Best regards Sebastian From thomas.hilger at gmail.com Wed May 19 07:20:41 2021 From: thomas.hilger at gmail.com (Thomas Hilger) Date: Wed, 19 May 2021 13:20:41 +0200 Subject: [Numpy-discussion] interval wrapping / remainder / angle normalization Message-ID: Hi everyone! I was wondering if it is a possible addition to numpy to have a function to wrap values to an interval. Typically, it is desired to limit an angle to [0, 2pi) or [-pi ,pi), either by letting it "overflow" or by "bouncing" hence and forth. The function which does this is actually really simple. However, whenever I am facing this task I tend to work a while on this until I get it correct. I have a small and handy function (it is small because it just uses np.divmod) at hand which does this, including also the left-open or closed cases, and some tests. In case this is of interest, I can contribute. Best regards, Thomas. -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Wed May 19 19:12:18 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Wed, 19 May 2021 20:12:18 -0300 Subject: [Numpy-discussion] Newcomer's Meeting tomorrow! In-Reply-To: References: Message-ID: Hi all! Tomorrow,* May 20, at 8pm UTC* we have a Newcomer's Meeting! This is an informal meeting with no agenda to ask questions, get to know other people and (hopefully) figure out ways to contribute to NumPy. Feel free to join if you are lurking around but found it hard to start contributing - we'll do our best to support you. If you wish to join on Zoom, use this link: https://zoom.us/j/6345425936 Hope to see you around! ** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Newcomer%27s+Meeting&iso=20210520T20&p1=1440&ah=1 *** You can add the NumPy community calendar to your google calendar by clicking this link: https://calendar.google.com/calendar /r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 - Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From zinger.kyon at gmail.com Wed May 19 21:20:45 2021 From: zinger.kyon at gmail.com (ZinGer_KyoN) Date: Wed, 19 May 2021 18:20:45 -0700 (MST) Subject: [Numpy-discussion] interval wrapping / remainder / angle normalization In-Reply-To: References: Message-ID: <1621473645841-0.post@n7.nabble.com> I have similar needs, but for int array and integer interval (like -32768~32767), currently I'm using bitwise and/or (&/|) to do this trick. 
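(For reference, a minimal sketch of the kind of helper being discussed -- the names `wrap`/`bounce` and the half-open-interval convention are assumptions, not an existing NumPy API; it works for float and int arrays alike:)

```python
import numpy as np

def wrap(x, low, high):
    """Map x into the half-open interval [low, high) by overflowing,
    i.e. periodic wrapping."""
    span = high - low
    return (np.asarray(x) - low) % span + low

def bounce(x, low, high):
    """Map x into [low, high] by reflecting ("bouncing") at the ends."""
    span = high - low
    y = (np.asarray(x) - low) % (2 * span)   # position in a doubled period
    return low + np.where(y < span, y, 2 * span - y)

# wrap(np.array([-32769, 32768]), -32768, 32768) -> array([ 32767, -32768])
```
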
It will be nice if there is an optimized function, both for float and int -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ From ndbecker2 at gmail.com Thu May 20 09:45:41 2021 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 20 May 2021 09:45:41 -0400 Subject: [Numpy-discussion] Indexing question Message-ID: This seems like something that can be done with indexing, but I haven't found the solution. out is a 2D array is initialized to zeros. x is a 1D array whose values correspond to the columns of out. For each row in out, set out[row,x[row]] = 1. Here is working code: def orthogonal_mod (x, nbits): out = np.zeros ((len(x), 1< References: Message-ID: On Thu, May 20, 2021 at 9:47 AM Neal Becker wrote: > This seems like something that can be done with indexing, but I > haven't found the solution. > > out is a 2D array is initialized to zeros. x is a 1D array whose > values correspond to the columns of out. For each row in out, set > out[row,x[row]] = 1. Here is working code: > def orthogonal_mod (x, nbits): > out = np.zeros ((len(x), 1< for e in range (len (x)): > out[e,x[e]] = 1 > return out > > Any idea to do this without an explicit python loop? > i = np.arange(len(x)) j = x[i] out[i, j] = 1 -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Thu May 20 10:05:14 2021 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 20 May 2021 10:05:14 -0400 Subject: [Numpy-discussion] Indexing question In-Reply-To: References: Message-ID: Thanks! On Thu, May 20, 2021 at 9:53 AM Robert Kern wrote: > > On Thu, May 20, 2021 at 9:47 AM Neal Becker wrote: >> >> This seems like something that can be done with indexing, but I >> haven't found the solution. >> >> out is a 2D array is initialized to zeros. x is a 1D array whose >> values correspond to the columns of out. For each row in out, set >> out[row,x[row]] = 1. Here is working code: >> def orthogonal_mod (x, nbits): >> out = np.zeros ((len(x), 1<> for e in range (len (x)): >> out[e,x[e]] = 1 >> return out >> >> Any idea to do this without an explicit python loop? > > > > i = np.arange(len(x)) > j = x[i] > out[i, j] = 1 > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -- Those who don't understand recursion are doomed to repeat it From thomas.hilger at gmail.com Thu May 20 13:30:36 2021 From: thomas.hilger at gmail.com (Thomas Hilger) Date: Thu, 20 May 2021 19:30:36 +0200 Subject: [Numpy-discussion] interval wrapping / remainder / angle normalization In-Reply-To: <1621473645841-0.post@n7.nabble.com> References: <1621473645841-0.post@n7.nabble.com> Message-ID: When I understand correctly and what you desire is equivalent to integer overflowing, the function can indeed be applied as well. I tested. But to be sure, maybe some examples are better. Am Do., 20. Mai 2021 um 03:22 Uhr schrieb ZinGer_KyoN : > I have similar needs, but for int array and integer interval (like > -32768~32767), currently I'm using bitwise and/or (&/|) to do this trick. 
> > It will be nice if there is an optimized function, both for float and int > > > > -- > Sent from: http://numpy-discussion.10968.n7.nabble.com/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Thu May 20 13:39:06 2021 From: perimosocordiae at gmail.com (CJ Carey) Date: Thu, 20 May 2021 13:39:06 -0400 Subject: [Numpy-discussion] Indexing question In-Reply-To: References: Message-ID: Or as a one-liner: out[np.arange(len(x)), x] = 1 If NEP 21 is accepted ( https://numpy.org/neps/nep-0021-advanced-indexing.html) this would be even simpler: out.vindex[:, x] = 1 Was there ever a decision about that NEP? I didn't follow the discussion too closely at the time. On Thu, May 20, 2021 at 10:06 AM Neal Becker wrote: > Thanks! > > On Thu, May 20, 2021 at 9:53 AM Robert Kern wrote: > > > > On Thu, May 20, 2021 at 9:47 AM Neal Becker wrote: > >> > >> This seems like something that can be done with indexing, but I > >> haven't found the solution. > >> > >> out is a 2D array is initialized to zeros. x is a 1D array whose > >> values correspond to the columns of out. For each row in out, set > >> out[row,x[row]] = 1. Here is working code: > >> def orthogonal_mod (x, nbits): > >> out = np.zeros ((len(x), 1< >> for e in range (len (x)): > >> out[e,x[e]] = 1 > >> return out > >> > >> Any idea to do this without an explicit python loop? > > > > > > > > i = np.arange(len(x)) > > j = x[i] > > out[i, j] = 1 > > > > -- > > Robert Kern > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > -- > Those who don't understand recursion are doomed to repeat it > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu May 20 13:46:27 2021 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 20 May 2021 13:46:27 -0400 Subject: [Numpy-discussion] Indexing question In-Reply-To: References: Message-ID: On Thu, May 20, 2021 at 1:40 PM CJ Carey wrote: > Or as a one-liner: > > out[np.arange(len(x)), x] = 1 > Ah, right. `x[arange(len(x))]` is a no-op. > If NEP 21 is accepted ( > https://numpy.org/neps/nep-0021-advanced-indexing.html) this would be > even simpler: > > out.vindex[:, x] = 1 > > Was there ever a decision about that NEP? I didn't follow the discussion > too closely at the time. > IIRC, I think there was broad agreement on the final plan as stated in the NEP. I suspect, though, that the general appetite for adding to the array API surface has declined even from its anemic starting point, now that deep learning frameworks with ndarray-mimicking APIs have taken off. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Thu May 20 14:26:11 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 20 May 2021 11:26:11 -0700 Subject: [Numpy-discussion] Indexing question In-Reply-To: References: Message-ID: <1f233f6d3027e6aeb46565adcd07ccdb8fe1d677.camel@sipsolutions.net> On Thu, 2021-05-20 at 13:46 -0400, Robert Kern wrote: > On Thu, May 20, 2021 at 1:40 PM CJ Carey > wrote: > > If NEP 21 is accepted ( > > https://numpy.org/neps/nep-0021-advanced-indexing.html) this would > > be > > even simpler: > > > > out.vindex[:, x] = 1 > > > > Was there ever a decision about that NEP? I didn't follow the > > discussion > > too closely at the time. > > > > IIRC, I think there was broad agreement on the final plan as stated > in the > NEP. I suspect, though, that the general appetite for adding to the > array > API surface has declined even from its anemic starting point, now > that deep > learning frameworks with ndarray-mimicking APIs have taken off. True, I am not sure on which side we would land now.? Although, NumPy's advanced indexing is too odd to expect ndarray-mimicking APIs to copy it.? At least with new attributes you have a chance to define clearly what should happen. I personally still have appetite for it.? But expect there will be enough "small" things to fix-up (improve the NEP, figure out how subclassing should be done, clean-up the old code) that this is still a decent sized project.? The old 80-20 problem, I guess... So, it just never quite reached the motivation/priority threshold for me since the original push. Cheers, Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From melissawm at gmail.com Sat May 22 21:19:54 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Sat, 22 May 2021 22:19:54 -0300 Subject: [Numpy-discussion] Documentation Team meeting - Monday May 24 In-Reply-To: References: Message-ID: Hi all! Our next Documentation Team meeting will be on *Monday, May 24* at ***4PM UTC***. All are welcome - you don't need to already be a contributor to join. If you have questions or are curious about what we're doing, we'll be happy to meet you! If you wish to join on Zoom, use this link: https://zoom.us/j/96219574921?pwd=VTRNeGwwOUlrYVNYSENpVVBRRjlkZz09#success Here's the permanent hackmd document with the meeting notes (still being updated in the next few days!): https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg Hope to see you around! ** You can click this link to get the correct time at your timezone: https://www.timeanddate.com/worldclock/fixedtime.html?msg=NumPy+Documentation+Team+meeting&iso=20210524T16&p1=1440&ah=1 *** You can add the NumPy community calendar to your google calendar by clicking this link: https://calendar.google.com/calendar /r?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 - Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon May 24 13:12:59 2021 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 24 May 2021 11:12:59 -0600 Subject: [Numpy-discussion] NumPy 1.21.0rc1 has been released. Message-ID: Hi All, On behalf of the NumPy team I am pleased to announce the release of NumPy 1.21.0rc1. 
The highlights are - continued SIMD work covering more functions and platforms, - initial work on the new dtype infrastructure and casting, - improved documentation, - improved annotations, - the new ``PCG64DXSM`` bitgenerator for random numbers. This NumPy release is the result of 561 merged pull requests contributed by 171 people. The Python versions supported for this release are 3.7-3.9, support for Python 3.10 will be added after Python 3.10 is released. Wheels can be downloaded from PyPI ; source archives, release notes, and wheel hashes are available on Github . Linux users will need pip >= 0.19.3 in order to install manylinux2010 and manylinux2014 wheels. *Contributors* A total of 171 people contributed to this release. People with a "+" by their names contributed a patch for the first time. - @8bitmp3 + - @DWesl + - @Endolith - @Illviljan + - @Lbogula + - @Lisa + - @Patrick + - @Scian + - @h-vetinari + - @h6197627 + - @jbCodeHub + - @legoffant + - @sfolje0 + - @tautaus + - @yetanothercheer + - Abhay Raghuvanshi + - Adrian Price-Whelan + - Aerik Pawson + - Agbonze Osazuwa + - Aitik Gupta + - Al-Baraa El-Hag - Alex Henrie - Alexander Hunt + - Aliz? Papp + - Allan Haldane - Amarnath1904 + - Amrit Krishnan + - Andras Deak - AngelGris + - Anne Archibald - Anthony Vo + - Antony Lee - Atharva-Vidwans + - Ayush Verma + - Bas van Beek - Bharat Raghunathan - Bhargav V + - Brian Soto - Carl Michal + - Charles Harris - Charles Stern + - Chiara Marmo + - Chris Barnes + - Chris Vavaliaris - Christina Hedges + - Christoph Gohlke - Christopher Dahlin + - Christos Efstathiou + - Chunlin Fang - Constanza Fierro + - Daniel Evans + - Daniel Montes + - Dario Mory + - David Carlier + - David Stansby - Deepyaman Datta + - Derek Homeier - Dong Keun Oh + - Dylan Cutler + - Eric Larson - Eric Wieser - Eva Jau + - Evgeni Burovski - FX Coudert + - Faris A Chugthai + - Filip Ter + - Filip Trojan + - Fran?ois Le Lay + - Ganesh Kathiresan - Giannis Zapantis + - Giulio Procopio + - Greg Lucas + - Hollow Man + - Holly Corbett + - Inessa Pawson - Isabela Presedo-Floyd - Ismael Jimenez + - Isuru Fernando - Jakob Jakobson - James Gerity + - Jamie Macey + - Jasmin Classen + - Jody Klymak + - Joseph Fox-Rabinovitz - J?rome Eertmans + - Kamil Choudhury + - Kasia Leszek + - Keller Meier + - Kevin Sheppard - Kulin Seth + - Kumud Lakara + - Laura Kopf + - Laura Martens + - Leo Singer + - Leonardus Chen + - Lima Tango + - Lumir Balhar + - Maia Kaplan + - Mainak Debnath + - Marco Aur?lio da Costa + - Marta Lemanczyk + - Marten van Kerkwijk - Mary Conley + - Marysia Winkels + - Mateusz Sok?? + - Matt Haberland - Matt Hall + - Matt Ord + - Matthew Badin + - Matthias Bussonnier - Matthias Geier - Matti Picus - Mat?as R?os + - Maxim Belkin + - Melissa Weber Mendon?a - Meltem Eren Copur + - Michael Dubravski + - Michael Lamparski - Michal W. Tarnowski + - Micha? G?rny + - Mike Boyle + - Mike Toews - Misal Raj + - Mitchell Faas + - Mukulikaa Parhari + - Neil Girdhar + - Nicholas McKibben + - Nico Schl?mer - Nicolas Hug + - Nilo Kruchelski + - Nirjas Jakilim + - Ohad Ravid + - Olivier Grisel - Pamphile ROY + - Panos Mavrogiorgos + - Patrick T. Komiske III + - Pearu Peterson - Raghuveer Devulapalli - Ralf Gommers - Ra?l Mont?n Pinillos + - Rin Arakaki + - Robert Kern - Rohit Sanjay - Roman Yurchak - Ronan Lamy - Ross Barnowski - Ryan C Cooper - Ryan Polley + - Ryan Soklaski - Sabrina Simao + - Sayed Adel - Sebastian Berg - Shen Zhou + - Stefan van der Walt - Sylwester Arabas + - Takanori Hirano - Tania Allard + - Thomas J. 
Fan + - Thomas Orgis + - Tim Hoffmann - Tomoki, Karatsu + - Tong Zou + - Touqir Sajed + - Tyler Reddy - Wansoo Kim - Warren Weckesser - Weh Andreas + - Yang Hau - Yashasvi Misra + - Zolboo Erdenebaatar + - Zolisa Bleki Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From melissawm at gmail.com Mon May 24 15:07:57 2021 From: melissawm at gmail.com (=?UTF-8?Q?Melissa_Mendon=C3=A7a?=) Date: Mon, 24 May 2021 16:07:57 -0300 Subject: [Numpy-discussion] Understanding Our Contributors - NumFOCUS survey Message-ID: Hi all, NumPy is participating in a research project being conducted by NumFOCUS , our fiscal sponsorship organization. The research is looking into understanding the diversity, inclusion and barriers to participation within NumFOCUS-sponsored projects and the wider open source community. The survey will take 15-20 min to complete. We?d appreciate your contribution by May 31, 2021. The results of this survey will help NumFOCUS work closely with our projects to develop practices that will lead to project success around diversity, inclusion and sustainability. Click here to participate in the survey Thank you for your participation! - Melissa -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon May 24 18:29:15 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 24 May 2021 15:29:15 -0700 Subject: [Numpy-discussion] The status of DType Refactor Message-ID: <7b011e3114b00679344417e5ab85f1aaf87e0c87.camel@sipsolutions.net> Hi all, I thought I would give a brief update on where we are with new DTypes. Partially for Matti who is braving the brunt of the review, but also for anyone else interested. ?Please don't hesitate to ask for clarifications, any questions, or to schedule a meeting to discuss! Recap The past year, has seen most of the "big picture" changes merged into NumPy, a good chunk already part of 1.20: * dtype instances are not instances of np.dtype subclasses. I usually write DType for those. But DTypeType is also a good name :). * Array coercion using np.array(...) was completely rewritten, which was necessary to allow new user DTypes. * Introduced the ArrayMethod concept to unif casting and ufuncs as much as possible (NEP 42/43):Casting was first fixed up to support error returns."can-cast" logic was rewritten in terms of ArrayMethod (i.e. casting safety checks are integrated into Arraymethod)Casting largely reorganized around the ArrayMethod concept, including the casting safety. (Also this) * Promotion was implemented and later integrated everywhere, e.g. for np.result_type(...). * A larger refactor of UFuncs and a few smaller PRs set the stage for the ufunc refactor (see currently in progress) With the exception of universal functions, the above list covers all major areas of change in NumPy that are required to change. It also implements many of the things that new user DTypes will need and currently cannot do. Previously, these were either unavailable or limited in various ways; especially when it comes to parametric DTypes such as units or strings. Currently in Progress The current main reamining points are the universal functions. Since, a majority of NumPy features are organized as universal functions, and universal functions inheritently did not support parametric user defined DTypes. These need a major change. This change is proposed in NEP 43 (although that will need some smaller updates). 
The work on implementing NEP 43 is mostly settling in the following PR and the following branch (I hope these will move in very soon):

* PR 18905: Implements new promotion, dispatching and use for most ufuncs.
* My development branch extends this to the reductions.

In parallel, the new DType API is only useful for users once it is exposed; I have a branch here to experiment with that:

* The experimental DType API exposure branch.
* And a repository with (currently cython) examples using it. This currently includes a very simplistic Units DType and ufuncs for strings (previously difficult or not really possible).

The exact way to write a new DType probably needs some alternative. But note that this should largely be limited to the boilerplate code.

Future

The main step still remaining is figuring out how best to expose the DType API (ABI compatibility is the major concern) and finishing NEP 43 (or most of it) to close things up. After that there are still some things that need to be done (although this is unlikely to be exhaustive):

* The way users should define new DTypes has to be decided (this seems tricky, unfortunately).
* Some functionality is defined in the "old style" API that should be removed/discouraged. This includes things like sorting functions. (The old way could be allowed for a transition period.) To be specific, these are the ((PyArray_Descr *)descr)->f->funcs.
* Some small parts of the new API are missing right now. E.g. code that uses ensure_nbo() in current NumPy has to use the ensure_canonical() defined by NEP 42. Similarly, some parts will need tweaking.
* Parts of the API should be public, but it would also be nice to clean them up before doing so; an example of this is the get_loop() of ufuncs. For most use-cases this is probably not too important, but the API is a bit awkward currently. (It would be possible to accept the awkward API and replace it in the future with a new get_loop(), deprecating the old one slowly.)
* There should be some new API for "reference counting" (more generally, any item with memory management), cleaning up the split between the current transfer-to-NULL and PyArray_XDECREF. That is, we should unify it as much as possible (probably by using the transfer-to-NULL path), and then expose that also to custom DTypes.
* Some utility functionality is missing at this time. For example, a way for a Unit DType to fall back to the normal math implemented by NumPy (after figuring out the unit part).
* A Python API is not on my explicit roadmap right now (although probably not hard).

But most importantly, whatever comes up when potential users start exploring the API, hopefully soon!

Otherwise, there are a couple of related improvements that I think would make sense, such as considering storing the actual power-of-two alignment in the array flags (they are getting a bit cramped if we assume int can be 16 bits though). Also, the discussion about removing value-based casting/promotion is one that would help with DTypes, and pushing it forward probably makes sense as soon as the items that are "currently in progress" are largely settled and the next NumPy version is released.

Cheers,

Sebastian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From sebastian at sipsolutions.net Tue May 25 16:26:19 2021 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 25 May 2021 13:26:19 -0700 Subject: [Numpy-discussion] NumPy Community Meeting Wednesday Message-ID: Hi all, There will be a NumPy Community meeting Wednesday Mai 26th at 20:00 UTC. Everyone is invited and encouraged to join in and edit the work-in-progress meeting topics and notes at: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both Best wishes Sebastian From kangkai at mail.ustc.edu.cn Fri May 28 10:57:59 2021 From: kangkai at mail.ustc.edu.cn (kangkai at mail.ustc.edu.cn) Date: Fri, 28 May 2021 22:57:59 +0800 (GMT+08:00) Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' Message-ID: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> Hi all, Finding topk elements is widely used in several fields, but missed in NumPy. I implement this functionality named as numpy.topk using core numpy functions and open a PR: https://github.com/numpy/numpy/pull/19117 Any discussion are welcome. Best wishes, Kang Kai -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat May 29 10:28:01 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 29 May 2021 16:28:01 +0200 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> Message-ID: On Fri, May 28, 2021 at 4:58 PM wrote: > Hi all, > > Finding topk elements is widely used in several fields, but missed in > NumPy. > I implement this functionality named as numpy.topk using core numpy > functions and open a PR: > > https://github.com/numpy/numpy/pull/19117 > > Any discussion are welcome. > Thanks for the proposal Kang. I think this functionality is indeed a fairly obvious gap in what Numpy offers, and would make sense to add. A detailed comparison with other libraries would be very helpful here. TensorFlow and JAX call this function `top_k`, while PyTorch, Dask and MXNet call it `topk`. Two things to look at in more detail here are: 1. complete signatures of the function in each of those libraries, and what the commonality is there. 2. the argument Eric made on your PR about consistency with sort/argsort, and if we want topk/argtopk? Also, do other libraries have `argtopk`? Cheers, Ralf > Best wishes, > > Kang Kai > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Sat May 29 12:33:06 2021 From: davidmenhur at gmail.com (=?UTF-8?Q?David_Men=C3=A9ndez_Hurtado?=) Date: Sat, 29 May 2021 18:33:06 +0200 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> Message-ID: On Sat, 29 May 2021, 4:29 pm Ralf Gommers, wrote: > > > On Fri, May 28, 2021 at 4:58 PM wrote: > >> Hi all, >> >> Finding topk elements is widely used in several fields, but missed in >> NumPy. >> I implement this functionality named as numpy.topk using core numpy >> functions and open a PR: >> >> https://github.com/numpy/numpy/pull/19117 >> >> Any discussion are welcome. >> > > Thanks for the proposal Kang. 
I think this functionality is indeed a > fairly obvious gap in what Numpy offers, and would make sense to add. A > detailed comparison with other libraries would be very helpful here. > TensorFlow and JAX call this function `top_k`, while PyTorch, Dask and > MXNet call it `topk`. > When I saw `topk` I initially parsed it as "to pk", similar to the current `tolist`. I think `top_k` is more explicit and clear. /David -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Sat May 29 15:26:07 2021 From: daniele at grinta.net (Daniele Nicolodi) Date: Sat, 29 May 2021 21:26:07 +0200 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> Message-ID: <19656066-ebcd-2725-ecba-8935f79b41f6@grinta.net> On 29/05/2021 18:33, David Men?ndez Hurtado wrote: > > > On Sat, 29 May 2021, 4:29 pm Ralf Gommers, > wrote: > > > > On Fri, May 28, 2021 at 4:58 PM > wrote: > > Hi all, > > Finding topk elements is widely used in several fields, but > missed in NumPy. > I implement this functionality named as? numpy.topk using core numpy > functions and open a PR: > > https://github.com/numpy/numpy/pull/19117 > > > Any discussion are welcome. > > > Thanks for the proposal Kang. I think this functionality is indeed a > fairly obvious gap in what Numpy offers, and would make sense to > add. A detailed comparison with other libraries would be very > helpful here. TensorFlow and JAX call this function `top_k`, while > PyTorch, Dask and MXNet call it `topk`. > > > When I saw `topk` I initially parsed it as "to pk", similar to the > current `tolist`. I think `top_k` is more explicit and clear. What does k stand for here? As someone that never encountered this function before I find both names equally confusing. If I understand what the function is supposed to be doing, I think largest() would be much more descriptive. Cheers, Dan From robert.kern at gmail.com Sat May 29 18:48:46 2021 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 29 May 2021 18:48:46 -0400 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: <19656066-ebcd-2725-ecba-8935f79b41f6@grinta.net> References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <19656066-ebcd-2725-ecba-8935f79b41f6@grinta.net> Message-ID: On Sat, May 29, 2021 at 3:35 PM Daniele Nicolodi wrote: > What does k stand for here? As someone that never encountered this > function before I find both names equally confusing. If I understand > what the function is supposed to be doing, I think largest() would be > much more descriptive. > `k` is the number of elements to return. `largest()` can connote that it's only returning the one largest value. It's fairly typical to include a dummy variable (`k` or `n`) in the name to indicate that the function lets you specify how many you want. See, for example, the stdlib `heapq` module's `nlargest()` function. https://docs.python.org/3/library/heapq.html#heapq.nlargest "top-k" comes from the ML community where this function is used to evaluate classification models (`k` instead of `n` being largely an accident of history, I imagine). In many classification problems, the number of classes is very large, and they are very related to each other. For example, ImageNet has a lot of different dog breeds broken out as separate classes. 
In order to get a more balanced view of the relative performance of the classification models, you often want to check whether the correct class is in the top 5 classes (or whatever `k` is appropriate) that the model predicted for the example, not just the one class that the model says is the most likely. "5 largest" doesn't really work in the sentences that one usually writes when talking about ML classifiers; they are talking about the 5 classes that are associated with the 5 largest values from the predictor, not the values themselves. So "top k" is what gets used in ML discussions, and that transfers over to the name of the function in ML libraries. It is a top-down reflection of the higher level thing that people want to compute (in that context) rather than a bottom-up description of how the function is manipulating the input, if that makes sense. Either one is a valid way to name things. There is a lot to be said for numpy's domain-agnostic nature that we should prefer the bottom-up description style of naming. However, we are also in the midst of a diversifying ecosystem of array libraries, largely driven by the ML domain, and adopting some of that terminology when we try to enhance our interoperability with those libraries is also a factor to be considered. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Sun May 30 02:38:06 2021 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Sun, 30 May 2021 08:38:06 +0200 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <19656066-ebcd-2725-ecba-8935f79b41f6@grinta.net> Message-ID: Since this going into the top namespace, I'd also vote against the matlab-y "topk" name. And even matlab didn't do what I would expect and went with maxk https://nl.mathworks.com/help/matlab/ref/maxk.html I think "max_k" is a good generalization of the regular "max". Even when auto-completing, this showing up under max makes sense to me instead of searching them inside "t"s. Besides, "argmax_k" also follows suite, that of the previous convention. To my eyes this is an acceptable disturbance to an already very crowded namespace. a few moments later.... But then again an ugly idea rears its head proposing this going into the existing max function. But I'll shut up now :) On Sun, May 30, 2021 at 12:50 AM Robert Kern wrote: > On Sat, May 29, 2021 at 3:35 PM Daniele Nicolodi > wrote: > >> What does k stand for here? As someone that never encountered this >> function before I find both names equally confusing. If I understand >> what the function is supposed to be doing, I think largest() would be >> much more descriptive. >> > > `k` is the number of elements to return. `largest()` can connote that it's > only returning the one largest value. It's fairly typical to include a > dummy variable (`k` or `n`) in the name to indicate that the function lets > you specify how many you want. See, for example, the stdlib `heapq` > module's `nlargest()` function. > > https://docs.python.org/3/library/heapq.html#heapq.nlargest > > "top-k" comes from the ML community where this function is used to > evaluate classification models (`k` instead of `n` being largely an > accident of history, I imagine). In many classification problems, the > number of classes is very large, and they are very related to each other. > For example, ImageNet has a lot of different dog breeds broken out as > separate classes. 
In order to get a more balanced view of the relative > performance of the classification models, you often want to check whether > the correct class is in the top 5 classes (or whatever `k` is appropriate) > that the model predicted for the example, not just the one class that the > model says is the most likely. "5 largest" doesn't really work in the > sentences that one usually writes when talking about ML classifiers; they > are talking about the 5 classes that are associated with the 5 largest > values from the predictor, not the values themselves. So "top k" is what > gets used in ML discussions, and that transfers over to the name of the > function in ML libraries. > > It is a top-down reflection of the higher level thing that people want to > compute (in that context) rather than a bottom-up description of how the > function is manipulating the input, if that makes sense. Either one is a > valid way to name things. There is a lot to be said for numpy's > domain-agnostic nature that we should prefer the bottom-up description > style of naming. However, we are also in the midst of a diversifying > ecosystem of array libraries, largely driven by the ML domain, and adopting > some of that terminology when we try to enhance our interoperability with > those libraries is also a factor to be considered. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Sun May 30 03:10:01 2021 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Sun, 30 May 2021 09:10:01 +0200 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <19656066-ebcd-2725-ecba-8935f79b41f6@grinta.net> Message-ID: after a coffee, I don't see the point of calling it still "k" so "max_n" is my vote for what its worth. On Sun, May 30, 2021 at 8:38 AM Ilhan Polat wrote: > Since this going into the top namespace, I'd also vote against the > matlab-y "topk" name. And even matlab didn't do what I would expect and > went with maxk > > https://nl.mathworks.com/help/matlab/ref/maxk.html > > I think "max_k" is a good generalization of the regular "max". Even when > auto-completing, this showing up under max makes sense to me instead of > searching them inside "t"s. Besides, "argmax_k" also follows suite, that of > the previous convention. To my eyes this is an acceptable disturbance to an > already very crowded namespace. > > > > a few moments later.... > > But then again an ugly idea rears its head proposing this going into the > existing max function. But I'll shut up now :) > > > > > > > > On Sun, May 30, 2021 at 12:50 AM Robert Kern > wrote: > >> On Sat, May 29, 2021 at 3:35 PM Daniele Nicolodi >> wrote: >> >>> What does k stand for here? As someone that never encountered this >>> function before I find both names equally confusing. If I understand >>> what the function is supposed to be doing, I think largest() would be >>> much more descriptive. >>> >> >> `k` is the number of elements to return. `largest()` can connote that >> it's only returning the one largest value. It's fairly typical to include a >> dummy variable (`k` or `n`) in the name to indicate that the function lets >> you specify how many you want. See, for example, the stdlib `heapq` >> module's `nlargest()` function. 
>> >> https://docs.python.org/3/library/heapq.html#heapq.nlargest >> >> "top-k" comes from the ML community where this function is used to >> evaluate classification models (`k` instead of `n` being largely an >> accident of history, I imagine). In many classification problems, the >> number of classes is very large, and they are very related to each other. >> For example, ImageNet has a lot of different dog breeds broken out as >> separate classes. In order to get a more balanced view of the relative >> performance of the classification models, you often want to check whether >> the correct class is in the top 5 classes (or whatever `k` is appropriate) >> that the model predicted for the example, not just the one class that the >> model says is the most likely. "5 largest" doesn't really work in the >> sentences that one usually writes when talking about ML classifiers; they >> are talking about the 5 classes that are associated with the 5 largest >> values from the predictor, not the values themselves. So "top k" is what >> gets used in ML discussions, and that transfers over to the name of the >> function in ML libraries. >> >> It is a top-down reflection of the higher level thing that people want to >> compute (in that context) rather than a bottom-up description of how the >> function is manipulating the input, if that makes sense. Either one is a >> valid way to name things. There is a lot to be said for numpy's >> domain-agnostic nature that we should prefer the bottom-up description >> style of naming. However, we are also in the midst of a diversifying >> ecosystem of array libraries, largely driven by the ML domain, and adopting >> some of that terminology when we try to enhance our interoperability with >> those libraries is also a factor to be considered. >> >> -- >> Robert Kern >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Sun May 30 04:00:40 2021 From: matti.picus at gmail.com (Matti Picus) Date: Sun, 30 May 2021 11:00:40 +0300 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> Message-ID: On 29/5/21 5:28 pm, Ralf Gommers wrote: > > > On Fri, May 28, 2021 at 4:58 PM > wrote: > > Hi all, > > Finding topk elements is widely used in several fields, but missed > in NumPy. > I implement this functionality named as? numpy.topk using core numpy > functions and open a PR: > > https://github.com/numpy/numpy/pull/19117 > > > Any discussion are welcome. > > > Thanks for the proposal Kang. I think this functionality is indeed a > fairly obvious gap in what Numpy offers, and would make sense to add. > A detailed comparison with other libraries would be very helpful here. > TensorFlow and JAX call this function `top_k`, while PyTorch, Dask and > MXNet call it `topk`. > > Two things to look at in more detail here are: > 1. complete signatures of the function in each of those libraries, and > what the commonality is there. > 2. the argument Eric made on your PR about consistency with > sort/argsort, and if we want topk/argtopk? Also, do other libraries > have `argtopk`? > > Cheers, > Ralf > > > Best wishes, > > Kang Kai > Did this function come up at all in the array-API consortium dicussions? 
Matti From daniele at grinta.net Sun May 30 04:10:30 2021 From: daniele at grinta.net (Daniele Nicolodi) Date: Sun, 30 May 2021 10:10:30 +0200 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <19656066-ebcd-2725-ecba-8935f79b41f6@grinta.net> Message-ID: On 30/05/2021 00:48, Robert Kern wrote: > On Sat, May 29, 2021 at 3:35 PM Daniele Nicolodi > wrote: > > What does k stand for here? As someone that never encountered this > function before I find both names equally confusing. If I understand > what the function is supposed to be doing, I think largest() would be > much more descriptive. > > > `k` is the number of elements to return. `largest()` can connote that > it's only returning the one largest value. It's fairly typical to > include a dummy variable (`k` or `n`) in the name to indicate that the > function lets you specify how many you want. See, for example, the > stdlib `heapq` module's?`nlargest()` function. I thought that a `largest()` function with an integer second argument could be enough self explanatory. `nlargest()` would be much more obvious to the wider audience, I think. > https://docs.python.org/3/library/heapq.html#heapq.nlargest > > > "top-k" comes from the ML community where this function is used to > evaluate classification models (`k` instead of `n` being largely an > accident of history,?I imagine). In many classification problems, the > number of classes is very large, and they are very related to each > other. For example, ImageNet has a lot of different dog breeds broken > out as separate classes. In order to get a more balanced view of the > relative performance of the classification models, you often want to > check whether the correct class is in the top 5 classes (or whatever `k` > is appropriate) that the model predicted for the example, not just the > one class that the model says is the most likely. "5 largest" doesn't > really work in the sentences that one usually writes when talking about > ML classifiers; they are talking about the 5 classes that?are associated > with the?5 largest values from the predictor, not the values themselves. > So "top k" is what gets used in ML discussions, and that transfers over > to the name of the function in ML libraries. > > It is a top-down reflection of the higher level thing that people want > to compute (in that context) rather than a bottom-up description of how > the function is manipulating the input, if that makes sense. Either one > is a valid way to name things. There is a lot to be said for numpy's > domain-agnostic nature that we should prefer the bottom-up description > style of naming. However, we are also in the midst of a diversifying > ecosystem of array libraries, largely driven by the ML domain, and > adopting some of that terminology when we try to enhance our > interoperability with those libraries is also a factor to be considered. I think that such a simple function should be named in the most obvious way possible, or it will become one function that will be used in the domains where the unusual name makes sense, but will end being re-implemented in all other contexts. I am sure that if I would have been looking for a function that returns the N largest items in an array (being that intended accordingly to a given key function or otherwise) I would never have looked at a function named `topk()` or `top_k()` and I am pretty sure I would have discarded anything that has `k` or `top` in its name. 
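For anyone following the naming debate who has not met the operation itself, a minimal sketch of what the proposed function computes, built only from existing NumPy calls; this is an illustration, not the implementation in the PR, with the stdlib `heapq.nlargest` that Robert mentions shown for comparison:

```python
import heapq
import numpy as np

a = np.array([5, 1, 9, 7, 3, 8])
k = 3

# argpartition puts the k largest entries in the last k slots,
# but in no particular order.
idx = np.argpartition(a, -k)[-k:]
# Reorder those k indices so the values run from largest to smallest.
idx = idx[np.argsort(a[idx])[::-1]]

values = a[idx]                   # array([9, 8, 7]), at indices [2, 5, 3]

# The stdlib equivalent for plain Python iterables:
heapq.nlargest(k, a.tolist())     # [9, 8, 7]
```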
On the other hand, I understand that ML is where all the hipe (and a large fraction of the money) is this days, thus I understand if numpy wants to appease the crowd. Cheers, Dan From kangkai at mail.ustc.edu.cn Sun May 30 04:40:46 2021 From: kangkai at mail.ustc.edu.cn (kangkai at mail.ustc.edu.cn) Date: Sun, 30 May 2021 16:40:46 +0800 (GMT+08:00) Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> Message-ID: <48beb3be.116b3.179bc6ec32e.Coremail.kangkai@mail.ustc.edu.cn> > > > On Fri, May 28, 2021 at 4:58 PM > wrote: > > Hi all, > > Finding topk elements is widely used in several fields, but missed > in NumPy. > I implement this functionality named as numpy.topk using core numpy > functions and open a PR: > > https://github.com/numpy/numpy/pull/19117 > > > Any discussion are welcome. > > > Thanks for the proposal Kang. I think this functionality is indeed a > fairly obvious gap in what Numpy offers, and would make sense to add. > A detailed comparison with other libraries would be very helpful here. > TensorFlow and JAX call this function `top_k`, while PyTorch, Dask and > MXNet call it `topk`. > > Two things to look at in more detail here are: > 1. complete signatures of the function in each of those libraries, and > what the commonality is there. > 2. the argument Eric made on your PR about consistency with > sort/argsort, and if we want topk/argtopk? Also, do other libraries > have `argtopk`? > > Cheers, > Ralf > > > Best wishes, > > Kang Kai > Hi, Thanks for reply, I present some details below: ## 1. complete signatures of the function in each of those libraries, and what the commonality is there. | Library | Name | arg1 | arg2 | arg3 | arg4 | arg5 | |-------------|--------------------|-------|------|------|-----------|--------| | NumPy [1] | numpy.topk | a | k | axis | largest | sorted | | PyTorch [2] | torch.topk | input | k | dim | largest | sorted | | R [3] | topK | x | K | / | / | / | | MXNet [4] | mxnet.npx.topk | data | k | axis | is_ascend | / | | CNTK [5] | cntk.ops.top_k | x | k | axis | / | / | | TF [6] | tf.math.top_k | input | k | / | / | sorted | | Dask [7] | dask.array.topk | a | k | axis | -k | / | | Dask [8] | dask.array.argtopk | a | k | axis | -k | / | | MATLAB [9] | mink | A | k | dim | / | / | | MATLAB [10] | maxk | A | k | dim | / | / | | Library | Name | Returns | |-------------|--------------------|---------------------| | NumPy [1] | numpy.topk | values, indices | | PyTorch [2] | torch.topk | values, indices | | R [3] | topK | indices | | MXNet [4] | mxnet.npx.topk | controls by ret_typ | | CNTK [5] | cntk.ops.top_k | values, indices | | TF [6] | tf.math.top_k | values, indices | | Dask [7] | dask.array.topk | values | | Dask [8] | dask.array.argtopk | indices | | MATLAB [9] | mink | values, indices | | MATLAB [10] | maxk | values, indices | - arg1: Input array. - arg2: Number of top elements to look for along the given axis. - arg3: Axis along which to find topk. - R only supports vector, TensorFlow only supports axis=-1. - arg4: Controls whether to return k largest or smallest elements. - R, CNTK and TensorFlow only return k largest elements. - In Dask, k can be negative, which means to return k smallest elements. - In MATLAB, use two distinct functions. - arg5: If true the resulting k elements will be sorted by the values. - R, MXNet, CNTK, Dask and MATLAB only return sorted elements. **Summary**: - Function Name: could be `topk`, `top_k`, `mink`/`maxk`. 
- arg1 (a), arg2 (k), arg3 (axis): should be required. - arg4 (largest), arg4 (sorted): might be discussed. - Returns: discussed below. ## 2. the argument Eric made on your PR about consistency with sort/argsort, if we want topk/argtopk? Also, do other libraries have `argtopk` In most libraries, `topk` or `top_k` returns both values and indices, and `argtopk` is not included except for Dask. In addition, there is another inconsistency: `sort` returns ascending values, but `topk` returns descending values. ## Suggestions Finally, IMHO, new function signature might be designed as one of: I) use `topk` / `argtopk` or `top_k` / `argtop_k` ```python def topk(a, k, axis=-1, sorted=True) -> topk_values def argtopk(a, k, axis=-1, sorted=True) -> topk_indices ``` or ```python def top_k(a, k, axis=-1, sorted=True) -> topk_values def argtop_k(a, k, axis=-1, sorted=True) -> topk_indices ``` where `k` can be negative which means to return k smallest elements. II) use `maxk` / `argmaxk` or `max_k` / `argmax_k` (`mink` / `argmink` or `min_k` / `argmin_k`) ```python def maxk(a, k, axis=-1, sorted=True) -> values def argmaxk(a, k, axis=-1, sorted=True) -> indices def mink(a, k, axis=-1, sorted=True) -> values def argmink(a, k, axis=-1, sorted=True) -> indices ``` or ```python def max_k(a, k, axis=-1, sorted=True) -> values def argmax_k(a, k, axis=-1, sorted=True) -> indices def min_k(a, k, axis=-1, sorted=True) -> values def argmin_k(a, k, axis=-1, sorted=True) -> indices ``` where `k` must be positive. **References**: - [1] https://github.com/numpy/numpy/pull/19117 - [2] https://pytorch.org/docs/stable/generated/torch.topk.html - [3] https://www.rdocumentation.org/packages/tensr/versions/1.0.1/topics/topK - [4] https://mxnet.apache.org/versions/master/api/python/docs/api/npx/generated/mxnet.npx.topk.html - [5] https://docs.microsoft.com/en-us/python/api/cntk/cntk.ops?view=cntk-py-2.7#top-k-x--k--axis--1--name---- - [6] https://tensorflow.google.cn/api_docs/python/tf/math/top_k?hl=zh-cn - [7] https://docs.dask.org/en/latest/array-api.html?highlight=topk#dask.array.topk - [8] https://docs.dask.org/en/latest/array-api.html?highlight=topk#dask.array.argtopk - [9] https://nl.mathworks.com/help/matlab/ref/maxk.html - [10] https://nl.mathworks.com/help/matlab/ref/mink.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Sun May 30 07:23:50 2021 From: alan.isaac at gmail.com (Alan G. Isaac) Date: Sun, 30 May 2021 07:23:50 -0400 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <19656066-ebcd-2725-ecba-8935f79b41f6@grinta.net> Message-ID: <4cd8d679-b14d-71b3-5227-0ffc458c1208@gmail.com> Is there any thought of allowing for other comparisons? In which case `last_k` might be preferable. Alan Isaac On 5/30/2021 2:38 AM, Ilhan Polat wrote: > > I think "max_k" is a good generalization of the regular "max". From alan.isaac at gmail.com Sun May 30 07:50:17 2021 From: alan.isaac at gmail.com (Alan G. Isaac) Date: Sun, 30 May 2021 07:50:17 -0400 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: <48beb3be.116b3.179bc6ec32e.Coremail.kangkai@mail.ustc.edu.cn> References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <48beb3be.116b3.179bc6ec32e.Coremail.kangkai@mail.ustc.edu.cn> Message-ID: <4906b29f-126e-d787-3da0-5aa3f5c92a15@gmail.com> Mathematica and Julia both seem relevant here. 
Mma has TakeLargest (and Wolfram tends to think hard about names). https://reference.wolfram.com/language/ref/TakeLargest.html Julia's closest comparable is perhaps partialsortperm: https://docs.julialang.org/en/v1/base/sort/#Base.Sort.partialsortperm Alan Isaac On 5/30/2021 4:40 AM, kangkai at mail.ustc.edu.cn wrote: > Hi,?Thanks?for?reply,?I?present?some?details?below: From ndbecker2 at gmail.com Sun May 30 08:22:08 2021 From: ndbecker2 at gmail.com (Neal Becker) Date: Sun, 30 May 2021 08:22:08 -0400 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: <4906b29f-126e-d787-3da0-5aa3f5c92a15@gmail.com> References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <48beb3be.116b3.179bc6ec32e.Coremail.kangkai@mail.ustc.edu.cn> <4906b29f-126e-d787-3da0-5aa3f5c92a15@gmail.com> Message-ID: Topk is a bad choice imo. I initially parsed it as to_pk, and had no idea what that was, although sounded a lot like a scipy signal function. Nlargest would be very obvious. On Sun, May 30, 2021, 7:50 AM Alan G. Isaac wrote: > Mathematica and Julia both seem relevant here. > Mma has TakeLargest (and Wolfram tends to think hard about names). > https://reference.wolfram.com/language/ref/TakeLargest.html > Julia's closest comparable is perhaps partialsortperm: > https://docs.julialang.org/en/v1/base/sort/#Base.Sort.partialsortperm > Alan Isaac > > > > On 5/30/2021 4:40 AM, kangkai at mail.ustc.edu.cn wrote: > > Hi, Thanks for reply, I present some details below: > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Sun May 30 23:31:09 2021 From: ben.v.root at gmail.com (Benjamin Root) Date: Sun, 30 May 2021 23:31:09 -0400 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <48beb3be.116b3.179bc6ec32e.Coremail.kangkai@mail.ustc.edu.cn> <4906b29f-126e-d787-3da0-5aa3f5c92a15@gmail.com> Message-ID: to be honest, I read "topk" as "topeka", but I am weird. While numpy doesn't use underscores all that much, I think this is one case where it makes sense. I'd also watch out for the use of the term "sorted", as it may mean different things to different people, particularly with regards to what its default value should be. I also find myself initially confused by the names "largest" and "sorted", especially what should they mean with the "min-k" behavior. I think Dask's use of negative k is very pythonic and would help keep the namespace clean by avoiding the extra "min_k". As for the indices, I am of two minds. On the one hand, I don't like polluting the namespace with extra functions. On the other hand, having a function that behaves differently based on a parameter is just fugly, although we do have a function that does this - np.unique(). Ben Root On Sun, May 30, 2021 at 8:22 AM Neal Becker wrote: > Topk is a bad choice imo. I initially parsed it as to_pk, and had no idea > what that was, although sounded a lot like a scipy signal function. > Nlargest would be very obvious. > > On Sun, May 30, 2021, 7:50 AM Alan G. Isaac wrote: > >> Mathematica and Julia both seem relevant here. >> Mma has TakeLargest (and Wolfram tends to think hard about names). 
>> https://reference.wolfram.com/language/ref/TakeLargest.html >> Julia's closest comparable is perhaps partialsortperm: >> https://docs.julialang.org/en/v1/base/sort/#Base.Sort.partialsortperm >> Alan Isaac >> >> >> >> On 5/30/2021 4:40 AM, kangkai at mail.ustc.edu.cn wrote: >> > Hi, Thanks for reply, I present some details below: >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfine2358 at gmail.com Mon May 31 11:10:48 2021 From: jfine2358 at gmail.com (Jonathan Fine) Date: Mon, 31 May 2021 16:10:48 +0100 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <48beb3be.116b3.179bc6ec32e.Coremail.kangkai@mail.ustc.edu.cn> <4906b29f-126e-d787-3da0-5aa3f5c92a15@gmail.com> Message-ID: Here's my opinion, as a bit of an outsider. Mainly, I understand MAX to mean the largest value in a finite totally ordered set. I understand TOP to mean the 'best' member of a finite set. For example, on a mountain each point has a HEIGHT. There will be a MAX HEIGHT. The point(s) on the mountain that is the highest is the SUMMIT. Or in other words the TOP of the mountain. Or another example, there are TOP 40 charts for music. https://www.officialcharts.com/ To summarize, use MAX for the largest value in a totally ordered set. Use TOP when you have a height (or similar) function applied to an unordered set. The highest temperature in 2021 will occur on the hottest day(s). One is a temperature, the other a date. I'm an outsider, and I've not made an effort to gain special knowledge about the domain prior to posting this opinion. I hope it helps. Please ignore it if it does not. -- Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon May 31 12:26:53 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 31 May 2021 18:26:53 +0200 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> Message-ID: On Sun, May 30, 2021 at 10:01 AM Matti Picus wrote: > > > Did this function come up at all in the array-API consortium dicussions? > It happens to be in this list of functions which was made last week: https://github.com/data-apis/array-api/issues/187. That list is potential next candidates, based on them being implemented in most but not all libraries. There was no real discussion on `topk` specifically though. The current version of the array API standard basically contains functionality that is either common to all libraries, or that NumPy has and most other libraries have as well. Given how much harder it is to get functions into NumPy than in other libraries, the "most libraries have it, NumPy does not" set of functions was not investigated much yet. That's also the reason NEP 47 doesn't have any new functions to be added to NumPy except for `from_dlpack`, but only consistency changes like adding keepdims keywords, stacking for linalg functions that are missing that, etc. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Mon May 31 12:49:14 2021 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 31 May 2021 18:49:14 +0200 Subject: [Numpy-discussion] EHN: Discusions about 'add numpy.topk' In-Reply-To: <48beb3be.116b3.179bc6ec32e.Coremail.kangkai@mail.ustc.edu.cn> References: <56656bdf.f7c3.179b37b689a.Coremail.kangkai@mail.ustc.edu.cn> <48beb3be.116b3.179bc6ec32e.Coremail.kangkai@mail.ustc.edu.cn> Message-ID: On Sun, May 30, 2021 at 10:41 AM wrote: > > > > > > On Fri, May 28, 2021 at 4:58 PM > > wrote: > > > > Hi all, > > > > Finding topk elements is widely used in several fields, but missed > > in NumPy. > > I implement this functionality named as numpy.topk using core numpy > > functions and open a PR: > > > > https://github.com/numpy/numpy/pull/19117 > > > > > > Any discussion are welcome. > > > > > > Thanks for the proposal Kang. I think this functionality is indeed a > > fairly obvious gap in what Numpy offers, and would make sense to add. > > A detailed comparison with other libraries would be very helpful here. > > TensorFlow and JAX call this function `top_k`, while PyTorch, Dask and > > MXNet call it `topk`. > > > > Two things to look at in more detail here are: > > 1. complete signatures of the function in each of those libraries, and > > what the commonality is there. > > 2. the argument Eric made on your PR about consistency with > > sort/argsort, and if we want topk/argtopk? Also, do other libraries > > have `argtopk`? > > > > Cheers, > > Ralf > > > > > > Best wishes, > > > > Kang Kai > > > > Hi, Thanks for reply, I present some details below: > Thanks for the detailed investigation Kang! > > ## 1. complete signatures of the function in each of those libraries, and what the commonality is there. > > > | Library | Name | arg1 | arg2 | arg3 | arg4 | arg5 | > > |-------------|--------------------|-------|------|------|-----------|--------| > | NumPy [1 > ] | numpy.topk | a | k | axis | largest | sorted | > | PyTorch [2 > ] | torch.topk | input | k | dim | largest | sorted | > | R [3 > ] | topK | x | K | / | / | / | > | MXNet [4 > ] | mxnet.npx.topk | data | k | axis | is_ascend | / | > | CNTK [5 > ] | cntk.ops.top_k | x | k | axis | / | / | > | TF [6 > ] | tf.math.top_k | input | k | / | / | sorted | > | Dask [7 > ] | dask.array.topk | a | k | axis | -k | / | > | Dask [8 > ] | dask.array.argtopk | a | k | axis | -k | / | > | MATLAB [9 > ] | mink | A | k | dim | / | / | > | MATLAB [10 > ] | maxk | A | k | dim | / | / | > > > | Library | Name | Returns | > |-------------|--------------------|---------------------| > | NumPy [1] | numpy.topk | values, indices | > | PyTorch [2] | torch.topk | values, indices | > | R [3] | topK | indices | > | MXNet [4] | mxnet.npx.topk | controls by ret_typ | > | CNTK [5] | cntk.ops.top_k | values, indices | > | TF [6] | tf.math.top_k | values, indices | > | Dask [7] | dask.array.topk | values | > | Dask [8] | dask.array.argtopk | indices | > | MATLAB [9] | mink | values, indices | > | MATLAB [10] | maxk | values, indices | > > - arg1: Input array. > - arg2: Number of top elements to look for along the given axis. > - arg3: Axis along which to find topk. > - R only supports vector, TensorFlow only supports axis=-1. > - arg4: Controls whether to return k largest or smallest elements. > - R, CNTK and TensorFlow only return k largest elements. > - > In Dask, k can be negative, which means to return k smallest elements. > - In MATLAB, use two distinct functions. 
> - arg5: If true the resulting k elements will be sorted by the values. > - R, MXNet, CNTK, Dask and MATLAB only return sorted elements. > > **Summary**: > - Function Name: could be `topk`, `top_k`, `mink`/`maxk`. > - arg1 (a), arg2 (k), arg3 (axis): should be required. > - arg4 (largest), arg4 (sorted): might be discussed. > - Returns: discussed below. > > > ## 2. the argument Eric made on your PR about consistency with sort/argsort, if we want topk/argtopk? Also, do other libraries have `argtopk` > > In most libraries, `topk` or `top_k` returns both values and indices, and > `argtopk` is not included except for Dask. In addition, there is another > inconsistency: `sort` returns ascending values, but `topk` returns > descending values. > > ## Suggestions > Finally, IMHO, new function signature might be designed as one of: > I) use `topk` / `argtopk` or `top_k` / `argtop_k` > ```python > def topk(a, k, axis=-1, sorted=True) -> topk_values > def argtopk(a, k, axis=-1, sorted=True) -> topk_indices > ``` > or > ```python > def top_k(a, k, axis=-1, sorted=True) -> topk_values > def argtop_k(a, k, axis=-1, sorted=True) -> topk_indices > ``` > where `k` can be negative which means to return k smallest elements. > I don't think I'm a fan of the `-k` cleverness. Saying you want `-5` values as a stand-in for wanting the 5 smallest values is worse than a keyword imho. It seems like commenters so far have a preference for `top_k` over `topk`, because of readability. Either way it's going to impact Dask, JAX, etc. - so it would be nice to get some input from maintainers of those libraries. The two functions vs. returning `(values, indices)` is also a tricky choice - it may depend on usage patterns. If one needs indices a lot, then there's something to say for the tuple return. Otherwise the code is going to look like: indices = argtop_k(x, ....) values = x[indices] which is significantly worse than: values, indices = top_k(x, ...) > II) use `maxk` / `argmaxk` or `max_k` / `argmax_k` (`mink` / `argmink` or > `min_k` / `argmin_k`) > I suggest to forget about maxk/max_k. All Python libraries call it topk/top_k. And Matlab choosing something is usually a good reason to run in the other direction. Cheers, Ralf ```python > def maxk(a, k, axis=-1, sorted=True) -> values > def argmaxk(a, k, axis=-1, sorted=True) -> indices > > def mink(a, k, axis=-1, sorted=True) -> values > def argmink(a, k, axis=-1, sorted=True) -> indices > ``` > or > ```python > def max_k(a, k, axis=-1, sorted=True) -> values > def argmax_k(a, k, axis=-1, sorted=True) -> indices > > def min_k(a, k, axis=-1, sorted=True) -> values > def argmin_k(a, k, axis=-1, sorted=True) -> indices > ``` > where `k` must be positive. 
> > > **References**: > - [1] https://github.com/numpy/numpy/pull/19117 > - [2] https://pytorch.org/docs/stable/generated/torch.topk.html > - [3] > https://www.rdocumentation.org/packages/tensr/versions/1.0.1/topics/topK > - [4] > https://mxnet.apache.org/versions/master/api/python/docs/api/npx/generated/mxnet.npx.topk.html > - [5] > https://docs.microsoft.com/en-us/python/api/cntk/cntk.ops?view=cntk-py-2.7#top-k-x--k--axis--1--name---- > - [6] https://tensorflow.google.cn/api_docs/python/tf/math/top_k?hl=zh-cn > - [7] > https://docs.dask.org/en/latest/array-api.html?highlight=topk#dask.array.topk > - [8] > https://docs.dask.org/en/latest/array-api.html?highlight=topk#dask.array.argtopk > - [9] https://nl.mathworks.com/help/matlab/ref/maxk.html > - [10] https://nl.mathworks.com/help/matlab/ref/mink.html > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL:
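As an illustration of the usage-pattern point above -- getting both values and indices from one call -- a minimal sketch built only from existing NumPy functions (`argpartition`, `take_along_axis`). The name, defaults and keywords here are illustrative assumptions, not the API proposed in the PR:

```python
import numpy as np

def top_k_sketch(a, k, axis=-1, largest=True, sorted=True):
    """Return (values, indices) of the k largest/smallest entries along axis.

    Illustrative sketch only; not the implementation or signature from the PR.
    """
    a = np.asanyarray(a)
    n = a.shape[axis]
    if largest:
        # the k largest land in the last k positions after partitioning
        part = np.argpartition(a, n - k, axis=axis)
        indices = np.take(part, np.arange(n - k, n), axis=axis)
    else:
        # the k smallest land in the first k positions
        part = np.argpartition(a, k - 1, axis=axis)
        indices = np.take(part, np.arange(k), axis=axis)
    values = np.take_along_axis(a, indices, axis=axis)
    if sorted:
        order = np.argsort(values, axis=axis)
        if largest:
            order = np.flip(order, axis=axis)
        indices = np.take_along_axis(indices, order, axis=axis)
        values = np.take_along_axis(values, order, axis=axis)
    return values, indices

# Tuple-return style from the discussion above, e.g. along the last axis:
# values, indices = top_k_sketch(x, 5, axis=-1)
```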