From wesmckinn at gmail.com Sun Dec 1 20:34:22 2019 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 1 Dec 2019 19:34:22 -0600 Subject: [Pandas-dev] Just a quick question from a regular pandas user In-Reply-To: <9e4fe1eaa534dfd3150cf694c0fce0e557fed7ab.camel@pietrobattiston.it> References: <1309884188.227412.1574998229127@nm43.abv.bg> <9e4fe1eaa534dfd3150cf694c0fce0e557fed7ab.camel@pietrobattiston.it> Message-ID: hi Pietro, There is, perhaps, a philosophical question about open source projects: should a project have aspirational goals? Should developers talk openly about large, difficult, uncertain, long term goals? Unfortunately, many users have a one-sided consumption-based relationship with open source projects and so their interactions with the developers tend to be about getting what they need out of the project or finding out when to expect something to be done. Large projects like pandas will also face disinformation or misinformed users jumping to conclusions about what the maintainers are thinking or where the project is heading. Personally, I don't think open source maintainers owe their users the same kinds of communication that an enterprise software company (getting paid millions or billions of dollars per year) might provide. Many users behave like customers (I recall a discussion on this list where someone literally called themselves a "customer") but I guess their checks got lost in the mail. I started the "pandas 2" discussion in 2016 to try to understand if there was any consensus of thought about some of the architectural / design shortcomings about pandas's internals and what people think we should do about them. I also thought maybe it might spur some funding or new sources of support for the project. pandas seems also to be saddled with many legacy support issues though as time goes on some of these are getting trimmed away, while other large issues (e.g. the BlockManager) remain. 
I don't see a problem with having this discussion and writing down some ideas. I'm not sure what needs to be done to clarify that anything relating to "pandas 2" are aspirational goals of the project without any concrete plans to deliver a new piece of software until stated otherwise, but if it isn't clear, let's make it more clear. If anyone was expecting me to pull a rabbit out of a hat and deliver this end-to-end on my own, I am sorry to disappoint. With the anemic state of funding for open source innovation-focused projects, it is going to take quite a long time to get all the pieces developed. FWIW, I've been consistently working since mid/late 2015 (so approaching 5 years now...) precisely to create a more sustainable, general purpose technology foundation for data frame libraries in part to make it easier for something like "pandas 2" (as it was discussed at the time) to come to life. Something I've said many times is that I think the pandas maintainers' scope of code ownership should decrease, not increase. One of the problems with pandas (and other libraries, too) is the relatively strong coupling between the API and the implementation -- as a result of this in Apache Arrow we have strived to avoid having an opinion about what kind of public API a data frame user will see. - Wes On Sat, Nov 30, 2019 at 9:48 AM Pietro Battiston wrote: > > Dear devs, > > every time that "pandas 2" comes out, it is (it seems to me) not > because of our concrete plans for it, or even because it is used as > inspiration for current pandas (which by the way is receiving great and > substantial improvements), but because some user is confused by the > docs/issues mentioning it. 
> > I know it is somewhat of a rhetorical question - because we ourselves > always considered "pandas 2" first and foremost as a direction to take > (or at least discuss) rather than as a version to release - but I'm > wondering whether having pandas 2 mentioned, discussed and postponed > (inevitably, as we are not even really targeting it) is really > helpful, and in particular whether the separate github project is > really helpful. > > I see two options: > - spend serious effort in communicating to users what/when to expect (and > not to expect) from pandas 2 > - delete any mention of pandas 2 from our github and from the "pandas > 2.0 Design Documents" - which could be just described as "the future of > pandas" > > ... which clearly doesn't mean we do not need to introduce important > changes in pandas (this is happening daily), or that there shouldn't be a version 2.0 some day. > > These are some of the "confused users" I have in mind: > https://www.reddit.com/r/datascience/comments/8rcoou/what_happened_to_pandas_2/ > > Cheers, > > Pietro > > On Fri, 29/11/2019 at 12.05 +0100, Joris Van den Bossche > wrote: > > Hi Martin, > > > > The 2.0 milestone has not been updated for a very long time, and is also not > > yet really used (there are a few issues tagged with it to mean "maybe > > in a next big release but not yet in 1.0"). So I wouldn't read too > > much into that. In any case, we are certainly not going to do a pandas > > 2.0 release in summer 2020 (so we should update the milestone date). > > > > What we do plan is a final 1.0 release in early 2020.
What we also > > discussed recently is a version policy starting with 1.0: > > https://dev.pandas.io/docs/development/policies.html#version-policy > > This means that code working with 1.0 should mostly keep working in > > the full 1.x series of releases when not using experimental features > > (although we will keep doing deprecations, so you still might need to > > change code to get rid of such warnings, in preparation for pandas > > 2.0). > > > > And you are correct: pandas 1.0 will not be drastically different > > from 0.25.3 (the main difference will be that a lot of things that > > were deprecated before will now be removed, plus some documented API > > changes). While we do not yet have many concrete plans for pandas > > 2.0, I think the expectation is that it will be similar (and also not > > something for the coming year anyway). > > > > So if you are writing code now for 0.25.3, and you take notice of > > possible deprecation warnings and fix your code for those, you can be > > assured that your code will mostly work on 1.0 as well. > > Now, it is still strongly recommended that you write tests for your > > code, so you can run those on new pandas releases to verify this is > > indeed the case (and running such tests on release candidates of new > > pandas releases is also very valuable, so potential regressions can > > be reported and fixed early). > > > > Hopefully that sheds some light > > Joris > > > > > > On Fri, 29 Nov 2019 at 05:42, Martin Gantchev wrote: > > > Dear Representatives of Pandas-dev, > > > > > > This is Martin here, a regular user of the pandas library. > > > > > > First of all, thank you for providing, maintaining and still > > > developing this amazing library which I use pretty much every day. > > > > > > On that note, I am facing a project that will involve working with > > > pandas heavily, but whose code is supposed to be kept for a long > > > period of time (hopefully, for years to come).
> > > > > > I am referring to this piece of information: > > > > > > https://github.com/pandas-dev/pandas/milestones > > > > > > It seems that pandas 1.0 has a 90% completion rate, while pandas 2.0 > > > is expected to be ready as early as August 2020; however, it > > > strangely has just 10 problems that need to be solved. > > > > > > Of course, no precise answer is requested. However, I am afraid > > > that in the next couple of months I may write code that might > > > become obsolete in the middle of next summer. Am I right about > > > that? > > > > > > I did read around the internet and read more articles, so I don't > > > expect either 1.0 or 2.0 to be drastically different from 0.25.3. > > > At least, I guess most of the code I'd use in 0.25.3 should work > > > normally under 1.0 or 2.0. Is that correct? > > > > > > Shedding light on this subject may save me a lot of worry, so > > > even a loose delineation of your schedule and the potential impact > > > it may have on code written in 0.25.3 would be greatly appreciated. > > > > > > Thank you very much! > > > > > > Looking forward to your answer.
> > > Best, > > > Martin > > > _______________________________________________ > > > Pandas-dev mailing list > > > Pandas-dev at python.org > > > https://mail.python.org/mailman/listinfo/pandas-dev
From me at pietrobattiston.it Sun Dec 1 23:53:36 2019 From: me at pietrobattiston.it (Pietro Battiston) Date: Mon, 02 Dec 2019 05:53:36 +0100 Subject: [Pandas-dev] Just a quick question from a regular pandas user In-Reply-To: References: <1309884188.227412.1574998229127@nm43.abv.bg> <9e4fe1eaa534dfd3150cf694c0fce0e557fed7ab.camel@pietrobattiston.it> Message-ID: <6bc7eb39d235fc467cb41340383ac15472ff1d29.camel@pietrobattiston.it> Hi Wes, thanks for your reply. I totally agree with your view that open source devs shouldn't be scared by their users... but if indeed pandas 2 is an "aspirational goal", which is great, maybe labelling it as "2" is not doing a good service to us either? Pandas 2.0, when it eventually comes out, will have benefited (in terms of coordination, and possibly funding) from your "pandas 2" plans, but it will still emerge, as always happens, from long discussions that might result in more, or less, or slightly different, changes than envisaged in those docs. Hence if we devote attention to those documents, they will sooner or later become a source of confusion - while if we don't, they are only a source of confusion for users. Again, I would understand the cost to users (in terms of confusion) if it was a resource to us, but it is instead mostly a cost to us (every time we take time explaining to some confused user "what's happening with pandas 2"). And to be honest, what bothers me most is the github project.
I don't think that a mostly empty, inactive project, with issues we don't really aim at closing - including some which are actually being discussed in parallel "today's pandas" issues!? - is of help to us, and even less that it inspires funders to give money! I probably wouldn't be writing this mail if pandas was in crucial need of 1) more innovative momentum, or 2) coordinated efforts to communicate this to potential funders. But the devs (not me, unfortunately) have been doing great lately on both these aspects, also getting inspiration from these 2016 docs, but definitely not (I think) benefitting from the "pandas 2" label. So I would keep the docs somewhere, but avoiding the ambiguity of whether they are talking about a different project, and definitely avoiding the ambiguity of whether they are describing a specific version of the current project. Pietro https://github.com/pandas-dev/pandas2/issues/37
From wesmckinn at gmail.com Sun Dec 1 23:55:46 2019 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 1 Dec 2019 22:55:46 -0600 Subject: [Pandas-dev] Just a quick question from a regular pandas user In-Reply-To: <6bc7eb39d235fc467cb41340383ac15472ff1d29.camel@pietrobattiston.it> References: <1309884188.227412.1574998229127@nm43.abv.bg> <9e4fe1eaa534dfd3150cf694c0fce0e557fed7ab.camel@pietrobattiston.it> <6bc7eb39d235fc467cb41340383ac15472ff1d29.camel@pietrobattiston.it> Message-ID: On Sun, Dec 1, 2019 at 10:53 PM Pietro Battiston wrote: > > Hi Wes, > > thanks for your reply. I totally agree with your view that open source > devs shouldn't be scared by their users... but if indeed pandas 2 is an > "aspirational goal", which is great, maybe labelling it as "2" is not > doing a good service to us either?
> > Pandas 2.0, when it eventually comes out, will have benefited (in > terms of coordination, and possibly funding) from your "pandas 2" > plans, but it will still emerge, as always happens, from long > discussions that might result in more, or less, or slightly different, > changes than envisaged in those docs. Hence if we devote attention to > those documents, they will sooner or later become a source of confusion > - while if we don't, they are only a source of confusion for users. > Again, I would understand the cost to users (in terms of confusion) if > it was a resource to us, but it is instead mostly a cost to us (every > time we take time explaining to some confused user "what's happening > with pandas 2"). > > And to be honest, what bothers me most is the github project. I don't > think that a mostly empty, inactive project, with issues we don't > really aim at closing - including some which are actually being > discussed in parallel "today's pandas" issues!? - is of help to us, and > even less that it inspires funders to give money! This is easy. Go ahead and alter or remove the GitHub repository. Or I can do it, too. > I probably wouldn't be writing this mail if pandas was in crucial need > of 1) more innovative momentum, or 2) coordinated efforts to > communicate this to potential funders. But the devs (not me, > unfortunately) have been doing great lately on both these aspects, also > getting inspiration from these 2016 docs, but definitely not (I think) > benefitting from the "pandas 2" label. So I would keep the docs > somewhere, but avoiding the ambiguity of whether they are talking about > a different project, and definitely avoiding the ambiguity of whether > they are describing a specific version of the current project. > > Pietro > >
https://github.com/pandas-dev/pandas2/issues/37
From bms91 at abv.bg Thu Dec 5 21:44:56 2019 From: bms91 at abv.bg (Martin Gantchev) Date: Fri, 6 Dec 2019 04:44:56 +0200 (EET) Subject: [Pandas-dev] Just a quick question from a regular pandas user In-Reply-To: References: <1309884188.227412.1574998229127@nm43.abv.bg> <9e4fe1eaa534dfd3150cf694c0fce0e557fed7ab.camel@pietrobattiston.it> Message-ID: <407389473.2052379.1575600296977@nm42.abv.bg> Hi Joris, Pietro, and Wes! Thank you for your immediate and thorough answers and explanations. And also, thank you very much for continuously improving pandas. I think it was obvious that I am not a full-time programmer, so reading your answers enlightened me not only about pandas and its future versions, but also about the way open-source developers operate. Actually, you told me more than I needed, so I am really grateful and can only hope to reach such a level one day so that I can contribute something in return, in order to keep improving pandas as a library. Regarding my initial question - considering the fact that I am not writing complicated code as of now, I think I am safe to proceed with my work on 0.25.3 for the moment. I will just watch out for any deprecation warnings. In a nutshell, as long as we are expected to have Series and DataFrames in pandas, the rest I think I should be able to handle!
Thank you very much. Although I have enjoyed this discussion a lot, please excuse me if you think I should have kept my messages simpler or shorter.

Kind regards,
Martin

>-------- Original message --------
>From: Wes McKinney wesmckinn at gmail.com
>Subject: Re: [Pandas-dev] Just a quick question from a regular pandas user
>To: Pietro Battiston
>Sent on: 02.12.2019 03:34

hi Pietro,

There is, perhaps, a philosophical question about open source projects: should a project have aspirational goals? Should developers talk openly about large, difficult, uncertain, long term goals?

Unfortunately, many users have a one-sided consumption-based relationship with open source projects and so their interactions with the developers tend to be about getting what they need out of the project or finding out when to expect something to be done. Large projects like pandas will also face disinformation or misinformed users jumping to conclusions about what the maintainers are thinking or where the project is heading.

Personally, I don't think open source maintainers owe their users the same kinds of communication that an enterprise software company (getting paid millions or billions of dollars per year) might provide. Many users behave like customers (I recall a discussion on this list where someone literally called themselves a "customer") but I guess their checks got lost in the mail.

I started the "pandas 2" discussion in 2016 to try to understand if there was any consensus of thought about some of the architectural / design shortcomings about pandas's internals and what people think we should do about them. I also thought maybe it might spur some funding or new sources of support for the project. pandas seems also to be saddled with many legacy support issues though as time goes on some of these are getting trimmed away, while other large issues (e.g. the BlockManager) remain.

I don't see a problem with having this discussion and writing down some ideas.
I'm not sure what needs to be done to clarify that anything relating to "pandas 2" are aspirational goals of the project without any concrete plans to deliver a new piece of software until stated otherwise, but if it isn't clear, let's make it more clear.

If anyone was expecting me to pull a rabbit out of a hat and deliver this end-to-end on my own, I am sorry to disappoint. With the anemic state of funding for open source innovation-focused projects, it is going to take quite a long time to get all the pieces developed.

FWIW, I've been consistently working since mid/late 2015 (so approaching 5 years now...) precisely to create a more sustainable, general purpose technology foundation for data frame libraries in part to make it easier for something like "pandas 2" (as it was discussed at the time) to come to life.

Something I've said many times is that I think the pandas maintainers' scope of code ownership should decrease, not increase. One of the problems with pandas (and other libraries, too) is the relatively strong coupling between the API and the implementation -- as a result of this in Apache Arrow we have strived to avoid having an opinion about what kind of public API a data frame user will see.

- Wes

On Sat, Nov 30, 2019 at 9:48 AM Pietro Battiston wrote:
>
> Dear devs,
>
> every time that "pandas 2" comes out, it is (it seems to me) not because of our concrete plans for it, or even because it is used as inspiration for current pandas (which by the way is receiving great and substantial improvements), but because some user is confused by the docs/issues mentioning it.
>
> I know it is somewhat of a rhetorical question - because we ourselves always considered "pandas 2" first and foremost as a direction to take (or at least discuss) rather than as a version to release - but I'm wondering whether having pandas 2 mentioned, discussed and postponed (inevitably, as we are not even really targeting it) is really helpful, and in particular whether the separate github project is really helpful.
>
> I see two options:
> - spend serious effort communicating to users what/when to expect (and not to expect) from pandas 2
> - delete any mention of pandas 2 from our github and from the "pandas 2.0 Design Documents" - which could simply be described as "the future of pandas"
>
> ... which clearly doesn't mean we do not need to introduce important changes in pandas (this is happening daily), or that there shouldn't be a version 2.0 some day.
>
> These are some of the "confused users" I have in mind:
> https://www.reddit.com/r/datascience/comments/8rcoou/what_happened_to_pandas_2/
>
> Cheers,
>
> Pietro
>
> On Fri, 29/11/2019 at 12.05 +0100, Joris Van den Bossche wrote:
> > Hi Martin,
> >
> > The 2.0 milestone has not been updated in a very long time, and is also not really used yet (there are a few issues tagged with it to mean "maybe in a next big release but not yet in 1.0"). So I wouldn't read too much into that. In any case, we are certainly not going to do a pandas 2.0 release in summer 2020 (so we should update the milestone date).
> >
> > What we do plan is a final 1.0 release in early 2020.
> > What we also discussed recently is a version policy starting with 1.0:
> > https://dev.pandas.io/docs/development/policies.html#version-policy
> > This means that code working with 1.0 should mostly keep working in the full 1.x series of releases when not using experimental features (although we will keep doing deprecations, so you still might need to change code to get rid of such warnings, in preparation for pandas 2.0).
> >
> > And you are correct: pandas 1.0 will not be drastically different from 0.25.3 (the main difference will be that a lot of things that were deprecated before will now be removed, plus some documented API changes). While we do not yet have many concrete plans for pandas 2.0, I think the expectation is that it will be similar (and also not something for the coming year anyway).
> >
> > So if you are writing code now for 0.25.3, and you take notice of possible deprecation warnings and fix your code for those, you can be assured that your code will mostly work on 1.0 as well.
> > Now, it is still highly recommended to write tests for your code, so you can run those on new pandas releases to verify this is indeed the case (and running such tests on release candidates of new pandas releases is also very valuable, so potential regressions can be reported and fixed early).
> >
> > Hopefully that sheds some light,
> > Joris
> >
> >
> > On Fri, 29 Nov 2019 at 05:42, Martin Gantchev bms91 at abv.bg wrote:
> > > Dear Representatives of Pandas-dev,
> > >
> > > This is Martin here, a regular user of the pandas library.
> > >
> > > First of all, thank you for providing, maintaining and still developing this amazing library which I use pretty much every day.
> > >
> > > On that note, I am facing a project that will involve working with pandas heavily, but whose code is supposed to last for a long period of time (hopefully, for years to come).
> > >
> > > I am referring to this piece of information:
> > >
> > > https://github.com/pandas-dev/pandas/milestones
> > >
> > > It seems that pandas 1.0 has a 90% completion rate, while pandas 2.0 is expected to be ready as early as August 2020; however, it strangely has just 10 problems that need to be solved.
> > >
> > > Of course, no precise answer is requested. However, I am afraid that in the next couple of months I may write code that might become obsolete in the middle of next summer. Am I right about that?
> > >
> > > I did read around the internet and read a few articles, so I don't expect either 1.0 or 2.0 to be drastically different from 0.25.3. At least, I guess most of the code I'd use in 0.25.3 should work normally under 1.0 or 2.0. Is that correct?
> > >
> > > Shedding light on this subject may save me tons of worries, so even a loose delineation of your schedule and the potential impact it may have on code written in 0.25.3 would be greatly appreciated.
> > >
> > > Thank you very much!
> > >
> > > Looking forward to your answer.
> > > Best,
> > > Martin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tcaswell at gmail.com Mon Dec 9 13:47:35 2019
From: tcaswell at gmail.com (Thomas Caswell)
Date: Mon, 9 Dec 2019 13:47:35 -0500
Subject: [Pandas-dev] Matplotlib Research Software Engineering Fellow
Message-ID:

Hi folks,

Matplotlib has received a grant from the Chan Zuckerberg Initiative to fund maintenance! As part of the grant, we have 1 year of funding for a Research Software Engineering Fellow to carry out this work.

If you are interested in working on Matplotlib full time, please apply! See below for a link to the full job description and application instructions. The timeline on this is rather short, so applications are due Jan 3.

https://discourse.matplotlib.org/t/now-hiring-matplotlib-research-software-engineering-fellow/20701

Tom and Hannah

--
Thomas Caswell
tcaswell at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tom.augspurger88 at gmail.com Mon Dec 9 16:30:45 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Mon, 9 Dec 2019 15:30:45 -0600
Subject: [Pandas-dev] Pandas Dev Meeting Wednesday, December 11th
Message-ID:

Hi all,

The next pandas dev meeting is this Wednesday at 18:00 UTC. All are welcome to attend.

- Hangout: https://meet.google.com/hav-rmax-zjx
- Minutes: https://docs.google.com/document/u/1/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?ouid=102771015311436394588&usp=docs_home&ths=true

Feel free to add agenda items.

As a reminder, https://dev.pandas.io/docs/development/meeting.html has a calendar with all our meetings.

Tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL: