[Pandas-dev] Just a quick question from a regular pandas user

Wes McKinney wesmckinn at gmail.com
Sun Dec 1 23:55:46 EST 2019


On Sun, Dec 1, 2019 at 10:53 PM Pietro Battiston <me at pietrobattiston.it> wrote:
>
> Hi Wes,
>
> thanks for your reply. I totally agree with your view that open source
> devs shouldn't be scared by their users... but if indeed pandas 2 is an
> "aspirational goal", which is great, maybe labelling it as "2" is not
> doing a good service to us either?
>
> Pandas 2.0, when it eventually comes out, will have benefited (in
> terms of coordination, and possibly funding) from your "pandas 2"
> plans, but it will still emerge, as always happens, from long
> discussions that might result in more, or less, or slightly different,
> changes than envisaged in those docs. Hence if we devote attention to
> those documents, they will sooner or later become a source of confusion
> - while if we don't, they are only a source of confusion for users.
> Again, I would understand the cost to users (in terms of confusion) if
> it was a resource to us, but it is instead mostly a cost to us (every
> time we take time to explain to some confused user "what's happening
> with pandas 2").
>
> And to be honest, what bothers me most is the github project. I don't
> think that a mostly empty, inactive project, with issues we don't
> really aim at closing - including some which are actually being
> discussed in parallel "today's pandas" issues!¹ - is of help to us, and
> even less that it inspires funders to give money!

This is easy. Go ahead and alter or remove the GitHub repository. Or I
can do it, too.

> I probably wouldn't be writing this mail if pandas was in crucial need
> of 1) more innovative momentum, or 2) coordinated efforts to
> communicate this to potential funders. But the devs (not me,
> unfortunately) have been doing great lately on both these aspects, also
> getting inspiration from these 2016 docs, but definitely not (I think)
> benefitting from the "pandas 2" label. So I would keep the docs
> somewhere, but avoid the ambiguity of whether they are talking about
> a different project, and definitely avoid the ambiguity of whether
> they are describing a specific version of the current project.
>
> Pietro
>
> ¹ https://github.com/pandas-dev/pandas2/issues/37
>
> On Sun, 01/12/2019 at 19:34 -0600, Wes McKinney wrote:
> > hi Pietro,
> >
> > There is, perhaps, a philosophical question about open source
> > projects: should a project have aspirational goals? Should developers
> > talk openly about large, difficult, uncertain, long term goals?
> >
> > Unfortunately, many users have a one-sided consumption-based
> > relationship with open source projects and so their interactions with
> > the developers tend to be about getting what they need out of the
> > project or finding out when to expect something to be done. Large
> > projects like pandas will also face disinformation or misinformed
> > users jumping to conclusions about what the maintainers are thinking
> > or where the project is heading.
> >
> > Personally, I don't think open source maintainers owe their users the
> > same kinds of communication that an enterprise software company
> > (getting paid millions or billions of dollars per year) might
> > provide. Many users behave like customers (I recall a discussion on
> > this list where someone literally called themselves a "customer") but
> > I guess their checks got lost in the mail.
> >
> > I started the "pandas 2" discussion in 2016 to try to understand if
> > there was any consensus of thought about some of the architectural /
> > design shortcomings of pandas's internals and what people think we
> > should do about them. I also thought it might spur some funding
> > or new sources of support for the project. pandas also seems to be
> > saddled with many legacy support issues, though as time goes on some
> > of these are getting trimmed away, while other large issues (e.g. the
> > BlockManager) remain. I don't see a problem with having this
> > discussion and writing down some ideas.
> >
> > I'm not sure what needs to be done to clarify that anything relating
> > to "pandas 2" is an aspirational goal of the project without any
> > concrete plans to deliver a new piece of software until stated
> > otherwise, but if it isn't clear, let's make it clearer. If anyone
> > was expecting me to pull a rabbit out of a hat and deliver this
> > end-to-end on my own, I am sorry to disappoint. With the anemic state
> > of funding for open source innovation-focused projects, it is going
> > to take quite a long time to get all the pieces developed.
> >
> > FWIW, I've been consistently working since mid/late 2015 (so
> > approaching 5 years now...) precisely to create a more sustainable,
> > general purpose technology foundation for data frame libraries, in
> > part to make it easier for something like "pandas 2" (as it was
> > discussed at the time) to come to life. Something I've said many
> > times is that I think the pandas maintainers' scope of code ownership
> > should decrease, not increase. One of the problems with pandas (and
> > other libraries, too) is the relatively strong coupling between the
> > API and the implementation -- as a result of this in Apache Arrow we
> > have strived to avoid having an opinion about what kind of public API
> > a data frame user will see.
> >
> > - Wes
> >
> > On Sat, Nov 30, 2019 at 9:48 AM Pietro Battiston <me at pietrobattiston.it> wrote:
> > > Dear devs,
> > >
> > > every time that "pandas 2" comes up, it is (it seems to me) not
> > > because of our concrete plans for it, or even because it is used as
> > > inspiration for current pandas (which by the way is receiving great
> > > and substantial improvements), but because some user is confused by
> > > the docs/issues mentioning it.
> > >
> > > I know it is somewhat of a rhetorical question - because we
> > > ourselves always considered "pandas 2" first and foremost as a
> > > direction to take (or at least discuss) rather than as a version to
> > > release - but I'm wondering whether having pandas 2 mentioned,
> > > discussed and postponed (inevitably, as we are not even really
> > > targeting it) is really helpful, and in particular whether the
> > > separate github project is really helpful.
> > >
> > > I see two options:
> > > - spend serious effort in communicating to users what/when to
> > >   expect (and not to expect) from pandas 2
> > > - delete any mention of pandas 2 from our github and from the
> > >   "pandas 2.0 Design Documents" - which could be just described as
> > >   "the future of pandas"
> > >
> > > ... which clearly doesn't mean we do not need to introduce
> > > important changes in pandas (this is happening daily), or that
> > > there shouldn't be a version 2.0 some day.
> > >
> > > These are some of the "confused users" I have in mind:
> > > https://www.reddit.com/r/datascience/comments/8rcoou/what_happened_to_pandas_2/
> > >
> > > Cheers,
> > >
> > > Pietro
> > >
> > > On Fri, 29/11/2019 at 12:05 +0100, Joris Van den Bossche wrote:
> > > > Hi Martin,
> > > >
> > > > The 2.0 milestone has not been updated for a very long time, and
> > > > is also not yet really used (there are a few issues tagged with it
> > > > to mean "maybe in a next big release but not yet in 1.0"). So I
> > > > wouldn't read too much into that. In any case, we are certainly
> > > > not going to do a pandas 2.0 release in summer 2020 (so we should
> > > > update the milestone date).
> > > >
> > > > What we do plan is a final 1.0 release in early 2020. What we
> > > > also discussed recently is a version policy starting with 1.0:
> > > > https://dev.pandas.io/docs/development/policies.html#version-policy
> > > > This means that code working with 1.0 should mostly keep working
> > > > in the full 1.x series of releases when not using experimental
> > > > features (although we will keep doing deprecations, so you still
> > > > might need to change code to get rid of such warnings, in
> > > > preparation for pandas 2.0).
> > > >
> > > > And you are correct: pandas 1.0 will not be drastically different
> > > > from 0.25.3 (the main difference will be that a lot of things
> > > > that were deprecated before will now be removed, plus some
> > > > documented API changes). While we do not yet have many concrete
> > > > plans for pandas 2.0, I think the expectation is that it will be
> > > > similar (and also not something for the coming year anyway).
> > > >
> > > > So if you are writing code now for 0.25.3, and you take notice of
> > > > possible deprecation warnings and fix your code for those, you
> > > > can be assured that your code will mostly work on 1.0 as well.
> > > > Now, it is still highly recommended to write tests for your code,
> > > > so you can run those on new pandas releases to verify this is
> > > > indeed the case (and running such tests on release candidates of
> > > > new pandas releases is also very valuable, so potential
> > > > regressions can be reported and fixed early).
> > > >
> > > > Hopefully that could shed some light
> > > > Joris
> > > >
> > > >
> > > > On Fri, 29 Nov 2019 at 05:42, Martin Gantchev <bms91 at abv.bg> wrote:
> > > > > Dear Representatives of Pandas-dev,
> > > > >
> > > > > This is Martin here, a regular user of the pandas library.
> > > > >
> > > > > First of all, thank you for providing, maintaining and still
> > > > > developing this amazing library which I use pretty much every
> > > > > day.
> > > > >
> > > > > On that note, I am facing a project that will involve working
> > > > > heavily with pandas, and whose code is supposed to be retained
> > > > > for a long period of time (hopefully, for years to come).
> > > > >
> > > > > I am referring to this piece of information:
> > > > >
> > > > > https://github.com/pandas-dev/pandas/milestones
> > > > >
> > > > > It seems that pandas 1.0 has a 90% completion rate, while pandas
> > > > > 2.0 is expected to be ready as early as August 2020; strangely,
> > > > > however, it has just 10 problems that need to be solved.
> > > > >
> > > > > Of course, no precise answer is requested. However, I am afraid
> > > > > that in the next couple of months I may write code that might
> > > > > become obsolete in the middle of next summer. Am I right about
> > > > > that?
> > > > >
> > > > > I did read around the internet and read more articles, so I
> > > > > don't expect either 1.0 or 2.0 to be drastically different from
> > > > > 0.25.3. At least, I guess most of the code I'd use in 0.25.3
> > > > > should work normally under 1.0 or 2.0. Is that correct?
> > > > >
> > > > > Shedding light on this subject may save tons of worries for me,
> > > > > so even a loose delineation of your schedule and the potential
> > > > > impact it may have on code written in 0.25.3 would be greatly
> > > > > appreciated.
> > > > >
> > > > > Thank you very much!
> > > > >
> > > > > Looking forward to your answer.
> > > > > Best,
> > > > > Martin
> > > > > _______________________________________________
> > > > > Pandas-dev mailing list
> > > > > Pandas-dev at python.org
> > > > > https://mail.python.org/mailman/listinfo/pandas-dev
>

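The advice quoted above from Joris (watch for deprecation warnings now, and
keep tests that can be re-run against new releases) could be sketched roughly
as follows. This is only an illustration using the standard library: the
function legacy_call is a hypothetical stand-in for any library call that
emits a deprecation-style warning, and the helper name
collect_deprecations is likewise made up for this example, not part of
pandas.

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for a library call that has been deprecated."""
    warnings.warn("legacy_call is deprecated", FutureWarning, stacklevel=2)
    return 42


def collect_deprecations(func):
    """Run func and return (result, messages of deprecation-style warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that default filters would hide.
        warnings.simplefilter("always")
        result = func()
    flagged = [
        str(w.message)
        for w in caught
        if issubclass(w.category, (DeprecationWarning, FutureWarning))
    ]
    return result, flagged


result, flagged = collect_deprecations(legacy_call)
clean_result, clean_flagged = collect_deprecations(lambda: "ok")
```

A test suite could assert that the list of flagged messages is empty for the
code paths it exercises. A blunter alternative is to run the interpreter with
-W error::FutureWarning so that any such warning fails immediately.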
