[Pandas-dev] On bug-fix releases and maintenance branches

Tue Mar 8 20:23:07 EST 2016

2016-03-08 23:48 GMT+01:00 Wes McKinney <wesmckinn at gmail.com>:

> hey Jeff,
>
> On Tue, Feb 23, 2016 at 12:11 PM, Jeff Reback <jeffreback at gmail.com>
> wrote:
> > Thanks for bringing this up joris, here are some thoughts.
> >
> > 1) I agree that the next releases should probably focus on bug fixes. So
> > this might mean
> > we should shoot for 0.18.2....3 etc.
> >
> > However, we do need a 0.19.0 in order to provide any big deprecations
> > (Panel) and API changes that
> > are needed.
> >
> > 2) I am a bit hesitant to even make a big break (1.0) because I have seen
> > this just bifurcating people (e.g. do I upgrade now, what if I want
> > compat). This just creates less community. So I think this should be a
> > goal, that even though it's called 1.0 it is as back-compat as possible.
> >
>
> Yeah, with more significant internal refactoring the goal would be to
> not break API compatibility unless absolutely necessary.
>

Jeff, to be clear, my initial mail was not meant to discuss whether to do a
major breaking release or not, or whether to go into general maintenance
mode or not (that's certainly an interesting discussion, but a separate one,
I think). The fact is that we are still cleaning things up and so still
introduce breakages in 0.X releases (like the resample change now), and that
won't stop right away.

But given that context, we can think about how to do 0.X.X releases that
help users as much as possible to upgrade smoothly.
We currently put quite a lot into the micro 0.X.X bug-fix releases
(including new features), with the consequence that they can introduce new
bugs.

> > 3) Releases can be big, and do fix lots of bugs, and usually introduce
> new
> > ones. This is almost inevitable as we add new features, changes, and even
> > bug fixes which occasionally have regressions (though test suite is
> pretty
> > good, so hopefully not too often).
> >
> > 4) I don't relish backporting things. I think this could lead to lots of
> > headaches and IMHO doesn't really buy much.
> >
>
> I think what we are talking about is backporting bug fixes for major
> brokenness (e.g. serious correctness issues) or regressions that
> aren't caught by major release time. I think what's been happening in
> practice is that people are creating their own patched bugfix versions
> of releases to avoid the pain induced by API-breakage in major
> releases.
>
> Obviously, continuing to innovate and clean up the API (with judicious
> breakage where absolutely necessary -- I think the resampling cleanup
> is a good example where the net benefit in the long run will be high)
> but we have to take care of the user base, many of whom depend on
> pandas in production applications.
>
> This is all made more difficult because there isn't any direct cash
> flow funding pandas development AFAICT. Where I work, for example, we
> have many employees who are responsible for creating patched builds
> and handling backports for otherwise API-stable branches of major
> Apache open source projects. But we can afford to do this because
> customers are paying for this (priority support and backports /
> patched builds).
>
> So what I would suggest, in lieu of financial support for backports
> and maintenance builds, is that we consider maint-0.XX.X branches for
> backporting only the most serious of serious bug fixes ("Bad Bugs").
> Major regressions and correctness issues should go into this bucket.
> Perhaps we can start doing this with 0.18.x -- as a matter of process
> if any PR appears to fix a Bad Bug it should be brought up here on the
> mailing list so we can decide whether it should be backported.
>
>
With regard to the possible concern of "this is too much work": I don't
think it would be many bug fixes that would be backported.
For example, the last micro release, 0.17.1 had quite a lot of new features
and the whatsnew notes listed 50 bug fixes. But a lot of these bug fixes
were not regressions, but were bugs that were also in the previous
releases.
So if we restrict the 0.xx.x release to only regressions, it would be a
much smaller set of maybe 10 to 15 bug fixes (rough estimate, I didn't look
into it in detail). But in any case I think this would be a rather
manageable amount.

So that would make our bug fix releases smaller, and we also don't have to
hold up master with breaking changes/larger new features until one or two
bug-fix releases are released.

For me, the fixes that could go in such a bug-fix release:

- bug fixes or clean-up of rough edges of major new features in the 0.X
release (for example for 0.18.1 possible changes to the newly introduced
RangeIndex)
- regressions, issues that were not present in the previous 0.X release,
and could make it therefore more difficult to upgrade

+ the correctness issues that Wes mentioned.
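
To make the criteria above concrete, here is a rough sketch of how such a
triage could look. This is purely illustrative: the `Fix` class, its field
names, `should_backport` and the example titles are all hypothetical and do
not correspond to any existing pandas tooling.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    """A merged bug fix, described by hypothetical triage flags."""
    title: str
    is_regression: bool = False         # not present in the previous 0.X release
    touches_new_feature: bool = False   # rough edge of a feature new in this 0.X
    is_correctness_issue: bool = False  # a "Bad Bug": silently wrong results


def should_backport(fix: Fix) -> bool:
    # A fix qualifies for the 0.X.X bug-fix release if it meets any criterion.
    return (fix.is_regression
            or fix.touches_new_feature
            or fix.is_correctness_issue)


# Made-up examples, one per typical case.
fixes = [
    Fix("RangeIndex edge case", touches_new_feature=True),
    Fix("Regression in resample", is_regression=True),
    Fix("Long-standing bug also present in older releases"),
]

backports = [f.title for f in fixes if should_backport(f)]
print(backports)
```

The long-standing bug is left out on purpose: it would simply wait for the
next 0.X release, which is what keeps the bug-fix release small.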

> > 5) We don't want to just go into maintenance mode because we still have a
> > fair amount of feature requests. (though these are often pretty
> targeted),
> > but off of the top of my head, nothing really *new*, mainly some API
> changes
> > to bring consistency. E.g. ``.agg`` on a DataFrame is a long-requested
> > feature, which actually after 0.18.0 is quite trivial to do.
> >
>
> Yeah, I think we should try to stick with
> https://en.wikipedia.org/wiki/Open/closed_principle -- so
> conveniences, extensions to existing APIs, and other helpful new
> features are fair game, but breaking API changes should be
>
> > 6) I think we telegraph any API changes and really really try to have
> > back-compat, so people do have the ability to upgrade at their leisure.
> >
> >> API changes are most painful for users who do not write tests for
> >> their code that depends on pandas. That problem is probably not
> >> fixable =)
> >
> >
> > of course this is a telling point. pandas upgrades often expose bugs in
> user
> > code. I view this as a good thing!
> >
> > So given all of the somewhat contradictory points above, what do I really
> > think we should do?
> >
> > In order for pandas to be (even more) of a force in leading the
> scientific
> > community. I think we have to grow. So having more contributors is a
> great
> > thing. People do like / appreciate fixing bugs, but even more (IMHO), are
> > performance enhancements and *some* new features.
> >
> > I will probably try to do more bug-fixing (rather than large API's ish
> > fixes) I think. There is quite a back-log. This should *slow* the issue
> of
> > the BIG API changes.
> >
> > So I am kind of -1 on backports for mostly 2), it seems to just slow
> > things down, and 4) it can often lead to MORE things being inconsistent
> > (you need machinery to ensure that what is backported is correct and is
> > included). I can easily foresee that we decide to create 'stable'
> > branches, which in fact are stable but might have inconsistent fixes,
> > this is even more confusing in my view.
> >
>
> Let me know what you think about my Bad Bug = backport policy. This is
> mostly about communication and keeping track of serious issues that
> should necessitate upgrading.
>
> I also think we should try to keep minor releases API stable from here
> on out; so this may result in our version numbers increasing more
> quickly but that's OK for the improved communication about "what is a
> minor release (major release plus bug fix backports)"
>

Just for clarity, by "minor release", do you mean the 0.X releases?
(because 0.X.X better matches the 'major release plus bug fix backports'
description)

Joris

>
> - Wes
>
> > I think we have a fairly aggressive release cycle. We for sure don't
> > want to debate everything. I am of the opinion that it is much better to
> > put things out there quicker, than to endlessly debate extremely minor
> > points (not naming project names here :).
> >
> > For the general user what we do w.r.t. release cycles probably doesn't
> > matter, and for the corporate user, they almost always have a 'fixed'
> > version anyhow (and then they do of course port the new ones, but then
> > they have people upgrade it carefully). I am not so sure we should impose
> > structure on this. We already have announced major releases and minor
> > releases.
> >
> > All for better 'language' in the minor releases.
> >
> > Jeff
> >
> >
> > On Tue, Feb 23, 2016 at 2:21 PM, Wes McKinney <wesmckinn at gmail.com>
> wrote:
> >>
> >> hi Joris,
> >>
> >> I'm sorry it's taken a couple weeks to write a reply -- been really
> >> busy and wanted to put some thought into this.
> >>
> >> This is a really important discussion given how important pandas has
> >> become to so many people, thank you for bringing it up.
> >>
> >> On Tue, Feb 9, 2016 at 4:59 PM, Joris Van den Bossche
> >> <jorisvandenbossche at gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I wanted to stir some discussion on pandas' policy on bug-fix
> >> > releases and upgrading pains. First some context:
> >> >
> >> > Context part 1: Currently we do not use maintenance branches for
> bugfix
> >> > releases, and we actually also do not really do bugfix releases. We
> just
> >> > develop further on master, and try to not merge breaking changes the
> >> > first
> >> > weeks/months, so we can do a minor kind of bug-fix release (but
> usually
> >> > also
> >> > with a lot of new features).
> >> > But we don't, for example, backport fixes of regressions if they are
> >> > fixed
> >> > after master is pointing to the next major release.
> >>
> >> I think in general it would be a good idea to tilt development away
> >> from new feature development and toward bug fixes and stability. Given
> >> that we are contemplating making some breaking changes in a 1.x
> >> development branch (like removing the Panel classes), we should decide
> >> at some point to create a 0.X.Y maintenance line where we can backport
> >> bug fixes only, so that "legacy pandas" users can have a "LTS" (in
> >> Ubuntu parlance) maintenance branch. This introduces some development
> >> overhead but it seems worth it.
> >>
> >> >
> >> > Context part 2: pandas is not yet that stable, in the sense that there
> >> > are
> >> > still quite some breaking changes in each release. I am not arguing
> for
> >> > not
> >> > doing these breaking changes, as some of these changes are really
> needed
> >> > to
> >> > clean up the API  (although there are also arguments for that, but I
> >> > think
> >> > that is another discussion). This has the consequence that updating
> your
> >> > pandas version is not always that pleasant.
> >>
> >> Over the years I've heard many horror stories from companies who have
> >> created and maintained internal 0.7.x, 0.8.x, or 0.9.x pandas forks
> >> because of the API breakage issues. This is definitely an anti-pattern
> >> that we should try to avoid happening in the future, but API breakages
> >> in many cases are the inevitable price of progress.
> >>
> >> Some of the API breakage has resulted from experiences accumulated
> >> over a long period of time -- I made a lot of decisions early on in
> >> the project that ended up not being the right ones (e.g. resample
> >> default arguments changed at one point). There wasn't enough community
> >> engagement at that point to have a thorough design process to
> >> potentially come up with the "right" design first. In other cases, the
> >> "right" choice was perhaps more ambiguous.
> >>
> >> API changes are most painful for users who do not write tests for
> >> their code that depends on pandas. That problem is probably not
> >> fixable =)
> >>
> >> I think having stable releases with backports of serious correctness
> >> bugs helps mitigate this problem, whereas modest API changes between
> >> major releases. I would also be in favor of having point releases only
> >> contain bug fixes rather than the current system of point releases
> >> being a stable snapshot of trunk.
> >>
> >> Since Jeff is the most affected by this on a day to day basis as de
> >> facto steward of the PR queue I would be curious what process he feels
> >> would be the most helpful.
> >>
> >> - Wes
> >>
> >> >
> >> > Sidenote: I have not that much experience with using pandas in a
> larger
> >> > company or in larger codebases that need to be upgraded, rather with
> >> > just my
> >> > own code for my PhD. So it is difficult for me to judge how much
> >> > this is a problem or if bug-fix releases would help.
> >> >
> >> > Questions:
> >> >
> >> > What are other people's experiences with upgrading pandas? And would
> >> > more
> >> > bug-fix releases actually ease the upgrading?
> >> > Do we want to do more bug-fix releases?
> >> > Having a maintenance branch and backporting fixes is extra work. Would
> >> > we be
> >> > able to handle this? Would it be worth the effort?
> >> >
> >> > (It has been mentioned before, but I think the main point raised was
> >> > lack of
> >> > manpower to maintain separate branches)
> >> >
> >> > To put it another way. In our whatsnew notice there is "We recommend
> >> > that
> >> > all users upgrade to this version", but I am actually not sure we
> should
> >> > recommend that. I personally do not always recommend that no matter
> what
> >> > without careful consideration.
> >> >
> >> > Regards,
> >> > Joris
> >> >
> >> > _______________________________________________
> >> > Pandas-dev mailing list
> >> > Pandas-dev at python.org
> >> > https://mail.python.org/mailman/listinfo/pandas-dev
> >> >
> >
> >
>