[Pandas-dev] On bug-fix releases and maintenance branches

Tue Mar 15 23:16:01 EDT 2016

On Tue, Mar 8, 2016 at 5:23 PM, Joris Van den Bossche
<jorisvandenbossche at gmail.com> wrote:
>
>
> 2016-03-08 23:48 GMT+01:00 Wes McKinney <wesmckinn at gmail.com>:
>>
>> hey Jeff,
>>
>> On Tue, Feb 23, 2016 at 12:11 PM, Jeff Reback <jeffreback at gmail.com>
>> wrote:
>> > Thanks for bringing this up joris, here are some thoughts.
>> >
> >> > 1) I agree that the next releases should probably focus on bug fixes.
> >> > So this might mean we should shoot for 0.18.2, 0.18.3, etc.
>> >
> >> > However, we do need a 0.19.0 in order to provide any big deprecations
> >> > (Panel) and API changes that are needed.
>> >
> >> > 2) I am a bit hesitant to even make a big break (1.0) because I have
> >> > seen this just bifurcate people (e.g. do I upgrade now, what if I want
> >> > compat). This just fragments the community. So I think the goal should
> >> > be that, even though it's called 1.0, it is as back-compat as possible.
>> >
>>
>> Yeah, with more significant internal refactoring the goal would be to
>> not break API compatibility unless absolutely necessary.
>
>
> Jeff, to be clear, my initial mail was not to discuss whether to do a
> major breaking release or not, or whether to go into general maintenance
> mode or not (that's certainly an interesting discussion, but another one I
> think). The fact is that we are still cleaning things up and so still
> introduce breaking changes in 0.X releases (like the resample change now),
> and that won't stop right away.
>
> But given that context, we can think about how to do 0.X.X releases that
> help users upgrade as smoothly as possible.
> We currently put quite a lot into the micro 0.X.X bug-fix releases
> (including new features), which can in turn introduce new bugs.
>
>>
> >> > 3) Releases can be big; they fix lots of bugs and usually introduce new
> >> > ones. This is almost inevitable as we add new features and changes, and
> >> > even bug fixes occasionally cause regressions (though the test suite is
> >> > pretty good, so hopefully not too often).
>> >
>> > 4) I don't relish backporting things. I think this could lead to lots of
>> > headaches and IMHO doesn't really buy much.
>> >
>>
>> I think what we are talking about is backporting bug fixes for major
>> brokenness (e.g. serious correctness issues) or regressions that
>> aren't caught by major release time. I think what's been happening in
>> practice is that people are creating their own patched bugfix versions
>> of releases to avoid the pain induced by API-breakage in major
>> releases.
>>
> >> Obviously, we should keep innovating and cleaning up the API (with
> >> judicious breakage where absolutely necessary; I think the resampling
> >> cleanup is a good example where the net benefit in the long run will be
> >> high), but we have to take care of the user base, many of whom depend on
> >> pandas in production applications.
>>
>> This is all made more difficult because there isn't any direct cash
>> flow funding pandas development AFAICT. Where I work, for example, we
>> have many employees who are responsible for creating patched builds
>> and handling backports for otherwise API-stable branches of major
>> Apache open source projects. But we can afford to do this because
>> customers are paying for this (priority support and backports /
>> patched builds).
>>
>> So what I would suggest, in lieu of financial support for backports
>> and maintenance builds, is that we consider maint-0.XX.X branches for
>> backporting only the most serious of serious bug fixes ("Bad Bugs").
>> Major regressions and correctness issues should go into this bucket.
>> Perhaps we can start doing this with 0.18.x -- as a matter of process
>> if any PR appears to fix a Bad Bug it should be brought up here on the
>> mailing list so we can decide whether it should be backported.
>>
>
> With regard to the possible concern of "this is too much work": I don't
> think it would be many bug fixes that would be backported.
> For example, the last micro release, 0.17.1, had quite a lot of new features
> and the whatsnew notes listed 50 bug fixes. But a lot of these bug fixes
> were not regressions; they were bugs that were also in the previous releases.
> So if we restrict the 0.xx.x release to only regressions, it would be a much
> smaller release of maybe 10 to 15 bug fixes (rough estimate, I didn't look
> into detail). But in any case I think this would be a rather manageable amount.
>
> So that would make our bug fix releases smaller, and we also don't have to
> hold up master with breaking changes/larger new features until one or two
> bug-fix releases are released.
>
> For me, the fixes that could go into such a bug-fix release are:
>
> - bug fixes or clean-up of rough edges of major new features in the 0.X
> release (for example for 0.18.1 possible changes to the newly introduced
> RangeIndex)
> - regressions: issues that were not present in the previous 0.X release, and
> could therefore make it more difficult to upgrade
>
> + the correctness issues that Wes mentioned.
>
>
>
>>
> >> > 5) We don't want to just go into maintenance mode because we still have
> >> > a fair number of feature requests (though these are often pretty
> >> > targeted). Off the top of my head, nothing really *new*, mainly some API
> >> > changes to bring consistency. E.g. ``.agg`` on a DataFrame is a
> >> > long-requested feature, which after 0.18.0 is actually quite trivial to
> >> > do.
>> >
>>
>> Yeah, I think we should try to stick with
>> https://en.wikipedia.org/wiki/Open/closed_principle -- so
>> conveniences, extensions to existing APIs, and other helpful new
>> features are fair game, but breaking API changes should be
>>
> >> > 6) I think we should telegraph any API changes and really, really try to
> >> > keep back-compat, so people have the ability to upgrade at their leisure.
>> >
>> >> API changes are most painful for users who do not write tests for
>> >> their code that depends on pandas. That problem is probably not
>> >> fixable =)
>> >
>> >
> >> > Of course, this is a telling point. pandas upgrades often expose bugs in
> >> > user code. I view this as a good thing!
>> >
> >> > So given all of the somewhat contradictory points above, what do I
> >> > really think we should do?
>> >
> >> > In order for pandas to be (even more of) a force leading the scientific
> >> > community, I think we have to grow. So having more contributors is a
> >> > great thing. People do like / appreciate bug fixes, but even more
> >> > (IMHO) they appreciate performance enhancements and *some* new features.
>> >
> >> > I think I will probably try to do more bug-fixing (rather than large
> >> > API-ish fixes). There is quite a backlog. This should *slow* the pace of
> >> > the BIG API changes.
>> >
> >> > So I am kind of -1 on backports, mostly because of 2), it seems to just
> >> > slow things down, and 4), it can often lead to MORE things being
> >> > inconsistent (you need machinery to ensure that what is backported is
> >> > correct and is included). I can easily foresee that we decide to create
> >> > 'stable' branches which are in fact stable but might have inconsistent
> >> > fixes; this is even more confusing in my view.
>> >
>>
>> Let me know what you think about my Bad Bug = backport policy. This is
>> mostly about communication and keeping track of serious issues that
>> should necessitate upgrading.
>>
> >> I also think we should try to keep minor releases API stable from here
> >> on out; this may result in our version numbers increasing more quickly,
> >> but that's OK for the improved communication about "what is a minor
> >> release" (major release plus bug fix backports).
>
>
> Just for clarity, with minor release, do you mean the 0.X releases? (because
> 0.X.X better matches the 'major release plus bug fix backports' description)
>

Sorry, I meant that 0.X.Y should be API stable with all other 0.X versions.
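To make that rule concrete, here is a minimal Python sketch, assuming plain numeric MAJOR.MINOR.PATCH version strings (no rc or dev suffixes). The helpers parse_version and same_api_series are hypothetical names used only for illustration, not part of pandas: under the proposed policy, two releases are expected to be API compatible only when they share the same 0.X prefix.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string such as '0.18.1' into ints.

    Assumption: exactly three numeric parts; anything else raises ValueError.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def same_api_series(a: str, b: str) -> bool:
    """Return True if a and b belong to the same 0.X series.

    Under the policy sketched in this thread, 0.X.Y releases within one
    series should be API stable with each other, so only the first two
    components are compared; the bug-fix number Y is ignored.
    """
    return parse_version(a)[:2] == parse_version(b)[:2]
```

For example, same_api_series("0.18.0", "0.18.2") is True (a bug-fix upgrade that should be API stable), while same_api_series("0.18.2", "0.19.0") is False (a new 0.X release that may carry API changes).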

> Joris
>
>