[Pandas-dev] On bug-fix releases and maintenance branches

Wes McKinney wesmckinn at gmail.com
Tue Mar 8 17:48:43 EST 2016


hey Jeff,

On Tue, Feb 23, 2016 at 12:11 PM, Jeff Reback <jeffreback at gmail.com> wrote:
> Thanks for bringing this up Joris, here are some thoughts.
>
> 1) I agree that the next releases should probably focus on bug fixes. So
> this might mean
> we should shoot for 0.18.2, 0.18.3, etc.
>
> However, we do need a 0.19.0 in order to provide any big deprecations
> (Panel) and API changes that
> are needed.
>
> 2) I am a bit hesitant to even make a big break (1.0) because I have seen
> this just bifurcating people (e.g. do I upgrade now, what if I want
> compat). This just creates less community. So I think this should be a goal,
> that even though it's called 1.0 it is as back-compat as possible.
>

Yeah, with more significant internal refactoring the goal would be to
not break API compatibility unless absolutely necessary. However,
fixing such horror shows as this

In [2]: import pandas as pd

In [3]: s = pd.Series([1,2,3])

In [4]: s
Out[4]:
0    1
1    2
2    3
dtype: int64

In [5]: import numpy as np

In [6]: s[1] = np.nan

In [7]: s
Out[7]:
0     1
1   NaN
2     3
dtype: float64

should be fair game.
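
A minimal sketch for reproducing this outside an interactive session,
assuming a working pandas install. It reaches the same upcast through
`reindex` rather than item assignment (a substitution on my part, chosen
because it shows that any missing value entering an integer Series has
this effect):

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.dtype)  # integer dtype, as in Out[4] above

# Asking for a label that does not exist introduces a missing value,
# and the whole Series is silently converted to float to hold the NaN.
widened = s.reindex([0, 1, 2, 3])
print(widened.dtype)
print(widened.isna().sum())
```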

> 3) Releases can be big, and do fix lots of bugs, and usually introduce new
> ones. This is almost inevitable as we add new features, changes, and even
> bug fixes which occasionally have regressions (though test suite is pretty
> good, so hopefully not too often).
>
> 4) I don't relish backporting things. I think this could lead to lots of
> headaches and IMHO doesn't really buy much.
>

I think what we are talking about is backporting bug fixes for major
brokenness (e.g. serious correctness issues) or regressions that
aren't caught by major release time. I think what's been happening in
practice is that people are creating their own patched bugfix versions
of releases to avoid the pain induced by API-breakage in major
releases.

Obviously, we should keep innovating and cleaning up the API (with judicious
breakage where absolutely necessary -- I think the resampling cleanup
is a good example where the net benefit in the long run will be high),
but we have to take care of the user base, many of whom depend on
pandas in production applications.

This is all made more difficult because there isn't any direct cash
flow funding pandas development AFAICT. Where I work, for example, we
have many employees who are responsible for creating patched builds
and handling backports for otherwise API-stable branches of major
Apache open source projects. But we can afford to do this because
customers are paying for this (priority support and backports /
patched builds).

So what I would suggest, in lieu of financial support for backports
and maintenance builds, is that we consider maint-0.XX.X branches for
backporting only the most serious of serious bug fixes ("Bad Bugs").
Major regressions and correctness issues should go into this bucket.
Perhaps we can start doing this with 0.18.x -- as a matter of process
if any PR appears to fix a Bad Bug it should be brought up here on the
mailing list so we can decide whether it should be backported.
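
To make the proposal concrete, here is a rough sketch of the triage rule
in Python. Everything in it is hypothetical and for illustration only:
the `MergedFix` record, its flags, the `maint-0.18.x` spelling of the
branch name, and the example titles are assumptions, not existing pandas
tooling. The `changes_api` check is also an assumption, reflecting the
goal of keeping maintenance branches API stable.

```python
from dataclasses import dataclass


@dataclass
class MergedFix:
    """A merged PR being considered for backport (hypothetical record)."""
    title: str
    is_regression: bool = False         # broke something that worked in the last release
    is_correctness_issue: bool = False  # silently produces wrong results
    changes_api: bool = False           # alters public behaviour beyond the fix itself


def is_bad_bug(fix: MergedFix) -> bool:
    """Major regressions and correctness issues go into the Bad Bug bucket."""
    return fix.is_regression or fix.is_correctness_issue


def maint_branch(version: str) -> str:
    """Map a release such as '0.18.0' to an assumed maintenance branch name."""
    major, minor, _patch = version.split(".")
    return f"maint-{major}.{minor}.x"


def backport_candidates(fixes, released_version):
    """Return the target branch and the titles worth raising on the list."""
    branch = maint_branch(released_version)
    titles = [f.title for f in fixes if is_bad_bug(f) and not f.changes_api]
    return branch, titles


# Made-up examples, not real pandas issues.
fixes = [
    MergedFix("wrong result in an aggregation", is_correctness_issue=True),
    MergedFix("new convenience method"),
    MergedFix("regression in a reader", is_regression=True),
]
branch, titles = backport_candidates(fixes, "0.18.0")
print(branch, titles)
```

The output is only a list of candidates: the decision to backport would
still be made here on the mailing list.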

> 5) We don't want to just go into maintenance mode because we still have a
> fair amount of feature requests. (though these are often pretty targeted),
> but off of the top of my head, nothing really *new*, mainly some API changes
> to bring consistency. E.g. ``.agg`` on a DataFrame is a long-requested
> feature, which actually after 0.18.0 is quite trivial to do.
>

Yeah, I think we should try to stick with
https://en.wikipedia.org/wiki/Open/closed_principle -- so
conveniences, extensions to existing APIs, and other helpful new
features are fair game, but breaking API changes should be

> 6) I think we telegraph any API changes and really really try to have
> back-compat, so people do have the ability to upgrade at their leisure.
>
>> API changes are most painful for users who do not write tests for
>> their code that depends on pandas. That problem is probably not
>> fixable =)
>
>
> of course this is a telling point. pandas upgrades often expose bugs in user
> code. I view this as a good thing!
>
> So given all of the somewhat contradictory points above, what do I really
> think we should do?
>
> In order for pandas to be (even more) of a force in leading the scientific
> community, I think we have to grow. So having more contributors is a great
> thing. People do like / appreciate fixing bugs, but even more (IMHO), are
> performance enhancements and *some* new features.
>
> I will probably try to do more bug-fixing (rather than large API-ish
> fixes) I think. There is quite a back-log. This should *slow* the issue of
> the BIG API changes.
>
> So I am kind of -1 on backports for mostly 2), it seems to just slow things
> down, and 4) it can often lead to MORE things being inconsistent (you need
> machinery to ensure that what is backported is correct and is included). I
> can easily foresee that we decide to create 'stable' branches, which in fact
> are stable but might have inconsistent fixes, this is even more confusing in
> my view.
>

Let me know what you think about my Bad Bug = backport policy. This is
mostly about communication and keeping track of serious issues that
should necessitate upgrading.

I also think we should try to keep minor releases API stable from here
on out; so this may result in our version numbers increasing more
quickly but that's OK for the improved communication about "what is a
minor release (major release plus bug fix backports)".

- Wes

> I think we have a fairly aggressive release cycle. We for sure don't want to
> debate everything. I am of the opinion that it is much better to put things
> out there quicker, than to endlessly debate extremely minor points (not
> naming project names here :).
>
> For the general user what we do w.r.t. release cycles probably doesn't
> matter, and for the corporate user, they almost always have a 'fixed'
> version anyhow (and then they do of course port the new ones, but then they
> have people upgrade it carefully). I am not so sure we should impose
> structure on this. We already have announced major releases and minor
> releases.
>
> All for better 'language' in the minor releases.
>
> Jeff
>
>
> On Tue, Feb 23, 2016 at 2:21 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>
>> hi Joris,
>>
>> I'm sorry it's taken a couple weeks to write a reply -- been really
>> busy and wanted to put some thought into this.
>>
>> This is a really important discussion given how important pandas has
>> become to so many people, thank you for bringing it up.
>>
>> On Tue, Feb 9, 2016 at 4:59 PM, Joris Van den Bossche
>> <jorisvandenbossche at gmail.com> wrote:
>> > Hi all,
>> >
>> > I wanted to stir some discussion on pandas' policy on bug-fix releases and
>> > upgrading pains. First some context:
>> >
>> > Context part 1: Currently we do not use maintenance branches for bugfix
>> > releases, and we actually also do not really do bugfix releases. We just
>> > develop further on master, and try to not merge breaking changes the first
>> > weeks/months, so we can do a minor kind of bug-fix release (but usually also
>> > with a lot of new features).
>> > But we don't, for example, backport fixes of regressions if they are fixed
>> > after master is pointing to the next major release.
>>
>> I think in general it would be a good idea to tilt development away
>> from new feature development and toward bug fixes and stability. Given
>> that we are contemplating making some breaking changes in a 1.x
>> development branch (like removing the Panel classes), we should decide
>> at some point to create a 0.X.Y maintenance line where we can backport
>> bug fixes only, so that "legacy pandas" users can have a "LTS" (in
>> Ubuntu parlance) maintenance branch. This introduces some development
>> overhead but it seems worth it.
>>
>> >
>> > Context part 2: pandas is not yet that stable, in the sense that there are
>> > still quite some breaking changes in each release. I am not arguing for not
>> > doing these breaking changes, as some of these changes are really needed to
>> > clean up the API (although there are also arguments for that, but I think
>> > that is another discussion). This has the consequence that updating your
>> > pandas version is not always that pleasant.
>>
>> Over the years I've heard many horror stories from companies who have
>> created and maintained internal 0.7.x, 0.8.x, or 0.9.x pandas forks
>> because of the API breakage issues. This is definitely an anti-pattern
>> that we should try to avoid happening in the future, but API breakages
>> in many cases are the inevitable price of progress.
>>
>> Some of the API breakage has resulted from experiences accumulated
>> over a long period of time -- I made a lot of decisions early on in
>> the project that ended up not being the right ones (e.g. resample
>> default arguments changed at one point). There wasn't enough community
>> engagement at that point to have a thorough design process to
>> potentially come up with the "right" design first. In other cases, the
>> "right" choice was perhaps more ambiguous.
>>
>> API changes are most painful for users who do not write tests for
>> their code that depends on pandas. That problem is probably not
>> fixable =)
>>
>> I think having stable releases with backports of serious correctness
>> bugs helps mitigate this problem, while allowing modest API changes between
>> major releases. I would also be in favor of having point releases only
>> contain bug fixes rather than the current system of point releases
>> being a stable snapshot of trunk.
>>
>> Since Jeff is the most affected by this on a day to day basis as de
>> facto steward of the PR queue I would be curious what process he feels
>> would be the most helpful.
>>
>> - Wes
>>
>> >
>> > Sidenote: I do not have that much experience with using pandas in a larger
>> > company or in larger codebases that need to be upgraded, rather with just my
>> > own code for my PhD. So it is difficult for me to judge how much this is
>> > a problem or whether bug-fix releases would help.
>> >
>> > Questions:
>> >
>> > What are other people's experiences with upgrading pandas? And would more
>> > bug-fix releases actually ease the upgrading?
>> > Do we want to do more bug-fix releases?
>> > Having a maintenance branch and backporting fixes is extra work. Would we be
>> > able to handle this? Would it be worth the effort?
>> >
>> > (It has been mentioned before, but I think the main point raised was lack of
>> > manpower to maintain separate branches)
>> >
>> > To put it another way: in our whatsnew notice there is "We recommend that
>> > all users upgrade to this version", but I am actually not sure we should
>> > recommend that. I personally do not always recommend that no matter what
>> > without careful consideration.
>> >
>> > Regards,
>> > Joris
>> >
>> > _______________________________________________
>> > Pandas-dev mailing list
>> > Pandas-dev at python.org
>> > https://mail.python.org/mailman/listinfo/pandas-dev
>> >