From jorisvandenbossche at gmail.com  Sun Nov 12 12:24:46 2017
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Sun, 12 Nov 2017 18:24:46 +0100
Subject: [Pandas-dev] Online dev meeting - wednesday 15th November 6pm UTC
Message-ID: <CALQtMBYpVtm=-pOsnLzGkRrX+BowouJ8D7Xi8+_zVHv5UdJ8yw@mail.gmail.com>

Hi all,

FYI, we are planning a dev meeting coming Wednesday at 6-7pm UTC. If you
are interested to join, always welcome! (https://appear.in/pandas-dev)

Best,
Joris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171112/f200192c/attachment.html>

From jorisvandenbossche at gmail.com  Mon Nov 13 16:54:20 2017
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 13 Nov 2017 22:54:20 +0100
Subject: [Pandas-dev] Proposal to change PR update policy from rebase to
 merge
Message-ID: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>

Hi all,

Currently when PRs get outdated, we often ask to "rebase and update"
(although to be honest, it mainly Jeff that is doing most of this work to
ping stale PRs), and rebasing is also how it is explained in the docs (
http://pandas-docs.github.io/pandas-docs-travis/contributing.html#creating-a-branch).

And also many active contributors use rebasing while working on a PR to get
in sync with changes in master.

I would like to propose to change this policy from rebasing to merging (=
merging master in the feature branch, creating a merge commit).

Some reasons for this:

- I personally think this is easier to do (certainly for less experienced
git users; conflicts can be easier to solve, ..)
- It makes it easier to follow what has changed (certainly if we extend
'not rebasing' with 'not squash rebasing'), making it easier review
- It doesn't destroy links to github PR comments
- Since we squash on merge in the end, we don't care about the additional
merge commits in the PR's history

Thoughts on this?

If we would agree on this, that would mean: update the docs + start doing
it ourselves + start asking that of contributors consistently.

Regards,
Joris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171113/b2c89eb3/attachment.html>

From shoyer at gmail.com  Mon Nov 13 16:58:23 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 13 Nov 2017 21:58:23 +0000
Subject: [Pandas-dev] Proposal to change PR update policy from rebase to
 merge
In-Reply-To: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>
References: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>
Message-ID: <CAEQ_TvcL3VA=YVCu1dP=wu48k9+CH1kLASz1CzbXmEMEaO6D-g@mail.gmail.com>

+1 for merging instead of rebasing. Not losing comment history in PRs is a
major bonus, and the end result (when we squash on merge) is basically
looks the same.

In fact, I would say with this workflow GitHub almost works as well as
Gerrit ;).

On Mon, Nov 13, 2017 at 1:54 PM Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

> Hi all,
>
> Currently when PRs get outdated, we often ask to "rebase and update"
> (although to be honest, it mainly Jeff that is doing most of this work to
> ping stale PRs), and rebasing is also how it is explained in the docs (
> http://pandas-docs.github.io/pandas-docs-travis/contributing.html#creating-a-branch).
>
> And also many active contributors use rebasing while working on a PR to
> get in sync with changes in master.
>
> I would like to propose to change this policy from rebasing to merging (=
> merging master in the feature branch, creating a merge commit).
>
> Some reasons for this:
>
> - I personally think this is easier to do (certainly for less experienced
> git users; conflicts can be easier to solve, ..)
> - It makes it easier to follow what has changed (certainly if we extend
> 'not rebasing' with 'not squash rebasing'), making it easier review
> - It doesn't destroy links to github PR comments
> - Since we squash on merge in the end, we don't care about the additional
> merge commits in the PR's history
>
> Thoughts on this?
>
> If we would agree on this, that would mean: update the docs + start doing
> it ourselves + start asking that of contributors consistently.
>
> Regards,
> Joris
>
>
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171113/4a3251df/attachment.html>

From tom.augspurger88 at gmail.com  Mon Nov 13 17:02:43 2017
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Mon, 13 Nov 2017 16:02:43 -0600
Subject: [Pandas-dev] Proposal to change PR update policy from rebase to
 merge
In-Reply-To: <CAEQ_TvcL3VA=YVCu1dP=wu48k9+CH1kLASz1CzbXmEMEaO6D-g@mail.gmail.com>
References: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>
 <CAEQ_TvcL3VA=YVCu1dP=wu48k9+CH1kLASz1CzbXmEMEaO6D-g@mail.gmail.com>
Message-ID: <CAE1aY-nh4UhJw2AHebGbednsxALFFrNRU91+CFBNAGjH_wQBQQ@mail.gmail.com>

Yes, recommending merging instead of rebasing seems OK now that Github has
squash on merge.

Tom

On Mon, Nov 13, 2017 at 3:58 PM, Stephan Hoyer <shoyer at gmail.com> wrote:

> +1 for merging instead of rebasing. Not losing comment history in PRs is a
> major bonus, and the end result (when we squash on merge) is basically
> looks the same.
>
> In fact, I would say with this workflow GitHub almost works as well as
> Gerrit ;).
>
> On Mon, Nov 13, 2017 at 1:54 PM Joris Van den Bossche <
> jorisvandenbossche at gmail.com> wrote:
>
>> Hi all,
>>
>> Currently when PRs get outdated, we often ask to "rebase and update"
>> (although to be honest, it mainly Jeff that is doing most of this work to
>> ping stale PRs), and rebasing is also how it is explained in the docs (
>> http://pandas-docs.github.io/pandas-docs-travis/
>> contributing.html#creating-a-branch).
>> And also many active contributors use rebasing while working on a PR to
>> get in sync with changes in master.
>>
>> I would like to propose to change this policy from rebasing to merging (=
>> merging master in the feature branch, creating a merge commit).
>>
>> Some reasons for this:
>>
>> - I personally think this is easier to do (certainly for less experienced
>> git users; conflicts can be easier to solve, ..)
>> - It makes it easier to follow what has changed (certainly if we extend
>> 'not rebasing' with 'not squash rebasing'), making it easier review
>> - It doesn't destroy links to github PR comments
>> - Since we squash on merge in the end, we don't care about the additional
>> merge commits in the PR's history
>>
>> Thoughts on this?
>>
>> If we would agree on this, that would mean: update the docs + start doing
>> it ourselves + start asking that of contributors consistently.
>>
>> Regards,
>> Joris
>>
>>
>>
>>
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171113/c3c5d3eb/attachment.html>

From jeffreback at gmail.com  Mon Nov 13 17:05:18 2017
From: jeffreback at gmail.com (Jeff Reback)
Date: Mon, 13 Nov 2017 17:05:18 -0500
Subject: [Pandas-dev] Proposal to change PR update policy from rebase to
 merge
In-Reply-To: <CAE1aY-nh4UhJw2AHebGbednsxALFFrNRU91+CFBNAGjH_wQBQQ@mail.gmail.com>
References: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>
 <CAEQ_TvcL3VA=YVCu1dP=wu48k9+CH1kLASz1CzbXmEMEaO6D-g@mail.gmail.com>
 <CAE1aY-nh4UhJw2AHebGbednsxALFFrNRU91+CFBNAGjH_wQBQQ@mail.gmail.com>
Message-ID: <DC094164-987C-4038-90DC-C490A7ED2CB0@gmail.com>

i don?t care what people actually do in there own branches; though i find rebase much easier to read

as long as github squashes then it?s fine

the issue is that when i need to look at there branches locally they are always a mess and very hard to follow

i would still recommend rebasing

On Nov 13, 2017, at 5:02 PM, Tom Augspurger <tom.augspurger88 at gmail.com> wrote:
> 
> Yes, recommending merging instead of rebasing seems OK now that Github has squash on merge.
> 
> Tom
> 
>> On Mon, Nov 13, 2017 at 3:58 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
>> +1 for merging instead of rebasing. Not losing comment history in PRs is a major bonus, and the end result (when we squash on merge) is basically looks the same.
>> 
>> In fact, I would say with this workflow GitHub almost works as well as Gerrit ;).
>> 
>>> On Mon, Nov 13, 2017 at 1:54 PM Joris Van den Bossche <jorisvandenbossche at gmail.com> wrote:
>>> Hi all,
>>> 
>>> Currently when PRs get outdated, we often ask to "rebase and update" (although to be honest, it mainly Jeff that is doing most of this work to ping stale PRs), and rebasing is also how it is explained in the docs (http://pandas-docs.github.io/pandas-docs-travis/contributing.html#creating-a-branch). 
>>> And also many active contributors use rebasing while working on a PR to get in sync with changes in master.
>>> 
>>> I would like to propose to change this policy from rebasing to merging (= merging master in the feature branch, creating a merge commit). 
>>> 
>>> Some reasons for this:
>>> 
>>> - I personally think this is easier to do (certainly for less experienced git users; conflicts can be easier to solve, ..)
>>> - It makes it easier to follow what has changed (certainly if we extend 'not rebasing' with 'not squash rebasing'), making it easier review
>>> - It doesn't destroy links to github PR comments
>>> - Since we squash on merge in the end, we don't care about the additional merge commits in the PR's history
>>> 
>>> Thoughts on this?
>>> 
>>> If we would agree on this, that would mean: update the docs + start doing it ourselves + start asking that of contributors consistently.
>>> 
>>> Regards,
>>> Joris
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pandas-dev
>> 
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>> 
> 
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171113/87349251/attachment-0001.html>

From jorisvandenbossche at gmail.com  Mon Nov 13 17:49:47 2017
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 13 Nov 2017 23:49:47 +0100
Subject: [Pandas-dev] Proposal to change PR update policy from rebase to
 merge
In-Reply-To: <DC094164-987C-4038-90DC-C490A7ED2CB0@gmail.com>
References: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>
 <CAEQ_TvcL3VA=YVCu1dP=wu48k9+CH1kLASz1CzbXmEMEaO6D-g@mail.gmail.com>
 <CAE1aY-nh4UhJw2AHebGbednsxALFFrNRU91+CFBNAGjH_wQBQQ@mail.gmail.com>
 <DC094164-987C-4038-90DC-C490A7ED2CB0@gmail.com>
Message-ID: <CALQtMBY4P+ZCsvZFepPe9Lyu5awNG2_d=uUgP09dqdhNo9r5_A@mail.gmail.com>

2017-11-13 23:05 GMT+01:00 Jeff Reback <jeffreback at gmail.com>:

> i don?t care what people actually do in there own branches;
>

The point is a bit that I actually *do* care about what you do in your
branches. I would find it easier for the PRs I am reviewing that people
would add commits (and merge master to sync with latest changes) than to
rebase and amend or squash commits, or add commit + rebase.


> though i find rebase much easier to read
>
> as long as github squashes then it?s fine
>
> the issue is that when i need to look at there branches locally they are
> always a mess and very hard to follow
>

Can you give an example of what is hard? Eg if the branch is out of date, I
typically just merge master in it.


> i would still recommend rebasing
>
> On Nov 13, 2017, at 5:02 PM, Tom Augspurger <tom.augspurger88 at gmail.com>
> wrote:
>
> Yes, recommending merging instead of rebasing seems OK now that Github has
> squash on merge.
>
> Tom
>
> On Mon, Nov 13, 2017 at 3:58 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> +1 for merging instead of rebasing. Not losing comment history in PRs is
>> a major bonus, and the end result (when we squash on merge) is basically
>> looks the same.
>>
>> In fact, I would say with this workflow GitHub almost works as well as
>> Gerrit ;).
>>
>> On Mon, Nov 13, 2017 at 1:54 PM Joris Van den Bossche <
>> jorisvandenbossche at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Currently when PRs get outdated, we often ask to "rebase and update"
>>> (although to be honest, it mainly Jeff that is doing most of this work to
>>> ping stale PRs), and rebasing is also how it is explained in the docs (
>>> http://pandas-docs.github.io/pandas-docs-travis/contributin
>>> g.html#creating-a-branch).
>>> And also many active contributors use rebasing while working on a PR to
>>> get in sync with changes in master.
>>>
>>> I would like to propose to change this policy from rebasing to merging
>>> (= merging master in the feature branch, creating a merge commit).
>>>
>>> Some reasons for this:
>>>
>>> - I personally think this is easier to do (certainly for less
>>> experienced git users; conflicts can be easier to solve, ..)
>>> - It makes it easier to follow what has changed (certainly if we extend
>>> 'not rebasing' with 'not squash rebasing'), making it easier review
>>> - It doesn't destroy links to github PR comments
>>> - Since we squash on merge in the end, we don't care about the
>>> additional merge commits in the PR's history
>>>
>>> Thoughts on this?
>>>
>>> If we would agree on this, that would mean: update the docs + start
>>> doing it ourselves + start asking that of contributors consistently.
>>>
>>> Regards,
>>> Joris
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>
>>
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171113/43bb08e0/attachment.html>

From wesmckinn at gmail.com  Mon Nov 13 23:34:53 2017
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 13 Nov 2017 23:34:53 -0500
Subject: [Pandas-dev] Proposal to change PR update policy from rebase to
 merge
In-Reply-To: <CALQtMBY4P+ZCsvZFepPe9Lyu5awNG2_d=uUgP09dqdhNo9r5_A@mail.gmail.com>
References: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>
 <CAEQ_TvcL3VA=YVCu1dP=wu48k9+CH1kLASz1CzbXmEMEaO6D-g@mail.gmail.com>
 <CAE1aY-nh4UhJw2AHebGbednsxALFFrNRU91+CFBNAGjH_wQBQQ@mail.gmail.com>
 <DC094164-987C-4038-90DC-C490A7ED2CB0@gmail.com>
 <CALQtMBY4P+ZCsvZFepPe9Lyu5awNG2_d=uUgP09dqdhNo9r5_A@mail.gmail.com>
Message-ID: <CAJPUwMBccaecznhegtmJoBBE1BxwGLOmRZT3NWGya_U0wnK26Q@mail.gmail.com>

I think as long as clean atomic commits end up in master (which the
merge/squash tool takes care of) then whatever is convenient for the
contributor is fine. I personally prefer clean rebases but some users
struggle with rebasing.

- Wes

On Mon, Nov 13, 2017 at 5:49 PM, Joris Van den Bossche
<jorisvandenbossche at gmail.com> wrote:
> 2017-11-13 23:05 GMT+01:00 Jeff Reback <jeffreback at gmail.com>:
>>
>> i don?t care what people actually do in there own branches;
>
>
> The point is a bit that I actually do care about what you do in your
> branches. I would find it easier for the PRs I am reviewing that people
> would add commits (and merge master to sync with latest changes) than to
> rebase and amend or squash commits, or add commit + rebase.
>
>>
>> though i find rebase much easier to read
>>
>> as long as github squashes then it?s fine
>>
>> the issue is that when i need to look at there branches locally they are
>> always a mess and very hard to follow
>
>
> Can you give an example of what is hard? Eg if the branch is out of date, I
> typically just merge master in it.
>
>>
>> i would still recommend rebasing
>>
>> On Nov 13, 2017, at 5:02 PM, Tom Augspurger <tom.augspurger88 at gmail.com>
>> wrote:
>>
>> Yes, recommending merging instead of rebasing seems OK now that Github has
>> squash on merge.
>>
>> Tom
>>
>> On Mon, Nov 13, 2017 at 3:58 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
>>>
>>> +1 for merging instead of rebasing. Not losing comment history in PRs is
>>> a major bonus, and the end result (when we squash on merge) is basically
>>> looks the same.
>>>
>>> In fact, I would say with this workflow GitHub almost works as well as
>>> Gerrit ;).
>>>
>>> On Mon, Nov 13, 2017 at 1:54 PM Joris Van den Bossche
>>> <jorisvandenbossche at gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Currently when PRs get outdated, we often ask to "rebase and update"
>>>> (although to be honest, it mainly Jeff that is doing most of this work to
>>>> ping stale PRs), and rebasing is also how it is explained in the docs
>>>> (http://pandas-docs.github.io/pandas-docs-travis/contributing.html#creating-a-branch).
>>>> And also many active contributors use rebasing while working on a PR to
>>>> get in sync with changes in master.
>>>>
>>>> I would like to propose to change this policy from rebasing to merging
>>>> (= merging master in the feature branch, creating a merge commit).
>>>>
>>>> Some reasons for this:
>>>>
>>>> - I personally think this is easier to do (certainly for less
>>>> experienced git users; conflicts can be easier to solve, ..)
>>>> - It makes it easier to follow what has changed (certainly if we extend
>>>> 'not rebasing' with 'not squash rebasing'), making it easier review
>>>> - It doesn't destroy links to github PR comments
>>>> - Since we squash on merge in the end, we don't care about the
>>>> additional merge commits in the PR's history
>>>>
>>>> Thoughts on this?
>>>>
>>>> If we would agree on this, that would mean: update the docs + start
>>>> doing it ourselves + start asking that of contributors consistently.
>>>>
>>>> Regards,
>>>> Joris
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Pandas-dev mailing list
>>>> Pandas-dev at python.org
>>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>
>>>
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>
>>
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>>
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>

From jorisvandenbossche at gmail.com  Tue Nov 14 09:49:58 2017
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Tue, 14 Nov 2017 15:49:58 +0100
Subject: [Pandas-dev] Proposal to change PR update policy from rebase to
 merge
In-Reply-To: <CAJPUwMBccaecznhegtmJoBBE1BxwGLOmRZT3NWGya_U0wnK26Q@mail.gmail.com>
References: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>
 <CAEQ_TvcL3VA=YVCu1dP=wu48k9+CH1kLASz1CzbXmEMEaO6D-g@mail.gmail.com>
 <CAE1aY-nh4UhJw2AHebGbednsxALFFrNRU91+CFBNAGjH_wQBQQ@mail.gmail.com>
 <DC094164-987C-4038-90DC-C490A7ED2CB0@gmail.com>
 <CALQtMBY4P+ZCsvZFepPe9Lyu5awNG2_d=uUgP09dqdhNo9r5_A@mail.gmail.com>
 <CAJPUwMBccaecznhegtmJoBBE1BxwGLOmRZT3NWGya_U0wnK26Q@mail.gmail.com>
Message-ID: <CALQtMBZq04N1LRaqDjU7XbK44xsKEdRpGmBxGwzCVo02FiHTug@mail.gmail.com>

Another advantage (IMO) that I forgot:

- when a contributor adds a commit to a branch instead of
squash+rebasing/amending new additions, you get notified of that by github.
In that way, I know the contributor has actually pushed updates related to
my review, and I know I have to look again at the PR (otherwise I have to
check the PR from time to time to see if the contributor pushed new
changes).

2017-11-14 5:34 GMT+01:00 Wes McKinney <wesmckinn at gmail.com>:

> I think as long as clean atomic commits end up in master (which the
> merge/squash tool takes care of) then whatever is convenient for the
> contributor is fine. I personally prefer clean rebases but some users
> struggle with rebasing.


My preference to not rebase is not only from a contributor point of view,
but mainly from reviewer point of view.
So my point is basically that, for me personally, this "whathever is
convenient for the contributor" is not fine for me as a reviewer.

But of course, if the different reviewers don't share this preference, we
can't ask something specific from the contributors. That's the reason I
opened this discussion to see if there would be agreement.


> - Wes
>
> On Mon, Nov 13, 2017 at 5:49 PM, Joris Van den Bossche
> <jorisvandenbossche at gmail.com> wrote:
> > 2017-11-13 23:05 GMT+01:00 Jeff Reback <jeffreback at gmail.com>:
> >>
> >> i don?t care what people actually do in there own branches;
> >
> >
> > The point is a bit that I actually do care about what you do in your
> > branches. I would find it easier for the PRs I am reviewing that people
> > would add commits (and merge master to sync with latest changes) than to
> > rebase and amend or squash commits, or add commit + rebase.
> >
> >>
> >> though i find rebase much easier to read
> >>
> >> as long as github squashes then it?s fine
> >>
> >> the issue is that when i need to look at there branches locally they are
> >> always a mess and very hard to follow
> >
> >
> > Can you give an example of what is hard? Eg if the branch is out of
> date, I
> > typically just merge master in it.
> >
> >>
> >> i would still recommend rebasing
> >>
> >> On Nov 13, 2017, at 5:02 PM, Tom Augspurger <tom.augspurger88 at gmail.com
> >
> >> wrote:
> >>
> >> Yes, recommending merging instead of rebasing seems OK now that Github
> has
> >> squash on merge.
> >>
> >> Tom
> >>
> >> On Mon, Nov 13, 2017 at 3:58 PM, Stephan Hoyer <shoyer at gmail.com>
> wrote:
> >>>
> >>> +1 for merging instead of rebasing. Not losing comment history in PRs
> is
> >>> a major bonus, and the end result (when we squash on merge) is
> basically
> >>> looks the same.
> >>>
> >>> In fact, I would say with this workflow GitHub almost works as well as
> >>> Gerrit ;).
> >>>
> >>> On Mon, Nov 13, 2017 at 1:54 PM Joris Van den Bossche
> >>> <jorisvandenbossche at gmail.com> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> Currently when PRs get outdated, we often ask to "rebase and update"
> >>>> (although to be honest, it mainly Jeff that is doing most of this
> work to
> >>>> ping stale PRs), and rebasing is also how it is explained in the docs
> >>>> (http://pandas-docs.github.io/pandas-docs-travis/
> contributing.html#creating-a-branch).
> >>>> And also many active contributors use rebasing while working on a PR
> to
> >>>> get in sync with changes in master.
> >>>>
> >>>> I would like to propose to change this policy from rebasing to merging
> >>>> (= merging master in the feature branch, creating a merge commit).
> >>>>
> >>>> Some reasons for this:
> >>>>
> >>>> - I personally think this is easier to do (certainly for less
> >>>> experienced git users; conflicts can be easier to solve, ..)
> >>>> - It makes it easier to follow what has changed (certainly if we
> extend
> >>>> 'not rebasing' with 'not squash rebasing'), making it easier review
> >>>> - It doesn't destroy links to github PR comments
> >>>> - Since we squash on merge in the end, we don't care about the
> >>>> additional merge commits in the PR's history
> >>>>
> >>>> Thoughts on this?
> >>>>
> >>>> If we would agree on this, that would mean: update the docs + start
> >>>> doing it ourselves + start asking that of contributors consistently.
> >>>>
> >>>> Regards,
> >>>> Joris
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Pandas-dev mailing list
> >>>> Pandas-dev at python.org
> >>>> https://mail.python.org/mailman/listinfo/pandas-dev
> >>>
> >>>
> >>> _______________________________________________
> >>> Pandas-dev mailing list
> >>> Pandas-dev at python.org
> >>> https://mail.python.org/mailman/listinfo/pandas-dev
> >>>
> >>
> >> _______________________________________________
> >> Pandas-dev mailing list
> >> Pandas-dev at python.org
> >> https://mail.python.org/mailman/listinfo/pandas-dev
> >>
> >>
> >> _______________________________________________
> >> Pandas-dev mailing list
> >> Pandas-dev at python.org
> >> https://mail.python.org/mailman/listinfo/pandas-dev
> >>
> >
> >
> > _______________________________________________
> > Pandas-dev mailing list
> > Pandas-dev at python.org
> > https://mail.python.org/mailman/listinfo/pandas-dev
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171114/11d31ac1/attachment-0001.html>

From wesmckinn at gmail.com  Tue Nov 14 10:55:12 2017
From: wesmckinn at gmail.com (Wes McKinney)
Date: Tue, 14 Nov 2017 10:55:12 -0500
Subject: [Pandas-dev] Proposal to change PR update policy from rebase to
 merge
In-Reply-To: <CALQtMBZq04N1LRaqDjU7XbK44xsKEdRpGmBxGwzCVo02FiHTug@mail.gmail.com>
References: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>
 <CAEQ_TvcL3VA=YVCu1dP=wu48k9+CH1kLASz1CzbXmEMEaO6D-g@mail.gmail.com>
 <CAE1aY-nh4UhJw2AHebGbednsxALFFrNRU91+CFBNAGjH_wQBQQ@mail.gmail.com>
 <DC094164-987C-4038-90DC-C490A7ED2CB0@gmail.com>
 <CALQtMBY4P+ZCsvZFepPe9Lyu5awNG2_d=uUgP09dqdhNo9r5_A@mail.gmail.com>
 <CAJPUwMBccaecznhegtmJoBBE1BxwGLOmRZT3NWGya_U0wnK26Q@mail.gmail.com>
 <CALQtMBZq04N1LRaqDjU7XbK44xsKEdRpGmBxGwzCVo02FiHTug@mail.gmail.com>
Message-ID: <CAJPUwMD1JGb2-TO3x2HKOQQ-Jykussc9jjdXMoH5AdTbaS3_=A@mail.gmail.com>

Having now experienced Gerrit and other professional code review
tools, it's very hard for me to see GitHub's review in anything but a
negative light. I haven't found any version of contributor behavior to
make incremental reviews any easier -- this "has the contributor
updated the PR" is a problem that is completely solved / a non-issue
in tools like Gerrit (the other issues you cited also do not exist in
Gerrit). Since I'm not actively maintaining pandas PRs, the solution
that makes your lives as maintainers easiest is OK by me.

It's too bad we're in a position of advocating "messy" PR branches in
order to improve the UX of code reviews in GitHub. FWIW, I talked with
GitHub employees in person about these exact sort of problems in
October and I don't think it's likely to improve anytime soon.

- Wes

On Tue, Nov 14, 2017 at 9:49 AM, Joris Van den Bossche
<jorisvandenbossche at gmail.com> wrote:
> Another advantage (IMO) that I forgot:
>
> - when a contributor adds a commit to a branch instead of
> squash+rebasing/amending new additions, you get notified of that by github.
> In that way, I know the contributor has actually pushed updates related to
> my review, and I know I have to look again at the PR (otherwise I have to
> check the PR from time to time to see if the contributor pushed new
> changes).
>
> 2017-11-14 5:34 GMT+01:00 Wes McKinney <wesmckinn at gmail.com>:
>>
>> I think as long as clean atomic commits end up in master (which the
>> merge/squash tool takes care of) then whatever is convenient for the
>> contributor is fine. I personally prefer clean rebases but some users
>> struggle with rebasing.
>
>
> My preference to not rebase is not only from a contributor point of view,
> but mainly from reviewer point of view.
> So my point is basically that, for me personally, this "whathever is
> convenient for the contributor" is not fine for me as a reviewer.
>
> But of course, if the different reviewers don't share this preference, we
> can't ask something specific from the contributors. That's the reason I
> opened this discussion to see if there would be agreement.
>
>>
>> - Wes
>>
>> On Mon, Nov 13, 2017 at 5:49 PM, Joris Van den Bossche
>> <jorisvandenbossche at gmail.com> wrote:
>> > 2017-11-13 23:05 GMT+01:00 Jeff Reback <jeffreback at gmail.com>:
>> >>
>> >> i don?t care what people actually do in there own branches;
>> >
>> >
>> > The point is a bit that I actually do care about what you do in your
>> > branches. I would find it easier for the PRs I am reviewing that people
>> > would add commits (and merge master to sync with latest changes) than to
>> > rebase and amend or squash commits, or add commit + rebase.
>> >
>> >>
>> >> though i find rebase much easier to read
>> >>
>> >> as long as github squashes then it?s fine
>> >>
>> >> the issue is that when i need to look at there branches locally they
>> >> are
>> >> always a mess and very hard to follow
>> >
>> >
>> > Can you give an example of what is hard? Eg if the branch is out of
>> > date, I
>> > typically just merge master in it.
>> >
>> >>
>> >> i would still recommend rebasing
>> >>
>> >> On Nov 13, 2017, at 5:02 PM, Tom Augspurger
>> >> <tom.augspurger88 at gmail.com>
>> >> wrote:
>> >>
>> >> Yes, recommending merging instead of rebasing seems OK now that Github
>> >> has
>> >> squash on merge.
>> >>
>> >> Tom
>> >>
>> >> On Mon, Nov 13, 2017 at 3:58 PM, Stephan Hoyer <shoyer at gmail.com>
>> >> wrote:
>> >>>
>> >>> +1 for merging instead of rebasing. Not losing comment history in PRs
>> >>> is
>> >>> a major bonus, and the end result (when we squash on merge) is
>> >>> basically
>> >>> looks the same.
>> >>>
>> >>> In fact, I would say with this workflow GitHub almost works as well as
>> >>> Gerrit ;).
>> >>>
>> >>> On Mon, Nov 13, 2017 at 1:54 PM Joris Van den Bossche
>> >>> <jorisvandenbossche at gmail.com> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> Currently when PRs get outdated, we often ask to "rebase and update"
>> >>>> (although to be honest, it mainly Jeff that is doing most of this
>> >>>> work to
>> >>>> ping stale PRs), and rebasing is also how it is explained in the docs
>> >>>>
>> >>>> (http://pandas-docs.github.io/pandas-docs-travis/contributing.html#creating-a-branch).
>> >>>> And also many active contributors use rebasing while working on a PR
>> >>>> to
>> >>>> get in sync with changes in master.
>> >>>>
>> >>>> I would like to propose to change this policy from rebasing to
>> >>>> merging
>> >>>> (= merging master in the feature branch, creating a merge commit).
>> >>>>
>> >>>> Some reasons for this:
>> >>>>
>> >>>> - I personally think this is easier to do (certainly for less
>> >>>> experienced git users; conflicts can be easier to solve, ..)
>> >>>> - It makes it easier to follow what has changed (certainly if we
>> >>>> extend
>> >>>> 'not rebasing' with 'not squash rebasing'), making it easier review
>> >>>> - It doesn't destroy links to github PR comments
>> >>>> - Since we squash on merge in the end, we don't care about the
>> >>>> additional merge commits in the PR's history
>> >>>>
>> >>>> Thoughts on this?
>> >>>>
>> >>>> If we would agree on this, that would mean: update the docs + start
>> >>>> doing it ourselves + start asking that of contributors consistently.
>> >>>>
>> >>>> Regards,
>> >>>> Joris
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> Pandas-dev mailing list
>> >>>> Pandas-dev at python.org
>> >>>> https://mail.python.org/mailman/listinfo/pandas-dev
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> Pandas-dev mailing list
>> >>> Pandas-dev at python.org
>> >>> https://mail.python.org/mailman/listinfo/pandas-dev
>> >>>
>> >>
>> >> _______________________________________________
>> >> Pandas-dev mailing list
>> >> Pandas-dev at python.org
>> >> https://mail.python.org/mailman/listinfo/pandas-dev
>> >>
>> >>
>> >> _______________________________________________
>> >> Pandas-dev mailing list
>> >> Pandas-dev at python.org
>> >> https://mail.python.org/mailman/listinfo/pandas-dev
>> >>
>> >
>> >
>> > _______________________________________________
>> > Pandas-dev mailing list
>> > Pandas-dev at python.org
>> > https://mail.python.org/mailman/listinfo/pandas-dev
>> >
>
>

From shoyer at gmail.com  Tue Nov 14 11:46:23 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 14 Nov 2017 16:46:23 +0000
Subject: [Pandas-dev] Proposal to change PR update policy from rebase to
 merge
In-Reply-To: <CAJPUwMD1JGb2-TO3x2HKOQQ-Jykussc9jjdXMoH5AdTbaS3_=A@mail.gmail.com>
References: <CALQtMBZpAH706Qe+6vvomyv0HFQM6usPGVKdLSmhW4G+Z4ifaw@mail.gmail.com>
 <CAEQ_TvcL3VA=YVCu1dP=wu48k9+CH1kLASz1CzbXmEMEaO6D-g@mail.gmail.com>
 <CAE1aY-nh4UhJw2AHebGbednsxALFFrNRU91+CFBNAGjH_wQBQQ@mail.gmail.com>
 <DC094164-987C-4038-90DC-C490A7ED2CB0@gmail.com>
 <CALQtMBY4P+ZCsvZFepPe9Lyu5awNG2_d=uUgP09dqdhNo9r5_A@mail.gmail.com>
 <CAJPUwMBccaecznhegtmJoBBE1BxwGLOmRZT3NWGya_U0wnK26Q@mail.gmail.com>
 <CALQtMBZq04N1LRaqDjU7XbK44xsKEdRpGmBxGwzCVo02FiHTug@mail.gmail.com>
 <CAJPUwMD1JGb2-TO3x2HKOQQ-Jykussc9jjdXMoH5AdTbaS3_=A@mail.gmail.com>
Message-ID: <CAEQ_Tvew+sqkm+o_7XoSxiLBf8wmHpd=Rg070fQy4VRGKmKHhg@mail.gmail.com>

The GitHub pull request model does have some virtues: there is a 1-1
relationship between git commits on a branch and what you see on the pull
request. This makes it easier to review/edit a change's history without the
review tool, and for multiple users to work on a single changes at once --
as long as contributors stick to adding new commits and merging in updates
from master instead of rebasing. The PR interface itself is designed around
this, e.g., you can click "Changes since my last review" (or any particular
change), get email notifications when new commits are pushed, etc.

It would be nice to have the option of Gerrit's single commit model, but as
long as we are stuck on GitHub we should use the process that works well
with the tool.

On Tue, Nov 14, 2017 at 7:56 AM Wes McKinney <wesmckinn at gmail.com> wrote:

> Having now experienced Gerrit and other professional code review
> tools, it's very hard for me to see GitHub's review in anything but a
> negative light. I haven't found any version of contributor behavior to
> make incremental reviews any easier -- this "has the contributor
> updated the PR" is a problem that is completely solved / a non-issue
> in tools like Gerrit (the other issues you cited also do not exist in
> Gerrit). Since I'm not actively maintaining pandas PRs, the solution
> that makes your lives as maintainers easiest is OK by me.
>
> It's too bad we're in a position of advocating "messy" PR branches in
> order to improve the UX of code reviews in GitHub. FWIW, I talked with
> GitHub employees in person about these exact sort of problems in
> October and I don't think it's likely to improve anytime soon.
>
> - Wes
>
> On Tue, Nov 14, 2017 at 9:49 AM, Joris Van den Bossche
> <jorisvandenbossche at gmail.com> wrote:
> > Another advantage (IMO) that I forgot:
> >
> > - when a contributor adds a commit to a branch instead of
> > squash+rebasing/amending new additions, you get notified of that by
> github.
> > In that way, I know the contributor has actually pushed updates related
> to
> > my review, and I know I have to look again at the PR (otherwise I have to
> > check the PR from time to time to see if the contributor pushed new
> > changes).
> >
> > 2017-11-14 5:34 GMT+01:00 Wes McKinney <wesmckinn at gmail.com>:
> >>
> >> I think as long as clean atomic commits end up in master (which the
> >> merge/squash tool takes care of) then whatever is convenient for the
> >> contributor is fine. I personally prefer clean rebases but some users
> >> struggle with rebasing.
> >
> >
> > My preference to not rebase is not only from a contributor point of view,
> > but mainly from reviewer point of view.
> > So my point is basically that, for me personally, this "whathever is
> > convenient for the contributor" is not fine for me as a reviewer.
> >
> > But of course, if the different reviewers don't share this preference, we
> > can't ask something specific from the contributors. That's the reason I
> > opened this discussion to see if there would be agreement.
> >
> >>
> >> - Wes
> >>
> >> On Mon, Nov 13, 2017 at 5:49 PM, Joris Van den Bossche
> >> <jorisvandenbossche at gmail.com> wrote:
> >> > 2017-11-13 23:05 GMT+01:00 Jeff Reback <jeffreback at gmail.com>:
> >> >>
> >> >> i don?t care what people actually do in there own branches;
> >> >
> >> >
> >> > The point is a bit that I actually do care about what you do in your
> >> > branches. I would find it easier for the PRs I am reviewing that
> people
> >> > would add commits (and merge master to sync with latest changes) than
> to
> >> > rebase and amend or squash commits, or add commit + rebase.
> >> >
> >> >>
> >> >> though i find rebase much easier to read
> >> >>
> >> >> as long as github squashes then it?s fine
> >> >>
> >> >> the issue is that when i need to look at there branches locally they
> >> >> are
> >> >> always a mess and very hard to follow
> >> >
> >> >
> >> > Can you give an example of what is hard? Eg if the branch is out of
> >> > date, I
> >> > typically just merge master in it.
> >> >
> >> >>
> >> >> i would still recommend rebasing
> >> >>
> >> >> On Nov 13, 2017, at 5:02 PM, Tom Augspurger
> >> >> <tom.augspurger88 at gmail.com>
> >> >> wrote:
> >> >>
> >> >> Yes, recommending merging instead of rebasing seems OK now that
> Github
> >> >> has
> >> >> squash on merge.
> >> >>
> >> >> Tom
> >> >>
> >> >> On Mon, Nov 13, 2017 at 3:58 PM, Stephan Hoyer <shoyer at gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> +1 for merging instead of rebasing. Not losing comment history in
> PRs
> >> >>> is
> >> >>> a major bonus, and the end result (when we squash on merge) is
> >> >>> basically
> >> >>> looks the same.
> >> >>>
> >> >>> In fact, I would say with this workflow GitHub almost works as well
> as
> >> >>> Gerrit ;).
> >> >>>
> >> >>> On Mon, Nov 13, 2017 at 1:54 PM Joris Van den Bossche
> >> >>> <jorisvandenbossche at gmail.com> wrote:
> >> >>>>
> >> >>>> Hi all,
> >> >>>>
> >> >>>> Currently when PRs get outdated, we often ask to "rebase and
> update"
> >> >>>> (although to be honest, it mainly Jeff that is doing most of this
> >> >>>> work to
> >> >>>> ping stale PRs), and rebasing is also how it is explained in the
> docs
> >> >>>>
> >> >>>> (
> http://pandas-docs.github.io/pandas-docs-travis/contributing.html#creating-a-branch
> ).
> >> >>>> And also many active contributors use rebasing while working on a
> PR
> >> >>>> to
> >> >>>> get in sync with changes in master.
> >> >>>>
> >> >>>> I would like to propose to change this policy from rebasing to
> >> >>>> merging
> >> >>>> (= merging master in the feature branch, creating a merge commit).
> >> >>>>
> >> >>>> Some reasons for this:
> >> >>>>
> >> >>>> - I personally think this is easier to do (certainly for less
> >> >>>> experienced git users; conflicts can be easier to solve, ..)
> >> >>>> - It makes it easier to follow what has changed (certainly if we
> >> >>>> extend
> >> >>>> 'not rebasing' with 'not squash rebasing'), making it easier review
> >> >>>> - It doesn't destroy links to github PR comments
> >> >>>> - Since we squash on merge in the end, we don't care about the
> >> >>>> additional merge commits in the PR's history
> >> >>>>
> >> >>>> Thoughts on this?
> >> >>>>
> >> >>>> If we would agree on this, that would mean: update the docs + start
> >> >>>> doing it ourselves + start asking that of contributors
> consistently.
> >> >>>>
> >> >>>> Regards,
> >> >>>> Joris
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> _______________________________________________
> >> >>>> Pandas-dev mailing list
> >> >>>> Pandas-dev at python.org
> >> >>>> https://mail.python.org/mailman/listinfo/pandas-dev
> >> >>>
> >> >>>
> >> >>> _______________________________________________
> >> >>> Pandas-dev mailing list
> >> >>> Pandas-dev at python.org
> >> >>> https://mail.python.org/mailman/listinfo/pandas-dev
> >> >>>
> >> >>
> >> >> _______________________________________________
> >> >> Pandas-dev mailing list
> >> >> Pandas-dev at python.org
> >> >> https://mail.python.org/mailman/listinfo/pandas-dev
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> Pandas-dev mailing list
> >> >> Pandas-dev at python.org
> >> >> https://mail.python.org/mailman/listinfo/pandas-dev
> >> >>
> >> >
> >> >
> >> > _______________________________________________
> >> > Pandas-dev mailing list
> >> > Pandas-dev at python.org
> >> > https://mail.python.org/mailman/listinfo/pandas-dev
> >> >
> >
> >
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171114/15f903b7/attachment-0001.html>

From tom.augspurger88 at gmail.com  Thu Nov 16 09:22:32 2017
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Thu, 16 Nov 2017 08:22:32 -0600
Subject: [Pandas-dev] Label and Milestone cleanup
Message-ID: <CAE1aY-=10D8QRmP3e4MgZFfb_WniTT-Z1LS7Xw9fpet-2FTdLg@mail.gmail.com>

We ran out of time to discuss this in the dev meeting. I want to clean up
our Github labels and milestones.

## Milestones

I would like to have 4 real milestones:

1. "Next Major Release" for API breaking changes that we'd like to do
eventually
2. "Next Release" For bugfix / non-API breaking changes that we'd like to
do eventually
3. "0.22.0" (i.e. the actual next major release) for PRs that should go
into a specific release
4. "0.21.1" (i.e. the actual next minor release) for PRs that should go
into a specific release

The main change is the "Next Release", which I hope will clear up confusion
about what file the release notes should go in. I think we can remove
"Interesting Issues", remove "High Level Issue Tracking", and move "won't
fix" into "No action".  I'm not sure about "1.0" and "2.0" and "Someday",
but maybe consolidate those.

## Labels

I would like to

- Start tagging "Easy" issues with "good first issue". Github gives some
prominence to this tag in their UI.
- Remove some of the less frequently used issues like "Closed PR, Multi
Dimensional, etc."
- Start using the "Needs Info" tag more often for incomplete bug reports,
and regularly close issues that have been tagged as "Needs Info" and not
updated in more that a couple weeks.

Thoughts? Objections?


Tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171116/48e687d1/attachment.html>

From gfyoung17 at gmail.com  Thu Nov 16 12:29:35 2017
From: gfyoung17 at gmail.com (G Young)
Date: Thu, 16 Nov 2017 09:29:35 -0800
Subject: [Pandas-dev] Label and Milestone cleanup
In-Reply-To: <CAE1aY-=10D8QRmP3e4MgZFfb_WniTT-Z1LS7Xw9fpet-2FTdLg@mail.gmail.com>
References: <CAE1aY-=10D8QRmP3e4MgZFfb_WniTT-Z1LS7Xw9fpet-2FTdLg@mail.gmail.com>
Message-ID: <CAJ1_J5h13VgyW9W0Ka=5uq3Q3Pmj7UUBOcRn=cb1abk9Wh1dPQ@mail.gmail.com>

IMO "Won't fix" and "No action" were both good indicators of deliberate
non-action on a PR / issue.  We can consolidate those, but I wouldn't
remove both.

"Someday" doesn't look like it's used anymore nowadays, so I wouldn't mind
removing that.

As for tagging in general, I think our classification are a little
incomplete (e.g. there is no tag for general "DataFrame" issues or their
methods).

On Thu, Nov 16, 2017 at 6:22 AM, Tom Augspurger <tom.augspurger88 at gmail.com>
wrote:

> We ran out of time to discuss this in the dev meeting. I want to clean up
> our Github labels and milestones.
>
> ## Milestones
>
> I would like to have 4 real milestones:
>
> 1. "Next Major Release" for API breaking changes that we'd like to do
> eventually
> 2. "Next Release" For bugfix / non-API breaking changes that we'd like to
> do eventually
> 3. "0.22.0" (i.e. the actual next major release) for PRs that should go
> into a specific release
> 4. "0.21.1" (i.e. the actual next minor release) for PRs that should go
> into a specific release
>
> The main change is the "Next Release", which I hope will clear up
> confusion about what file the release notes should go in. I think we can
> remove "Interesting Issues", remove "High Level Issue Tracking", and move
> "won't fix" into "No action".  I'm not sure about "1.0" and "2.0" and
> "Someday", but maybe consolidate those.
>
> ## Labels
>
> I would like to
>
> - Start tagging "Easy" issues with "good first issue". Github gives some
> prominence to this tag in their UI.
> - Remove some of the less frequently used issues like "Closed PR, Multi
> Dimensional, etc."
> - Start using the "Needs Info" tag more often for incomplete bug reports,
> and regularly close issues that have been tagged as "Needs Info" and not
> updated in more that a couple weeks.
>
> Thoughts? Objections?
>
>
> Tom
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171116/3ad57984/attachment.html>

From jeffreback at gmail.com  Thu Nov 16 18:43:56 2017
From: jeffreback at gmail.com (Jeff Reback)
Date: Thu, 16 Nov 2017 18:43:56 -0500
Subject: [Pandas-dev] Label and Milestone cleanup
In-Reply-To: <CAJ1_J5h13VgyW9W0Ka=5uq3Q3Pmj7UUBOcRn=cb1abk9Wh1dPQ@mail.gmail.com>
References: <CAE1aY-=10D8QRmP3e4MgZFfb_WniTT-Z1LS7Xw9fpet-2FTdLg@mail.gmail.com>
 <CAJ1_J5h13VgyW9W0Ka=5uq3Q3Pmj7UUBOcRn=cb1abk9Wh1dPQ@mail.gmail.com>
Message-ID: <56D136E9-4CDA-4283-9A2D-C5B1270254F1@gmail.com>

someday can go

ok with your tags/milestones otherwise 

though leave the interesting ones  - these are ones that i tagged - will move them after we realign things (then i will remove)


> On Nov 16, 2017, at 12:29 PM, G Young <gfyoung17 at gmail.com> wrote:
> 
> IMO "Won't fix" and "No action" were both good indicators of deliberate non-action on a PR / issue.  We can consolidate those, but I wouldn't remove both.
> 
> "Someday" doesn't look like it's used anymore nowadays, so I wouldn't mind removing that.
> 
> As for tagging in general, I think our classification are a little incomplete (e.g. there is no tag for general "DataFrame" issues or their methods).
> 
>> On Thu, Nov 16, 2017 at 6:22 AM, Tom Augspurger <tom.augspurger88 at gmail.com> wrote:
>> We ran out of time to discuss this in the dev meeting. I want to clean up our Github labels and milestones.
>> 
>> ## Milestones
>> 
>> I would like to have 4 real milestones:
>> 
>> 1. "Next Major Release" for API breaking changes that we'd like to do eventually
>> 2. "Next Release" For bugfix / non-API breaking changes that we'd like to do eventually
>> 3. "0.22.0" (i.e. the actual next major release) for PRs that should go into a specific release
>> 4. "0.21.1" (i.e. the actual next minor release) for PRs that should go into a specific release
>> 
>> The main change is the "Next Release", which I hope will clear up confusion about what file the release notes should go in. I think we can remove "Interesting Issues", remove "High Level Issue Tracking", and move "won't fix" into "No action".  I'm not sure about "1.0" and "2.0" and "Someday", but maybe consolidate those.
>> 
>> ## Labels
>> 
>> I would like to
>> 
>> - Start tagging "Easy" issues with "good first issue". Github gives some prominence to this tag in their UI.
>> - Remove some of the less frequently used issues like "Closed PR, Multi Dimensional, etc."
>> - Start using the "Needs Info" tag more often for incomplete bug reports, and regularly close issues that have been tagged as "Needs Info" and not updated in more that a couple weeks.
>> 
>> Thoughts? Objections?
>> 
>> 
>> Tom
>> 
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>> 
> 
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171116/85251794/attachment.html>

From jorisvandenbossche at gmail.com  Fri Nov 17 09:49:13 2017
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Fri, 17 Nov 2017 15:49:13 +0100
Subject: [Pandas-dev] Label and Milestone cleanup
In-Reply-To: <56D136E9-4CDA-4283-9A2D-C5B1270254F1@gmail.com>
References: <CAE1aY-=10D8QRmP3e4MgZFfb_WniTT-Z1LS7Xw9fpet-2FTdLg@mail.gmail.com>
 <CAJ1_J5h13VgyW9W0Ka=5uq3Q3Pmj7UUBOcRn=cb1abk9Wh1dPQ@mail.gmail.com>
 <56D136E9-4CDA-4283-9A2D-C5B1270254F1@gmail.com>
Message-ID: <CALQtMBZ=21D=J4R6wsG2+3fcoZekMT+MVKHD=Q0-ue7wYSBNoQ@mail.gmail.com>

Already answering for the Milestones:

## Milestones
>>
>> I would like to have 4 real milestones:
>>
>> 1. "Next Major Release" for API breaking changes that we'd like to do
>> eventually
>> 2. "Next Release" For bugfix / non-API breaking changes that we'd like to
>> do eventually
>> 3. "0.22.0" (i.e. the actual next major release) for PRs that should go
>> into a specific release
>> 4. "0.21.1" (i.e. the actual next minor release) for PRs that should go
>> into a specific release
>>
>> The main change is the "Next Release", which I hope will clear up
>> confusion about what file the release notes should go in.
>>
>
Would the issues in "Next release" be those issues that, if somebody does a
PR, can be included in eg 0.21.1, but are not important to tag them as such?


> I think we can remove "Interesting Issues",
>>
>

> Jeff: though leave the interesting ones  - these are ones that i tagged -
> will move them after we realign things (then i will remove)
>

You can make a "Jeff's interesting issues" project, and add them to that. I
think that is a better use (you still can look at the list of issues you
want to specifically tag for attention) and it keeps the Milestones clean
for actual milestones.


> remove "High Level Issue Tracking",
>>
>
+ 1 since we a "master issue" label as well.


> and move "won't fix" into "No action".  I'm not sure about "1.0" and "2.0"
>> and "Someday", but maybe consolidate those.
>>
>
I personally find the notion of "won't fix" informative, but we could also
give it a label "won't fix" ? (and then use a single "No action" milestone)

I would keep the 1.0 milestone for now.

I think "Someday" was originally intended to give some difference in
prioritization for the core-devs between "Next major release" and
"Someday"? But I think we don't really do it like that, so for me ok to
remove. I would suppose that 'no milestone' would then be that.

Joris

2017-11-17 0:43 GMT+01:00 Jeff Reback <jeffreback at gmail.com>:

> someday can g
>
> ok with your tags/milestones otherwise
>
> though leave the interesting ones  - these are ones that i tagged - will
> move them after we realign things (then i will remove)
>

You can make a "Jeff's interesting issues" project, and add them to that. I
think that is a better use (you still can look at the list of issues) and
it keeps the Milestones clean for actual milestones.


>
>
> On Nov 16, 2017, at 12:29 PM, G Young <gfyoung17 at gmail.com> wrote:
>
> IMO "Won't fix" and "No action" were both good indicators of deliberate
> non-action on a PR / issue.  We can consolidate those, but I wouldn't
> remove both.
>
> "Someday" doesn't look like it's used anymore nowadays, so I wouldn't mind
> removing that.
>
> As for tagging in general, I think our classification are a little
> incomplete (e.g. there is no tag for general "DataFrame" issues or their
> methods).
>
> On Thu, Nov 16, 2017 at 6:22 AM, Tom Augspurger <
> tom.augspurger88 at gmail.com> wrote:
>
>> We ran out of time to discuss this in the dev meeting. I want to clean up
>> our Github labels and milestones.
>>
>> ## Milestones
>>
>> I would like to have 4 real milestones:
>>
>> 1. "Next Major Release" for API breaking changes that we'd like to do
>> eventually
>> 2. "Next Release" For bugfix / non-API breaking changes that we'd like to
>> do eventually
>> 3. "0.22.0" (i.e. the actual next major release) for PRs that should go
>> into a specific release
>> 4. "0.21.1" (i.e. the actual next minor release) for PRs that should go
>> into a specific release
>>
>> The main change is the "Next Release", which I hope will clear up
>> confusion about what file the release notes should go in. I think we can
>> remove "Interesting Issues", remove "High Level Issue Tracking", and move
>> "won't fix" into "No action".  I'm not sure about "1.0" and "2.0" and
>> "Someday", but maybe consolidate those.
>>
>> ## Labels
>>
>> I would like to
>>
>> - Start tagging "Easy" issues with "good first issue". Github gives some
>> prominence to this tag in their UI.
>> - Remove some of the less frequently used issues like "Closed PR, Multi
>> Dimensional, etc."
>> - Start using the "Needs Info" tag more often for incomplete bug reports,
>> and regularly close issues that have been tagged as "Needs Info" and not
>> updated in more that a couple weeks.
>>
>> Thoughts? Objections?
>>
>>
>> Tom
>>
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171117/5a8ac125/attachment-0001.html>

From jeffreback at gmail.com  Fri Nov 17 10:09:28 2017
From: jeffreback at gmail.com (Jeff Reback)
Date: Fri, 17 Nov 2017 10:09:28 -0500
Subject: [Pandas-dev] Label and Milestone cleanup
In-Reply-To: <CALQtMBZ=21D=J4R6wsG2+3fcoZekMT+MVKHD=Q0-ue7wYSBNoQ@mail.gmail.com>
References: <CAE1aY-=10D8QRmP3e4MgZFfb_WniTT-Z1LS7Xw9fpet-2FTdLg@mail.gmail.com>
 <CAJ1_J5h13VgyW9W0Ka=5uq3Q3Pmj7UUBOcRn=cb1abk9Wh1dPQ@mail.gmail.com>
 <56D136E9-4CDA-4283-9A2D-C5B1270254F1@gmail.com>
 <CALQtMBZ=21D=J4R6wsG2+3fcoZekMT+MVKHD=Q0-ue7wYSBNoQ@mail.gmail.com>
Message-ID: <45D17807-FEE2-4167-82C8-1D8BE050E7DB@gmail.com>

On Nov 17, 2017, at 9:49 AM, Joris Van den Bossche <jorisvandenbossche at gmail.com> wrote:
> 
> Already answering for the Milestones:
> 
>>> ## Milestones
>>> 
>>> I would like to have 4 real milestones:
>>> 
>>> 1. "Next Major Release" for API breaking changes that we'd like to do eventually
>>> 2. "Next Release" For bugfix / non-API breaking changes that we'd like to do eventually
>>> 3. "0.22.0" (i.e. the actual next major release) for PRs that should go into a specific release
>>> 4. "0.21.1" (i.e. the actual next minor release) for PRs that should go into a specific release
>>> 
>>> The main change is the "Next Release", which I hope will clear up confusion about what file the release notes should go in.
> 
> Would the issues in "Next release" be those issues that, if somebody does a PR, can be included in eg 0.21.1, but are not important to tag them as such?
> 
Here?s the issue (pun intended). We cannot tag things for a specific milestone unless they are close to being merged; because things get stale, not worked on etc

it is a heavy burden to then move all things from a specific milestone to the next one (and it?s just plain confusing)

basically we have 3 pots

issues for a specific major milestone
issues for a specific minor milestone
everything else


>  
>>> I think we can remove "Interesting Issues",
>  
>> Jeff: though leave the interesting ones  - these are ones that i tagged - will move them after we realign things (then i will remove)
> 
> You can make a "Jeff's interesting issues" project, and add them to that. I think that is a better use (you still can look at the list of issues you want to specifically tag for attention) and it keeps the Milestones clean for actual milestones.
>  
>>> remove "High Level Issue Tracking",
> 
> + 1 since we a "master issue" label as well.
>  
>>> and move "won't fix" into "No action".  I'm not sure about "1.0" and "2.0" and "Someday", but maybe consolidate those.
> 
> I personally find the notion of "won't fix" informative, but we could also give it a label "won't fix" ? (and then use a single "No action" milestone)
> 
> I would keep the 1.0 milestone for now.
> 
> I think "Someday" was originally intended to give some difference in prioritization for the core-devs between "Next major release" and "Someday"? But I think we don't really do it like that, so for me ok to remove. I would suppose that 'no milestone' would then be that.
> 
> Joris
> 
> 2017-11-17 0:43 GMT+01:00 Jeff Reback <jeffreback at gmail.com>:
>> someday can g
>> 
>> ok with your tags/milestones otherwise 
>> 
>> though leave the interesting ones  - these are ones that i tagged - will move them after we realign things (then i will remove)
> 
> You can make a "Jeff's interesting issues" project, and add them to that. I think that is a better use (you still can look at the list of issues) and it keeps the Milestones clean for actual milestones.
>  
>> 
>> 
>>> On Nov 16, 2017, at 12:29 PM, G Young <gfyoung17 at gmail.com> wrote:
>>> 
>>> IMO "Won't fix" and "No action" were both good indicators of deliberate non-action on a PR / issue.  We can consolidate those, but I wouldn't remove both.
>>> 
>>> "Someday" doesn't look like it's used anymore nowadays, so I wouldn't mind removing that.
>>> 
>>> As for tagging in general, I think our classification are a little incomplete (e.g. there is no tag for general "DataFrame" issues or their methods).
>>> 
>>>> On Thu, Nov 16, 2017 at 6:22 AM, Tom Augspurger <tom.augspurger88 at gmail.com> wrote:
>>>> We ran out of time to discuss this in the dev meeting. I want to clean up our Github labels and milestones.
>>>> 
>>>> ## Milestones
>>>> 
>>>> I would like to have 4 real milestones:
>>>> 
>>>> 1. "Next Major Release" for API breaking changes that we'd like to do eventually
>>>> 2. "Next Release" For bugfix / non-API breaking changes that we'd like to do eventually
>>>> 3. "0.22.0" (i.e. the actual next major release) for PRs that should go into a specific release
>>>> 4. "0.21.1" (i.e. the actual next minor release) for PRs that should go into a specific release
>>>> 
>>>> The main change is the "Next Release", which I hope will clear up confusion about what file the release notes should go in. I think we can remove "Interesting Issues", remove "High Level Issue Tracking", and move "won't fix" into "No action".  I'm not sure about "1.0" and "2.0" and "Someday", but maybe consolidate those.
>>>> 
>>>> ## Labels
>>>> 
>>>> I would like to
>>>> 
>>>> - Start tagging "Easy" issues with "good first issue". Github gives some prominence to this tag in their UI.
>>>> - Remove some of the less frequently used issues like "Closed PR, Multi Dimensional, etc."
>>>> - Start using the "Needs Info" tag more often for incomplete bug reports, and regularly close issues that have been tagged as "Needs Info" and not updated in more that a couple weeks.
>>>> 
>>>> Thoughts? Objections?
>>>> 
>>>> 
>>>> Tom
>>>> 
>>>> _______________________________________________
>>>> Pandas-dev mailing list
>>>> Pandas-dev at python.org
>>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>> 
>>> 
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pandas-dev
>> 
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171117/90cd6290/attachment.html>

From aivar.annamaa at gmail.com  Sat Nov 18 03:21:26 2017
From: aivar.annamaa at gmail.com (Aivar Annamaa)
Date: Sat, 18 Nov 2017 10:21:26 +0200
Subject: [Pandas-dev] Pandas lite
Message-ID: <f88793b9-3fc3-77fc-94be-b66d0f7b7b90@gmail.com>

Hi!

I'm going to teach an introduction to pandas to Python newbies, and I'm 
looking for ways to simplify the the view to the API and/or avoid some 
of the pitfalls.

I'd like to identify a minimal set of methods/operations, which are 
enough for performing most common tasks with simply-indexed data 
(importing/exporting from csv/Excel, selecting rows and columns by 
index, boolean indexing of the rows, creating new columns, simple 
group-by and aggregations, simple plotting, maybe also simple joins) and 
which have minimal potential for surprises (unexpected copies, 
unexpected views, confusing warnings, differences with indexing with 
lists vs tuples etc). Maybe even allowing only "pure" transformations a 
la relational algebra? We could call it an opinionated and restricted 
usage-scheme of pandas.

The students would use this subset of the API until they gain enough 
experience to meet the hairier face of pandas.

Has anybody tried marking a subset of pandas API for some reasons?

I was also thinking about how to enforce the boundaries of this subset:

  * Just suggest students to stick with it.
  * Provide a static analysis which disallows (or warns against) the
    operations/tricks outside the boundaries.
  * a wrapper library (eg. import pandaslite as pd) which wraps required
    pandas classes into similar classes which publish only a subset of
    the pandas capabilities and perform some extra checks (eg. disallow
    duplicates in the index). When the students grow tough enough or
    need more power, they would simply replace "import pandaslite as pd"
    with "import pandas as pd" in their code.

At the moment I'm considering experimenting with the third approach.

I'd be glad to hear your comments!

best regards,
Aivar Annamaa

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171118/46d4e44a/attachment.html>

From tom.augspurger88 at gmail.com  Sun Nov 19 08:04:32 2017
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Sun, 19 Nov 2017 07:04:32 -0600
Subject: [Pandas-dev] Pandas lite
In-Reply-To: <f88793b9-3fc3-77fc-94be-b66d0f7b7b90@gmail.com>
References: <f88793b9-3fc3-77fc-94be-b66d0f7b7b90@gmail.com>
Message-ID: <CAE1aY-kjFhB0y1gtC5S6=BPeLnYwiWEYxwtVdPm4aSGiQ=mUhA@mail.gmail.com>

I'm not aware of any attempts to do that. Personally, I would recommend the
first option as it's the least amount of work,
and the least likely to force them to unlearn anything.

I suppose another option would be to have them import a module that just
monkey patches what shows up in dir(Series/DataFrame)
so that tab-completion is less overwhelming.

Good luck.

- Tom

On Sat, Nov 18, 2017 at 2:21 AM, Aivar Annamaa <aivar.annamaa at gmail.com>
wrote:

> Hi!
>
> I'm going to teach an introduction to pandas to Python newbies, and I'm
> looking for ways to simplify the the view to the API and/or avoid some of
> the pitfalls.
>
> I'd like to identify a minimal set of methods/operations, which are enough
> for performing most common tasks with simply-indexed data
> (importing/exporting from csv/Excel, selecting rows and columns by index,
> boolean indexing of the rows, creating new columns, simple group-by and
> aggregations, simple plotting, maybe also simple joins) and which have
> minimal potential for surprises (unexpected copies, unexpected views,
> confusing warnings, differences with indexing with lists vs tuples etc).
> Maybe even allowing only "pure" transformations a la relational algebra? We
> could call it an opinionated and restricted usage-scheme of pandas.
>
> The students would use this subset of the API until they gain enough
> experience to meet the hairier face of pandas.
>
> Has anybody tried marking a subset of pandas API for some reasons?
>
> I was also thinking about how to enforce the boundaries of this subset:
>
>    - Just suggest students to stick with it.
>    - Provide a static analysis which disallows (or warns against) the
>    operations/tricks outside the boundaries.
>    - a wrapper library (eg. import pandaslite as pd) which wraps required
>    pandas classes into similar classes which publish only a subset of the
>    pandas capabilities and perform some extra checks (eg. disallow duplicates
>    in the index). When the students grow tough enough or need more power, they
>    would simply replace "import pandaslite as pd" with "import pandas as pd"
>    in their code.
>
> At the moment I'm considering experimenting with the third approach.
>
> I'd be glad to hear your comments!
>
> best regards,
> Aivar Annamaa
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171119/015ec2ad/attachment.html>

From njs at pobox.com  Sun Nov 19 08:47:12 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 19 Nov 2017 05:47:12 -0800
Subject: [Pandas-dev] Pandas lite
In-Reply-To: <f88793b9-3fc3-77fc-94be-b66d0f7b7b90@gmail.com>
References: <f88793b9-3fc3-77fc-94be-b66d0f7b7b90@gmail.com>
Message-ID: <CAPJVwBnVmjtou72=B-QXtbBvCDaN4enJpPq-G_BGj43C4qERjA@mail.gmail.com>

The 'datascience' package is an attempt to solve this problem -- it's
a simplified wrapper around pandas written for use in the intro to
data science sequence at UC Berkeley: http://data8.org/datascience/

I'm not involved in the course or that package myself, so I can't say
how well it's worked. It's definitely worth checking out as prior art,
though, and you might find it useful to contact the authors to compare
notes.

-n

On Sat, Nov 18, 2017 at 12:21 AM, Aivar Annamaa <aivar.annamaa at gmail.com> wrote:
> Hi!
>
> I'm going to teach an introduction to pandas to Python newbies, and I'm
> looking for ways to simplify the the view to the API and/or avoid some of
> the pitfalls.
>
> I'd like to identify a minimal set of methods/operations, which are enough
> for performing most common tasks with simply-indexed data
> (importing/exporting from csv/Excel, selecting rows and columns by index,
> boolean indexing of the rows, creating new columns, simple group-by and
> aggregations, simple plotting, maybe also simple joins) and which have
> minimal potential for surprises (unexpected copies, unexpected views,
> confusing warnings, differences with indexing with lists vs tuples etc).
> Maybe even allowing only "pure" transformations a la relational algebra? We
> could call it an opinionated and restricted usage-scheme of pandas.
>
> The students would use this subset of the API until they gain enough
> experience to meet the hairier face of pandas.
>
> Has anybody tried marking a subset of pandas API for some reasons?
>
> I was also thinking about how to enforce the boundaries of this subset:
>
> Just suggest students to stick with it.
> Provide a static analysis which disallows (or warns against) the
> operations/tricks outside the boundaries.
> a wrapper library (eg. import pandaslite as pd) which wraps required pandas
> classes into similar classes which publish only a subset of the pandas
> capabilities and perform some extra checks (eg. disallow duplicates in the
> index). When the students grow tough enough or need more power, they would
> simply replace "import pandaslite as pd" with "import pandas as pd" in their
> code.
>
> At the moment I'm considering experimenting with the third approach.
>
> I'd be glad to hear your comments!
>
> best regards,
> Aivar Annamaa
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>


-- 
Nathaniel J. Smith -- https://vorpus.org

From pmhobson at gmail.com  Mon Nov 27 20:21:39 2017
From: pmhobson at gmail.com (Paul Hobson)
Date: Mon, 27 Nov 2017 17:21:39 -0800
Subject: [Pandas-dev] Help replacing workflows that used DataFrame.select
Message-ID: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>

Hey folks,

I noticed that DataFrame.select is now deprecated in favor of
DataFrame.loc[index.map(selector_fxn)]

PR: https://github.com/pandas-dev/pandas/pull/17633
Issue: https://github.com/pandas-dev/pandas/issues/12401

I have a lot of work flows that look something like this:

    res = (
        data.resample(freq)
            .agg(agg_dict)
            .pipe(fxn_that_adds_many_cols)
            .select(complex_fxn_that_selects_a_few_cols, axis='columns')
    )

It's not immediately clear to me how to access all of the e.g., columns in
the middle or at the end of a chain of dataframe operations.

Any tips?

-Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171127/c25a3fc1/attachment.html>

From jorisvandenbossche at gmail.com  Tue Nov 28 05:31:21 2017
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Tue, 28 Nov 2017 11:31:21 +0100
Subject: [Pandas-dev] Help replacing workflows that used DataFrame.select
In-Reply-To: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>
References: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>
Message-ID: <CALQtMBY0wiETCSg_XzjjrWsMi823QuXJEkHepcDceSMwNNBjHw@mail.gmail.com>

Hi Paul,

That's a good question. I think you can do it with a lambda function, like
this:

(data.
     ... (full pipeline)
     .loc[:, lambda df: complex_fxn_that_selects_a_few_cols(df.columns)]
)

Does that work?

But personally I am not sure if I find this really an usability improvement
compared to the select method.

Best,
Joris


2017-11-28 2:21 GMT+01:00 Paul Hobson <pmhobson at gmail.com>:

> Hey folks,
>
> I noticed that DataFrame.select is now deprecated in favor of
> DataFrame.loc[index.map(selector_fxn)]
>
> PR: https://github.com/pandas-dev/pandas/pull/17633
> Issue: https://github.com/pandas-dev/pandas/issues/12401
>
> I have a lot of work flows that look something like this:
>
>     res = (
>         data.resample(freq)
>             .agg(agg_dict)
>             .pipe(fxn_that_adds_many_cols)
>             .select(complex_fxn_that_selects_a_few_cols, axis='columns')
>     )
>
> It's not immediately clear to me how to access all of the e.g., columns in
> the middle or at the end of a chain of dataframe operations.
>
> Any tips?
>
> -Paul
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171128/b40af475/attachment.html>

From pmhobson at gmail.com  Tue Nov 28 11:52:17 2017
From: pmhobson at gmail.com (Paul Hobson)
Date: Tue, 28 Nov 2017 08:52:17 -0800
Subject: [Pandas-dev] Help replacing workflows that used DataFrame.select
In-Reply-To: <CALQtMBY0wiETCSg_XzjjrWsMi823QuXJEkHepcDceSMwNNBjHw@mail.gmail.com>
References: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>
 <CALQtMBY0wiETCSg_XzjjrWsMi823QuXJEkHepcDceSMwNNBjHw@mail.gmail.com>
Message-ID: <CADT3MEDMw1C6rms5mriW_jQ3YfygxubmZob46WpY63DLdjQbGg@mail.gmail.com>

Joris,

Thanks for the nudge. I didn't understand that the callable could be passed
the entire dataframe. That's what I needed.

I'll miss the .select() method when it's gone, but it appears my use cases
are covered.

Cheers,

-Paul

On Tue, Nov 28, 2017 at 2:31 AM, Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

> Hi Paul,
>
> That's a good question. I think you can do it with a lambda function, like
> this:
>
> (data.
>      ... (full pipeline)
>      .loc[:, lambda df: complex_fxn_that_selects_a_few_cols(df.columns)]
> )
>
> Does that work?
>
> But personally I am not sure if I find this really an usability
> improvement compared to the select method.
>
> Best,
> Joris
>
>
>
> 2017-11-28 2:21 GMT+01:00 Paul Hobson <pmhobson at gmail.com>:
>
>> Hey folks,
>>
>> I noticed that DataFrame.select is now deprecated in favor of
>> DataFrame.loc[index.map(selector_fxn)]
>>
>> PR: https://github.com/pandas-dev/pandas/pull/17633
>> Issue: https://github.com/pandas-dev/pandas/issues/12401
>>
>> I have a lot of work flows that look something like this:
>>
>>     res = (
>>         data.resample(freq)
>>             .agg(agg_dict)
>>             .pipe(fxn_that_adds_many_cols)
>>             .select(complex_fxn_that_selects_a_few_cols, axis='columns')
>>     )
>>
>> It's not immediately clear to me how to access all of the e.g., columns
>> in the middle or at the end of a chain of dataframe operations.
>>
>> Any tips?
>>
>> -Paul
>>
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171128/e22bf75c/attachment.html>

From pmhobson at gmail.com  Tue Nov 28 11:56:35 2017
From: pmhobson at gmail.com (Paul Hobson)
Date: Tue, 28 Nov 2017 08:56:35 -0800
Subject: [Pandas-dev] Help replacing workflows that used DataFrame.select
In-Reply-To: <CADT3MEDMw1C6rms5mriW_jQ3YfygxubmZob46WpY63DLdjQbGg@mail.gmail.com>
References: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>
 <CALQtMBY0wiETCSg_XzjjrWsMi823QuXJEkHepcDceSMwNNBjHw@mail.gmail.com>
 <CADT3MEDMw1C6rms5mriW_jQ3YfygxubmZob46WpY63DLdjQbGg@mail.gmail.com>
Message-ID: <CADT3MEDGDQg7yAFkyy3YYmNxTFBVUR6VmF+ivbj826FPNtNXAw@mail.gmail.com>

Follow-up question for the whole group:

My recollection is that .loc returns a slice, but .select returns a copy.
Is this correct? Are there any implications of that distinction with long,
chained workflows switching away from .select?

-Paul

On Tue, Nov 28, 2017 at 8:52 AM, Paul Hobson <pmhobson at gmail.com> wrote:

> Joris,
>
> Thanks for the nudge. I didn't understand that the callable could be
> passed the entire dataframe. That's what I needed.
>
> I'll miss the .select() method when it's gone, but it appears my use cases
> are covered.
>
> Cheers,
>
> -Paul
>
> On Tue, Nov 28, 2017 at 2:31 AM, Joris Van den Bossche <
> jorisvandenbossche at gmail.com> wrote:
>
>> Hi Paul,
>>
>> That's a good question. I think you can do it with a lambda function,
>> like this:
>>
>> (data.
>>      ... (full pipeline)
>>      .loc[:, lambda df: complex_fxn_that_selects_a_few_cols(df.columns)]
>> )
>>
>> Does that work?
>>
>> But personally I am not sure if I find this really an usability
>> improvement compared to the select method.
>>
>> Best,
>> Joris
>>
>>
>>
>> 2017-11-28 2:21 GMT+01:00 Paul Hobson <pmhobson at gmail.com>:
>>
>>> Hey folks,
>>>
>>> I noticed that DataFrame.select is now deprecated in favor of
>>> DataFrame.loc[index.map(selector_fxn)]
>>>
>>> PR: https://github.com/pandas-dev/pandas/pull/17633
>>> Issue: https://github.com/pandas-dev/pandas/issues/12401
>>>
>>> I have a lot of work flows that look something like this:
>>>
>>>     res = (
>>>         data.resample(freq)
>>>             .agg(agg_dict)
>>>             .pipe(fxn_that_adds_many_cols)
>>>             .select(complex_fxn_that_selects_a_few_cols, axis='columns')
>>>     )
>>>
>>> It's not immediately clear to me how to access all of the e.g., columns
>>> in the middle or at the end of a chain of dataframe operations.
>>>
>>> Any tips?
>>>
>>> -Paul
>>>
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171128/32f163fb/attachment-0001.html>

From shoyer at gmail.com  Tue Nov 28 13:07:31 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 28 Nov 2017 18:07:31 +0000
Subject: [Pandas-dev] Help replacing workflows that used DataFrame.select
In-Reply-To: <CADT3MEDMw1C6rms5mriW_jQ3YfygxubmZob46WpY63DLdjQbGg@mail.gmail.com>
References: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>
 <CALQtMBY0wiETCSg_XzjjrWsMi823QuXJEkHepcDceSMwNNBjHw@mail.gmail.com>
 <CADT3MEDMw1C6rms5mriW_jQ3YfygxubmZob46WpY63DLdjQbGg@mail.gmail.com>
Message-ID: <CAEQ_Tvdazun4_v86qiOC1ckn0vk_yPFKH=mnTeXnA8403NHNRw@mail.gmail.com>

The biggest reason for deprecating DataFrame.select() was that it was
confusingly named. On GroupBy objects, it's equivalent to .filter(). Also,
SELECT in SQL does something very different, more like DataFrame.filter().
If only we could simply switch the names without causing more confusion!

So I think we would potentially be welcome to resurfacing the functionality
if necessary, though probably under a different name. For discussion see
https://github.com/pandas-dev/pandas/issues/12401

On Tue, Nov 28, 2017 at 4:52 PM Paul Hobson <pmhobson at gmail.com> wrote:

> Joris,
>
> Thanks for the nudge. I didn't understand that the callable could be
> passed the entire dataframe. That's what I needed.
>
> I'll miss the .select() method when it's gone, but it appears my use cases
> are covered.
>
> Cheers,
>
> -Paul
>
> On Tue, Nov 28, 2017 at 2:31 AM, Joris Van den Bossche <
> jorisvandenbossche at gmail.com> wrote:
>
>> Hi Paul,
>>
>> That's a good question. I think you can do it with a lambda function,
>> like this:
>>
>> (data.
>>      ... (full pipeline)
>>      .loc[:, lambda df: complex_fxn_that_selects_a_few_cols(df.columns)]
>> )
>>
>> Does that work?
>>
>> But personally I am not sure if I find this really an usability
>> improvement compared to the select method.
>>
>> Best,
>> Joris
>>
>>
>>
>> 2017-11-28 2:21 GMT+01:00 Paul Hobson <pmhobson at gmail.com>:
>>
>>> Hey folks,
>>>
>>> I noticed that DataFrame.select is now deprecated in favor of
>>> DataFrame.loc[index.map(selector_fxn)]
>>>
>>> PR: https://github.com/pandas-dev/pandas/pull/17633
>>> Issue: https://github.com/pandas-dev/pandas/issues/12401
>>>
>>> I have a lot of work flows that look something like this:
>>>
>>>     res = (
>>>         data.resample(freq)
>>>             .agg(agg_dict)
>>>             .pipe(fxn_that_adds_many_cols)
>>>             .select(complex_fxn_that_selects_a_few_cols, axis='columns')
>>>     )
>>>
>>> It's not immediately clear to me how to access all of the e.g., columns
>>> in the middle or at the end of a chain of dataframe operations.
>>>
>>> Any tips?
>>>
>>> -Paul
>>>
>>> _______________________________________________
>>> Pandas-dev mailing list
>>> Pandas-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>
>>>
>>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171128/d3fc3a22/attachment.html>

From pmhobson at gmail.com  Tue Nov 28 13:34:14 2017
From: pmhobson at gmail.com (Paul Hobson)
Date: Tue, 28 Nov 2017 10:34:14 -0800
Subject: [Pandas-dev] Help replacing workflows that used DataFrame.select
In-Reply-To: <CAEQ_Tvdazun4_v86qiOC1ckn0vk_yPFKH=mnTeXnA8403NHNRw@mail.gmail.com>
References: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>
 <CALQtMBY0wiETCSg_XzjjrWsMi823QuXJEkHepcDceSMwNNBjHw@mail.gmail.com>
 <CADT3MEDMw1C6rms5mriW_jQ3YfygxubmZob46WpY63DLdjQbGg@mail.gmail.com>
 <CAEQ_Tvdazun4_v86qiOC1ckn0vk_yPFKH=mnTeXnA8403NHNRw@mail.gmail.com>
Message-ID: <CADT3MEBy5TSNQkX9wCMYbZ6P=L6WK=2kcpgVOZZ+CNLHQARHFQ@mail.gmail.com>

Hey Stephen,

Thanks for the info. While .select on the default axis (index) is indeed
very different than SQL, operating on the columns is very similar (jn my
twisted brain at least).

I totally understand the deprecation, and remember rumblings about poor
performance back in the early days. So I can't even say I'm surprised.

If the devs find time to bring similar functionality back, might I suggest
a name+sig as simple as DataFrame.keep_only(index=None, columns=None).

Just to reiterate: I'm not complaining. I'm just trying to keep up :)

-paul


On Tue, Nov 28, 2017 at 10:07 AM, Stephan Hoyer <shoyer at gmail.com> wrote:

> The biggest reason for deprecating DataFrame.select() was that it was
> confusingly named. On GroupBy objects, it's equivalent to .filter(). Also,
> SELECT in SQL does something very different, more like DataFrame.filter().
> If only we could simply switch the names without causing more confusion!
>
> So I think we would potentially be welcome to resurfacing the
> functionality if necessary, though probably under a different name. For
> discussion see https://github.com/pandas-dev/pandas/issues/12401
>
> On Tue, Nov 28, 2017 at 4:52 PM Paul Hobson <pmhobson at gmail.com> wrote:
>
>> Joris,
>>
>> Thanks for the nudge. I didn't understand that the callable could be
>> passed the entire dataframe. That's what I needed.
>>
>> I'll miss the .select() method when it's gone, but it appears my use
>> cases are covered.
>>
>> Cheers,
>>
>> -Paul
>>
>> On Tue, Nov 28, 2017 at 2:31 AM, Joris Van den Bossche <
>> jorisvandenbossche at gmail.com> wrote:
>>
>>> Hi Paul,
>>>
>>> That's a good question. I think you can do it with a lambda function,
>>> like this:
>>>
>>> (data.
>>>      ... (full pipeline)
>>>      .loc[:, lambda df: complex_fxn_that_selects_a_few_cols(df.columns)]
>>> )
>>>
>>> Does that work?
>>>
>>> But personally I am not sure if I find this really an usability
>>> improvement compared to the select method.
>>>
>>> Best,
>>> Joris
>>>
>>>
>>>
>>> 2017-11-28 2:21 GMT+01:00 Paul Hobson <pmhobson at gmail.com>:
>>>
>>>> Hey folks,
>>>>
>>>> I noticed that DataFrame.select is now deprecated in favor of
>>>> DataFrame.loc[index.map(selector_fxn)]
>>>>
>>>> PR: https://github.com/pandas-dev/pandas/pull/17633
>>>> Issue: https://github.com/pandas-dev/pandas/issues/12401
>>>>
>>>> I have a lot of work flows that look something like this:
>>>>
>>>>     res = (
>>>>         data.resample(freq)
>>>>             .agg(agg_dict)
>>>>             .pipe(fxn_that_adds_many_cols)
>>>>             .select(complex_fxn_that_selects_a_few_cols,
>>>> axis='columns')
>>>>     )
>>>>
>>>> It's not immediately clear to me how to access all of the e.g., columns
>>>> in the middle or at the end of a chain of dataframe operations.
>>>>
>>>> Any tips?
>>>>
>>>> -Paul
>>>>
>>>> _______________________________________________
>>>> Pandas-dev mailing list
>>>> Pandas-dev at python.org
>>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>>
>>>>
>>>
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171128/9e982790/attachment-0001.html>

From shoyer at gmail.com  Tue Nov 28 13:58:05 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 28 Nov 2017 18:58:05 +0000
Subject: [Pandas-dev] Help replacing workflows that used DataFrame.select
In-Reply-To: <CADT3MEBy5TSNQkX9wCMYbZ6P=L6WK=2kcpgVOZZ+CNLHQARHFQ@mail.gmail.com>
References: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>
 <CALQtMBY0wiETCSg_XzjjrWsMi823QuXJEkHepcDceSMwNNBjHw@mail.gmail.com>
 <CADT3MEDMw1C6rms5mriW_jQ3YfygxubmZob46WpY63DLdjQbGg@mail.gmail.com>
 <CAEQ_Tvdazun4_v86qiOC1ckn0vk_yPFKH=mnTeXnA8403NHNRw@mail.gmail.com>
 <CADT3MEBy5TSNQkX9wCMYbZ6P=L6WK=2kcpgVOZZ+CNLHQARHFQ@mail.gmail.com>
Message-ID: <CAEQ_TvfVhrdtehEJCPStUSRS-t0i3KdUDpww4FTqGY_R7VrxCQ@mail.gmail.com>

On Tue, Nov 28, 2017 at 6:34 PM Paul Hobson <pmhobson at gmail.com> wrote:

> Thanks for the info. While .select on the default axis (index) is indeed
> very different than SQL, operating on the columns is very similar (jn my
> twisted brain at least).
>

Agreed, but sadly .select() didn't default to axis=1.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171128/c9054d67/attachment.html>

From jorisvandenbossche at gmail.com  Tue Nov 28 18:28:00 2017
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 29 Nov 2017 00:28:00 +0100
Subject: [Pandas-dev] Help replacing workflows that used DataFrame.select
In-Reply-To: <CAEQ_TvfVhrdtehEJCPStUSRS-t0i3KdUDpww4FTqGY_R7VrxCQ@mail.gmail.com>
References: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>
 <CALQtMBY0wiETCSg_XzjjrWsMi823QuXJEkHepcDceSMwNNBjHw@mail.gmail.com>
 <CADT3MEDMw1C6rms5mriW_jQ3YfygxubmZob46WpY63DLdjQbGg@mail.gmail.com>
 <CAEQ_Tvdazun4_v86qiOC1ckn0vk_yPFKH=mnTeXnA8403NHNRw@mail.gmail.com>
 <CADT3MEBy5TSNQkX9wCMYbZ6P=L6WK=2kcpgVOZZ+CNLHQARHFQ@mail.gmail.com>
 <CAEQ_TvfVhrdtehEJCPStUSRS-t0i3KdUDpww4FTqGY_R7VrxCQ@mail.gmail.com>
Message-ID: <CALQtMBZmC7K-OABPs6UffOnfjFf3TCUeLE4Z9whQbycdWPciXw@mail.gmail.com>

Would there be a way in keeping .select() but only deprecating the
(default) `axis=0` ? Or would that only be more confusing?

Because if we would find a name for such a method that defaults to the
columns, we would come up with 'select' ...

2017-11-28 19:58 GMT+01:00 Stephan Hoyer <shoyer at gmail.com>:

> On Tue, Nov 28, 2017 at 6:34 PM Paul Hobson <pmhobson at gmail.com> wrote:
>
>> Thanks for the info. While .select on the default axis (index) is indeed
>> very different than SQL, operating on the columns is very similar (jn my
>> twisted brain at least).
>>
>
> Agreed, but sadly .select() didn't default to axis=1.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171129/3a4a8732/attachment.html>

From jorisvandenbossche at gmail.com  Thu Nov 30 20:09:10 2017
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Fri, 1 Dec 2017 02:09:10 +0100
Subject: [Pandas-dev] Feedback request for return value of empty or all-NA
 sum (0 or NA?)
Message-ID: <CALQtMBZhRkTHrxnmuptCyL_97QvWfJadT3vCt4uEjpAqWKyD8g@mail.gmail.com>

*[Note for those reading it on the pydata mailing list, please answer to
pandas-dev at python.org <pandas-dev at python.org> to keep discussion
centralised there]*


Hi list,

In pandas 0.21.0 we changed the behaviour of the sum method for empty or
all-NaN Series (to consistently return NaN), see the what's note
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#sum-prod-of-all-nan-or-empty-series-dataframes-is-now-consistently-nan>.
This change lead to some discussion on github whether this was the right
choice we made.

But the reach of github is of course limited, and therefore we wanted to
solicit some more feedback on the mailing list. Below is given an overview
of the background of the issue and the different options.

Please keep in mind that we are not really interested in theoretical
reasons why one of the other option is better or more correct. Each of the
options has it advantages / disadvantages in practice. But it would be very
interesting to hear the consequences in actual example analysis pipelines.

Best,
Joris
Background

Before pandas 0.21.0, the behaviour of the sum of an all-NA Series depended
on whether the optional bottleneck dependency was installed. This
inconsistency was in place since the bottleneck 1.0.0 release (February
2015), and you can read more background on it in the github issue #9422
<https://github.com/pandas-dev/pandas/issues/9422>. With bottleneck, the
sum of all-NA was zero; without bottleneck, the sum was NaN.

In [2]: pd.__version__

Out[2]: '0.20.3'

In [3]: pd.options.compute.use_bottleneck = True

In [4]: Series([np.nan]).sum()

Out[4]: 0.0

In [5]: pd.options.compute.use_bottleneck = False

In [6]: Series([np.nan]).sum()

Out[6]: nan

The sum of an empty series was always 0, with or without bottleneck.

In [7]: Series([]).sum()

Out[7]: 0

For pandas 0.21, we wanted to fix this inconsistency. The return value
should not depend on whether an optional dependency is installed. After a
lengthy discussion, we opted for the original pandas behaviour to return
NaN. As a result, also the sum of an empty Series was changed to return NaN
(see the what?s new notice here
<http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#sum-prod-of-all-nan-or-empty-series-dataframes-is-now-consistently-nan>
):

In [2]: pd.__version__

Out[2]: '0.21.0'

In [3]: pd.Series([np.nan]).sum()

Out[3]: nan

In [4]: pd.Series([]).sum()

Out[4]: nan

However, after the 0.21.0 release more feedback was received about cases
where this choice is not desirable, and due to this feedback, we are
reconsidering the decision.
Options

We see three different options for the default behaviour of sum for those
two cases of empty and all-NA series:


   1.

   Empty / all-NA sum is always zero: SUM([]) = 0 and SUM([NA]) = 0


   -

   Behaviour of pandas < 0.21 + bottleneck installed
   -

   Consistent with NumPy, R, MATLAB, etc. (given you use the variant that
   is NA aware: nansum for numpy, na.rm=TRUE for R, ...)


   1.

   Empty / all-NA sum is always NA: SUM([]) = NA and SUM([NA]) = NA


   -

   The behaviour that is introduced in 0.21.0
   -

   Consistent with SQL (although often (rightly or not) complained about)


   1.

   Mixed behaviour: SUM([]) = 0 and SUM([NA]) = NA


   -

   Behaviour of pandas < 0.21 (without bottleneck installed)
   -

   A practicable compromise (having SUM([NA]) keep the information of NA,
   while SUM([]) = 0 does not introduce NAs when there were no in the data)
   -

   But somewhat inconsistent and unique to pandas ?


We have to stress that each of those choices can be preferable depending on
the use case and has its advantages and disadvantages. Some might be more
mathematical sound, others might preserve more information about having
missing data, each can be be more consistent with a certain ecosystem, ? It
is clear that there is no ?best? option for all case.

While we can only choose one of those options as the default behaviour,
each choice can be accompanied by new features that can make it easier for
the user to opt for a different behaviour:


   -

   When choosing option 1 or 2, we can introduce a new method (eg .total())
   or a keyword to .sum() (eg min_count) to obtain the other behaviour.
   -

   When choosing for option 2, we could provide a pd.zeroifna(..) to be
   able to convert NaN values from aggregation results into zero?s if desired
   (similar to COALESCE(expr, 0) in SQL)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171201/93116ea9/attachment-0001.html>

From clemens.brunner at gmail.com  Tue Nov 28 05:57:39 2017
From: clemens.brunner at gmail.com (Clemens Brunner)
Date: Tue, 28 Nov 2017 11:57:39 +0100
Subject: [Pandas-dev] Changing the default max_columns and max_rows
Message-ID: <034945C6-2D66-4F51-ACA5-50DC01DDDA71@gmail.com>

Hello!

We're currently discussing a change in how data frames are displayed by default in https://github.com/pandas-dev/pandas/pull/17023. There are two proposed changes:

(1) Set pd.options.display.max_columns=0 (previously this was set to 20).
(2) Set pd.options.display.max_rows=20 (previously this was set to 60).

Change (1) means that the number of printed columns is adapted to fit within the width of the terminal. If there are too many columns, ellipsis will be shown to indicate collapsed columns in the middle of the data frame. This doesn't work if Python is run as a Jupyter kernel (e.g. in a Jupyter notebook or in IPython QtConsole), in which case the maximum columns remain 20.

Example:
========
import pandas as pd
import numpy as np
pd.DataFrame(np.random.rand(5, 10))

Output before (in a terminal with 100 chars width):
---------------------------------------------------
          0         1         2         3         4         5         6  \
0  0.643979  0.690414  0.018603  0.991478  0.707534  0.376765  0.670848   
1  0.547836  0.810972  0.054448  0.415112  0.268120  0.904528  0.839258   
2  0.582256  0.732149  0.284208  0.405197  0.213591  0.715367  0.150106   
3  0.197348  0.317159  0.051669  0.738405  0.821046  0.179270  0.245793   
4  0.483466  0.583330  0.999213  0.882883  0.315169  0.045712  0.897048   

          7         8         9  
0  0.891467  0.494220  0.713369  
1  0.601304  0.449880  0.266205  
2  0.113262  0.360580  0.238833  
3  0.798063  0.077769  0.471169  
4  0.262779  0.530565  0.992084 

Output after:
-------------
          0         1         2         3    ...            6         7         8         9
0  0.673621  0.211505  0.943201  0.946548    ...     0.900453  0.612182  0.861933  0.710967
1  0.670855  0.834449  0.796273  0.785976    ...     0.609954  0.686663  0.684582  0.837505
2  0.544736  0.814827  0.352893  0.459556    ...     0.650993  0.735943  0.279110  0.840203
3  0.440125  0.554323  0.745462  0.940896    ...     0.544576  0.224175  0.852603  0.509837
4  0.225551  0.791834  0.476059  0.321857    ...     0.391165  0.423213  0.290683  0.954423

[5 rows x 10 columns]


Change (2) implies fewer rows are displayed before auto-hiding takes place. I find that 60 rows almost always causes the terminal to scroll (most terminals have between 25-40 rows), so reducing the value to 20 increases the chance that a data frame can be observed on one terminal page. I'm not including a before/after output since it should be easy to imagine how this change affects the output.

Both changes would make Pandas behave similar to R's Tidyverse (which I really like), but this should not be the main reason why these changes are a good idea. I mainly like them because these settings make (large) data frames much nicer to look at.

Note that these changes affect the default values. Of course, users are free to change them back in their active Python session.

Comments to both proposed changes are highly welcome (either here on the mailing list or at https://github.com/pandas-dev/pandas/pull/17023.

Clemens

From jon.mease at gmail.com  Tue Nov 28 19:29:44 2017
From: jon.mease at gmail.com (Jon Mease)
Date: Tue, 28 Nov 2017 19:29:44 -0500
Subject: [Pandas-dev] Help replacing workflows that used DataFrame.select
In-Reply-To: <CALQtMBZmC7K-OABPs6UffOnfjFf3TCUeLE4Z9whQbycdWPciXw@mail.gmail.com>
References: <CADT3MEDrQS4pCQNq-khyftH0e6Diqgv_o8XBjnSsaFYs9uGQ+A@mail.gmail.com>
 <CALQtMBY0wiETCSg_XzjjrWsMi823QuXJEkHepcDceSMwNNBjHw@mail.gmail.com>
 <CADT3MEDMw1C6rms5mriW_jQ3YfygxubmZob46WpY63DLdjQbGg@mail.gmail.com>
 <CAEQ_Tvdazun4_v86qiOC1ckn0vk_yPFKH=mnTeXnA8403NHNRw@mail.gmail.com>
 <CADT3MEBy5TSNQkX9wCMYbZ6P=L6WK=2kcpgVOZZ+CNLHQARHFQ@mail.gmail.com>
 <CAEQ_TvfVhrdtehEJCPStUSRS-t0i3KdUDpww4FTqGY_R7VrxCQ@mail.gmail.com>
 <CALQtMBZmC7K-OABPs6UffOnfjFf3TCUeLE4Z9whQbycdWPciXw@mail.gmail.com>
Message-ID: <CAOy8Ok4m0Bt4w1dGGpryL+8HTB+EBwPQsGz5pvZnPBJ4Lakbgg@mail.gmail.com>

Perhaps for versions 0.21.1 and 0.22 a warning could be issued when
.select() is used without an explicit `axis` parameter.

The warning would state that the current default is `axis=0` but that this
will change to `axis=1` in the next major release. If the user wants the
current default behavior then they could suppress the warning and
future-proof their code by passing `axis=0` explicitly.

-Jon

On Tue, Nov 28, 2017 at 6:28 PM, Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

> Would there be a way in keeping .select() but only deprecating the
> (default) `axis=0` ? Or would that only be more confusing?
>
> Because if we would find a name for such a method that defaults to the
> columns, we would come up with 'select' ...
>
> 2017-11-28 19:58 GMT+01:00 Stephan Hoyer <shoyer at gmail.com>:
>
>> On Tue, Nov 28, 2017 at 6:34 PM Paul Hobson <pmhobson at gmail.com> wrote:
>>
>>> Thanks for the info. While .select on the default axis (index) is indeed
>>> very different than SQL, operating on the columns is very similar (jn my
>>> twisted brain at least).
>>>
>>
>> Agreed, but sadly .select() didn't default to axis=1.
>>
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pandas-dev/attachments/20171128/78df604c/attachment.html>