[Pandas-dev] PDEPs: pandas enhancement proposals

Simon Hawkins simonjayhawkins at gmail.com
Fri Aug 5 11:41:45 EDT 2022


On Fri, 5 Aug 2022 at 16:29, Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

>
> On Wed, 20 Jul 2022 at 16:26, Marc Garcia <garcia.marc at gmail.com> wrote:
>
>> Ok, correct me if I'm wrong, but from what you say the options to consider
>> are:
>>
>> 1) Keep everything as is (in the main pandas repo), and maybe improve
>> notifications (send emails to a list where people can subscribe, rss feed,
>> telegram messages...)
>>
>> 2) Use a new repo for PRs, and use the main pandas website to display them
>>   2.a) On the website build, fetch the PDEP docs from the other repo
>>   2.b) On the PDEP repo CI, push the PDEP docs to the main pandas repo
>>
>> 3) Use a new repo for the PDEP PRs, and have a separate website for PDEPs
>>
>> Do these options sound reasonable as the ones to discuss? Or am I
>> missing something?
>>
>> My preference is 1, as I think it's the simplest, and adding
>> notifications that allow following PDEPs separate from all pandas activity
>> doesn't seem complex.
>>
>> I'm also fine with 2.a if more people have a strong opinion about keeping
>> PDEP discussions/PRs in a separate repo. I personally don't see advantages
>> in 2.b and 3.
>>
>
> Thanks, that's indeed a good summary of the options. I also think 2.a is
> the easiest of the alternatives, so I would indeed only consider 1 and 2.a.
>
> My preference is to go with 2 (separate repo), for the reasons mentioned
> before. I think Tom also mentioned this as his preference, and Jeff being
> OK with it, while Marc/Matthew prefer the main repo. But it would be good
> to hear from others as well whether they have a (strong) preference.
>

Maybe another quick poll with just those 2 options? That seemed to get a
swift resolution on the PDEP name.

If there is no clear majority, then we would need further discussion.

If there is a majority, then there may be only a few pain points to resolve.


>
> If we decide to do that, I am happy to do a PR to update the publishing
> workflow to handle a separate repo.
>
> Joris
>
>
>>
>> On Mon, Jul 18, 2022, 18:34 Joris Van den Bossche <
>> jorisvandenbossche at gmail.com> wrote:
>>
>>> Thanks Marc for the detailed answer.
>>> In general, I personally think that the added complexity is not that
>>> big, and we can still have a nice publishing workflow to the website with a
>>> separate repo (some more detailed responses inline below).
>>>
>>> For someone who wants to follow the PDEPs (and I hope with this new PDEP
>>> process we can engage more people in the pandas community), but doesn't
>>> have the time to follow all of pandas (eg a maintainer of a dependent package,
>>> ..), my hunch is that a separate repo is a more accessible way to do this.
>>> You can indeed list all related PRs based on a label filter, but you
>>> still need to know this (we can of course document that on the roadmap
>>> page) and it's not an automatic notification. And for email notifications
>>> you can indeed set up an email filter (although I don't think you have a
>>> good option if using github notifications?).
>>>
>>> For someone like myself, if we end up using the main repo, I can for sure
>>> set up those filters, that is not a problem. But in general I think that is
>>> not a very accessible way to have people follow those discussions. Having
>>> it as a separate repo provides a clear home and gives you all the tools
>>> that github has to manage and customize the notifications however you want
>>> (eg watch one repo and not the other).
>>>
>>> Sidenote: I do (or did) this for other projects, such as numpy or
>>> python. I don't follow either of their issue trackers, but I do (somewhat)
>>> follow NEP or PEP discussions, and both give me a way to do that without
>>> having to follow their main issue trackers.
>>>
>>> The last point that you raise about "forgetting about a separate repo"
>>> is certainly a valid concern. It's true that the other separate repos that
>>> we have (had) were no success, so we don't have a good track record on this
>>> front. But I do think it is a matter of habit (and
>>> documentation/communication! we never really publicized any of the other
>>> repos, nor actively used them at any point), and if we ensure we have a
>>> steady activity in such a separate repo for a while, I think that will grow
>>> naturally.
>>>
>>> On Sat, 25 Jun 2022 at 21:19, Matthew Roeschke <emailformattr at gmail.com>
>>> wrote:
>>>
>>>> I find Marc's arguments regarding general simplicity of PDEP flow
>>>> (publishing to website & integration to the main repo) a strong argument to
>>>> keep these in the main repo.
>>>>
>>>> Since there is a dependency between PDEP development and the pandas-dev
>>>> repo development, having them separated may lead to similar workflow
>>>> challenges with the MacPython/pandas-wheels repo for example (where cibuildwheel
>>>> being integrated into the main repo
>>>> <https://github.com/pandas-dev/pandas/issues/44027> is considered a
>>>> benefit due to tighter integration).
>>>>
>>>
>>> I think an important difference here is that building wheels is defined
>>> in the pandas repo (packaging setup) and often needs fixes in pandas, and
>>> so here it indeed makes that workflow much easier to have that in the same
>>> repo. For PDEPs that is much less of an issue.
>>>
>>>>
>>>> I agree PDEP visibility from notifications is important, but
>>>> notification priority and channels can differ person-to-person. For
>>>> example, I just manage my GitHub notifications in GitHub, not email.
>>>>
>>> I don't think there is fundamentally a difference between both. Also if
>>> I was using github notifications, seeing a specific subset of issues in
>>> those is challenging (while when using email I could at least set up some
>>> automatic filters).
>>> (but I don't know github notifications well, so I might be wrong)
>>>
>>>
>>>> On Sat, Jun 25, 2022 at 10:50 AM Tom Augspurger <
>>>> tom.w.augspurger at gmail.com> wrote:
>>>>
>>>>> For me, notifications are the big thing. Having the emails come from a
>>>>> separate repo would make following things much easier for those who can’t
>>>>> keep up with the main repo.
>>>>>
>>>>> Tom
>>>>>
>>>>> On Jun 25, 2022, at 12:04 PM, Marc Garcia <garcia.marc at gmail.com>
>>>>> wrote:
>>>>>
>>>>> Thanks for the feedback. I understand your point about using a
>>>>> different repo, but I see several advantages in the current approach, so
>>>>> it's maybe worth discussing a bit further what the exact pain points are, to see
>>>>> if a separate repo is really the best solution.
>>>>>
>>>>> Let me know if I miss something, but I see three different ways in
>>>>> which we'll be interacting with PDEPs:
>>>>>
>>>>> a) Via their rendered version. Not sure if you checked it, but the
>>>>> current rendered page from the PDEP PR (attached) is equivalent to the home
>>>>> of the scikit-learn SLEP proposals [1]. The main difference is that with
>>>>> the current approach we have it integrated with the website, which I
>>>>> personally think is an advantage.
>>>>>
>>> I am assuming that also with a separate repo we will have an identical
>>> web page (which will be very useful!).
>>>
>>>>
>>>>> b) Via the list of PDEP PRs to review. In this case, to see only PDEP
>>>>> PRs, if we use the main pandas repo, this is just a label filter [2]. To me
>>>>> personally quicker than having to go to another repo, but no big difference
>>>>> about one or the other.
>>>>>
>>>>> c) Notifications. I guess this is the main thing. I think one concern
>>>>> is that notifications from PDEPs get lost in the rest of the repo
>>>>> notifications. I assume you're using your email client filters, and if the
>>>>> notifications come from another repo, you can change the rules easily. I
>>>>> guess the solution here would be to use something like PDEP in the title
>>>>> and use that as a rule. Or we can try to find something more reliable, if
>>>>> that's the main concern.
>>>>>
>>>>> Personally, I don't see the advantages of having the proposals in a
>>>>> separate repo as very significant. And by keeping things the way they're
>>>>> implemented in the PR, I do see some advantages:
>>>>> - No need to maintain a separate repo, CI workflow, jobs to publish
>>>>> the build, sphinx (or equivalent) project... Nothing too complex, but why
>>>>> implement and maintain all that if our website is already
>>>>> prepared to handle it? And in particular, with Sphinx it is not as easy as
>>>>> with our website to fetch the open PRs and render them.
>>>>> - Integrated UX of the PDEPs into our website. I think this gives it
>>>>> more visibility, and a better user experience than having to jump from one
>>>>> website to another.
>>>>>
>>> I think it should certainly be possible to keep the website UX as you
>>> implemented with a separate repo as well.
>>> There are for sure multiple options, but one (maybe simplest) option
>>> would be to keep the publishing in the main repo as you have now (since the
>>> website publishing lives there): for example the separate repo could
>>> additionally be cloned in the website workflow, and then that content is
>>> available as well (which would require changing the path in the script a bit).
>>>
>>> The PDEP repo itself could further have only very limited CI?
>>>
>>>
>>>>> - One of my concerns is that being in a separate repo we forget about
>>>>> them. We're used to checking PRs in the pandas repo, and we'll keep coming
>>>>> back to PRs about PDEPs until they're merged if they are in the main repo,
>>>>> but it feels like in a separate repo it is easier to forget them when there
>>>>> is no recent activity and notifications.
>>>>>
>>>>> It would be good to know if I miss any of your concerns. If I didn't,
>>>>> I'd say we can start with what's already implemented, which is almost ready
>>>>> to get merged, and if in the future you still think we can do better by
>>>>> using a separate repo, you can implement it, we have a discussion about it,
>>>>> and we move PDEPs to a separate repo if that makes sense. What do you think?
>>>>>
>>>>> Cheers,
>>>>> Marc
>>>>>
>>>>> 1.
>>>>> https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/
>>>>> 2.
>>>>> https://github.com/pandas-dev/pandas/pulls?q=is%3Aopen+is%3Apr+label%3APDEP
>>>>>
>>>>>
>>>>> On Sat, Jun 25, 2022 at 7:05 AM Jeff Reback <jeffreback at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +1 in using a separate repo (under pandas-dev) for this
>>>>>>
>>>>>>
>>>>>> On Jun 24, 2022, at 5:05 PM, Joris Van den Bossche <
>>>>>> jorisvandenbossche at gmail.com> wrote:
>>>>>>
>>>>>> Thanks for starting this proposal, Marc!
>>>>>>
>>>>>> I have already been doing this in some ad-hoc way with eg the
>>>>>> Copy/View proposal (writing an actual proposal document), so I am very much
>>>>>> in favor of formalizing this a bit more.
>>>>>>
>>>>>> Personally, I would prefer that we use a more dedicated home for this
>>>>>> instead of using the existing pandas repo (e.g. a separate repo in the
>>>>>> pandas-dev org). The main pandas repo has nowadays such a high volume in
>>>>>> issue and PR comments, that it becomes difficult to follow this or notice
>>>>>> specific issues. While there are certainly ways to deal with this (e.g.
>>>>>> consistently using a specific label and title, ensuring we always notify
>>>>>> the mailing list as well, ...), IMO it would make it more accessible to
>>>>>> follow and have an overview of those discussions in e.g. a separate repo.
>>>>>>
>>>>>> (there are examples of both in other projects, for example
>>>>>> scikit-learn has a separate repo, while numpy uses the main repo I think)
>>>>>>
>>>>>> Joris
>>>>>>
>>>>>> On Tue, 21 Jun 2022 at 09:46, Marc Garcia <garcia.marc at gmail.com> wrote:
>>>>>>
>>>>>>> We're in the process of implementing PDEPs, equivalent to Python's
>>>>>>> PEPs and NumPy's NEPs, but for pandas. This should help build the roadmap,
>>>>>>> make discussions more efficient, obtain more structured feedback from the
>>>>>>> community, and add visibility to agreed future plans for pandas.
>>>>>>>
>>>>>>> The initial implementation (workflow) is a bit simpler than PEP or
>>>>>>> NEP, but we'll iterate in the future as convenient.
>>>>>>>
>>>>>>> You can see the PR for PDEP-1 with the purpose, scope and guidelines
>>>>>>> here: https://github.com/pandas-dev/pandas/pull/47444
>>>>>>>
>>>>>>> Feedback is very welcome.
>>>>>>> _______________________________________________
>>>>>>> Pandas-dev mailing list
>>>>>>> Pandas-dev at python.org
>>>>>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>>>>>
>>>>>>
>>>>>> [image: Screenshot at 2022-06-25 22-20-50.png]
>>>>>
>>>>
>>>>
>>>> --
>>>> Matthew Roeschke
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot at 2022-06-25 22-20-50.png
Type: image/png
Size: 192689 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/pandas-dev/attachments/20220805/cf65c03e/attachment-0001.png>

