From vlb at cfcl.com  Sun Sep  8 16:16:48 2019
From: vlb at cfcl.com (Vicki Brown)
Date: Sun, 8 Sep 2019 13:16:48 -0700
Subject: [Pandas-dev] Series.value_counts and length of series
Message-ID: <386EDD4E-FC13-489B-BFDD-D0B6AF45DF3F@cfcl.com>

Hi -

I have a dataset:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 237061 entries, 0 to 237060
Data columns (total 23 columns):
Date              237061 non-null datetime64[ns]
Station Number    237061 non-null object
Depth             237061 non-null float64
...

For three of the columns, I have calculated value_counts.
For two of those, the result includes the length of the set; for the
third, it does not.

Why not?

In [1]: dt = wq_df['Date']
dt_counts = dt.value_counts()

In [2]: st = wq_df['Station Number']
st_counts = st.value_counts()

In [3]: dp = wq_df['Depth']
dp_counts = dp.value_counts()

In [4]: dt_counts

Out[4]: 1969-04-10    21
...
Name: Date, Length: 1172, dtype: int64

In [5]: st_counts
Out[5]: 18    16622
...
Name: Station Number, dtype: int64

In [6]: dp_counts
Out[6]: 0.5    1962
...
Name: Depth, Length: 99, dtype: int64

-- Vicki

Vicki Brown
cfcl.com/vlb

From jorisvandenbossche at gmail.com  Sun Sep  8 16:29:02 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Sun, 8 Sep 2019 22:29:02 +0200
Subject: [Pandas-dev] Series.value_counts and length of series
In-Reply-To: <386EDD4E-FC13-489B-BFDD-D0B6AF45DF3F@cfcl.com>
References: <386EDD4E-FC13-489B-BFDD-D0B6AF45DF3F@cfcl.com>
Message-ID:

Hi,

That should not happen (the length is normally part of the Series
representation, independent of the data type or the content or length of
the Series). Can you provide a reproducible example? (a piece of code that
is self-contained and we can run to reproduce the issue)

Best,
Joris

On Sun, 8 Sep 2019 at 22:24, Vicki Brown wrote:

> Hi -
>
> I have a dataset:
>
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 237061 entries, 0 to 237060
> Data columns (total 23 columns):
> Date              237061 non-null datetime64[ns]
> Station Number    237061 non-null object
> Depth             237061 non-null float64
> ...
>
> For three of the columns, I have calculated value_counts.
> For two of those, the result includes the length of the set; for the
> third, it does not.
>
> Why not?
>
> In [1]: dt = wq_df['Date']
> dt_counts = dt.value_counts()
>
> In [2]: st = wq_df['Station Number']
> st_counts = st.value_counts()
>
> In [3]: dp = wq_df['Depth']
> dp_counts = dp.value_counts()
>
> In [4]: dt_counts
>
> Out[4]: 1969-04-10    21
> ...
> Name: Date, Length: 1172, dtype: int64
>
> In [5]: st_counts
> Out[5]: 18    16622
> ...
> Name: Station Number, dtype: int64
>
> In [6]: dp_counts
> Out[6]: 0.5    1962
> ...
> Name: Depth, Length: 99, dtype: int64
>
>
>
> -- Vicki
>
> Vicki Brown
> cfcl.com/vlb
>
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tom.augspurger88 at gmail.com  Mon Sep  9 06:58:23 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Mon, 9 Sep 2019 05:58:23 -0500
Subject: [Pandas-dev] NumFOCUS Newsletter
Message-ID:

Hi all,

NumFOCUS is asking for this month's highlights.

- recent releases
- upcoming events
- job or hiring announcements
- calls for participation

My suggestions

1. Released 0.25.1 on August 22nd.
2. Preparing for 0.25.2 with Python 3.8 compatibility
3. Participated in (/ helped organize) the DataFrame summit at EuroSciPy
(Marc, can you fill in details here?)
4.
Published the results of the pandas user survey:
http://dev.pandas.io/pandas-blog/2019-pandas-user-survey.html

Anything else to add?

Tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From william.ayd at icloud.com  Mon Sep  9 12:05:30 2019
From: william.ayd at icloud.com (William Ayd)
Date: Mon, 9 Sep 2019 09:05:30 -0700
Subject: [Pandas-dev] NumFOCUS Newsletter
In-Reply-To:
References:
Message-ID:

Should we touch on the upcoming new documentation / website? Do we know
how far out those are?

> On Sep 9, 2019, at 3:58 AM, Tom Augspurger wrote:
>
> Hi all,
>
> NumFOCUS is asking for this month's highlights.
>
> - recent releases
> - upcoming events
> - job or hiring announcements
> - calls for participation
>
> My suggestions
>
> 1. Released 0.25.1 on August 22nd.
> 2. Preparing for 0.25.2 with Python 3.8 compatibility
> 3. Participated in (/ helped organize) the DataFrame summit at EuroSciPy (Marc, can you fill in details here?)
> 4. Published the results of the pandas user survey: http://dev.pandas.io/pandas-blog/2019-pandas-user-survey.html
>
> Anything else to add?
>
> Tom
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev

William Ayd
william.ayd at icloud.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tom.augspurger88 at gmail.com  Mon Sep  9 14:08:45 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Mon, 9 Sep 2019 13:08:45 -0500
Subject: [Pandas-dev] Monthly Dev Meeting
Message-ID:

Hi all,

The next pandas meeting is this Wednesday. You can subscribe to one of
these calendars with the meeting info.

ical:
https://calendar.google.com/calendar/ical/pgbn14p6poja8a1cf2dv2jhrmg%40group.calendar.google.com/public/basic.ics
gcal:
https://calendar.google.com/calendar/embed?src=pgbn14p6poja8a1cf2dv2jhrmg%40group.calendar.google.com&ctz=America%2FChicago

Tom
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vlb at cfcl.com  Mon Sep  9 20:30:53 2019
From: vlb at cfcl.com (Vicki Brown)
Date: Mon, 9 Sep 2019 17:30:53 -0700
Subject: [Pandas-dev] Series.value_counts and length of series
In-Reply-To:
References: <386EDD4E-FC13-489B-BFDD-D0B6AF45DF3F@cfcl.com>
Message-ID: <003A1ED7-6E0D-497E-9091-46FFA60937BC@cfcl.com>

In trying to create a smaller reproducible test case, I discovered that
the inclusion of length in the report appears to depend on the actual
length of the Series returned. Specifically, a Series of length 43 only
reported on Name and dtype; a Series of length 66 also reports that
length.
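It looks like the cutoff is pandas' display.max_rows option (60 by
default): "Length" seems to be printed only when the repr gets truncated.
A minimal sketch, assuming that's the mechanism:

```
import pandas as pd

# "Length: ..." only appears when the Series repr is truncated,
# i.e. when the Series is longer than pd.options.display.max_rows
# (60 by default), so length 43 prints in full and 66 gets truncated.
s43 = pd.Series(range(43))
s66 = pd.Series(range(66))

print("Length:" in repr(s43))  # False: shown in full, no Length line
print("Length:" in repr(s66))  # True: truncated repr reports Length

# Lowering the threshold makes the shorter Series report a length too.
with pd.option_context("display.max_rows", 10):
    print("Length:" in repr(s43))  # True
```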
>> a piece of code that is self-contained and we can run to reproduce the issue

```
import pandas as pd

a_list = [13.1, 13.1, 13.0, 13.0, 14.1, 14.0, 14.0, 14.1, 13.7, 13.7,
          13.7, 13.5, 14.4, 14.4, 14.3, 14.3, 14.2, 14.3, 14.3, 14.1,
          14.1, 13.9, 14.0, 14.0, 14.0, 14.0, 14.5, 14.4, 14.3, 14.2,
          14.0, 14.4, 14.3, 14.0, 13.7, 14.3, 14.3, 14.1, 14.0, 13.8,
          14.1, 14.0, 14.0, 13.9, 13.4, 14.3, 14.7, 14.0, 13.6, 14.4,
          14.9, 14.2, 13.6, 13.2, 13.0, 14.3, 13.9, 13.5, 13.0, 14.2,
          16.2, 15.8, 14.0, 13.6, 13.2, 15.2, 14.6, 14.3, 14.2, 14.2,
          15.1, 14.6, 14.3, 14.0, 13.7, 14.3, 14.2, 14.3, 14.3, 14.3,
          14.3, 14.1, 13.7, 13.6, 13.6, 14.0, 14.0, 13.6, 13.2, 13.1,
          14.9, 14.1, 13.8, 13.9, 13.8, 14.6, 14.4, 14.3, 14.2]

b_list = [0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1,
          0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
          0.1, 0.2, 0.2, 0.2, 0.2, 0.3, 0.6, 0.6, 0.8, 3.4,
          7.4, 1.6, 3.3, 6.0, 10.2, 0.7, 0.7, 1.2, 5.4, 9.3,
          4.1, 4.9, 5.2, 7.1, 13.7, 2.7, 3.0, 5.9, 12.3, 1.8,
          4.0, 3.5, 12.2, 16.3, 18.1, 7.8, 10.8, 14.8, 19.5, 7.8,
          4.2, 4.4, 10.0, 12.2, 17.7, 5.4, 6.1, 7.1, 7.4, 13.0,
          6.3, 6.7, 8.4, 11.0, 1.3, 8.5, 8.8, 8.9, 10.7, 12.5,
          10.5, 11.5, 15.4, 16.9, 17.8, 17.3, 17.3, 17.7, 20.0, 21.1,
          10.4, 12.9, 15.3, 16.4, 16.4, 12.0, 12.8, 14.6, 16.0]

a = pd.Series(a_list)
a.value_counts()

b = pd.Series(b_list)
b.value_counts()
```

> On Sep 8, 2019, at 13:29 , Joris Van den Bossche wrote:
>
> Hi,
>
> That should not happen (the length is normally part of the Series
> representation, independent of the data type or the content or length of
> the Series). Can you provide a reproducible example? (a piece of code that
> is self-contained and we can run to reproduce the issue)
>
> Best,
> Joris
>
> On Sun, 8 Sep 2019 at 22:24, Vicki Brown > wrote:
> Hi -
>
> I have a dataset:
>
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 237061 entries, 0 to 237060
> Data columns (total 23 columns):
> Date              237061 non-null datetime64[ns]
> Station Number    237061 non-null object
> Depth             237061 non-null float64
> ...
>
> For three of the columns, I have calculated value_counts.
> For two of those, the result includes the length of the set; for the third, it does not.
>
> Why not?
>
> In [1]: dt = wq_df['Date']
> dt_counts = dt.value_counts()
>
> In [2]: st = wq_df['Station Number']
> st_counts = st.value_counts()
>
> In [3]: dp = wq_df['Depth']
> dp_counts = dp.value_counts()
>
> In [4]: dt_counts
>
> Out[4]: 1969-04-10    21
> ...
> Name: Date, Length: 1172, dtype: int64
>
> In [5]: st_counts
> Out[5]: 18    16622
> ...
> Name: Station Number, dtype: int64
>
> In [6]: dp_counts
> Out[6]: 0.5    1962
> ...
> Name: Depth, Length: 99, dtype: int64
>
>
>
> -- Vicki
>
> Vicki Brown
> cfcl.com/vlb
>
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev

-- Vicki

Vicki Brown
cfcl.com/vlb

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From garcia.marc at gmail.com  Tue Sep 10 11:20:22 2019
From: garcia.marc at gmail.com (Marc Garcia)
Date: Tue, 10 Sep 2019 16:20:22 +0100
Subject: [Pandas-dev] NumFOCUS Newsletter
In-Reply-To:
References:
Message-ID:

What Tom proposes sounds good to me. I think it'll still take a while to
have the new website/documentation, and it's probably worth announcing it
in the NumFOCUS newsletter once it's published.

The dataframe summit was quite successful, I think: around 20 people
participated, and there were very interesting discussions.
I'll try to put together a write-up in the next couple of days, and I'll
ask Joris, who was also there, to review it, so what was discussed is
available to all of you and anyone else interested.

On Mon, Sep 9, 2019 at 5:05 PM William Ayd via Pandas-dev <
pandas-dev at python.org> wrote:

> Should we touch on the upcoming new documentation / website? Do we know
> how far out those are?
>
> On Sep 9, 2019, at 3:58 AM, Tom Augspurger wrote:
>
> Hi all,
>
> NumFOCUS is asking for this month's highlights.
>
>
> - recent releases
> - upcoming events
> - job or hiring announcements
> - calls for participation
>
> My suggestions
>
> 1. Released 0.25.1 on August 22nd.
> 2. Preparing for 0.25.2 with Python 3.8 compatibility
> 3. Participated in (/ helped organize) the DataFrame summit at EuroSciPy
> (Marc, can you fill in details here?)
> 4. Published the results of the pandas user survey:
> http://dev.pandas.io/pandas-blog/2019-pandas-user-survey.html
>
> Anything else to add?
>
> Tom
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
> William Ayd
> william.ayd at icloud.com
>
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jorisvandenbossche at gmail.com  Tue Sep 10 16:38:19 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Tue, 10 Sep 2019 22:38:19 +0200
Subject: [Pandas-dev] Version Policy following 1.0
In-Reply-To:
References:
Message-ID:

Coming back to this thread (as I don't think we really reached a
conclusion)

On Sun, Jul 21, 2019 at 10:10 AM Matthew Rocklin wrote:

> I hope you don't mind the intrusion of a non-pandas dev here.
>

As you well know ;) we certainly don't mind the intrusion. It's exactly
from people depending on pandas that we want to hear about this.

> My ardent hope as a user is that you all will clean up and improve the
> Pandas API continuously. While doing this work I fully expect small bits
> of the API to break on pretty much every release (I think that it would be
> hard to avoid this). My guess is that if this community adopted SemVer
> then devs would be far more cautious about tidying things, which I think
> would be unfortunate. As someone who is very sensitive to changes in the
> Pandas API I'm fully in support of the devs breaking things regularly if it
> means faster progress.
>
For me, a possible reason to go for a SemVer-like versioning scheme is
*not* to do no more breaking changes / clean-up, but rather the opposite.
To make it easier to do big changes (if we agree on wanting to do them), as
we then have a mechanism to do that: bump the major version.
For example, I very much want that we, at some point, make the nullable
integer dtype and a potential new string dtype the default types for pandas
(which will inevitably be a breaking change). I would like to see us move
forward to a more consistent handling of missing values across types (see
https://github.com/pandas-dev/pandas/issues/28095), optional indexes,
...Those will only be possible with at some point doing a bigger breaking
change release.

On Mon, 22 Jul 2019 at 17:54, Tom Augspurger wrote:

> ... As you say, SemVer is more popular in absolute terms. But within our
> little community (NumPy, pandas, scikit-learn), rolling deprecations seem
> to be the preferred approach.
> I think there's some value in being consistent with those libraries.
>
It's true that others in the ecosystem (numpy, scipy, scikit-learn) use a
rolling deprecation. But, a big difference is that they in principle only
do deprecations and no breaking changes.

> Joris / Wes, do you know what Arrow's policy will be after its 1.0?
>

After 1.0, Arrow will follow strict backwards compatibility guarantees
and SemVer *for the format *(
https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst).

For library versions (eg the C++ library, or pyarrow), the document states
to also use SemVer, but I don't think it has been discussed much yet how
to deal with that in practice (for the format the rules are much clearer).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tom.augspurger88 at gmail.com  Wed Sep 11 09:56:39 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Wed, 11 Sep 2019 08:56:39 -0500
Subject: [Pandas-dev] Version Policy following 1.0
In-Reply-To:
References:
Message-ID:

Thanks Joris,

I think your point about versioning and release cadence is important. We
have two axes we can operate along: how often we release (breaking)
changes, and what version numbers we assign to those releases.

For example: SemVer allows a 1.3.0 release that deprecates a feature, and
then releasing 2.0.0 the next day that enforces the deprecation. We
obviously don't want to do that; we'll continue to give the community time
to adjust.

---

> Those [changing NA values] will only be possible with at some point
doing a bigger breaking change release.

In principle, we could have config options to opt into the new behavior.
But we've never done that before, and personally I would likely prefer a
breaking change, with a bump in the major version (assuming we're doing
SemVer).

Tom


On Tue, Sep 10, 2019 at 3:38 PM Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

> Coming back to this thread (as I don't think we really reached a
> conclusion)
>
> On Sun, Jul 21, 2019 at 10:10 AM Matthew Rocklin
> wrote:
>
>> I hope you don't mind the intrusion of a non-pandas dev here.
>>
>
> As you well know ;) we certainly don't mind the intrusion. It's exactly
> from people depending on pandas that we want to hear about this.
>
>
>> My ardent hope as a user is that you all will clean up and improve the
>> Pandas API continuously. While doing this work I fully expect small bits
>> of the API to break on pretty much every release (I think that it would be
>> hard to avoid this). My guess is that if this community adopted SemVer
>> then devs would be far more cautious about tidying things, which I think
>> would be unfortunate. As someone who is very sensitive to changes in the
>> Pandas API I'm fully in support of the devs breaking things regularly if it
>> means faster progress.
>>
>> For me, a possible reason to go for a SemVer-like versioning scheme is
> *not* to do no more breaking changes / clean-up, but rather the opposite.
> To make it easier to do big changes (if we agree on wanting to do them), as
> we then have a mechanism to do that: bump the major version.
> For example, I very much want that we, at some point, make the nullable
> integer dtype and a potential new string dtype the default types for pandas
> (which will inevitably be a breaking change).
I would like to see us move > forward to a more consistent handling of missing values across types (see > https://github.com/pandas-dev/pandas/issues/28095), optional indexes, > ...Those will only be possible with at some point doing a bigger breaking > change release. > > On Mon, 22 Jul 2019 at 17:54, Tom Augspurger > wrote: > >> ... As you say, SemVer is more popular in absolute terms. But within our >> little community (NumPy, pandas, scikit-learn), rolling deprecations seems >> to be the preferred approach. >> I think there's some value in being consistent with those libraries. >> >> It's true that others in the ecosystem (numpy, scipy, scikit-learn) use a > rolling deprecation. But, a big difference is that they in principle only > do deprecations and no breaking changes. > > >> Joris / Wes, do you know what Arrow's policy will be after its 1.0? >> > > After 1.0, Arrow will follow strict backwards compatibility guarantees > and SemVer *for the format *( > https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst). > > For library versions (eg the C++ library, or pyarrow), the document states > to also use SemVer, but I don't think it already has been discussed much > how to deal with that in practice (for the format the rules are much > clearer). > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Fri Sep 13 11:15:44 2019 From: wesmckinn at gmail.com (Wes McKinney) Date: Fri, 13 Sep 2019 10:15:44 -0500 Subject: [Pandas-dev] Dataframe summit @ EuroSciPy 2019 In-Reply-To: References: <61f7be656962ea76e3ffd6fc1984e077ccbb4b19.camel@pietrobattiston.it> Message-ID: hey Marc, I saw the write-up about the meeting on your blog https://datapythonista.github.io/blog/dataframe-summit-at-euroscipy.html Thanks for making this happen! Sorry that I wasn't able to attend. It seems that Sylvain Corlay raised some concerns about the Apache Arrow project. The best place to discuss these is on the dev at arrow.apache.org mailing list. I welcome a direct technical discussion. Some specific responses to some of these 1. Apache arrow C++ API and implementation not following common C++ idioms Sylvain has said this a number of times over the last couple of years in various contexts. The Arrow C++ codebase is a _big_ project, and this criticism AFAICT is specifically about a few header files (in particular arrow/array.h) that he doesn't like. I have said many times, publicly and privately, that the solution to this is to develop an STL-compliant interface layer to the Arrow columnar format that suits the desires of groups like xtensor. We have invited the xtensor developers to contribute more to Apache Arrow. There is nothing structural about this project that's preventing this from happening. We also invite an independent, wholly header-only STL-compliant implementation of the Arrow columnar data structures. PRs welcome. 2. Using a monorepo (including all bindings in the same repo as Arrow) It would be more helpful to have a discussion about this on the dev@ mailing list to understand why this is a concern. We have many interdependent components, written in different programming languages, and the monorepo structure enables us to have peace of mind that pull requests to one component aren't breaking any other. 
For example, we have binary protocol integration tests testing 4 different
implementations against each other on every commit: C++, Go, Java, and
JavaScript, with C# and Rust on their way eventually.

Unfortunately, people like to criticize monorepos as a matter of
principle. But if you actually look at the testing requirements that a
project has, often a monorepo is the only really viable solution. I'm
open-minded about concrete alternative proposals to the project's
current structure that enable us to verify whether PRs break any of
our integration tests (keep in mind the PRs routinely touch multiple
project components).

3. Not a clear distinction between the specification and
implementation (as in for instance project Jupyter)

This is a red herring. It's about the *community*. In the ASF, we have
a saying, "Community over Code". One of the artifacts that the Arrow
community has produced is a specification for a columnar in-memory
data representation. At this point, the Arrow columnar specification
is a relatively small part of the work that's been produced by the
community, though it's obviously very important. I explained this in
my recent workshop on the project at VLDB

* https://twitter.com/wesmckinn/status/1169277437856964614
* https://www.slideshare.net/wesm/apache-arrow-workshop-at-vldb-2019-boss-session-169065658

More generally, I'm interested to understand at what point projects
would be able to take on Apache Arrow as a dependency. The goal of the
project (and why I've invested ~4 years of my life and counting in it)
is to make everyone's lives _easier_, not harder. It seems to me to be
an inevitability of time, and so if there is work that we can be
prioritizing to speed along this outcome, please let me know.

Thanks,
Wes

On Tue, Jul 16, 2019 at 6:25 AM Marc Garcia wrote:
>
> For the people who have shown interest in joining remotely, I added you to the repo of the summit [1], feel free to open issues there on the topics you're interested in discussing. I also created a Gitter channel that you can join.
>
> EuroSciPy doesn't currently have budget to live stream the session, but if we find a sponsor we'll do it, and also publish the recording on YouTube. Based on the experience with the European pandas summit this seems unlikely.
>
> Cheers!
>
> 1. https://github.com/python-sprints/dataframe-summit
> 2. https://gitter.im/py-sprints/dataframe-summit
>
>
> On Wed, Jul 10, 2019 at 8:30 AM Pietro Battiston wrote:
>>
>> Hi Marc,
>>
>> cool!
>>
>> I won't be able to attend Euroscipy, but if in the "Maintainers
>> session" you plan to have a way to participate remotely, I'll
>> definitely do.
>>
>> (I might be busy on the 6th instead... still don't know for sure)
>>
>> Pietro
>>
>> On Thu, 04/07/2019 at 15.45 +0100, Marc Garcia wrote:
>> > Hi there,
>> >
>> > Just to let you know that at EuroSciPy 2019 (in September in Spain)
>> > we will have a dataframe summit, to stay updated and coordinate among
>> > projects replicating the pandas API (other dataframe projects are
>> > more than welcome).
>> >
>> > Maintainers from all the main projects (pandas, dask, vaex, modin,
>> > cudf and koalas) will be attending. If you want to get involved
>> > (whether you can attend the conference or not), please DM me.
>> >
>> > More info: https://github.com/python-sprints/dataframe-summit
>> > Conference website: https://www.euroscipy.org/2019/
>> >
>> > Cheers!
>> > _______________________________________________ >> > Pandas-dev mailing list >> > Pandas-dev at python.org >> > https://mail.python.org/mailman/listinfo/pandas-dev >> > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev From garcia.marc at gmail.com Fri Sep 13 11:57:00 2019 From: garcia.marc at gmail.com (Marc Garcia) Date: Fri, 13 Sep 2019 16:57:00 +0100 Subject: [Pandas-dev] Dataframe summit @ EuroSciPy 2019 In-Reply-To: References: <61f7be656962ea76e3ffd6fc1984e077ccbb4b19.camel@pietrobattiston.it> Message-ID: Hi Wes, Thanks for the feedback. I actually discussed with Sylvain regarding the blog post, since it didn't seem to be the right channel to communicate them to you and the Arrow team if you didn't already discuss them. But he mentioned you already discussed them in the past. Also, worth commenting that the last point was from Maarten Breddels (vaex author). I don't know enough about C++ or Arrow to have my own opinion on any of them. Just tried to share what was discussed during the meeting, so it was shared with everybody who couldn't attend but could be interested. Also, re-reading what I wrote that "People were in general happy with the idea [Arrow]" may not emphasize enough the satisfaction with the project. But I can say that Sylvain made great comments about Arrow and you personally before commenting on the couple of things he disagrees on Arrow implementation. Sorry if I wasn't able to phrase things in the best way. I'm happy to make amendments if needed. Do you think it makes sense to forward your email to Sylvain? I know you already discussed with him, but may be worth discussing again? Just let me know. On Fri, Sep 13, 2019 at 4:16 PM Wes McKinney wrote: > hey Marc, > > I saw the write-up about the meeting on your blog > > https://datapythonista.github.io/blog/dataframe-summit-at-euroscipy.html > > Thanks for making this happen! Sorry that I wasn't able to attend. > > It seems that Sylvain Corlay raised some concerns about the Apache > Arrow project. The best place to discuss these is on the > dev at arrow.apache.org mailing list. I welcome a direct technical > discussion. > > Some specific responses to some of these > > 1. Apache arrow C++ API and implementation not following common C++ idioms > > Sylvain has said this a number of times over the last couple of years > in various contexts. The Arrow C++ codebase is a _big_ project, and > this criticism AFAICT is specifically about a few header files (in > particular arrow/array.h) that he doesn't like. I have said many > times, publicly and privately, that the solution to this is to develop > an STL-compliant interface layer to the Arrow columnar format that > suits the desires of groups like xtensor. We have invited the xtensor > developers to contribute more to Apache Arrow. There is nothing > structural about this project that's preventing this from happening. > > We also invite an independent, wholly header-only STL-compliant > implementation of the Arrow columnar data structures. PRs welcome. > > 2. Using a monorepo (including all bindings in the same repo as Arrow) > > It would be more helpful to have a discussion about this on the dev@ > mailing list to understand why this is a concern. We have many > interdependent components, written in different programming languages, > and the monorepo structure enables us to have peace of mind that pull > requests to one component aren't breaking any other. 
For example, we > have binary protocol integration tests testing 4 different > implementations against each other on every commit: C++, Go, Java, and > JavaScript, with C# and Rust on their way eventually. > > Unfortunately, people like to criticize monorepos as a matter of > principle. But if you actually look at the testing requirements that a > project has, often a monorepo is the only really viable solution. I'm > open minded about concrete alternative proposals to the project's > current structure that enable us to verify whether PRs breaks any of > our integration tests (keep in mind the PRs routinely touch multiple > project components). > > 3. Not a clear distinction between the specification and > implementation (as in for instance project Jupyter) > > This is a red herring. It's about the *community*. In the ASF, we have > saying "Community over Code". One of the artifacts that the Arrow > community has produced is a specification for a columnar in-memory > data representation. At this point, the Arrow columnar specification > is a relatively small part of the work that's been produced by the > community, though it's obviously very important. I explained this in > my recent workshop on the project at VLDB > > * https://twitter.com/wesmckinn/status/1169277437856964614 > * > https://www.slideshare.net/wesm/apache-arrow-workshop-at-vldb-2019-boss-session-169065658 > > More generally, I'm interested to understand at what point projects > would be able to take on Apache Arrow as dependency. The goal of the > project (and why I've invested ~4 years of my life and counting in it) > is to make everyone's lives _easier_, not harder. It seems to me to be > an inevitability of time, and so if there is work that we can be > prioritizing to speed along this outcome, please let me know. > > Thanks, > Wes > > On Tue, Jul 16, 2019 at 6:25 AM Marc Garcia wrote: > > > > For the people who has shown interest in joining remote, I added you to > the repo of the summit [1], feel free to open issues there of the topics > you're interested in discussing. I also created a Gitter channel that you > can join. > > > > EuroSciPy doesn't currently have budget to life stream the session, but > if we find a sponsor we'll do it, and also publish the recording in > youtube. Based on the experience with the European pandas summit this seems > unlikely. > > > > Cheers! > > > > 1. https://github.com/python-sprints/dataframe-summit > > 2. https://gitter.im/py-sprints/dataframe-summit > > > > > > On Wed, Jul 10, 2019 at 8:30 AM Pietro Battiston > wrote: > >> > >> Hi Marc, > >> > >> cool! > >> > >> I won't be able to attend Euroscipy, but if in the "Maintainers > >> session" you plan to have a way to participate remotely, I'll > >> definitely do. > >> > >> (I might be busy on the 6th instead... still don't know for sure) > >> > >> Pietro > >> > >> Il giorno gio, 04/07/2019 alle 15.45 +0100, Marc Garcia ha scritto: > >> > Hi there, > >> > > >> > Just to let you know that at EuroSciPy 2019 (in September in Spain) > >> > we will have a dataframe summit, to stay updated and coordinate among > >> > projects replicating the pandas API (other dataframe projects are > >> > more than welcome). > >> > > >> > Maintainers from all the main projects (pandas, dask, vaex, modin, > >> > cudf and koalas) will be attending. If you want to get involved > >> > (whether you can attend the conference or not), please DM me. 
> >> >
> >> > More info: https://github.com/python-sprints/dataframe-summit
> >> > Conference website: https://www.euroscipy.org/2019/
> >> >
> >> > Cheers!
> >> > _______________________________________________
> >> > Pandas-dev mailing list
> >> > Pandas-dev at python.org
> >> > https://mail.python.org/mailman/listinfo/pandas-dev
> >>
> > _______________________________________________
> > Pandas-dev mailing list
> > Pandas-dev at python.org
> > https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wesmckinn at gmail.com  Fri Sep 13 15:32:01 2019
From: wesmckinn at gmail.com (Wes McKinney)
Date: Fri, 13 Sep 2019 14:32:01 -0500
Subject: [Pandas-dev] Dataframe summit @ EuroSciPy 2019
In-Reply-To:
References: <61f7be656962ea76e3ffd6fc1984e077ccbb4b19.camel@pietrobattiston.it>
Message-ID:

hey Marc,

On Fri, Sep 13, 2019 at 10:57 AM Marc Garcia wrote:
>
> Hi Wes,
>
> Thanks for the feedback. I actually discussed with Sylvain regarding the blog post, since it didn't seem to be the right channel to communicate them to you and the Arrow team if you didn't already discuss them. But he mentioned you already discussed them in the past. Also, worth commenting that the last point was from Maarten Breddels (vaex author).
>

Thanks. So I won't presume to know Maarten's intent (he can speak here
for himself), but he has made a number of public comments that seem to
me to discourage people from getting involved in the Arrow community.
From this post

https://github.com/pandas-dev/pandas/issues/8640#issuecomment-527416973

"In vaex-core, currently ... we are not depending on arrow. ... My
point is, I think if general algorithms (especially string algos) go
into arrow, it will be 'lost' for use outside of arrow, because it's
such a big dependency."

I'd like to focus on this idea of code contributed to Apache Arrow
being "lost" or the project being "such a big dependency". Besides
appearing to "throw shade" at me and other people, it doesn't make a lot
of sense to me. In vaex, Maarten has developed string algorithms that
execute against the Arrow string memory layout (and rebranded such
data "Superstrings"). I don't support this approach for a couple of
reasons

* It serves to promote vaex at the expense of other projects that use
some part of Apache Arrow, whether only the columnar specification or
one or more libraries
* The code cannot be easily reused outside of vaex

If the objective is to write code that deals with Arrow data, why not
do this development inside... the Arrow community? You have a group of
hundreds of developers around the world working together toward common
goals with shared build, test, and packaging infrastructure.
Applications that need to process strings may also need to send Arrow
protocol messages over shared memory or RPC, we have code for that.
There's good reason for creating a composable application development
platform.

The use of the "big dependency" argument is a distraction from the
central idea that the community ideally should work together to build
reusable code artifacts that everyone can use. If the part of the
project that you need is small, we should work together to make _just_
that part available to you in a way that is not onerous. I am happy to
do my part to help with this.

In summary, my position is that promoting Arrow development outside of
the Arrow community isn't good for the open source world as a whole.
Partly why I've made personal and financial sacrifices to establish Ursa Labs and work full time on the project is precisely to support developers of projects like Vaex in their use of the project. I can't help them, though, if they don't want to be helped. We want to become a dependency (in large or small part) of downstream projects so we can work together and help each other. > I don't know enough about C++ or Arrow to have my own opinion on any of them. Just tried to share what was discussed during the meeting, so it was shared with everybody who couldn't attend but could be interested. Also, re-reading what I wrote that "People were in general happy with the idea [Arrow]" may not emphasize enough the satisfaction with the project. But I can say that Sylvain made great comments about Arrow and you personally before commenting on the couple of things he disagrees on Arrow implementation. Sorry if I wasn't able to phrase things in the best way. I'm happy to make amendments if needed. > > Do you think it makes sense to forward your email to Sylvain? I know you already discussed with him, but may be worth discussing again? Just let me know. > I think it's best if we stick to public mailing lists so all of our comments are on the public record, either dev at apache.arrow.org, for general Arrow development matters or pandas-dev at python.org, for pandas specific matters Thanks, Wes > On Fri, Sep 13, 2019 at 4:16 PM Wes McKinney wrote: >> >> hey Marc, >> >> I saw the write-up about the meeting on your blog >> >> https://datapythonista.github.io/blog/dataframe-summit-at-euroscipy.html >> >> Thanks for making this happen! Sorry that I wasn't able to attend. >> >> It seems that Sylvain Corlay raised some concerns about the Apache >> Arrow project. The best place to discuss these is on the >> dev at arrow.apache.org mailing list. I welcome a direct technical >> discussion. >> >> Some specific responses to some of these >> >> 1. Apache arrow C++ API and implementation not following common C++ idioms >> >> Sylvain has said this a number of times over the last couple of years >> in various contexts. The Arrow C++ codebase is a _big_ project, and >> this criticism AFAICT is specifically about a few header files (in >> particular arrow/array.h) that he doesn't like. I have said many >> times, publicly and privately, that the solution to this is to develop >> an STL-compliant interface layer to the Arrow columnar format that >> suits the desires of groups like xtensor. We have invited the xtensor >> developers to contribute more to Apache Arrow. There is nothing >> structural about this project that's preventing this from happening. >> >> We also invite an independent, wholly header-only STL-compliant >> implementation of the Arrow columnar data structures. PRs welcome. >> >> 2. Using a monorepo (including all bindings in the same repo as Arrow) >> >> It would be more helpful to have a discussion about this on the dev@ >> mailing list to understand why this is a concern. We have many >> interdependent components, written in different programming languages, >> and the monorepo structure enables us to have peace of mind that pull >> requests to one component aren't breaking any other. For example, we >> have binary protocol integration tests testing 4 different >> implementations against each other on every commit: C++, Go, Java, and >> JavaScript, with C# and Rust on their way eventually. >> >> Unfortunately, people like to criticize monorepos as a matter of >> principle. 
But if you actually look at the testing requirements that a >> project has, often a monorepo is the only really viable solution. I'm >> open minded about concrete alternative proposals to the project's >> current structure that enable us to verify whether PRs breaks any of >> our integration tests (keep in mind the PRs routinely touch multiple >> project components). >> >> 3. Not a clear distinction between the specification and >> implementation (as in for instance project Jupyter) >> >> This is a red herring. It's about the *community*. In the ASF, we have >> saying "Community over Code". One of the artifacts that the Arrow >> community has produced is a specification for a columnar in-memory >> data representation. At this point, the Arrow columnar specification >> is a relatively small part of the work that's been produced by the >> community, though it's obviously very important. I explained this in >> my recent workshop on the project at VLDB >> >> * https://twitter.com/wesmckinn/status/1169277437856964614 >> * https://www.slideshare.net/wesm/apache-arrow-workshop-at-vldb-2019-boss-session-169065658 >> >> More generally, I'm interested to understand at what point projects >> would be able to take on Apache Arrow as dependency. The goal of the >> project (and why I've invested ~4 years of my life and counting in it) >> is to make everyone's lives _easier_, not harder. It seems to me to be >> an inevitability of time, and so if there is work that we can be >> prioritizing to speed along this outcome, please let me know. >> >> Thanks, >> Wes >> >> On Tue, Jul 16, 2019 at 6:25 AM Marc Garcia wrote: >> > >> > For the people who has shown interest in joining remote, I added you to the repo of the summit [1], feel free to open issues there of the topics you're interested in discussing. I also created a Gitter channel that you can join. >> > >> > EuroSciPy doesn't currently have budget to life stream the session, but if we find a sponsor we'll do it, and also publish the recording in youtube. Based on the experience with the European pandas summit this seems unlikely. >> > >> > Cheers! >> > >> > 1. https://github.com/python-sprints/dataframe-summit >> > 2. https://gitter.im/py-sprints/dataframe-summit >> > >> > >> > On Wed, Jul 10, 2019 at 8:30 AM Pietro Battiston wrote: >> >> >> >> Hi Marc, >> >> >> >> cool! >> >> >> >> I won't be able to attend Euroscipy, but if in the "Maintainers >> >> session" you plan to have a way to participate remotely, I'll >> >> definitely do. >> >> >> >> (I might be busy on the 6th instead... still don't know for sure) >> >> >> >> Pietro >> >> >> >> Il giorno gio, 04/07/2019 alle 15.45 +0100, Marc Garcia ha scritto: >> >> > Hi there, >> >> > >> >> > Just to let you know that at EuroSciPy 2019 (in September in Spain) >> >> > we will have a dataframe summit, to stay updated and coordinate among >> >> > projects replicating the pandas API (other dataframe projects are >> >> > more than welcome). >> >> > >> >> > Maintainers from all the main projects (pandas, dask, vaex, modin, >> >> > cudf and koalas) will be attending. If you want to get involved >> >> > (whether you can attend the conference or not), please DM me. >> >> > >> >> > More info: https://github.com/python-sprints/dataframe-summit >> >> > Conference website: https://www.euroscipy.org/2019/ >> >> > >> >> > Cheers! 
>> >> > _______________________________________________
>> >> > Pandas-dev mailing list
>> >> > Pandas-dev at python.org
>> >> > https://mail.python.org/mailman/listinfo/pandas-dev
>> >>
>> > _______________________________________________
>> > Pandas-dev mailing list
>> > Pandas-dev at python.org
>> > https://mail.python.org/mailman/listinfo/pandas-dev

From garcia.marc at gmail.com  Sat Sep 14 06:59:03 2019
From: garcia.marc at gmail.com (Marc Garcia)
Date: Sat, 14 Sep 2019 11:59:03 +0100
Subject: [Pandas-dev] Dataframe summit @ EuroSciPy 2019
In-Reply-To:
References: <61f7be656962ea76e3ffd6fc1984e077ccbb4b19.camel@pietrobattiston.it>
Message-ID:

Not sure about Maarten's intentions, and I don't know the technical
details to understand what he was thinking when writing that comment. But
regarding the discussions in our two-hour session, my take is that
everybody was happy with Arrow's goal, agreed that you were the right
person to lead it (Sylvain said that), and was happy with the
implementation; there were just these small comments on things they'd do
differently. From the tone I took them as constructive criticism, and
while I don't know why they haven't materialized them as PRs to Arrow, I
don't think the intent was to persuade anyone to move in a different
direction.

Maybe I'm too naive. Joris was also there; maybe you can check with him
what his take was. But I have the feeling that there is more
misunderstanding than conflicting goals here. Hopefully those things can
be discussed over a beer in Berkeley, avoiding unnecessary friction. :)

On Fri, Sep 13, 2019 at 8:32 PM Wes McKinney wrote:

> hey Marc,
>
> On Fri, Sep 13, 2019 at 10:57 AM Marc Garcia
> wrote:
> >
> > Hi Wes,
> >
> > Thanks for the feedback. I actually discussed with Sylvain regarding the
> blog post, since it didn't seem to be the right channel to communicate them
> to you and the Arrow team if you didn't already discuss them. But he
> mentioned you already discussed them in the past. Also, worth commenting
> that the last point was from Maarten Breddels (vaex author).
> >
>
> Thanks. So I won't presume to know Maarten's intent (he can speak here
> for himself), but he has made a number of public comments that seem to
> me to discourage people from getting involved in the Arrow community.
> From this post
>
> https://github.com/pandas-dev/pandas/issues/8640#issuecomment-527416973
>
> "In vaex-core, currently ... we are not depending on arrow. ... My
> point is, I think if general algorithms (especially string algos) go
> into arrow, it will be 'lost' for use outside of arrow, because it's
> such a big dependency."
>
> I'd like to focus on this idea of code contributed to Apache Arrow
> being "lost" or the project being "such a big dependency". Besides
> appearing to "throw shade" at me and other people, it doesn't make a lot
> of sense to me. In vaex, Maarten has developed string algorithms that
> execute against the Arrow string memory layout (and rebranded such
> data "Superstrings"). I don't support this approach for a couple of
> reasons
>
> * It serves to promote vaex at the expense of other projects that use
> some part of Apache Arrow, whether only the columnar specification or
> one or more libraries
> * The code cannot be easily reused outside of vaex
>
> If the objective is to write code that deals with Arrow data, why not
> do this development inside... the Arrow community?
You have a group of > hundreds of developers around the world working together toward common > goals with shared build, test, and packaging infrastructure. > Applications that need to process strings may also need to send Arrow > protocol messages over shared memory or RPC, we have code for that. > There's good reason for creating a composable application development > platform. > > The use of the "big dependency" argument is a distraction from the > central idea that the community ideally should work together to build > reusable code artifacts that everyone can use. If the part of the > project that you need is small, we should work together to make _just_ > that part available to you in a way that is not onerous. I am happy to > do my part to help with this. > > In summary, my position is that promoting Arrow development outside of > the Arrow community isn't good for the open source world as a whole. > Partly why I've made personal and financial sacrifices to establish > Ursa Labs and work full time on the project is precisely to support > developers of projects like Vaex in their use of the project. I can't > help them, though, if they don't want to be helped. We want to become > a dependency (in large or small part) of downstream projects so we can > work together and help each other. > > > I don't know enough about C++ or Arrow to have my own opinion on any of > them. Just tried to share what was discussed during the meeting, so it was > shared with everybody who couldn't attend but could be interested. Also, > re-reading what I wrote that "People were in general happy with the idea > [Arrow]" may not emphasize enough the satisfaction with the project. But I > can say that Sylvain made great comments about Arrow and you personally > before commenting on the couple of things he disagrees on Arrow > implementation. Sorry if I wasn't able to phrase things in the best way. > I'm happy to make amendments if needed. > > > > Do you think it makes sense to forward your email to Sylvain? I know you > already discussed with him, but may be worth discussing again? Just let me > know. > > > > I think it's best if we stick to public mailing lists so all of our > comments are on the public record, either > > dev at apache.arrow.org, for general Arrow development matters or > pandas-dev at python.org, for pandas specific matters > > Thanks, > Wes > > > On Fri, Sep 13, 2019 at 4:16 PM Wes McKinney > wrote: > >> > >> hey Marc, > >> > >> I saw the write-up about the meeting on your blog > >> > >> > https://datapythonista.github.io/blog/dataframe-summit-at-euroscipy.html > >> > >> Thanks for making this happen! Sorry that I wasn't able to attend. > >> > >> It seems that Sylvain Corlay raised some concerns about the Apache > >> Arrow project. The best place to discuss these is on the > >> dev at arrow.apache.org mailing list. I welcome a direct technical > >> discussion. > >> > >> Some specific responses to some of these > >> > >> 1. Apache arrow C++ API and implementation not following common C++ > idioms > >> > >> Sylvain has said this a number of times over the last couple of years > >> in various contexts. The Arrow C++ codebase is a _big_ project, and > >> this criticism AFAICT is specifically about a few header files (in > >> particular arrow/array.h) that he doesn't like. I have said many > >> times, publicly and privately, that the solution to this is to develop > >> an STL-compliant interface layer to the Arrow columnar format that > >> suits the desires of groups like xtensor. 
We have invited the xtensor > >> developers to contribute more to Apache Arrow. There is nothing > >> structural about this project that's preventing this from happening. > >> > >> We also invite an independent, wholly header-only STL-compliant > >> implementation of the Arrow columnar data structures. PRs welcome. > >> > >> 2. Using a monorepo (including all bindings in the same repo as Arrow) > >> > >> It would be more helpful to have a discussion about this on the dev@ > >> mailing list to understand why this is a concern. We have many > >> interdependent components, written in different programming languages, > >> and the monorepo structure enables us to have peace of mind that pull > >> requests to one component aren't breaking any other. For example, we > >> have binary protocol integration tests testing 4 different > >> implementations against each other on every commit: C++, Go, Java, and > >> JavaScript, with C# and Rust on their way eventually. > >> > >> Unfortunately, people like to criticize monorepos as a matter of > >> principle. But if you actually look at the testing requirements that a > >> project has, often a monorepo is the only really viable solution. I'm > >> open minded about concrete alternative proposals to the project's > >> current structure that enable us to verify whether PRs breaks any of > >> our integration tests (keep in mind the PRs routinely touch multiple > >> project components). > >> > >> 3. Not a clear distinction between the specification and > >> implementation (as in for instance project Jupyter) > >> > >> This is a red herring. It's about the *community*. In the ASF, we have > >> saying "Community over Code". One of the artifacts that the Arrow > >> community has produced is a specification for a columnar in-memory > >> data representation. At this point, the Arrow columnar specification > >> is a relatively small part of the work that's been produced by the > >> community, though it's obviously very important. I explained this in > >> my recent workshop on the project at VLDB > >> > >> * https://twitter.com/wesmckinn/status/1169277437856964614 > >> * > https://www.slideshare.net/wesm/apache-arrow-workshop-at-vldb-2019-boss-session-169065658 > >> > >> More generally, I'm interested to understand at what point projects > >> would be able to take on Apache Arrow as dependency. The goal of the > >> project (and why I've invested ~4 years of my life and counting in it) > >> is to make everyone's lives _easier_, not harder. It seems to me to be > >> an inevitability of time, and so if there is work that we can be > >> prioritizing to speed along this outcome, please let me know. > >> > >> Thanks, > >> Wes > >> > >> On Tue, Jul 16, 2019 at 6:25 AM Marc Garcia > wrote: > >> > > >> > For the people who has shown interest in joining remote, I added you > to the repo of the summit [1], feel free to open issues there of the topics > you're interested in discussing. I also created a Gitter channel that you > can join. > >> > > >> > EuroSciPy doesn't currently have budget to life stream the session, > but if we find a sponsor we'll do it, and also publish the recording in > youtube. Based on the experience with the European pandas summit this seems > unlikely. > >> > > >> > Cheers! > >> > > >> > 1. https://github.com/python-sprints/dataframe-summit > >> > 2. https://gitter.im/py-sprints/dataframe-summit > >> > > >> > > >> > On Wed, Jul 10, 2019 at 8:30 AM Pietro Battiston < > me at pietrobattiston.it> wrote: > >> >> > >> >> Hi Marc, > >> >> > >> >> cool! 
> >> >>
> >> >> I won't be able to attend Euroscipy, but if in the "Maintainers
> >> >> session" you plan to have a way to participate remotely, I'll
> >> >> definitely do.
> >> >>
> >> >> (I might be busy on the 6th instead... still don't know for sure)
> >> >>
> >> >> Pietro
> >> >>
> >> >> On Thu, 04/07/2019 at 15.45 +0100, Marc Garcia wrote:
> >> >> > Hi there,
> >> >> >
> >> >> > Just to let you know that at EuroSciPy 2019 (in September in Spain)
> >> >> > we will have a dataframe summit, to stay updated and coordinate among
> >> >> > projects replicating the pandas API (other dataframe projects are
> >> >> > more than welcome).
> >> >> >
> >> >> > Maintainers from all the main projects (pandas, dask, vaex, modin,
> >> >> > cudf and koalas) will be attending. If you want to get involved
> >> >> > (whether you can attend the conference or not), please DM me.
> >> >> >
> >> >> > More info: https://github.com/python-sprints/dataframe-summit
> >> >> > Conference website: https://www.euroscipy.org/2019/
> >> >> >
> >> >> > Cheers!
> >> >> > _______________________________________________
> >> >> > Pandas-dev mailing list
> >> >> > Pandas-dev at python.org
> >> >> > https://mail.python.org/mailman/listinfo/pandas-dev
> >> >>
> >> > _______________________________________________
> >> > Pandas-dev mailing list
> >> > Pandas-dev at python.org
> >> > https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jorisvandenbossche at gmail.com  Sat Sep 14 09:41:20 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Sat, 14 Sep 2019 15:41:20 +0200
Subject: [Pandas-dev] Version Policy following 1.0
In-Reply-To:
References:
Message-ID:

We further discussed this on the hangout we had earlier this week, and
based on that, Tom started a pull request trying to write down some of the
conclusions for now.

A summary of the current proposal is:

- We adopt a loose form of semantic versioning (major.minor.patch)
- We will only do API breaking changes in major versions, but with the
idea to be somewhat flexible* in this definition. Major versions can also
happen relatively often (so certainly not another 10 years as we did to
reach 1.0 ;-))
- We continue to do deprecations continuously (so also in minor releases),
but will only enforce them in major releases.
- And for things labeled as experimental (eg nullable integers), we take
more freedom to deviate from this rule

PR is here: https://github.com/pandas-dev/pandas/pull/28415/

Further feedback on this now more concrete proposal is still very welcome!
(here or on the PR)

Best,
Joris

*We have been listing a lot of changes in the "API breaking changes"
section of the whatsnew documentation. For example, a fix in a not well
defined or documented behaviour (but which has impact if you were relying
on it) has often been mentioned in the API breaking changes, while it
could also be seen as a bug fix. We will, using best judgement (with a big
grey area of course), continue to do those also in minor releases when
needed, but reserve clear API breakages in explicitly documented behaviour
for major releases.

On Wed, 11 Sep 2019 at 15:56, Tom Augspurger wrote:

> Thanks Joris,
>
> I think your point about versioning and release cadence is important. We
> have two axes we can operate along: how often we release (breaking)
> changes, and what version numbers we assign to those releases.
> > For example: SemVer allows a 1.3.0 release that deprecates a feature, and > then releasing 2.0.0 the next day that enforces the deprecation. We > obviously don't want to do that; we'll continue to give the community time > to adjust. > > --- > > > Those [changing NA values] will only be possible with at some point > doing a bigger breaking change release. > > In principle, we could have config options to opt into the new behavior. > But we've never done that before, and personally I would likely prefer a > breaking change, with a bump in the major version (assuming we're doing > SemVer). > > Tom > > > On Tue, Sep 10, 2019 at 3:38 PM Joris Van den Bossche < > jorisvandenbossche at gmail.com> wrote: > >> Coming back to this thread (as I don't think we really reached a >> conclusion) >> >> On Sun, Jul 21, 2019 at 10:10 AM Matthew Rocklin >> wrote: >> >>> I hope you don't mind the intrusion of a non-pandas dev here. >>> >> >> As you well know ;) we certainly don't mind the intrusion. It's exactly >> from people depending on pandas that we want to hear about this. >> >> >>> My ardent hope as a user is that you all will clean up and improve the >>> Pandas API continuously. While doing this work I fully expect small bits >>> of the API to break on pretty much every release (I think that it would be >>> hard to avoid this). My guess is that if this community adopted SemVer >>> then devs would be far more cautious about tidying things, which I think >>> would be unfortunate. As someone who is very sensitive to changes in the >>> Pandas API I'm fully in support of the devs breaking things regularly if it >>> means faster progress. >>> >>> For me, a possible reason to go for a SemVer-like versioning scheme is >> *not* to do no more breaking changes / clean-up, but rather the opposite. >> To make it easier to do big changes (if we agree on wanting to do them), as >> we then have a mechanism to do that: bump the major version. >> For example, I very much want that we, at some point, make the nullable >> integer dtype and a potential new string dtype the default types for pandas >> (which will inevitable a breaking change). I would like to see us move >> forward to a more consistent handling of missing values across types (see >> https://github.com/pandas-dev/pandas/issues/28095), optional indexes, >> ...Those will only be possible with at some point doing a bigger breaking >> change release. >> >> On Mon, 22 Jul 2019 at 17:54, Tom Augspurger >> wrote: >> >>> ... As you say, SemVer is more popular in absolute terms. But within our >>> little community (NumPy, pandas, scikit-learn), rolling deprecations seems >>> to be the preferred approach. >>> I think there's some value in being consistent with those libraries. >>> >>> It's true that others in the ecosystem (numpy, scipy, scikit-learn) use >> a rolling deprecation. But, a big difference is that they in principle only >> do deprecations and no breaking changes. >> >> >>> Joris / Wes, do you know what Arrow's policy will be after its 1.0? >>> >> >> After 1.0, Arrow will follow strict backwards compatibility guarantees >> and SemVer *for the format *( >> https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst). >> >> For library versions (eg the C++ library, or pyarrow), the document >> states to also use SemVer, but I don't think it already has been discussed >> much how to deal with that in practice (for the format the rules are much >> clearer). 
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From maartenbreddels at gmail.com  Sun Sep 15 15:10:05 2019
From: maartenbreddels at gmail.com (Maarten Breddels)
Date: Sun, 15 Sep 2019 21:10:05 +0200
Subject: [Pandas-dev] Dataframe summit @ EuroSciPy 2019
In-Reply-To: References: <61f7be656962ea76e3ffd6fc1984e077ccbb4b19.camel@pietrobattiston.it>
Message-ID:

Dear Wes,

Let me start by saying that I really appreciate your work on Apache Arrow.
I expressed that quite clearly at the summit, during my talk, and I talk
about Arrow in the "superstring" article. In general I try to be as
supportive of Arrow as I can. From this reply I read that you think
differently about my actions. Let me see if I can convince you that I have
good intentions.

> On 13 Sep 2019, at 21:32, Wes McKinney wrote:
>
> hey Marc,
>
> On Fri, Sep 13, 2019 at 10:57 AM Marc Garcia wrote:
>>
>> Hi Wes,
>>
>> Thanks for the feedback. I actually discussed with Sylvain regarding
>> the blog post, since it didn't seem to be the right channel to
>> communicate them to you and the Arrow team if you didn't already
>> discuss them. But he mentioned you already discussed them in the
>> past. Also, worth commenting that the last point was from Maarten
>> Breddels (vaex author).
>>
>
> Thanks. So I won't presume to know Maarten's intent (he can speak here
> for himself), but he has made a number of public comments that seem to
> me to discourage people from getting involved in the Arrow community.
> From this post
>
> https://github.com/pandas-dev/pandas/issues/8640#issuecomment-527416973

Quite the opposite, I would like people to get involved, myself included.
But I think the approach for strings can be done a bit differently, to
also make the C++ community benefit from the work (I'll elaborate a bit
more in that thread so as not to go off topic).

>
> "In vaex-core, currently ... we are not depending on arrow. ... My
> point is, I think if general algorithms (especially string algos) go
> into arrow, it will be 'lost' for use outside of arrow, because it's
> such a big dependency."
>
> I'd like to focus on this idea of code contributed to Apache Arrow
> being "lost" or the project being "such a big dependency". Besides
> appearing to "throw shade" at me and other people, it doesn't make a lot
> of sense to me. In vaex, Maarten has developed string algorithms that
> execute against the Arrow string memory layout (and rebranded such
> data "Superstrings"). I don't support this approach for a couple of
> reasons

I don't want to throw shade at you; I'll elaborate on the "lost" idea on
GitHub.

That the string processing in vaex is not in Arrow has multiple reasons:
* What I wanted was not possible in Arrow 0.12 (32-bit vs 64-bit offsets,
https://github.com/apache/arrow/issues/3845); with Arrow 0.14 this is now
possible (see the sketch after this list).
* I actually saw Arrow as a memory layout spec + implementations. The
realisation that algorithms were also a part of Arrow came during
development and discussions with Uwe, and from seeing the 0.13 changelist
(I think that included value_counts?). I see now that this is clearly
mentioned on the website. This is probably my misunderstanding.
* Funding/time: this work was unfunded, but we had to get it in in a short
amount of time. We took on some technical debt instead. The code would
never be acceptable as a PR to Arrow.
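(To make that first point concrete, this is roughly what the layout looks
like from Python; a minimal sketch, assuming a recent pyarrow where
Array.buffers() is available:

import numpy as np
import pyarrow as pa

# A utf8 array keeps all characters in a single data buffer, plus a
# buffer of int32 offsets marking where each string starts and ends.
arr = pa.array(["foo", "ba", None])
validity, offsets, data = arr.buffers()
print(np.frombuffer(offsets, dtype=np.int32))  # [0 3 5 5]
print(data.to_pybytes())                       # b'fooba'

With int32 offsets a single array can only address ~2 GB of string data,
which is the limit the 64-bit offsets variant lifts.)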
Although I wish I could contribute (more) to Arrow, my time is limited; I
do my best to open issues, but you simply cannot expect people to
contribute to other open source projects no matter what.

>
> * It serves to promote vaex at the expense of other projects that use
> some part of Apache Arrow, whether only the columnar specification or
> one or more libraries

Not my intent: since I mention Arrow in the superstring article, I did not
mean to rebrand them at all. I explicitly mentioned Arrow as a gesture of
goodwill, and because I think the spec is good.

> * The code cannot be easily reused outside of vaex

Totally agree with you on that, and that's why I replied to the GitHub
issue, since I care. I spent quite some effort on that, and I think nobody
should have to do that again (but again, let's continue that on GitHub).

>
> If the objective is to write code that deals with Arrow data, why not
> do this development inside... the Arrow community? You have a group of
> hundreds of developers around the world working together toward common
> goals with shared build, test, and packaging infrastructure.
> Applications that need to process strings may also need to send Arrow
> protocol messages over shared memory or RPC, we have code for that.
> There's good reason for creating a composable application development
> platform.
>
> The use of the "big dependency" argument is a distraction from the
> central idea that the community ideally should work together to build
> reusable code artifacts that everyone can use. If the part of the
> project that you need is small, we should work together to make _just_
> that part available to you in a way that is not onerous. I am happy to
> do my part to help with this.
>
> In summary, my position is that promoting Arrow development outside of
> the Arrow community isn't good for the open source world as a whole.

I think this is a big part of the misunderstanding and/or disagreement.
Don't take it as bad intent if people don't agree with you on this, or if
they assume otherwise. My idea was that Apache Arrow defined the memory
spec, and we can all happily build on top of that, with Apache Arrow as a
dependency or not: as long as you follow the spec, we can all "speak the
same data". I am still not fully convinced that everything should go in
the Arrow project, since there will always be some obscure algorithm that
needs to work on Arrow data that you don't want in your repo. So I think
we might disagree on where to draw the line. You want to have a lot of
algorithms directly in Arrow. That's your decision, and I'm OK with that,
but don't attack me because I did not know your intentions/plans.

> Partly why I've made personal and financial sacrifices to establish
> Ursa Labs and work full time on the project is precisely to support
> developers of projects like Vaex in their use of the project. I can't
> help them, though, if they don't want to be helped. We want to become
> a dependency (in large or small part) of downstream projects so we can
> work together and help each other.

I *do* want to be helped :) I see so many more possibilities when
vaex-core depends on Arrow, but this was not possible before 0.14. You
mentioned on Twitter some issues with wheels on Windows, right? If that's
fixed, I see no reason not to adopt Arrow and depend on it.

>
>> I don't know enough about C++ or Arrow to have my own opinion on any
>> of them.
Just tried to share what was discussed during the meeting, >> so it was shared with everybody who couldn't attend but could be >> interested. Also, re-reading what I wrote that "People were in >> general happy with the idea [Arrow]" may not emphasize enough the >> satisfaction with the project. But I can say that Sylvain made great >> comments about Arrow and you personally before commenting on the >> couple of things he disagrees on Arrow implementation. Sorry if I >> wasn't able to phrase things in the best way. I'm happy to make >> amendments if needed. >> >> Do you think it makes sense to forward your email to Sylvain? I know >> you already discussed with him, but may be worth discussing again? >> Just let me know. >> > > I think it's best if we stick to public mailing lists so all of our > comments are on the public record, either > > dev at apache.arrow.org, for general Arrow development matters or > pandas-dev at python.org, for pandas specific matters > > Thanks, > Wes > >> On Fri, Sep 13, 2019 at 4:16 PM Wes McKinney >> wrote: >>> >>> hey Marc, >>> >>> I saw the write-up about the meeting on your blog >>> >>> https://datapythonista.github.io/blog/dataframe-summit-at-euroscipy.html >>> >>> Thanks for making this happen! Sorry that I wasn't able to attend. >>> >>> It seems that Sylvain Corlay raised some concerns about the Apache >>> Arrow project. The best place to discuss these is on the >>> dev at arrow.apache.org mailing list. I welcome a direct technical >>> discussion. >>> >>> Some specific responses to some of these >>> >>> 1. Apache arrow C++ API and implementation not following common C++ >>> idioms >>> >>> Sylvain has said this a number of times over the last couple of years >>> in various contexts. The Arrow C++ codebase is a _big_ project, and >>> this criticism AFAICT is specifically about a few header files (in >>> particular arrow/array.h) that he doesn't like. I have said many >>> times, publicly and privately, that the solution to this is to develop >>> an STL-compliant interface layer to the Arrow columnar format that >>> suits the desires of groups like xtensor. We have invited the xtensor >>> developers to contribute more to Apache Arrow. There is nothing >>> structural about this project that's preventing this from happening. >>> >>> We also invite an independent, wholly header-only STL-compliant >>> implementation of the Arrow columnar data structures. PRs welcome. >>> >>> 2. Using a monorepo (including all bindings in the same repo as Arrow) >>> >>> It would be more helpful to have a discussion about this on the dev@ >>> mailing list to understand why this is a concern. We have many >>> interdependent components, written in different programming languages, >>> and the monorepo structure enables us to have peace of mind that pull >>> requests to one component aren't breaking any other. For example, we >>> have binary protocol integration tests testing 4 different >>> implementations against each other on every commit: C++, Go, Java, and >>> JavaScript, with C# and Rust on their way eventually. >>> >>> Unfortunately, people like to criticize monorepos as a matter of >>> principle. But if you actually look at the testing requirements that a >>> project has, often a monorepo is the only really viable solution. I'm >>> open minded about concrete alternative proposals to the project's >>> current structure that enable us to verify whether PRs breaks any of >>> our integration tests (keep in mind the PRs routinely touch multiple >>> project components). 
Totally with you on monorepos, vaex is using the same, and I believe it
saves me a lot of time.

>>>
>>> 3. Not a clear distinction between the specification and
>>> implementation (as in for instance project Jupyter)
>>>
>>> This is a red herring. It's about the *community*. In the ASF, we have
>>> a saying "Community over Code". One of the artifacts that the Arrow
>>> community has produced is a specification for a columnar in-memory
>>> data representation. At this point, the Arrow columnar specification
>>> is a relatively small part of the work that's been produced by the
>>> community, though it's obviously very important. I explained this in
>>> my recent workshop on the project at VLDB
>>>
>>> * https://twitter.com/wesmckinn/status/1169277437856964614
>>> *
>>> https://www.slideshare.net/wesm/apache-arrow-workshop-at-vldb-2019-boss-session-169065658

I added that point in a PR to the blog post because I think it more
accurately reflected the discussion, although I did agree with it.

Let me try to rephrase your idea, to see if I get it: instead of having a
spec with everybody building on top of it independently, where many people
would be attacking the same issues (CI/building/distributing etc.), we put
it all in one repo and collaborate, so we share that burden. Is that
somewhat accurate?

>>>
>>> More generally, I'm interested to understand at what point projects
>>> would be able to take on Apache Arrow as dependency. The goal of the
>>> project (and why I've invested ~4 years of my life and counting in it)
>>> is to make everyone's lives _easier_, not harder. It seems to me to be
>>> an inevitability of time, and so if there is work that we can be
>>> prioritizing to speed along this outcome, please let me know.

As mentioned above, I'm almost there (taking Arrow as a dependency).
I would also really love to see the string algorithms go into Arrow
(though we maybe disagree on the details). I currently have no funding or
serious time to spend on that, but I am happy to share my thoughts and
experiences and help where I can.

I'm pretty happy with this reply actually, since you've clarified a lot
for me. The tone could be a bit different, but given the way you saw my
actions it makes more sense. I hope I've taken away some frustrations and
hope we can build bridges, and a better world :) (well, at least regarding
dataframes).
If you still feel I've done something which goes against you/Arrow or
steps on your toes, let me know, publicly or privately. We cannot guess
people's motives or incentives, and if you are unintentionally frustrated
by my actions, that's a waste of energy. I'd rather have vaex be a
stimulus for Apache Arrow than a source of frustration.

Cheers,

Maarten Breddels

>>>
>>> Thanks,
>>> Wes
>>>
>>> On Tue, Jul 16, 2019 at 6:25 AM Marc Garcia wrote:
>>>>
>>>> For the people who have shown interest in joining remotely, I added
>>>> you to the repo of the summit [1], feel free to open issues there
>>>> on the topics you're interested in discussing. I also created a
>>>> Gitter channel that you can join.
>>>>
>>>> EuroSciPy doesn't currently have budget to live-stream the session,
>>>> but if we find a sponsor we'll do it, and also publish the
>>>> recording on YouTube. Based on the experience with the European
>>>> pandas summit this seems unlikely.
>>>>
>>>> Cheers!
>>>>
>>>> 1. https://github.com/python-sprints/dataframe-summit
>>>> 2. https://gitter.im/py-sprints/dataframe-summit
>>>>
>>>>
>>>> On Wed, Jul 10, 2019 at 8:30 AM Pietro Battiston wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> cool!
>>>>>
>>>>> I won't be able to attend Euroscipy, but if in the "Maintainers
>>>>> session" you plan to have a way to participate remotely, I'll
>>>>> definitely join.
>>>>>
>>>>> (I might be busy on the 6th instead... still don't know for sure)
>>>>>
>>>>> Pietro
>>>>>
>>>>> Il giorno gio, 04/07/2019 alle 15.45 +0100, Marc Garcia ha scritto:
>>>>>> Hi there,
>>>>>>
>>>>>> Just to let you know that at EuroSciPy 2019 (in September in Spain)
>>>>>> we will have a dataframe summit, to stay updated and coordinate among
>>>>>> projects replicating the pandas API (other dataframe projects are
>>>>>> more than welcome).
>>>>>>
>>>>>> Maintainers from all the main projects (pandas, dask, vaex, modin,
>>>>>> cudf and koalas) will be attending. If you want to get involved
>>>>>> (whether you can attend the conference or not), please DM me.
>>>>>>
>>>>>> More info: https://github.com/python-sprints/dataframe-summit
>>>>>> Conference website: https://www.euroscipy.org/2019/
>>>>>>
>>>>>> Cheers!
>>>>>> _______________________________________________
>>>>>> Pandas-dev mailing list
>>>>>> Pandas-dev at python.org
>>>>>> https://mail.python.org/mailman/listinfo/pandas-dev
>>>>>
>>>> _______________________________________________
>>>> Pandas-dev mailing list
>>>> Pandas-dev at python.org
>>>> https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dsaxton at pm.me  Sun Sep 15 15:17:06 2019
From: dsaxton at pm.me (Daniel Saxton)
Date: Sun, 15 Sep 2019 19:17:06 +0000
Subject: [Pandas-dev] DataFrame.value_counts
Message-ID: <63jPapumc1F_fW3_WhcDZdxW9DVoq0DB1UeGXnekOpVQY7cHNi4410oKpCU6mupsS4gQ7tLrtiuuIlZwk-UxQlNbpJzJT2muy8M_EO9-vXA=@pm.me>

Currently in pandas if we want to count the values for a single column of
a DataFrame we would use df["a"].value_counts(), but when we want to count
combinations of more than one column we (as far as I know) have to switch
syntax and use df.groupby(["a", "b"]).size(). This is a little awkward
code-wise and likely carries some unnecessary overhead, since we don't
actually need to prepare a groupby object that can handle an arbitrary
calculation on the subframes. There's some evidence of this overhead in
the Series case:

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(1, 10, 10**6))

%timeit s.value_counts()
# 6.74 ms ± 78.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit s.groupby(s).size()
# 11.7 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I think it would be useful and more efficient if there was a
DataFrame.value_counts method, which could take a required columns
argument indicating the combinations over which we want to count. This
seems like a common enough operation that it might be worthwhile to add
this functionality, but wanted to see what other opinions there were on
this. I know pandas already has a huge number of methods and it's good to
resist adding more, but I would see this more as "filling out" rather than
"adding to" the API.
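To make the semantics concrete, this is roughly what I have in mind,
sketched with groupby for clarity (the name and signature here are
hypothetical, and a real implementation could presumably hash the column
combinations directly the way Series.value_counts does):

import numpy as np
import pandas as pd

def value_counts(df, columns):
    # count each observed combination of `columns`, most common first,
    # mirroring the sort order of Series.value_counts
    return df.groupby(columns).size().sort_values(ascending=False)

df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 3], "b": list("xxxxxy")})
value_counts(df, ["a", "b"])
# a  b
# 1  x    3
# 2  x    2
# 3  y    1
# dtype: int64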
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From william.ayd at icloud.com  Sun Sep 15 16:45:35 2019
From: william.ayd at icloud.com (William Ayd)
Date: Sun, 15 Sep 2019 13:45:35 -0700
Subject: [Pandas-dev] DataFrame.value_counts
In-Reply-To: <63jPapumc1F_fW3_WhcDZdxW9DVoq0DB1UeGXnekOpVQY7cHNi4410oKpCU6mupsS4gQ7tLrtiuuIlZwk-UxQlNbpJzJT2muy8M_EO9-vXA=@pm.me>
References: <63jPapumc1F_fW3_WhcDZdxW9DVoq0DB1UeGXnekOpVQY7cHNi4410oKpCU6mupsS4gQ7tLrtiuuIlZwk-UxQlNbpJzJT2muy8M_EO9-vXA=@pm.me>
Message-ID: <74B9430B-5581-4379-8E49-2338CA5917BE@icloud.com>

Hi Daniel,

Thanks for the feedback. There is actually already a PR to implement this
which I think is getting close:

https://github.com/pandas-dev/pandas/pull/27350

Would certainly welcome any feedback you can offer there in terms of
trying it out on your end and/or taking part in the review process.

- Will

> On Sep 15, 2019, at 12:17 PM, Daniel Saxton via Pandas-dev wrote:
>
> Currently in pandas if we want to count the values for a single column of a DataFrame we would use df["a"].value_counts(), but when we want to count combinations of more than one column we (as far as I know) have to switch syntax and use df.groupby(["a", "b"]).size(). This is a little awkward code-wise and likely carries some unnecessary overhead, since we don't actually need to prepare a groupby object that can handle an arbitrary calculation on the subframes. There's some evidence of this overhead in the Series case:
>
> import numpy as np
> import pandas as pd
>
> s = pd.Series(np.random.randint(1, 10, 10**6))
>
> %timeit s.value_counts()
> # 6.74 ms ± 78.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> %timeit s.groupby(s).size()
> # 11.7 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> I think it would be useful and more efficient if there was a DataFrame.value_counts method, which could take a required columns argument indicating the combinations over which we want to count. This seems like a common enough operation that it might be worthwhile to add this functionality, but wanted to see what other opinions there were on this. I know pandas already has a huge number of methods and it's good to resist adding more, but I would see this more as "filling out" rather than "adding to" the API.
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wesmckinn at gmail.com  Sun Sep 15 17:07:09 2019
From: wesmckinn at gmail.com (Wes McKinney)
Date: Sun, 15 Sep 2019 16:07:09 -0500
Subject: [Pandas-dev] Dataframe summit @ EuroSciPy 2019
In-Reply-To: References: <61f7be656962ea76e3ffd6fc1984e077ccbb4b19.camel@pietrobattiston.it>
Message-ID:

hi Maarten,

This discussion definitely brings up "The Lisp Curse" for me

http://www.winestockwebdesign.com/Essays/Lisp_Curse.html

"Lisp is so powerful that problems which are technical issues in other
programming languages are social issues in Lisp."

I understand that a lot of miscommunication in open source comes down to
differing perspectives on things, so I will try to explain some of my
concerns in response to your comments.

On Sun, Sep 15, 2019 at 2:40 PM Maarten Breddels wrote:
>
> Dear Wes,
>
> Let me start by saying that I really appreciate your work on Apache Arrow. I expressed that quite clearly at the summit, during my talk, and I talk about Arrow in the "superstring" article. In general I try to be as supportive of Arrow as I can.
From this reply I read that you think differently about my actions. Let us see if we can convince you I have good intentions. > > > On 13 Sep 2019, at 21:32, Wes McKinney wrote: > > hey Marc, > > On Fri, Sep 13, 2019 at 10:57 AM Marc Garcia wrote: > > > Hi Wes, > > Thanks for the feedback. I actually discussed with Sylvain regarding the blog post, since it didn't seem to be the right channel to communicate them to you and the Arrow team if you didn't already discuss them. But he mentioned you already discussed them in the past. Also, worth commenting that the last point was from Maarten Breddels (vaex author). > > > Thanks. So I won't presume to know Maarten's intent (he can speak here > for himself), but he has made a number of public comments that seem to > me to discourage people from getting involved in the Arrow community. > From this post > > https://github.com/pandas-dev/pandas/issues/8640#issuecomment-527416973 > > > Quite the opposite, I would like people to get involved, myself included. But I think the approach for strings can be done a bit differently to also make the C++ community benefit from the work (I?ll elaborate a bit more on that thread not to go offtopic). > > > "In vaex-core, currently ... we are not depending on arrow. ... My > point is, I think if general algorithms (especially string algos) go > into arrow, it will be 'lost' for use outside of arrow, because it's > such a big dependency." > > I'd like to focus on this idea of code contributed to Apache Arrow > being "lost" or the project being "such a big dependency". Besides > appearing "throw shade" at me and other people, it doesn't make a lot > of sense to me. In vaex, Maarten has developed string algorithms that > execute against the Arrow string memory layout (and rebranded such > data "Superstrings"). I don't support this approach for a couple of > reasons > > > I don?t wanna throw shade on you, I?ll elaborate on the ?lost? idea on GitHub. > > The reasons that the string processing in vaex is not in Arrow has multiple reasons: > * What I wanted was not possible in arrow 0.12 (32bit vs 64 bit offsets) https://github.com/apache/arrow/issues/3845 now with arrow 0.14 this is possible > * I actually saw Arrow as a memory layout spec + implementations. The realisation that that algorithms were also a part of Arrow came during development and discussing with Uwe, and seeing the 0.13 changelist (I think that included value_counts?). I see now that this is clearly mentioned on the website. This is probably my misunderstanding. > * Funding/time: This work was unfunded, but we had to get this in in short amount of time. We took on some technical dept instead. The code would never be acceptable for a PR to Arrow. > > Although I wish I could contribute (more) to Arrow, my time is limited, I do my best to open issues, but you simply cannot expect people to contribute to other open source projects no matter what. > I don't expect that, but I don't think it's good to suggest that other people should _also_ not contribute (without disclosing the potential problems -- see below). Whether or not that was your intent, that's how the comments came across to me and others. > > > * It serves to promote vaex at the expense of other projects that use > some part of Apache Arrow, whether only the columnar specification or > one or more libraries > > > Not my indent, since I mention Arrow in the superstring article, I did not meant to rebrand them at all. 
I explicitly mentioned Arrow as a gesture of goodwill, and because I think the spec is good. > > * The code cannot be easily reused outside of vaex > > > Totally agree with you on that, and that?s why I replied to the Github issue, since I care. I spend quite some effort on that, and I think nobody should have to do that again (but again, lets continue that on GitHub). > > > If the objective is to write code that deals with Arrow data, why not > do this development inside... the Arrow community? You have a group of > hundreds of developers around the world working together toward common > goals with shared build, test, and packaging infrastructure. > Applications that need to process strings may also need to send Arrow > protocol messages over shared memory or RPC, we have code for that. > There's good reason for creating a composable application development > platform. > > The use of the "big dependency" argument is a distraction from the > central idea that the community ideally should work together to build > reusable code artifacts that everyone can use. If the part of the > project that you need is small, we should work together to make _just_ > that part available to you in a way that is not onerous. I am happy to > do my part to help with this. > > > > In summary, my position is that promoting Arrow development outside of > the Arrow community isn't good for the open source world as a whole. > > > I think this is a big part of the misunderstanding and/or disagreement. Don?t take it as bad intent of people don?t agree with you on this, or if they assume otherwise. My idea was that Apache Arrow defined the memory spec, and we can all happily build on top of that, with Apache Arrow as dependency or not, as long as you follow the spec, we can all ?speak the same data?. I am still not fully convinced that everything should to in the Arrow project, since there will always be some obscure algorithms that needs to work on Arrow data that you don?t want in your repo. So I think we might disagree on where to draw the line. You want to have a lot of algorithms directly into Arrow. That?s you decision, and I?m ok with that, but don?t attack me because I did not know your intentions/plans. > I think the "follow the spec" idea here is where we are diverging. The trouble is: creating a 100% complete and provably correct implementation of the columnar specification is difficult. There are hundreds of details that must be exactly right lest users cause applications to compute incorrect results or even segfault. So, my argument (which we can agree to disagree about) is that having a proliferation of independent and incomplete Arrow implementations is almost certainly hurtful to the open source community. If you implement 10% or 20% of the columnar specification and tell people that you are "Arrow-based" or "following the Arrow spec", what's wrong with that? Well, in the short term it may be okay. In the long term, let's say that we end up with a situation like this: * A reference C++ implementation of 100% of the columnar spec, and a library ecosystem that relies on a common core library, call this ecosystem REFERENCE * Some number of independent "Arrow core" implementations in C or C++ that are less than 100% complete. Let's call these libraries THIRDPARTY_A, THIRDPARTY_B, etc. If there is a fragmentation of functionality between these projects, developers may need things from different libraries. For example, suppose I need an algorithm from THIRDPARTY_A. 
But THIRDPARTY_A may not have a complete, battle-tested implementation of
Arrow's binary protocol. So to use THIRDPARTY_A's code, someone will have
to write and maintain a "serializer" to zero-copy cast data structures
from one library's data structures to the other. If THIRDPARTY_A has
implemented something incorrectly, data handed off to REFERENCE may be
difficult to fully validate as being compliant and so segfaults or worse
could ensue. This will create an inherent tension with such developers
that will discourage their involvement or use of REFERENCE or
THIRDPARTY_A, or both.

It may be the case that the developers of THIRDPARTY_A don't care about
some of the stuff in REFERENCE, and they don't care about the parts of the
specification that they haven't implemented or haven't tested thoroughly.
That's fine, but because many developers don't understand how extensive
the specification is, if you say that your third party project is
"Arrow-compatible" or "following the Arrow spec" they may be fooled into
believing that they can use code from multiple "Arrow-compatible" projects
without pain or risk to their applications.

So when I read your comments, they said to me "I am not depending on or
contributing to REFERENCE because of $REASONS, but I would be interested
in creating THIRDPARTY_A that also does not depend on REFERENCE". I think
this is a much riskier path than it seems at face value. You said in your
Superstring article "it means that all projects supporting the Apache
Arrow format will be able to use the same data structure without any
memory copying". The handoff (whether in-memory or via the IPC protocol)
from project A to project B becomes a source of risk if project A and
project B have been insufficiently integration tested.

I would like to see everyone depending on a common core library with a
100% complete and battle-tested implementation of the specification and
binary protocol to eliminate these risks and the fragmentation of labor
around compatibility testing. If there are practical barriers to this, I'd
like to understand what they are so we can work together to eliminate
them.

Another area I'm quite interested in is creating a "nanoarrow" ANSI C or
C99 library that provides an ultraminimalist set of C data structures to
use as the basis for third party applications to develop their custom
algorithms against, if they want the smallest possible dependency to
vendor into their project. Then we can build and maintain C bindings to
the C++ library to leverage more advanced features (like memory mapping)
if needed. This is inspired by https://github.com/nanopb/nanopb

>
> Partly why I've made personal and financial sacrifices to establish
> Ursa Labs and work full time on the project is precisely to support
> developers of projects like Vaex in their use of the project. I can't
> help them, though, if they don't want to be helped. We want to become
> a dependency (in large or small part) of downstream projects so we can
> work together and help each other.
>
>
> I *do* want to be helped :) I see so many more possibilities when vaex-core depends on Arrow, but this was not possible before 0.14. You mentioned on Twitter some issues with wheels on Windows, right? If that's fixed, I see no reason not to adopt Arrow and depend on it.
>

As far as I know 0.15.0 will have Windows wheels.
In the absence of more maintainers for wheels the most likely scenario is that the wheel packages will be more minimalist (with many components disabled -- due to the compatibility issues around having statically-linked LLVM symbols and other things in wheels) while the conda packages will be much more comprehensive. Thanks, Wes > > > I don't know enough about C++ or Arrow to have my own opinion on any of them. Just tried to share what was discussed during the meeting, so it was shared with everybody who couldn't attend but could be interested. Also, re-reading what I wrote that "People were in general happy with the idea [Arrow]" may not emphasize enough the satisfaction with the project. But I can say that Sylvain made great comments about Arrow and you personally before commenting on the couple of things he disagrees on Arrow implementation. Sorry if I wasn't able to phrase things in the best way. I'm happy to make amendments if needed. > > Do you think it makes sense to forward your email to Sylvain? I know you already discussed with him, but may be worth discussing again? Just let me know. > > > I think it's best if we stick to public mailing lists so all of our > comments are on the public record, either > > dev at apache.arrow.org, for general Arrow development matters or > pandas-dev at python.org, for pandas specific matters > > Thanks, > Wes > > On Fri, Sep 13, 2019 at 4:16 PM Wes McKinney wrote: > > > hey Marc, > > I saw the write-up about the meeting on your blog > > https://datapythonista.github.io/blog/dataframe-summit-at-euroscipy.html > > Thanks for making this happen! Sorry that I wasn't able to attend. > > It seems that Sylvain Corlay raised some concerns about the Apache > Arrow project. The best place to discuss these is on the > dev at arrow.apache.org mailing list. I welcome a direct technical > discussion. > > Some specific responses to some of these > > 1. Apache arrow C++ API and implementation not following common C++ idioms > > Sylvain has said this a number of times over the last couple of years > in various contexts. The Arrow C++ codebase is a _big_ project, and > this criticism AFAICT is specifically about a few header files (in > particular arrow/array.h) that he doesn't like. I have said many > times, publicly and privately, that the solution to this is to develop > an STL-compliant interface layer to the Arrow columnar format that > suits the desires of groups like xtensor. We have invited the xtensor > developers to contribute more to Apache Arrow. There is nothing > structural about this project that's preventing this from happening. > > We also invite an independent, wholly header-only STL-compliant > implementation of the Arrow columnar data structures. PRs welcome. > > 2. Using a monorepo (including all bindings in the same repo as Arrow) > > It would be more helpful to have a discussion about this on the dev@ > mailing list to understand why this is a concern. We have many > interdependent components, written in different programming languages, > and the monorepo structure enables us to have peace of mind that pull > requests to one component aren't breaking any other. For example, we > have binary protocol integration tests testing 4 different > implementations against each other on every commit: C++, Go, Java, and > JavaScript, with C# and Rust on their way eventually. > > Unfortunately, people like to criticize monorepos as a matter of > principle. 
But if you actually look at the testing requirements that a > project has, often a monorepo is the only really viable solution. I'm > open minded about concrete alternative proposals to the project's > current structure that enable us to verify whether PRs breaks any of > our integration tests (keep in mind the PRs routinely touch multiple > project components). > > > Totally with you on monorepos, vaex is using the same, and I believe it saves me a lot of time. > > > 3. Not a clear distinction between the specification and > implementation (as in for instance project Jupyter) > > This is a red herring. It's about the *community*. In the ASF, we have > saying "Community over Code". One of the artifacts that the Arrow > community has produced is a specification for a columnar in-memory > data representation. At this point, the Arrow columnar specification > is a relatively small part of the work that's been produced by the > community, though it's obviously very important. I explained this in > my recent workshop on the project at VLDB > > * https://twitter.com/wesmckinn/status/1169277437856964614 > * https://www.slideshare.net/wesm/apache-arrow-workshop-at-vldb-2019-boss-session-169065658 > > > I added that point in a PR to the blogpost because I think it more accurately reflected the discussion, although I did agree with it. > > Let me try to rephrase your idea, to see if I get it: > Instead of having a spec, and everybody building op top of the spec independently and have many people would be attacking the same issues, CI/building/distributing etc. Instead, if we put it in 1 repo, and collaborate, we share that burden. Is that somewhat accurate? > > > > > More generally, I'm interested to understand at what point projects > would be able to take on Apache Arrow as dependency. The goal of the > project (and why I've invested ~4 years of my life and counting in it) > is to make everyone's lives _easier_, not harder. It seems to me to be > an inevitability of time, and so if there is work that we can be > prioritizing to speed along this outcome, please let me know. > > > As mentioned above, I?m almost there (take Arrow as a dependency). > I would also really love to see the string algorithms go in Arrow (though we disagree on the details maybe). I currently have no funding or serious time to spend on that, but happy to share my thoughts, experiences and help where I can. > > I?m pretty happy with this reply actually, since you?ve clarified a lot for me. The tone could be a bit different, but given the way you saw my actions it makes more sense. I hope I?ve taken away some frustrations and hope we can build bridges, and a better world :) (well, at least regarding dataframes). > If you still feel I?ve done something which goes again you/Arrow or step on your toes, let me know, publicly or privately. We cannot guess people?s motives or incentives, and if you are unintentionally frustrated by my actions, that?s a waste of energy. I?d rather be vaex to be a stimulus for Apache Arrow than a source of frustration. > > > Cheers, > > Maarten Breddels > > > > Thanks, > Wes > > On Tue, Jul 16, 2019 at 6:25 AM Marc Garcia wrote: > > > For the people who has shown interest in joining remote, I added you to the repo of the summit [1], feel free to open issues there of the topics you're interested in discussing. I also created a Gitter channel that you can join. 
> > EuroSciPy doesn't currently have budget to live-stream the session, but if we find a sponsor we'll do it, and also publish the recording on YouTube. Based on the experience with the European pandas summit this seems unlikely.
>
> Cheers!
>
> 1. https://github.com/python-sprints/dataframe-summit
> 2. https://gitter.im/py-sprints/dataframe-summit
>
>
> On Wed, Jul 10, 2019 at 8:30 AM Pietro Battiston wrote:
>
>
> Hi Marc,
>
> cool!
>
> I won't be able to attend Euroscipy, but if in the "Maintainers
> session" you plan to have a way to participate remotely, I'll
> definitely join.
>
> (I might be busy on the 6th instead... still don't know for sure)
>
> Pietro
>
> Il giorno gio, 04/07/2019 alle 15.45 +0100, Marc Garcia ha scritto:
>
> Hi there,
>
> Just to let you know that at EuroSciPy 2019 (in September in Spain)
> we will have a dataframe summit, to stay updated and coordinate among
> projects replicating the pandas API (other dataframe projects are
> more than welcome).
>
> Maintainers from all the main projects (pandas, dask, vaex, modin,
> cudf and koalas) will be attending. If you want to get involved
> (whether you can attend the conference or not), please DM me.
>
> More info: https://github.com/python-sprints/dataframe-summit
> Conference website: https://www.euroscipy.org/2019/
>
> Cheers!
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev

From garcia.marc at gmail.com  Wed Sep 18 09:18:04 2019
From: garcia.marc at gmail.com (Marc Garcia)
Date: Wed, 18 Sep 2019 14:18:04 +0100
Subject: [Pandas-dev] New website infrastructure
In-Reply-To: References: Message-ID:

Hi,

An update on the new website infrastructure. We need to finish discussing
the details, but OVH is happy to provide the hosting for the pandas
infrastructure we need.

My initial idea is to credit them on the page with the rest of the
sponsors on the new website:
https://datapythonista.github.io/pandas-web/community/team.html#institutional-partners
and also in the top right corner of the runnable code widgets (see for
example where Binder is credited here: https://spacy.io/).

What I'd like to ask is:

1. For the production website and docs (static content only, for the
traffic we need):
https://us.ovhcloud.com/products/public-cloud/object-storage
2. For our tools and processes, like the benchmarks, builds, CI stuff
(temporarily publishing the docs for every PR, ...):
https://www.ovh.co.uk/vps/vps-ssd.xml (VPS SSD 3)
3. For BinderHub (runnable code in our docs, launch tutorials on
Binder...): https://www.ovh.co.uk/public-cloud/kubernetes/

For the BinderHub, QuantStack offered help with the setup (which is great,
because I don't know much about Binder myself, and I'm not sure if anyone
else does or wants to take care of this). I don't think it'll be easy to
estimate how big a cluster we need beforehand, but I guess we can add
things to Binder iteratively, and have more info as we grow.

OVH gave us a 200 euro voucher to experiment with the different services.
Let me know how all this sounds, and if there are no objections, I'll
create an account and buy those services with the voucher, and I'll start
to prototype and see how everything works.

Cheers!

On Tue, Aug 20, 2019 at 11:06 PM Marc Garcia wrote:

> Somehow related to the work on the new website (
> https://github.com/pandas-dev/pandas/pull/28014), I've been discussing
> with the Binder team, and it looks like it should be quite easy soon
> (with a Sphinx extension) to make all the documentation pages runnable
> with Binder, directly from the website (without opening the page as a
> Jupyter notebook in mybinder).
>
> While they are very happy with the idea of having this in pandas, it's
> uncertain whether the current infrastructure Binder has got is able to
> handle all the traffic we would send. And scikit-learn is working on it
> too (today they added to the dev docs a link to mybinder to run the
> examples).
>
> I'm discussing with OVH (their infrastructure provider) whether they'd
> be happy to provide a dedicated BinderHub specific to pandas (or maybe we
> can have one for all NumFOCUS projects). We'll see how it goes, but wanted
> to let you know, so you're updated, and in case anyone is interested in
> participating in the discussions. Of course before any decision is made
> I'll open a discussion here or on GitHub.
>
> As part of the discussion I'm also trying to get a server for the website,
> and one for development stuff. Specifically for the dev docs (including
> rendered docs of every PR) and the GitHub app that will generate them. I
> guess it should be very easy to find a sponsor for these two servers (in
> exchange for a small note in the footer of the website, or something like
> that).
>
> Let me know if you have any comment, want to be involved or whatever.
>
> Cheers!
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tom.augspurger88 at gmail.com  Wed Sep 18 09:50:58 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Wed, 18 Sep 2019 08:50:58 -0500
Subject: [Pandas-dev] New website infrastructure
In-Reply-To: References: Message-ID:

Sounds good w.r.t. crediting OVH on those pages.

For the ASV results at pandas.pydata.org/speed (which I now notice is
currently broken for pandas), the only thing on the webserver is a
cron job doing a `git pull` from
https://github.com/asv-runner/asv-collection, from within
`/usr/share/nginx`.

Tom


On Wed, Sep 18, 2019 at 8:18 AM Marc Garcia wrote:

> Hi,
>
> An update on the new website infrastructure. We need to finish discussing
> the details, but OVH is happy to provide the hosting for the pandas
> infrastructure we need.
>
> My initial idea is to credit them on the page with the rest of the
> sponsors on the new website:
> https://datapythonista.github.io/pandas-web/community/team.html#institutional-partners and
> also in the top right corner of the runnable code widgets (see for example
> where Binder is credited here: https://spacy.io/).
>
> What I'd like to ask is:
>
> 1. For the production website and docs (static content only, for the
> traffic we need):
> https://us.ovhcloud.com/products/public-cloud/object-storage
> 2. For our tools and processes, like the benchmarks, builds, CI stuff
> (temporarily publishing the docs for every PR, ...):
> https://www.ovh.co.uk/vps/vps-ssd.xml (VPS SSD 3)
> 3.
For BinderHub (runnable code in our docs, launch tutorials on
> Binder...): https://www.ovh.co.uk/public-cloud/kubernetes/
>
> For the BinderHub, QuantStack offered help with the setup (which is
> great, because I don't know much about Binder myself, and I'm not sure if
> anyone else does or wants to take care of this). I don't think it'll be
> easy to estimate how big a cluster we need beforehand, but I guess we
> can add things to Binder iteratively, and have more info as we grow.
>
> OVH gave us a 200 euro voucher to experiment with the different services.
> Let me know how all this sounds, and if there are no objections, I'll
> create an account and buy those services with the voucher, and I'll start
> to prototype and see how everything works.
>
> Cheers!
>
> On Tue, Aug 20, 2019 at 11:06 PM Marc Garcia wrote:
>
>> Somehow related to the work on the new website (
>> https://github.com/pandas-dev/pandas/pull/28014), I've been discussing
>> with the Binder team, and it looks like it should be quite easy soon
>> (with a Sphinx extension) to make all the documentation pages runnable
>> with Binder, directly from the website (without opening the page as a
>> Jupyter notebook in mybinder).
>>
>> While they are very happy with the idea of having this in pandas, it's
>> uncertain whether the current infrastructure Binder has got is able to
>> handle all the traffic we would send. And scikit-learn is working on it
>> too (today they added to the dev docs a link to mybinder to run the
>> examples).
>>
>> I'm discussing with OVH (their infrastructure provider) whether they'd
>> be happy to provide a dedicated BinderHub specific to pandas (or maybe
>> we can have one for all NumFOCUS projects). We'll see how it goes, but
>> wanted to let you know, so you're updated, and in case anyone is
>> interested in participating in the discussions. Of course before any
>> decision is made I'll open a discussion here or on GitHub.
>>
>> As part of the discussion I'm also trying to get a server for the
>> website, and one for development stuff. Specifically for the dev docs
>> (including rendered docs of every PR) and the GitHub app that will
>> generate them. I guess it should be very easy to find a sponsor for
>> these two servers (in exchange for a small note in the footer of the
>> website, or something like that).
>>
>> Let me know if you have any comment, want to be involved or whatever.
>>
>> Cheers!
>>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From andy at numfocus.org  Wed Sep 18 11:31:45 2019
From: andy at numfocus.org (Andy Terrel)
Date: Wed, 18 Sep 2019 10:31:45 -0500
Subject: [Pandas-dev] New website infrastructure
In-Reply-To: References: Message-ID:

Sounds great to me. Just let me know where everything goes.

NumPy wants me to help host a discourse for them; maybe OVH would be a
good place to do that as well (although I would be more inclined if it
was pydata and we had pandas, scipy, and numpy on it).

-- Andy

On Wed, Sep 18, 2019 at 8:51 AM Tom Augspurger wrote:

> Sounds good w.r.t. crediting OVH on those pages.
> > Tom > > > On Wed, Sep 18, 2019 at 8:18 AM Marc Garcia wrote: > >> Hi, >> >> An update on the new website infrastructure. We need to finish discussing >> the details, but OVH is happy to provide the hosting for the pandas >> infrastructure we need. >> >> My initial idea is to credit them in the page with the rest of the >> sponsors in the new website: >> https://datapythonista.github.io/pandas-web/community/team.html#institutional-partners and >> also in the top right corner of the runnable code widgets (see for example >> where Binder is credited here: https://spacy.io/). >> >> What I'd like to ask is: >> >> 1. For the production website and docs (static content only, for the >> traffic we need): >> https://us.ovhcloud.com/products/public-cloud/object-storage >> 2. For our tools and processes, like the benchmarks, builds, CI stuff >> (temporary publish the docs for every PR,...): >> https://www.ovh.co.uk/vps/vps-ssd.xml (VPS SSD 3) >> 3. For BinderHub (runnable code in our docs, launch tutorials on >> Binder...): https://www.ovh.co.uk/public-cloud/kubernetes/ >> >> For the BinderHub, QuantStack offered help with the set up (which is >> great, because I don't know much about Binder myself, and I'm not sure if >> anyone else does or wants to take care of this). I don't think it'll be >> easy to estimate how big is the cluster we need beforehand, but I guess we >> can add things to Binder iteratively, and have more info as we grow. >> >> OVH gave us a 200 euros voucher to experiment with the different >> services. Let me know how all this sounds, and if there are no objections, >> I'll create an account and buy those services with the voucher, and I'll >> start to prototype and see how everything works. >> >> Cheers! >> >> On Tue, Aug 20, 2019 at 11:06 PM Marc Garcia >> wrote: >> >>> Somehow related to the work on the new website ( >>> https://github.com/pandas-dev/pandas/pull/28014), I've been discussing >>> with the Binder team, and looks like should be quite easy soon (with a >>> Sphinx extension) to make all the documentation pages runnable with Binder, >>> directly from the website (without opening the page as a Jupyter in >>> mybinder). >>> >>> While they are very happy with the idea of having this is pandas, it's >>> uncertain if the current infrastructure Binder has got, is able to handle >>> all the traffic we would send. And scikit-learn is working on it too (today >>> they added to the dev docs a link to mybinder to run the examples). >>> >>> I'm discussing with OVH (their infrastructure provider) on whether >>> they'd be happy to provide a dedicated BinderHub specific to pandas (or may >>> be we can have one for all NumFOCUS projects). We'll see how it goes, but >>> wanted to let you know, so you're updated, and in case anyone is interested >>> in participating in the discussions. Of course before any decision is made >>> I'll open a discussion here or on GitHub. >>> >>> As part of the discussion I'm also trying to get a server for the >>> website, and one for development stuff. Specfically for the dev docs >>> (including rendered docs of every PR) and the GitHub app that will generate >>> them. I guess it should be very easy to find a sponsor for these two >>> servers (in exchange of a small note in the footer of the website, or >>> something like that). >>> >>> Let me know if you have any comment, want to be involved or whatever. >>> >>> Cheers! 
>>> >> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
>

--
Andy R. Terrel, PhD
President
NumFOCUS
andy at numfocus.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sylvain.corlay at quantstack.net  Wed Sep 18 18:13:28 2019
From: sylvain.corlay at quantstack.net (Sylvain Corlay)
Date: Thu, 19 Sep 2019 00:13:28 +0200
Subject: [Pandas-dev] New website infrastructure
In-Reply-To: References: Message-ID:

Hi All,

To give you some context, OVH is getting increasingly involved with the
PyData ecosystem. They maintain a BinderHub instance which handles a
portion of the traffic for mybinder.org. Some of you may have seen the
recent announcement about this setup, and the "binder federation":
https://blog.jupyter.org/the-international-binder-federation-4f6235c1537e -
so this is something they have experience with and can deploy at scale
should we need another deployment.

From the quick conversation I had with the folks there, they seem very
keen on helping the larger PyData community beyond Jupyter - be it for
hosting web sites, CI, build artifacts, etc - and I like the idea of
seeing an actor other than the big three being active in this space.

I am happy to help and involve cycles of the QuantStack team in making
this happen if necessary (our team comprises several core Jupyter devs).

@andy I don't know about discourse, but it sounds like a good idea.

Sylvain Corlay
Founder & CEO @ QuantStack

On Wed, Sep 18, 2019 at 5:32 PM Andy Terrel wrote:

> Sounds great to me. Just let me know where everything goes.
>
> NumPy wants me to help host a discourse for them; maybe OVH would be a
> good place to do that as well (although I would be more inclined if it
> was pydata and we had pandas, scipy, and numpy on it).
>
> -- Andy
>
> On Wed, Sep 18, 2019 at 8:51 AM Tom Augspurger wrote:
>
>> Sounds good w.r.t. crediting OVH on those pages.
>>
>> For the ASV results at pandas.pydata.org/speed (which I now notice is
>> currently broken for pandas), the only thing on the webserver is a
>> cron job doing a `git pull` from
>> https://github.com/asv-runner/asv-collection, from within
>> `/usr/share/nginx`.
>>
>> Tom
>>
>>
>> On Wed, Sep 18, 2019 at 8:18 AM Marc Garcia wrote:
>>
>>> Hi,
>>>
>>> An update on the new website infrastructure. We need to finish
>>> discussing the details, but OVH is happy to provide the hosting for the
>>> pandas infrastructure we need.
>>>
>>> My initial idea is to credit them on the page with the rest of the
>>> sponsors on the new website:
>>> https://datapythonista.github.io/pandas-web/community/team.html#institutional-partners and
>>> also in the top right corner of the runnable code widgets (see for example
>>> where Binder is credited here: https://spacy.io/).
>>>
>>> What I'd like to ask is:
>>>
>>> 1. For the production website and docs (static content only, for the
>>> traffic we need):
>>> https://us.ovhcloud.com/products/public-cloud/object-storage
>>> 2. For our tools and processes, like the benchmarks, builds, CI stuff
>>> (temporarily publishing the docs for every PR, ...):
>>> https://www.ovh.co.uk/vps/vps-ssd.xml (VPS SSD 3)
>>> 3.
For BinderHub (runnable code in our docs, launch tutorials on >>> Binder...): https://www.ovh.co.uk/public-cloud/kubernetes/ >>> >>> For the BinderHub, QuantStack offered help with the set up (which is >>> great, because I don't know much about Binder myself, and I'm not sure if >>> anyone else does or wants to take care of this). I don't think it'll be >>> easy to estimate how big is the cluster we need beforehand, but I guess we >>> can add things to Binder iteratively, and have more info as we grow. >>> >>> OVH gave us a 200 euros voucher to experiment with the different >>> services. Let me know how all this sounds, and if there are no objections, >>> I'll create an account and buy those services with the voucher, and I'll >>> start to prototype and see how everything works. >>> >>> Cheers! >>> >>> On Tue, Aug 20, 2019 at 11:06 PM Marc Garcia >>> wrote: >>> >>>> Somehow related to the work on the new website ( >>>> https://github.com/pandas-dev/pandas/pull/28014), I've been discussing >>>> with the Binder team, and looks like should be quite easy soon (with a >>>> Sphinx extension) to make all the documentation pages runnable with Binder, >>>> directly from the website (without opening the page as a Jupyter in >>>> mybinder). >>>> >>>> While they are very happy with the idea of having this is pandas, it's >>>> uncertain if the current infrastructure Binder has got, is able to handle >>>> all the traffic we would send. And scikit-learn is working on it too (today >>>> they added to the dev docs a link to mybinder to run the examples). >>>> >>>> I'm discussing with OVH (their infrastructure provider) on whether >>>> they'd be happy to provide a dedicated BinderHub specific to pandas (or may >>>> be we can have one for all NumFOCUS projects). We'll see how it goes, but >>>> wanted to let you know, so you're updated, and in case anyone is interested >>>> in participating in the discussions. Of course before any decision is made >>>> I'll open a discussion here or on GitHub. >>>> >>>> As part of the discussion I'm also trying to get a server for the >>>> website, and one for development stuff. Specfically for the dev docs >>>> (including rendered docs of every PR) and the GitHub app that will generate >>>> them. I guess it should be very easy to find a sponsor for these two >>>> servers (in exchange of a small note in the footer of the website, or >>>> something like that). >>>> >>>> Let me know if you have any comment, want to be involved or whatever. >>>> >>>> Cheers! >>>> >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> https://mail.python.org/mailman/listinfo/pandas-dev >>> >> > > -- > Andy R. Terrel, PhD > President > NumFOCUS > andy at numfocus.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcia.marc at gmail.com Thu Sep 19 06:45:09 2019 From: garcia.marc at gmail.com (Marc Garcia) Date: Thu, 19 Sep 2019 11:45:09 +0100 Subject: [Pandas-dev] Fwd: [NumFOCUS Projects] Round 3: NumFOCUS Small Development Grants CFP is Open In-Reply-To: References: Message-ID: The new round of the NumFOCUS small development grants has been announced. Deadline for proposals is in a bit more than a month. I copy here previous ideas for proposals: - Will: A better JSON -> DataFrame parser (I think RapidJSON came up in the past) - Will: Tighter Arrow Integration(s) - Will: Various ExtensionArrays (container support comes to mind) - Brock: Improve ASV workflow Opening the discussion here. 
I guess to make the proposals more specific, it would be good to specify:
- Summary of the proposal
- Amount required
- If it applies, who will be working on the proposal

Probably worth noting that funds do not necessarily need to be for
development time; they can also go to other initiatives like training,
events...

I forward the relevant parts of the email from NumFOCUS.

---------- Forwarded message ---------

Hello everyone,

NumFOCUS is pleased to invite proposals from its sponsored and affiliated
projects for targeted small development grants three times per year. This
is the third and final call for proposals for 2019.

There are no restrictions on what the funding can be used for: code
development; documentation work; website updates; workshops and sprints;
educational, sustainability, and diversity initiatives; or other types of
projects.

*Yes, you may re-submit a past grant proposal that was previously not
chosen for funding. *

For a list of all past successful proposals, see our website:
https://numfocus.org/programs/sustainability#sdg

Only one application may be submitted per project per grant funding cycle.

Available Funding:

- Up to $5,000 per proposal.

Eligibility:

- Any NumFOCUS Fiscally Sponsored or Affiliated project may submit one
proposal on behalf of the project per grant cycle.
- Proposed work must be achievable within calendar year 2019 or the first
few months of 2020.
- The call is open to applicants from any nationality and can be performed
at any university, institute or business worldwide (US export laws
permitting).

Round 3 Timeline:

- *27 Oct 2019: deadline for proposal submissions*
- 18 Nov 2019: proposal acceptance notifications

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From william.ayd at icloud.com Thu Sep 19 17:53:45 2019
From: william.ayd at icloud.com (William Ayd)
Date: Thu, 19 Sep 2019 14:53:45 -0700
Subject: [Pandas-dev] Fwd: [NumFOCUS Projects] Round 3: NumFOCUS Small Development Grants CFP is Open
In-Reply-To:
References:
Message-ID:

I think ASV would be the best out of these, because I think it would be
very useful and also easier to measure completion of versus something like
"Tighter Arrow Integration", which is a little open ended. I think an ASV
proposal would be something along the lines of:

- Develop a feedback loop to detect regressions as part of the PR process
- Standardize test expectations (ex: say how long each test should run,
define what kind of memory tests we need)
- Build canned reports on our ASV runner to give a high level overview of
performance over time (kind of there, but needs some usability polish)
- Improve existing benchmark suite performance (I think these take a very
long time to run)
- Document and update the contributing guide on how to test and develop
benchmarks

Can whittle down the remaining points but figured I'd share thoughts for now.

- Will

> On Sep 19, 2019, at 3:45 AM, Marc Garcia wrote:
> [...]
William Ayd
william.ayd at icloud.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tom.augspurger88 at gmail.com Thu Sep 19 17:58:43 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Thu, 19 Sep 2019 16:58:43 -0500
Subject: [Pandas-dev] Fwd: [NumFOCUS Projects] Round 3: NumFOCUS Small Development Grants CFP is Open
In-Reply-To:
References:
Message-ID:

FYI, https://github.com/tomaugspurger/pandas-czi has a section on ASV
workflow improvements.

On Thu, Sep 19, 2019 at 4:54 PM William Ayd via Pandas-dev
<pandas-dev at python.org> wrote:

> I think ASV would be the best out of these, because I think it would be
> very useful and also easier to measure completion of versus something like
> "Tighter Arrow Integration", which is a little open ended.
> [...]
> - Will

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
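For reference, the benchmarks these ASV points refer to live under
asv_bench/benchmarks/ in the pandas repo and follow airspeed velocity's
conventions. A minimal sketch of one; the class name, grouped operation
and sizes are illustrative, not an actual pandas benchmark:

import numpy as np
import pandas as pd


class GroupBySum:
    # asv times every time_* method and tracks peak memory for every
    # peakmem_* method; setup() runs before each measurement.
    def setup(self):
        n = 100000
        self.df = pd.DataFrame(
            {"key": np.random.randint(0, 100, n), "value": np.random.randn(n)}
        )

    def time_groupby_sum(self):
        self.df.groupby("key")["value"].sum()

    def peakmem_groupby_sum(self):
        self.df.groupby("key")["value"].sum()

A command along the lines of `asv continuous -f 1.1 upstream/master HEAD`
is roughly the comparison that the "feedback loop as part of the PR
process" point would automate.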
From garcia.marc at gmail.com Fri Sep 20 05:47:24 2019
From: garcia.marc at gmail.com (Marc Garcia)
Date: Fri, 20 Sep 2019 10:47:24 +0100
Subject: [Pandas-dev] New website infrastructure
In-Reply-To:
References:
Message-ID:

I don't know much about discourse, but why do we want to self-host it?
Seems like Discourse does it for free for open source projects:
https://free.discourse.group/ And I don't think we want another system to
maintain. Am I missing something?

I applied for https://pandas.discourse.group, so we can give it a try. We
should have it approved and working in a couple of days.

From what I've seen, Discourse has one level of categories, so I guess we
want one per project, so we can have categories for "Users",
"Contributors", "Ecosystem"... or something similar. I guess if we have a
single Discourse for NumFOCUS, every project will be a category, and it'll
be difficult to group conversations.

If anyone already has experience with Discourse and disagrees with my
guesses, please let me know.
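A side note on checking the category structure: public Discourse instances
expose it as JSON over HTTP. A rough sketch with the requests library,
assuming the instance above is approved and live (untested against it):

import requests

# List the categories of a public Discourse instance.
resp = requests.get("https://pandas.discourse.group/categories.json")
resp.raise_for_status()
for cat in resp.json()["category_list"]["categories"]:
    print(cat["id"], cat["name"])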
On Wed, Sep 18, 2019 at 4:32 PM Andy Terrel wrote:

> Sounds great to me. Just let me know where everything goes.
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tom.augspurger88 at gmail.com Fri Sep 20 07:06:50 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Fri, 20 Sep 2019 06:06:50 -0500
Subject: [Pandas-dev] New website infrastructure
In-Reply-To:
References:
Message-ID:

I'd prefer to join a discourse along with NumPy, Dask, and other PyData or
NumFOCUS projects, rather than going out on our own.

On Fri, Sep 20, 2019 at 4:47 AM Marc Garcia wrote:

> I don't know much about discourse, but why do we want to self-host it?
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From andy at numfocus.org Fri Sep 20 07:17:49 2019
From: andy at numfocus.org (Andy Terrel)
Date: Fri, 20 Sep 2019 06:17:49 -0500
Subject: [Pandas-dev] New website infrastructure
In-Reply-To:
References:
Message-ID:

Yeah, I didn't know they hosted it for free for open source. I'll look
into a pydata version.

-- Andy

On Fri, Sep 20, 2019 at 4:47 AM Marc Garcia wrote:

> I don't know much about discourse, but why do we want to self-host it?
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From garcia.marc at gmail.com Fri Sep 20 07:18:12 2019
From: garcia.marc at gmail.com (Marc Garcia)
Date: Fri, 20 Sep 2019 12:18:12 +0100
Subject: [Pandas-dev] New website infrastructure
In-Reply-To:
References:
Message-ID:

I'm fine with that conceptually, but I think Discourse will make it quite
tricky to find things then.

We already got our discourse approved, if you want to join it and
experiment with the settings. But it's the first thing I tried, and after
you join a category (project), everything feels like it's in the same
place (even if subcategories and tags exist). And I think we need at least
a clear separation between pandas/users and pandas/contributors
discussions.

Maybe I just couldn't find the settings; let me know if you manage to get
a multi-project setup that makes sense.
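On the users/contributors separation: Discourse allows one level of
subcategories, created by passing a parent_category_id. A rough sketch of
what setting that up through its HTTP API could look like; the API key,
username and category id are placeholders, and this is untested:

import requests

# Create a "Contributors" subcategory under an existing top-level category.
headers = {
    "Api-Key": "<admin-api-key>",  # placeholder admin credentials
    "Api-Username": "system",
}
payload = {
    "name": "Contributors",
    "color": "0088CC",         # Discourse expects hex colors without '#'
    "text_color": "FFFFFF",
    "parent_category_id": 42,  # made-up id of the parent "pandas" category
}
resp = requests.post(
    "https://pandas.discourse.group/categories.json",
    headers=headers,
    json=payload,
)
print(resp.status_code)

With only one level of nesting, a shared instance would presumably need
the project as the top-level category and groups like "Users" and
"Contributors" as its subcategories.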
On Fri, Sep 20, 2019 at 12:07 PM Tom Augspurger wrote:

> I'd prefer to join a discourse along with NumPy, Dask, and other PyData or
> NumFOCUS projects, rather than going out on our own.
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jorisvandenbossche at gmail.com Fri Sep 20 07:52:11 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Fri, 20 Sep 2019 13:52:11 +0200
Subject: [Pandas-dev] Discourse discussion forum
Message-ID:

(let's use a new thread for discourse, as it is a different discussion
from the website hosting I think, regardless of whether OVH might also
host discourse)

I am not familiar enough myself with discourse to know whether multiple
projects sharing a single discourse will become annoying. But indeed, it
sounds like it needs some kind of hierarchical category / tagging.

For pandas itself: I think I quite like the idea of having a discourse,
but *if* we do that, we should think about how that fits with / replaces
/ adds to /... some of the other communication channels (pandas-dev
mailing list, pydata mailing list, github issues, ..).

Joris

On Fri, 20 Sep 2019 at 13:18, Marc Garcia wrote:

> I'm fine with that conceptually, but I think Discourse will make it quite
> tricky to find things then.
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From andy at numfocus.org Fri Sep 20 07:57:34 2019
From: andy at numfocus.org (Andy Terrel)
Date: Fri, 20 Sep 2019 06:57:34 -0500
Subject: [Pandas-dev] Discourse discussion forum
In-Reply-To:
References:
Message-ID:

Thanks Joris for splitting the thread, sorry if I hijacked the other one.
For some discussion from numpy you can see here
https://github.com/numpy/numpy.org/issues/28

Julia and Jupyter both run their own discourse, but Dask, Numpy, Scipy
have all told me "I don't want to run it ourselves but be part of a larger
one".

I bet we can figure out how to organize it.

I just put in an application to get pydata.discourse.org.

-- Andy

On Fri, Sep 20, 2019 at 6:52 AM Joris Van den Bossche
<jorisvandenbossche at gmail.com> wrote:

> (let's use a new thread for discourse, as it is a different discussion
> from the website hosting I think, regardless of whether OVH might also
> host discourse)
> [...]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tom.augspurger88 at gmail.com Fri Sep 20 08:50:04 2019
From: tom.augspurger88 at gmail.com (Tom Augspurger)
Date: Fri, 20 Sep 2019 07:50:04 -0500
Subject: [Pandas-dev] Discourse discussion forum
In-Reply-To:
References:
Message-ID:

On Fri, Sep 20, 2019 at 6:57 AM Andy Terrel wrote:

> Thanks Joris for splitting the thread, sorry if I hijacked the other one.
> [...]
>
> On Fri, Sep 20, 2019 at 6:52 AM Joris Van den Bossche wrote:
>
>> [...]
>>
>> For pandas itself: I think I quite like the idea of having a discourse,
>> but *if* we do that, we should think about how that fits with / replaces
>> / adds to /... some of the other communication channels (pandas-dev
>> mailing list, pydata mailing list, github issues, ..).

IMO, we can replace the pandas-dev & pydata mailing lists with it.
Possibly gitter as well.

>> Joris
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From garcia.marc at gmail.com Fri Sep 20 08:58:29 2019
From: garcia.marc at gmail.com (Marc Garcia)
Date: Fri, 20 Sep 2019 13:58:29 +0100
Subject: [Pandas-dev] Discourse discussion forum
In-Reply-To:
References:
Message-ID:

From what I've seen, Discourse can be configured to interact with a
category like a distribution list (subscribe to it and have an email
address to send messages there). Not sure, but from the settings I've seen
it should be possible.
Personally I think it should replace all the existing lists:
- pydata google group
- pandas-dev (this)
- core devs list

I'm also ok to get rid of gitter once we move to discourse (also ok to
keep it if people find it useful, but I rarely use it).

I created an issue for this discussion some time ago:
https://github.com/pandas-dev/pandas/issues/27903

On Fri, Sep 20, 2019 at 1:50 PM Tom Augspurger wrote:

> On Fri, Sep 20, 2019 at 6:57 AM Andy Terrel wrote:
>
>> Thanks Joris for splitting the thread, sorry if I hijacked the other
>> one.
>>
>> For some discussion from numpy you can see here
>> https://github.com/numpy/numpy.org/issues/28
>>
>> Julia and Jupyter both run their own discourse but Dask, Numpy, Scipy
>> have all told me "I don't want to run it ourselves but be part of a
>> larger one".
>>
>> I bet we can figure out how to organize it.
>>
>> I just put in an application to get pydata.discourse.org.
>>
>> -- Andy
>>
>> On Fri, Sep 20, 2019 at 6:52 AM Joris Van den Bossche <
>> jorisvandenbossche at gmail.com> wrote:
>>
>>> (let's use a new thread for discourse, as it is a different discussion
>>> from the website hosting I think, regardless of whether OVH might also
>>> host discourse)
>>>
>>> I am not familiar enough myself with discourse to know whether
>>> multiple projects sharing a single discourse will become annoying. But
>>> indeed, that sounds as if it needs some kind of hierarchical
>>> category / tagging.
>>>
>>> For pandas itself: I think I quite like the idea of having a
>>> discourse, but *if* we do that, we should think about how that fits
>>> with / replaces / adds to /... some of the other communication
>>> channels (pandas-dev mailing list, pydata mailing list, github
>>> issues, ..).
>
> IMO, we can replace the pandas-dev & pydata mailing lists with it.
> Possibly gitter as well.
>
>> Joris
>>
>>> On Fri, 20 Sep 2019 at 13:18, Marc Garcia wrote:
>>>
>>>> I'm fine with that conceptually, but I think Discourse will make it
>>>> quite tricky to find things then.
>>>>
>>>> We already got our discourse approved, if you want to join it and
>>>> experiment with the settings. But it's the first thing I tried, and
>>>> after you join a category (project), everything feels like it's in
>>>> the same place (even if subcategories and tags exist). And I think we
>>>> need at least a clear separation between pandas/users and
>>>> pandas/contributors discussions.
>>>>
>>>> Maybe I just couldn't find the settings, let me know if you manage to
>>>> get a multi-project set up that makes sense.

From garcia.marc at gmail.com  Tue Sep 24 08:59:51 2019
From: garcia.marc at gmail.com (Marc Garcia)
Date: Tue, 24 Sep 2019 13:59:51 +0100
Subject: [Pandas-dev] New website infrastructure
In-Reply-To: 
References: 
Message-ID: 

Just a quick update on the infrastructure for the pandas hosting. We
(Sylvain and I) just had a call with the people from OVH, to discuss what
was said here.
A quick summary:
- We shared in more detail what we need with them, and the credit we
discussed (having them on the sponsors page, and in the corner of the
Binder executable widgets)
- They seem happy about it, but need to discuss internally; they expect
to have a final answer in a couple of weeks
- It may make more sense to reuse the current Binder infrastructure (if
the Binder team agrees, of course)
- They'll have to set a limit on the resources we can use, and if we ever
exceed it, we'll discuss the conditions again (more a formality than
anything, I don't expect the limit to be something to worry about)
- They are happy to consider doing the same with the whole ecosystem. But
we'll first start with pandas to not make things more complex, and if
they host multiple projects we'll manage it at the NumFOCUS level, so
they don't need to deal with many projects individually

I think those are the main points; if I missed something, or anything is
not clear, please feel free to add to that, Sylvain.

Regarding Discourse, any progress on your side with that, Andy?

Cheers!

On Fri, Sep 20, 2019 at 12:18 PM Marc Garcia wrote:

> I'm fine with that conceptually, but I think Discourse will make it
> quite tricky to find things then.

From jorisvandenbossche at gmail.com  Wed Sep 25 07:38:33 2019
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 25 Sep 2019 13:38:33 +0200
Subject: [Pandas-dev] Discourse discussion forum
In-Reply-To: 
References: 
Message-ID: 

What do other people think about starting to use discourse for pandas?
(and about sharing it with other projects or having our own?)

--

On the existing lists: I don't think discourse would replace the core
devs list (that is intentionally private). And IMO also not gitter
(discourse is not a real-time chat).

Joris

On Fri, 20 Sep 2019 at 14:58, Marc Garcia wrote:

> Judging from what I've seen, Discourse can be configured so that a
> category works like a distribution list (you can subscribe to it, and
> it has an email address that turns incoming messages into posts). I'm
> not sure, but from the settings I've seen it should be possible.
From maartenbreddels at gmail.com  Wed Sep 25 07:51:44 2019
From: maartenbreddels at gmail.com (Maarten Breddels)
Date: Wed, 25 Sep 2019 13:51:44 +0200
Subject: [Pandas-dev] Discourse discussion forum
In-Reply-To: 
References: 
Message-ID: 

I personally am not a fan of having Discourse: where will all the content
be 10 years from now? Gitter is already an information sinkhole, and I'm
not sure how Discourse compares. GitHub is so large that people will find
a way to save the content if/when it disappears, and the same goes for
StackOverflow. Discourse, GitHub and StackOverflow may not target the
same audience, and I'm a bit hesitant to embrace Discourse, but I also
don't know of a better scenario.

Short version: not sure, I feel uneasy about Discourse, but happy to be
proven wrong.

On Wed, 25 Sep 2019 at 13:39, Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

> What do other people think about starting to use discourse for pandas?
> (and about sharing it with other projects or having our own?)
>
> On the existing lists: I don't think discourse would replace the core
> devs list (that is intentionally private). And IMO also not gitter
> (discourse is not a real-time chat).
>
> Joris
From andy at numfocus.org  Wed Sep 25 08:32:34 2019
From: andy at numfocus.org (Andy Terrel)
Date: Wed, 25 Sep 2019 07:32:34 -0500
Subject: [Pandas-dev] New website infrastructure
In-Reply-To: 
References: 
Message-ID: 

On Tue, Sep 24, 2019 at 8:00 AM Marc Garcia wrote:

> Just a quick update on the infrastructure for the pandas hosting. We
> (Sylvain and I) just had a call with the people from OVH, to discuss
> what was said here.
>
> Regarding Discourse, any progress on your side with that, Andy?
We moved that to another thread, but for status, I applied for a free
account but haven't heard back.

--
Andy R. Terrel, PhD
President
NumFOCUS
andy at numfocus.org

From garcia.marc at gmail.com  Wed Sep 25 09:03:11 2019
From: garcia.marc at gmail.com (Marc Garcia)
Date: Wed, 25 Sep 2019 10:03:11 -0300
Subject: [Pandas-dev] Discourse discussion forum
In-Reply-To: 
References: 
Message-ID: 

Discourse has private categories; we already have a private "Maintainers"
one that only admins can see and use. And there are other permission
levels that can be used. For example, we can have a private category for
the members of the code of conduct committee... I just need to check if
we can associate email addresses with those groups, so when someone
emails coc at pandas.io the messages are posted in that private group.
But if we can set that up as we need, I think we should be able to
replace all those lists and centralize everything in Discourse.

I'm skeptical about being able to set up a global Discourse for the whole
ecosystem where things are easy to find, based on how Discourse works and
the tests I did. I'd move forward with our own for now if nobody is able
to set that up.

Andy, I got the pandas account approved in minutes. I see that we can
have a custom domain, so you can use the pandas one and see if we can
manage to have multiple projects in a way we like; if we do, we just
change the domain to discuss.pydata.org (or whatever). You're already an
admin, feel free to experiment and change the setup as you need.

Maarten, not sure I understand your point. I'm not a fan of Discourse so
far, but I think having the user and the dev discussions in a single
place makes it easier to find the information, and I think the Discourse
interface also makes things easier to find compared to mailman or google
groups. Gitter aside (there are no important discussions or decision
making there, I think), would you prefer to stay with mailman and google
groups over Discourse? Or what do you think would be the ideal or best
option?

Thanks!

On Wed, Sep 25, 2019 at 8:39 AM Joris Van den Bossche wrote:

> What do other people think about starting to use discourse for pandas?
> (and about sharing it with other projects or having our own?)
>
> On the existing lists: I don't think discourse would replace the core
> devs list (that is intentionally private). And IMO also not gitter
> (discourse is not a real-time chat).