From garcia.marc at gmail.com Thu Jul 4 10:45:38 2019 From: garcia.marc at gmail.com (Marc Garcia) Date: Thu, 4 Jul 2019 15:45:38 +0100 Subject: [Pandas-dev] Dataframe summit @ EuroSciPy 2019 Message-ID: Hi there, Just to let you know that at EuroSciPy 2019 (in September in Spain) we will have a dataframe summit, to stay updated and coordinate among projects replicating the pandas API (other dataframe projects are more than welcome). Maintainers from all the main projects (pandas, dask, vaex, modin, cudf and koalas) will be attending. If you want to get involved (whether you can attend the conference or not), please DM me. More info: https://github.com/python-sprints/dataframe-summit Conference website: https://www.euroscipy.org/2019/ Cheers! -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.augspurger88 at gmail.com Thu Jul 4 13:12:04 2019 From: tom.augspurger88 at gmail.com (Tom Augspurger) Date: Thu, 4 Jul 2019 12:12:04 -0500 Subject: [Pandas-dev] ANN: Pandas 0.25.0rc0 released Message-ID: Hi, I'm pleased to announce the availability of 0.25.0rc0. This is the first release candidate for 0.25.0. Please try this RC and report any issues on the pandas issue tracker. This is a major release from 0.24.2 and includes a number of API changes, new features, enhancements, and performance improvements along with a large number of bug fixes. Highlights include:

- Dropped Python 2 support
- Groupby aggregation with relabeling
- Better repr for MultiIndex
- Better truncated repr for Series and DataFrame

See the release notes for a full list of all the changes from 0.24.2. The release candidate can soon be installed with conda using the conda-forge channel:

    conda install -c conda-forge/label/rc pandas=0.25.0rc0

Or via PyPI:

    python3 -m pip install --upgrade --pre pandas

Please report any issues with the release candidate on the pandas issue tracker. Tom -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aneeshaa45 at gmail.com Sat Jul 6 09:59:18 2019 From: aneeshaa45 at gmail.com (Aneeshaa Chowdhry) Date: Sat, 6 Jul 2019 14:59:18 +0100 Subject: [Pandas-dev] Pandas groupby throws: TypeError: unhashable type: 'numpy.ndarray' Message-ID: Hi Team, I have a dataframe attdf ( https://drive.google.com/open?id=1t_h4b8FQd9soVgYeiXQasY-EbnhfOEYi ) I would like to group the data by Source Class and Destination Class, count the number of rows in each group and sum up the Attention values. While trying to achieve that, I am unable to get past the type error below. Also, attdf.groupby(['Source Class', 'Destination Class']) gives me a DataFrameGroupBy object, which I'm not sure how to use to get what I want. Please advise.

--------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 attdf.groupby(['Source Class', 'Destination Class']).count()

8 frames
pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py in _factorize_array(values, na_sentinel, size_hint, na_value)
    458         table = hash_klass(size_hint or len(values))
    459         uniques, labels = table.factorize(values, na_sentinel=na_sentinel,
--> 460                                           na_value=na_value)
    461
    462         labels = ensure_platform_int(labels)

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.factorize()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()

TypeError: unhashable type: 'numpy.ndarray'

-------------- next part -------------- An HTML attachment was scrubbed... URL: From me at pietrobattiston.it Wed Jul 10 03:30:05 2019 From: me at pietrobattiston.it (Pietro Battiston) Date: Wed, 10 Jul 2019 09:30:05 +0200 Subject: [Pandas-dev] Dataframe summit @ EuroSciPy 2019 In-Reply-To: References: Message-ID: <61f7be656962ea76e3ffd6fc1984e077ccbb4b19.camel@pietrobattiston.it> Hi Marc, cool! 
I won't be able to attend Euroscipy, but if in the "Maintainers session" you plan to have a way to participate remotely, I'll definitely do. (I might be busy on the 6th instead... still don't know for sure) Pietro Il giorno gio, 04/07/2019 alle 15.45 +0100, Marc Garcia ha scritto: > Hi there, > > Just to let you know that at EuroSciPy 2019 (in September in Spain) > we will have a dataframe summit, to stay updated and coordinate among > projects replicating the pandas API (other dataframe projects are > more than welcome). > > Maintainers from all the main projects (pandas, dask, vaex, modin, > cudf and koalas) will be attending. If you want to get involved > (whether you can attend the conference or not), please DM me. > > More info: https://github.com/python-sprints/dataframe-summit > Conference website: https://www.euroscipy.org/2019/ > > Cheers! > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev From matthew.brett at gmail.com Wed Jul 10 10:32:51 2019 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 10 Jul 2019 07:32:51 -0700 Subject: [Pandas-dev] Consensus on pct_change Message-ID: Hi, ## Summary The pct_change method does not give the percent change, and that's confusing. What should be done to fix that? ## Problem: Consider the following snippet: [ins] In [3]: pd.Series([1, 1.1]).pct_change() Out[3]: 0 NaN 1 0.1 dtype: float64 Pandas thinks that the percent change from 1 to 1.1 is 0.1, but I think most of us would agree that this is incorrect - the *percent* change is 10. This is very confusing, and tripped up some of my students. There is some discussion of the problem over at this issue: https://github.com/pandas-dev/pandas/issues/20752 ## What can be done? 
I guess the viable options could be:

* Leave the method as is, but add some very clear indication in the docstring that the function does not return percent change, but proportional change.
* Do the above, but also add a keyword like 'percent=False', that, when True, multiplies the result by 100.
* Rename the method to prop_change, add back a deprecated method pct_change, that points to prop_change and warns about the confusion.

What do y'all think? Cheers, Matthew From william.ayd at icloud.com Wed Jul 10 10:47:47 2019 From: william.ayd at icloud.com (William Ayd) Date: Wed, 10 Jul 2019 07:47:47 -0700 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: References: Message-ID: Hi Matthew, Thanks for reaching out! I don't really see a reason to change this, as the use of floats to represent percentages is pretty common in Python. You can format the output as percentages if you want by doing something as follows:

>>> pd.Series([1, 1.1]).pct_change().map('{:.0%}'.format)
0    nan%
1     10%

Or use a (admittedly more verbose) lambda expression if you don't want NA values to get formatted:

>>> pd.Series([1, 1.1]).pct_change().apply(lambda x: np.nan if pd.isnull(x) else '{:.0%}'.format(x))
0    NaN
1    10%

Someone else may have even more direct ways of approaching this. - Will > On Jul 10, 2019, at 7:32 AM, Matthew Brett wrote: > > Hi, > > ## Summary > > The pct_change method does not give the percent change, and that's > confusing. What should be done to fix that? > > ## Problem: > > Consider the following snippet: > > [ins] In [3]: pd.Series([1, 1.1]).pct_change() > Out[3]: > 0 NaN > 1 0.1 > dtype: float64 > > Pandas thinks that the percent change from 1 to 1.1 is 0.1, but I > think most of us would agree that this is incorrect - the *percent* > change is 10. > > This is very confusing, and tripped up some of my students. > > There is some discussion of the problem over at this issue: > > https://github.com/pandas-dev/pandas/issues/20752 > > ## What can be done? 
> > I guess the viable options could be: > * Leave the method as is, but add some very clear indication in the > docstring that the function does not return percent change, but > proportional change. > * Do the above, but also add a keyword like 'percent=False', that, > when True, multiplies the result by 100. > * Rename the method to prop_change, add back a deprecated method > pct_change, that points to prop_change and warns about the confusion. > > What do y'all think? > > Cheers, > > Matthew > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev William Ayd william.ayd at icloud.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 10 10:54:51 2019 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 10 Jul 2019 07:54:51 -0700 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: References: Message-ID: Hi, On Wed, Jul 10, 2019 at 7:47 AM William Ayd wrote: > > Hi Matthew, > > Thanks for reaching out! I don't really see a reason to change this as the use of floats to represent percentages is pretty common Python. You can format the output to percentages if you want doing something as follows: > > >>> pd.Series([1, 1.1]).pct_change().map('{:.0%}'.format) > 0 nan% > 1 10% > > Or use a (admittedly more verbose) lambda expression if you don't want NA values to get formatted: > > >>> pd.Series([1, 1.1]).pct_change().apply(lambda x: np.nan if pd.isnull(x) else '{:.0%}'.format(x)) > 0 NaN > 1 10% Sure - of course one can display the proportions as percentages, but I am sure you'd agree, from the definition of percentage change, that the float 0.1 is a very surprising answer to percent change between 1 and 1.1. I mean, you'd surely mark that wrong if you were grading a student assignment, because they failed to do the "per cent" part of the calculation. 
Cheers, Matthew From cbartak at gmail.com Wed Jul 10 10:59:37 2019 From: cbartak at gmail.com (Chris Bartak) Date: Wed, 10 Jul 2019 09:59:37 -0500 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: References: Message-ID: I'm sure it depends on your background (mine is more financial than academic), but at least for some users this isn't confusing or surprising - I would say 10% is equivalent to 0.10. Excel works that way, as well as most pocket calculators with a '%' key. I think of it more as an output formatting issue - it's always been a weaker point to me in pandas compared to e.g. SAS, but not clear what the solution is - probably some kind of formatting metadata. Old master issue here - https://github.com/pandas-dev/pandas/issues/4668 On Wed, Jul 10, 2019 at 9:47 AM William Ayd via Pandas-dev < pandas-dev at python.org> wrote: > Hi Matthew, > > Thanks for reaching out! I don't really see a reason to change this as the > use of floats to represent percentages is pretty common Python. You can > format the output to percentages if you want doing something as follows: > > >>> pd.Series([1, 1.1]).pct_change().map('{:.0%}'.format) > 0 nan% > 1 10% > > Or use a (admittedly more verbose) lambda expression if you don't want NA > values to get formatted: > > >>> pd.Series([1, 1.1]).pct_change().apply(lambda x: np.nan if > pd.isnull(x) else '{:.0%}'.format(x)) > 0 NaN > 1 10% > > Someone else may have even more direct ways of approaching this. > > - Will > > On Jul 10, 2019, at 7:32 AM, Matthew Brett > wrote: > > Hi, > > ## Summary > > The pct_change method does not give the percent change, and that's > confusing. What should be done to fix that? > > ## Problem: > > Consider the following snippet: > > [ins] In [3]: pd.Series([1, 1.1]).pct_change() > Out[3]: > 0 NaN > 1 0.1 > dtype: float64 > > Pandas thinks that the percent change from 1 to 1.1 is 0.1, but I > think most of us would agree that this is incorrect - the *percent* > change is 10. 
> > This is very confusing, and tripped up some of my students. > > There is some discussion of the problem over at this issue: > > https://github.com/pandas-dev/pandas/issues/20752 > > ## What can be done? > > I guess the viable options could be: > > * Leave the method as is, but add some very clear indication in the > docstring that the function does not return percent change, but > proportional change. > * Do the above, but also add a keyword like 'percent=False', that, > when True, multiplies the result by 100. > * Rename the method to prop_change, add back a deprecated method > pct_change, that points to prop_change and warns about the confusion. > > What do y'all think? > > Cheers, > > Matthew > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > > > William Ayd > william.ayd at icloud.com > > > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 10 11:07:16 2019 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 10 Jul 2019 08:07:16 -0700 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: References: Message-ID: Hi, On Wed, Jul 10, 2019 at 7:59 AM Chris Bartak wrote: > > I'm sure it depends on your background (mine is more financial than academic), but at least for some users this isn't confusing or surprising - I would say 10% is equivalent to 0.10. Excel works that way, as well as most pocket calculators with a '%' key. I think we agree that 0.1 is a proportion and 10 is a percentage, and it's easy to display a proportion as a percentage (by multiplying by 100). So, if there was some way of keeping track of the values as being a proportion, and displaying as a percentage, this would make sense. 
So, the ideal might be a function called prop_change that returned the current values, but displayed as a percentage. I think you're right though - there isn't an easy way of doing that - hence the confusion. Cheers, Matthew From garcia.marc at gmail.com Thu Jul 11 06:58:38 2019 From: garcia.marc at gmail.com (Marc Garcia) Date: Thu, 11 Jul 2019 11:58:38 +0100 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: References: Message-ID: I'm -1 on changing the method, but I think the docstring of pct_change can be improved, and also include a note for that. Can you (or may be your students) open an issue or PR for this? Since you found it confusing, I think you're the best people to clarify the documentation and write it in a way that other people don't have the same misunderstanding as you just had. Thanks! On Wed, Jul 10, 2019 at 4:08 PM Matthew Brett wrote: > Hi, > > On Wed, Jul 10, 2019 at 7:59 AM Chris Bartak wrote: > > > > I'm sure it depends on your background (mine is more financial than > academic), but at least for some users this isn't confusing or surprising - > I would say 10% is equivalent to 0.10. Excel works that way, as well as > most pocket calculators with a '%' key. > > I think we agree that 0.1 is a proportion and 10 is a percentage, and > it's easy to display a proportion as a percentage (by multiplying by > 100). So, if there was some way of keeping track of the values as > being a proportion, and displaying as a percentage, this would make > sense. So, the ideal might be a function called prop_change that > returned the current values, but displayed as a percentage. > > I think you're right though - there isn't an easy way of doing that - > hence the confusion. > > Cheers, > > Matthew > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sds at gnu.org Thu Jul 11 16:23:30 2019 From: sds at gnu.org (Sam Steingold) Date: Thu, 11 Jul 2019 16:23:30 -0400 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: (Matthew Brett's message of "Wed, 10 Jul 2019 07:32:51 -0700") References: Message-ID: IMO the correct view is that "percent" vs "" > * Matthew Brett [2019-07-10 07:32:51 -0700]: > > Pandas thinks that the percent change from 1 to 1.1 is 0.1, but I > think most of us would agree that this is incorrect - the *percent* > change is 10. I think that the _relative change_ from 2 to 2.2 is 0.1. This can be _printed_ as "0.1" or "0.1000" or "10%" or "10.000%". IOW, "percent" is a _unit_, not a separate concept, so the difference between "0.1" and "10%" is like the difference between 100m and 0.1km - same value, different printed representations. Thus I think it is never a good idea for a _function_ to return its value in percentage points. Instead of `pct_change` we need `relative_change` and it should be possible to specify how the resulting values are to be printed (as raw decimals, percentages, "per mille" or something else) in different situations, ad hoc or by default. Consider ordinary `diff` - it has an inverse `cumsum` (ignoring the initial value for a second). There should be an inverse to `relative_diff` too, and if it is stored in percentage points, that inverse will be somewhat ugly. Thanks. -- Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1671 http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com http://jij.org http://iris.org.il http://americancensorship.org You don't have to prepare when your REAL friends are coming over. 
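Sam's diff/cumsum analogy can be made concrete with a short sketch (plain pandas; `rel`, `recovered` and `printed` are illustrative names, not a proposed API): compounding the relative changes with `cumprod` inverts `pct_change` the way `cumsum` inverts `diff`, and the percent sign is purely a display-time unit.

```python
import pandas as pd

s = pd.Series([2.0, 2.2, 1.98])

# pct_change returns the relative change (x[t] / x[t-1]) - 1, not a percentage
rel = s.pct_change()

# The inverse, analogous to diff -> cumsum: compound the relative changes
# back up from the initial value to recover the original series.
recovered = s.iloc[0] * (1 + rel.fillna(0)).cumprod()

# "Percent" is just a unit applied at display time to the same values
printed = rel.map("{:.1%}".format)
```

The stored values never change; only `printed` carries the percent unit, which is exactly the separation of value and representation argued for above.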
From me at pietrobattiston.it Fri Jul 12 03:30:28 2019 From: me at pietrobattiston.it (Pietro Battiston) Date: Fri, 12 Jul 2019 09:30:28 +0200 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: References: Message-ID: <3e58bf5f3e5bfe058e96a31c58c59422e061c26b.camel@pietrobattiston.it> I agree "percentage" strictly speaking denotes what goes _before_ the percent sign... but I'm -1 on changing the method behavior, and -0 on changing the method name. I'm obviously +1 on clarifying the docs. Pietro Il giorno gio, 11/07/2019 alle 11.58 +0100, Marc Garcia ha scritto: > I'm -1 on changing the method, but I think the docstring of > pct_change can be improved, and also include a note for that. > > Can you (or may be your students) open an issue or PR for this? Since > you found it confusing, I think you're the best people to clarify the > documentation and write it in a way that other people don't have the > same misunderstanding as you just had. > > Thanks! > > On Wed, Jul 10, 2019 at 4:08 PM Matthew Brett < > matthew.brett at gmail.com> wrote: > > Hi, > > > > On Wed, Jul 10, 2019 at 7:59 AM Chris Bartak > > wrote: > > > > > > I'm sure it depends on your background (mine is more financial > > than academic), but at least for some users this isn't confusing or > > surprising - I would say 10% is equivalent to 0.10. Excel works > > that way, as well as most pocket calculators with a '%' key. > > > > I think we agree that 0.1 is a proportion and 10 is a percentage, > > and > > it's easy to display a proportion as a percentage (by multiplying > > by > > 100). So, if there was some way of keeping track of the values as > > being a proportion, and displaying as a percentage, this would make > > sense. So, the ideal might be a function called prop_change that > > returned the current values, but displayed as a percentage. > > > > I think you're right though - there isn't an easy way of doing that > > - > > hence the confusion. 
> > > > Cheers, > > > > Matthew > > _______________________________________________ > > Pandas-dev mailing list > > Pandas-dev at python.org > > https://mail.python.org/mailman/listinfo/pandas-dev > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev From matthew.brett at gmail.com Fri Jul 12 04:45:31 2019 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 12 Jul 2019 01:45:31 -0700 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: References: Message-ID: Hi, On Thu, Jul 11, 2019 at 1:23 PM Sam Steingold wrote: > > IMO the correct view is that "percent" vs "" > > > * Matthew Brett [2019-07-10 07:32:51 -0700]: > > > > Pandas thinks that the percent change from 1 to 1.1 is 0.1, but I > > think most of us would agree that this is incorrect - the *percent* > > change is 10. > > I think that the _relative change_ from 2 to 2.2 is 0.1. > > This can be _printed_ as "0.1" or "0.1000" or "10%" or "10.000%". > IOW, "percent" is a _unit_, not a separate concept, so the difference > between "0.1" and "10%" is like the difference between 100m and 0.1km - > same value, different printed representations. > > Thus I think it is never a good idea for a _function_ to return its > value in percentage points. > Instead of `pct_change` we need `relative_change` and it should be > possible to specify how the resulting values are to be printed > (as raw decimals, percentages, "per mille" or something else). > in different situations ad hoc or by default. Thank you - yes - this is an excellent analysis. It's as if there was a function "distance_in_meters" and in fact it returns distance in centimeters. It represents the same thing, but the answer is incorrect, given the function name. I know that backcompatibility requires pct_change stay, but I'd like to repropose my option 3, which is to add `relative_change` or `prop_change` (to taste), and deprecate `pct_change`. 
Cheers, Matthew From tom.augspurger88 at gmail.com Mon Jul 15 12:18:09 2019 From: tom.augspurger88 at gmail.com (Tom Augspurger) Date: Mon, 15 Jul 2019 11:18:09 -0500 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: References: Message-ID: I'm OK with the way things currently are. On Fri, Jul 12, 2019 at 3:46 AM Matthew Brett wrote: > Hi, > > On Thu, Jul 11, 2019 at 1:23 PM Sam Steingold wrote: > > > > IMO the correct view is that "percent" vs "" > > > > > * Matthew Brett [2019-07-10 07:32:51 -0700]: > > > > > > Pandas thinks that the percent change from 1 to 1.1 is 0.1, but I > > > think most of us would agree that this is incorrect - the *percent* > > > change is 10. > > > > I think that the _relative change_ from 2 to 2.2 is 0.1. > > > > This can be _printed_ as "0.1" or "0.1000" or "10%" or "10.000%". > > IOW, "percent" is a _unit_, not a separate concept, so the difference > > between "0.1" and "10%" is like the difference between 100m and 0.1km - > > same value, different printed representations. > > > > Thus I think it is never a good idea for a _function_ to return its > > value in percentage points. > > Instead of `pct_change` we need `relative_change` and it should be > > possible to specify how the resulting values are to be printed > > (as raw decimals, percentages, "per mille" or something else). > > in different situations ad hoc or by default. > > Thank you - yes - this is an excellent analysis. > > It's as if there was a function "distance_in_meters" and in fact it > returns distance in centimeters. It represents the same thing, but > the answer is incorrect, given the function name. > > I know that backcompatibility requires pct_change stay, but I'd like > to repropose my option 3, which is to add `relative_change` or > `prop_change` (to taste), and deprecate `pct_change`. 
> > Cheers, > > Matthew > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From contribute at tensortable.com Mon Jul 15 16:21:35 2019 From: contribute at tensortable.com (Tensortable.com) Date: Mon, 15 Jul 2019 20:21:35 +0000 Subject: [Pandas-dev] Consensus on pct_change In-Reply-To: References: Message-ID: <942387d3-d1db-4402-8fbd-d8de8fc29fbd@www.fastmail.com> Yeah, me too.. -- Terji Petersen On Mon, Jul 15, 2019, at 4:18 PM, Tom Augspurger wrote: > I'm OK with the way things currently are. > > > On Fri, Jul 12, 2019 at 3:46 AM Matthew Brett wrote: >> Hi, >> >> On Thu, Jul 11, 2019 at 1:23 PM Sam Steingold wrote: >> > >> > IMO the correct view is that "percent" vs "" >> > >> > > * Matthew Brett [2019-07-10 07:32:51 -0700]: >> > > >> > > Pandas thinks that the percent change from 1 to 1.1 is 0.1, but I >> > > think most of us would agree that this is incorrect - the *percent* >> > > change is 10. >> > >> > I think that the _relative change_ from 2 to 2.2 is 0.1. >> > >> > This can be _printed_ as "0.1" or "0.1000" or "10%" or "10.000%". >> > IOW, "percent" is a _unit_, not a separate concept, so the difference >> > between "0.1" and "10%" is like the difference between 100m and 0.1km - >> > same value, different printed representations. >> > >> > Thus I think it is never a good idea for a _function_ to return its >> > value in percentage points. >> > Instead of `pct_change` we need `relative_change` and it should be >> > possible to specify how the resulting values are to be printed >> > (as raw decimals, percentages, "per mille" or something else). >> > in different situations ad hoc or by default. >> >> Thank you - yes - this is an excellent analysis. >> >> It's as if there was a function "distance_in_meters" and in fact it >> returns distance in centimeters. 
It represents the same thing, but >> the answer is incorrect, given the function name. >> >> I know that backcompatibility requires pct_change stay, but I'd like >> to repropose my option 3, which is to add `relative_change` or >> `prop_change` (to taste), and deprecate `pct_change`. >> >> Cheers, >> >> Matthew >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> https://mail.python.org/mailman/listinfo/pandas-dev > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcia.marc at gmail.com Tue Jul 16 07:24:45 2019 From: garcia.marc at gmail.com (Marc Garcia) Date: Tue, 16 Jul 2019 12:24:45 +0100 Subject: [Pandas-dev] Dataframe summit @ EuroSciPy 2019 In-Reply-To: <61f7be656962ea76e3ffd6fc1984e077ccbb4b19.camel@pietrobattiston.it> References: <61f7be656962ea76e3ffd6fc1984e077ccbb4b19.camel@pietrobattiston.it> Message-ID: For the people who have shown interest in joining remotely, I added you to the repo of the summit [1], feel free to open issues there for the topics you're interested in discussing. I also created a Gitter channel that you can join [2]. EuroSciPy doesn't currently have budget to live stream the session, but if we find a sponsor we'll do it, and also publish the recording on YouTube. Based on the experience with the European pandas summit this seems unlikely. Cheers! 1. https://github.com/python-sprints/dataframe-summit 2. https://gitter.im/py-sprints/dataframe-summit On Wed, Jul 10, 2019 at 8:30 AM Pietro Battiston wrote: > Hi Marc, > > cool! > > I won't be able to attend Euroscipy, but if in the "Maintainers > session" you plan to have a way to participate remotely, I'll > definitely do. > > (I might be busy on the 6th instead... 
still don't know for sure) > > Pietro > > Il giorno gio, 04/07/2019 alle 15.45 +0100, Marc Garcia ha scritto: > > Hi there, > > > > Just to let you know that at EuroSciPy 2019 (in September in Spain) > > we will have a dataframe summit, to stay updated and coordinate among > > projects replicating the pandas API (other dataframe projects are > > more than welcome). > > > > Maintainers from all the main projects (pandas, dask, vaex, modin, > > cudf and koalas) will be attending. If you want to get involved > > (whether you can attend the conference or not), please DM me. > > > > More info: https://github.com/python-sprints/dataframe-summit > > Conference website: https://www.euroscipy.org/2019/ > > > > Cheers! > > _______________________________________________ > > Pandas-dev mailing list > > Pandas-dev at python.org > > https://mail.python.org/mailman/listinfo/pandas-dev > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jorisvandenbossche at gmail.com Tue Jul 16 18:55:52 2019 From: jorisvandenbossche at gmail.com (Joris Van den Bossche) Date: Tue, 16 Jul 2019 18:55:52 -0400 Subject: [Pandas-dev] Plans for pandas 0.25.0 and pandas 1.0 Message-ID: Hi all, We had some discussion about this on the in-person dev sprint end of June, and I thought it would be good to have some public record of this as well. A pandas 0.25.0 release is close (the RC was released earlier this month), see https://github.com/pandas-dev/pandas/issues/24950 For pandas 1.0, the current plan is to finally "just do it". The idea is that it should not take too long after 0.25.0, without additional major API changes (additions are fine of course) but with removing the current deprecated functionalities. Depending on how much feedback there is on 0.25.0 and on how smoothly it goes for removing deprecated stuff, we could (maybe optimistically) target September for that. Comments certainly welcome! 
Joris -------------- next part -------------- An HTML attachment was scrubbed... URL: From contribute at tensortable.com Tue Jul 16 19:14:30 2019 From: contribute at tensortable.com (Tensortable.com) Date: Tue, 16 Jul 2019 23:14:30 +0000 Subject: [Pandas-dev] Plans for pandas 0.25.0 and pandas 1.0 In-Reply-To: References: Message-ID: <1b9510f1-e872-4b9b-92ef-616f8978adb0@www.fastmail.com> What will be the deprecation procedures post 1.0? If a deprecation doesn't make it for 1.0, will that be an effective moratorium on deprecation for e.g. a year? I would actually like that stability, but that also means an extra effort to get all deprecation done now could be worth it. -- Terji Petersen On Tue, Jul 16, 2019, at 10:56 PM, Joris Van den Bossche wrote: > Hi all, > > We had some discussion about this on the in-person dev sprint end of June, and I thought it would be good to have some public record of this as well. > > A pandas 0.25.0 release is close (the RC was released earlier this month), see https://github.com/pandas-dev/pandas/issues/24950 > > For pandas 1.0, the current plan is to finally "just do it". The idea is that it should not take too long after 0.25.0, without additional major API changes (additions are fine of course) but with removing the current deprecated functionalities. > Depending on how much feedback there is on 0.25.0 and on how smoothly it goes for removing deprecated stuff, we could (maybe optimistically) target September for that. > > Comments certainly welcome! > > Joris > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jorisvandenbossche at gmail.com Tue Jul 16 19:21:13 2019 From: jorisvandenbossche at gmail.com (Joris Van den Bossche) Date: Tue, 16 Jul 2019 19:21:13 -0400 Subject: [Pandas-dev] Plans for pandas 0.25.0 and pandas 1.0 In-Reply-To: <1b9510f1-e872-4b9b-92ef-616f8978adb0@www.fastmail.com> References: <1b9510f1-e872-4b9b-92ef-616f8978adb0@www.fastmail.com> Message-ID: Op di 16 jul. 2019 om 19:14 schreef Tensortable.com < contribute at tensortable.com>: > What will be the deprecation procedures post 1.0? If a deprecation doesn't > make it for 1.0, will that be an effective moratorium on deprecation for > e.g. a year? > > I would actually like that stability, but that also means an extra effort > to get all depecation done now could be worth an extra effort. > We can still do deprecations in the 1.x releases, that is no problem. But how to handle removing deprecations and breaking changes in the versioning scheme is something we still need to discuss (eg "rolling deprecations" (each deprecation is kept for ca 3 releases, like we currently have been doing) or removing them in a version bump (like eg django does. Note we can do major version bumps more regularly than we did up to now ;)). But I would propose to have this discussion in a separate, dedicated thread. > > -- > Terji Petersen > > > > > On Tue, Jul 16, 2019, at 10:56 PM, Joris Van den Bossche wrote: > Hi all, > > We had some discussion about this on the in-person dev sprint end of June, > and I thought it would be good to have some public record of this as well. > > A pandas 0.25.0 release is close (the RC was released earlier this month), > see https://github.com/pandas-dev/pandas/issues/24950 > > For pandas 1.0, the current plan is to finally "just do it". The idea is > that it should not take too long after 0.25.0, without additional major API > changes (additions are fine of course) but with removing the current > deprecated functionalities. 
> Depending on how much feedback there is on 0.25.0 and on how smoothly it > goes for removing deprecated stuff, we could (maybe optimistically) target > September for that. > > Comments certainly welcome! > > Joris > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Tue Jul 16 19:25:47 2019 From: jeffreback at gmail.com (Jeff Reback) Date: Tue, 16 Jul 2019 19:25:47 -0400 Subject: [Pandas-dev] Plans for pandas 0.25.0 and pandas 1.0 In-Reply-To: <1b9510f1-e872-4b9b-92ef-616f8978adb0@www.fastmail.com> References: <1b9510f1-e872-4b9b-92ef-616f8978adb0@www.fastmail.com> Message-ID: <397FCD56-F42B-42A7-9A1D-40D53F36A8A1@gmail.com> No, we will still deprecate after 1.0, but we might need the deprecation to stay in longer (maybe until a major version bump). That simply means that we might bump major versions more often, though we need to have some discussion about this. > On Jul 16, 2019, at 7:14 PM, Tensortable.com wrote: > > What will be the deprecation procedures post 1.0? If a deprecation doesn't make it for 1.0, will that be an effective moratorium on deprecation for e.g. a year? > > I would actually like that stability, but that also means an extra effort to get all depecation done now could be worth an extra effort. > > -- > Terji Petersen > > > > >> On Tue, Jul 16, 2019, at 10:56 PM, Joris Van den Bossche wrote: >> Hi all, >> >> We had some discussion about this on the in-person dev sprint end of June, and I thought it would be good to have some public record of this as well.
>> >> A pandas 0.25.0 release is close (the RC was released earlier this month), see https://github.com/pandas-dev/pandas/issues/24950 >> >> For pandas 1.0, the current plan is to finally "just do it". The idea is that it should not take too long after 0.25.0, without additional major API changes (additions are fine of course) but with removing the current deprecated functionalities. >> Depending on how much feedback there is on 0.25.0 and on how smoothly it goes for removing deprecated stuff, we could (maybe optimistically) target September for that. >> >> Comments certainly welcome! >> >> Joris >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> https://mail.python.org/mailman/listinfo/pandas-dev >> > > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.augspurger88 at gmail.com Wed Jul 17 16:49:24 2019 From: tom.augspurger88 at gmail.com (Tom Augspurger) Date: Wed, 17 Jul 2019 15:49:24 -0500 Subject: [Pandas-dev] Version Policy following 1.0 Message-ID: Split from https://mail.python.org/pipermail/pandas-dev/2019-July/001030.html Following 1.0, I think we stop outright breaking APIs. I think that stability will be welcome to users. We still have to decide how we deprecate APIs. The two options are

1. Rolling deprecations: Essentially what we do today: An API is deprecated in release 1.1.0 and can be removed in (say) 1.4.0.
2. SemVer: An API may be deprecated in 1.x.0. It can be removed in 2.0.0

Do people have preferences between the two? The (dis?)advantage of Semver is that all the API-breaking changes are restricted to a single release. With rolling deprecations, the upgrades from any 1.x to 1.y should be smoother than 1.x to 2.x. Once we choose a strategy, we may want to formalize release schedules around it.
Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbrockmendel at gmail.com Wed Jul 17 17:42:47 2019 From: jbrockmendel at gmail.com (Brock Mendel) Date: Wed, 17 Jul 2019 14:42:47 -0700 Subject: [Pandas-dev] Version Policy following 1.0 In-Reply-To: References: Message-ID: Do we anticipate the rate of deprecations decreasing significantly? i.e. if right now we deprecated everything on which there is a consensus in GH, would we be done for a while? If not, then I think we're better off sticking with zero-dot-*, or else we'll be bumping major versions really frequently. On Wed, Jul 17, 2019 at 1:49 PM Tom Augspurger wrote: > Split from > https://mail.python.org/pipermail/pandas-dev/2019-July/001030.html > > Following 1.0, I think we stop outright breaking APIs. I think that > stability will be welcome to users. > > We still have to decide how we deprecate APIs. The two options are > > 1. Rolling deprecations: Essentially what we do today: An API is > deprecated in release 1.1.0 and can be removed in (say) 1.4.0. > 2. SemVer: An API may be deprecated in 1.x.0. It can be removed in 2.0.0 > > Do people have preferences between the two? The (dis?)advantage of Semver > is that all the API-breaking changes are restricted to a single releases. > With rolling deprecations, the upgrades from any 1.x to 1.y should be > smoother than 1.x to 2.x. > > Once we choose a strategy, we may want to formalize release schedules > around it. > > Tom > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From william.ayd at icloud.com Thu Jul 18 20:28:50 2019 From: william.ayd at icloud.com (William Ayd) Date: Thu, 18 Jul 2019 17:28:50 -0700 Subject: [Pandas-dev] ANN: Pandas 0.25.0 Released Message-ID: <007E9200-27F0-4698-9F05-8ADDADA1B537@icloud.com> Hi, I am pleased to announce the release of pandas 0.25.0. This is a major release from 0.24.2 and includes a number of API changes, new features, enhancements, and performance improvements along with a large number of bug fixes. Highlights include:

- Dropped Python 2 support
- Groupby aggregation with relabeling
- Better repr for MultiIndex
- Better truncated repr for Series and DataFrame
- Series.explode to split list-like values to rows

See the release notes for a full list of all the changes from 0.24.2. The release can be installed with conda using the conda-forge channel

conda install -c conda-forge pandas

Or via PyPI

python3 -m pip install --upgrade pandas

Please report any issues with the release on the pandas issue tracker. Will _______________________________________________ Pandas-dev mailing list Pandas-dev at python.org https://mail.python.org/mailman/listinfo/pandas-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.gidden at gmail.com Fri Jul 19 06:45:20 2019 From: matthew.gidden at gmail.com (Matthew Gidden) Date: Fri, 19 Jul 2019 12:45:20 +0200 Subject: [Pandas-dev] Avoiding Dependencies on Private Modules Message-ID: Hi all, I am one of the developers of a niche scientific library, pyam [1]. We provide some standard plotting routines for users with our specific data model, many of which are shims for pandas plotting. In order to keep track of certain aspects of data-plot attributes, we utilize the `_get_standard_colors()` function [2]. Before pandas 0.25.0, this lived in pandas.plotting._style. Now it lives in pandas.plotting._matplotlib.style. Of course, this change breaks our package - we know this is our fault for relying on a private module =).
Therefore, I wanted to ask the list whether there is a better way to get this functionality through first-class public interfaces. For now we can hotfix the issue, but we would like to try to guarantee stability in the future. Cheers, Matt [1] https://pyam-iamc.readthedocs.io/en/latest/ [2] https://github.com/IAMconsortium/pyam/blob/master/pyam/plotting.py#L37 -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcia.marc at gmail.com Fri Jul 19 07:27:54 2019 From: garcia.marc at gmail.com (Marc Garcia) Date: Fri, 19 Jul 2019 12:27:54 +0100 Subject: [Pandas-dev] Avoiding Dependencies on Private Modules In-Reply-To: References: Message-ID: I don't think there is any better way. That's just an internal method, and I don't think it makes sense to have it in the pandas public API. Maybe you can check with Matplotlib if that function is something they want to provide, that sounds like the best option to me if it's useful for other projects. If they are not interested, I'd personally just copy that file into your project. Surely not an ideal solution, but of all the bad solutions I'd say it's the best. There is no discussion about it yet, but we may want to move `pandas.plotting._matplotlib` to a third-party package in the future. That would break your code again if you just update the module path. On Fri, Jul 19, 2019 at 12:10 PM Matthew Gidden wrote: > Hi all, > > I am one of the developers of a niche scientific library, pyam [1]. We > provide some standard plotting routines for users with our specific data > model, many of which are shims for pandas plotting. In order to keep track > of certain aspects of data-plot attributes, we utilize the > `_get_standard_colors()` [2]. Before pandas 0.25.0, this lived in > pandas.plotting._style. Now it lives in pandas.plotting._matplotlib.style. > Of course, this change breaks our package - we know this is our fault for > relying on a private module =).
> > Therefore - I wanted to ask the list whether there is a better way to get > this functionality that is provided in first-class public interfaces? For > now we can hotfix the issue, but we would like to try to guarantee > stability in the future. > > Cheers, > Matt > > [1] https://pyam-iamc.readthedocs.io/en/latest/ > [2] https://github.com/IAMconsortium/pyam/blob/master/pyam/plotting.py#L37 > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcia.marc at gmail.com Fri Jul 19 07:42:28 2019 From: garcia.marc at gmail.com (Marc Garcia) Date: Fri, 19 Jul 2019 12:42:28 +0100 Subject: [Pandas-dev] Avoiding Dependencies on Private Modules In-Reply-To: References: Message-ID: I'm unsure, I think someone else should be able to answer about the license. But I think it's worth opening the issue with Matplotlib. If that function is useful for different use cases, I think they may be happy to move it there, and it'd also be good for pandas to just call the Matplotlib function directly. Thanks! On Fri, Jul 19, 2019 at 12:34 PM Matthew Gidden wrote: > Hi Marc, > > Thanks for the quick reply. I agree with your "least bad of all solutions" > approach here. I'm a bit rusty on the correct practice for doing so - do we > need to copy over also panda's license file? Or are we ok with a > notification via comment at the top of the copied file/function? > > Cheers, > Matt > > On Fri, Jul 19, 2019 at 1:28 PM Marc Garcia wrote: > >> I don't think there is any better way. That's just an internal method, >> and I don't think it makes sense to have it in the pandas public API. >> >> May be you can check with Matplotlib if that function is something they >> want to provide, that sounds like the best option to me if it's useful for >> other projects.
>> >> If they are not interested, I'd personally just copy that file into your >> project. Surely not an ideal solution, but of all the bad solutions I'd say >> it's the best. There is no discussion about it yet, but we may want to move >> `pandas.plotting._matplotlib` to a third-party package in the future. That >> would break your code again if you just update the module path. >> >> On Fri, Jul 19, 2019 at 12:10 PM Matthew Gidden >> wrote: >> >>> Hi all, >>> >>> I am one of the developers of a niche scientific library, pyam [1]. We >>> provide some standard plotting routines for users with our specific data >>> model, many of which are shims for pandas plotting. In order to keep track >>> of certain aspects of data-plot attributes, we utilize the >>> `_get_standard_colors()` [2]. Before pandas 0.25.0, this lived in >>> pandas.plotting._style. Now it lives in pandas.plotting._matplotlib.style. >>> Of course, this change breaks our package - we know this is our fault for >>> relying on a private module =). >>> >>> Therefore - I wanted to ask the list whether there is a better way to >>> get this functionality that is provided in first-class public interfaces? >>> For now we can hotfix the issue, but we would like to try to guarantee >>> stability in the future. >>> >>> Cheers, >>> Matt >>> >>> [1] https://pyam-iamc.readthedocs.io/en/latest/ >>> [2] >>> https://github.com/IAMconsortium/pyam/blob/master/pyam/plotting.py#L37 >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> https://mail.python.org/mailman/listinfo/pandas-dev >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.gidden at gmail.com Fri Jul 19 07:33:57 2019 From: matthew.gidden at gmail.com (Matthew Gidden) Date: Fri, 19 Jul 2019 13:33:57 +0200 Subject: [Pandas-dev] Avoiding Dependencies on Private Modules In-Reply-To: References: Message-ID: Hi Marc, Thanks for the quick reply. 
I agree with your "least bad of all solutions" approach here. I'm a bit rusty on the correct practice for doing so - do we also need to copy over pandas' license file? Or are we ok with a notification via comment at the top of the copied file/function? Cheers, Matt On Fri, Jul 19, 2019 at 1:28 PM Marc Garcia wrote: > I don't think there is any better way. That's just an internal method, and > I don't think it makes sense to have it in the pandas public API. > > May be you can check with Matplotlib if that function is something they > want to provide, that sounds like the best option to me if it's useful for > other projects. > > If they are not interested, I'd personally just copy that file into your > project. Surely not an ideal solution, but of all the bad solutions I'd say > it's the best. There is no discussion about it yet, but we may want to move > `pandas.plotting._matplotlib` to a third-party package in the future. That > would break your code again if you just update the module path. > > On Fri, Jul 19, 2019 at 12:10 PM Matthew Gidden > wrote: > >> Hi all, >> >> I am one of the developers of a niche scientific library, pyam [1]. We >> provide some standard plotting routines for users with our specific data >> model, many of which are shims for pandas plotting. In order to keep track >> of certain aspects of data-plot attributes, we utilize the >> `_get_standard_colors()` [2]. Before pandas 0.25.0, this lived in >> pandas.plotting._style. Now it lives in pandas.plotting._matplotlib.style. >> Of course, this change breaks our package - we know this is our fault for >> relying on a private module =).
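[Editor's note: the two approaches discussed in this thread - importing from whichever private pandas location exists, with vendoring as a fallback - can be combined in a small compatibility shim. A minimal sketch, assuming the two module paths named above; the function name `locate_get_standard_colors` and the convention of returning None (so the caller falls back to a vendored copy) are illustrative choices of this sketch, not pandas or pyam API:]

```python
import importlib


def locate_get_standard_colors():
    """Find pandas' private color helper across known module locations.

    Tries the newest known path first. Returns None when no known
    location works, so the caller can fall back to a vendored copy.
    """
    candidates = [
        "pandas.plotting._matplotlib.style",  # pandas >= 0.25.0
        "pandas.plotting._style",             # pandas < 0.25.0
    ]
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, "_get_standard_colors")
        except (ImportError, AttributeError):
            continue  # location missing or function renamed; try the next one
    return None
```

Because both locations are private, any pandas release may move or rename the function again; treating None as "use the vendored copy" keeps that breakage contained in one place.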
>> >> Cheers, >> Matt >> >> [1] https://pyam-iamc.readthedocs.io/en/latest/ >> [2] >> https://github.com/IAMconsortium/pyam/blob/master/pyam/plotting.py#L37 >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> https://mail.python.org/mailman/listinfo/pandas-dev >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.augspurger88 at gmail.com Fri Jul 19 07:59:34 2019 From: tom.augspurger88 at gmail.com (Tom Augspurger) Date: Fri, 19 Jul 2019 06:59:34 -0500 Subject: [Pandas-dev] Backport Policy Message-ID: For our new maintainers (and as a reminder): We have a backport branch 0.25.x. Before merging a bugfix PR that should be backported, ensure that it has the 0.25.1 milestone. The backport bot will take care of opening a PR against 0.25.x with the necessary changes. We'd like to be fairly conservative with backports to 0.25.x. Pretty much just bug fixes. Performance improvements, new features, etc. should be merged with the 1.0 milestone. And FYI, our current version on master is 0.26.0.dev0, rather than 1.0.0.dev0, in case we need to issue a 0.26 before 1.0. But the plan is still to do 1.0 for the next major release. Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.augspurger88 at gmail.com Fri Jul 19 16:58:09 2019 From: tom.augspurger88 at gmail.com (Tom Augspurger) Date: Fri, 19 Jul 2019 15:58:09 -0500 Subject: [Pandas-dev] Version Policy following 1.0 In-Reply-To: References: Message-ID: On Wed, Jul 17, 2019 at 4:42 PM Brock Mendel wrote: > Do we anticipate the rate of deprecations decreasing significantly? i.e. > if right now we deprecated everything on which there is a consensus in GH, > would we be done for a while? > > If not, then I think we're better off sticking with zero-dot-*, or else > we'll be bumping major versions really frequently. > I think this is why I prefer sticking with rolling. 
If every release bumps the major version number, then no release is a major release. So my preference would be

1. Formally adopt and document that we are using a rolling deprecation cycle
2. State that deprecations will be around for `N` major releases (3?)
3. Require that every new deprecation includes the version it'll be enforced in. e.g.

```
DataFrame.get_dtype_counts is deprecated, and will be removed in pandas 1.3.0.
Use DataFrame.dtypes.value_counts() instead.
```

Tom On Wed, Jul 17, 2019 at 1:49 PM Tom Augspurger > wrote: > >> Split from >> https://mail.python.org/pipermail/pandas-dev/2019-July/001030.html >> >> Following 1.0, I think we stop outright breaking APIs. I think that >> stability will be welcome to users. >> >> We still have to decide how we deprecate APIs. The two options are >> >> 1. Rolling deprecations: Essentially what we do today: An API is >> deprecated in release 1.1.0 and can be removed in (say) 1.4.0. >> 2. SemVer: An API may be deprecated in 1.x.0. It can be removed in 2.0.0 >> >> Do people have preferences between the two? The (dis?)advantage of Semver >> is that all the API-breaking changes are restricted to a single releases. >> With rolling deprecations, the upgrades from any 1.x to 1.y should be >> smoother than 1.x to 2.x. >> >> Once we choose a strategy, we may want to formalize release schedules >> around it. >> >> Tom >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> https://mail.python.org/mailman/listinfo/pandas-dev >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From metischicagoeventassociate at gmail.com Sat Jul 20 16:54:14 2019 From: metischicagoeventassociate at gmail.com (Metis Chicago) Date: Sat, 20 Jul 2019 15:54:14 -0500 Subject: [Pandas-dev] The FREE Demystifying Data Science Conference is happening again! July 30-31!!!
Message-ID: Metis is hosting our 3rd annual *FREE *Demystifying Data Science Conference July 30-31 from 10am - 5pm ET! There will be 16 interactive data science talks and 6 workshops led by industry-leading speakers. Experience interactivity via real-time chat with a worldwide audience, polling, and social sharing. Plus, get access to recordings of every presentation and workshop. We speak *Python*. Day 1 - July 30 will be for aspiring data scientists. Day 2 - July 31 will be for business leaders and practitioners. Registration page: https://www.thisismetis.com/demystifying-data-science -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcia.marc at gmail.com Sun Jul 21 05:23:49 2019 From: garcia.marc at gmail.com (Marc Garcia) Date: Sun, 21 Jul 2019 10:23:49 +0100 Subject: [Pandas-dev] Version Policy following 1.0 In-Reply-To: References: Message-ID: Personally, I think it will make things easier for users if we use SemVer. Mainly because as a user I think it's somewhat intuitive that I can upgrade for example from 1.1.0 to 1.8.0 without having to edit code. But I'd expect to have to take a closer look and see incompatibilities when I migrate from 1.* to 2.*. Personally I don't usually know the deprecation policies of packages, and I don't expect most pandas users to know ours even now. So, the simpler the better. Also, for ourselves, I think it's easier and more efficient to forget about removing code in minors, and to do all the removals for majors. I agree this comes at the cost of bigger changes in major releases, compared to a rolling policy. But IMHO it's worth it. On Fri, 19 Jul 2019, 21:58 Tom Augspurger, wrote: > On Wed, Jul 17, 2019 at 4:42 PM Brock Mendel > wrote: > >> Do we anticipate the rate of deprecations decreasing significantly? i.e. >> if right now we deprecated everything on which there is a consensus in >> GH, would we be done for a while?
>> >> If not, then I think we're better off sticking with zero-dot-*, or else >> we'll be bumping major versions really frequently. >> > > I think this is why I prefer sticking with rolling. If every release bumps > the major version number, then no release is a major release. > > So my preference would be > > 1. Formally adopt and document that we using a rolling deprecation cycle > 2. State that deprecations will be around for `N` major releases (3?) > 3. Require that every new deprecation includes the version it'll be > enforced in. e.g. > > ``` > DataFrame.get_dtype_counts is deprecated, and will be removed in pandas > 1.3.0. > Use DataFrame.dtypes.value_counts() instead. > ``` > > Tom > > On Wed, Jul 17, 2019 at 1:49 PM Tom Augspurger >> wrote: >> >>> Split from >>> https://mail.python.org/pipermail/pandas-dev/2019-July/001030.html >>> >>> Following 1.0, I think we stop outright breaking APIs. I think that >>> stability will be welcome to users. >>> >>> We still have to decide how we deprecate APIs. The two options are >>> >>> 1. Rolling deprecations: Essentially what we do today: An API is >>> deprecated in release 1.1.0 and can be removed in (say) 1.4.0. >>> 2. SemVer: An API may be deprecated in 1.x.0. It can be removed in 2.0.0 >>> >>> Do people have preferences between the two? The (dis?)advantage of >>> Semver is that all the API-breaking changes are restricted to a single >>> releases. With rolling deprecations, the upgrades from any 1.x to 1.y >>> should be smoother than 1.x to 2.x. >>> >>> Once we choose a strategy, we may want to formalize release schedules >>> around it. 
>>> >>> Tom >>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> https://mail.python.org/mailman/listinfo/pandas-dev >>> >> _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrocklin at gmail.com Sun Jul 21 11:10:07 2019 From: mrocklin at gmail.com (Matthew Rocklin) Date: Sun, 21 Jul 2019 08:10:07 -0700 Subject: [Pandas-dev] Version Policy following 1.0 In-Reply-To: References: Message-ID: Hi All, I hope you don't mind the intrusion of a non-pandas dev here. In my opinion SemVer makes more sense for libraries with a well defined and narrowly scoped API, and less sense for an API as vast as the Pandas API. My ardent hope as a user is that you all will clean up and improve the Pandas API continuously. While doing this work I fully expect small bits of the API to break on pretty much every release (I think that it would be hard to avoid this). My guess is that if this community adopted SemVer then devs would be far more cautious about tidying things, which I think would be unfortunate. As someone who is very sensitive to changes in the Pandas API I'm fully in support of the devs breaking things regularly if it means faster progress. Best, -matt On Sun, Jul 21, 2019 at 2:24 AM Marc Garcia wrote: > Personally, I think will make things easier for users if we use SemVer. > Mainly because as a user I think it's somehow intuitive that I can upgrade > for example from 1.1.0 to 1.8.0 without having to edit code. But I'd expect > to have to take a closer look and see incompatibilities when I migrate from > 1.* to 2.*. > > Personally I don't usually know the deprecation policies of packages, and > I don't expect most pandas users to know ours even now. So, the simplest > the better. 
> > Also, for ourselves, I think it's easier and more efficient to forget > about removing code in minors, and to do all the removals for majors. > > I agree this comes at the cost of bigger changes in major releases, > compared to a rolling policy. But IMHO it's worth. > > > On Fri, 19 Jul 2019, 21:58 Tom Augspurger, > wrote: > >> On Wed, Jul 17, 2019 at 4:42 PM Brock Mendel >> wrote: >> >>> Do we anticipate the rate of deprecations decreasing significantly? >>> i.e. if right now we deprecated everything on which there is a consensus in >>> GH, would we be done for a while? >>> >>> If not, then I think we're better off sticking with zero-dot-*, or else >>> we'll be bumping major versions really frequently. >>> >> >> I think this is why I prefer sticking with rolling. If every release >> bumps the major version number, then no release is a major release. >> >> So my preference would be >> >> 1. Formally adopt and document that we using a rolling deprecation cycle >> 2. State that deprecations will be around for `N` major releases (3?) >> 3. Require that every new deprecation includes the version it'll be >> enforced in. e.g. >> >> ``` >> DataFrame.get_dtype_counts is deprecated, and will be removed in pandas >> 1.3.0. >> Use DataFrame.dtypes.value_counts() instead. >> ``` >> >> Tom >> >> On Wed, Jul 17, 2019 at 1:49 PM Tom Augspurger < >>> tom.augspurger88 at gmail.com> wrote: >>> >>>> Split from >>>> https://mail.python.org/pipermail/pandas-dev/2019-July/001030.html >>>> >>>> Following 1.0, I think we stop outright breaking APIs. I think that >>>> stability will be welcome to users. >>>> >>>> We still have to decide how we deprecate APIs. The two options are >>>> >>>> 1. Rolling deprecations: Essentially what we do today: An API is >>>> deprecated in release 1.1.0 and can be removed in (say) 1.4.0. >>>> 2. SemVer: An API may be deprecated in 1.x.0. It can be removed in 2.0.0 >>>> >>>> Do people have preferences between the two? 
The (dis?)advantage of >>>> Semver is that all the API-breaking changes are restricted to a single >>>> releases. With rolling deprecations, the upgrades from any 1.x to 1.y >>>> should be smoother than 1.x to 2.x. >>>> >>>> Once we choose a strategy, we may want to formalize release schedules >>>> around it. >>>> >>>> Tom >>>> _______________________________________________ >>>> Pandas-dev mailing list >>>> Pandas-dev at python.org >>>> https://mail.python.org/mailman/listinfo/pandas-dev >>>> >>> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> https://mail.python.org/mailman/listinfo/pandas-dev >> > _______________________________________________ > Pandas-dev mailing list > Pandas-dev at python.org > https://mail.python.org/mailman/listinfo/pandas-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.augspurger88 at gmail.com Mon Jul 22 11:54:15 2019 From: tom.augspurger88 at gmail.com (Tom Augspurger) Date: Mon, 22 Jul 2019 10:54:15 -0500 Subject: [Pandas-dev] Version Policy following 1.0 In-Reply-To: References: Message-ID: Thanks Matt, Marc, do you have thoughts on that? As you say, SemVer is more popular in absolute terms. But within our little community (NumPy, pandas, scikit-learn), rolling deprecations seems to be the preferred approach. I think there's some value in being consistent with those libraries. Joris / Wes, do you know what Arrow's policy will be after its 1.0? Tom On Sun, Jul 21, 2019 at 10:10 AM Matthew Rocklin wrote: > Hi All, > > I hope you don't mind the intrusion of a non-pandas dev here. In my > opinion SemVer makes more sense for libraries with a well defined and > narrowly scoped API, and less sense for an API as vast as the Pandas API. > > My ardent hope as a user is that you all will clean up and improve the > Pandas API continuously. 
While doing this work I fully expect small bits > of the API to break on pretty much every release (I think that it would be > hard to avoid this). My guess is that if this community adopted SemVer > then devs would be far more cautious about tidying things, which I think > would be unfortunate. As someone who is very sensitive to changes in the > Pandas API I'm fully in support of the devs breaking things regularly if it > means faster progress. > > Best, > -matt > > On Sun, Jul 21, 2019 at 2:24 AM Marc Garcia wrote: > >> Personally, I think will make things easier for users if we use SemVer. >> Mainly because as a user I think it's somehow intuitive that I can upgrade >> for example from 1.1.0 to 1.8.0 without having to edit code. But I'd expect >> to have to take a closer look and see incompatibilities when I migrate from >> 1.* to 2.*. >> >> Personally I don't usually know the deprecation policies of packages, and >> I don't expect most pandas users to know ours even now. So, the simplest >> the better. >> >> Also, for ourselves, I think it's easier and more efficient to forget >> about removing code in minors, and to do all the removals for majors. >> >> I agree this comes at the cost of bigger changes in major releases, >> compared to a rolling policy. But IMHO it's worth. >> >> >> On Fri, 19 Jul 2019, 21:58 Tom Augspurger, >> wrote: >> >>> On Wed, Jul 17, 2019 at 4:42 PM Brock Mendel >>> wrote: >>> >>>> Do we anticipate the rate of deprecations decreasing significantly? >>>> i.e. if right now we deprecated everything on which there is a consensus in >>>> GH, would we be done for a while? >>>> >>>> If not, then I think we're better off sticking with zero-dot-*, or else >>>> we'll be bumping major versions really frequently. >>>> >>> >>> I think this is why I prefer sticking with rolling. If every release >>> bumps the major version number, then no release is a major release. >>> >>> So my preference would be >>> >>> 1. 
Formally adopt and document that we using a rolling deprecation cycle >>> 2. State that deprecations will be around for `N` major releases (3?) >>> 3. Require that every new deprecation includes the version it'll be >>> enforced in. e.g. >>> >>> ``` >>> DataFrame.get_dtype_counts is deprecated, and will be removed in pandas >>> 1.3.0. >>> Use DataFrame.dtypes.value_counts() instead. >>> ``` >>> >>> Tom >>> >>> On Wed, Jul 17, 2019 at 1:49 PM Tom Augspurger < >>>> tom.augspurger88 at gmail.com> wrote: >>>> >>>>> Split from >>>>> https://mail.python.org/pipermail/pandas-dev/2019-July/001030.html >>>>> >>>>> Following 1.0, I think we stop outright breaking APIs. I think that >>>>> stability will be welcome to users. >>>>> >>>>> We still have to decide how we deprecate APIs. The two options are >>>>> >>>>> 1. Rolling deprecations: Essentially what we do today: An API is >>>>> deprecated in release 1.1.0 and can be removed in (say) 1.4.0. >>>>> 2. SemVer: An API may be deprecated in 1.x.0. It can be removed in >>>>> 2.0.0 >>>>> >>>>> Do people have preferences between the two? The (dis?)advantage of >>>>> Semver is that all the API-breaking changes are restricted to a single >>>>> releases. With rolling deprecations, the upgrades from any 1.x to 1.y >>>>> should be smoother than 1.x to 2.x. >>>>> >>>>> Once we choose a strategy, we may want to formalize release schedules >>>>> around it. 
>>>>> >>>>> Tom >>>>> _______________________________________________ >>>>> Pandas-dev mailing list >>>>> Pandas-dev at python.org >>>>> https://mail.python.org/mailman/listinfo/pandas-dev >>>>> >>>> _______________________________________________ >>> Pandas-dev mailing list >>> Pandas-dev at python.org >>> https://mail.python.org/mailman/listinfo/pandas-dev >>> >> _______________________________________________ >> Pandas-dev mailing list >> Pandas-dev at python.org >> https://mail.python.org/mailman/listinfo/pandas-dev >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From garcia.marc at gmail.com Mon Jul 22 12:25:17 2019 From: garcia.marc at gmail.com (Marc Garcia) Date: Mon, 22 Jul 2019 17:25:17 +0100 Subject: [Pandas-dev] Version Policy following 1.0 In-Reply-To: References: Message-ID: My comment was thinking of end users, not developers of other packages. I would say that a company that has a large code base with many dependencies, including pandas, would prefer to be able to keep updating pandas in its version 1.* without worrying much about breaking anything, and plan well a 1.* to 2.* migration. I'm also unsure if removing deprecated things in a rolling way will cause faster progress. I'm more biased to forget about removing stuff most of the time, and for major versions just remove everything. I think it would make our life easier to not have the overhead of https://github.com/pandas-dev/pandas/issues/6581. My feeling is that everything will be simpler for everyone with SemVer, and we are the ones deciding when we release a major version, so we'll keep deprecated stuff for as long as we want, if the main advantage of rolling deprecations is to remove things faster. But maybe in practice there are other factors that I'm not considering. If the rest of the people think rolling deprecations will be better, I'm ok with it, I may be wrong.
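[Editor's note: whichever policy wins, the mechanics of issuing a single deprecation are the same; only the removal release named in the message differs. A minimal sketch of the warning pattern proposed earlier in the thread - the `deprecation_message` helper and the standalone `get_dtype_counts` function are illustrative, not actual pandas source:]

```python
import warnings


def deprecation_message(name, removal_version, alternative):
    """Build a deprecation message that names the release removing the API."""
    return (
        f"{name} is deprecated, and will be removed in pandas "
        f"{removal_version}. Use {alternative} instead."
    )


def get_dtype_counts(df):
    # Deprecated shim: warn with the planned removal release, then
    # delegate to the replacement API.
    warnings.warn(
        deprecation_message(
            "DataFrame.get_dtype_counts",
            "1.3.0",
            "DataFrame.dtypes.value_counts()",
        ),
        FutureWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return df.dtypes.value_counts()
```

Under rolling deprecations the message would name a minor release (as in the 1.3.0 example above); under SemVer it would name the next major version.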
On Mon, Jul 22, 2019 at 4:54 PM Tom Augspurger wrote: > Thanks Matt, > > Marc, do you have thoughts on that? As you say, SemVer is more popular in > absolute terms. But within our little community (NumPy, pandas, > scikit-learn), rolling deprecations seem to be the preferred approach. > I think there's some value in being consistent with those libraries. > > Joris / Wes, do you know what Arrow's policy will be after its 1.0? > > Tom > > On Sun, Jul 21, 2019 at 10:10 AM Matthew Rocklin > wrote: > >> Hi All, >> >> I hope you don't mind the intrusion of a non-pandas dev here. In my >> opinion SemVer makes more sense for libraries with a well-defined and >> narrowly scoped API, and less sense for an API as vast as the Pandas API. >> >> My ardent hope as a user is that you all will clean up and improve the >> Pandas API continuously. While doing this work I fully expect small bits >> of the API to break on pretty much every release (I think that it would be >> hard to avoid this). My guess is that if this community adopted SemVer >> then devs would be far more cautious about tidying things, which I think >> would be unfortunate. As someone who is very sensitive to changes in the >> Pandas API I'm fully in support of the devs breaking things regularly if it >> means faster progress. >> >> Best, >> -matt >> >> On Sun, Jul 21, 2019 at 2:24 AM Marc Garcia >> wrote: >> >>> Personally, I think it will make things easier for users if we use SemVer. >>> Mainly because as a user I think it's somewhat intuitive that I can upgrade >>> for example from 1.1.0 to 1.8.0 without having to edit code. But I'd expect >>> to have to take a closer look and see incompatibilities when I migrate from >>> 1.* to 2.*. >>> >>> Personally I don't usually know the deprecation policies of packages, >>> and I don't expect most pandas users to know ours even now. So, the >>> simpler the better.
>>> >>> Also, for ourselves, I think it's easier and more efficient to forget >>> about removing code in minors, and to do all the removals for majors. >>> >>> I agree this comes at the cost of bigger changes in major releases, >>> compared to a rolling policy. But IMHO it's worth it. >>> >>> >>> On Fri, 19 Jul 2019, 21:58 Tom Augspurger, >>> wrote: >>> >>>> On Wed, Jul 17, 2019 at 4:42 PM Brock Mendel >>>> wrote: >>>> >>>>> Do we anticipate the rate of deprecations decreasing significantly? >>>>> i.e. if right now we deprecated everything on which there is a consensus in >>>>> GH, would we be done for a while? >>>>> >>>>> If not, then I think we're better off sticking with zero-dot-*, or >>>>> else we'll be bumping major versions really frequently. >>>>> >>>> >>>> I think this is why I prefer sticking with rolling. If every release >>>> bumps the major version number, then no release is a major release. >>>> >>>> So my preference would be >>>> >>>> 1. Formally adopt and document that we are using a rolling deprecation cycle >>>> 2. State that deprecations will be around for `N` major releases (3?) >>>> 3. Require that every new deprecation includes the version it'll be >>>> enforced in. e.g. >>>> >>>> ``` >>>> DataFrame.get_dtype_counts is deprecated, and will be removed in pandas >>>> 1.3.0. >>>> Use DataFrame.dtypes.value_counts() instead. >>>> ``` >>>> >>>> Tom >>>> >>>> On Wed, Jul 17, 2019 at 1:49 PM Tom Augspurger < >>>>> tom.augspurger88 at gmail.com> wrote: >>>>> >>>>>> Split from >>>>>> https://mail.python.org/pipermail/pandas-dev/2019-July/001030.html >>>>>> >>>>>> Following 1.0, I think we stop outright breaking APIs. I think that >>>>>> stability will be welcome to users. >>>>>> >>>>>> We still have to decide how we deprecate APIs. The two options are >>>>>> >>>>>> 1. Rolling deprecations: Essentially what we do today: An API is >>>>>> deprecated in release 1.1.0 and can be removed in (say) 1.4.0. >>>>>> 2. SemVer: An API may be deprecated in 1.x.0.
It can be removed in >>>>>> 2.0.0 >>>>>> >>>>>> Do people have preferences between the two? The (dis?)advantage of >>>>>> SemVer is that all the API-breaking changes are restricted to a single >>>>>> release. With rolling deprecations, the upgrades from any 1.x to 1.y >>>>>> should be smoother than 1.x to 2.x. >>>>>> >>>>>> Once we choose a strategy, we may want to formalize release schedules >>>>>> around it. >>>>>> >>>>>> Tom >>>>>> _______________________________________________ >>>>>> Pandas-dev mailing list >>>>>> Pandas-dev at python.org >>>>>> https://mail.python.org/mailman/listinfo/pandas-dev >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From irv at princeton.com Tue Jul 23 17:44:31 2019 From: irv at princeton.com (Irv Lustig) Date: Tue, 23 Jul 2019 17:44:31 -0400 Subject: [Pandas-dev] Version Policy following 1.0 (Marc Garcia) In-Reply-To: References: Message-ID: All: Let me add some perspective on version numbering from having lived as a product manager in the software industry for a number of years, where we had a lot of discussions on these issues. We didn't tie deprecation events to "major" or "minor" releases. When we decided to deprecate something, we said it would be deprecated in a "future" release, which was typically in 3 releases, major or minor. At ILOG, the decision on having a major release (incrementing the number before the decimal point) was solely a marketing decision. If we felt that the changes in the new release were significant, then we would bump up the major release number. At IBM, there were some "rules" that said whether we could have a major release.
So we had a lot of minor releases (from a version 12.0 up to a version 12.9 today), and we still counted each "minor" release in terms of deprecation counts. So a policy could be:

Announce "will be deprecated in the future" in release X
Announce "will be deprecated in 2 releases" in release X+1
Announce "will be deprecated in the next release" in release X+2, and include warning messages in code
Announce "has been deprecated" in release X+3, with the code removed

So it doesn't matter whether X=1.0, X+1=1.1, X+2=1.2, and X+3=1.3, or X=1.0, X+1=1.1, X+2=2.0 and X+3=2.1. Don't let the numbering decide the deprecation policy. Just call each "major" or "minor" release a "release" and use something like the "3 release" policy stated above. And if you do decide on the "3 release" policy, have some way to keep track of when each thing is going to be deprecated. I'm not sure that exists today for pandas. -Irv Lustig (Dr-Irv) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtratner at gmail.com Wed Jul 24 18:01:49 2019 From: jtratner at gmail.com (Jeffrey Tratner) Date: Wed, 24 Jul 2019 15:01:49 -0700 Subject: [Pandas-dev] Version Policy following 1.0 In-Reply-To: References: Message-ID: Hi all, From my experience at a medium-sized company that's used pandas for years - I'd really appreciate a SemVer interface, even if it means tons of version numbers.
It'd be nice to be able to look at each major version bump and go figure out which code I have to update that way :) Realistically, we are incredibly cautious about bumping pandas/numpy versions and we pin all dependencies with pipenv, so we wouldn't constantly try to stay up to date (but SemVer would help us gauge risk). How about the concept of "experimental" vs "stable" APIs, as I've seen in kubernetes, node and friends? For stable APIs, you follow SemVer strictly; for experimental APIs you get a different release policy :) Regardless, I appreciate all the work you're doing! I'm excited to see a move to 1.0. Mostly I'd encourage you to pick whatever mode makes it easiest for you to develop. Thanks, Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.augspurger88 at gmail.com Mon Jul 29 10:15:14 2019 From: tom.augspurger88 at gmail.com (Tom Augspurger) Date: Mon, 29 Jul 2019 09:15:14 -0500 Subject: [Pandas-dev] Pandas Roadmap Message-ID: Hi all, In https://github.com/pandas-dev/pandas/pull/27478, we're proposing the addition of a pandas roadmap. That document:

1. Describes the process for adding a roadmap item
2. Seeds the roadmap with a few items we'd like to see implemented

We'd welcome your feedback, especially on:

1. If / how project roadmaps can be useful to you / your organization
2. Thoughts on the roadmap evolution process

Feedback on the individual items is of course also welcome, though we're trying to steer discussion of those items to dedicated issues (but not all the items have issues yet). Thanks, Tom -------------- next part -------------- An HTML attachment was scrubbed... URL:
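[Irv's earlier point in the thread — that a "3 release" policy needs some way to track when each deprecation will be enforced — could be supported by a tiny registry along these lines. This is only a sketch; the registry, the helper names, and the `DataFrame.some_old_method` entry are hypothetical, not pandas code:]

```python
# Hypothetical registry: deprecated API -> release in which its removal
# is scheduled (the "3 release" policy discussed in the thread).
DEPRECATIONS = {
    "DataFrame.get_dtype_counts": "1.3.0",  # example used in the thread
    "DataFrame.some_old_method": "1.1.0",   # made-up entry for illustration
}

def _release_key(version):
    # Compare releases numerically, so "1.10.0" sorts after "1.9.0".
    return tuple(int(part) for part in version.split("."))

def removals_due(next_release):
    """Deprecated names whose scheduled removal is at or before next_release."""
    due = _release_key(next_release)
    return sorted(
        name for name, scheduled in DEPRECATIONS.items()
        if _release_key(scheduled) <= due
    )
```

[Such a table would let each release's checklist be generated mechanically instead of relying on grep-ing the changelog for pending removals.]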