From jmcintosh at rti.org Thu Nov 4 20:35:39 2021
From: jmcintosh at rti.org (McIntosh, Jonathan)
Date: Fri, 5 Nov 2021 00:35:39 +0000
Subject: [Pandas-dev] resampling limitations after release 1.0.5
Message-ID:

Hello,

I would like to report an issue that may or may not already be known, but that has been a bug that has prevented our project from progressing to more recent releases of pandas. We are working with datasets that extend out past the maximum timestamp date of 2262-04-11, so we generally use period ranges rather than datetime ranges. However, starting with release 1.1.0, the resample functionality no longer works with period ranges that extend past 2262-04-11. The screenshots below show an example of this phenomenon. Would it be possible to restore the resampling functionality for period ranges that extend past 2262-04-11 in future releases?

Example Code Snippet:
import pandas as ps
from datetime import datetime
print(ps.__version__)
rng = ps.period_range(datetime(2000, 1, 1, 0, 0), datetime(2300, 1, 1, 0, 0), freq="1D")
df = ps.DataFrame(index = rng, data = [i+1 for i in range(0,len(rng))])
resample_df = df.resample("6H").interpolate()
print(resample_df.head(10))

Result when running with pandas==1.0.5:
[cid:image004.png at 01D7D1AA.38C99720]

Result when running with pandas==1.1.1:
[cid:image003.png at 01D7D1A9.FBD5A5C0]

Thanks,

Jonathan McIntosh
Water Resources Engineer

970.498.1831 | jmcintosh at rti.org
2950 East Harmony Road, Suite 390 | Fort Collins, CO 80528
[cid:image001.jpg at 01D7D1A9.3C12EB30]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 3746 bytes
Desc: image001.jpg
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.png
Type: image/png
Size: 5495 bytes
Desc: image003.png
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image004.png
Type: image/png
Size: 4613 bytes
Desc: image004.png
URL:

From jeffreback at gmail.com Thu Nov 4 22:13:21 2021
From: jeffreback at gmail.com (Jeff Reback)
Date: Thu, 4 Nov 2021 22:13:21 -0400
Subject: [Pandas-dev] resampling limitations after release 1.0.5
In-Reply-To:
References:
Message-ID:

pls open an issue on the pandas tracker

> On Nov 4, 2021, at 8:36 PM, McIntosh, Jonathan wrote:
>
> Hello,
>
> I would like to report an issue that may or may not already be known, but that has been a bug that has prevented our project from progressing to more recent releases of pandas. We are working with datasets that extend out past the maximum timestamp date of 2262-04-11, so we generally use period ranges rather than datetime ranges. However, starting with release 1.1.0, the resample functionality no longer works with period ranges that extend past 2262-04-11. The screenshots below show an example of this phenomenon. Would it be possible to restore the resampling functionality for period ranges that extend past 2262-04-11 in future releases?
>
> Example Code Snippet:
> import pandas as ps
> from datetime import datetime
> print(ps.__version__)
> rng = ps.period_range(datetime(2000, 1, 1, 0, 0), datetime(2300, 1, 1, 0, 0), freq="1D")
> df = ps.DataFrame(index = rng, data = [i+1 for i in range(0,len(rng))])
> resample_df = df.resample("6H").interpolate()
> print(resample_df.head(10))
>
> Result when running with pandas==1.0.5:
>
>
> Result when running with pandas==1.1.1:
>
>
> Thanks,
>
> Jonathan McIntosh
> Water Resources Engineer
>
> 970.498.1831 | jmcintosh at rti.org
> 2950 East Harmony Road, Suite 390 | Fort Collins, CO 80528
>
>
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bok7zzang at naver.com Sun Nov 7 11:33:06 2021
From: bok7zzang at naver.com (김두연)
Date: Mon, 08 Nov 2021 01:33:06 +0900
Subject: [Pandas-dev] Can I participate in contributing to this project(https://github.com/pandas-dev/pandas)
Message-ID:

Hello, I'm a computer major student at Kwangwoon University in Korea.

I got the task of participating in an open source project this semester (November to December 2021). So I was looking at various open source projects and found yours (pandas). I'm interested in this project, so can I participate in contributing to it? It seems that my contribution will mainly be translating README.md into Korean and correcting typos.

Thanks for reading my email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jorisvandenbossche at gmail.com Tue Nov 9 11:27:41 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Tue, 9 Nov 2021 17:27:41 +0100
Subject: [Pandas-dev] November 2021 monthly community meeting (Wednesday November 10, UTC 18:00)
Message-ID:

Hi all,

A reminder that the next monthly dev call is tomorrow (Wednesday, November 10th) at 18:00 UTC (given the DST changes, this might now be 1 hour earlier in your local time; I think it corresponds to 12pm Central). Our calendar is at https://pandas.pydata.org/docs/development/meeting.html#calendar to check your local time.

All are welcome to attend!

Video Call: https://us06web.zoom.us/j/84484803210?pwd=TjUxNmcyNHcvcG9SNGJvbE53Y21GZz09
Minutes: https://docs.google.com/document/u/1/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?ouid=102771015311436394588&usp=docs_home&ths=true

(I probably won't be able to attend myself this time, or only briefly)

Joris
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sebastian at sipsolutions.net Wed Nov 10 12:35:44 2021
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 10 Nov 2021 11:35:44 -0600
Subject: [Pandas-dev] Changing `interpolation` to `method` for quantile and percentile?
Message-ID: <3aff1c0474143bf9da095135ca72a57391e4a48c.camel@sipsolutions.net>

Hi all,

at NumPy, I would like to rename quantile's `interpolation=` to
`method=` and add a DeprecationWarning for it.
But I realize that probably many more people use it through pandas than
NumPy itself, so I am wondering if you have any input on it:

https://github.com/numpy/numpy/pull/20327

The reasons are:
* interpolation was a reasonable name for variations of the default
  method. But, we added new methods (same as R's) and for those
  it does not quite fit. `method` is more descriptive.
* I feel quite a few users might be better off using a different method
  (if they did not use the default). This would nudge them to
  review the code.
* Python's `statistics.quantiles` uses `method` (although we do not
  offer the *same* names for the methods right now!)

But, if you think it is not worth the churn/noise, I could consider
just sticking with `interpolation` for now, or delay giving a
`DeprecationWarning`.

Cheers,

Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL:

From jeffreback at gmail.com Wed Nov 10 12:37:27 2021
From: jeffreback at gmail.com (Jeff Reback)
Date: Wed, 10 Nov 2021 12:37:27 -0500
Subject: [Pandas-dev] Changing `interpolation` to `method` for quantile and percentile?
In-Reply-To: <3aff1c0474143bf9da095135ca72a57391e4a48c.camel@sipsolutions.net>
References: <3aff1c0474143bf9da095135ca72a57391e4a48c.camel@sipsolutions.net>
Message-ID: <3460C114-3E12-469A-BB3D-659BC5F9D7E4@gmail.com>

no objection

> On Nov 10, 2021, at 12:36 PM, Sebastian Berg wrote:
>
> Hi all,
>
> at NumPy, I would like to rename quantile's `interpolation=` to
> `method=` and add a DeprecationWarning for it.
> But I realize that probably many more people use it through pandas than
> NumPy itself, so I am wondering if you have any input on it:
>
> https://github.com/numpy/numpy/pull/20327
>
> The reasons are:
> * interpolation was a reasonable name for variations of the default
>   method. But, we added new methods (same as R's) and for those
>   it does not quite fit. `method` is more descriptive.
> * I feel quite a few users might be better off using a different method
>   (if they did not use the default). This would nudge them to
>   review the code.
> * Python's `statistics.quantiles` uses `method` (although we do not
>   offer the *same* names for the methods right now!)
>
> But, if you think it is not worth the churn/noise, I could consider
> just sticking with `interpolation` for now, or delay giving a
> `DeprecationWarning`.
>
> Cheers,
>
> Sebastian
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev

From jorisvandenbossche at gmail.com Wed Nov 10 12:53:10 2021
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 10 Nov 2021 18:53:10 +0100
Subject: [Pandas-dev] API: Make silent casting behavior consistent by deprecating silent _object_-dtype casting
In-Reply-To:
References:
Message-ID:

Thanks for bringing this up.

Limiting the discussion to getitem for a moment (I think other methods like fillna could deviate if we really want, or could have keywords for it), I am personally in favor of option 2: making everything strict (since I opened that referenced issue about it: https://github.com/pandas-dev/pandas/issues/39584)

Now, in the short term, already starting to deprecate silent casting to object (so the first aspect of option 3) doesn't prevent later becoming even more strict (it only wouldn't fully solve the existing inconsistencies), so from that point of view, I personally am fine with that.

Joris

On Wed, 27 Oct 2021 at 06:38, Brock Mendel wrote:

> TLDR
> ----
> We have inconsistent silent-casting vs raising logic for numpy vs EA dtypes
> (and inconsistencies within EA dtypes). By deprecating silently casting
> to *object* dtype, we can *mostly* make the behaviors match.
>
>
> Background
> ----------
> A number of Series/DataFrame methods will silently cast when dealing with
> mismatched values. With a numpy dtype, each of the following silently
> cast to float64:
>
> ser = pd.Series([1, 2, 3], dtype="i8")
>
> ser.shift(1, fill_value=1.5)
> ser.mask([True, False, False], 1.5)
> ser.where([False, True, True], 1.5)
> ser.replace(1, 1.5)
> ser[0] = 1.5
> ser.fillna(1.5)  # <- this one doesn't cast as it is a no-op
>
> If we were to pass "foo" or a pd.Period, these would coerce to object
> instead of float.
>
> By contrast, similar mixed-type operations with an ExtensionDtype Series
> _mostly_ raise:
>
> ser2 = pd.Series(pd.period_range("2016-01-01", periods=3, freq="D"))
>
> ser2.shift(1, fill_value=1.5)  # <- ValueError
> ser2.mask([True, False, False], 1.5)  # <- ValueError
> ser2.where([False, True, True], 1.5)  # <- ValueError
> ser2.fillna(1.5)  # <- TypeError
> ser2.replace(ser2[0], 1.5)  # <- coerces to object
> ser2[0] = 1.5  # <- coerces to object
>
> ser3 = pd.Series([pd.NA, 2, 3], dtype="Int64")
>
> ser3.shift(1, fill_value=1.5)  # <- TypeError
> ser3.mask([True, False, False], 1.5)  # <- TypeError
> ser3.where([False, True, True], 1.5)  # <- TypeError
> ser3.fillna(1.5)  # <- TypeError
> ser3.replace(ser3[0], 1.5)  # <- TypeError
> ser3[0] = 1.5  # <- TypeError
>
> timedelta64, datetime64, and datetime64tz mostly behave like the numpy dtypes,
> with a few exceptions:
>
> - shift raises on mismatch
> - fillna raises on mismatch for timedelta64, casts for the others
>
> Categorical mostly behaves like other ExtensionDtypes, except for replace, which
> has special logic.
>
> Goals
> -----
> - Have matching behavior across dtypes.
> - Share code.
>
> Options
> -------
> 1) Change EA (and dt64/td64) behavior to match non-EA behavior
> 2) Change non-EA behavior to match EA behavior (or stricter xref
> https://github.com/pandas-dev/pandas/issues/39584)
> 3) Deprecate (and eventually raise on) silent casting to _object_ dtype,
> allowing silent casting otherwise.
>
>
> Here I am advocating for option 3). The advantages as I see them:
>
> A) For numpy dtypes, we retain the most useful cases (int->float)
> B) Deprecates cases most likely to be unintentional (e.g. typo
> "2016-01-01" -> "2p16-01-01" causing a datetime64 Series to silently cast)
> C) For td64/dt64/dt64tz/period, the *only* silent casting is to object, so
> this completely gets rid of special-casing among that code
> D) For IntegerArray, FloatingArray, IntervalArray leaves open the option
> of allowing e.g.
> Integer->Floating casting (xref
> https://github.com/pandas-dev/pandas/issues/25288#issuecomment-941762174)
> E) Does not preclude later deciding on the stricter options in 2)
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From nt_mahmood at yahoo.com Thu Nov 18 10:18:18 2021
From: nt_mahmood at yahoo.com (Mahmood Naderan)
Date: Thu, 18 Nov 2021 15:18:18 +0000 (UTC)
Subject: [Pandas-dev] Error on using ax in plot() Pandas 1.2.3
References: <2065801950.1291141.1637248698163.ref@mail.yahoo.com>
Message-ID: <2065801950.1291141.1637248698163@mail.yahoo.com>

Hi
I am using the following versions

>>> import matplotlib
>>> print(matplotlib.__version__)
3.3.4
>>> import pandas as pd
>>> print(pd.__version__)
1.2.3
>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)

In my code, I use axes in Pandas plot() like this (note that I omit some variables in this snippet to highlight the problem):

def plot_dataframe(df, cnt, axes):
    plt.subplot(2, 1, 1)
    ax1 = row.plot( fontsize=font_size, linewidth=line_width, markersize=marker_size, marker='o', title='Raw values', label=cnt, ax=axes[0] )

def plot_kernels(my_dict2):
    fig,axes = plt.subplots(2,1, figsize=(20, 15))
    should_plot = plot_dataframe(df, cnt, axes=axes)
    for ax in axes:
        ax.legend()
    plt.show()

However, I get this error:

Traceback (most recent call last):
  File "process_csv.py", line 174, in <module>
    plot_kernels( my_dict2 )
  File "process_csv.py", line 62, in plot_kernels
    should_plot = plot_dataframe(df, cnt, axes=axes)
  File "process_csv.py", line 34, in plot_dataframe
    ax1 = row.plot( fontsize=font_size, linewidth=line_width, markersize=marker_size, marker='o', title='Raw values', label=cnt, ax=axes[0] )
  File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_core.py", line 955, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/__init__.py", line 61, in plot
    plot_obj.generate()
  File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py", line 283, in generate
    self._adorn_subplots()
  File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py", line 483, in _adorn_subplots
    all_axes = self._get_subplots()
  File "/home/mahmood/.local/lib/python3.8/site-packages/pandas/plotting/_matplotlib/core.py", line 903, in _get_subplots
    ax for ax in self.axes[0].get_figure().get_axes() if isinstance(ax, Subplot)
AttributeError: 'NoneType' object has no attribute 'get_axes'

I guess there is a mismatch between versions. Is there any workaround for that?

Regards,
Mahmood

From irv at princeton.com Tue Nov 23 11:22:14 2021
From: irv at princeton.com (Irv Lustig)
Date: Tue, 23 Nov 2021 11:22:14 -0500
Subject: [Pandas-dev] Decoupling type stubs for the public API from the pandas distribution
Message-ID:

I discovered this feature of typing:
https://www.python.org/dev/peps/pep-0561/#stub-only-packages

The idea is that for a package like pandas, we can have a separate package "pandas-stubs" that would contain the type stubs for pandas. We wouldn't have to worry about including a `py.typed` file or `.pyi` files in our standard pandas distribution - all typing for the public API would be in the separate package.

That would allow pandas typing for the public API to be maintained separately (different GitHub repo). We could start by just copying over what Microsoft created at https://github.com/microsoft/python-type-stubs/tree/main/pandas and then we maintain it as a separate repo, which could be installed via pip and conda.

Any thoughts on whether we should consider doing this?
-Irv
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
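
For readers unfamiliar with PEP 561 stub-only packages, here is a minimal sketch of the on-disk layout that the "pandas-stubs" idea in the last message implies. It assumes a top-level directory named `pandas-stubs` (the `<package>-stubs` naming from PEP 561). The helper `write_stub_only_package` and the abbreviated stub text are hypothetical, written only for illustration; they are not taken from pandas or from the Microsoft stubs.

```python
import tempfile
from pathlib import Path

# Hypothetical, heavily abbreviated stub text (an assumption for illustration);
# a real stub package would cover the whole public API.
EXAMPLE_STUB = '''\
from typing import Any

class DataFrame:
    def __init__(self, data: Any = ..., index: Any = ...) -> None: ...
    def head(self, n: int = ...) -> "DataFrame": ...
'''


def write_stub_only_package(root: Path, runtime_name: str = "pandas") -> Path:
    """Create `<runtime_name>-stubs/` under `root` and return its path.

    PEP 561 names a stub-only package after the runtime package plus
    the `-stubs` suffix; the stubs live in `.pyi` files inside it.
    """
    stub_dir = root / f"{runtime_name}-stubs"
    stub_dir.mkdir(parents=True, exist_ok=True)
    (stub_dir / "__init__.pyi").write_text(EXAMPLE_STUB, encoding="utf-8")
    return stub_dir


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        created = write_stub_only_package(Path(tmp))
        print(created.name, sorted(p.name for p in created.iterdir()))
```

The runtime `pandas` package is not touched at all in this layout, which is the decoupling the message describes: the stubs can be versioned and released from their own repository.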