From rth.yurchak at gmail.com Thu Feb 2 10:09:08 2023
From: rth.yurchak at gmail.com (Roman Yurchak)
Date: Thu, 2 Feb 2023 16:09:08 +0100
Subject: [Pandas-dev] pandas-vet
Message-ID:

Hi,

There was interesting work done in https://github.com/deppen8/pandas-vet for enforcing automated checks on pandas code.

I was wondering if the core team had some opinions on the enforced rules and could comment on the extent to which there is consensus on them, and on whether they are consistent with what's recommended in the pandas docs. Particularly on things like pivot_table vs unstack, .array vs .values, and melt vs stack.

I'm currently working on a largish legacy codebase with lots of pandas code, so IMO something like pyupgrade for pandas could really be great. Also, now that pandas-vet is implemented in ruff, I feel it has the potential to become mainstream in a few years. Just checking whether there is some consensus on what could / should be enforced for pandas linting.

For the rule "'inplace = True' should be avoided; it has inconsistent behavior": if there is an issue, this could be fixed in some future major release, right?

Thanks,

Roman
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From garcia.marc at gmail.com Thu Feb 2 11:59:16 2023
From: garcia.marc at gmail.com (Marc Garcia)
Date: Thu, 2 Feb 2023 17:59:16 +0100
Subject: [Pandas-dev] pandas-vet
In-Reply-To:
References:
Message-ID:

Thanks for starting this discussion. I think each of the points needs an independent discussion. In general I think the solution would be to deprecate things in pandas.

For the inplace keyword, there is consensus to deprecate it. There was consensus even before pandas 1.0, along with plans to remove it everywhere before that release, and we almost removed it for pandas 2.0 (in the end that is not happening), but there are still a few details to discuss. I guess a linter can help before we start raising the FutureWarnings.
For isna/isnull, the initial plan and obvious solution was to also deprecate isnull, but it was decided it was too common. It seems like deprecating it is a better option than a linter. But I guess if the linter is popular enough it could help.

I'd personally just deprecate things we don't want users to use (instead of encouraging a linter), but if there is no consensus to deprecate, maybe there will be in the future and the linter can help.

Some things are trickier, but I guess in general we could end up deprecating things like Series.values in favor of .array or .to_numpy()...

Personally -1 on `import pandas as pd`. If we had to rewrite things I guess the numpy module would be simply named np, so no aliasing is needed. And the pandas module namespace is much smaller and not used so frequently, and shortening it to pd has almost no impact on code verbosity. I never alias the pandas module name, and while consistency across projects can be nice, it seems odd to have a linter recommend something that is more a tradition than a good practice. At least that's my opinion.

On Thu, Feb 2, 2023 at 4:09 PM Roman Yurchak wrote:

> Hi,
>
> There was interesting work done in https://github.com/deppen8/pandas-vet
> for enforcing automated checks on pandas code.
>
> I was wondering if the core teams had some opinions on the enforced rules
> and could comment to what extent there is a consensus on those, whether
> they are consistent with what's recommended in the pandas docs.
> Particularly on things, like pivot_table vs unstack, .array vs .values, and
> melt vs stack.
>
> Currently working on a largish legacy code with lots of pandas code, so
> IMO something like pyupgrade for pandas could really be great. Also now
> that pandas-vet is implemented in ruff, I feel it has the potential to
> become mainstream in a few years. Just checking whether there is some
> consensus on what could / should be enforced for pandas linting.
>
> For the rule "'inplace = True' should be avoided; it has inconsistent
> behavior": if there is an issue, this could be fixed in some future major
> release, right ?
>
> Thanks,
>
> Roman
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
>

From lthomas at enthought.com Fri Feb 3 07:52:46 2023
From: lthomas at enthought.com (lthomas at enthought.com)
Date: Fri, 03 Feb 2023 04:52:46 -0800 (PST)
Subject: [Pandas-dev] SciPy 2023 Call for Proposals
Message-ID: <63dd039e.050a0220.1818.30ab@mx.google.com>

SciPy 2023 Call for Proposals
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jorisvandenbossche at gmail.com Wed Feb 8 10:04:23 2023
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 8 Feb 2023 16:04:23 +0100
Subject: [Pandas-dev] EA Naming Conventions
In-Reply-To:
References:
Message-ID:

On Thu, 26 Jan 2023 at 23:30, Brock Mendel wrote:

> For historical reasons we've built up an EA namespace without much
> internal logic in terms of what is public/private. While this isn't _that_
> big of a deal, it'd be nice to make this more coherent. I see two useful
> options:
>

In my opinion (and recollection), when ExtensionArrays were first introduced the rule was quite clear: *everything* on the base class is considered public for developers (EA implementors can, or need to, override those), and whether the actual name is public or private (i.e. has a leading underscore or not) depends on whether it should be public for end users (not implementors). And we use documentation / comments to indicate to developers (EA implementors) which parts must be implemented and which are optional.

>
> 1) Use the traditional "an underscore means this should only be called
> from within self".
> Very few methods on the base class satisfy that
> characteristic, including the constructor _from_sequence. One benefit of
> moving to this is it would make "official" that we shouldn't be using
> _values_for_foo from outside EA methods.
>

We don't want to make all those "private" functions that EAs implement public to end users, so I don't think this is an option. Also, there *are* some valid cases for calling the _values_for_.. methods outside of other EA methods, so this is not a general rule.

> 2) Use underscores to signal to 3rd party authors whether or not there
> exists a working (not necessarily performant) implementation on the base
> class. In this scenario authors would _have_ to implement private methods,
> while implementing public methods would be optional.
>
>

That would make some of the currently private (and not useful for end users) methods public, and some public methods private (if we do that for existing methods, and not only as a rule for new methods).

But what is the main goal you want to achieve here? That it is clearer for EA implementors what they need to implement? (Currently we use AbstractMethodError for that, which already seems clear to me, and we have base tests that you can inherit that should cover the basic things you need to implement.)

Joris

> Thoughts?
>

From jorisvandenbossche at gmail.com Wed Feb 8 11:40:45 2023
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Wed, 8 Feb 2023 17:40:45 +0100
Subject: [Pandas-dev] February 2023 bi-monthly community meeting (Wednesday February 8, UTC 18:00)
Message-ID:

Hi all,

A late reminder that the next bi-monthly (twice a month) dev call is today, in a bit more than 1 hour (Wednesday, February 8), at 18:00 UTC.
Our calendar is at https://pandas.pydata.org/docs/development/meeting.html#calendar to check your local time.

The pandas Community Meeting is a regular sync meeting for the project's maintainers which is open to the community. All are welcome to attend!

Video Call: https://us06web.zoom.us/j/84484803210?pwd=TjUxNmcyNHcvcG9SNGJvbE53Y21GZz09
Meeting notes: https://docs.google.com/document/u/1/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?ouid=102771015311436394588&usp=docs_home&ths=true

Joris

From erotemic at gmail.com Sat Feb 11 14:42:11 2023
From: erotemic at gmail.com (Jonathan Crall)
Date: Sat, 11 Feb 2023 14:42:11 -0500
Subject: [Pandas-dev] DataFrame.pivot positional deprecations
Message-ID:

Hi all,

Please tell me if there is a better place to raise this, but I'm seeing a lot of:

    FutureWarning: In a future version of pandas all arguments of DataFrame.pivot will be keyword-only.

I'm wondering: what is the rationale behind removing the positional arguments here? They seem perfectly natural to me.

I'd like to put my two cents in and suggest that moving to keyword-only may not be the best idea in this case, because pivot is very useful in interactive sessions, and keyword-only args will make it more cumbersome to type and access.

If there is a very good reason for removing positional arguments I'm open to updating my code, but I'd like to see what that rationale and discussion were. If the rationale does not have a solid foundation, then my suggestion is that this change should perhaps be removed from the roadmap.

--
-Dr. Jon Crall (him)

From garcia.marc at gmail.com Tue Feb 21 16:55:36 2023
From: garcia.marc at gmail.com (Marc Garcia)
Date: Tue, 21 Feb 2023 22:55:36 +0100
Subject: [Pandas-dev] ANN: pandas 2.0.0 RC0
Message-ID:

We are happy to announce the *release candidate* of pandas 2.0.0.
It can be installed from our conda-forge and PyPI packages via mamba, conda and pip, for example:

    mamba install -c conda-forge/label/pandas_rc pandas==2.0.0rc0
    python -m pip install --upgrade --pre pandas==2.0.0rc0

Users who have pandas code in production and maintainers of libraries with pandas as a dependency are *strongly* recommended to run their test suites with the release candidate, and to report any breaking changes to our issue tracker before the official 2.0.0 release.

You can find the documentation of pandas 2.0.0 here, and the list of changes in 2.0.0 in the release notes page.

We expect to release the final version of pandas 2.0.0 in around two weeks, but the final date will depend on the issues reported against the release candidate.

From suryasriram2950 at gmail.com Wed Feb 15 00:55:32 2023
From: suryasriram2950 at gmail.com (Surya Sriram)
Date: Wed, 15 Feb 2023 05:55:32 -0000
Subject: [Pandas-dev] Unable to install pandas on my work computer
Message-ID:

Hi,

I was trying to install the Python pandas package from PyCharm on my work computer, but I'm unable to install it. I'm attaching the error message below. I don't have administrator rights on my computer, and I can't open CMD on my computer either. Is there any other way I could install packages?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: IMG_20230215_112222086~2.jpg
Type: image/jpeg
Size: 2608897 bytes
Desc: not available
URL: