[Pandas-dev] Proposal for consistent, clear copy/view semantics in pandas with Copy-on-Write

Marc Garcia garcia.marc at gmail.com
Mon Jul 12 13:42:13 EDT 2021


+1 on the approach of the proposal, and also +1 to releasing it in a major
version without raising deprecation warnings.

Thanks for working on this, it'll make users' lives much easier.

On Sun, Jul 11, 2021 at 4:58 PM Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

> *(a.k.a. getting rid of the SettingWithCopyWarning)*
>
> Hi all,
>
> As you are probably aware, it's not always straightforward to understand
> the copy or view semantics of indexing methods in pandas: when you get a
> view and when you don't, why you get a SettingWithCopyWarning, or how to
> get rid of it.
> It's also something that has already been discussed regularly (e.g. the
> discussion and implementation from 2015 started by Nick Eubank at gh-10954
> <https://github.com/pandas-dev/pandas/issues/10954>). Last year, we again
> started to discuss this, which is tracked at
> https://github.com/pandas-dev/pandas/issues/36195. Based on those
> discussions, I have a concrete proposal to change the copy/view semantics
> of pandas.
>
> Short summary of the proposal:
>
>    1. The result of *any* indexing operation (subsetting a DataFrame or
>    Series in any way) or any method returning a new DataFrame, always *behaves
>    as if it were* a copy in terms of user API.
>    2. We implement Copy-on-Write. This way, we can actually use views as
>    much as possible under the hood, while ensuring the user API behaves as a
>    copy.
>
> This addresses multiple aspects: 1) a clear and consistent user API (a
> clear rule: *any* subset or returned series/dataframe is *always* a copy
> of the original, and thus never modifies the original) and 2) improving
> performance by avoiding excessive copies (e.g. a chained method workflow
> would no longer return an actual data copy at each step).
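
To make the two points above concrete, here is a minimal toy sketch in plain
Python of how Copy-on-Write can give copy semantics while still sharing data
under the hood. It is not the pandas implementation from the PR: `CowColumn`,
`_Buffer`, `view` and `shares_memory_with` are hypothetical names made up for
illustration, and the simple reference count ignores views that get garbage
collected.

```python
class _Buffer:
    """Hypothetical shared storage with a count of columns referencing it."""

    def __init__(self, data):
        self.data = data
        self.refs = 1


class CowColumn:
    """Toy column (not pandas): behaves like a copy, shares data until written."""

    def __init__(self, values):
        self._buf = _Buffer(list(values))

    def view(self):
        # A "new" column that shares the same buffer; no data is copied here.
        other = CowColumn.__new__(CowColumn)
        other._buf = self._buf
        self._buf.refs += 1
        return other

    def __getitem__(self, i):
        return self._buf.data[i]

    def __setitem__(self, i, value):
        if self._buf.refs > 1:
            # The buffer is shared: detach with a private copy before writing,
            # so no other column can observe the modification.
            self._buf.refs -= 1
            self._buf = _Buffer(list(self._buf.data))
        self._buf.data[i] = value

    def shares_memory_with(self, other):
        return self._buf is other._buf

    def tolist(self):
        return list(self._buf.data)


original = CowColumn([1, 2, 3])
subset = original.view()               # cheap: still the same buffer
shared_before_write = subset.shares_memory_with(original)

subset[0] = 100                        # first write triggers the actual copy
print(original.tolist())               # the original is left untouched
print(subset.tolist())
```

In this sketch the copy is deferred until the first write, so a chain of
read-only steps never copies, while a write to the result can never leak back
into the original.
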
>
> Longer version of this proposal:
> https://docs.google.com/document/d/1ZCQ9mx3LBMy-nhwRl33_jgcvWo9IWdEfxDNQ2thyTb0/edit?usp=sharing
> Proof-of-concept implementation:
> https://github.com/pandas-dev/pandas/pull/41878
> GitHub issue with relevant discussion:
> https://github.com/pandas-dev/pandas/issues/36195
>
> *Since this would be a change with a large impact on users, I think it is
> important to get broad feedback on this*. So comments, thoughts,
> concerns, ideas, etc. are very welcome (you can comment on the Google doc,
> reply to this email, or comment on the GitHub issue).
>
> Best,
> Joris