[Pandas-dev] Proposal for consistent, clear copy/view semantics in pandas with Copy-on-Write

Tom Augspurger tom.augspurger88 at gmail.com
Fri Jul 16 12:28:23 EDT 2021


On Fri, Jul 16, 2021 at 11:04 AM Brock Mendel <jbrockmendel at gmail.com>
wrote:

> [xposting from https://github.com/pandas-dev/pandas/issues/36195]
>
> I'm glad there is a proof of concept to help clarify what this looks like.
>
> I do not like the fact that nothing can ever be "just a view" with these
> semantics, including series[::-1], frame[col], frame[:]. Users reasonably
> expect numpy semantics for these.
>

I wonder if we can validate what users (new and old) *actually* expect?
Users coming from R, which IIRC implements Copy on Write for matrices,
might be OK with indexing always being (behaving like) a copy.
I'm not sure what users coming from NumPy would expect, since I don't know
how many NumPy users really understand *a.)* when a NumPy slice is a view
or copy, and *b.)* how a pandas indexing operation translates to a NumPy
slice.
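
To make point a.) concrete, here is a minimal sketch of the NumPy rule as I understand it: basic slicing returns a view, while integer-array ("fancy") and boolean-mask indexing return copies. The array and variable names are just for illustration.

```python
import numpy as np

arr = np.arange(10)

basic = arr[2:5]           # basic slice: a view onto arr's buffer
fancy = arr[[2, 3, 4]]     # integer-array indexing: a copy
masked = arr[arr > 6]      # boolean-mask indexing: a copy

# np.shares_memory reports whether two arrays overlap in memory.
basic_is_view = np.shares_memory(arr, basic)
fancy_is_view = np.shares_memory(arr, fancy)
masked_is_view = np.shares_memory(arr, masked)

basic[0] = 100   # writes through to arr[2], because basic is a view
fancy[1] = -1    # leaves arr[3] untouched, because fancy is a copy
```

All three expressions select elements in much the same way, yet only the first one aliases the original array, which is the distinction I suspect many users do not have in their heads.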


> We should revisit the alternative "clear/simple rules" approach that is
> "indexing on columns always gives a view" (
> https://github.com/pandas-dev/pandas/pull/33597). This is simpler to
> explain/grok, simpler to implement, and not dependent on BlockManager vs
> ArrayManager.
>
> On Fri, Jul 16, 2021 at 5:26 AM Joris Van den Bossche <
> jorisvandenbossche at gmail.com> wrote:
>
>>
>>
>> On Mon, 12 Jul 2021 at 00:58, Joris Van den Bossche <
>> jorisvandenbossche at gmail.com> wrote:
>>
>>> Short summary of the proposal:
>>>
>>>    1. The result of *any* indexing operation (subsetting a DataFrame or
>>>    Series in any way) or any method returning a new DataFrame, always *behaves
>>>    as if it were* a copy in terms of user API.
>>>
>> To explicitly call out the column-as-Series case (since this is a
>> typical case that right now is *always* a view): "any" indexing
>> operation thus also includes accessing a DataFrame column as a Series (or
>> slicing a Series).
>>
>> So something like s = df["col"] and then mutating s will no longer
>> update df. Similarly for series_subset = series[1:5], mutating
>> series_subset will no longer update series.
>> _______________________________________________
>> Pandas-dev mailing list
>> Pandas-dev at python.org
>> https://mail.python.org/mailman/listinfo/pandas-dev
>>
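
For readers trying to picture the quoted "behaves as if it were a copy" rule, here is a toy sketch of copy-on-write on top of a NumPy buffer. CowSeries is a hypothetical class made up for this illustration; it is not a pandas class and not how the proof of concept is implemented. It only supports slice-based subsetting and integer or slice assignment.

```python
import numpy as np


class CowSeries:
    """Toy copy-on-write wrapper (hypothetical, for illustration only)."""

    def __init__(self, values, shared=False):
        self._values = values
        # True when another object may reference the same buffer.
        self._shared = shared

    def __getitem__(self, key):
        # Subsetting with a slice returns a NumPy view, so no data is
        # copied here. Parent and child are both marked as shared.
        self._shared = True
        return CowSeries(self._values[key], shared=True)

    def __setitem__(self, key, value):
        # The copy is deferred until the first write to a shared buffer.
        if self._shared:
            self._values = self._values.copy()
            self._shared = False
        self._values[key] = value

    def to_list(self):
        return self._values.tolist()


series = CowSeries(np.arange(10))
series_subset = series[1:5]

# Before any write, the subset still shares memory with its parent.
shared_before = np.shares_memory(series._values, series_subset._values)

series_subset[0] = 99   # triggers a copy; series is not updated
shared_after = np.shares_memory(series._values, series_subset._values)

series[2] = -5          # the parent copies too; series_subset is not updated
```

The flag-based tracking here is deliberately conservative: the parent copies on its first write even if the child has already detached. A real implementation would presumably track references more precisely.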