[Pandas-dev] Proposal for consistent, clear copy/view semantics in pandas with Copy-on-Write

Adrin adrin.jalali at gmail.com
Tue Aug 10 06:52:35 EDT 2021


>
> I am not fully sure if I understand your question correctly, but
> something like `df["column_A"] = ....` still edits the DataFrame in
> place. So here there is no "new" or "old" version of the DataFrame.
> That specific example replaces a full column and will not trigger a copy
> (as it doesn't edit the specific column's data inplace), but if you take
> something like `df.loc[mask, "column_A"] = ...`, the possible copy happens
> inside df: if "column_A" is a view / being viewed, then the underlying
> array for this column first gets copied before being mutated. So the copy
> happens on the level of the array. But the DataFrame df itself is still
> mutated in place (the array for "column_A" gets replaced with a copy of it),
> so also here there is no "old"/"new" version of the DataFrame.
> Does that answer the question, or can you otherwise clarify your question?
>

I guess as a user, I find it odd that the behavior differs with and without
a mask. So does that mean `df.loc[mask, "column_A"] = ...` is not a valid
operation? Because I guess I've lost the copy that holds the modified
data, right?
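
To make the two cases concrete, here is a minimal sketch of how I read the
explanation above. It assumes pandas >= 1.5, where Copy-on-Write can be
enabled through the `mode.copy_on_write` option; the names `before`, `mask`
and `after_mask` and the example values are made up for illustration.

```python
import pandas as pd

# Assumption: pandas >= 1.5, where Copy-on-Write can be switched on via this
# option. The guard only keeps the script running if the option is rejected.
try:
    pd.set_option("mode.copy_on_write", True)
except Exception:
    pass

df = pd.DataFrame({"column_A": [1, 2, 3], "column_B": [10, 20, 30]})
before = df["column_A"]      # another object looking at column_A's data
mask = df["column_B"] > 10   # selects rows 1 and 2

# Masked assignment: df itself is mutated in place, so the modified data is
# reachable through df. No second DataFrame is created.
df.loc[mask, "column_A"] = 0
after_mask = df["column_A"].tolist()
print(after_mask)            # [1, 0, 0]

# With Copy-on-Write active, `before` is expected to keep the original
# values, because the column's array is copied inside df before the write.
print(before.tolist())

# Full-column assignment: the column is replaced rather than edited, and df
# is still the same object.
df["column_A"] = [7, 8, 9]
print(df["column_A"].tolist())  # [7, 8, 9]
```

In both cases the result of the assignment lives in `df`; if a copy is made,
it is the array swapped in underneath, not a separate DataFrame that could
get lost.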

Silly question: why not move the other way around, i.e. always modify the
original data, unless the user does a `copy()`? Is that not more intuitive
to people?

On Mon, Aug 9, 2021 at 6:53 PM Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

>
> On Fri, 23 Jul 2021 at 22:09, Brock Mendel <jbrockmendel at gmail.com> wrote:
>
>> > Memory implications should be positive (less copying).
>>
>> This is accurate _only_ in cases where we currently make copies.  In
>> cases where we currently make views, the perf effect goes the other way.
>>
>
> Yes, but to be clear: only when you mutate an object. As long as you don't do
> that (which I think is the majority of operations), we will keep making
> views where we currently do that already.
>
> On Mon, 26 Jul 2021 at 18:38, Brock Mendel <jbrockmendel at gmail.com> wrote:
>
>> > data.iloc[:, c] = (data.iloc[:, c] - data.iloc[:, c].mean()) /
>> data.iloc[:, c].std()
>>
>> This would not make any copies under any of the scenarios being
>> discussed, including the status quo.
>>
>
> One small point: this might depend on whether we keep `[:, col]` as a
> special case replacing the column altogether (as we currently still do, I
> think, related to some recent discussions), or if we see it as an in-place
> mutation of the existing column with a slice (which just happens to be a
> "full" slice). In the second case, this could actually trigger
> copy-on-write since the same column is also accessed (only as a temporary
> variable, but Python might not yet have garbage collected it).
>
> On Mon, 26 Jul 2021 at 11:51, Adrin <adrin.jalali at gmail.com> wrote:
>
>> ....
>> Also, one issue I have, is that if we're doing copy-on-write, then what
>> does the above mean? As in, if I do `df["column_A"] = ....`, where is that
>> copy? How do I access the new one as opposed to the old one?
>>
>
> I am not fully sure if I understand your question correctly, but something
> like `df["column_A"] = ....` still edits the DataFrame in place. So here
> there is no "new" or "old" version of the DataFrame.
> That specific example replaces a full column and will not trigger a copy
> (as it doesn't edit the specific column's data inplace), but if you take
> something like `df.loc[mask, "column_A"] = ...`, the possible copy happens
> inside df: if "column_A" is a view / being viewed, then the underlying
> array for this column first gets copied before being mutated. So the copy
> happens on the level of the array. But the DataFrame df itself is still
> mutated in place (the array for "column_A" gets replaced with a copy of it),
> so also here there is no "old"/"new" version of the DataFrame.
> Does that answer the question, or can you otherwise clarify your question?
>
> Joris