[Numpy-discussion] deprecate updateifcopy in nditer operand flags?

Wed Nov 8 15:12:55 EST 2017

At a higher level:

The issue here is that we need to break the nditer API. This might
affect you if you use np.nditer (in Python) or the NpyIter_* APIs (in C).
The exact cases affected are somewhat hard to describe because
nditer's flag processing is complicated [1], but basically it's cases
where you are writing to one of the arrays being iterated over and
then something else non-trivial happens.

The problem is that the API currently uses NumPy's odd UPDATEIFCOPY
feature. What it does is give you an "output" array which is not your
actual output array, but instead some other temporary array which you
can modify freely, and whose contents are later written back to your
actual output array.

When does this copy happen? Since this is an iterator, most of
the time we can do the writeback for iteration N when we start
iteration N+1. However, this doesn't work for the final iteration. On
the final iteration, currently the writeback happens when the
temporary is garbage collected. *Usually* this happens pretty
promptly, but this is dependent on some internal details of how
CPython's garbage collector works that are explicitly not part of the
Python language spec, and on PyPy you silently and
non-deterministically get incorrect results. Plus it's error-prone
even on CPython -- if you accidentally have a dangling reference to
one array, then suddenly another array will have the wrong contents.
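
To make the dangling-reference failure concrete, here is a toy sketch in
plain Python. It is not NumPy code: WritebackTemp is a hypothetical class
that only mimics the "write back when the temporary is destroyed" behaviour
described above, using a list in place of an array.

```python
class WritebackTemp:
    """Toy stand-in for an UPDATEIFCOPY temporary (hypothetical, not NumPy)."""

    def __init__(self, target):
        self.target = target        # the "real" output
        self.data = list(target)    # temporary copy that the caller modifies

    def __del__(self):
        # The writeback runs only when the temporary is destroyed.
        self.target[:] = self.data


out = [0, 0, 0]
tmp = WritebackTemp(out)
tmp.data[1] = 42

dangling = tmp      # an accidental extra reference
del tmp             # looks like cleanup, but the object is still alive

# The temporary has not been destroyed, so nothing has been written back.
stale = list(out)   # still the old contents

# Dropping the last reference lets the finalizer run. CPython does this
# right away via reference counting; other implementations may run it later.
del dangling
```

The point of the sketch is only that the contents of `out` depend on when
the last reference to the temporary goes away, which the caller does not
directly control.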

So we have two options:

- We could stop supporting this mode entirely. Unfortunately, it's
hard to know if anyone is using this, since the conditions to trigger
it are so complicated, and not necessarily very exotic (e.g. it can
happen if you have a function that uses nditer to read one array and
write to another, and then someone calls your function with two arrays
whose memory overlaps).

- We could adjust the API so that there's some explicit operation to
trigger the final writeback. At the Python level this would probably
mean that we start supporting the use of nditer as a context manager,
and eventually start raising an error if you're in one of the "unsafe"
cases and not using the context manager form. At the C level we
probably need some explicit "I'm done with this iterator now" call.
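
As a rough illustration of the second option, the following toy sketch shows
the shape such an API could take. ExplicitWriteback and its close() method
are hypothetical names invented for this example; this is plain Python
operating on a list, not a proposal for the actual nditer implementation.

```python
class ExplicitWriteback:
    """Toy temporary whose writeback is triggered explicitly (hypothetical)."""

    def __init__(self, target):
        self.target = target
        self.data = list(target)    # temporary copy that the caller modifies
        self.closed = False

    def close(self):
        # The explicit "I'm done with this" call: write back exactly once.
        if not self.closed:
            self.target[:] = self.data
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False                # do not swallow exceptions


out = [0, 0, 0]
with ExplicitWriteback(out) as tmp:
    tmp.data[1] = 42
    inside = list(out)              # writeback has not happened yet
after = list(out)                   # writeback happened when the block exited
```

Here the writeback point is the end of the 'with' block, regardless of how
many references to the temporary exist or how the interpreter collects
garbage.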

One question is which cases exactly should produce warnings/eventually
errors. At the Python level, I guess the simplest rule would be that
if you have any write/readwrite arrays in your iterator, then you have
to use a 'with' block. At the C level, it's a little trickier, because
it's hard to tell up-front whether someone has updated their code to
call a final cleanup function, and it's hard to emit a warning/error
on something that *doesn't* happen. (You could print a warning when
the nditer object is GCed if the cleanup function wasn't called, but
you can't raise an error there.) I guess the only reasonable option is
to deprecate NPY_ITER_READWRITE and NPY_ITER_WRITEONLY, and make people
switch to passing new flags that have the same semantics but also
promise that the user has updated their code to call the new cleanup
function.
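
The "warn at GC time" idea can be sketched in Python terms as follows. This
is a toy analogue, not the C implementation: ToyIter and the helper
functions are hypothetical, and the sketch assumes the finalizer runs
promptly (as it does under CPython's reference counting). It uses a warning
because an exception raised inside __del__ does not propagate to the caller.

```python
import gc
import warnings


class ToyIter:
    """Toy iterator that expects an explicit close() (hypothetical)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        # An exception raised here would not reach the caller, so the
        # most a finalizer can usefully do is emit a warning.
        if not self.closed:
            warnings.warn("ToyIter destroyed without close()", ResourceWarning)


def messages_from(make_and_drop):
    """Run a callable and return the warning messages seen while it ran."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        make_and_drop()
        gc.collect()                # nudge collectors that are not refcount-based
    return [str(w.message) for w in caught]


def forgot_close():
    ToyIter()                       # dropped without calling close()


def called_close():
    it = ToyIter()
    it.close()


forgot = messages_from(forgot_close)
cleaned = messages_from(called_close)
```

Even in this toy form the limitation is visible: the warning only appears
whenever the object happens to be finalized, and it cannot stop the program
the way an error raised at the call site would.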

Does that work? Any objections?

-n

[1] The affected cases are the ones that reach this line:

   https://github.com/numpy/numpy/blob/c276f326b29bcb7c851169d34f4767da0b4347af/numpy/core/src/multiarray/nditer_constr.c#L2926

So it's something like
- all of these things are true:
  - you have a writable array (nditer flags "write" or "readwrite")
  - one of these things is true:
    - you passed the "forcecopy" flag
    - all of these things are true:
      - you requested casting
      - you requested updateifcopy
    - there's a memory overlap between this array and another of the
      arrays being iterated over
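
Written as a boolean expression, the approximate rule above looks like this.
The function and parameter names are made up for illustration; it restates
the "something like" summary in this footnote, not the actual logic in
nditer_constr.c.

```python
def needs_writeback(writable, forcecopy, casting, updateifcopy, overlap):
    """Approximate rule for when an operand is affected.

    All arguments are booleans describing one operand:
      writable     - flagged "write" or "readwrite"
      forcecopy    - the "forcecopy" flag was passed
      casting      - a cast was requested
      updateifcopy - the "updateifcopy" flag was passed
      overlap      - memory overlaps another iterated array
    """
    return writable and (forcecopy or (casting and updateifcopy) or overlap)


# A read-only operand is never affected, whatever else is set.
readonly_case = needs_writeback(False, True, True, True, True)

# A writable operand with casting but without updateifcopy is not affected...
cast_only_case = needs_writeback(True, False, True, False, False)

# ...but memory overlap alone is enough for a writable operand.
overlap_case = needs_writeback(True, False, False, False, True)
```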

On Wed, Nov 8, 2017 at 11:31 AM, Matti Picus <matti.picus at gmail.com> wrote:
>
> Date: Wed, 8 Nov 2017 18:41:03 +0200
> From: Matti Picus <matti.picus at gmail.com>
> To: numpy-discussion at python.org
> Subject: [Numpy-discussion] deprecate updateifcopy in nditer operand
> flags?
> Message-ID: <c46bfba8-bad8-5166-e580-456527042004 at gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> I filed issue 9714 https://github.com/numpy/numpy/issues/9714 and wrote
> a mail in September trying to get some feedback on what to do with
> updateifcopy semantics and user-exposed nditer.
> It garnered no response, so I am trying again.
> For those who are unfamiliar with the issue see below for a short
> summary and issue 7054 for a lengthy discussion.
> Note that pull request 9639 which should be merged very soon changes the
> magical UPDATEIFCOPY into WRITEBACKIFCOPY, and hopefully will appear in
> NumPy 1.14.
>
> As I mention in the issue, there is a magical update done in this
> snippet in the next-to-the-last line:
>
> a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
> i = np.nditer(a, [], [['readwrite', 'updateifcopy']], casting='same_kind',
>                op_dtypes=[np.dtype('f4')])
> # Check that WRITEBACKIFCOPY is activated
> i.operands[0][2, 1, 1] = -12.5
> assert a[2, 1, 1] != -12.5
> i=None                       # magic
> assert a[2, 1, 1] == -12.5
>
> Not only is this magic very implicit, it relies on refcount semantics
> and thus does not work on PyPy.
> Possible solutions:
>
> 1. nditer is rarely used, just deprecate updateifcopy use on operands
>
> 2. make nditer into a context manager, so the code would become explicit
>
> a = np.arange(24, dtype='f8').reshape(2, 3, 4).T
> with np.nditer(a, [], [['readwrite', 'updateifcopy']], casting='same_kind',
>                op_dtypes=[np.dtype('f4')]) as i:
>     # Check that WRITEBACKIFCOPY is activated
>     i.operands[0][2, 1, 1] = -12.5
>     assert a[2, 1, 1] != -12.5
> assert a[2, 1, 1] == -12.5                   # a is modified in i.__exit__
>
> 3. something else?
>
> Any opinions? Does anyone use nditer in production code?
> Matti
>
> -------------------------
> What are updateifcopy semantics? When a temporary copy or work buffer is
> required, NumPy can (ab)use the base attribute of an ndarray by
>
>   - creating a copy of the data from the base array
>
>   - marking the base array read-only
>
> Then when the temporary buffer is "no longer needed"
>
>   - the data is copied back
>
>   - the original base array is marked read-write
>
> The trigger for the "no longer needed" decision before pull request 9639
> is in the dealloc function.
> That is not generally a place to do useful work, especially on PyPy
> which can call dealloc much later.
> Pull request 9639 adds an explicit PyArray_ResolveWritebackIfCopy API
> function, and recommends calling it explicitly before dealloc.
>
> The only place this change is visible to the python-level user is in
> nditer.
> C-API users will need to adapt their code to use the new API function,
> with a deprecation cycle that is backwardly compatible on CPython.

-- 
Nathaniel J. Smith -- https://vorpus.org