[Numpy-discussion] copy on demand

Alexander Schmolck a.schmolck at gmx.net
Thu Jun 13 17:36:10 EDT 2002


"Perry Greenfield" <perry at stsci.edu> writes:

> > I guess the best reason to bite the bullet and carry around state
> > information would be if there were significant other cases where one
> > also would want to optimize operations under the hood. If there isn't
> > much else in this direction then the effort involved might not be
> > justified. One thing that bugs me in Numeric (and that might already
> > have been solved in numarray) is that e.g. ``ravel`` (and I think also
> > ``transpose``) creates unnecessary copies, whereas ``.flat`` doesn't,
> > but won't work in all cases (viz. when the array is non-contiguous),
> > so I can either have ugly or inefficient code.
> >
> I guess that depends on what you mean by unnecessary copies. 

In most cases the array of which I desire a flattened representation is
contiguous (plus, I usually don't intend to modify it). Consequently, in most
cases I don't want any copies of it to be created (especially not if it is
really large -- which is often the case).

The fact that you can never really be sure whether you can actually use
``.flat``, without checking beforehand if the array is in fact contiguous (I
don't think there are many guarantees about something being contiguous, or are
there?) and that ravel will always work but has a huge overhead, suggests to
me that something is not quite right.

> If the array is non-contiguous what would you have it do?

Simple -- in that case 'lazy ravel' would do the same as 'ravel' currently
does, create a copy (or alternatively rearrange the memory representation to
make it contiguous and then create a lazy copy, but I don't know whether
this would be a good or even feasible idea).

A lazy version of ravel would have the same semantics as ravel but only create
an actual copy if necessary -- which means as long as no modification takes
place and the array is contiguous, it will be sufficient to return the
``.flat`` (for starters). If it is non-contiguous then the copying can't be
helped, but these cases are rare and currently you either have to test for
them explicitly or slow everything down and waste memory by just always using
``ravel()``.

For example, if bar is contiguous, ``foo = ravel(bar)`` would be
computationally equivalent to ``foo = bar.flat`` as long as neither of them
is modified, but semantically equivalent to the current ``foo = ravel(bar)``
in all cases.

Thus you could now write:

>>> a = ravel(a)[20:]

wherever you've written this boiler-plate code before:

>>> if a.iscontiguous():
...     a = a.flat[20:]
... else:
...     a = ravel(a)[20:]

without any loss of performance.
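
To make the copy-on-demand idea concrete, here is a minimal sketch in plain
Python (deliberately not Numeric). Everything in it is hypothetical and only
for illustration: the names ``_Buffer``, ``CowArray``, ``lazy_ravel`` and
``shares_storage_with`` are made up, the "array" is a one-dimensional list,
and the owner count is never decremented when a handle is garbage collected.
It shows only the bookkeeping: handles share storage until one of them is
written to.

```python
class _Buffer:
    """Hypothetical shared storage: a flat list plus a count of handles using it."""

    def __init__(self, data):
        self.data = list(data)
        self.owners = 0


class CowArray:
    """Toy 1-D 'array' handle with copy-on-demand semantics (illustration only)."""

    def __init__(self, data=(), _buffer=None):
        self._buf = _buffer if _buffer is not None else _Buffer(data)
        self._buf.owners += 1

    def lazy_ravel(self):
        # No data is copied here; the new handle shares the existing buffer.
        return CowArray(_buffer=self._buf)

    def __getitem__(self, i):
        return self._buf.data[i]

    def __setitem__(self, i, value):
        # Copy only when a write hits a buffer that another handle still uses.
        if self._buf.owners > 1:
            self._buf.owners -= 1
            self._buf = _Buffer(self._buf.data)
            self._buf.owners += 1
        self._buf.data[i] = value

    def shares_storage_with(self, other):
        return self._buf is other._buf


bar = CowArray(range(6))
foo = bar.lazy_ravel()
shared_before = foo.shares_storage_with(bar)   # still one buffer
foo[0] = 99                                    # first write triggers the copy
shared_after = foo.shares_storage_with(bar)    # now two buffers
```

Because the write check sits on both handles, modifying ``bar`` first would
detach ``bar`` instead, so in either case the other handle never sees the
change.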

> 
> > > a feature one wants even though they are not the default, it turns
> > > out that it isn't all that simple to obtain views without sacrificing
> > > ordinary slicing syntax to obtain a view. It is simple to obtain
> > > copies of view slices though.
> >
> > I'm not sure I understand the above.  What is the problem with
> > ``a.view[1:3]`` (or ``a.view()[1:3]``)?
> >
> I didn't mean to imply it wasn't possible, but that it was not
> quite as clean. The thing I don't like about this approach (or
> Paul's suggestion of a.sub) is the creation of an odd object
> that has as its only purpose being sliced. (Even worse, in my

I personally don't find it messy.  And please keep in mind that the ``view``
construct would only very seldom be used if copy-on-demand is the default
-- as I said, I've only needed the aliasing behavior once -- no doubt it was
really handy then, but the fact that e.g. matlab doesn't have anything along
those lines (AFAIK) suggests that many people will never need it.

So even if ``.view`` is messy, I'd rather have something messy that is almost
never used, in exchange for (what I perceive as) significantly nicer and
cleaner semantics for something that is used all the time (array slicing).
Alias slicing is messy in at least the respect that it breaks standard usage
and generic sequence code, as well as causing potentially devious bugs.
Unexpected behaviors like phantom buffers kept alive in their entirety by
partial views, or what ``A = A[::-1]`` does, are not exactly pretty either.


> opinion, is making it a different kind of array where slicing
> behaves differently. That will lead to the problem we have
> discussed for other kinds of array behavior, namely, how do
> you keep from being confused about a particular array's slicing
> behavior). That could lead to confusion as well. Many may be

I don't see that problem, frankly. The view is *not* an array. It doesn't need
(and shouldn't have) anything except a method to access slices (__getitem__).

As mentioned before, I also regard it as highly desirable that ``b =
a.view[3:10]`` sticks out immediately. This signals "warning -- potentially
tricky code ahead". Nothing in ``b = a[3:10]`` tells you that someone intends
to modify a and b dependently (because in more than 9 out of 10 cases he won't)
-- now *this* is confusing.

> under the impression that x = a.view makes x refer to an array
> when it doesn't. Users would need to know that a.view without
> a '[' is usually an error.

Since the ``.view`` shouldn't allow anything except slicing, they'll soon find
out ("Error: you can't multiply me, I'm a view and not an array"). And I can't
see why that would be harder to figure out (or look up in the docu) than that
a[1:3] creates an alias and *not* a copy contrary to *everything* else you've
ever heard or read about python sequences (especially since in most cases it
will work as intended).
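
A rough sketch of what I have in mind, again in plain Python rather than
Numeric. The classes ``ToyArray`` and ``_Slicer`` are invented for this
illustration and handle only one dimension and scalar assignment; the point is
merely that ``a[1:4]`` hands back an independent copy, ``a.view[1:4]`` hands
back an alias, and the thing returned by ``a.view`` itself can do nothing but
be sliced.

```python
class ToyArray:
    """Toy 1-D array: plain slicing copies, ``.view[...]`` aliases (illustration only)."""

    def __init__(self, values, _store=None, _index=None):
        self._store = _store if _store is not None else list(values)
        self._index = _index if _index is not None else range(len(self._store))

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Ordinary slicing returns an independent copy.
            return ToyArray([self._store[i] for i in self._index[key]])
        return self._store[self._index[key]]

    def __setitem__(self, key, value):
        # Scalar index only, to keep the sketch short.
        self._store[self._index[key]] = value

    def tolist(self):
        return [self._store[i] for i in self._index]

    @property
    def view(self):
        return _Slicer(self)


class _Slicer:
    """Not an array: its only capability is ``__getitem__`` with a slice."""

    def __init__(self, owner):
        self._owner = owner

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("a view helper can only be sliced")
        owner = self._owner
        # The result shares the owner's storage, so writes go through.
        return ToyArray((), _store=owner._store, _index=owner._index[key])


a = ToyArray([0, 1, 2, 3, 4, 5])
b = a[1:4]          # copy
b[0] = 100          # does not touch a
c = a.view[1:4]     # alias, and visibly so
c[0] = -1           # writes through to a[1]

try:
    a.view * 2      # the helper is not an array
    multiply_failed = False
except TypeError:
    multiply_failed = True
```

Forgetting the ``[`` therefore fails at the first arithmetic operation rather
than silently producing a wrong result.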

Also what exactly is the confused person's notion of the purpose of ``x =
a.view`` supposed to be? That ``x = a`` does what ``x = a.copy()`` really does
and that to create an alias to ``a`` they would have to use
``x = a.view``? In that case they'd better read the python tutorial before they do
any more python programming, because they are in for all kinds of unpleasant
surprises (``a = [0, 0]; b = a; b[1] = 3; print a`` -- oops).
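
For reference, this is how the built-in list behaves, which is the behavior
people bring with them from the tutorial. The snippet is written in Python 3
syntax (``print(a)`` rather than ``print a``) and the variable names are
arbitrary: assignment binds a second name to the same object, whereas slicing
a list builds a new one.

```python
a = [0, 0]
b = a          # binds a second name to the same list: an alias
b[1] = 3       # visible through a as well
c = a[:]       # slicing a list builds a new list: a copy
c[0] = 7       # not visible through a
print(a)
```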


alex

-- 
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/




