[Numpy-discussion] Indexing empty dimensions with empty arrays

Wed Dec 28 08:21:01 EST 2011

On Wed, Dec 28, 2011 at 1:57 PM, Dag Sverre Seljebotn <
d.s.seljebotn at astro.uio.no> wrote:

> On 12/28/2011 01:52 PM, Dag Sverre Seljebotn wrote:
> > On 12/28/2011 09:33 AM, Ralf Gommers wrote:
> >>
> >>
> >> 2011/12/27 Jordi Gutiérrez Hermoso <jordigh at octave.org>
> >>
> >>      On 26 December 2011 14:56, Ralf Gommers
> >>      <ralf.gommers at googlemail.com> wrote:
> >>       >
> >>       >
> >>       >  On Mon, Dec 26, 2011 at 8:50 PM, <josef.pktd at gmail.com> wrote:
> >>       >>  I have a hard time thinking through empty 2-dim arrays, and
> >>       >>  don't know what rules should apply.
> >>       >>  However, in my code I might want to catch these cases rather
> >>       >>  early than late and then having to work my way backwards to
> >>       >>  find out where the content disappeared.
> >>       >
> >>       >
> >>       >  Same here. Almost always, my empty arrays are either due to
> >>       >  bugs or they signal that I do need to special-case something.
> >>       >  Silent passing through of empty arrays to all numpy functions
> >>       >  is not what I would want.
> >>
> >>      I find it quite annoying to treat the empty set with special
> >>      deference. "All of my great-grandkids live in Antarctica" should be
> >>      true for me (I'm only 30 years old). If you decide that is not true
> >>      for me, it leads to a bunch of other logical annoyances up there
> >>
> >>
> >> Guess you don't mean true/false, because it's neither. But I understand
> >> you want an empty array back instead of an error.
> >>
> >> Currently the problem is that when you do get that empty array back,
> >> you'll then use that for something else and it will probably still
> >> crash. Many numpy functions do not check for empty input and will still
> >> give exceptions. My impression is that you're better off handling these
> >> where you create the empty array, rather than in some random place later
> >> on. The alternative is to have consistent rules for empty arrays, and
> >> handle them explicitly in all functions. Can be done, but is of course a
> >> lot of work and has some overhead.
> >
> > Are you saying that the existence of other bugs means that this bug
> > shouldn't be fixed? I just fail to see the relevance of these other bugs
> > to this discussion.
>

See below.

> > For the record, I've encountered this bug many times myself and it's
> > rather irritating, since it leads to more verbose code.
> >
> > It is useful whenever you want to return data that is a subset of the
> > input data (since the selected subset can sometimes be zero-sized;
> > remember, in computer science the only numbers are 0, 1, and "any
> > number").
> >
> > Here's one of the examples I've had. The Interpolative Decomposition
> > decomposes an m-by-n matrix A of rank k as
> >
> > A = B C
> >
> > where B is an m-by-k matrix consisting of a subset of the columns of A,
> > and C is a k-by-n matrix.
> >
> > Now, if A is all zeros (which is often the case for me), then k is 0. I
> > would still like to create the m-by-0 matrix B by doing
> >
> > B = A[:, selected_columns]
> >
> > But now I have to do this instead:
> >
> > if len(selected_columns) == 0:
> >       B = np.zeros((A.shape[0], 0), dtype=A.dtype)
> > else:
> >       B = A[:, selected_columns]
> >
> > In this case, zero-sized B and C are of course perfectly valid and
> > useful results:
> >
> > In [2]: np.dot(np.ones((3,0)), np.ones((0, 5)))
> > Out[2]:
> > array([[ 0.,  0.,  0.,  0.,  0.],
> >          [ 0.,  0.,  0.,  0.,  0.],
> >          [ 0.,  0.,  0.,  0.,  0.]])
> >
>
> And to answer the obvious question: Yes, this is a real usecase. It is
> used for something similar to image compression, where sub-sections of
> the images may well be all-zero and have zero rank (full story at [1]).
>

Thanks for the example. I was a little surprised that dot works. Then I
read what Wikipedia had to say about empty arrays. It mentions dot like you
do, and that the determinant of the 0-by-0 matrix is 1. So I tried:

In [1]: a = np.zeros((0,0))

In [2]: a
Out[2]: array([], shape=(0, 0), dtype=float64)

In [3]: np.linalg.det(a)
Parameter 4 to routine DGETRF was incorrect
<segfault>
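
A minimal sketch of a guard that would sidestep the crash above, assuming the
empty-product convention that the determinant of the 0-by-0 matrix is 1.
`safe_det` is a hypothetical name used for illustration only, not a NumPy
function, and this is not necessarily how NumPy itself should handle it:

```python
import numpy as np


def safe_det(a):
    """Determinant with an explicit 0-by-0 case (hypothetical helper)."""
    a = np.asarray(a)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("expected a square 2-D array, got shape %r" % (a.shape,))
    if a.size == 0:
        # Assumed convention: the determinant of the 0-by-0 matrix is 1,
        # so LAPACK is never called with a zero-sized matrix.
        return 1.0
    return np.linalg.det(a)
```

The point of the sketch is only that the empty case is decided before LAPACK
is reached, so the caller gets either a value or a ValueError.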

> Reading the above thread I understand Ralf's reasoning better, but
> really, relying on NumPy's buggy behaviour to discover bugs in user code
> seems like the wrong approach. Tools should be dumb unless there are
> good reasons to make them smart. I'd be rather irritated about my hammer
> if it refused to drive in nails that it decided were in the wrong spot.
>

The point is not that we shouldn't fix it, but that it's a waste of time to
fix it in only one place. I remember fixing several functions to explicitly
check for empty arrays and then returning an empty array or giving a
sensible error.
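
As a rough illustration of that kind of explicit check, here is a sketch
applied to the column-selection example quoted above. `select_columns` is a
hypothetical helper, not an existing NumPy function, and the choice to raise
ValueError for non-2-D input is an assumption:

```python
import numpy as np


def select_columns(A, selected_columns):
    """Return A[:, selected_columns], allowing an empty selection."""
    A = np.asarray(A)
    if A.ndim != 2:
        raise ValueError("expected a 2-D array, got %d dimension(s)" % A.ndim)
    # Force an integer index array so that an empty list is not read as float.
    idx = np.asarray(selected_columns, dtype=np.intp)
    if idx.size == 0:
        # Empty selection: return an m-by-0 array of the same dtype.
        return np.zeros((A.shape[0], 0), dtype=A.dtype)
    return A[:, idx]
```

With this, `select_columns(A, [])` gives an m-by-0 result that can be passed
on to np.dot as in the quoted example.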

So can you answer my question: do you think it's worth the time and
computational overhead to handle empty arrays in all functions?

Ralf