[Numpy-discussion] numpythonically getting elements with the minimum sum

Lluís xscript at gmx.net
Tue Jan 29 10:56:47 EST 2013


Sebastian Berg writes:

> On Tue, 2013-01-29 at 14:53 +0100, Lluís wrote:
>> Gregor Thalhammer writes:
>> 
>> > Am 28.1.2013 um 23:15 schrieb Lluís:
>> 
>> >> Hi,
>> >> 
>> >> I have a somewhat convoluted N-dimensional array that contains information about
>> >> a set of experiments.
>> >> 
>> >> The last dimension has as many entries as iterations in the experiment (an
>> >> iterative application), and the penultimate dimension has as many entries as
>> >> times I have run that experiment; the rest of the dimensions describe the
>> >> features of the experiment:
>> >> 
>> >> data.shape == (... indefinite amount of dimensions ..., NUM_RUNS, NUM_ITERATIONS)
>> >> 
>> >> So, what I want is to get the data for the best run of each experiment:
>> >> 
>> >> best.shape == (... indefinite amount of dimensions ..., NUM_ITERATIONS)
>> >> 
>> >> by selecting, for each experiment, the run with the lowest total time (sum of
>> >> the time of all iterations for that experiment).
>> >> 
>> >> 
>> >> So far I've got the trivial part, but not the final indexing into "data":
>> >> 
>> >> dsum = data.sum(axis = -1)
>> >> dmin = dsum.min(axis = -1)
>> >> best = data[???]
>> >> 
>> >> 
>> >> I'm sure there must be some numpythonic and generic way to get what I want, but
>> >> fancy indexing is beating me here :)
>> 
>> > Did you have a look at the argmin function? It delivers the indices of the minimum values along an axis. Untested guess:
>> 
>> > dmin_idx = argmin(dsum, axis = -1)
>> > best = data[..., dmin_idx, :]
>> 
>> Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing
>> with it does not exactly work as I expected:
>> 
>> >>> d1.shape
>> (2, 5, 10)
>> >>> dsum = d1.sum(axis = -1)
>> >>> dmin = dsum.argmin(axis = -1)
>> >>> dmin.shape
>> (2,)
>> >>> d1_best = d1[...,dmin,:]

> You need to use fancy indexing. Something like:
> >>> d1_best = d1[np.arange(2), dmin, :]

> Because the Ellipsis takes everything from the axis, while you want to
> pick from multiple axes at the same time. That can be achieved with
> fancy indexing (indexing with arrays). From another perspective, you
> want to get rid of two axes in favor of a new one, but a slice/Ellipsis
> always preserves the axis it works on.
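
A minimal sketch of that difference, assuming a plain float array filled with
made-up random values in place of the real d1 (the names "outer" and "paired"
are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d1 = rng.random((2, 5, 10))      # (experiments, runs, iterations), made-up data

dsum = d1.sum(axis=-1)           # total time per run, shape (2, 5)
dmin = dsum.argmin(axis=-1)      # best run per experiment, shape (2,)

# The Ellipsis keeps the whole first axis, so every experiment is combined
# with every entry of dmin: the result has shape (2, 2, 10).
outer = d1[..., dmin, :]

# Two index arrays are broadcast together and walked in lockstep, so
# experiment i is combined only with dmin[i]: the result has shape (2, 10).
paired = d1[np.arange(2), dmin, :]
```

Here outer[i, j] is d1[i, dmin[j]], while paired[i] is d1[i, dmin[i]].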

Nice, thanks. That works for this specific example, but I couldn't get it to
work with "d1.shape == (1, 2, 16, 5, 10)" (thus "dmin.shape == (1, 2, 16)"):

    >>> def get_best_run (data, field):
    ...     """Returns the best run."""
    ...     data = data.view(np.ndarray)
    ...     assert data.ndim >= 2
    ...     dsum = data[field].sum(axis=-1)
    ...     dmin = dsum.argmin(axis=-1)
    ...     idxs  = [ np.arange(dlen) for dlen in data.shape[:-2] ]
    ...     idxs += [ dmin ]
    ...     idxs += [ slice(None) ]
    ...     return data[tuple(idxs)]
    >>> d1.shape
    (2, 5, 10)
    >>> get_best_run(d1, "time").shape
    (2, 10)
    >>> d2.shape
    (1, 2, 16, 5, 10)
    >>> get_best_run(d2, "time").shape
    Traceback (most recent call last):
      ...
      File "./plot-user.py", line 89, in get_best_run
        res = data.view(np.ndarray)[tuple(idxs)]
    ValueError: shape mismatch: objects cannot be broadcast to a single shape


After reading the "Advanced indexing" section, my understanding is that the
elements in "idxs" cannot be broadcast to a common shape, but I'm not sure how
to build them so that they can, or what that common shape should be.
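
For reference, the failure comes from the 1-D ranges: their shapes (1,), (2,)
and (16,) cannot be broadcast against each other, whereas in the 3-D case there
is a single range of shape (2,) that happens to match "dmin". One possible way
out, shown here only as a sketch, is to let np.ix_ reshape the ranges into an
open mesh that broadcasts to dmin.shape. It assumes a plain float array rather
than the record array with a "time" field used above, and the name "best_run"
is hypothetical:

```python
import numpy as np

def best_run(data):
    """Return, for each experiment, the run with the lowest total time.

    data has shape (..., NUM_RUNS, NUM_ITERATIONS); the result has shape
    (..., NUM_ITERATIONS).
    """
    data = np.asarray(data)
    assert data.ndim >= 2
    dmin = data.sum(axis=-1).argmin(axis=-1)   # shape data.shape[:-2]
    # np.ix_ returns a tuple with one array per leading axis; the k-th array
    # has length data.shape[k] on axis k and length 1 on the others, so all
    # of them broadcast together with dmin to the shape data.shape[:-2].
    lead = np.ix_(*[np.arange(n) for n in data.shape[:-2]])
    return data[lead + (dmin, slice(None))]

rng = np.random.default_rng(0)
d2 = rng.random((1, 2, 16, 5, 10))   # made-up data with the problematic shape
best = best_run(d2)                  # shape (1, 2, 16, 10)
```

With a record array, the same indexing would presumably apply after computing
"dmin" from data[field], as get_best_run does.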


Thanks a lot,
  Lluis


>> >>> d1_best.shape
>> (2, 2, 10)
>> 
>> 
>> Assuming the 1st dimension is the test, the 2nd the run and the 3rd the
>> iterations, using this previous code with some example values:
>> 
>> >>> dmin
>> [4 3]
>> >>> d1_best
>> [[[ ... contents of d1[0,4,:] ...]
>> [ ... contents of d1[0,3,:] ...]]
>> [[ ... contents of d1[1,4,:] ...]
>> [ ... contents of d1[1,3,:] ...]]]
>> 
>> 
>> While I actually want this:
>> 
>> [[ ... contents of d1[0,4,:] ...]
>> [ ... contents of d1[1,3,:] ...]]

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


