counting unique numpy subarrays

duncan smith duncan at invalid.invalid
Fri Dec 4 14:43:35 EST 2015


Hello,
      I'm trying to find a computationally efficient way of identifying
unique subarrays, counting them and returning an array containing only
the unique subarrays and a corresponding 1D array of counts. The
following code works, but is a bit slow.

###############

from collections import Counter
import numpy

def bag_data(data):
    # data (a numpy array) is bagged along axis 0
    # returns concatenated array and corresponding array of counts
    vec_shape = data.shape[1:]
    counts = Counter(tuple(arr.flatten()) for arr in data)
    data_out = numpy.zeros((len(counts),) + vec_shape)
    cnts = numpy.zeros(len(counts))
    # items() also works on Python 3, where iteritems() no longer exists
    for i, (tup, cnt) in enumerate(counts.items()):
        data_out[i] = numpy.array(tup).reshape(vec_shape)
        cnts[i] = cnt
    return data_out, cnts

###############

I've been looking through the numpy docs, but don't seem to be able to
come up with a clean solution that avoids Python loops. TIA for any
useful pointers. Cheers.
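
One direction that might avoid the Python loop, offered only as a sketch: flatten each subarray to a row and let numpy.unique do the grouping. This assumes a NumPy release in which numpy.unique accepts the axis argument (1.13 or later), which may be newer than what is installed. The name bag_data_vectorized is made up for illustration, and unlike the Counter version the unique subarrays come back in sorted order and keep the input dtype.

```python
import numpy

def bag_data_vectorized(data):
    # Hypothetical loop-free variant of bag_data; assumes NumPy >= 1.13
    # so that numpy.unique supports axis=0.
    data = numpy.asarray(data)
    vec_shape = data.shape[1:]
    # One row per subarray along axis 0
    flat = data.reshape(len(data), -1)
    # Unique rows (lexicographically sorted) and how often each occurs
    uniq, cnts = numpy.unique(flat, axis=0, return_counts=True)
    # Restore the original subarray shape
    return uniq.reshape((len(uniq),) + vec_shape), cnts

data = numpy.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]],
                    [[1, 2], [3, 4]]])
data_out, cnts = bag_data_vectorized(data)
```

Here data_out should have shape (2, 2, 2) and cnts should hold the counts 2 and 1, since the first and third subarrays are identical.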

Duncan


