counting unique numpy subarrays
duncan smith
duncan at invalid.invalid
Fri Dec 4 14:43:35 EST 2015
Hello,
I'm trying to find a computationally efficient way of identifying
unique subarrays, counting them and returning an array containing only
the unique subarrays and a corresponding 1D array of counts. The
following code works, but is a bit slow.
###############
from collections import Counter
import numpy
def bag_data(data):
    # data (a numpy array) is bagged along axis 0
    # returns concatenated array and corresponding array of counts
    vec_shape = data.shape[1:]
    # tuples are hashable, so each flattened subarray can be a Counter key
    counts = Counter(tuple(arr.flatten()) for arr in data)
    data_out = numpy.zeros((len(counts),) + vec_shape)
    cnts = numpy.zeros(len(counts))
    # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
    for i, (tup, cnt) in enumerate(counts.items()):
        data_out[i] = numpy.array(tup).reshape(vec_shape)
        cnts[i] = cnt
    return data_out, cnts
###############
I've been looking through the numpy docs, but don't seem to be able to
come up with a clean solution that avoids Python loops. TIA for any
useful pointers. Cheers.
Duncan
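
One possible loop-free approach, offered as a sketch rather than a definitive answer: numpy.unique accepts an axis argument together with return_counts=True, which does the bagging in a single call. This assumes NumPy 1.13 or later (the axis argument does not exist in earlier releases) and a non-object dtype. The function name bag_data_unique is made up for illustration. Note two differences from the Counter version: the unique subarrays come back sorted lexicographically, and the output arrays keep the input dtype for the data and use an integer dtype for the counts, rather than float.

```python
import numpy

def bag_data_unique(data):
    # Hypothetical helper; assumes NumPy >= 1.13 for the axis argument.
    # Each subarray along axis 0 is treated as one element; identical
    # subarrays are collapsed and their occurrences counted.
    data = numpy.asarray(data)
    uniq, counts = numpy.unique(data, axis=0, return_counts=True)
    return uniq, counts

# Small example: three 2x2 subarrays, the first and last identical.
data = numpy.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]],
                    [[1, 2], [3, 4]]])
subarrays, counts = bag_data_unique(data)
print(subarrays.shape)  # (2, 2, 2)
print(counts)           # [2 1]
```

If the order of first appearance matters, return_index=True can also be passed and the results reordered with numpy.argsort on the returned indices.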