counting unique numpy subarrays
Albert-Jan Roskam
sjeik_appie at hotmail.com
Fri Dec 4 17:36:52 EST 2015
Hi
(Sorry for top-posting.)
arr.ravel() is faster than arr.flatten(): ravel returns a view when it can, while flatten always copies
numpy.empty is faster than numpy.zeros (it skips zero-initialization)
numpy.fromiter might be useful to avoid the loop (just a hunch)
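For illustration, here is a rough sketch of the bag_data function quoted below with those three tips applied. It still counts with collections.Counter, so the per-row Python loop remains. It only swaps flatten for ravel and zeros for empty, and builds the counts array with fromiter. Keeping the input dtype (dtype=data.dtype) is my own assumption about what is wanted.

```python
from collections import Counter

import numpy


def bag_data(data):
    """Count identical subarrays of `data` along axis 0.

    Returns (unique_subarrays, counts); first-seen order on Python 3.7+.
    """
    vec_shape = data.shape[1:]
    # ravel() avoids a copy when the subarray is contiguous; flatten() always copies
    counts = Counter(tuple(arr.ravel()) for arr in data)
    n = len(counts)
    # empty() skips zero-filling; every row is overwritten in the loop below
    # (assumption: the output should keep the input dtype)
    data_out = numpy.empty((n,) + vec_shape, dtype=data.dtype)
    for i, key in enumerate(counts):
        data_out[i] = numpy.asarray(key).reshape(vec_shape)
    # fromiter() fills the counts array straight from the Counter values,
    # which iterate in the same order as its keys
    cnts = numpy.fromiter(counts.values(), dtype=numpy.intp, count=n)
    return data_out, cnts
```

The Counter and tuple construction probably still dominate the runtime, so the gain from these swaps may be modest.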
Albert-Jan
> From: duncan at invalid.invalid
> Subject: counting unique numpy subarrays
> Date: Fri, 4 Dec 2015 19:43:35 +0000
> To: python-list at python.org
>
> Hello,
> I'm trying to find a computationally efficient way of identifying
> unique subarrays, counting them and returning an array containing only
> the unique subarrays and a corresponding 1D array of counts. The
> following code works, but is a bit slow.
>
> ###############
>
> from collections import Counter
> import numpy
>
> def bag_data(data):
>     # data (a numpy array) is bagged along axis 0
>     # returns concatenated array and corresponding array of counts
>     vec_shape = data.shape[1:]
>     # one hashable key per subarray
>     counts = Counter(tuple(arr.flatten()) for arr in data)
>     data_out = numpy.zeros((len(counts),) + vec_shape)
>     cnts = numpy.zeros(len(counts))
>     # .items() works on Python 2 and 3 (.iteritems() is Python 2 only)
>     for i, (tup, cnt) in enumerate(counts.items()):
>         data_out[i] = numpy.array(tup).reshape(vec_shape)
>         cnts[i] = cnt
>     return data_out, cnts
>
> ###############
>
> I've been looking through the numpy docs, but don't seem to be able to
> come up with a clean solution that avoids Python loops. TIA for any
> useful pointers. Cheers.
>
> Duncan
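A loop-free alternative, assuming a NumPy release newer than this thread: since NumPy 1.13, numpy.unique accepts an axis argument, so the unique subarrays along axis 0 and their counts come from a single call. The name bag_data_unique is hypothetical. Unlike the Counter version, the unique subarrays come back sorted lexicographically by their flattened contents, not in first-seen order.

```python
import numpy


def bag_data_unique(data):
    """Return (unique subarrays along axis 0, count of each).

    Assumes NumPy >= 1.13, where numpy.unique accepts `axis`.
    """
    # With axis=0, each data[i] is treated as one element (flattened internally
    # for comparison); the unique subarrays keep their original shape.
    uniq, counts = numpy.unique(data, axis=0, return_counts=True)
    return uniq, counts


data = numpy.array([[[5, 6], [7, 8]],
                    [[1, 2], [3, 4]],
                    [[5, 6], [7, 8]]])
uniq, counts = bag_data_unique(data)
print(uniq.shape)  # (2, 2, 2)
print(counts)      # [1 2]
```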