[SciPy-user] histogramdd once more
David Huard
david.huard at gmail.com
Fri Feb 15 11:25:22 EST 2008
Chris,
Thanks again for the bug report.
I implemented a solution where you need only one axes-swapping pass, no
matter what.
I just submitted the fix into SVN and the test you sent me passes. Maybe
check the fix does the appropriate thing for you too.
Cheers,
David
2008/2/15, Chris Lee <c.j.lee at tnw.utwente.nl>:
>
> Hi All,
>
> I didn't strip all of the debug print comments from the earlier snippet.
> I think this is clean
>
> def histogramdd(sample, bins=10, range=None, normed=False, weights=None):
> """histogramdd(sample, bins=10, range=None, normed=False,
> weights=None)
>
> Return the N-dimensional histogram of the sample.
>
> Parameters:
>
> sample : sequence or array
> A sequence containing N arrays or an NxM array. Input data.
>
> bins : sequence or scalar
> A sequence of edge arrays, a sequence of bin counts, or a
> scalar
> which is the bin count for all dimensions. Default is 10.
>
> range : sequence
> A sequence of lower and upper bin edges. Default is [min,
> max].
>
> normed : boolean
> If False, return the number of samples in each bin, if True,
> returns the density.
>
> weights : array
> Array of weights. The weights are normed only if normed is
> True.
> Should the sum of the weights not equal N, the total bin
> count will
> not be equal to the number of samples.
>
> Returns:
>
> hist : array
> Histogram array.
>
> edges : list
> List of arrays defining the lower bin edges.
>
> SeeAlso:
>
> histogram
>
> Example
>
> >>> x = random.randn(100,3)
> >>> hist3d, edges = histogramdd(x, bins = (5, 6, 7))
>
> """
>
> try:
> # Sample is an ND-array.
> N, D = sample.shape
> except (AttributeError, ValueError):
> # Sample is a sequence of 1D arrays.
> sample = atleast_2d(sample).T
> N, D = sample.shape
> nbin = empty(D, int)
> edges = D*[None]
> dedges = D*[None]
> if weights is not None:
> weights = asarray(weights)
>
> try:
> M = len(bins)
> if M != D:
> raise AttributeError, 'The dimension of bins must be a equal
> to the dimension of the sample x.'
> except TypeError:
> bins = D*[bins]
>
> # Select range for each dimension
> # Used only if number of bins is given.
> if range is None:
> smin = atleast_1d(array(sample.min(0), float))
> smax = atleast_1d(array(sample.max(0), float))
> else:
> smin = zeros(D)
> smax = zeros(D)
> for i in arange(D):
> smin[i], smax[i] = range[i]
>
> # Make sure the bins have a finite width.
> for i in arange(len(smin)):
> if smin[i] == smax[i]:
> smin[i] = smin[i] - .5
> smax[i] = smax[i] + .5
>
> # Create edge arrays
> for i in arange(D):
> if isscalar(bins[i]):
> nbin[i] = bins[i] + 2 # +2 for outlier bins
> edges[i] = linspace(smin[i], smax[i], nbin[i]-1)
> else:
> edges[i] = asarray(bins[i], float)
> nbin[i] = len(edges[i])+1 # +1 for outlier bins
> dedges[i] = diff(edges[i])
>
> nbin = asarray(nbin)
>
> # Compute the bin number each sample falls into.
> Ncount = {}
> for i in arange(D):
> Ncount[i] = digitize(sample[:,i], edges[i])
>
> # Using digitize, values that fall on an edge are put in the right
> bin.
> # For the rightmost bin, we want values equal to the right
> # edge to be counted in the last bin, and not as an outlier.
> outliers = zeros(N, int)
> for i in arange(D):
> # Rounding precision
> decimal = int(-log10(dedges[i].min())) +6
> # Find which points are on the rightmost edge.
> on_edge = where(around(sample[:,i], decimal) ==
> around(edges[i][-1], decimal))[0]
> # Shift these points one bin to the left.
> Ncount[i][on_edge] -= 1
>
> # Flattened histogram matrix (1D)
> hist = zeros(nbin.prod(), int)
>
> # Compute the sample indices in the flattened histogram matrix.
> ni = nbin.argsort()
> shape = []
> xy = zeros(N, int)
> for i in arange(0, D-1):
> xy += Ncount[ni[i]] * nbin[ni[i+1:]].prod()
> xy += Ncount[ni[-1]]
>
> # Compute the number of repetitions in xy and assign it to the
> flattened histmat.
> if len(xy) == 0:
> return zeros(nbin-2, int), edges
>
> flatcount = bincount(xy, weights)
> a = arange(len(flatcount))
> hist[a] = flatcount
> # Shape into a proper matrix
> hist = hist.reshape(sort(nbin))
> mustPermute = True
> while mustPermute:
> nothingChanged = True
> for i in arange(nbin.size):
> j = ni[i]
> if j != i:
> nothingChanged = False
> hist = hist.swapaxes(i,j)
> ni[i],ni[j] = ni[j],ni[i]
> if nothingChanged:
> mustPermute = False
> #print "THis is after swapping the axis ", hist.shape
> # Remove outliers (indices 0 and -1 for each dimension).
> core = D*[slice(1,-1)]
> hist = hist[core]
>
> # Normalize if normed is True
> if normed:
> s = hist.sum()
> for i in arange(D):
> shape = ones(D, int)
> shape[i] = nbin[i]-2
> hist = hist / dedges[i].reshape(shape)
> hist /= s
>
> return hist, edges
>
> --
> **********************************************
> * Chris Lee *
> * Laser physics and nonlinear optics group *
> * MESA+ Institute *
> * University of Twente *
> * Phone: ++31 (0)53 489 3968 *
> * fax: ++31 (0) 53 489 1102 *
> **********************************************
>
> _______________________________________________
> SciPy-user mailing list
> SciPy-user at scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20080215/a58e1ae6/attachment.html>
More information about the SciPy-User
mailing list