[SciPy-Dev] Histogram as its own class

Johann Goetz theodore.goetz at gmail.com
Fri Aug 15 10:59:47 EDT 2014


Hello,
I'm a long-time user of scipy doing mostly multivariate big-data (several
terabytes) analysis in the high-energy physics realm. One thing I've found
useful is promoting the histogram to its own class. Instead of creating
yet another package, I would like to include it in the scipy.stats
module, and I would like some feedback: is this the right place for
such an object?

I have some documentation, but not enough I would say, and the classes are
currently buried in my "pyhep" project, but they are easily extracted out.

https://bitbucket.org/theodoregoetz/pyhep/wiki/Home

Here are some details:

The histograms I am addressing are N-dimensional over a continuous domain
(floating-point data, no gaps, though bins can have value inf or nan if
need be) along each axis. The axes need not be uniform.

There are two classes: HistogramAxis and Histogram. The axes are always
floating point, but the histogram's data can be any dtype (default: np.int;
a "cast" to float is done when dividing two histograms). I make use of
np.histogramdd() and store the data along with the uncertainty. Many
operations are supported, including adding, subtracting, multiplying,
dividing, bin-merging, cutting/clipping along one or more axes, projecting
along an axis, iterating over an axis, and filling from a sample with or
without weights.
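
To make the idea concrete, here is a minimal sketch of filling and dividing,
not the actual pyhep code. The function names fill_histogram and divide are
hypothetical, and the uncertainty model (square root of the summed squared
weights, independent errors added in quadrature for a ratio) is an
assumption on my part about what "store the data along with the
uncertainty" could look like:

```python
import numpy as np

def fill_histogram(sample, edges, weights=None):
    """Bin an (n_points, n_dims) sample; return (counts, uncertainty).

    Hypothetical helper: the per-bin uncertainty is assumed to be
    sqrt(sum of squared weights), which reduces to sqrt(counts)
    for an unweighted fill.
    """
    sample = np.asarray(sample, dtype=float)
    counts, _ = np.histogramdd(sample, bins=edges, weights=weights)
    if weights is None:
        variance = counts
    else:
        w2 = np.asarray(weights, dtype=float) ** 2
        variance, _ = np.histogramdd(sample, bins=edges, weights=w2)
    return counts, np.sqrt(variance)

def divide(num, num_err, den, den_err):
    """Bin-wise ratio with relative errors added in quadrature.

    Assumes the two histograms are statistically independent.
    Empty bins yield inf or nan rather than raising.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = num / den
        rel = np.sqrt((num_err / num) ** 2 + (den_err / den) ** 2)
        return ratio, np.abs(ratio) * rel

# Two axes; the first one has non-uniform bin widths.
edges = [np.array([0.0, 1.0, 3.0]), np.array([0.0, 1.0, 2.0])]
sample = [[0.5, 0.5], [0.5, 1.5], [2.5, 0.5]]
counts, err = fill_histogram(sample, edges)

# Projecting onto the first axis is a sum over the second.
projection = counts.sum(axis=1)
```

Keeping the edges as one array per axis is what lets non-uniform binning
fall out for free, since np.histogramdd() accepts them directly.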

Most of the power in this package is in the fitting method of the histogram,
which makes use of scipy.optimize.curve_fit(). It handles missing data (when
a bin is inf or nan), can include the uncertainty in the fit, and calculates
a goodness of fit.
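
As a rough illustration of that fitting step for a 1D histogram, here is a
sketch under my own assumptions rather than the pyhep implementation. The
names gaussian and fit_histogram are hypothetical, bins are evaluated at
their centers, and the goodness of fit is taken to be chi-squared per
degree of freedom:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, mean, width):
    """Example model; any callable f(x, *params) would do."""
    return amplitude * np.exp(-0.5 * ((x - mean) / width) ** 2)

def fit_histogram(edges, values, uncertainty, model, p0):
    """Fit model to bin contents, skipping unusable bins.

    Hypothetical helper: bins whose value or uncertainty is inf or
    nan, or whose uncertainty is zero, are masked out before the fit.
    Returns (best-fit parameters, covariance, chi2 per d.o.f.).
    """
    centers = 0.5 * (edges[:-1] + edges[1:])
    good = np.isfinite(values) & np.isfinite(uncertainty) & (uncertainty > 0)
    popt, pcov = curve_fit(
        model, centers[good], values[good],
        p0=p0, sigma=uncertainty[good], absolute_sigma=True,
    )
    pulls = (values[good] - model(centers[good], *popt)) / uncertainty[good]
    ndf = int(good.sum()) - len(popt)
    return popt, pcov, float(np.sum(pulls ** 2)) / ndf

# Synthetic bin contents with one missing bin.
edges = np.linspace(-5.0, 5.0, 41)
centers = 0.5 * (edges[:-1] + edges[1:])
values = gaussian(centers, 100.0, 0.3, 1.2)
uncertainty = np.sqrt(np.clip(values, 1.0, None))
values[3] = np.nan

popt, pcov, chi2_ndf = fit_histogram(
    edges, values, uncertainty, gaussian, p0=[80.0, 0.0, 1.0]
)
```

Passing absolute_sigma=True tells curve_fit() to treat the bin
uncertainties as real standard deviations, so the returned covariance is
not rescaled by the fit quality.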

On top of this, I have free functions to plot 1D and 2D histograms using
matplotlib, as well as functions to handle reading in large HDF5 files.
These are auxiliary and may not fit into scipy directly.

Thank you all,
Johann.