histogram complete makeover

Neal Becker ndbecker2 at gmail.com
Tue Oct 17 20:46:51 EDT 2006


David Huard wrote:

> Hi all,
> 
> I'd like to poll the list to see what people want from numpy.histogram(),
> since I'm currently writing a contender.
> 
> My main complaints with the current version are:
> 1. upper outliers are stored in the last bin, while lower outliers are not
> counted at all,
> 2. cannot use weights.
> 
> The new histogram function is well under way (it addresses these issues and
> adds an axis keyword),
> but I want to know the preferred behavior regarding the function
> output, and your
> willingness to introduce a new behavior that will break some code.
> 
> Given a number of bins N and range (min, max), histogram constructs
> linearly spaced bin edges
> b0 (out-of-range)  | b1 | b2 | b3 | .... | bN | bN+1 out-of-range
> and may return:
> 
> A.  H = array([N_b0, N_b1, ..., N_bN,  N_bN+1])
> The out-of-range values are the first and last values of the array. The
> returned array hence has length N+2.
> 
> B.  H = array([N_b0 + N_b1, N_b2, ..., N_bN + N_bN+1])
> The lower and upper out-of-range values are added to the first and last
> bin respectively.
> 
> C.  H = array([N_b1, ..., N_bN + N_bN+1])
> Current behavior: the upper out-of-range values are added to the last bin.
> 
> D.  H = array([N_b1, N_b2, ..., N_bN]),
> Lower and upper out-of-range values are given after the histogram array.
> 
> Ideally, the new function would not break the common usage: H =
> histogram(x)[0], so this excludes A.  B and C are not acceptable in my
> opinion, so only D remains, with the downside that the outliers are not
> returned. A solution might be to add a keyword full_output=False, which,
> when set to True, returns the out-of-range values in a dictionary.
> 
> Also, the current function returns -> H, ledges
> where ledges is the array of left bin edges (N).
> I propose returning the complete array of edges (N+1), including the
> rightmost edge. This is a little bit impractical for plotting, as the
> edges array does not have the same length as the histogram array, but
> allows the use of user-defined non-uniform bins.
> 
> Opinions, suggestions?
> 
> David
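
For reference, the four output conventions A to D described above can be
sketched in a few lines of plain Python. This is only an illustration under
stated assumptions, not the proposed numpy implementation: the helper name
bin_counts and the sample data are hypothetical, bins are half-open except
that a value equal to max goes in the last bin, and the standard-library
bisect module stands in for numpy.

```python
from bisect import bisect_right

def bin_counts(data, n_bins, lo, hi):
    """Hypothetical helper: count data into n_bins equal-width bins on [lo, hi].

    Returns (counts, under, over, edges), where edges has n_bins + 1 entries
    (the complete edge array, including the rightmost edge).
    """
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    counts = [0] * n_bins
    under = over = 0
    for x in data:
        if x < lo:
            under += 1          # lower out-of-range (b0 in the notation above)
        elif x > hi:
            over += 1           # upper out-of-range (bN+1 in the notation above)
        else:
            # Half-open bins; clamp so that x == hi lands in the last bin.
            i = min(bisect_right(edges, x) - 1, n_bins - 1)
            counts[i] += 1
    return counts, under, over, edges

data = [-1.0, 0.0, 0.5, 1.5, 2.5, 3.0, 4.2, 9.0]
counts, under, over, edges = bin_counts(data, n_bins=3, lo=0.0, hi=3.0)

# A: out-of-range counts as first and last entries (length N+2)
option_a = [under] + counts + [over]
# B: out-of-range counts folded into the first and last bins
option_b = [counts[0] + under] + counts[1:-1] + [counts[-1] + over]
# C: only the upper out-of-range count folded into the last bin
option_c = counts[:-1] + [counts[-1] + over]
# D: in-range counts only; outliers reported separately
option_d = counts
outliers = {"lower": under, "upper": over}
```

With this sample data, option_a is [1, 2, 1, 2, 2], option_b is [3, 1, 4],
option_c is [2, 1, 4], and option_d is [2, 1, 2] with outliers
{"lower": 1, "upper": 2}. Note that edges has four entries for three bins,
so edges[:-1] gives the left edges when a same-length array is needed for
plotting.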

I have my own histogram that might interest you.  The core is modern C++,
with a boost::python wrapper.

Out-of-bounds behavior is programmable.  I'll send it to you if you are
interested.
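
To give a rough idea of what "programmable" out-of-bounds behavior could
look like, here is a minimal Python sketch. It is not the C++ code mentioned
above: the function name histogram_with_policy, the outliers argument, and
its three policy names are all assumptions made for illustration. It also
accepts user-defined, non-uniform edges and optional weights, since both
came up in the original post.

```python
from bisect import bisect_right

def histogram_with_policy(data, edges, outliers="drop", weights=None):
    """Hypothetical sketch: bin data using sorted edges (N+1 values for N bins).

    outliers selects what happens to values outside [edges[0], edges[-1]]:
      "drop"  - ignore them
      "clip"  - add them to the nearest end bin
      "raise" - raise ValueError
    """
    if outliers not in ("drop", "clip", "raise"):
        raise ValueError("unknown outlier policy: %r" % (outliers,))
    n_bins = len(edges) - 1
    if weights is None:
        weights = [1] * len(data)
    counts = [0] * n_bins
    for x, w in zip(data, weights):
        if x < edges[0] or x > edges[-1]:
            if outliers == "drop":
                continue
            if outliers == "raise":
                raise ValueError("value out of range: %r" % (x,))
            # "clip": fold the value into the first or last bin
            i = 0 if x < edges[0] else n_bins - 1
        else:
            # Half-open bins; a value equal to the last edge goes in the last bin.
            i = min(bisect_right(edges, x) - 1, n_bins - 1)
        counts[i] += w
    return counts

edges = [0, 1, 4, 10]               # non-uniform bins
data = [-2, 0.5, 3, 10, 12]
dropped = histogram_with_policy(data, edges, outliers="drop")
clipped = histogram_with_policy(data, edges, outliers="clip")
weighted = histogram_with_policy(data, edges, weights=[1, 2, 3, 4, 5])
```

Here dropped is [1, 1, 1], clipped is [2, 1, 2], and weighted is [2, 3, 4].
Passing a policy name keeps the common call unchanged while letting callers
opt in to stricter or more lenient handling.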



More information about the NumPy-Discussion mailing list