histogram complete makeover
Neal Becker
ndbecker2 at gmail.com
Tue Oct 17 20:46:51 EDT 2006
David Huard wrote:
> Hi all,
>
> I'd like to poll the list to see what people want from numpy.histogram(),
> since I'm currently writing a contender.
>
> My main complaints with the current version are:
> 1. upper outliers are stored in the last bin, while lower outliers are not
> counted at all,
> 2. cannot use weights.
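For complaint 2, a weighted histogram adds each sample's weight to its bin instead of 1. Below is a minimal pure-Python sketch under stated assumptions, not numpy's code. The name `weighted_histogram` is hypothetical. Bins are half-open except the last one, which also includes the upper edge. Out-of-range samples are simply dropped here.

```python
from bisect import bisect_right

def weighted_histogram(data, edges, weights=None):
    """Sum each sample's weight into the bin [edges[i], edges[i+1]) containing it.

    Hypothetical helper for illustration. With weights=None every sample
    counts as 1. The last bin also includes edges[-1]; samples outside
    [edges[0], edges[-1]] are ignored in this sketch.
    """
    if weights is None:
        weights = [1] * len(data)
    n = len(edges) - 1
    hist = [0] * n
    for x, w in zip(data, weights):
        if x < edges[0] or x > edges[-1]:
            continue  # out of range: dropped in this sketch
        # bisect_right finds the bin; min() folds x == edges[-1] into the last bin
        i = min(bisect_right(edges, x) - 1, n - 1)
        hist[i] += w
    return hist
```

For example, `weighted_histogram([0.5, 1.5, 1.5, 2, 3], [0, 1, 2])` counts `[1, 3]`, while passing weights `[2, 0.5, 0.5, 1, 10]` sums them to `[2, 2.0]`. The sample at 3 lies above the range and is ignored in both cases.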
>
> The new histogram function is well under way (it addresses these issues and
> adds an axis keyword), but I want to know what the preferred behavior for
> the function output is, and how willing you are to introduce a new
> behavior that will break some code.
>
> Given a number of bins N and a range (min, max), histogram constructs
> linearly spaced bin edges
> b0 (out-of-range) | b1 | b2 | b3 | ... | bN | bN+1 (out-of-range)
> and may return:
>
> A. H = array([N_b0, N_b1, ..., N_bN, N_bN+1])
> The out-of-range counts are the first and last values of the array. The
> returned array hence has length N+2.
>
> B. H = array([N_b0 + N_b1, N_b2, ..., N_bN + N_bN+1])
> The lower and upper out-of-range values are added to the first and last
> bin respectively.
>
> C. H = array([N_b1, ..., N_bN + N_bN+1])
> Current behavior: the upper out-of-range values are added to the last bin.
>
> D. H = array([N_b1, N_b2, ..., N_bN])
> The lower and upper out-of-range counts are returned after the histogram array.
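Options A to D are all rearrangements of the same N+2 counts. A small sketch makes the relationship explicit. The helper name `layouts` is hypothetical, and it takes the full vector [N_b0, N_b1, ..., N_bN, N_bN+1] as input rather than raw data.

```python
def layouts(full):
    """Derive candidate outputs A-D from the full count vector.

    `full` is [N_b0, N_b1, ..., N_bN, N_bN+1]: lower outliers, the N
    in-range bins, then upper outliers (length N+2, with N >= 1).
    """
    low, inner, high = full[0], list(full[1:-1]), full[-1]

    a = list(full)          # A: all N+2 counts, outliers at both ends

    b = list(inner)         # B: both outlier counts folded into the end bins
    b[0] += low
    b[-1] += high

    c = list(inner)         # C (current): only upper outliers folded in
    c[-1] += high

    d = (inner, low, high)  # D: N in-range counts, outliers kept separate
    return a, b, c, d


a, b, c, d = layouts([2, 5, 7, 4, 1])  # N = 3
```

Here `a` has five entries while `b`, `c` and the first element of `d` have three, which is why A breaks code that expects N values from `histogram(x)[0]`. B and C keep the length but mix outliers into the edge bins.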
>
> Ideally, the new function would not break the common usage
> H = histogram(x)[0], which excludes A, since it changes the length of H.
> B and C are not acceptable in my opinion, so only D remains, with the
> downside that the outliers are not returned. A solution might be to add a
> keyword full_output=False which, when set to True, returns the
> out-of-range counts in a dictionary.
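The following is a hypothetical sketch of option D with the proposed full_output keyword. The function name `histogram_d`, the parameter `range_`, and the dictionary keys 'lower' and 'upper' are illustrative assumptions, not an agreed API. It shows that `H = histogram(x)[0]` keeps working while the outliers remain available on request.

```python
from bisect import bisect_right

def histogram_d(data, bins, range_, full_output=False):
    """Option D sketch: return (H, edges), plus outlier counts on request.

    Hypothetical signature; the dict keys 'lower' and 'upper' are
    illustrative only.
    """
    lo, hi = range_
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]  # N+1 linearly spaced edges
    hist = [0] * bins
    outliers = {"lower": 0, "upper": 0}
    for x in data:
        if x < lo:
            outliers["lower"] += 1
        elif x > hi:
            outliers["upper"] += 1
        else:
            # Half-open bins, except the last one, which also includes hi.
            hist[min(bisect_right(edges, x) - 1, bins - 1)] += 1
    if full_output:
        return hist, edges, outliers
    return hist, edges
```

With `data = [-1, 0, 0.3, 0.6, 1.0, 2]`, `bins=2` and `range_=(0, 1)`, `histogram_d(data, 2, (0, 1))[0]` gives `[2, 2]`, and `full_output=True` additionally reports one lower and one upper outlier.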
>
> Also, the current function returns (H, ledges),
> where ledges is the array of the N left bin edges.
> I propose returning the complete array of N+1 edges, including the
> rightmost edge. This is a little impractical for plotting, since the
> edges array does not have the same length as the histogram array, but it
> allows the use of user-defined non-uniform bins.
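With N+1 edges, non-uniform bins are fully described, and both the N left edges and the N bin widths needed for a bar plot can be recovered from them. With only N left edges, the width of the last non-uniform bin is undefined. A minimal pure-Python sketch, using the hypothetical helper `histogram_edges` (out-of-range samples are ignored):

```python
from bisect import bisect_right

def histogram_edges(data, edges):
    """Count samples in user-defined (possibly non-uniform) bins.

    `edges` holds N+1 increasing values defining N bins; the last bin
    includes edges[-1]. Samples outside the range are ignored here.
    """
    n = len(edges) - 1
    hist = [0] * n
    for x in data:
        if edges[0] <= x <= edges[-1]:
            hist[min(bisect_right(edges, x) - 1, n - 1)] += 1
    return hist


edges = [0, 1, 2, 5, 10]                                 # non-uniform bins
hist = histogram_edges([0.5, 1.5, 3, 4, 7, 9.5, 12], edges)
left = edges[:-1]                                        # N left edges, same length as hist
widths = [b - a for a, b in zip(edges[:-1], edges[1:])]  # needs the rightmost edge
```

The left edges and widths can then be passed to a bar-plotting routine that aligns bars on their left edge (for example matplotlib's `bar` with `align='edge'`).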
>
> Opinions, suggestions?
>
> David
I have my own histogram implementation that might interest you. The core is
modern C++, with a boost::python wrapper.
Out-of-bounds behavior is programmable. I'll send it to you if you are
interested.
More information about the NumPy-Discussion mailing list