[Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Mar 12 23:34:41 EDT 2018


On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser
<wieser.eric+numpy at gmail.com> wrote:
>> Given that the bin selection are data driven, transferring them across datasets might not be so useful.
>
> The main application would be to compute bins across the union of all
> datasets. This is already possible by using `np.histogram` and
> discarding the first result, but that's super wasteful.

Assuming "union" means a combined dataset:

If you stack the datasets, then the number of observations will not be
correct for the individual datasets.
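
A rough illustration of that point, assuming the "sturges" estimator and two
made-up datasets of 100 observations each (the data and variable names are
invented for the example). Sturges' rule gives ceil(log2(n) + 1) bins, so the
stacked data gets more bins than either dataset would get on its own:

```python
import numpy as np

# Two made-up datasets of 100 observations each (illustrative only).
rng = np.random.RandomState(0)
a = rng.normal(size=100)
b = rng.normal(size=100)

# np.histogram returns (counts, edges); keep only the edges,
# i.e. "discard the first result" as described above.
edges_a = np.histogram(a, bins="sturges")[1]
edges_stacked = np.histogram(np.concatenate([a, b]), bins="sturges")[1]

bins_a = len(edges_a) - 1              # based on n = 100
bins_stacked = len(edges_stacked) - 1  # based on n = 200
print(bins_a, bins_stacked)
```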

In that case an additional keyword like nobs, or whatever name would
be appropriate for numpy, would be useful, e.g. to use the average number
of observations across datasets.
Auxiliary statistics like std could then be computed on the total
dataset (if that makes sense, which would not be the case if the
variance across datasets is larger than the variance within datasets).
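
A minimal sketch of the nobs idea. `sturges_edges` and its `nobs` argument are
hypothetical, written only for this example; nothing like it exists in numpy.
The range comes from the stacked data, while the bin count uses the average
number of observations per dataset. The last two lines show the std caveat:
when the datasets have very different means, the std of the stacked data is
dominated by the variance across datasets.

```python
import numpy as np

def sturges_edges(data, nobs=None):
    """Hypothetical helper: Sturges-style edges with an optional nobs override."""
    data = np.asarray(data, dtype=float).ravel()
    if nobs is None:
        nobs = data.size  # default: use the actual sample size
    n_bins = int(np.ceil(np.log2(nobs) + 1))
    return np.linspace(data.min(), data.max(), n_bins + 1)

# Two made-up datasets with clearly different means (illustrative only).
rng = np.random.RandomState(0)
datasets = [rng.normal(loc=0.0, size=100), rng.normal(loc=10.0, size=100)]
stacked = np.concatenate(datasets)

# Range from the stacked data, bin count from the average nobs per dataset.
avg_nobs = np.mean([d.size for d in datasets])
edges = sturges_edges(stacked, nobs=avg_nobs)
n_bins = len(edges) - 1

# Std of the stacked data vs. the average std within each dataset.
pooled_std = stacked.std()
within_std = np.mean([d.std() for d in datasets])
print(n_bins, pooled_std, within_std)
```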

Josef
