[Numpy-discussion] PR to add a function to calculate histogram edges without calculating the histogram

Nathaniel Smith njs at pobox.com
Fri Mar 16 03:06:58 EDT 2018


Oh sure, I'm not suggesting it be impossible to calculate for a single data
set. If nothing else, if we had a version that accepted a list of data
sets, then you could always pass in a single-element list :-).
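
A minimal sketch of the list-of-data-sets idea, assuming the bins are simply computed on the merged data. The helper name `shared_bin_edges` is hypothetical. It relies on `np.histogram_bin_edges`, which is available in NumPy 1.15 and later; on older versions `np.histogram(merged, bins)[1]` returns the same edges, at the cost of also computing the counts.

```python
import numpy as np


def shared_bin_edges(datasets, bins="auto"):
    """Hypothetical helper: one set of bin edges for several data sets."""
    # Flatten each data set and merge them so the estimator sees all values.
    merged = np.concatenate([np.ravel(d) for d in datasets])
    return np.histogram_bin_edges(merged, bins=bins)


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=500)
b = rng.normal(3.0, 1.0, size=200)

edges = shared_bin_edges([a, b])
# Each data set is then histogrammed against the same edges.
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)

# The single data set case is just a one-element list.
edges_single = shared_bin_edges([a])
```

Because the edges span the merged range, every value of `a` and `b` falls into some bin, so the per-data-set counts remain directly comparable.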

On Mar 15, 2018 22:10, "Eric Wieser" <wieser.eric+numpy at gmail.com> wrote:

> That sounds like a reasonable extension - but I think there still exist
> cases where you want to treat the data as one uniform set when computing
> bins (toggling between orthogonal subsets of data), so it isn't really a useful
> replacement.
>
> I suppose this becomes relevant when `density` is passed to the individual
> histogram invocations. Does matplotlib handle that correctly for stacked
> histograms?
>
> On Thu, Mar 15, 2018, 20:14 Nathaniel Smith <njs at pobox.com> wrote:
>
>> Instead of an nobs argument, maybe we should have a version that accepts
>> multiple data sets, so that we have the full information and can improve
>> the algorithm over time.
>>
>> On Mar 15, 2018 7:57 PM, "Thomas Caswell" <tcaswell at gmail.com> wrote:
>>
>>> Yes I like the name.
>>>
>>> The primary use-case for Matplotlib is that our `hist` method can take
>>> in a list of arrays and produce N histograms in one shot. Currently with
>>> 'auto' we only use the first data set to sort out what the bins should be
>>> and then re-use those for the rest of the data sets.  This will let us get
>>> the bins on the merged input, but I take Josef's point that this is not
>>> actually what we want....
>>>
>>> Tom
>>>
>>> On Mon, Mar 12, 2018 at 11:35 PM <josef.pktd at gmail.com> wrote:
>>>
>>>> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser
>>>> <wieser.eric+numpy at gmail.com> wrote:
>>>> >> Given that the bin selections are data driven, transferring them
>>>> >> across datasets might not be so useful.
>>>> >
>>>> > The main application would be to compute bins across the union of all
>>>> > datasets. This is already possible by using `np.histogram` and
>>>> > discarding the first result, but that's super wasteful.
>>>>
>>>> assuming "union" means a combined dataset.
>>>>
>>>> If you stack datasets, then the number of observations will not be
>>>> correct for individual datasets.
>>>>
>>>> In that case an additional keyword like nobs, or whatever name would
>>>> be appropriate for numpy, would be useful, e.g. use the average number
>>>> of observations across datasets.
>>>> Auxiliary statistics like std could then be computed on the total
>>>> dataset (if that makes sense, which would not be the case if the
>>>> variance across datasets is larger than the variance within datasets).
>>>>
>>>> Josef
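
Josef's `nobs` suggestion could look roughly like the sketch below. It assumes Sturges' rule as the estimator purely for illustration (the thread does not pick one), and both the function name and the `nobs` keyword are hypothetical, not part of NumPy. The range comes from the stacked data, while the number of bins is driven by the average number of observations per data set.

```python
import numpy as np


def sturges_edges_with_nobs(datasets, nobs=None):
    """Hypothetical sketch: Sturges bins with a separate observation count."""
    arrays = [np.ravel(np.asarray(d, dtype=float)) for d in datasets]
    merged = np.concatenate(arrays)
    if nobs is None:
        # Default assumed here: average number of observations per data set.
        nobs = np.mean([arr.size for arr in arrays])
    # Sturges' rule: ceil(log2(n) + 1) equal-width bins.
    n_bins = int(np.ceil(np.log2(nobs) + 1))
    # The range is still taken from the stacked data.
    return np.linspace(merged.min(), merged.max(), n_bins + 1)


d1 = np.arange(64.0)
d2 = np.arange(64.0) + 100.0

edges_avg = sturges_edges_with_nobs([d1, d2])               # nobs = 64
edges_total = sturges_edges_with_nobs([d1, d2], nobs=128)   # stacked count
```

With two data sets of 64 points each, the average-based count gives one bin fewer than the stacked count would, which is the discrepancy Josef points out.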
>>>>
>>>> > _______________________________________________
>>>> > NumPy-Discussion mailing list
>>>> > NumPy-Discussion at python.org
>>>> > https://mail.python.org/mailman/listinfo/numpy-discussion