[Pandas-dev] DataFrame.value_counts
William Ayd
william.ayd at icloud.com
Sun Sep 15 16:45:35 EDT 2019
Hi Daniel,
Thanks for the feedback. There is actually already a PR implementing this, which I think is getting close to being ready:
https://github.com/pandas-dev/pandas/pull/27350
Would certainly welcome any feedback you can offer there in terms of trying it out on your end and/or taking part in the review process.
- Will
> On Sep 15, 2019, at 12:17 PM, Daniel Saxton via Pandas-dev <pandas-dev at python.org> wrote:
>
> Currently in pandas if we want to count the values for a single column of a DataFrame we would use df["a"].value_counts(), but when we want to count combinations of more than one column we (as far as I know) have to switch syntax and use df.groupby(["a", "b"]).size(). This is a little awkward code-wise and likely carries some unnecessary overhead since we don't actually need to prepare a groupby object that can handle an arbitrary calculation on the subframes. There's some evidence of this overhead in the Series case:
>
> import numpy as np
> import pandas as pd
>
> s = pd.Series(np.random.randint(1, 10, 10**6))
>
> %timeit s.value_counts()
> # 6.74 ms ± 78.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> %timeit s.groupby(s).size()
> # 11.7 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> I think it would be useful and more efficient if there was a DataFrame.value_counts method, which could take a required columns argument indicating the combinations over which we want to count. This seems like a common enough operation that it might be worthwhile to add this functionality, but wanted to see what other opinions there were on this. I know pandas already has a huge number of methods and it's good to resist adding more, but I would see this more as "filling out" rather than "adding to" the API.
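To make the two syntaxes discussed above concrete, here is a minimal sketch of the multi-column workaround wrapped in a helper. The function name `value_counts_by` is hypothetical and is not part of the pandas API; it simply applies `groupby(columns).size()` and then sorts by frequency in descending order, so that the result has the same shape as `Series.value_counts()`.

```python
import numpy as np
import pandas as pd


def value_counts_by(df: pd.DataFrame, columns: list) -> pd.Series:
    """Count rows per unique combination of `columns` (hypothetical helper).

    Mirrors Series.value_counts(): counts are sorted in descending order and
    indexed by the distinct values (a MultiIndex when several columns are given).
    """
    # groupby(...).size() counts the rows in each group; rows with NaN in a
    # grouping column are excluded, as Series.value_counts() does by default.
    counts = df.groupby(columns).size()
    # Series.value_counts() orders results by frequency, highest first.
    return counts.sort_values(ascending=False)


df = pd.DataFrame({"a": [1, 1, 2, 2, 2], "b": ["x", "x", "x", "y", "y"]})

# Single column: the existing Series method.
print(df["a"].value_counts())

# Several columns: the groupby-based workaround behind one call.
print(value_counts_by(df, ["a", "b"]))
```

Ties in the sorted output (here the combinations (1, "x") and (2, "y"), each counted twice) may come back in either order. For reference, later pandas releases (1.1.0 onward) ship a `DataFrame.value_counts` method that takes a `subset` argument rather than a required `columns` argument; check your installed version before relying on it.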