[Pandas-dev] DataFrame.value_counts

William Ayd william.ayd at icloud.com
Sun Sep 15 16:45:35 EDT 2019


Hi Daniel,

Thanks for the feedback. There is actually already a PR to implement this, which I think is getting close:

https://github.com/pandas-dev/pandas/pull/27350

I would certainly welcome any feedback you can offer there, whether from trying it out on your end or from taking part in the review process.

- Will

> On Sep 15, 2019, at 12:17 PM, Daniel Saxton via Pandas-dev <pandas-dev at python.org> wrote:
> 
> Currently in pandas, to count the values in a single column of a DataFrame we use df["a"].value_counts(), but to count combinations of more than one column we (as far as I know) have to switch syntax and use df.groupby(["a", "b"]).size().  This is a little awkward code-wise and likely carries some unnecessary overhead, since we don't actually need to prepare a groupby object that can handle an arbitrary calculation on the subframes.  There's some evidence of this overhead in the Series case:
> 
> import numpy as np
> import pandas as pd
> 
> s = pd.Series(np.random.randint(1, 10, 10**6))
> 
> %timeit s.value_counts()
> # 6.74 ms ± 78.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
> 
> %timeit s.groupby(s).size()
> # 11.7 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
> 
> I think it would be useful and more efficient if there were a DataFrame.value_counts method, which could take a required columns argument indicating the combinations over which we want to count.  This seems like a common enough operation that it might be worthwhile to add this functionality, but I wanted to see what other opinions there were on this.  I know pandas already has a huge number of methods and it's good to resist adding more, but I would see this more as "filling out" rather than "adding to" the API.
> _______________________________________________
> Pandas-dev mailing list
> Pandas-dev at python.org
> https://mail.python.org/mailman/listinfo/pandas-dev
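
For anyone following along, the multi-column counting described in the quoted message can be sketched roughly as below. This is only an illustration built on the existing groupby(...).size() workaround, not the implementation in the PR. The helper name count_combinations and its columns parameter are hypothetical, and the descending sort is an assumption made to mirror the default ordering of Series.value_counts().

```python
import pandas as pd


def count_combinations(df, columns):
    """Hypothetical helper: count rows per unique combination of `columns`."""
    # groupby(...).size() yields one count per observed combination of values.
    # Sorting in descending order is an assumption, chosen to resemble the
    # default ordering of Series.value_counts().
    return df.groupby(columns).size().sort_values(ascending=False)


# Small made-up frame purely for illustration.
df = pd.DataFrame({"a": [1, 1, 2, 2, 2], "b": ["x", "x", "x", "y", "y"]})

counts = count_combinations(df, ["a", "b"])
print(counts)
```

The result is a Series indexed by a MultiIndex of the chosen columns, so a single combination can be looked up with counts.loc[(1, "x")]. Note that groupby drops rows with missing values in the key columns by default, so the counts need not sum to len(df) when NaNs are present.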
