[Pandas-dev] Series.value_counts and length of series

Vicki Brown vlb at cfcl.com
Mon Sep 9 20:30:53 EDT 2019


While trying to create a smaller reproducible test case, I discovered that whether Length is included in the output appears to depend on the actual length of the Series that value_counts returns.

Specifically, a Series of length 43 reports only Name and dtype; a Series of length 66 also reports its Length.

>> a piece of code that is self-contained and we can run to reproduce the issue

```
import pandas as pd

a_list = [13.1, 13.1, 13.0, 13.0, 14.1, 14.0, 14.0, 14.1, 13.7, 13.7, 13.7, 13.5, 14.4, 14.4, 14.3, 14.3, 14.2, 14.3, 14.3, 14.1, 14.1, 13.9, 14.0, 14.0, 14.0, 14.0, 14.5, 14.4, 14.3, 14.2, 14.0, 14.4, 14.3, 14.0, 13.7, 14.3, 14.3, 14.1, 14.0, 13.8, 14.1, 14.0, 14.0, 13.9, 13.4, 14.3, 14.7, 14.0, 13.6, 14.4, 14.9, 14.2, 13.6, 13.2, 13.0, 14.3, 13.9, 13.5, 13.0, 14.2, 16.2, 15.8, 14.0, 13.6, 13.2, 15.2, 14.6, 14.3, 14.2, 14.2, 15.1, 14.6, 14.3, 14.0, 13.7, 14.3, 14.2, 14.3, 14.3, 14.3, 14.3, 14.1, 13.7, 13.6, 13.6, 14.0, 14.0, 13.6, 13.2, 13.1, 14.9, 14.1, 13.8, 13.9, 13.8, 14.6, 14.4, 14.3, 14.2]

b_list = [0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.3, 0.6, 0.6, 0.8, 3.4, 7.4, 1.6, 3.3, 6.0, 10.2, 0.7, 0.7, 1.2, 5.4, 9.3, 4.1, 4.9, 5.2, 7.1, 13.7, 2.7, 3.0, 5.9, 12.3, 1.8, 4.0, 3.5, 12.2, 16.3, 18.1, 7.8, 10.8, 14.8, 19.5, 7.8, 4.2, 4.4, 10.0, 12.2, 17.7, 5.4, 6.1, 7.1, 7.4, 13.0, 6.3, 6.7, 8.4, 11.0, 1.3, 8.5, 8.8, 8.9, 10.7, 12.5, 10.5, 11.5, 15.4, 16.9, 17.8, 17.3, 17.3, 17.7, 20.0, 21.1, 10.4, 12.9, 15.3, 16.4, 16.4, 12.0, 12.8, 14.6, 16.0]

a = pd.Series(a_list)
# print() so the footer is also visible when run as a script, not only in a notebook
print(a.value_counts())

b = pd.Series(b_list)
print(b.value_counts())
```
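
A smaller sketch of the same comparison, in case the long lists are unwieldy. It pins the display options explicitly (I am assuming the defaults are display.max_rows = 60 and display.show_dimensions = 'truncate') so the result does not depend on local configuration. The helper counts_repr is hypothetical, not part of pandas. If the footer only shows Length when the output is truncated, that would account for the difference between 43 and 66 rows:

```python
import pandas as pd


def counts_repr(n_distinct):
    # Hypothetical helper: value_counts() over n_distinct different values
    # returns a Series with n_distinct rows; return its text representation.
    values = pd.Series(range(n_distinct))
    return repr(values.value_counts())


# Pin the display options so the check is independent of local settings.
with pd.option_context("display.max_rows", 60,
                       "display.show_dimensions", "truncate"):
    short_repr = counts_repr(43)  # fits within max_rows, shown in full
    long_repr = counts_repr(66)   # exceeds max_rows, shown truncated

# Report whether each footer mentions the length.
print("43 rows, footer has Length:", "Length:" in short_repr)
print("66 rows, footer has Length:", "Length:" in long_repr)
```

If the two checks differ, raising display.max_rows above 66 should make the Length disappear for the longer result as well.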

> On Sep 8, 2019, at 13:29, Joris Van den Bossche <jorisvandenbossche at gmail.com> wrote:
> 
> Hi,
> 
> That should not happen (the length is normally part of the Series representation, independent of the data type or the content or length of the Series). Can you provide a reproducible example? (a piece of code that is self-contained and we can run to reproduce the issue)
> 
> Best,
> Joris
> 
> On Sun, 8 Sep 2019 at 22:24, Vicki Brown <vlb at cfcl.com> wrote:
> Hi -
> 
> I have a dataset:
> 
>         <class 'pandas.core.frame.DataFrame'>
>         RangeIndex: 237061 entries, 0 to 237060
>         Data columns (total 23 columns):
>         Date                                 237061 non-null datetime64[ns]
>         Station Number                       237061 non-null object
>         Depth                                237061 non-null float64
>         ...
> 
> For three of the columns, I have calculated value_counts. 
> For two of those, the result includes the length of the set; for the third, it does not.
> 
> Why not?
> 
>         In [1]: dt = wq_df['Date']
>         dt_counts = dt.value_counts()
> 
>         In [2]: st = wq_df['Station Number']
>         st_counts = st.value_counts()
> 
>         In [3]: dp = wq_df['Depth']
>         dp_counts = dp.value_counts()
> 
>         In [4]: dt_counts
> 
>         Out[4]: 1969-04-10     21
>                 ...
>                 Name: Date, Length: 1172, dtype: int64
> 
>         In [5]: st_counts
>         Out[5]: 18      16622
>                 ...
>                 Name: Station Number, dtype: int64
> 
>         In [6]: dp_counts
>         Out[6]: 0.5      1962
>                 ...
>                 Name: Depth, Length: 99, dtype: int64
> 
> 
> 
> -- Vicki
> 
> Vicki Brown
> cfcl.com/vlb
> 
> 
> 

-- Vicki

Vicki Brown
cfcl.com/vlb


