[Python-ideas] Additions to collections.Counter and a Counter derived class

Brendan Barnwell brenbarn at brenbarn.net
Wed Mar 15 14:14:39 EDT 2017


On 2017-03-15 11:06, David Mertz wrote:
>     Just because a data point is uncommon doesn't mean it is an outlier.
>
>
> That's kinda *by definition* what an outlier is in categorical data!

	Not really.  Or rather, it depends what you mean by "uncommon".  But 
this thread is about adding "least_common", and just because a data 
point is among the least frequent doesn't mean it's an outlier.  You 
explained why yourself:

> I realize from my example, however, that I'm probably more interested in the actual uncommonality, not the specific `.least_common()`.

	Exactly.  If you have one data point that occurs once, another that 
occurs twice, another that occurs three times, and so on up to 10, then 
the "least common" one (or two or three) isn't an outlier.  To be an 
outlier, it would have to be "much less common than the rest".  That is, 
what matters is not the frequency rank but the magnitude of the 
separation in frequency between the outliers and the non-outliers.  But 
that's a much subtler notion than just "least common".
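
	To make the distinction concrete, here is a rough sketch.  Neither 
function exists on collections.Counter; `least_common` and 
`rare_outliers` are hypothetical helpers, and the "count jumps by a 
factor of `min_ratio`" rule is just one crude assumption about what 
"much less common than the rest" could mean, not a proposed definition.

```python
from collections import Counter


def least_common(counter, n=1):
    """Hypothetical helper: the n least frequent (element, count) pairs, rarest first."""
    return sorted(counter.items(), key=lambda kv: kv[1])[:n]


def rare_outliers(counter, min_ratio=5.0):
    """Hypothetical heuristic: elements separated from the rest by a large frequency gap.

    Scans the rarer half of the elements, rarest first, and looks for the
    first place where the count jumps by at least `min_ratio`. Everything
    below that jump is reported. Both the ratio test and the default
    threshold are arbitrary assumptions made for illustration.
    """
    ranked = sorted(counter.items(), key=lambda kv: kv[1])  # rarest first
    for i in range(1, len(ranked) // 2 + 1):
        lower, upper = ranked[i - 1][1], ranked[i][1]
        if lower > 0 and upper / lower >= min_ratio:
            return [item for item, _ in ranked[:i]]
    return []


# Counts 1, 2, 3, ..., 10: there is always a "least common" element,
# but no element is far removed from its neighbours.
graded = Counter({f"item{i}": i for i in range(1, 11)})

# Three elements with similar counts and one that is far rarer.
skewed = Counter({"a": 100, "b": 95, "c": 110, "d": 1})

print(least_common(graded, 3))  # [('item1', 1), ('item2', 2), ('item3', 3)]
print(rare_outliers(graded))    # []
print(rare_outliers(skewed))    # ['d']
```

	Rank alone always returns something, even for the evenly graded 
data; the gap-based check returns nothing there and picks out "d" only 
when it really is separated from the rest.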

-- 
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."
    --author unknown

