Numpy outlier removal

Robert Kern robert.kern at gmail.com
Mon Jan 7 10:35:05 EST 2013


On 07/01/2013 15:20, Oscar Benjamin wrote:
> On 7 January 2013 05:11, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Mon, 07 Jan 2013 02:29:27 +0000, Oscar Benjamin wrote:
>>
>>> On 7 January 2013 01:46, Steven D'Aprano
>>> <steve+comp.lang.python at pearwood.info> wrote:
>>>> On Sun, 06 Jan 2013 19:44:08 +0000, Joseph L. Casale wrote:
>>>>
>>>> I'm not sure that this approach is statistically robust. No, let me be
>>>> even more assertive: I'm sure that this approach is NOT statistically
>>>> robust, and may be scientifically dubious.
>>>
>>> Whether or not this is "statistically robust" requires more explanation
>>> about the OP's intention.
>>
>> Not really. Statistical robustness is objectively defined, and the user's
>> intention doesn't come into it. The mean is not a robust measure of
>> central tendency, the median is, regardless of why you pick one or the
>> other.
>
> Okay, I see what you mean. I wasn't thinking of robustness as a
> technical term but now I see that you are correct.
>
> Perhaps what I should have said is that whether or not this matters
> depends on the problem at hand (hopefully this isn't an important
> medical trial) and the particular type of data that you have; assuming
> normality is fine in many cases even if the data is not "really"
> normal.

"Having outliers" literally means that assuming normality is not fine. If 
assuming normality were fine, then you wouldn't need to remove outliers.
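
As a rough illustration of the robustness point quoted above (the mean is
not a robust measure of central tendency, the median is), here is a minimal
sketch. The data values are made up for the example, and it uses the
standard library's statistics module rather than NumPy so that it runs
anywhere; numpy.mean and numpy.median should show the same effect.

```python
import statistics

# Hypothetical measurements: five plausible readings around 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]

# The same readings plus one gross error (an assumed outlier).
contaminated = clean + [1000.0]

mean_clean = statistics.mean(clean)
mean_bad = statistics.mean(contaminated)
median_clean = statistics.median(clean)
median_bad = statistics.median(contaminated)

# How far a single bad value drags each estimate.
mean_shift = abs(mean_bad - mean_clean)
median_shift = abs(median_bad - median_clean)

print("mean:   clean=%.3f  contaminated=%.3f" % (mean_clean, mean_bad))
print("median: clean=%.3f  contaminated=%.3f" % (median_clean, median_bad))
print("shift:  mean=%.3f  median=%.3f" % (mean_shift, median_shift))
```

One bad value out of six moves the mean far away from the bulk of the
data, while the median barely changes. A cutoff built on the mean (and,
by the same argument, the standard deviation) is therefore itself
distorted by the outliers it is meant to detect.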

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco



