Numpy outlier removal

Sun Jan 6 17:33:54 EST 2013

On 6/01/13 20:44:08, Joseph L. Casale wrote:
> I have a dataset that consists of a dict with text descriptions and values that are integers. If
> required, I collect the values into a list and create a numpy array running it through a simple
> routine: data[abs(data - mean(data)) < m * std(data)] where m is the number of std deviations
> to include.
> 
> 
> The problem is I lose track of which were removed, so the original display of the dataset is
> misleading when the processed average is returned, as it includes the removed key/values.
> 
> 
> Anyone know how I can maintain the relationship and, when I exclude a value, remove it from
> the dict?

Assuming your values (data) and your descriptions are two dicts keyed by a common set of keys:

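from numpy import mean, std

values = list(data.values())
mu, sigma = mean(values), std(values)  # compute once, before removing anything
for key in list(descriptions):  # iterate over a copy so deleting is safe
    if abs(data[key] - mu) >= m * sigma:
        del data[key]
        del descriptions[key]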
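
If you would rather keep the boolean-mask style of your original routine, here is a minimal sketch that builds the mask once and uses it to rebuild the dict, so keys and values stay paired. The function name drop_outliers and the sample data are made up for illustration; I am assuming a single dict that maps each description to its integer value.

```python
import numpy as np

def drop_outliers(data, m=2.0):
    """Return a new dict without the entries whose value lies m or more
    standard deviations away from the mean of all values."""
    keys = list(data)
    values = np.array([data[k] for k in keys], dtype=float)
    # Same test as the original routine, kept as a mask instead of
    # being applied to the array directly.
    keep = np.abs(values - values.mean()) < m * values.std()
    # Rebuild the dict from the keys whose mask entry is True.
    return {k: data[k] for k, kept in zip(keys, keep) if kept}

# Hypothetical sample data, not taken from the original dataset.
sample = {"a": 10, "b": 12, "c": 11, "d": 9, "e": 100}
cleaned = drop_outliers(sample, m=1.5)
```

This returns a new dict and leaves the original untouched, so you can display either one and the average will match whatever you show. With the sample above and m=1.5, the "e" entry is the only one dropped.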

Hope this helps,

-- HansM