[Numpy-discussion] help in improving data analysis code
Francesc Altet
faltet at carabos.com
Fri Nov 25 07:28:04 EST 2005
On Friday 25 November 2005 15:24, gf wrote:
>
> from numarray import add, array, asarray, absolute, argsort, floor, take, size
>
> def mean(m,axis=0):
>     m = asarray(m)
>     return add.reduce(m,axis)/float(m.shape[axis])
>
> def eliminate_outliers(dat,frac):
>     num_to_eliminate = int(floor(size(dat,0)*frac))
>     for i in range(num_to_eliminate):
>         ind = argsort(absolute(dat-mean(dat)),0)
>         sdat = take(dat,ind,0)[:,0]
>         dat = sdat[:-1]
>     return dat
>
> #--------------------------------------------------------------------
>
> if __name__ == "__main__":
>     from MLab import rand
>     sz = 100
>     nn = rand(sz,1)
>     nn[:10] = 20*rand(10,1)
>     nn[sz-10:] = -20*rand(10,1)
>     print eliminate_outliers(nn,0.10)
For sz=100, the following line of code is about 10x faster on my machine
(and the gap grows as sz gets bigger):

    print nn[argsort(abs(nn-nn.mean()),0)][:-int(sz*0.10),0]

I haven't checked it very carefully, so you should double-check it.
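
The single-pass idea above can be sketched in present-day NumPy, which is used here only as an assumed stand-in for numarray. The function names below are hypothetical and do not come from the original post. The sketch also includes a loop-based version for comparison, because the two approaches are not strictly equivalent: the loop recomputes the mean after every removal, while the single-pass version computes it once, so they can keep different points.

```python
import numpy as np


def eliminate_outliers_single_pass(dat, frac):
    """Drop the int(n*frac) values farthest from the mean (mean computed once)."""
    dat = np.asarray(dat, dtype=float).ravel()
    num_to_eliminate = int(np.floor(dat.size * frac))
    # Indices ordered by increasing distance from the mean.
    order = np.argsort(np.abs(dat - dat.mean()))
    # Slice by an explicit stop index so that num_to_eliminate == 0 keeps
    # everything (a bare [:-0] slice would return an empty array).
    return dat[order[:dat.size - num_to_eliminate]]


def eliminate_outliers_iterative(dat, frac):
    """Loop-based version: remove one point at a time, recomputing the mean."""
    dat = np.asarray(dat, dtype=float).ravel()
    num_to_eliminate = int(np.floor(dat.size * frac))
    for _ in range(num_to_eliminate):
        order = np.argsort(np.abs(dat - dat.mean()))
        dat = dat[order[:-1]]  # drop the single farthest point
    return dat
```

For example, with the values 0, 1, 2, 3, 10, 20 and frac=0.34 (two points removed), the single-pass version drops 20 and 0, while the iterative version drops 20 and then 10. Which behaviour is wanted depends on the analysis, so the speed-up is only a fair trade if the single-pass result is acceptable.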
BTW, you will need to use the numarray MLab interface:
from numarray.mlab import rand
Cheers,
--
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"