[SciPy-User] multivariate empirical distribution function, avoid double loop ?

Alan G Isaac alan.isaac at gmail.com
Wed Aug 24 14:27:15 EDT 2011


On 8/24/2011 10:23 AM, josef.pktd at gmail.com wrote:
> Does anyone know whether there is an algorithm that avoids the double
> loop to get a multivariate empirical distribution function?

I think the double loop is pretty standard.
I'll attach something posted a while ago.
It seemed right at the time, but I did
not test it.  Once upon a time it was at
http://svn.scipy.org/svn/scipy/trunk/scipy/sandbox/dhuard/stats.py

Cheers,
Alan


import numpy as np


def empiricalcdf(data, method='Hazen'):
     """Return the empirical cdf.
     
     Methods available (here i goes from 1 to N)
         Hazen:       (i-0.5)/N
         Weibull:     i/(N+1)
         Chegodayev:  (i-.3)/(N+.4)
         Cunnane:     (i-.4)/(N+.2)
         Gringorten:  (i-.44)/(N+.12)
         California:  (i-1)/N

     :author: David Huard
     """
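     # Double argsort gives each observation's 0-based rank; adding 1. makes
     # the ranks floats running from 1 to N (ties get distinct ranks).
     i = np.argsort(np.argsort(data)) + 1.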
     nobs = len(data)
     method = method.lower()
     if method == 'hazen':
         cdf = (i-0.5)/nobs
     elif method == 'weibull':
         cdf = i/(nobs+1.)
     elif method == 'california':
         cdf = (i-1.)/nobs
     elif method == 'chegodayev':
         cdf = (i-.3)/(nobs+.4)
     elif method == 'cunnane':
         cdf = (i-.4)/(nobs+.2)
     elif method == 'gringorten':
         cdf = (i-.44)/(nobs+.12)
     else:
         raise ValueError('Unknown method. Choose among Weibull, Hazen, '
                          'Chegodayev, Cunnane, Gringorten and California.')
     return cdf
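
The function above ranks a 1-D sample, so it does not by itself answer the
multivariate question. What follows is a minimal sketch of one way to drop the
explicit Python double loop, assuming the plain definition F(x) = (number of
observations <= x in every coordinate) / N. The name multivariate_ecdf and its
points argument are hypothetical and not part of stats.py. The pairwise
comparison is still done, only inside NumPy broadcasting, so memory grows as
npoints * nobs * ndim.

```python
import numpy as np


def multivariate_ecdf(data, points=None):
    """Fraction of rows of `data` that are <= each row of `points`
    in every coordinate.

    Hypothetical helper for illustration, not from the attached stats.py.
    data   : array of shape (nobs, ndim)
    points : array of shape (npoints, ndim); defaults to `data` itself
    """
    data = np.asarray(data, dtype=float)
    points = data if points is None else np.asarray(points, dtype=float)
    # Compare every evaluation point with every observation at once:
    # shape (npoints, nobs, ndim), True where data[j, k] <= points[i, k].
    below = data[np.newaxis, :, :] <= points[:, np.newaxis, :]
    # An observation counts only if it is <= the point in all coordinates.
    dominated = below.all(axis=2)
    # Average over observations: F(x_i) = #{j : data[j] <= x_i} / nobs.
    return dominated.mean(axis=1)


x = np.array([[1.0, 1.0],
              [2.0, 3.0],
              [3.0, 2.0],
              [4.0, 4.0]])
ecdf_at_data = multivariate_ecdf(x)   # one value per row of x
```

For large samples the intermediate boolean array may not fit in memory; in
that case evaluating the points in chunks keeps the same logic with a single
outer loop.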



