[SciPy-User] multivariate empirical distribution function, avoid double loop ?

Wed Aug 24 21:23:09 EDT 2011

On Wed, Aug 24, 2011 at 7:25 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Aug 24, 2011 at 09:23,  <josef.pktd at gmail.com> wrote:
>> Does anyone know whether there is an algorithm that avoids the double
>> loop to get a multivariate empirical distribution function?
>>
>> for point in data:
>>     count how many points in data are smaller or equal to point
>>
>> with 1d data it's just argsort(argsort(data))
>>
>> double loop version with some test cases is attached.
>>
>> I didn't see a way that sorting would help.
>
> If you can bear to make a few (nobs, nobs) bool arrays, you can do
> just a kvars-sized loop in Python:
>
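> import numpy as np
>
> # data has shape (nobs, kvars); start with every pair (i, j) marked True
> dominates = np.ones((len(data), len(data)), dtype=bool)
> for x in data.T:
>     # keep only pairs where point i is strictly greater than point j in this column
>     dominates &= x[:, np.newaxis] > x
> sorta_ranks = dominates.sum(axis=1)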

Thanks, that is quite a bit better than my original double loop:
14 times faster for data of shape (5000, 2), 12 times faster for
(10000, 3), and still 2.5 times faster for (5000, 20).
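
For reference, here is a minimal sketch of both approaches side by side. It is not the attached script: the function names ecdf_counts_loop and ecdf_counts_vectorized are made up for illustration, and the comparison uses >= (rather than the strict > above) on the assumption that "smaller or equal" is what is wanted, so every point counts itself.

```python
import numpy as np

def ecdf_counts_loop(data):
    """Reference double loop: for each point, count the points that are
    <= it in every coordinate (the point itself included)."""
    nobs = data.shape[0]
    counts = np.empty(nobs, dtype=int)
    for i in range(nobs):
        c = 0
        for j in range(nobs):
            if np.all(data[j] <= data[i]):
                c += 1
        counts[i] = c
    return counts

def ecdf_counts_vectorized(data):
    """Python loop over the kvars columns only, at the cost of
    (nobs, nobs) boolean arrays."""
    nobs = data.shape[0]
    below = np.ones((nobs, nobs), dtype=bool)
    for x in data.T:
        # below[i, j] stays True only while point j <= point i
        # in every column seen so far
        below &= x[:, np.newaxis] >= x
    return below.sum(axis=1)

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))   # (nobs, kvars)
counts = ecdf_counts_vectorized(data)
ecdf = counts / len(data)          # empirical distribution function at each point
```

With a single column and no ties, the counts should equal argsort(argsort(x)) + 1, since the counts here include the point itself while argsort ranks start at zero.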

Josef

>
> --
> Robert Kern