[Numpy-discussion] Converting ranks to a Gaussian

Mon Jun 9 19:45:08 EDT 2008

On Mon, Jun 9, 2008 at 18:34, Keith Goodman <kwgoodman at gmail.com> wrote:
> Does anyone have a function that converts ranks into a Gaussian?
>
> I have an array x:
>
>>> import numpy as np
>>> x = np.random.rand(5)
>
> I rank it:
>
>>> x_ranked = x.argsort().argsort()
>>> x_ranked
>   array([3, 1, 4, 2, 0])

There are subtleties in computing ranks when ties are involved. Take a
look at the implementation of scipy.stats.rankdata().
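
To make the tie issue concrete, here is a minimal sketch comparing the
double-argsort trick with scipy.stats.rankdata(). The sample array is made
up for illustration, and kind="stable" is my addition so that the order of
tied items is deterministic:

```python
import numpy as np
from scipy.stats import rankdata

# Made-up data with one tie (positions 0 and 2).
x = np.array([0.3, 0.1, 0.3, 0.7])

# Double argsort gives 0-based ordinal ranks. With a stable sort, tied
# values are ranked by position, so they receive different ranks.
ordinal = x.argsort(kind="stable").argsort()

# rankdata's default method gives tied values the average of the ranks
# they span. Note that these ranks are 1-based.
averaged = rankdata(x)

print(ordinal)
print(averaged)
```

The two tied values get distinct ranks from the double argsort but share
one averaged rank from rankdata(), which is usually what you want before a
transformation that should treat equal inputs equally.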

> I would like to convert the ranks to a Gaussian without using scipy.

No dice. You are going to have to use scipy.special.ndtri (the inverse
of the standard normal CDF) somewhere. A basic transformation (off the
top of my head; I have no idea if this is statistically meaningful):

  scipy.special.ndtri((ranks + 1.0) / (len(ranks) + 1.0))

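Barring tied first or last items, this should give equal weight to
each of the tails outside of the range of your data: with n items, the
smallest rank maps to probability 1/(n+1) and the largest to n/(n+1),
leaving 1/(n+1) in each tail.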
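
As a rough sketch, the transformation can be wrapped in a small function.
The name ranks_to_gaussian is hypothetical (it is not part of numpy or
scipy), and it assumes 0-based ranks without ties, like the ones produced
by argsort().argsort():

```python
import numpy as np
from scipy.special import ndtri


def ranks_to_gaussian(ranks):
    """Map 0-based ranks to standard normal scores (hypothetical helper)."""
    ranks = np.asarray(ranks, dtype=float)
    n = len(ranks)
    # (rank + 1) / (n + 1) lies strictly between 0 and 1 for ranks in
    # 0..n-1, so ndtri never returns -inf or +inf.
    return ndtri((ranks + 1.0) / (n + 1.0))


# The ranks from the example earlier in the thread.
x_ranked = np.array([3, 1, 4, 2, 0])
z = ranks_to_gaussian(x_ranked)
print(z)
```

The scores are symmetric about zero: with five items, the middle rank maps
to probability 0.5 and hence to 0, and the lowest and highest ranks map to
values of equal magnitude and opposite sign.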

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco