[SciPy-User] Question about gaussian_kde

Mon Apr 12 13:48:01 EDT 2010

On Mon, Apr 12, 2010 at 11:00, Jorge Scandaliaris
<jorgesmbox-ml at yahoo.es> wrote:
> Hi,
> I am using gaussian_kde mainly to visualize the distribution of some 2D
> measurements. It works ok but there's something I don't understand. When I
> evaluate the estimated pdf, the peaks have values larger than one. If I use
> integrate_box(), however, the results seem correct. How should I normalize the
> values obtained from evaluate? Dividing by the number of datapoints?

The values are correct, so no normalization is needed. Remember that
this is a probability *density*. The only requirements are that it is
non-negative everywhere and that its integral over the whole domain
equals 1. The value at a point is not itself a probability, so it can
exceed 1 wherever the distribution is narrow enough. For example:

In [1]: from scipy import stats

In [2]: stats.norm.pdf(0.0, scale=0.01)
Out[2]: 39.894228040143268
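
The same check can be made on a gaussian_kde directly. The following is
only a minimal sketch: the 2D sample, the seed, the box bounds, and the
grid are made up for illustration and are not taken from the original
question. It shows a peak density well above 1 while the integrated mass
stays close to 1, and that density values turn into probabilities only
after being multiplied by an area.

```python
import numpy as np
from scipy import stats

# Hypothetical 2D measurements (illustrative only): 500 points tightly
# clustered around the origin. gaussian_kde expects shape (n_dims, n_points).
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=0.05, size=(2, 500))

kde = stats.gaussian_kde(data)

# Density at the centre of the cluster. This is a density, not a
# probability, and for such a narrow cluster it is far larger than 1.
peak = kde.evaluate(np.array([[0.0], [0.0]]))[0]

# Probability mass inside a box that comfortably contains all the data.
mass = kde.integrate_box(np.array([-1.0, -1.0]), np.array([1.0, 1.0]))

# Rough cross-check on a regular grid: density times cell area summed
# over the grid approximates the same integral.
xs = np.linspace(-0.5, 0.5, 201)
X, Y = np.meshgrid(xs, xs)
density = kde(np.vstack([X.ravel(), Y.ravel()]))
cell_area = (xs[1] - xs[0]) ** 2
grid_mass = density.sum() * cell_area

print("peak density:", peak)
print("mass from integrate_box:", mass)
print("mass from grid sum:", grid_mass)
```

If per-cell probabilities are wanted for plotting, multiplying the
evaluated density by the cell area, as in the grid sum above, is one way
to get them. Dividing by the number of data points would not be correct.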

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco