[SciPy-User] visual inspection of kernel density estimators in 2d

Robert Kern robert.kern at gmail.com
Thu May 26 21:59:20 EDT 2011


On Thu, May 26, 2011 at 19:22,  <josef.pktd at gmail.com> wrote:
> On Thu, May 26, 2011 at 4:31 PM, Robert Kern <robert.kern at gmail.com> wrote:
>> On Thu, May 26, 2011 at 15:17,  <josef.pktd at gmail.com> wrote:
>>> I'm trying to find a visual way to see whether a kernel density
>>> estimator "looks" good in 2d. For univariate it is easy to compare
>>> histogram and the estimated density.
>>>
>>> Does anyone know what graphs would give a good visual inspection?
>>>
>>> My attempt with contours, which should show some oversmoothing if the
>>> default gaussian_kde is used on mixture distributions:
>>>
>>> http://picasaweb.google.com/josef.pktd/Joepy#5611119163809333922
>>
>> With a bit of tweaking, that would work fine. A few suggestions:
>>
>> 1. Use smaller dots for the data, and maybe less color.
>> 2. Use fewer contour lines, without labels. Maybe just make the
>> contours that would have labels be a bit thicker so you can make a
>> good apples-to-apples comparison between the two sets of contours.
>> 3. Make the true contour lines gray and in the background.
>> 4. Make the estimated contour lines black and in the foreground. I.e.
>> draw the dots, then the true contours, then the estimated contours.
>
> Thanks Robert,
>
> I still have to figure out how to do many of these things with
> matplotlib, for example I have to find the manual to change the line
> width.

There may not actually be one.  :-)

You may have to overplot with a second set of contours that only pick
out one or two contours with a thicker line.
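
A rough sketch of that overplotting, in the drawing order suggested
above (dots, then true contours in gray, then estimated contours in
black). The two-component mixture, the grid, the contour levels and all
variable names are made up for illustration; it also assumes a
matplotlib whose contour() accepts the linewidths keyword.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headless
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical data: an equal-weight mixture of two 2d Gaussians.
rng = np.random.RandomState(0)
n, s = 500, 0.5
pick = rng.rand(n) < 0.5
data = np.where(pick[:, None],
                rng.normal([-2.0, 0.0], s, size=(n, 2)),
                rng.normal([2.0, 0.0], s, size=(n, 2)))

def true_pdf(x, y):
    """Density of the assumed mixture, for comparison with the estimate."""
    def g(mx, my):
        r2 = (x - mx) ** 2 + (y - my) ** 2
        return np.exp(-r2 / (2 * s ** 2)) / (2 * np.pi * s ** 2)
    return 0.5 * g(-2.0, 0.0) + 0.5 * g(2.0, 0.0)

# Evaluate both densities on a common grid.
x = np.linspace(-4.0, 4.0, 60)
y = np.linspace(-2.5, 2.5, 60)
xx, yy = np.meshgrid(x, y)
z_true = true_pdf(xx, yy)
kde = stats.gaussian_kde(data.T)  # default bandwidth
z_est = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# Same levels for both sets, so the comparison is apples-to-apples.
levels = np.linspace(0.02, 0.30, 8)

fig, ax = plt.subplots()
ax.plot(data[:, 0], data[:, 1], 'k,')                      # 1. small dots
ax.contour(xx, yy, z_true, levels=levels,
           colors='gray', linewidths=1.0)                  # 2. true, gray
ax.contour(xx, yy, z_est, levels=levels,
           colors='k', linewidths=1.0)                     # 3. estimate, black
# Overplot a few of the estimated contours with a thicker line.
thick = ax.contour(xx, yy, z_est, levels=levels[::3],
                   colors='k', linewidths=2.5)
# Use fig.savefig(...) or plt.show() to look at the result.
```

If linewidths works in your version, passing a sequence of widths to a
single contour() call may do the same job without the second call.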

> here is some improvement in this direction
> http://picasaweb.google.com/josef.pktd/Joepy#5611180514445456594
>
> here is a first attempt at 3d
> http://picasaweb.google.com/josef.pktd/Joepy#5611133359567574274
>
> This actually shows the oversmoothing quite clearly.
>
>>
>> You might also try a colormapped image plot of the difference between
>> the two densities. Plot the data as small points (i.e. with 'k,') over
>> the residual image.
>
> just to see how it looks I tried the color overlay of the difference
> on the same graph. It's a bit weird but quite informative.
>
> http://picasaweb.google.com/josef.pktd/Joepy#5611180522655961714
>
> (Given the nice colors in the last graph, I will try for a Miro next. :)

I have a long rant about such colormaps (I am red-green colorblind),
but I'll save it. It suffices to say that you would be better served
with a diverging colormap, one that goes from a deep color of one hue
at the most negative to white at 0 and then to a deep color of another
hue, e.g. 'RdBu' is a nice one. Just be sure to center it correctly
(using vmin and vmax arguments, IIRC). I recommend a full, continuous
colormapped image instead of a small number of colored contours.
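
Something along these lines, as a sketch only. The mixture, the grid
and the variable names are invented for the example; the centering is
done by making vmin and vmax symmetric about zero, so that white lands
exactly on a zero difference.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headless
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical data: an equal-weight mixture of two 2d Gaussians.
rng = np.random.RandomState(0)
n, s = 500, 0.5
pick = rng.rand(n) < 0.5
data = np.where(pick[:, None],
                rng.normal([-2.0, 0.0], s, size=(n, 2)),
                rng.normal([2.0, 0.0], s, size=(n, 2)))

x = np.linspace(-4.0, 4.0, 60)
y = np.linspace(-2.5, 2.5, 60)
xx, yy = np.meshgrid(x, y)

def bump(mx, my):
    """One Gaussian component of the assumed true density."""
    r2 = (xx - mx) ** 2 + (yy - my) ** 2
    return np.exp(-r2 / (2 * s ** 2)) / (2 * np.pi * s ** 2)

z_true = 0.5 * bump(-2.0, 0.0) + 0.5 * bump(2.0, 0.0)
kde = stats.gaussian_kde(data.T)  # default bandwidth
z_est = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# Residual image: estimate minus truth.
diff = z_est - z_true
vmax = np.abs(diff).max()  # symmetric limits put zero at the middle (white)

fig, ax = plt.subplots()
im = ax.imshow(diff, origin='lower', aspect='auto',
               extent=[x.min(), x.max(), y.min(), y.max()],
               cmap='RdBu', vmin=-vmax, vmax=vmax)
ax.plot(data[:, 0], data[:, 1], 'k,')  # data as small points on top
fig.colorbar(im, ax=ax)
# Use fig.savefig(...) or plt.show() to look at the result.
```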

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


