[SciPy-user] neighbourhood of randomly scattered points

Thu Aug 30 12:28:25 EDT 2007

fred wrote:
> Robert Kern wrote:
>> Show us your code, if you think there is a problem in it.
> I really think there is no problem in my code,
> it has been already validated.
> 
> My 2D data array is a n=501x501 array.
> If I get n points from it, the neighbourhood is uniform,
> I think this is a problem for nobody ;-)
> 
> In fact, I don't get n points, but far less, say 15000.
> 
> If these points were uniformly distributed,
> I think I could not see these structures: these structures are not an
> artifact.

You *will* see the things that you call structures if the sampling is correct.
The sample will only be really uniform in the limit as you get near total
sampling. As it is, you are only drawing a sample of about 6% of the total. You
*will* get fluctuations.
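
As a rough check on the size of those fluctuations, here is a minimal sketch.
It assumes a hypothetical square neighbourhood of 31x31 grid cells (the actual
neighbourhood size is not given in this thread) and treats the number of sampled
points falling in it as a hypergeometric count:

```python
import math

nside = 501
nsample = 15000
ncells = nside * nside            # 251001 grid points in total
fraction = nsample / ncells       # sampling fraction, roughly 6%

# Hypothetical neighbourhood: a 31x31 block of grid cells. This is an
# assumption; the real neighbourhood size is not stated in the thread.
window = 31 * 31

# Sampling without replacement: the count inside the window is hypergeometric.
mean = window * fraction
var = window * fraction * (1 - fraction) * (ncells - window) / (ncells - 1)
std = math.sqrt(var)

print(f"sampling fraction: {fraction:.3f}")
print(f"expected neighbours: {mean:.1f} +/- {std:.1f}")
```

With a spread of that order, and thousands of neighbourhoods to look at, counts
several standard deviations away from the mean are not surprising.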

>>  Looking at images is a
>> very poor way to judge randomness;
> Yes, but one can see structures.
> The question is: why can I see these structures?

There are no structures. Just random fluctuations that are accentuated by your
neighborhood scheme and your visual system looking for patterns.

> Do they have any meaning ?

Plotting the points in the neighborhood essentially takes the raw data and
convolves it with a kernel. That broadens the effect of each point, so you see
more low-frequency "structure" than you otherwise would. Other than that, no
meaning.
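
The smoothing effect can be sketched as follows. This is only an illustration
under assumptions: the box-shaped 31x31 window and the helper name `box_counts`
are hypothetical, not the neighbourhood scheme actually used. It compares the
correlation between adjacent cells in the raw sample with the correlation
between adjacent neighbour counts:

```python
import numpy as np

def box_counts(mask, half_width):
    """Count sampled cells in the square window centred on each cell."""
    k = 2 * half_width + 1
    padded = np.pad(mask.astype(np.int64), half_width, mode="constant")
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1), dtype=np.int64)
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    return sat[k:, k:] - sat[:-k, k:] - sat[k:, :-k] + sat[:-k, :-k]

rng = np.random.default_rng(0)
nside, nsample = 501, 15000

# Random sample without replacement, stored as a boolean image.
mask = np.zeros(nside * nside, dtype=bool)
mask[rng.choice(nside * nside, size=nsample, replace=False)] = True
mask = mask.reshape(nside, nside)

# Hypothetical neighbourhood: 31x31 box (half_width=15 is an assumption).
counts = box_counts(mask, half_width=15)

def adjacent_corr(img):
    """Correlation between each cell and its right-hand neighbour."""
    img = img.astype(float)
    return np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]

print("raw sample, adjacent-cell correlation:", adjacent_corr(mask))
print("neighbour counts, adjacent-cell correlation:", adjacent_corr(counts))
```

Adjacent windows share almost all of their cells, so the count image varies
slowly even though the underlying sample has no spatial correlation.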

> I was expecting to see no structure at all, in fact.
> But maybe I'm wrong.

Your intuition is wrong. Please accept that fact.

> I understand the trick like this : if I get 90 neighbours
> in a neighbourhood, the density of points is much higher
> than in a neighbourhood where I get only 40 neighbours per point.
> So, for me, it is not uniformly distributed.

In fact, that is *exactly* what you should see if the sampling were random.
Fluctuations of that size are expected for the parameters you've given. Only if
the sampling were non-random, like the low-discrepancy sequences, would you see
a tighter spread.
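
To illustrate the contrast, here is a small sketch comparing pseudo-random
points with a 2-D Halton sequence (one common low-discrepancy sequence) on the
unit square. The continuous unit square, the 16x16 cells, and the helper names
are assumptions made for this example; they are not the grid or neighbourhood
from this thread.

```python
import numpy as np

def van_der_corput(n, base):
    """First n terms of the van der Corput sequence in the given base."""
    seq = np.empty(n)
    for i in range(n):
        k, f, x = i + 1, 1.0, 0.0
        while k > 0:
            f /= base
            x += f * (k % base)
            k //= base
        seq[i] = x
    return seq

nsample = 15000
nbins = 16  # hypothetical: 16x16 cells, about 59 points per cell on average

rng = np.random.default_rng(0)
random_xy = rng.random((nsample, 2))

# 2-D Halton sequence: van der Corput sequences in bases 2 and 3.
halton_xy = np.column_stack([van_der_corput(nsample, 2),
                             van_der_corput(nsample, 3)])

def cell_counts(xy):
    """Number of points falling in each cell of an nbins x nbins grid."""
    counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=nbins,
                                  range=[[0, 1], [0, 1]])
    return counts

random_std = cell_counts(random_xy).std()
halton_std = cell_counts(halton_xy).std()
print("std of per-cell counts, pseudo-random:", random_std)
print("std of per-cell counts, Halton:", halton_std)
```

The per-cell counts of the Halton points should cluster much more tightly
around the mean than those of the pseudo-random points.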

If you want to double-check the sampling, you can try a slower, but
easier-to-verify method of sampling without replacement than the shuffle method:

import numpy as np

def sample_noreplace(nurn, nsample):
    # Draw indices one at a time from range(nurn), rejecting repeats.
    sampled = []
    while len(sampled) != nsample:
        i = np.random.randint(nurn)
        if i not in sampled:
            sampled.append(i)
    return np.array(sampled, dtype=int)

nside = 501
nsample = 15000
# xy is your array of the (x, y) coordinates of all nside*nside grid points.
assert xy.shape == (nside*nside, 2)
xy_sampled = xy[sample_noreplace(nside*nside, nsample)]

You will see similar results.
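
For comparison, a shuffle-based sampler can be sketched like this. It is an
assumption about what such a method typically looks like, not the code under
discussion, and `sample_shuffle` is a hypothetical name:

```python
import numpy as np

def sample_shuffle(nurn, nsample, rng):
    """Shuffle all indices in range(nurn) and keep the first nsample."""
    return rng.permutation(nurn)[:nsample]

rng = np.random.default_rng(0)
nside = 501
nsample = 15000
idx = sample_shuffle(nside * nside, nsample, rng)

# Sanity checks that hold for any correct sampling without replacement.
assert len(np.unique(idx)) == nsample
assert idx.min() >= 0 and idx.max() < nside * nside
```

Both approaches draw every index with equal probability and never repeat one,
so the neighbour-count fluctuations should look statistically alike.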

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco