[Numpy-discussion] Sampling from the multivariate normal
Robert Kern
robert.kern at gmail.com
Wed Nov 9 12:14:27 EST 2011
On Wed, Nov 9, 2011 at 16:40, Joshua Anthony Reyes
<joshua.reyes at gmail.com> wrote:
> Hi List,
>
> I'm new to Numpy and I'm a little confused about the behavior of
> numpy.random.multivariate_normal(). I'm not sure I'm passing the
> variances correctly. My goal is to sample from a bivariate normal, but
> the kooky behavior shows up when I sample from a univariate distribution.
>
> In short, the multivariate normal function doesn't seem to give me
> values in the appropriate ranges. Here's an example of what I mean.
>
> In[1]: from numpy.random import normal, multivariate_normal
>
> In [2]: normal(100, 100, 10)
> Out[2]:
> array([ 258.62344586, 70.16378815, 49.46826997, 49.58567724,
> 182.68652256, 226.67292034, 92.03801549, 18.2686146 ,
> 94.09104313, 80.35697507])
Here, you are asking for a Gaussian distribution with a mean of 100
and a stdev of 100 (or equivalently, a variance of 10000).
> The samples look about right to me. But then when I try to do the same
> using the multivariate_normal, the values it draws look too close to the
> mean.
>
> In [3]: multivariate_normal([100], [[100]], 10)
Here you are asking for a Gaussian with a mean of 100 and a variance
of 100 (or equivalently, a stdev of 10). Note the differences between
the two signatures:
np.random.normal(mean, stdev)
np.random.multivariate_normal(mean, covariance)
> Out[3]:
> array([[ 109.10083984],
> [ 97.43526716],
> [ 108.43923772],
> [ 97.87345947],
> [ 103.405562 ],
> [ 110.2963736 ],
> [ 103.96445585],
> [ 90.58168544],
> [ 91.20549222],
> [ 104.4051761 ]])
>
> These values all fall within 10 units of the mean.
>
> In [4]: multivariate_normal([100], [[1000]], 10)
> Out[4]:
> array([[ 62.04304611],
> [ 123.29364557],
> [ 83.53840083],
> [ 64.67679778],
> [ 127.82433157],
> [ 11.3305656 ],
> [ 95.4101701 ],
> [ 126.53213908],
> [ 104.68868736],
> [ 32.45886112]])
>
> In [5]: normal(100, 1000, 10)
> Out[5]:
> array([ 1052.93998938, -1254.12576419, 258.75390045, 688.32715327,
> -2.36543806, -1570.54842269, 472.90045029, 394.62908014,
> 137.10320437, 1741.85017871])
>
> And just to exaggerate things a little more:
>
> In [6]: multivariate_normal([100], [[10000]], 10).T][0]
> Out[6]:
> array([ 274.45446694, 85.79359682, 245.03248721, 120.10912405,
> -34.76526896, 134.93446664, 47.6768889 , 93.34140984,
> 80.27244669, 229.64700591])
>
> Whereas
> In [7]: normal(100, 10000, 10)
> Out[7]:
> array([ -554.68666687, 3724.59638363, -14873.55303901, -3111.22162495,
> -10813.66412901, 4688.81310356, -9510.92470735, -12689.02667559,
> -10379.01381925, -4534.60480031])
>
> I'm shocked that I don't get some negative values in Out[4]. And Out[6]
> really ought to have some numbers in the thousands.
>
> I'd totally be willing to believe that I don't understand the
> multivariate normal and/or variance. Can someone tell me whether these
> numbers look sane?
>
> For the bivariate case I do something like this:
>
> means = [100, 100]
> variances = [100, 1000]
> Sx, Sy = variances
> sx, sy = map(sqrt, variances)
> cor = 0.7
> cov = [[Sx, cor*sx*sy], [cor*sy*sx, Sy]]
> draws = 10
> samples = multivariate_normal(means, cov, draws)
>
> As mentioned before, the samples are shockingly close to their means.
They look right to me.
[~]
|19> means = [100, 100]
[~]
|20> variances = [100, 1000]
[~]
|21> Sx, Sy = variances
[~]
|22> sx, sy = map(sqrt, variances)
[~]
|23> cor = 0.7
[~]
|24> cov = np.array([[Sx, cor*sx*sy], [cor*sy*sx, Sy]])
[~]
|26> samples = np.random.multivariate_normal(means, cov, 10000)
[~]
|27> cov
array([[ 100. , 221.35943621],
[ 221.35943621, 1000. ]])
[~]
|28> np.cov(samples.T)
array([[ 101.16844481, 222.00301056],
[ 222.00301056, 1001.58403922]])
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
More information about the NumPy-Discussion
mailing list