[scikit-learn] Bayesian Gaussian Mixture
Andreas Mueller
t3kcit at gmail.com
Wed Nov 30 15:50:39 EST 2016
There are plenty of examples and plots on the scikit-learn website.
On 11/30/2016 12:17 PM, Tommaso Costanzo wrote:
>
> Dear Andreas,
>
> thank you so much for your answer; now I can see my mistake. What I
> am trying to do is convince myself that, when I analyze my data, I
> get probabilities of only 0 and 1 because the data are well
> separated. I was trying to make some synthetic data where there are
> probabilities different from 0 or 1, but I did it the wrong way. Does
> it sound correct if I make 300 samples of random numbers centered at
> 0 with STD 1, another 300 centered at 0.5, and then add some samples
> in between these two Gaussian distributions (say between 0.15 and
> 0.35)? In that case I think I should expect probabilities different
> from 0 or 1 in the two components (when using 2 components).
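>
> Something like this is what I have in mind (a rough sketch; the
> centers, counts, and the 0.15-0.35 band are as described above, while
> the 50 in-between samples and the random seed are arbitrary choices):
>
> import numpy as np
> from sklearn.mixture import BayesianGaussianMixture
>
> rng = np.random.RandomState(0)
> # Two 1-D Gaussians with STD 1, centered at 0 and 0.5, plus some
> # extra samples in the overlap region between 0.15 and 0.35.
> a = rng.randn(300, 1)
> b = rng.randn(300, 1) + 0.5
> mid = rng.uniform(0.15, 0.35, size=(50, 1))
> X = np.vstack((a, b, mid))
>
> bgm = BayesianGaussianMixture(n_components=2, random_state=0).fit(X)
> # With the centers only 0.5 apart and STD 1, the components overlap
> # heavily, so probabilities well away from 0 and 1 are expected.
> print(bgm.predict_proba(X).min(axis=0))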
>
> Thank you in advance
> Tommaso
>
> On Nov 28, 2016 11:58 AM, "Andreas Mueller" <t3kcit at gmail.com> wrote:
>
> Hi Tommaso.
> So what's the issue? The distributions are very distinct, so there
> is no confusion.
> The higher the dimensionality, the further apart the points are
> (compare the distance between (-1, 1) and (1, -1) to the one
> between (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).
> I'm not sure what you mean by "the cross in the middle".
> You create two fixed points, one at np.arange(-1, 1, 2.0/nfeatures)
> and one at np.arange(1, -1, -2.0/nfeatures). In high dimensions,
> these points are very far apart.
> Then you add standard normal noise, so this data is two perfect
> Gaussians. In low dimensions they are "close together", so there is
> some confusion; in high dimensions they are "far apart", so there is
> less confusion.
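>
> For instance (a quick check, reusing the np.arange centers from your
> test function; the distance grows roughly like sqrt(nfeatures)):
>
> import numpy as np
>
> # Distance between the two cluster means as nfeatures grows; in high
> # dimensions the two Gaussians barely overlap.
> for nfeatures in (5, 10, 50, 100):
>     mu1 = np.arange(-1, 1, 2.0 / nfeatures)
>     mu2 = np.arange(1, -1, -2.0 / nfeatures)
>     print(nfeatures, np.linalg.norm(mu1 - mu2))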
>
> Hth,
> Andy
>
> On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
>> Hi Jacob,
>>
>> I have just changed my code from BayesianGaussianMixture to
>> GaussianMixture, and the result is the same. I attached here the
>> picture of the first component when I ran the code with 5, 10,
>> and 50 nfeatures and 2 components. In my short test function I
>> expect to have points that could belong to one component as well
>> as the other, as is visible for a small number of nfeatures, but
>> getting only 0 or 1 for nfeatures >= 50 does not sound correct.
>> It seems to be related just to the size of the model, and in
>> particular to the number of features. With the
>> BayesianGaussianMixture I have seen that it is slightly better to
>> increase the degrees of freedom to 2*nfeatures instead of the
>> default nfeatures. However, this does not change the result when
>> nfeatures is 50 or more.
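>>
>> For reference, this is how I set the prior (a sketch with nfeatures
>> fixed at 50; degrees_of_freedom_prior defaults to nfeatures):
>>
>> from sklearn.mixture import BayesianGaussianMixture
>>
>> nfeatures = 50
>> # Doubling the default Wishart degrees-of-freedom prior; this was
>> # slightly better, but did not change the result for nfeatures >= 50.
>> bgm = BayesianGaussianMixture(n_components=2,
>>                               degrees_of_freedom_prior=2 * nfeatures)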
>>
>> Thank you in advance
>> Tommaso
>>
>> 2016-11-25 21:32 GMT-05:00 Jacob Schreiber
>> <jmschreiber91 at gmail.com <mailto:jmschreiber91 at gmail.com>>:
>>
>> Typically this means that the model is so confident in its
>> predictions that it does not consider it possible for a sample to
>> come from the other component. Do you get the same results
>> with a regular GaussianMixture?
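>>
>> For example (a sketch with stand-in data; in your case, fit the X
>> built in your test function below):
>>
>> import numpy as np
>> from sklearn.mixture import GaussianMixture
>>
>> X = np.vstack((np.random.randn(500, 5) - 1,
>>                np.random.randn(400, 5) + 1))
>> # If predict_proba still returns only 0s and 1s, the components
>> # really are that well separated.
>> gm = GaussianMixture(n_components=2).fit(X)
>> print(gm.predict_proba(X))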
>>
>> On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo
>> <tommaso.costanzo01 at gmail.com> wrote:
>>
>> Hi,
>>
>> I am facing some problems with the
>> "BayesianGaussianMixture" function, but I do not know if
>> it is because of my poor knowledge of this type of
>> statistics or if it is something related to the
>> algorithm. I have a set of data of around 1000 to 4000
>> observations (every feature is a spectrum of around 200
>> points), so in the end I have n_samples = ~1000 and
>> n_features = ~20. The good thing is that I am getting
>> the same results as KMeans; however, "predict_proba"
>> returns values of only 0 or 1.
>>
>> I have written a small function to simulate my problem
>> with random data, reported below. The first half of the
>> array has points with a positive slope while the second
>> half has a negative slope, so they cross in the middle.
>> What I have seen is that for a small number of features
>> I obtain good probabilities, but if the number of
>> features increases (say to 50) then the probabilities
>> become only 0 or 1.
>> Can someone help me interpret this result?
>>
>> Here is the code I wrote with the generated random
>> numbers; I generally run it with ncomponent=2 and
>> nfeatures=5, 10, 50, or 100. I am not sure it will work
>> in every case, as it is not very thoroughly tested. I
>> have also attached it as a file!
>>
>> ##########################################################################
>> import numpy as np
>> from sklearn.mixture import GaussianMixture,
>> BayesianGaussianMixture
>> import matplotlib.pyplot as plt
>>
>> def test_bgm(ncomponent, nfeatures):
>> temp = np.random.randn(500,nfeatures)
>> temp = temp + np.arange(-1,1, 2.0/nfeatures)
>> temp1 = np.random.randn(400,nfeatures)
>> temp1 = temp1 + np.arange(1,-1, (-2.0/nfeatures))
>> X = np.vstack((temp, temp1))
>>
>> bgm =
>> BayesianGaussianMixture(ncomponent,degrees_of_freedom_prior=nfeatures*2).fit(X)
>>
>> bgm_proba = bgm.predict_proba(X)
>> bgm_labels = bgm.predict(X)
>>
>> plt.figure(-1)
>> plt.imshow(bgm_labels.reshape(30,-1), origin='lower',
>> interpolatio='none')
>> plt.colorbar()
>>
>> for i in np.arange(0,ncomponent):
>> plt.figure(i)
>> plt.imshow(bgm_proba[:,i].reshape(30,-1), origin='lower',
>> interpolatio='none')
>> plt.colorbar()
>>
>> plt.show()
>> ##############################################################################
>>
>> Thank you in advance
>> Tommaso