[scikit-learn] Bayesian Gaussian Mixture

Andreas Mueller t3kcit at gmail.com
Wed Nov 30 15:50:39 EST 2016


There are plenty of examples and plots on the scikit-learn website.

On 11/30/2016 12:17 PM, Tommaso Costanzo wrote:
>
> Dear Andreas,
>
> thank you so much for your answer; now I can see my mistake. What I am
> trying to do is convince myself that the reason I get probabilities of
> only 0 and 1 when I analyze my data is that the data are well
> separated, so I was trying to make some synthetic data where the
> probabilities are different from 0 or 1, but I did it in the
> wrong way. Does it sound correct if I make 300 samples of random
> numbers centered at 0 with STD 1 and another 300 centered at 0.5, and
> then add some samples in between these two Gaussian distributions (say
> between 0.15 and 0.35)? In that case I think I should expect
> probabilities different from 0 or 1 for the two components (when using
> 2 components).
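>
> For example, a minimal sketch of that idea (just the two overlapping
> Gaussians centered at 0 and 0.5, before adding the extra in-between
> samples) could be:
>
> import numpy as np
> from sklearn.mixture import GaussianMixture
>
> # 300 samples centered at 0 with STD 1, and 300 centered at 0.5
> X = np.concatenate([np.random.randn(300),
>                     np.random.randn(300) + 0.5]).reshape(-1, 1)
> gm = GaussianMixture(n_components=2).fit(X)
> print(gm.predict_proba(X)[:5])  # values should sit well away from 0 and 1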
>
> Thank you in advance
> Tommaso
>
> On Nov 28, 2016 11:58 AM, "Andreas Mueller" <t3kcit at gmail.com 
> <mailto:t3kcit at gmail.com>> wrote:
>
>     Hi Tommaso.
>     So what's the issue? The distributions are very distinct, so there
>     is no confusion.
>     The higher the dimensionality, the further apart the points are
>     (compare the distance between (-1, 1) and (1, -1) to the one
>     between (-1, -.5, 0, .5, 1) and (1, .5, 0, -.5, -1)).
>     I'm not sure what you mean by "the cross in the middle".
>     You create two fixed points, one at np.arange(-1,1, 2.0/nfeatures)
>     and one at np.arange(1,-1, (-2.0/nfeatures)). In high dimensions,
>     these points are very far apart.
>     Then you add standard normal noise to it, so this data is two
>     perfect Gaussians. In low dimensions they are "close together", so
>     there is some confusion;
>     in high dimensions they are "far apart", so there is less confusion.
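>
>     A quick way to see this is to print the distance between the two
>     centers for a few values of nfeatures (a minimal sketch, reusing
>     the same np.arange calls as your script):
>
>     import numpy as np
>
>     for nfeatures in (2, 5, 10, 50, 100):
>         c1 = np.arange(-1, 1, 2.0 / nfeatures)   # center of the first cluster
>         c2 = np.arange(1, -1, -2.0 / nfeatures)  # center of the second cluster
>         print(nfeatures, np.linalg.norm(c1 - c2))
>     # the separation grows roughly like sqrt(nfeatures), while the
>     # standard normal noise along any single direction stays at scale 1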
>
>     Hth,
>     Andy
>
>     On 11/27/2016 11:47 AM, Tommaso Costanzo wrote:
>>     Hi Jacob,
>>
>>     I have just changed my code from BayesianGaussianMixture to
>>     GaussianMixture, and the result is the same. I attached here the
>>     picture of the first component when I ran the code with 5, 10,
>>     and 50 nfeatures and 2 components. In my short test function I
>>     expect to have points that could belong to one component as well
>>     as the other, as is visible for a small number of nfeatures, but
>>     getting only 0 or 1 for nfeatures > 50 does not sound correct. It
>>     seems to be related just to the size of the model, and in
>>     particular to the number of features. With the
>>     BayesianGaussianMixture I have seen that it is slightly better to
>>     increase the degrees of freedom to 2*nfeatures instead of the
>>     default nfeatures. However, this does not change the result when
>>     nfeatures is 50 or more.
>>
>>     Thank you in advance
>>     Tommaso
>>
>>     2016-11-25 21:32 GMT-05:00 Jacob Schreiber
>>     <jmschreiber91 at gmail.com <mailto:jmschreiber91 at gmail.com>>:
>>
>>         Typically this means that the model is so confident in its
>>         predictions it does not believe it possible for the sample to
>>         come from the other component. Do you get the same results
>>         with a regular GaussianMixture?
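>>
>>         A minimal sketch of that comparison (X below is only a
>>         placeholder for the data you are already fitting):
>>
>>         import numpy as np
>>         from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
>>
>>         X = np.random.randn(900, 50)  # placeholder for your data
>>         for Est in (GaussianMixture, BayesianGaussianMixture):
>>             proba = Est(n_components=2).fit(X).predict_proba(X)
>>             print(Est.__name__, proba.min(), proba.max())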
>>
>>         On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo
>>         <tommaso.costanzo01 at gmail.com
>>         <mailto:tommaso.costanzo01 at gmail.com>> wrote:
>>
>>             Hi,
>>
>>             I am facing some problems with the
>>             "BayesianGaussianMixture" function, but I do not know if
>>             it is because of my poor knowledge of this type of
>>             statistics or if it is something related to the
>>             algorithm. I have a set of data of around 1000 to 4000
>>             observations (every feature is a spectrum of around 200
>>             points), so in the end I have n_samples = ~1000 and
>>             n_features = ~20. The good thing is that I am getting
>>             the same results as KMeans; however, "predict_proba"
>>             returns values of only 0 or 1.
>>
>>             I have written a small function to simulate my problem
>>             with random data; it is reported below. The first half of
>>             the array has points with a positive slope while the
>>             second half has a negative slope, so they cross in the
>>             middle. What I have seen is that for a small number of
>>             features I obtain good probabilities, but if the number of
>>             features increases (say to 50) then the probabilities
>>             become only 0 or 1.
>>             Can someone help me interpret this result?
>>
>>             Here is the code I wrote with the generated random
>>             numbers; I'll generally run it with ncomponent=2 and
>>             nfeatures=5, 10, 50, or 100. I am not sure whether it will
>>             work in every case since it is not very thoroughly tested.
>>             I have also attached it as a file!
>>
>>             ##########################################################################
>>             import numpy as np
>>             from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
>>             import matplotlib.pyplot as plt
>>
>>             def test_bgm(ncomponent, nfeatures):
>>                 # cluster 1: 500 samples of standard normal noise
>>                 # around an increasing ramp from -1 to 1
>>                 temp = np.random.randn(500, nfeatures)
>>                 temp = temp + np.arange(-1, 1, 2.0/nfeatures)
>>                 # cluster 2: 400 samples around a decreasing ramp from 1 to -1
>>                 temp1 = np.random.randn(400, nfeatures)
>>                 temp1 = temp1 + np.arange(1, -1, -2.0/nfeatures)
>>                 X = np.vstack((temp, temp1))
>>
>>                 bgm = BayesianGaussianMixture(
>>                     n_components=ncomponent,
>>                     degrees_of_freedom_prior=nfeatures*2).fit(X)
>>
>>                 bgm_proba = bgm.predict_proba(X)
>>                 bgm_labels = bgm.predict(X)
>>
>>                 # hard cluster assignments
>>                 plt.figure(-1)
>>                 plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
>>                            interpolation='none')
>>                 plt.colorbar()
>>
>>                 # membership probabilities for each component
>>                 for i in np.arange(0, ncomponent):
>>                     plt.figure(i)
>>                     plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
>>                                interpolation='none')
>>                     plt.colorbar()
>>
>>                 plt.show()
>>             ##############################################################################
>>
>>             Thank you in advance
>>             Tommaso
>>
>>