[scikit-learn] Bayesian Gaussian Mixture

Fri Nov 25 21:32:20 EST 2016

Typically this means that the model is so confident in its predictions that
it does not believe the sample could have come from the other component.
Do you get the same results with a regular GaussianMixture?
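
To illustrate why the confidence grows with the number of features, here is a minimal sketch that uses only the standard library. It is not what scikit-learn computes internally: it assumes two equally weighted components with unit variance and known means, where the means follow the same rising and falling ramps as the quoted script. The helper names `component_means` and `posterior_first` are made up for this example. The point is that the log-likelihood ratio is a sum over features, so every extra feature pushes the posterior further towards 0 or 1.

```python
import math


def component_means(nfeatures):
    """Means of the two clusters: one ramp rising from -1, one falling from 1."""
    step = 2.0 / nfeatures
    rising = [-1.0 + k * step for k in range(nfeatures)]
    falling = [1.0 - k * step for k in range(nfeatures)]
    return rising, falling


def posterior_first(x, mean_a, mean_b):
    """P(component a | x) for two equally weighted, unit-variance Gaussians.

    Assumption for illustration only: weights, means and variances are known
    rather than fitted.
    """
    # The log-likelihood ratio is a sum of one term per feature.
    llr = sum(0.5 * ((xi - b) ** 2 - (xi - a) ** 2)
              for xi, a, b in zip(x, mean_a, mean_b))
    return 1.0 / (1.0 + math.exp(-llr))


for nfeatures in (5, 10, 50, 100):
    mean_a, mean_b = component_means(nfeatures)
    # Evaluate at the centre of the first cluster, i.e. a typical member of it.
    p = posterior_first(mean_a, mean_a, mean_b)
    print(nfeatures, p, 1.0 - p)
```

With 5 features the posterior at the cluster centre is still visibly below 1, while with 100 features it rounds to exactly 1.0 in double precision. A fitted model has to estimate its parameters, so its numbers will differ, but the same saturation is to be expected.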

On Fri, Nov 25, 2016 at 11:34 AM, Tommaso Costanzo <
tommaso.costanzo01 at gmail.com> wrote:

> Hi,
>
> I am facing a problem with "BayesianGaussianMixture", but I do not know
> whether it comes from my poor knowledge of this type of statistics or from
> the algorithm itself. I have data sets of around 1000 to 4000 observations
> (every feature is a spectrum of around 200 points), so in the end I have
> n_samples = ~1000 and n_features = ~20. The good thing is that I am getting
> the same results as KMeans; however, "predict_proba" returns only values of
> 0 or 1.
>
> I have written a small function, reported below, that reproduces my problem
> with random data. The first half of the array has points with a positive
> slope while the second half has a negative slope, so they cross in the
> middle. What I have seen is that for a small number of features I obtain
> good probabilities, but if the number of features increases (say 50) then
> the probabilities become only 0 or 1.
> Can someone help me interpret this result?
>
> Here is the code I wrote with the generated random numbers. I generally
> run it with ncomponent=2 and nfeatures=5 or 10 or 50 or 100. I am not sure
> it will work in every case, as it has not been tested very thoroughly. I
> have also attached it as a file!
>
> ##########################################################################
> import numpy as np
> import matplotlib.pyplot as plt
> from sklearn.mixture import GaussianMixture, BayesianGaussianMixture
>
>
> def test_bgm(ncomponent, nfeatures):
>     # First 500 samples: noise around a ramp rising from -1 towards 1.
>     temp = np.random.randn(500, nfeatures)
>     temp = temp + np.linspace(-1, 1, nfeatures, endpoint=False)
>     # Last 400 samples: noise around a ramp falling from 1 towards -1.
>     temp1 = np.random.randn(400, nfeatures)
>     temp1 = temp1 + np.linspace(1, -1, nfeatures, endpoint=False)
>     X = np.vstack((temp, temp1))
>
>     bgm = BayesianGaussianMixture(
>         n_components=ncomponent,
>         degrees_of_freedom_prior=nfeatures * 2).fit(X)
>     bgm_proba = bgm.predict_proba(X)
>     bgm_labels = bgm.predict(X)
>
>     # The 900 samples are displayed as a 30 x 30 image.
>     plt.figure(-1)
>     plt.imshow(bgm_labels.reshape(30, -1), origin='lower',
>                interpolation='none')
>     plt.colorbar()
>
>     # One figure per component, showing its membership probabilities.
>     for i in np.arange(0, ncomponent):
>         plt.figure(i)
>         plt.imshow(bgm_proba[:, i].reshape(30, -1), origin='lower',
>                    interpolation='none')
>         plt.colorbar()
>
>     plt.show()
> ##########################################################################
>
> Thank you in advance
> Tommaso