[SciPy-dev] PyEM, toolbox for Expectation Maximization for Gaussian Mixtures (proposal for inclusion into scipy.sandbox)

Fri Oct 6 10:36:05 EDT 2006

Hi,

    A few months ago, I posted a preliminary version of PyEM, a numpy 
package for Expectation Maximization for Gaussian Mixture Models. As it 
was developed while several core numpy functions were changing (e.g. the 
axis convention of mean, sum, etc.), I stopped doing public releases.
    Now that the numpy API has settled, I propose the package for 
inclusion into the scipy sandbox. It has already been used successfully 
by several other people; I tried to keep the API coherent and easy to 
use, and I have included some pretty plotting functions :). By 
including it in scipy, I also hope to get feedback on usage, possible 
improvements, more testing, etc.

 * DOWNLOAD:

    The scipy version is available here:

http://www.ar.media.kyoto-u.ac.jp/members/david/pyem/pyem-scipy-0.5.3.tar.gz

 * INSTALLATION INSTRUCTIONS:

    I don't know the best way to package something for inclusion in 
scipy. To make it work, just uncompress the archive and move the 
directory to Lib/sandbox/pyem. An example script, example.py, is 
included in the archive. Some preliminary tests are included as well, 
covering also the ctypes version, which is not enabled by default and 
requires a recent version of ctypes.

 * EXAMPLE USAGE:

import numpy as N

from scipy.sandbox.pyem import GM, GMM, EM

#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# Create an artificial 2-dimensional, 3-component GMM, and sample it
#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
d           = 2
k           = 3
mode        = 'diag'    # covariance type: diagonal
nframes     = 1000      # number of samples to draw (any reasonable value)
w, mu, va   = GM.gen_param(d, k, mode, spread = 1.5)
# GM.fromvalues is a class method
gm          = GM.fromvalues(w, mu, va)

# Sample nframes frames from the model
data    = gm.sample(nframes)

#++++++++++++++++++++++++
# Learn the model with EM
#++++++++++++++++++++++++

# Init the model: here we create a mixture from its meta-parameters only
# (dimension, number of components, covariance mode) using the GM constructor
lgm = GM(d, k, mode)
# Create a model to be trained from a mixture, with k-means for initialization
gmm = GMM(lgm, 'kmean')
gmm.init(data)

# The actual EM, with likelihood computation. The threshold
# is compared to the (linearly approximated) derivative of the likelihood
em      = EM()
like    = em.train(data, gmm, maxiter = 30, thresh = 1e-8)

# "Trained" parameters are available through gmm.gm.w, gmm.gm.mu, gmm.gm.va
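
    For readers new to EM, the loop hidden behind em.train can be sketched 
in plain Python for a 1-D mixture: sampling (as gm.sample does), the 
E-step, the M-step, and a likelihood-based stopping test. This is not 
PyEM's implementation. The helper names (sample_gmm, em_gmm_1d) are 
hypothetical, the initial parameters are passed in by hand instead of 
coming from k-means, and the exact stopping rule (relative change of the 
log-likelihood between two iterations) is an assumption.

```python
import math
import random


def normal_pdf(x, mean, var):
    # Density of a 1-D Gaussian with the given mean and variance.
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sample_gmm(w, mu, va, n, rng):
    # Hypothetical helper: pick component j with probability w[j],
    # then draw a point from N(mu[j], va[j]).
    data = []
    for _ in range(n):
        j = rng.choices(range(len(w)), weights=w)[0]
        data.append(rng.gauss(mu[j], math.sqrt(va[j])))
    return data


def em_gmm_1d(data, w, mu, va, maxiter=30, thresh=1e-8):
    # Hypothetical helper: returns fitted (w, mu, va) and the
    # log-likelihood computed at each iteration.
    w, mu, va = list(w), list(mu), list(va)
    n, k = len(data), len(w)
    history = []
    for _ in range(maxiter):
        # E-step: responsibilities resp[i][j] = P(component j | x_i),
        # plus the log-likelihood under the current parameters.
        resp, loglik = [], 0.0
        for x in data:
            p = [w[j] * normal_pdf(x, mu[j], va[j]) for j in range(k)]
            total = sum(p)
            loglik += math.log(total)
            resp.append([pj / total for pj in p])
        history.append(loglik)
        # Assumed stopping rule: the relative change of the log-likelihood
        # between two iterations (a finite-difference derivative) is tiny.
        if len(history) > 1:
            if abs(history[-1] - history[-2]) < thresh * abs(history[-2]):
                break
        # M-step: responsibility-weighted re-estimation of the parameters.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            va[j] = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj
    return w, mu, va, history


rng = random.Random(0)
data = sample_gmm([0.5, 0.5], [-3.0, 3.0], [1.0, 0.25], 600, rng)
w, mu, va, history = em_gmm_1d(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print(w, mu, va)
```

The log-likelihood stored in history never decreases from one iteration 
to the next, which is the basic EM guarantee; the stopping test only 
decides when further iterations stop being worth it.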

 * PLOTTING EXAMPLES:

http://www.ar.media.kyoto-u.ac.jp/members/david/pyem/example_1_dimension_mode_diag.png

http://www.ar.media.kyoto-u.ac.jp/members/david/pyem/example_2_dimension_mode_diag.png

 * FUTURE:

I use the package myself quite regularly, and intend to improve it in 
the near future:

 - a script online_em.py for online EM for reinforcement learning is 
included, but it is not available by default: it is beta, its API is 
awkward, and it is unlikely to work really well for now.
 - inclusion of priors to keep covariances from shrinking toward 0.
 - I started to code some core functions in C with ctypes (this can be 
enabled by uncommenting "#import _c_densities as densities" in the file 
gmm_em.py and commenting out the line "import densities").
 - Ideally, I was hoping to start a project of numpy packages for 
Machine Learning (Kalman filtering, HMM, etc...); I don't know if other 
people would be interested in developing such a package.
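
    On the priors item: with plain maximum likelihood, a component that 
ends up explaining a single point gets a variance of exactly 0. One 
common remedy, sketched below, replaces the ML variance update with the 
posterior mode under an inverse-gamma prior (mean treated as known). 
Whether PyEM will use this particular prior is not decided here; the 
function names and the alpha, beta defaults are hypothetical.

```python
def ml_variance(xs, resp, mean):
    # Maximum-likelihood M-step update: responsibility-weighted variance.
    nj = sum(resp)
    return sum(r * (x - mean) ** 2 for r, x in zip(resp, xs)) / nj


def map_variance(xs, resp, mean, alpha=1.0, beta=0.01):
    # Hypothetical MAP update: mode of the posterior of the variance under
    # an InvGamma(alpha, beta) prior, i.e. (S + 2*beta) / (nj + 2*alpha + 2),
    # where S is the responsibility-weighted sum of squared deviations.
    nj = sum(resp)
    s = sum(r * (x - mean) ** 2 for r, x in zip(resp, xs))
    return (s + 2.0 * beta) / (nj + 2.0 * alpha + 2.0)


# A component that has captured a single point: the ML update collapses.
print(ml_variance([2.0], [1.0], 2.0))  # 0.0
# With the prior, the variance stays strictly positive.
print(map_variance([2.0], [1.0], 2.0))
```

As nj grows, the prior terms become negligible and the MAP update 
approaches the ML one, so well-populated components are barely affected.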

Cheers,

David