[SciPy-dev] Another GSoC idea
David Cournapeau
david at ar.media.kyoto-u.ac.jp
Wed Mar 25 13:25:25 EDT 2009
David Warde-Farley wrote:
> Hi David,
>
> Thanks for your reply - I fell ill over the weekend and then fell
> behind on email (and other things :).
>
Hope everything is going well now.
>
> For sure - it's usually not a good idea to throw out code that works
> unless you have a very good reason! Do you think you'll ever get
> around to improving it?
>
Yes, otherwise I would not have mentioned it - who cares that I have
code if it is not publicly available somewhere :)
>
> The idea of general "building blocks" for doing EM (and other things)
> with probabilistic models in Python interests me very much, and
> probably interests a lot of other people. However, it's a somewhat
> ambitious undertaking, let alone for a GSoC. Part of the difficulty I
> see is that there's a lot of good code that we wouldn't want to
> reinvent.
>
I think I may not have been very clear: building blocks for machine
learning is definitely out of scope. What I had in mind, following your
example of recursive kmeans, is a set of simple algorithms which can be
used recursively. By simple, I meant things like averages and other
moment-like statistics.
There was some discussion before:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg14473.html
But again, that's only a suggestion: it is something I am interested
in myself, and it sounded similar to some of the ideas you talked
about (for application to tracking).
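To make "moment-like statistics used recursively" concrete, here is a
minimal sketch of a running mean and variance updated one sample at a
time. It uses Welford's update, which is my choice for illustration and
not necessarily what the linked discussion settled on; the class name
RunningMoments is hypothetical.

```python
class RunningMoments:
    """Mean and variance updated one sample at a time (Welford's update).

    Hypothetical helper for illustration only; not part of NumPy or SciPy.
    """

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new scalar sample into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean and from the new mean.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the samples seen so far (NaN if none)."""
        return self._m2 / self.n if self.n else float("nan")


stats = RunningMoments()
for sample in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(sample)
```

No sample is stored, so the same update works on a stream of unknown
length.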
> Then there's PyMC, which as far as I can see has developed a *really*
> well thought out object-oriented system for specifying probabilistic
> graphical models. Of course, it's geared toward Bayesian inference via
> MCMC. In the (relatively rare) case that the posterior is analytically
> available it shouldn't be all that difficult to graft on code for
> doing that. Likewise with maximum likelihood (hyper)parameter fitting
> via EM or gradient-based optimization.
>
I have even worse (very research quality :) ) code implementing
Variational Bayes for GMM, a relatively well-known approximation of
Bayesian computation for latent-variable models, if that's something
you are interested in.
> In summary, I think a general treatment of mixture models, etc. in
> Python is a big task, and as such I'm not certain it'd be suitable for
> a SoC. Having a really solid module with a few canned non-
> probabilistic algorithms like k-means (like it already does), k-
> medoids/centers might be a more manageable task in the short term.
Yes, agreed. My suggestion was about focusing more on the recursive
aspect rather than the Cython side of things, since I have partly done
the job already, although not publicly (yet).
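As a rough illustration of what I mean by the recursive aspect for
kmeans, here is a sketch of a sequential update, where each new point
moves only its nearest center. This assumes "recursive" means "updated
one sample at a time"; the function name online_kmeans_update is
hypothetical, and this is not the code I mentioned above.

```python
def online_kmeans_update(centers, counts, x):
    """Assign x to its nearest center and move that center toward x.

    centers: list of centers, each a list of floats (modified in place)
    counts:  number of points absorbed by each center (modified in place)
    Returns the index of the updated center.
    Hypothetical helper for illustration only.
    """
    def sqdist(c):
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))

    k = min(range(len(centers)), key=lambda i: sqdist(centers[i]))
    counts[k] += 1
    # A 1/count step keeps each center equal to the mean of its points.
    step = 1.0 / counts[k]
    centers[k] = [ci + step * (xi - ci) for ci, xi in zip(centers[k], x)]
    return k


centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]  # assumption: each initial center counts as one point
for point in [[1.0, 1.0], [9.0, 11.0], [2.0, 0.0]]:
    online_kmeans_update(centers, counts, point)
```

Each center is just a running average of the points assigned to it, so
this is the same kind of moment-like update applied per cluster.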
cheers,
David