[SciPy-dev] Another GSoC idea

David Cournapeau david at ar.media.kyoto-u.ac.jp
Wed Mar 25 13:25:25 EDT 2009


David Warde-Farley wrote:
> Hi David,
>
> Thanks for your reply - I fell ill over the weekend and then fell  
> behind on email (and other things :).
>   

Hope everything is going well now.

>
> For sure - it's usually not a good idea to throw out code that works  
> unless you have a very good reason! Do you think you'll ever get  
> around to improving it?
>   

Yes, otherwise I would not have mentioned it: who cares that I have
code if it is not publicly available? :)
>
> The idea of general "building blocks" for doing EM (and other things)  
> with probabilistic models in Python interests me very much, and  
> probably interests a lot of other people. However, it's a somewhat  
> ambitious undertaking, let alone for a GSoC. Part of the difficulty I  
> see is that there's a lot of good code that we wouldn't want to  
> reinvent.
>   

I think I may not have been very clear: building blocks for machine
learning are definitely out of scope. What I had in mind, following your
example of recursive k-means, is a set of simple algorithms that can be
used recursively, i.e. updated one sample at a time. By simple, I mean
things like averages and other moment-like statistics.
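As an illustration of the kind of simple, recursively usable algorithm meant here, the following is a minimal sketch, assuming "recursive" means the estimate is updated one sample at a time without keeping past samples. It uses Welford's update for the mean and variance; the class name RunningMoments is hypothetical and not part of SciPy or NumPy.

```python
class RunningMoments:
    """Recursive (online) estimate of mean and variance, one sample at a time.

    Hypothetical helper for illustration. Uses Welford's update, which avoids
    the cancellation problems of accumulating sum(x) and sum(x**2) separately.
    """

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new sample x into the running statistics."""
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # recursive mean update
        # Old deviation times new deviation gives the correct M2 increment.
        self._m2 += delta * (x - self.mean)

    def variance(self, ddof=0):
        """Variance of the samples seen so far (ddof=1 for the unbiased estimate)."""
        if self.n - ddof <= 0:
            raise ValueError("not enough samples")
        return self._m2 / (self.n - ddof)


if __name__ == "__main__":
    stats = RunningMoments()
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.update(x)
    print(stats.mean, stats.variance(), stats.variance(ddof=1))
```

Feeding it the samples 2, 4, 4, 4, 5, 5, 7, 9 one at a time gives a mean of 5 and a population variance of 4 (up to floating-point rounding), matching the batch formulas, while only three numbers are ever stored.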

There was some earlier discussion on this topic:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg14473.html

But again, that's only a suggestion: it is something I am interested in
myself, and it sounded similar to some of the ideas you talked about
(for application to tracking).

> Then there's PyMC, which as far as I can see has developed a *really*  
> well thought out object-oriented system for specifying probabilistic  
> graphical models. Of course, it's geared toward Bayesian inference via  
> MCMC. In the (relatively rare) case that the posterior is analytically  
> available it shouldn't be all that difficult to graft on code for  
> doing that. Likewise with maximum likelihood (hyper)parameter fitting  
> via EM or gradient-based optimization.
>   

I also have even worse (very research-quality :) ) code implementing
Variational Bayes for GMMs, a relatively well-known approximation of
Bayesian computation for latent variable models, if that's something you
are interested in.

> In summary, I think a general treatment of mixture models, etc. in  
> Python is a big task, and as such I'm not certain it'd be suitable for  
> a SoC. Having a really solid module with a few canned non-probabilistic
> algorithms like k-means (like it already does), k-medoids/centers
> might be a more manageable task in the short term.

Yes, agreed. My suggestion was to focus more on the recursive aspect
rather than on the Cython side of things, since I have already partly
done that job, although not publicly (yet).
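The recursive k-means discussed in this thread can be sketched in a similar style. This is a minimal sketch of sequential (MacQueen-style) k-means, assuming the initial centroids are supplied by the caller: each new sample is assigned to its nearest centroid, and only that centroid is moved, using a running-mean recursion. The class OnlineKMeans is hypothetical; it is neither the existing scipy.cluster.vq API nor the code referred to in this message.

```python
import numpy as np


class OnlineKMeans:
    """Sequential (MacQueen-style) k-means: each sample updates one centroid.

    Hypothetical helper for illustration only.
    """

    def __init__(self, initial_centroids):
        # Assumption: the caller seeds the centroids (shape (k, d)).
        self.centroids = np.array(initial_centroids, dtype=float)
        # Samples absorbed by each centroid; the seed counts as one.
        self.counts = np.ones(len(self.centroids))

    def update(self, x):
        """Assign x to its nearest centroid and move that centroid toward x.

        Returns the index of the centroid x was assigned to.
        """
        x = np.asarray(x, dtype=float)
        # Squared Euclidean distance from x to every centroid.
        dists = np.sum((self.centroids - x) ** 2, axis=1)
        k = int(np.argmin(dists))
        self.counts[k] += 1
        # Running-mean update: the centroid stays the mean of its samples.
        self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
        return k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated 2-D blobs, shuffled so samples arrive interleaved.
    data = np.concatenate([rng.normal(0.0, 0.5, (200, 2)),
                           rng.normal(5.0, 0.5, (200, 2))])
    rng.shuffle(data)
    km = OnlineKMeans([[1.0, 1.0], [4.0, 4.0]])
    for x in data:
        km.update(x)
    print(km.centroids)
```

Because each update touches a single centroid, the cost per sample is O(kd) and no past samples need to be stored, which is what makes this form attractive for tracking; the trade-off is that the result depends on the seed centroids and on the order in which samples arrive.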

cheers,

David
