[SciPy-dev] Google Summer of Code and scipy.learn (another trying)

Anton Slesarev slesarev.anton at gmail.com
Tue Mar 18 08:05:02 EDT 2008


On Sun, Mar 16, 2008 at 3:58 PM, Matthieu Brucher <
matthieu.brucher at gmail.com> wrote:

> Hi,
>
> I completely agree with you, there should be more documentation, but I
> still don't see your point about the sparse data format. SciPy already
> provides this, doesn't it?

It is true that SciPy supports sparse matrices. What I mean is that learn
should support sparse formats in its parsers. To train an SVM, it should be
enough to write something like:

data = LoadSparseData('data.file')
s = svm()
s.train(data)

I hope to implement this kind of syntax. The user need not know exactly how
the data is stored; they should just be able to use it.
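
As a rough illustration of what such a loader might do internally, here is a
minimal sketch. It assumes the input is in the SVMlight/LIBSVM text format
(one "label index:value ..." record per line, with 1-based indices). The name
load_sparse_data and the returned layout are hypothetical, not part of
scikits.learn. The three lists follow the CSR layout, so they could be passed
to scipy.sparse.csr_matrix((data, indices, indptr), shape=shape).

```python
def load_sparse_data(lines):
    """Parse SVMlight/LIBSVM-style records into CSR-style lists.

    Hypothetical helper for illustration; not an existing scikits.learn API.
    `lines` is any iterable of strings (an open file works).
    """
    labels, data, indices, indptr = [], [], [], [0]
    n_cols = 0
    for raw in lines:
        line = raw.split('#', 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # skip blank lines
        fields = line.split()
        labels.append(float(fields[0]))
        for item in fields[1:]:
            idx, value = item.split(':', 1)
            col = int(idx) - 1  # assumption: indices are 1-based in the file
            indices.append(col)
            data.append(float(value))
            n_cols = max(n_cols, col + 1)
        indptr.append(len(indices))  # marks where the next row starts
    shape = (len(labels), n_cols)
    return labels, data, indices, indptr, shape


sample = ["+1 1:0.5 4:2.0", "-1 2:1.0  # comment", "", "+1 4:3.0"]
labels, data, indices, indptr, shape = load_sparse_data(sample)
```

Only the non-zero entries are stored, so the caller never has to build a
dense array just to read the file.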


>
> BBR will be welcomed by a lot of people. When you implement it, code
> everything as generically as possible. For instance, feature selection
> (perhaps extraction as well) could be used by other algorithms (even SVMs),
> so it should be as generic as possible (feature comparison should be
> explained: is it in terms of classification results?).

Yes, that is a good goal. But I think the main part of BBR is the classifier;
feature selection can be implemented in a separate module. There is no need
to integrate BBR's own implementation of the feature selection algorithm into
scikits.
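
To illustrate what a classifier-independent feature selection step could look
like, here is a minimal sketch. It uses a simple document-frequency threshold,
which is a stand-in chosen for brevity and not the method BBR uses. The
function name and the min_df parameter are hypothetical. Rows are assumed to
be stored as CSR-style lists: indices holds column numbers, and indptr[i] to
indptr[i + 1] delimits row i.

```python
def select_by_document_frequency(indices, indptr, min_df=2):
    """Return sorted column indices that occur in at least `min_df` rows.

    Hypothetical helper; it knows nothing about the classifier that will
    consume the selected features.
    """
    counts = {}
    for row in range(len(indptr) - 1):
        # Count each column at most once per row.
        for col in set(indices[indptr[row]:indptr[row + 1]]):
            counts[col] = counts.get(col, 0) + 1
    return sorted(col for col, n in counts.items() if n >= min_df)


# Three rows: row 0 uses columns 0 and 3, row 1 uses column 1, row 2 uses column 3.
indices = [0, 3, 1, 3]
indptr = [0, 2, 3, 4]
kept = select_by_document_frequency(indices, indptr, min_df=2)
```

Because the function only looks at the data layout, the same selected columns
could be fed to an SVM or to a logistic regression classifier.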

>
>
> I'd like to add manifold learning tools (these can be thought of as
> feature extraction tools, visualization, ...) which could benefit from your
> approach and vice versa.
>
That's great!

>
> Matthieu
>
> 2008/3/16, Anton Slesarev <slesarev.anton at gmail.com>:
> >
> > Hi.
> >
> > I'm going to describe the problems I see in the current version of
> > scikits.learn. After that I'll write what I want to improve during
> > Google Summer of Code. In my last message I tried to enumerate some
> > limitations of other open-source frameworks such as PyML and Orange.
> >
> > Let's start with scikits.learn.
> >
> > First of all, there is a lack of documentation. I can find nothing besides
> > David Cournapeau's Google Summer of Code proposal: nothing in the wiki and
> > nothing on the mailing list. There are a few examples in svm, of course,
> > but it is very hard to work from examples alone. I can't find parsers for
> > different data formats, only for datasets. As I understand it, datasets
> > don't support a sparse data format. There is no common structure in the ML
> > package. It has scattered modules such as svm, em, ann, but no unifying
> > idea.
> >
> > If I have misunderstood the current state of affairs, please correct
> > me.
> >
> > Well, now about what I want to change.
> >
> > I am going to make the learn package suitable for text classification.
> > I also want to reproduce most of PyML's (pyml.sourceforge.net/)
> > functionality.
> >
> > First of all, we need a sparse data format. I want to write parsers for a
> > number of common data formats.
> >
> > We need some preprocessing utilities, such as normalization and feature
> > selection algorithms.
> > This part should be common to the whole machine learning package.
> >
> > The package also needs a number of classifiers. There are at least two
> > state-of-the-art approaches in text classification and categorization: SVM
> > and Bayesian logistic regression. SVM has already been implemented in
> > scikits. There are a lot of implementations of logistic regression. I am
> > going to integrate one of them (http://www.stat.rutgers.edu/~madigan/BBR/)
> > into scikits.
> >
> > An interpretation module is needed, consisting of result processing
> > (different quality metrics), visualization, and feature comparison.
> >
> > There are common text collections (for instance
> > http://trec.nist.gov/data/reuters/reuters.html). I'll try to make working
> > with them as simple as possible.
> >
> > Finally, it is very important to write (or generate) reference
> > documentation and a tutorial.
> >
> > OK, that's all. I look forward to hearing your opinions. In particular, I
> > would like to hear from David Cournapeau, who is, as I understand it, the
> > maintainer of the learn package.
> >
> >
> >
> > --
> > Anton Slesarev
> > _______________________________________________
> > Scipy-dev mailing list
> > Scipy-dev at scipy.org
> > http://projects.scipy.org/mailman/listinfo/scipy-dev
> >
> >
>
>
> --
> French PhD student
> Website : http://matthieu-brucher.developpez.com/
> Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
> LinkedIn : http://www.linkedin.com/in/matthieubrucher


-- 
Anton Slesarev