[SciPy-User] Trying hand-writing recognition with scikits.learn

Mon May 9 06:02:13 EDT 2011

On Sat, May 07, 2011 at 02:34:00PM +0200, Klonuo Umom wrote:
> Looking at matplotlib examples gallery everything seemed so 
> perfect and possible, but to be honest I spent too much time to plot 
> just one stacked histogram the way I wanted, and then went to easier to 
> me gnuplot to save some time and finish my plots. 

That's just a simple learning issue: you know gnuplot, thus you find it
easier to use than matplotlib. I was there at some point. I don't think
that either gnuplot's or matplotlib's documentation is significantly
easier to grasp than the other.

> In the same manner, but on much higher level, I got interested, out of 
> pure curiosity, in hand-writing recognition which is apparently possible 
> with scikits.learn (among other older packages).
> AFAIK almost all scikits lack documentation beyond bare docstrings 
> inside modules, but scikits.learn however has user guide 
> ( http://scikit-learn.sourceforge.net/user_guide.html ) with examples 
> inside. One of them is 'Recognizing hand-written digits' in 'Machine 
> Learning' chapter. This example has no explanation - source code is 
> presented along with plot output.

Fair enough, this example could be improved (darn, I have my name on it
:$).

I won't reply to the detailed points of your message. If you feel that
specific issues must be addressed, feel free to point them out
(preferably on the scikits.learn mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general). We
will do our best to integrate your feedback. However keep in mind that
what you are trying to do is actually a challenging task and the subject
of ongoing research. There will be a learning curve. The scikits.learn
tries to explain as much machine learning as possible to
non-specialists, but it is a full-blown scientific field. You will not be
able to have a functioning hand-written digits pipeline without investing
time in it. I wouldn't be able to solve that problem quickly either.

That said, we fight as hard as we can to make it easier. People usually
tell us that the scikits.learn is amongst the easiest machine learning
packages to use. It has a 300-page manual, with many different
examples, and external references. All methods have fully-fledged
docstrings. Achieving this alone is a huge amount of work, on top of
providing binaries for various platforms and making sure that it gives
correct and controlled results. We can do better, of course; we want to do
better, but we are a bunch of volunteers, losing sleep on the scikit,
just like most of the developers of open source packages. My back is
telling me that I can't sacrifice more sleep to open source development.
To have better packages, we need more help.

> I'll whiteness steep curve unlike general Python comfortability.

I think that scientific packages tend to have a steeper learning curve
partly because there is the science to learn on top of the software.

To end on a positive note, I must say that I understand your frustration,
and I agree with you that usability of software is paramount. As a
community we must keep this in mind, and do our best.

Cheers,

Gaël