[SciPy-dev] [pymachine] moving code outside the sandbox into scikits ?

Mon Jun 25 05:19:48 EDT 2007

Jarrod Millman wrote:
> On 6/23/07, Matthieu Brucher <matthieu.brucher at gmail.com> wrote:
>>> We have been calling the project pymachine. But we would rather use a
>>> more descriptive name for the scikit package (and one that doesn't
>>> contain 'py').  Here are some ideas:
>>> - scikits.learning
>>> - scikits.learn
>>> - scikits.machinelearning
>>> - scikits.mlearn
>>> What do you think of these names?  Does anyone have a better name in mind?
>> machinelearning is my favourite, but I would think of a more global
>> hierarchy inside this namespace. svm and em do not have the same final goal,
>> so perhaps adding classification or estimation as sub-sub-namespace would be
>> worth considering.
>>
>> Matthieu
>
> Hey Matthieu,
>
> I agree with you, scikits.machinelearning is my favorite as well.  I
> understand Dmitrey's concern about it being such a long name, but I
> think that it is much more important for the package name to be
> obvious as to what it does.  Hopefully, having a well-named package
> will make it more obvious what the very terse names like svm or em
> mean given that they are found inside a machinelearning package.  I
> also want to make sure that a good precedent is started regarding the
> naming of scikits packages.
>
> I also like your suggestion to use something like "import
> scikits.machinelearning as ml".  It might be good to even have a
> recommendation like this in the package docstring.  That way we could
> encourage the adoption of ml (for scikits.machinelearning) as a
> consistent convention.
>
> I also agree that we may need to create a nested hierarchy.  But I
> would prefer to keep a flat namespace at least for the next few weeks.
>  That way we can make the hierarchy after seeing what code ends up in
> the package.  In addition to the code David is working on, there are a
> few other developers who have tentatively offered to contribute some
> working code that they have written.  But we should definitely return
> to this point before making an official release.
I agree on avoiding a flat namespace, but I disagree with doing it the
way Matthieu suggested: where does classification start, where does
clustering end, and where does pdf estimation go in between? You can
use EM or SVM to do similar things (discriminative classification,
clustering). For example, I have examples almost ready for clustering,
pdf estimation and discriminative learning: the actual implementation
is the same, EM. Only the usage is different.

I prefer to keep the "implementation concept" and the "usage concept"
separate at the namespace level. That is, I agree that having a
classification or clustering namespace is useful, but not as a way to
separate svm from em. I may be missing your argument, though?
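
To make the distinction concrete, here is a rough sketch. It is not the
actual pymachine code: fit_gmm, cluster and pdf are made-up names, and
SimpleNamespace merely stands in for real sub-packages. One EM
implementation (a plain 1-D Gaussian mixture) sits in an "em" namespace,
and the "clustering" and "estimation" namespaces are thin usage layers
that both call it.

```python
import math
from types import SimpleNamespace


def _gauss(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


# --- implementation concept: a single EM routine (hypothetical helper) ---
def fit_gmm(data, k=2, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    srt = sorted(data)
    n = len(srt)
    # Deterministic initialisation: spread the means over the sorted data.
    means = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    spread = sum((x - centre) ** 2 for x in data) / n
    variances = [max(spread, 1e-6)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * _gauss(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2
                        for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a zero variance
            weights[j] = nj / n
    return SimpleNamespace(weights=weights, means=means, variances=variances)


# --- usage concept 1: clustering (hard assignment to a component) ---
def cluster(data, k=2):
    m = fit_gmm(data, k)
    return [max(range(k),
                key=lambda j: m.weights[j] * _gauss(x, m.means[j], m.variances[j]))
            for x in data]


# --- usage concept 2: pdf estimation (the mixture density itself) ---
def pdf(data, k=2):
    m = fit_gmm(data, k)
    return lambda x: sum(w * _gauss(x, mu, v)
                         for w, mu, v in zip(m.weights, m.means, m.variances))


# Stand-ins for sub-packages: one implementation namespace, two usage ones.
em = SimpleNamespace(fit_gmm=fit_gmm)
clustering = SimpleNamespace(cluster=cluster)
estimation = SimpleNamespace(pdf=pdf)
```

With this layout, clustering.cluster(data) and estimation.pdf(data) run
exactly the same EM fit; only what they hand back to the user differs.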

David