[SciPy-Dev] SciPy Goal

Wed Jan 4 22:30:30 EST 2012

On Wed, Jan 4, 2012 at 9:33 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Wed, Jan 4, 2012 at 6:43 PM, Travis Oliphant <travis at continuum.io> wrote:
>>
>> Thanks for the feedback. My point was to generate discussion and
>> start the ball rolling on exactly the kind of conversation that has started.
>>
>>
>> Exactly as Ralf mentioned, the point is to get development on sub-packages
>> --- something that the scikits effort and other individual efforts have done
>> very, very well. In fact, it has worked so well that it taught me a great
>> deal about what is important in open source. My perhaps irrational dislike
>> for the *name* "scikits" should not be interpreted as anything but a naming
>> taste preference (and I am not known for my ability to choose names well
>> anyway). I very much like and admire the community around scikits. I
>> just would have preferred something easier to type (even just sci_* would
>> have been better in my mind as high-level packages: sci_learn, sci_image,
>> sci_statsmodels, etc.). I didn't feel like I was able to fully
>> participate in that discussion when it happened, so you can take my comments
>> now as simply historical and something I've been wanting to get off my chest
>> for a while.
>>
>> Without better packaging and dependency management systems (especially on
>> Windows and Mac), splitting out code doesn't help those who are not
>> distribution dependent (who themselves won't be impacted much). There are
>> scenarios under which it could make sense to split out SciPy, but I agree
>> that right now it doesn't make sense to completely split everything.
>> However, I do think it makes sense to clean things up and move some things
>> out in preparation for SciPy 1.0.
>>
>> One thing that would be nice to know is the state of documentation and
>> examples for the different packages. Where is work there most needed?
>>
>>
>> Looking at Travis' list of non-core packages I'd say that sparse certainly
>> belongs in the core and integrate probably too. Looking at what's left:
>> - constants : very small and low cost to keep in core. Not much to improve
>> there.
>>
>>
>> Agreed.
>>
>> - cluster : low maintenance cost, small. not sure about usage, quality.
>>
>>
>> I think cluster overlaps with scikits-learn quite a bit. It basically
>> contains K-means vector quantization code with functionality that I
>> suspect exists in scikits-learn. I would recommend deprecation and
>> removal while pointing people to scikits-learn for equivalent functionality
>> (or moving it to scikits-learn).
>>
>
> I disagree. Why should I go to scikits-learn for basic functionality like
> that? It is hardly specific to machine learning. Same with various matrix
> factorizations.
>>
>> - ndimage : difficult one. hard to understand code, may not see much
>> development either way.
>>
>>
>> This overlaps with scikits-image but has quite a bit of useful
>> functionality on its own. The package is fairly mature and just needs
>> maintenance.
>>
>
> Again, pretty basic stuff in there, but I could be persuaded to go to
> scikits-image since it *is* image specific and might be better maintained.
>>
>> - spatial : kdtree is widely used, of good quality. low maintenance cost.
>>
>>
>
> Indexing of all sorts tends to be fundamental. But not everyone knows they
> want it ;)
>
>> Good to hear maintenance cost is low.
>>
>> - odr : quite small, low cost to keep in core. pretty much done as far as
>> I can tell.
>>
>>
>> Agreed.
>>
>> - maxentropy : is deprecated, will disappear.
>>
>>
>> Great.
>>
>> - signal : not in great shape, could be viable independent package. On the
>> other hand, if scikits-signal takes off and those developers take care to
>> improve and build on scipy.signal when possible, that's OK too.
>>
>>
>> What are the needs of this package? What needs to be fixed / improved?
>> It is a broad field and I could see fixing scipy.signal with a few simple
>> algorithms (the filter design, for example), and then pushing a separate
>> package to do more advanced signal processing algorithms. This sounds
>> fine to me. It looks like I can turn my attention to scipy.signal then, as
>> it was one of the areas I was most interested in originally.
>>
>
> Filter design could use improvement. I also have a remez algorithm that
> works for complex filter design that belongs somewhere.

ltisys was pretty neglected, but Warren, I think, made quite big improvements.
Several times there was discussion about whether MIMO works or should
work; similarly, there was a discrete-time proposal, but I didn't keep up
with what happened to it.

In statsmodels we are very happy with signal.lfilter, but I wish
there were a multi-input version of it.
Other basic things, such as periodograms, burg and levinson_durbin,
are scipy algorithms, I think, but having them in a scikits.signal
would also be good.
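
To make one of the basic algorithms named above concrete, here is a
minimal sketch of the Levinson-Durbin recursion in plain Python. The
function name, signature and sign convention are assumptions chosen for
illustration; this is not the scipy or statsmodels implementation. It
takes autocovariances r[0..order] and solves the Yule-Walker equations
for the coefficients of x[t] = phi[0]*x[t-1] + ... + e[t].

```python
def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an AR(order) model.

    r      : sequence of autocovariances r[0], r[1], ..., r[order]
    returns: (phi, sigma2), where phi[k-1] multiplies x[t-k] and
             sigma2 is the innovation (prediction error) variance.
    Illustrative helper only; not an existing library API.
    """
    if len(r) < order + 1:
        raise ValueError("need at least order + 1 autocovariances")
    phi = []
    sigma2 = float(r[0])
    for k in range(1, order + 1):
        # Reflection (partial autocorrelation) coefficient at lag k.
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / sigma2
        # Update the lower-order coefficients, then append the new one.
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        # Each step shrinks the prediction error variance.
        sigma2 *= 1.0 - refl * refl
    return phi, sigma2


if __name__ == "__main__":
    # Autocorrelations of an AR(1) process with coefficient 0.5.
    print(levinson_durbin([1.0, 0.5, 0.25], order=2))
```

The recursion costs O(order**2) operations, compared with O(order**3)
for solving the Toeplitz system directly.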

Josef

>>
>> - weave : no point spending any effort on it. keep for backwards
>> compatibility only, direct people to Cython instead.
>>
>>
>> Agreed. Any way we can deprecate this for SciPy 1.0?
>>
>>
>> Overall, I don't see many viable independent packages there. So here's an
>> alternative to spending a lot of effort on reorganizing the package
>> structure:
>> 1. Formulate a coherent vision of what in principle belongs in scipy
>> (current modules + what's missing).
>>
>>
>> O.K., so SciPy should contain "basic" modules that are needed for many
>> different kinds of analysis and can serve as a dependency for other, more
>> advanced packages. This is somewhat vague, of course.
>>
>> What do others think is missing? Off the top of my head: basic wavelets
>> (dwt primarily) and more complete interpolation strategies (I'd like to
>> finish the basic interpolation approaches I started a while ago).
>> Originally, I used GAMS as an "overview" of the kinds of things needed in
>> SciPy. Are there other relevant taxonomies these days?
>>
>> http://gams.nist.gov/cgi-bin/serve.cgi
>>
>>
>> 2. Focus on making it easier to contribute to scipy. There are many ways
>> to do this; having more accessible developer docs, having a list of "easy
>> fixes", adding info to tickets on how to get started on the reported issues,
>> etc. We can learn a lot from Sympy and IPython here.
>>
>>
>> Definitely!
>>
>> 3. Recognize that quality of code and especially documentation is
>> important, and fill the main gaps.
>>
>>
>> Is there a write-up of recognized gaps here that we can start with?
>>
>> 4. Deprecate sub-modules that don't belong in scipy (anymore), and remove
>> them for scipy 1.0. I think that this applies only to maxentropy and weave.
>>
>>
>> I think it also applies to cluster as described above.
>>
>> 5. Find a clear (group of) maintainer(s) for each sub-module. For people
>> familiar with one module, responding to tickets and pull requests for that
>> module would not cost so much time.
>>
>>
>> Is there a list where this is kept?
>>
>>
>> In my opinion, spending effort on improving code/documentation quality and
>> attracting new developers (those go hand in hand) instead of reorganizing
>> will have both more impact and be more beneficial for our users.
>>
>>
>
> Chuck