[SciPy-Dev] SciPy Goal

Thu Jan 5 09:10:20 EST 2012

On Thu, Jan 5, 2012 at 1:47 AM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
>
>
> On Thu, Jan 5, 2012 at 7:26 AM, Travis Oliphant <travis at continuum.io> wrote:
>>
>>
>> On Jan 5, 2012, at 12:02 AM, Warren Weckesser wrote:
>>
>>
>>
>> On Wed, Jan 4, 2012 at 9:29 PM, Travis Oliphant <travis at continuum.io>
>> wrote:
>>>
>>>
>>> On Jan 4, 2012, at 8:22 PM, Fernando Perez wrote:
>>>
>>> > Hi all,
>>> >
>>> > On Wed, Jan 4, 2012 at 5:43 PM, Travis Oliphant <travis at continuum.io>
>>> > wrote:
>>> >> What do others think is missing?  Off the top of my head:   basic
>>> >> wavelets
>>> >> (dwt primarily) and more complete interpolation strategies (I'd like
>>> >> to
>>> >> finish the basic interpolation approaches I started a while ago).
>>> >> Originally, I used GAMS as an "overview" of the kinds of things needed
>>> >> in
>>> >> SciPy.   Are there other relevant taxonomies these days?
>>> >
>>> > Well, probably not something that fits these ideas for scipy
>>> > one-to-one, but the Berkeley 'thirteen dwarves' list from the 'View
>>> > from Berkeley' paper on parallel computing is not a bad starting
>>> > point; summarized here they are:
>>> >
>>> >    Dense Linear Algebra
>>> >    Sparse Linear Algebra [1]
>>> >    Spectral Methods
>>> >    N-Body Methods
>>> >    Structured Grids
>>> >    Unstructured Grids
>>> >    MapReduce
>>> >    Combinational Logic
>>> >    Graph Traversal
>>> >    Dynamic Programming
>>> >    Backtrack and Branch-and-Bound
>>> >    Graphical Models
>>> >    Finite State Machines
>>>
>>>
>>> This is a nice list, thanks!
>>>
>>> >
>>> > Descriptions of each can be found here:
>>> > http://view.eecs.berkeley.edu/wiki/Dwarf_Mine and the full study is
>>> > here:
>>> >
>>> > http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html
>>> >
>>> > That list is biased towards the classes of codes used in
>>> > supercomputing environments, and some of the topics are probably
>>> > beyond the scope of scipy (say structured/unstructured grids, at least
>>> > for now).
>>> >
>>> > But it can be a decent guiding outline to reason about what are the
>>> > 'big areas' of scientific computing, so that scipy at least provides
>>> > building blocks that would be useful in these directions.
>>> >
>>>
>>> Thanks for the links.
>>>
>>>
>>> > One area that hasn't been directly mentioned too much is the situation
>>> > with statistical tools.  On the one hand, we have the phenomenal work
>>> > of pandas, statsmodels and sklearn, which together are helping turn
>>> > python into a great tool for statistical data analysis (understood in
>>> > a broad sense).  But it would probably be valuable to have enough of a
>>> > statistical base directly in numpy/scipy so that the 'out of the box'
>>> > experience for statistical work is improved.  I know we have
>>> > scipy.stats, but it seems like it needs some love.
>>>
>>> It seems like scipy stats has received quite a bit of attention.   There
>>> is always more to do, of course, but I'm not sure what specifically you
>>> think is missing or needs work.
>>
>>
>>
>> Test coverage, for example.  I recently fixed several wildly incorrect
>> skewness and kurtosis formulas for some distributions, and I now have very
>> little confidence that any of the other distributions are correct.  Of
>> course, most of them probably *are* correct, but without tests, all are in
>> doubt.
>>
>>
>> There is such a thing as *over-reliance* on tests as well.
>
>
> True in principle, but we're so far from that point that you don't have to
> worry about that for the foreseeable future.
>
>>
>> Tests help but it is not a black or white kind of thing as seems to come
>> across in many of the messages on this list about what part of scipy is in
>> "good shape" or "easy to maintain" or "has love."    Just because tests
>> exist doesn't mean that you can trust the code --- you also then have to
>> trust the tests.   Ultimately, trust is built from successful *usage*.
>> Tests are only a pseudo-substitute for that usage.  It so happens that usage
>> that comes along with the code itself makes it easier to iterate on changes
>> and catch some of the errors that can happen on re-factoring.
>>
>> In summary, tests are good!  But, they also add overhead and themselves
>> must be maintained, and I don't think it helps to disparage working code.
>> I've seen a lot of terrible code that has *great* tests and seen projects
>> fail because developers focus too much on the tests and not enough on what
>> the code is actually doing.   Great tests can catch many things but they
>> cannot make up for not paying attention when writing the code.
>
>
> Certainly, but besides giving more confidence that code is correct, a major
> advantage is that it is a massive help when working on existing code -
> especially for new developers. Now we have to be extremely careful in
> reviewing patches to check nothing gets broken (including backwards
> compatibility). Tests in that respect are not a maintenance burden, but a
> time saver.

Overall I also think that adding sufficient tests at the time of
adding the code is a big time saver in the long run. It is a lot more
difficult to figure out later why something is wrong and how to fix
it.

Without sufficient tests it's also difficult to tell whether code that
looks good works as advertised (my last mistake was a misplaced
bracket that only showed up in cases that were not covered by the
tests).
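
To make Warren's skewness/kurtosis point (quoted above) concrete, here
is a minimal sketch of an independent check: compute the moments by
numerical integration of the pdf and compare them with what the
distribution reports. The helper name numerical_skew_kurtosis, the
choice of distributions and the tolerances are assumptions made for
illustration, not what is in the scipy.stats test suite.

```python
import numpy as np
from scipy import integrate, stats


def numerical_skew_kurtosis(dist):
    """Skewness and excess kurtosis of a frozen continuous
    distribution, computed by quadrature over its support.

    Hypothetical helper, written only for this illustration.
    """
    lo, hi = dist.support()

    def moment(order, center=0.0):
        # Integrate (x - center)**order * pdf(x) over the support.
        value, _ = integrate.quad(
            lambda x: (x - center) ** order * dist.pdf(x), lo, hi
        )
        return value

    mean = moment(1)
    var = moment(2, mean)
    skew = moment(3, mean) / var ** 1.5
    kurt = moment(4, mean) / var ** 2 - 3.0  # excess kurtosis
    return skew, kurt


# Compare the closed-form values against the numerical reference.
for dist in [stats.norm(), stats.expon(), stats.gamma(3.0),
             stats.beta(2.0, 5.0)]:
    s_ref, k_ref = numerical_skew_kurtosis(dist)
    s, k = dist.stats(moments="sk")
    np.testing.assert_allclose(
        [float(s), float(k)], [s_ref, k_ref], rtol=1e-6, atol=1e-6
    )
```

A check like this does not depend on the formula being tested, so it
would likely flag a wildly wrong closed-form expression. It is slow and
can be numerically fragile for heavy-tailed distributions, so it is a
sanity check rather than a replacement for hand-verified values.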

And of course as Ralf mentioned, refactoring without test coverage is
dangerous business even if the change looks "innocent."
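
For the gaussian_kde case Ralf describes in the quoted text below, a
regression test of roughly the following shape is the kind of thing
that pins down subclass behaviour before a refactoring. This is only a
sketch: FixedFactorKDE is a hypothetical subclass invented for the
example, and the bw_method keyword is the one found in current SciPy
releases, which is assumed here and may not match the patch under
discussion.

```python
import numpy as np
from scipy import stats


class FixedFactorKDE(stats.gaussian_kde):
    """Hypothetical user subclass that fixes the bandwidth factor
    by overriding covariance_factor."""

    def covariance_factor(self):
        return 0.5


rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-3, 3, 50)

# The subclass sets the factor through the overridden method ...
kde_sub = FixedFactorKDE(data)
# ... and the base class is given the same factor via the keyword.
kde_ref = stats.gaussian_kde(data, bw_method=0.5)

# Both routes should produce the same factor and the same density.
assert kde_sub.factor == 0.5
np.testing.assert_allclose(kde_sub(grid), kde_ref(grid))
```

If a later change to how the bandwidth is set stopped honouring the
overridden covariance_factor, a test like this should fail right away
instead of the problem surfacing in review.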

Josef

>
> As an example, last week I wanted to add a way to easily adjust the
> bandwidth of gaussian_kde. This was maybe 10 lines of code, didn't take long
> at all. Then I spent some time adding tests and improving the docs, and
> thought I was done. After sending the PR, I spent at least an equal amount
> of time reworking everything a couple of times to not break any of the
> existing subclasses that could be found. In addition it took a lot of
> Josef's time to review it all and convince me of the error of my ways. A few
> tests could have saved us a lot of time.
>
> Ralf
>