[SciPy-Dev] SciPy Goal

Charles R Harris charlesr.harris at gmail.com
Thu Jan 5 09:45:12 EST 2012


On Thu, Jan 5, 2012 at 7:10 AM, <josef.pktd at gmail.com> wrote:

> On Thu, Jan 5, 2012 at 1:47 AM, Ralf Gommers
> <ralf.gommers at googlemail.com> wrote:
> >
> >
> > On Thu, Jan 5, 2012 at 7:26 AM, Travis Oliphant <travis at continuum.io>
> > wrote:
> >>
> >>
> >> On Jan 5, 2012, at 12:02 AM, Warren Weckesser wrote:
> >>
> >>
> >>
> >> On Wed, Jan 4, 2012 at 9:29 PM, Travis Oliphant <travis at continuum.io>
> >> wrote:
> >>>
> >>>
> >>> On Jan 4, 2012, at 8:22 PM, Fernando Perez wrote:
> >>>
> >>> > Hi all,
> >>> >
> >>> > On Wed, Jan 4, 2012 at 5:43 PM, Travis Oliphant <travis at continuum.io>
> >>> > wrote:
> >>> >> What do others think is missing?  Off the top of my head: basic
> >>> >> wavelets (dwt primarily) and more complete interpolation
> >>> >> strategies (I'd like to finish the basic interpolation approaches
> >>> >> I started a while ago).  Originally, I used GAMS as an "overview"
> >>> >> of the kinds of things needed in SciPy.  Are there other relevant
> >>> >> taxonomies these days?
> >>> >
> >>> > Well, probably not something that fits these ideas for scipy
> >>> > one-to-one, but the Berkeley 'thirteen dwarves' list from the 'View
> >>> > from Berkeley' paper on parallel computing is not a bad starting
> >>> > point; summarized here they are:
> >>> >
> >>> >    Dense Linear Algebra
> >>> >    Sparse Linear Algebra [1]
> >>> >    Spectral Methods
> >>> >    N-Body Methods
> >>> >    Structured Grids
> >>> >    Unstructured Grids
> >>> >    MapReduce
> >>> >    Combinational Logic
> >>> >    Graph Traversal
> >>> >    Dynamic Programming
> >>> >    Backtrack and Branch-and-Bound
> >>> >    Graphical Models
> >>> >    Finite State Machines
> >>>
> >>>
> >>> This is a nice list, thanks!
> >>>
> >>> >
> >>> > Descriptions of each can be found here:
> >>> > http://view.eecs.berkeley.edu/wiki/Dwarf_Mine and the full study is
> >>> > here:
> >>> >
> >>> > http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html
> >>> >
> >>> > That list is biased towards the classes of codes used in
> >>> > supercomputing environments, and some of the topics are probably
> >>> > beyond the scope of scipy (say structured/unstructured grids, at
> >>> > least for now).
> >>> >
> >>> > But it can be a decent guiding outline to reason about what are the
> >>> > 'big areas' of scientific computing, so that scipy at least provides
> >>> > building blocks that would be useful in these directions.
> >>> >
> >>>
> >>> Thanks for the links.
> >>>
> >>>
> >>> > One area that hasn't been directly mentioned too much is the
> >>> > situation with statistical tools.  On the one hand, we have the
> >>> > phenomenal work of pandas, statsmodels and sklearn, which together
> >>> > are helping turn python into a great tool for statistical data
> >>> > analysis (understood in a broad sense).  But it would probably be
> >>> > valuable to have enough of a statistical base directly in
> >>> > numpy/scipy so that the 'out of the box' experience for statistical
> >>> > work is improved.  I know we have scipy.stats, but it seems like it
> >>> > needs some love.
> >>>
> >>> It seems like scipy stats has received quite a bit of attention.
> >>> There is always more to do, of course, but I'm not sure what
> >>> specifically you think is missing or needs work.
> >>
> >>
> >>
> >> Test coverage, for example.  I recently fixed several wildly incorrect
> >> skewness and kurtosis formulas for some distributions, and I now have
> >> very little confidence that any of the other distributions are correct.
> >> Of course, most of them probably *are* correct, but without tests, all
> >> are in doubt.
> >>
> >>
> >> There is such a thing as *over-reliance* on tests as well.
> >
> >
> > True in principle, but we're so far from that point that you don't have
> > to worry about that for the foreseeable future.
> >
> >>
> >> Tests help but it is not a black or white kind of thing as seems to come
> >> across in many of the messages on this list about what part of scipy is
> >> in "good shape" or "easy to maintain" or "has love."  Just because tests
> >> exist doesn't mean that you can trust the code --- you also then have to
> >> trust the tests.  Ultimately, trust is built from successful *usage*.
> >> Tests are only a pseudo-substitute for that usage.  It so happens that
> >> usage that comes along with the code itself makes it easier to iterate
> >> on changes and catch some of the errors that can happen on re-factoring.
> >>
> >> In summary, tests are good!  But, they also add overhead and themselves
> >> must be maintained, and I don't think it helps to disparage working
> >> code.  I've seen a lot of terrible code that has *great* tests and seen
> >> projects fail because developers focus too much on the tests and not
> >> enough on what the code is actually doing.  Great tests can catch many
> >> things but they cannot make up for not paying attention when writing
> >> the code.
> >
> >
> > Certainly, but besides giving more confidence that code is correct, a
> > major advantage is that it is a massive help when working on existing
> > code - especially for new developers. Now we have to be extremely careful
> > in reviewing patches to check nothing gets broken (including backwards
> > compatibility). Tests in that respect are not a maintenance burden, but a
> > time saver.
>
> Overall I also think that adding sufficient tests at the time of
> adding the code is a big time saver in the long run. It is a lot more
> difficult to figure out later why something is wrong and how to fix
> it.
>
> Without sufficient tests it's also difficult to tell whether code that
> looks good works as advertised (my last mistake was a misplaced
> bracket that only showed up in cases that were not covered by the
> tests).
>
> And of course as Ralf mentioned, refactoring without test coverage is
> dangerous business even if the change looks "innocent."
>
>
And sufficient means test everything. I always turn up bugs when I increase
test coverage. It can be embarrassing.

Chuck
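
For illustration only, here is a minimal sketch of the kind of check that
would catch a wrong skewness or kurtosis formula like the ones mentioned
earlier in the thread: compare the closed-form values against moments
integrated numerically from the pdf. It is not taken from the scipy test
suite. Every function name below is hypothetical, it uses only the
standard library, and the gamma distribution (shape a, scale 1) with a
hand-written pdf is just an assumed example.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] using n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def numeric_skew_kurtosis(pdf, lo, hi):
    """Skewness and excess kurtosis from numerically integrated moments."""
    mean = simpson(lambda x: x * pdf(x), lo, hi)

    def central(k):
        # k-th central moment of the distribution described by pdf
        return simpson(lambda x: (x - mean) ** k * pdf(x), lo, hi)

    var = central(2)
    skew = central(3) / var ** 1.5
    kurt = central(4) / var ** 2 - 3.0  # excess kurtosis (normal -> 0)
    return skew, kurt


def gamma_pdf(x, a):
    """pdf of the gamma distribution with shape a and scale 1 (assumed example)."""
    return x ** (a - 1) * math.exp(-x) / math.gamma(a)


def closed_form_gamma(a):
    """Textbook closed forms: skewness 2/sqrt(a), excess kurtosis 6/a."""
    return 2.0 / math.sqrt(a), 6.0 / a


def matches_closed_form(a, tol=1e-6):
    """True if the closed forms agree with the numerically integrated moments."""
    # The upper limit 80 is an assumption: the gamma tail beyond it is negligible
    # for the small shape parameters used here.
    skew_num, kurt_num = numeric_skew_kurtosis(lambda x: gamma_pdf(x, a), 0.0, 80.0)
    skew_cf, kurt_cf = closed_form_gamma(a)
    return abs(skew_num - skew_cf) < tol and abs(kurt_num - kurt_cf) < tol


if __name__ == "__main__":
    for a in (2.0, 3.0, 5.0):
        print(a, matches_closed_form(a))
```

In a real test one would presumably compare against what the library itself
reports, for example scipy.stats.gamma.stats(a, moments='sk'), rather than
a hand-written closed form, and loop over every distribution. The numerical
side is deliberately independent of the formula being checked, so a
misplaced bracket in one does not silently cancel in the other.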