[SciPy-Dev] cKDTree

Ralf Gommers ralf.gommers at gmail.com
Sat Sep 10 06:34:19 EDT 2016


On Fri, Sep 9, 2016 at 9:12 AM, Sylvain Corlay <sylvain.corlay at gmail.com>
wrote:

> Hi Pauli, Ralf,
>
> On Thu, Sep 8, 2016 at 8:26 PM, Pauli Virtanen <pav at iki.fi> wrote:
>>
>> > Just imagine: I have a new uniformly filling sequence, but no proof that
>> > it is pseudo-random, and I don't even know the discrepancy of the
>> > sequence,
>> > but a bunch of examples for which "it works"...  Well, I doubt that
>> > anyone would want to use it for Monte-Carlo simulation / cryptography
>> > etc...
>>
>> Are you presenting this as a fair analogy to DE and the literature
>> about it?
>>
>> > In any case, I find this question about the need for a scientific
>> > justification worth answering in general - especially in the
>> > context of the discussions on scope and the 1.0 release.
>>
>> Yes, justification that an approach works and that it is regarded as
>> useful is required. This is not really the place to do completely
>> original research.
>>
>> If I understand you correctly, you are saying that you would not
>> generally recommend DE as a global optimization method, because there are
>> superior choices? Or, are you saying it's essentially crock?
>
>
> That is not really it. I was mostly pointing to DE as an example which is
> sort of in a gray area, in order to raise the questions of scope, criteria
> for inclusion in scipy, etc.
>
> When it was included in scipy, it caught my attention since I had worked
> on flavors of DE in the past (it is used as an alternative to Lloyd's
> algorithm for optimal quantization). There is indeed some literature about
> the applications in this area. So I did find it useful, and found that it
> does work well for the problems I used it for. (However, the sort of
> generic implementation scipy offers today would not have been a good fit
> in that case.)
>
> My understanding of scipy's scope is that it is a collection of routines
> that are robust reference implementations
>

"reference implementation" has a bit of a negative implication for me. Like
we have reference BLAS/LAPACK, which are references for correctness but you
should really be using something better for pretty much any application.
That is definitely not the intention for anything in Scipy, nor is it the
case for most modules.

If we have an algorithm or data structure, we want to strike a good balance
between features, maintainability, usability and performance.


> of well-established numerical methods that are of general interest
> regardless of the area of application. Linear algebra, interpolation,
> quadrature, random number generation, convex optimization, specialized
> optimizers for dense LP and QP, etc. In each of these areas, if you
> need something more specialized, you should probably use a specialized
> library or implement something ad hoc for your use case.
>

For linear algebra there really isn't much more specialized functionality
that we'd want to keep out of scope. Statistical distributions, special
functions and distance metrics are other examples where we go for
comprehensive coverage. It can also depend on what other packages are out
there; for example, we don't accept many hypothesis tests anymore, because
statsmodels is the better home for comprehensive coverage of those.


>
> Evolutionary optimization algorithms don't seem to fall into this category
> for the reasons that we discussed earlier. They are mostly a set of
> heuristics. They are cool, inspired by nature, etc. (however, a number of
> citations is probably not a substitute for a mathematical proof...)
>

We're just going to have to disagree about this one. You seem to attach an
unusually high value to a "mathematical proof" (not that that's a
black-or-white thing either), and a low value to things like realistic
benchmarks and solving users' problems. It's not about the number of
citations either. The AMPGO paper
(http://infinity77.net/global_optimization/ampgo.html,
https://github.com/andyfaff/ampgo) has only 14 citations as far as I can
see, but we'd probably add it if someone submits a good-quality PR.
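
To make the "realistic benchmarks" point a bit more concrete, here is
roughly the kind of smoke test we lean on (a minimal sketch written from
memory, not one of the actual benchmark scripts): run
differential_evolution on a standard test function with a known minimum
and check that it gets there.

from scipy.optimize import differential_evolution, rosen

# 5-dimensional Rosenbrock function; the global minimum is at
# (1, 1, 1, 1, 1) with value 0.  One (low, high) bound per dimension.
bounds = [(-5.0, 5.0)] * 5

# fixed seed so the stochastic search is reproducible
result = differential_evolution(rosen, bounds, seed=1, tol=1e-8)

print(result.x)    # should end up close to [1, 1, 1, 1, 1]
print(result.fun)  # should end up close to 0

(The real benchmarks run a whole suite of test functions, but the idea is
the same.)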


> The other methods that I listed for stochastic optimization would have
> been more natural candidates to fall into the "category" that I roughly
> described above, in that they are extremely well established and backed by
> theory. I imagine that the inclusion of DE into Scipy could have been
> questioned at the time,
>

I still don't see any reason why that inclusion would have had to be
questioned.


>
> but that now that it is in there, it should probably not be removed
> without a good alternative. Finally, I am still curious about what can be
> considered a bug or a feature in the case of a method like this.
>

I answered the feature part already, but I guess you didn't like the answer
much :)


> On the subject of the faster flavor of KdTree that I was proposing, I was
> only gauging interest. The long discussion of DE in this specific thread
> is mostly coincidental. My main goal was to use it as an example for the
> question of scope
>

I hope this discussion helped a bit. But if you want a fixed definition of
scope and a recipe you can apply to know, without discussion, whether a new
feature will be in scope for Scipy - that's not possible, I'm afraid.
There's always some judgment involved.
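
On the KdTree point: gauging interest sounds fine to me. For orientation,
any faster flavor would presumably need to stay compatible with the
existing scipy.spatial.cKDTree interface, which in its simplest form looks
roughly like this (a minimal sketch, nothing benchmark-worthy):

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.RandomState(0)
points = rng.rand(10000, 3)        # 10k points in 3-D

tree = cKDTree(points)             # build once, query many times

# distances and indices of the 5 nearest neighbours of each query point
queries = rng.rand(100, 3)
dist, idx = tree.query(queries, k=5)

# indices of all points within radius 0.05 of each query point
neighbours = tree.query_ball_point(queries, r=0.05)

Anything that speeds up construction or queries while keeping that API
would be straightforward to evaluate.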


> - also to ask about the "big split" idea. If there were to be a split, a
> potential scipy-incubator organization with proposals for inclusion as
> scipy subprojects would make sense then...
>

There are no plans for splitting up SciPy. It has been considered several
times, but it's really not the best way to spend our time. It's possible we
would get more development on some modules, but that's not guaranteed, and
the maintenance and release overhead would go up significantly.

Cheers,
Ralf