[SciPy-Dev] SciPy-Dev Digest, Vol 155, Issue 12

Thu Sep 8 08:24:56 EDT 2016

Dear All,

Following the discussion regarding DE and simulated annealing, I have opened a PR proposing a new implementation of simulated annealing (a generalized variant).
https://github.com/scipy/scipy/pull/6197 (already kindly commented by Ralf and Andrew).
The homemade (parallelized) benchmarks show very good results, especially in terms of success rate.
Since then, Andrew has fixed the benchmark code for Python 3. What would you recommend I do so that this enhancement proposal can be considered? My idea is to benchmark it with the newly fixed benchmark suite (using the PR branch, which contains the new optimizer). I presented a poster at EuroSciPy in Erlangen describing the performance of this approach and had a nice talk with Olivier Grisel about it.
Any thoughts to move this forward?
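For readers less familiar with simulated annealing, the classic (non-generalized) algorithm fits in a few lines of pure Python. This is not the generalized simulated annealing from the PR: the Gaussian proposal, the geometric cooling schedule and the parameter defaults are illustrative assumptions, and `simulated_annealing` is a hypothetical name, not a SciPy function.

```python
import math
import random


def simulated_annealing(f, x0, step=0.5, t0=10.0, cooling=0.995,
                        n_iter=5000, seed=0):
    """Classic simulated annealing (Metropolis acceptance, geometric cooling).

    Returns the best point visited and its objective value.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    best_x, best_f = list(x), fx
    t = t0
    for _ in range(n_iter):
        # Proposal: Gaussian perturbation of every coordinate.
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = f(cand)
        # Metropolis rule: accept downhill moves always, uphill moves
        # with probability exp(-(fc - fx) / t).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f


def rastrigin(x):
    # Standard multimodal test function; global minimum 0 at the origin.
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


x_best, f_best = simulated_annealing(rastrigin, [3.0, -2.5])
```

Swapping in a different visiting distribution or temperature schedule only touches the two commented lines inside the loop, which is where generalized variants differ from this textbook version.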

Thanks a lot in advance.

Sylvain.

-----Original Message-----
From: SciPy-Dev [mailto:scipy-dev-bounces at scipy.org] On Behalf Of scipy-dev-request at scipy.org
Sent: Thursday, 8 September 2016 10:19
To: scipy-dev at scipy.org
Subject: SciPy-Dev Digest, Vol 155, Issue 12

Send SciPy-Dev mailing list submissions to
	scipy-dev at scipy.org

To subscribe or unsubscribe via the World Wide Web, visit
	https://mail.scipy.org/mailman/listinfo/scipy-dev
or, via email, send a message with subject or body 'help' to
	scipy-dev-request at scipy.org

You can reach the person managing the list at
	scipy-dev-owner at scipy.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of SciPy-Dev digest..."

Today's Topics:

   1. Re: misc.bytescale bug (Greg Dooper)
   2. Re: cKDTree (Ralf Gommers)
   3. Re: cKDTree (Ralf Gommers)

----------------------------------------------------------------------

Message: 1
Date: Wed, 7 Sep 2016 23:37:46 -0600
From: Greg Dooper <greg.dooper at gmail.com>
To: SciPy Developers List <scipy-dev at scipy.org>
Subject: Re: [SciPy-Dev] misc.bytescale bug
Message-ID:
	<CAKYMPC8XqK1kACAj9fPDNEmMhypVj4aMGcpLyRau7uLK1YVsqQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I have a pull request up for this bug (https://github.com/scipy/scipy/pull/6554). I should probably squash some of the commits together, but I'd rather wait for feedback first.
-Greg

On Tue, Sep 6, 2016 at 11:10 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote:

>
>
> On Wed, Sep 7, 2016 at 12:56 AM, Greg Dooper <greg.dooper at gmail.com>
> wrote:
>
>> Hello,
>> It seems like there is a bug in misc.bytescale, or else I am
>> misunderstanding the intended behavior. What I think is a bug shows
>> up when you try to bytescale an array with both the cmin/cmax
>> parameters and the low/high parameters.
>>
>> I think it's a pretty simple fix that I would like to tackle.
>>
>> What is the next step for me? Creating an issue on GitHub with some
>> example code and data?
>>
>
> Yes, that would be great.  Be sure to include an example that 
> demonstrates the problem.
>
> Warren
>
>
>
>> Thanks for the input.
>>
>> -Greg Dooper
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> https://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>>
>
>
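To make Warren's request concrete, here is a minimal pure-Python sketch of the behaviour one might expect when cmin/cmax and low/high are passed together: values are clipped to [cmin, cmax] and then mapped linearly onto [low, high]. `bytescale_reference` is a hypothetical helper, the semantics are an assumption about the intended behaviour, and it is not the code in the pull request; comparing its output with misc.bytescale on the same input is one way to build the requested example.

```python
def bytescale_reference(data, cmin=None, cmax=None, low=0, high=255):
    """Hypothetical reference: map [cmin, cmax] linearly onto [low, high].

    Inputs outside [cmin, cmax] are clipped first, so every output byte
    lies in [low, high] whatever combination of parameters is passed.
    """
    if not 0 <= low <= high <= 255:
        raise ValueError("require 0 <= low <= high <= 255")
    cmin = min(data) if cmin is None else cmin
    cmax = max(data) if cmax is None else cmax
    span = cmax - cmin
    if span == 0:
        span = 1  # constant input: avoid division by zero
    scale = (high - low) / span
    out = []
    for v in data:
        v = min(max(v, cmin), cmax)  # clip to the colour range
        # Shift, scale, offset by low, then round half up.
        out.append(int((v - cmin) * scale + low + 0.5))
    return out
```

For example, `bytescale_reference([0, 50, 100, 150], cmin=50, cmax=100, low=100, high=200)` gives `[100, 100, 200, 200]`; any output above `high` from the real function for the same arguments would be a clear reproduction.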
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.scipy.org/pipermail/scipy-dev/attachments/20160907/2a3ad533/attachment-0001.html>

------------------------------

Message: 2
Date: Thu, 8 Sep 2016 20:11:51 +1200
From: Ralf Gommers <ralf.gommers at gmail.com>
To: SciPy Developers List <scipy-dev at scipy.org>
Subject: Re: [SciPy-Dev] cKDTree
Message-ID:
	<CABL7CQi1ZZh4JKxTHb+4ztRa-t71eYhwCT0ZmqFE9kXXbhWuQQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Wed, Sep 7, 2016 at 11:19 AM, Pauli Virtanen <pav at iki.fi> wrote:

> Tue, 06 Sep 2016 20:23:05 +0200, Sylvain Corlay wrote:
> [clip]
> > This sort of raises the question of the scope of scipy. Should scipy
> > go through the same sort of "big split" as Jupyter, or more à la
> > d3js? Is the amount of specialized knowledge required to understand a
> > method part of what defines the dividing line between what should and
> > should not be in scipy?
>
> To write a bit more on this, although I think it's difficult to give 
> hard rules on what "generally useful and generally agreed to work" 
> means, I would perhaps weigh the following against each other:
>
> - Is the method used/useful in different domains in practice?
>   How much domain-specific background knowledge is needed to use it
>   properly?
>
> - Consider the stuff already in the module.  Is what you are adding
>   an omission?  Does it solve a problem that you'd expect the module
>   be able to solve?  Does it supplement an existing feature in
>   a significant way?
>
> - Consider the equivalence class of similar methods / features usually
>   expected. Among them, what would in principle be the minimal set so
>   that there's not a glaring omission in the offered features remaining?
>   How much stuff would that be? Does including a representative one of
>   them cover most use cases? Would it in principle sound reasonable to
>   include everything from the minimal set in the module?
>
> - Is what you are adding something that is well understood in the
>   literature? If not, how sure are you that it will turn out well?
>   Does the method perform well compared to other similar ones?
>
> - Note that the twice-a-year release cycle and backward-compat
>   policy makes correcting things later on more difficult.
>
> The scopes of the submodules also vary, so it's probably best to 
> consider each as if a separate project --- "numerical evaluation of 
> special functions" is relatively well-defined, but "commonly needed 
> optimization algorithms" less so.
>
> On a meta-level, it's probably also bad to be too restrictive on the 
> scope, as telling people to go away can result in just that.
>

Thanks Pauli, this and your other mail are the best summary yet of how to judge suitability for inclusion. I propose to stick this almost verbatim in the developer docs.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.scipy.org/pipermail/scipy-dev/attachments/20160908/7d53013f/attachment-0001.html>

------------------------------

Message: 3
Date: Thu, 8 Sep 2016 20:18:56 +1200
From: Ralf Gommers <ralf.gommers at gmail.com>
To: SciPy Developers List <scipy-dev at scipy.org>
Subject: Re: [SciPy-Dev] cKDTree
Message-ID:
	<CABL7CQhhT6VYaWSq4r1fKtMq3TMBksSnN4PM+Xotejycja+qDQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Wed, Sep 7, 2016 at 6:23 AM, Sylvain Corlay <sylvain.corlay at gmail.com>
wrote:

> I understand (especially about the scikits), although I was surprised 
> by some recently added features in scipy, such as differential 
> evolution
>
> DE is more of a "recipe" that has been studied empirically and for which
> there is no theoretical convergence rate. Even though it may "work well"
> for certain problems, this raises issues such as on what basis it should
> even be "improved", and what should be the foundation for a change.
> (Recently a result on convergence in probability, with no error rate, on
> a bounded domain has been published, but no result exists on the speed
> of convergence afaik...)
>

> Evolutionary optimization is definitely cool and may work strikingly
> well for certain problems, but I was surprised that it was selected for
> inclusion in scipy. DE would have been a nice seed for a scikit-evolution.
>
> For those interested in stochastic algorithms for global multivariate
> optimization with proven convergence results and an abundant
> literature, there are at least the following two categories of
> methods:
>
>  - *MCMC* (Markov Chain Monte Carlo) methods, which include simulated
> annealing, for which there are nice results on convergence in distribution.
>  - *Robbins-Monro* methods, which include stochastic gradient
> methods, for which we have almost-sure convergence and central-limit-type theorems.
>
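The Robbins-Monro category can be made concrete with a short sketch. The recursion theta_{n+1} = theta_n - a_n * H(theta_n, xi_n) needs only noisy evaluations H whose mean should be driven to zero. The step sizes a_n = a / n (which satisfy sum a_n = inf and sum a_n^2 < inf, the classical conditions for almost-sure convergence) and the toy problem are assumptions chosen for illustration, and `robbins_monro` is a hypothetical helper, not a SciPy API.

```python
import random


def robbins_monro(noisy_h, theta0, n_iter=20000, a=1.0):
    """Robbins-Monro root finding: theta <- theta - a_n * H(theta, xi).

    noisy_h(theta) returns one noisy evaluation H(theta, xi) whose mean
    M(theta) we want to drive to zero. The step sizes a_n = a / n satisfy
    sum(a_n) = inf and sum(a_n**2) < inf.
    """
    theta = theta0
    for n in range(1, n_iter + 1):
        theta -= (a / n) * noisy_h(theta)
    return theta


# Toy problem (an assumption for illustration): minimize E[(theta - xi)**2] / 2
# with xi ~ N(3, 1). The stochastic gradient is theta - xi, so theta* = 3.
rng = random.Random(42)
theta_hat = robbins_monro(lambda th: th - rng.gauss(3.0, 1.0), theta0=0.0)
```

With a = 1 this stochastic gradient iteration reduces exactly to the running sample mean, which makes the result easy to check by hand.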

I think it's fair to summarize the reason for the current status and inclusion of DE as: theoretical convergence rates and papers are nice, but code that actually works on a good set of benchmarks is way more important.

Some years ago we had only bad global optimizers. We did have simulated annealing, but it just didn't work for most problems. No one stepped up to improve it, so we deprecated and removed it.

DE was benchmarked quite thoroughly (on the benchmark functions from
benchmarks/benchmarks/test_go_benchmark_functions.py) and came out looking good. It solved problems that the only other good optimizer we had
(basinhopping) did not do well on. So that means there was value in adding it.
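For readers who have not seen DE, a minimal pure-Python sketch of the classic DE/rand/1/bin scheme follows: mutate with a scaled difference of two random population members, apply binomial crossover, and keep the trial vector only if it is no worse than its parent. This is not SciPy's `differential_evolution` implementation; the function name, the in-place population update, the clipping to bounds and the parameter defaults are illustrative assumptions.

```python
import random


def differential_evolution_sketch(f, bounds, pop_size=20, mutation=0.8,
                                  crossover=0.9, n_gen=200, seed=0):
    """Minimal DE/rand/1/bin minimizer (illustrative, not SciPy's code).

    bounds is a list of (low, high) pairs, one per dimension.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Initial population drawn uniformly inside the box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(p) for p in pop]
    for _ in range(n_gen):
        for i in range(pop_size):
            # Three distinct members, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Force at least one coordinate to come from the mutant vector.
            j_rand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < crossover:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][j] + mutation * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][j]
                trial.append(v)
            f_trial = f(trial)
            # Greedy one-to-one selection against the parent.
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x):
    return sum(xi * xi for xi in x)


x_best, f_best = differential_evolution_sketch(sphere, [(-5.0, 5.0)] * 3)
```

Nothing in the loop comes with a convergence-rate guarantee, which is Sylvain's point; its case rests on benchmark results like the ones Ralf describes.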

Cheers,
Ralf

> This sort of raises the question of the scope of scipy. Should scipy
> go through the same sort of "big split" as Jupyter, or more à la d3js?
> Is the amount of specialized knowledge required to understand a method
> part of what defines the dividing line between what should and should
> not be in scipy?
>
> Sylvain
>
> On Tue, Sep 6, 2016 at 8:08 PM, Jacob Vanderplas < 
> jakevdp at cs.washington.edu> wrote:
>
>> As for adding more functionality along those lines, I would advocate the
>> creation of a new package or perhaps a scikit. As we've seen with
>> scikit-learn, useful features can filter up into scipy (e.g.
>> sparse.csgraph), and the development of new features within an
>> independent package is *much* easier than development from within a scipy PR.
>>    Jake
>>
>>  Jake VanderPlas
>>  Senior Data Science Fellow
>>  Director of Research in Physical Sciences
>>  University of Washington eScience Institute
>>
>> On Tue, Sep 6, 2016 at 11:03 AM, Sylvain Corlay 
>> <sylvain.corlay at gmail.com
>> > wrote:
>>
>>> Would you guys consider it in scope for scipy to have implementations
>>> of nearest neighbor search methods that are faster than KDTree?
>>>
>>> Some methods are fairly simple, e.g. principal axis trees, which use
>>> the principal direction of the dataset to split it into smaller
>>> subsets. As soon as the intrinsic dimensionality is significantly
>>> smaller than the dimension of the space, they are significantly faster.
>>>
>>> Besides, computing only (an approximation of) the principal axis
>>> is much faster than doing an actual PCA.
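Both claims (split along the principal direction; an approximate principal axis is cheaper than a full PCA) can be illustrated with a small pure-Python sketch: power iteration finds the leading direction using only covariance-vector products, and a node splits its points at the median projection. The function names are hypothetical, this is a single node rather than a full principal axis tree, and it is not a proposed SciPy API.

```python
import math
import random


def approx_principal_axis(points, n_iter=50, seed=0):
    """Leading principal direction by power iteration.

    Each step computes C v as sum_i (x_i - mu) * <x_i - mu, v>, so the
    d x d covariance matrix C is never formed and no full PCA is done.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    mu = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mu[k] for k in range(d)] for p in points]
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(vk * vk for vk in v))
    v = [vk / norm for vk in v]
    for _ in range(n_iter):
        w = [0.0] * d
        for c in centered:
            dot = sum(ck * vk for ck, vk in zip(c, v))
            for k in range(d):
                w[k] += c[k] * dot
        norm = math.sqrt(sum(wk * wk for wk in w))
        if norm == 0.0:
            break  # all points identical: no preferred direction
        v = [wk / norm for wk in w]
    return mu, v


def principal_axis_split(points):
    """One node of a principal-axis split: halve the points at the median
    of their projections onto the approximate principal axis."""
    mu, v = approx_principal_axis(points)
    proj = [sum((p[k] - mu[k]) * v[k] for k in range(len(v))) for p in points]
    order = sorted(range(len(points)), key=proj.__getitem__)
    half = len(points) // 2
    left = [points[i] for i in order[:half]]
    right = [points[i] for i in order[half:]]
    return v, left, right


# Points close to the line y = 2x: intrinsic dimension 1 in a 2-D space.
pts = [(i / 10 - 5, 2 * (i / 10 - 5) + (0.01 if i % 2 else -0.01)) for i in range(100)]
axis, left, right = principal_axis_split(pts)
```

Here the recovered axis is close to (1, 2)/sqrt(5) up to sign, and an axis-aligned kd-tree split would instead cut along x or y, which is the situation where a principal-axis split is expected to help.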
>>>
>>> On Tue, Sep 6, 2016 at 4:14 AM, Jacob Vanderplas < 
>>> jakevdp at cs.washington.edu> wrote:
>>>
>>>> From my own casual benchmarks, the new scipy cKDTree is much faster
>>>> than any of the scikit-learn options, though it still only supports
>>>> axis-aligned Euclidean-like metrics (whereas sklearn's BallTree
>>>> supports dozens of additional metrics). The cKDTree also has a
>>>> limited range of query types compared to scikit-learn's trees.
>>>>    Jake
>>>>
>>>>  Jake VanderPlas
>>>>  Senior Data Science Fellow
>>>>  Director of Research in Physical Sciences
>>>>  University of Washington eScience Institute
>>>>
>>>> On Mon, Sep 5, 2016 at 12:46 AM, Daπid <davidmenhur at gmail.com> wrote:
>>>>
>>>>> On 4 September 2016 at 23:00, Robert Lucente 
>>>>> <rlucente at pipeline.com>
>>>>> wrote:
>>>>> > Please note that I am a newbie and just a lurker.
>>>>> >
>>>>> > I noticed in a recent email that cKDTree was mentioned.
>>>>> >
>>>>> > Q: What is the relationship, if any, between SciPy and scikit-learn
>>>>> > when it comes to cKDTree?
>>>>> >
>>>>> > The reason I ask is the following two links:
>>>>> >
>>>>> > https://jakevdp.github.io/blog/2013/04/29/benchmarking-nearest-neighbor-searches-in-python/
>>>>> >
>>>>> > https://github.com/scikit-learn/scikit-learn/issues/3682
>>>>>
>>>>> Note that these benchmarks are from 2013 and 2014. SciPy's KDTree
>>>>> has recently had its performance improved, twice. scikit-learn's last
>>>>> update to its KDTree was in 2015. So we need to run the benchmarks again.
>>>>>
>>>>> /David.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.scipy.org/pipermail/scipy-dev/attachments/20160908/597b9be4/attachment.html>

------------------------------

End of SciPy-Dev Digest, Vol 155, Issue 12
******************************************