[SciPy-Dev] Global Optimization Benchmarks

Andrea Gavana andrea.gavana at gmail.com
Mon Jan 11 06:37:43 EST 2021


Hi Hans,

On Mon, 11 Jan 2021 at 11:47, Hans Dembinski <hans.dembinski at gmail.com>
wrote:

> Hi Andrea,
>
> > On 8. Jan 2021, at 10:20, Andrea Gavana <andrea.gavana at gmail.com> wrote:
> >
> >     long time no see :-) . I thought to start 2021 with a bit of a bang,
> > to try and forget how bad 2020 has been... So I am happy to present you
> > with a revamped version of the Global Optimization Benchmarks from my
> > previous exercise in 2013.
> >
> > This new set of benchmarks pretty much supersedes - and greatly expands
> > - the previous analysis that you can find at this location:
> > http://infinity77.net/global_optimization/ .
>
> thank you for sharing this. I was going to point out the "No Free Lunch
> Theorem" but you mention it yourself on your website, good.
>

Thank you for your comments. Before trying to answer your concerns, let me
clarify a few things, as it seems to me that two sets of results are being
mixed up in the comments:

1. This page: http://infinity77.net/global_optimization/ represents my old
work on global optimization (dated 2013), and I now consider it obsolete
and superseded by the work at point (2).
2. This page: http://infinity77.net/go_2021/index.html (and subsequent
pages) is now my latest reference. If you read the description (and in
particular the section about "The Rules"), you will see that the rules are
not quite the same as in point (1).


>
> I have a few questions/comments:
>
> - The exclusion of gradient-based solvers is unfortunate. It is of course
> up to you what you investigate, but gradient-based solvers are surely
> useful in practice. Not all real-life problems involve a (non-analytical)
> simulation.
>

I welcome suggestions on which *global* optimization algorithms are
gradient-based and can be applied relatively easily using Python, NumPy
and SciPy. Note that there is no mention of gradient-based or
non-gradient-based algorithms on the page
http://infinity77.net/go_2021/index.html.
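
Just to illustrate what I mean by "relatively easily": the SciPy global
optimizers are gradient-free at the global stage, but some of them (for
example scipy.optimize.basinhopping) will happily pass an analytical
gradient to their local refinement step. A minimal sketch on a toy sphere
function - purely an illustration, not part of the benchmark code - could
look like this:

import numpy as np
from scipy.optimize import basinhopping

# Toy objective (sphere function) with an analytical gradient
def sphere(x):
    return float(np.sum(x**2))

def sphere_grad(x):
    return 2.0 * x

x0 = np.array([3.0, -2.0])

# The hopping stage is gradient-free, but the local refinement step
# (L-BFGS-B here) can exploit the analytical gradient via "jac".
result = basinhopping(
    sphere,
    x0,
    niter=50,
    minimizer_kwargs={"method": "L-BFGS-B", "jac": sphere_grad},
)
print(result.x, result.fun)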


>
> - "This effort stems from the fact that I got fed up with the current
> attitude of most mathematicians/numerical optimization experts, who tend to
> demonstrate the advantages of an algorithm based on “elapsed time” or “CPU
> time” or similar meaningless performance indicators." I don't know where
> you get that from, what are your sources? I have never seen an academic
> paper that used elapsed time or CPU time. The scientific papers I have read
> use the number of function evaluations to compare performance, which is a
> meaningful machine-independent performance measure if the total time is
> dominated by the time spent in the function, as it usually is.
>

While I agree that in recent years the shift towards using function
evaluations instead of CPU time has largely prevailed, that was not the
case a while back. A few examples:

https://www.mat.univie.ac.at/~neum/ms/comparison.pdf
https://arxiv.org/pdf/1709.08242.pdf

But since that sentence seems to be annoying, I'll just remove it :-) .



>
> - I feel uneasy about your performance measure. It is whether the solver
> finds the minimum value of the function (why not the location of the
> minimum? Isn't that usually of interest?) within some fixed tolerance in
> 2000 function evaluations.
>

I have both the function value and the location of the minimum. Of course
they are both important, and of course I will use the location of the
minimum for further analysis. The point of the exercise is: I know where
the global optimum (or optima) lies, so can the solver find it within a
specific tolerance? Most benchmarks are designed like this. The 2,000 is
no longer a hard limit per se: as you will notice, I have run all 16
benchmarks with different stopping conditions in terms of the maximum
number of function evaluations. Specifically, all the benchmarks have been
run 22 times, each run limiting the function evaluation budget to 100,
then 200, 300, 400, ..., 2000, 5000, 10000. Some of the benchmarks have
been extended to 50,000 (http://infinity77.net/go_2021/globopt.html).
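
Conceptually the procedure is just a loop over increasing evaluation
budgets, checking whether the known global optimum is recovered within the
tolerance. A rough sketch of the idea - using a standard test function and
one of the SciPy solvers as stand-ins, not the actual benchmark harness -
would be:

import numpy as np
from scipy.optimize import rosen, differential_evolution

F_GLOBAL = 0.0          # known global minimum of the Rosenbrock function
TOLERANCE = 1e-6
BUDGETS = list(range(100, 2001, 100)) + [5000, 10000]
BOUNDS = [(-5.0, 5.0)] * 4

class BudgetExceeded(Exception):
    pass

def make_budgeted(func, budget):
    """Wrap `func` so it stops the solver after `budget` evaluations,
    keeping track of the best value seen so far."""
    state = {"nfev": 0, "best": np.inf}
    def wrapped(x):
        if state["nfev"] >= budget:
            raise BudgetExceeded
        state["nfev"] += 1
        value = func(x)
        state["best"] = min(state["best"], value)
        return value
    return wrapped, state

for budget in BUDGETS:
    wrapped, state = make_budgeted(rosen, budget)
    try:
        differential_evolution(wrapped, BOUNDS, maxiter=100000,
                               tol=0.0, seed=1234)
    except BudgetExceeded:
        pass  # budget exhausted before the solver stopped on its own
    success = state["best"] <= F_GLOBAL + TOLERANCE
    print(f"budget = {budget:6d}   success = {success}")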


> a) The maximum number of function evaluations that you use does not depend
> on the dimensionality of the problem, but it clearly should. The search
> space is larger in higher dimensional problems, so more evaluations are
> needed by any algorithm. That is also obvious from your results.
>

I agree, and this is why I have run all the benchmarks multiple times with
varying budgets of function evaluations, which is also why many benchmarks
have a "Dimensionality Effect" chapter looking at what happens when the
problem grows in size (
http://infinity77.net/go_2021/gkls.html#size-dimensionality-effects,
http://infinity77.net/go_2021/mmtfg.html#size-dimensionality-effects , and
many others). That said, most of the problems I usually deal with are
computationally expensive - so if one day my model has two parameters to
tune and I allow the algorithm 500 function evaluations, that may already
take me one week to solve. If a month later it is decided that the model
is better described by a function of 10 parameters, I am not going to ask
for 2,500 simulations - it would take me a month and a half to get the
results back. The solver will have to deal with a 10-parameter model in
500-600 evaluations anyway.


> b) Instead of recording a binary outcome (success/failure to find the
> minimum in N evaluations), I think it would be more useful to record the
> number of evaluations until the minimum is reached and then give the mean
> or median number of function evaluations over many trials as well as the
> percentage of successful convergences. Algorithms can be ranked by
> robustness (the number of correctly solved problems) and convergence rate
> (the average/median number of function evaluations). Robustness and
> convergence rate may be anti-correlated. You are mixing the two in your
> performance measure, which makes it more difficult to interpret.
>

This is exactly what has been done for the SciPy Extended benchmark.
http://infinity77.net/go_2021/scipy_extended.html#test-functions-general-solvers-performances
tells you that this specific benchmark has been run using 100 random
starting points for every benchmark function, and that the reported
statistics (overall success, number of function evaluations) refer to
successful optimizations only. I haven't repeated the 100 random starting
points for the other 15 benchmarks because it would take me forever to run
them like that.
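
So for each solver the raw data behind those tables are simply a success
flag plus an evaluation count for every random starting point, and the two
quantities you mention (robustness and convergence rate) are reported
separately. Schematically - the numbers below are made up, purely to show
how the statistics are aggregated:

import numpy as np

# One (converged?, function evaluations) pair per random starting point.
runs = [(True, 412), (True, 398), (False, 2000), (True, 530), (False, 2000)]

converged = np.array([ok for ok, _ in runs])
nfevs = np.array([nfev for _, nfev in runs])

# Robustness: percentage of runs that found the global optimum
success_rate = 100.0 * converged.mean()

# Convergence rate: mean evaluations over the *successful* runs only
mean_nfev = nfevs[converged].mean()

print(f"success rate: {success_rate:.1f}%")
print(f"mean nfev over successful runs: {mean_nfev:.1f}")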


>
> - http://infinity77.net/global_optimization/ does not list the same
> algorithms as the table in
> http://infinity77.net/go_2021/thebenchmarks.html#info-general-results


Please see above: http://infinity77.net/global_optimization is the old
work and should not be looked at anymore.



> - While I think we agree that CPU-time is not a useful means to compare
> algorithms, it is then quite surprising to see Fig 0.6 with the CPU times.
> I suppose the pure-Python implementations perform so badly because the
> benchmark functions are all rather small Python functions which are quick
> to evaluate, so that the time spent inside the solver does matter.
>

That's generally correct. It was just a way for me to say that some of the
solvers are inherently slower than others, although for a real-life
problem it shouldn't matter so much.


>
> - I see an inconsistency between your declared goals and your
> benchmark. You say you only care about optimization of non-analytical
> functions in which the function computation involves a simulation, but then
> most of your benchmark functions are analytical functions. In other words,
> you do not measure the performance on non-analytical functions. It is quite
> possible that the ranking would look different if you used non-analytical
> functions in the benchmarks.
>

The problems in the benchmarks are analytical (or almost analytical - not
all of the benchmarks are like that; did you take a look at
http://infinity77.net/go_2021/huygens.html ?) because the benchmarks are
designed to be that way. In global optimization, a benchmark is designed
to try and reproduce features that a real-life objective function may
exhibit. In the event that I ever manage to get a paper published on this
exercise, I will definitely include real-life problems in the analysis, as
that has always been my aim.

The usefulness of this set of benchmarks comes from the fact that, after
all this monster analysis with 16 different test suites, I can approach a
new, real-life study by saying: "look, I don't know which solvers will be
the best, but I will definitely start with MCS, SHGO or Dual Annealing".


>
> - I would prefer to read technical documents written in a more
> professional, detached writing style, and I think others would, too.
>

That will probably follow if and when (if ever) a paper is published about
this. That said, I welcome corrections or modifications to the text at any
time - I understand that some of the paragraphs I have written may come
across as less professional than needed, and I will be happy to change
them.


Andrea.


> Regards,
> Hans
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>