[SciPy-dev] optimizers module
dmitrey
openopt at ukr.net
Tue Aug 21 04:04:44 EDT 2007
Matthieu Brucher wrote:
>
> > So the state dictionary is only responsible for what is
> specifically
> > connected to the function. Either the parameters, or different
> > evaluations (hessian, gradient, direction and so on). That's why you
> > "can't" put gradtol in it (for instance).
> I don't know your code very well yet, but why can't you just set
> default
> params as I do in /Kernel/BaseProblem.py?
>
>
>
> Because there are a lot of default parameters that could be set
> depending on the algorithm. From an object-oriented point of view,
> this way of doing things is correct: the different modules possess the
> arguments because they are responsible for using them. Besides, you
> may want a different gradient tolerance for different sub-modules.
I do have different gradtol defaults for different problem classes, for
example NLP and NSP (non-smooth). Within any problem class the default
gradtol value should be a constant known to the user, as TOMLAB does it.
Setting a different gradtol for each solver is senseless: it is a measure
of how close xk is to x_opt. (Of course, if the function is very special
and/or non-convex and/or non-smooth, the default value may be worth
changing, but that decision belongs to the user, according to his
knowledge of the function.)
The situation with xtol, funtol and diffInt is more complex, but TOMLAB
still has default constants common to all algorithms, in almost the same
way as I do, and years of successful TOMLAB adoption (see
http://tomopt.com/tomlab/company/customers.php) are one more piece of
evidence that this approach works.
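The scheme described above can be sketched roughly as follows (the class
and attribute names here are hypothetical, not the actual openopt code):
tolerance defaults are constants of the problem class, shared by all
solvers, rather than per-solver values.

```python
# Hypothetical sketch: tolerance defaults live on the problem class,
# so the user sees one known constant regardless of solver choice.
class BaseProblem:
    gradtol = 1e-6   # default known to the user, as in TOMLAB
    xtol = 1e-6
    funtol = 1e-6

class NLP(BaseProblem):          # smooth nonlinear programming
    pass

class NSP(BaseProblem):          # non-smooth problems: looser gradient default
    gradtol = 1e-5

p = NSP()
# p.gradtol -> 1e-5, while p.xtol -> 1e-6 (inherited common default)
```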
As for special solver parameters, like the space-transformation parameter
in ralg, they can be passed via a dictionary or something similar; for
example, in openopt for MATLAB I do
p.ralg.alpha = 2.5
p.ralg.n2 = 1.2
in Python it's not implemented yet, but I intend to do something like
p = NSP(...)
p.ralg = {'alpha' : 2.5, 'n2': 1.2}
(so the other 3 ralg params remain unchanged - they will be loaded from
the default settings)
r = p.solve('ralg')
or
p.ralg = p.setdefaults('ralg')
print p.ralg.alpha # the previous approach didn't allow this
...
r = p.solve('ralg')
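The intended dict-based behaviour above could be implemented with a
simple merge over the solver defaults; a minimal sketch (the parameter
names and default values here are made up for illustration):

```python
# Hypothetical ralg defaults; the user's dict overrides only the keys
# he supplies, the rest are loaded from the default settings.
RALG_DEFAULTS = {'alpha': 2.0, 'n2': 1.0, 'h0': 1.0, 'nh': 3, 'q1': 0.9}

def merge_solver_params(user_params, defaults=RALG_DEFAULTS):
    """Return the defaults updated with whatever the user supplied."""
    params = dict(defaults)       # start from the full default set
    params.update(user_params)    # override only the given keys
    return params

params = merge_solver_params({'alpha': 2.5, 'n2': 1.2})
# params['alpha'] == 2.5, params['h0'] is still 1.0 (default)
```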
BTW, TOMLAB handles numerical gradient computation the same way I do. I
think 90% of users don't care at all which tolfun, tolx etc. are set, or
how the gradient is calculated - many of them probably don't even know
that a gradient is being computed. They just require the problem to be
solved, no matter how, with or without a gradient (as you see, the only
scipy optimizer that handles nonlinear constraints is cobyla, and it
takes no gradient from the user).
Of course, if a problem is too costly to be solved quickly, they may
start investigating ways to speed it up, and the gradient will probably
be the first one.
>
>
> I still think the approach is incorrect; the user shouldn't have to
> supply the gradient - we should calculate it ourselves if it's absent.
> At least every optimization software known to me does that.
>
>
>
> Perhaps, but I have several reasons:
> - when it's hidden, it's a magic trick. My view of the framework is
> that it must not do anything like that. It's designed for advanced
> users that do not want those tricks
> - from an architectural point of view, it's wrong, plainly wrong. I'm
> a scientist specialized in electronics and signal processing, but I
> have high requirements for everything that is IT-oriented.
> Encapsulation is one part of the object principle, and implementing
> finite-difference outside breaks it (and I'm not talking about code
> duplication).
So, as I said, it could be handled like p.showDefaults('ralg') or p =
NLP(...); print p.xtol. For 99.9% of users that should be enough.
>
>
> Also, as you see, my f_and_df is optimized not to recalculate f(x0)
> while
> obtaining the gradient numerically, as some implementations do, for
> example approx_fprime
> in scipy.optimize. For problems with costly functions and small nVars
> (1..5)
> the speedup can be significant.
>
>
>
> Yes, I agree. For optimization, an additional argument could be given
> to the gradient that will be used if needed (remember that the other
> way of implementing finite differences does not use f(x0)), but it will
> bring some trouble to the user (every gradient function must have this
> additional argument).
I don't see any trouble for the user: he just provides either f and df,
or only f, as anywhere else, without any changes. Or did you mean the
gradient to be func(x, arg1, arg2, ...)? That is no trouble either - for
example, redefine df = lambda x: func(x, arg1, arg2) from the very
beginning.
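The saving under discussion - reusing an already-known f(x0) in a
forward-difference gradient instead of recomputing it - can be sketched
like this (an illustrative sketch with made-up names, not the actual
openopt or scipy code):

```python
import numpy as np

def forward_diff_grad(f, x0, f0=None, diff_int=1e-7):
    """Forward-difference gradient that reuses f(x0) if already known.

    If f0 (= f(x0)) was computed earlier, pass it in and it will not be
    recomputed -- saving one function evaluation, which matters for a
    costly f with few variables.  Note that central differences would
    not use f(x0) at all, at the price of two evaluations per variable.
    """
    if f0 is None:
        f0 = f(x0)                       # only evaluated when not supplied
    grad = np.empty_like(x0, dtype=float)
    for i in range(x0.size):
        xi = x0.copy()
        xi[i] += diff_int                # perturb one coordinate
        grad[i] = (f(xi) - f0) / diff_int
    return grad
```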
As for my openopt, there are lots of tools to prevent calculating things
twice. One of them is to check against the previous x: if it's the same,
return the previous fval (cval, hval). For example, if a solver (or the
user from his df) calls
F = p.f(x)
DF = p.df(x)
and df is calculated numerically, it will not recalculate f(x=x0); it
will just reuse the value from the previous p.f call (because x equals
p.FprevX).
The same applies to dc, dh (c(x)<=0, h(x)=0). As you know, the
comparison numpy.all(x==xprev) doesn't take much time - at least much
less than 0.1% of the total time/cputime elapsed, as I observed in
MATLAB profiler results on various NL problems.
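The caching trick described above can be sketched as a small wrapper
(hypothetical names; the real openopt bookkeeping lives on the problem
object, not in a separate class):

```python
import numpy as np

class CachedObjective:
    """Return the stored value when called twice with the same x,
    e.g. once directly by the solver and once inside a numerical
    gradient, instead of re-evaluating a costly function."""

    def __init__(self, f):
        self._f = f
        self._prev_x = None      # analogue of p.FprevX in the text
        self._prev_val = None
        self.ncalls = 0          # counts real evaluations of f

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        # cheap comparison, as noted above for numpy.all(x == xprev)
        if self._prev_x is not None and np.all(x == self._prev_x):
            return self._prev_val
        self.ncalls += 1
        self._prev_x = x.copy()
        self._prev_val = self._f(x)
        return self._prev_val
```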
Regards, D.