[SciPy-dev] optimizers module
dmitrey
openopt at ukr.net
Tue Aug 21 04:04:44 EDT 2007
Matthieu Brucher wrote:
>
> > So the state dictionary is only responsible for what is
> specifically
> > connected to the function. Either the parameters, or different
> > evaluations (hessian, gradient, direction and so on). That's why you
> > "can't" put gradtol in it (for instance).
> I don't know your code very well yet, but why can't you just set
> default
> params as I do in /Kernel/BaseProblem.py?
>
>
>
> Because there are a lot of default parameters that could be set
> depending on the algorithm. From an object-oriented point of view,
> this way of doing things is correct: the different modules possess the
> arguments because they are responsible for using them. Besides, you
> may want a different gradient tolerance for different sub-modules.
I do have different gradtol defaults for different problem classes, for
example NLP and NSP (non-smooth). Within any problem class the default
gradtol value should be a constant known to the user, as TOMLAB does it.
Setting a different gradtol for each solver is senseless: it is a measure
of how close xk is to x_opt. (Of course, if the function is very special
and/or non-convex and/or non-smooth, the default value may be worth
changing, but that decision belongs to the user, according to his
knowledge of the function.)
The situation with xtol, funtol and diffInt is more complex, but TOMLAB
still has default constants common to all algorithms, in almost the same
way as I do, and years of successful TOMLAB adoption (see
http://tomopt.com/tomlab/company/customers.php) are one more piece of
evidence that this approach works.
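The scheme described above can be sketched roughly as follows (the class
and attribute names here are hypothetical, not the actual openopt code):
tolerance defaults are constants of the problem class, shared by all
solvers, rather than per-solver values.

```python
# Hypothetical sketch: tolerance defaults live on the problem class,
# so the user sees one known constant regardless of solver choice.
class BaseProblem:
    gradtol = 1e-6   # default known to the user, as in TOMLAB
    xtol = 1e-6
    funtol = 1e-6

class NLP(BaseProblem):          # smooth nonlinear programming
    pass

class NSP(BaseProblem):          # non-smooth problems: looser gradient default
    gradtol = 1e-5

p = NSP()
# p.gradtol -> 1e-5, while p.xtol -> 1e-6 (inherited common default)
```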
As for special solver parameters, like the space-transformation parameter
in ralg, they can be passed via a dictionary or something similar; for
example, in openopt for MATLAB I do
p.ralg.alpha = 2.5
p.ralg.n2 = 1.2
in Python it's not implemented yet, but I intend to do something like
p = NSP(...)
p.ralg = {'alpha' : 2.5, 'n2': 1.2}
(so the other 3 ralg params remain unchanged - they will be loaded from
the default settings)
r = p.solve('ralg')
or
p.ralg = p.setdefaults('ralg')
print p.ralg.alpha # the previous approach didn't allow this
...
r = p.solve('ralg')
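The intended dict-based behaviour above could be implemented with a
simple merge over the solver defaults; a minimal sketch (the parameter
names and default values here are made up for illustration):

```python
# Hypothetical ralg defaults; the user's dict overrides only the keys
# he supplies, the rest are loaded from the default settings.
RALG_DEFAULTS = {'alpha': 2.0, 'n2': 1.0, 'h0': 1.0, 'nh': 3, 'q1': 0.9}

def merge_solver_params(user_params, defaults=RALG_DEFAULTS):
    """Return the defaults updated with whatever the user supplied."""
    params = dict(defaults)       # start from the full default set
    params.update(user_params)    # override only the given keys
    return params

params = merge_solver_params({'alpha': 2.5, 'n2': 1.2})
# params['alpha'] == 2.5, params['h0'] is still 1.0 (default)
```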
BTW, TOMLAB handles numerical gradient computation the same way I do. I
think 90% of users don't care at all which tolfun, tolx etc. are set, or
how the gradient is calculated - many of them probably don't even know
that a gradient is being computed. They just require the problem to be
solved, no matter how, with or without a gradient (as you see, the only
scipy optimizer that handles nonlinear constraints is cobyla, and it
takes no gradient from the user).
Of course, if a problem is too costly to be solved quickly, they may
start investigating ways to speed it up, and the gradient will probably
be the first one.
>
>
> I still think the approach is incorrect; the user shouldn't have to
> supply the gradient - we should calculate it ourselves if it's absent.
> At least every optimization software known to me does that.
>
>
>
> Perhaps, but I have several reasons:
> - when it's hidden, it's a magic trick. My view of the framework is
> that it must not do anything like that. It's designed for advanced
> users that do not want those tricks
> - from an architectural point of view, it's wrong, plainly wrong. I'm
> a scientist specialized in electronics and signal processing, but I
> have high requirements for everything that is IT-oriented.
> Encapsulation is one part of the object principle, and implementing
> finite-difference outside breaks it (and I'm not talking about code
> duplication).
So, as I said, it could be handled like p.showDefaults('ralg') or p =
NLP(...); print p.xtol. For 99.9% of users that should be enough.
>
>
> Also, as you see, my f_and_df is optimized not to recalculate f(x0)
> while
> obtaining the gradient numerically, as some implementations do, for
> example approx_fprime
> in scipy.optimize. For problems with costly functions and small nVars
> (1..5)
> the speedup can be significant.
>
>
>
> Yes, I agree. For optimization, an additional argument could be given
> to the gradient that will be used if needed (remember that the other
> way of implementing finite differences does not use f(x0)), but it will
> bring some trouble to the user (every gradient function must have this
> additional argument).
I don't see any trouble for the user: he just provides either f and df,
or only f, as anywhere else, without any changes. Or did you mean the
gradient to be func(x, arg1, arg2, ...)? That is no trouble either - for
example, redefine df = lambda x: func(x, arg1, arg2) from the very
beginning.
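The saving under discussion - reusing an already-known f(x0) in a
forward-difference gradient instead of recomputing it - can be sketched
like this (an illustrative sketch with made-up names, not the actual
openopt or scipy code):

```python
import numpy as np

def forward_diff_grad(f, x0, f0=None, diff_int=1e-7):
    """Forward-difference gradient that reuses f(x0) if already known.

    If f0 (= f(x0)) was computed earlier, pass it in and it will not be
    recomputed -- saving one function evaluation, which matters for a
    costly f with few variables.  Note that central differences would
    not use f(x0) at all, at the price of two evaluations per variable.
    """
    if f0 is None:
        f0 = f(x0)                       # only evaluated when not supplied
    grad = np.empty_like(x0, dtype=float)
    for i in range(x0.size):
        xi = x0.copy()
        xi[i] += diff_int                # perturb one coordinate
        grad[i] = (f(xi) - f0) / diff_int
    return grad
```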
As for my openopt, there are lots of tools to prevent calculating things
twice. One of them is to check against the previous x: if it's the same,
return the previous fval (cval, hval). For example, if a solver (or the
user from his df) calls
F = p.f(x)
DF = p.df(x)
and df is calculated numerically, it will not recalculate f(x=x0); it
will just reuse the value from the previous p.f call (because x equals
p.FprevX).
The same applies to dc, dh (c(x)<=0, h(x)=0). As you know, the
comparison numpy.all(x==xprev) doesn't take much time - at least much
less than 0.1% of the total time/cputime elapsed, as I observed in
MATLAB profiler results on various NL problems.
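The caching trick described above can be sketched as a small wrapper
(hypothetical names; the real openopt bookkeeping lives on the problem
object, not in a separate class):

```python
import numpy as np

class CachedObjective:
    """Return the stored value when called twice with the same x,
    e.g. once directly by the solver and once inside a numerical
    gradient, instead of re-evaluating a costly function."""

    def __init__(self, f):
        self._f = f
        self._prev_x = None      # analogue of p.FprevX in the text
        self._prev_val = None
        self.ncalls = 0          # counts real evaluations of f

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        # cheap comparison, as noted above for numpy.all(x == xprev)
        if self._prev_x is not None and np.all(x == self._prev_x):
            return self._prev_val
        self.ncalls += 1
        self._prev_x = x.copy()
        self._prev_val = self._f(x)
        return self._prev_val
```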
Regards, D.