[Numpy-discussion] automatic differentiation with PyAutoDiff

James Bergstra bergstrj at iro.umontreal.ca
Thu Jun 14 16:49:25 EDT 2012


On Thu, Jun 14, 2012 at 4:22 PM, srean <srean.list at gmail.com> wrote:
>>
>> For example, I wrote a library routine for doing log-linear
>> regression. Doing this required computing the derivative of the
>> likelihood function, which was a huge nitpicky hassle; took me a few
>> hours to work out and debug. But it's still just 10 lines of Python
>> code that I needed to figure out once and they're done forever, now.
>> I'd have been perfectly happy if I could have gotten those ten lines
>> by asking a random unreleased library I pulled off github, which
>> depended on heavy libraries like Theano and relied on a mostly
>> untested emulator for some particular version of the CPython VM. But
>> I'd be less happy to ask everyone who uses my code to install that
>> library as well, just so I could avoid having to spend a few hours
>> doing math. This isn't a criticism of your library or anything; it's
>> just that I'm always going to be reluctant to rely on an automatic
>> differentiation tool that takes arbitrary code as input, because it
>> almost certainly cannot be made fully robust. So it'd be nice to have
>> the option to stick a human in the loop.
>
> Log-linears are by definition too simple a model to appreciate
> auto-differentiation. Try computing the Hessian by hand on a modestly
> sized multilayer neural network and you will start seeing the
> advantages. Or say computing the Hessian of a large graphical model.
> But I do have my own reservations about auto-diff. Until we have a
> smart enough compiler that does common subexpression elimination (and
> in fact even then), hand-written differentiation code will often turn
> out to be more efficient. Terms cancel out (subtraction or division),
> terms factorize, terms can be arranged into an efficient Horner's
> scheme. It will take a very smart symbolic manipulation of the parse
> tree to get all that.
>

You're right - there is definitely a difference between a correct
gradient and a gradient that is both correct and fast to compute.

The current quick implementation of pyautodiff is naive in this
regard. However, it delegates the heavy lifting to Theano, which
performs the sort of optimization-oriented tree manipulations you're
talking about, and whose contributors have been tweaking them for a
few years now. They aren't always perfect, but they are often pretty
good. In fact, I would go so far as to say that they are often
*better* than what you would do by hand unless you sit down for a
long time and tune the hell out of your computation.
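
To make that concrete, here is a minimal sketch of the kind of
symbolic gradient Theano builds and then optimizes. This is plain
Theano rather than pyautodiff's own interface, and the expression is
just a toy logistic-style loss:

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.dvector('x')                    # symbolic data vector
    w = T.dvector('w')                    # symbolic parameter vector
    cost = T.sum(T.log1p(T.exp(-x * w)))  # toy logistic-style loss

    # Symbolic differentiation of the expression graph.
    g = T.grad(cost, w)

    # Compiling runs Theano's graph optimizations (common subexpression
    # elimination, numerical stabilization, constant folding, ...).
    grad_fn = theano.function([x, w], g)

    print(grad_fn(np.ones(3), 0.1 * np.ones(3)))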

> So in places where I really need to optimize the derivative code, I
> would still do it by hand, and delegate to an AD system when the
> size gets unwieldy. In theory a good compromise is to let the AD churn
> out the code and then hand-optimize it. But there, readable output
> does indeed help.
>
>
> As far as correctness of the computed derivative is concerned,
> computing the dot product between the gradient of a function and the
> secant computed numerically from the function does guard against
> gross errors. If I remember correctly, scipy's optimization module
> already has a function to do such sanity checks. Of course it cannot
> guarantee correctness, but it usually goes a long way.
>
> -- srean
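
For reference, the scipy helper you mention is presumably
scipy.optimize.check_grad, which compares an analytic gradient against
a finite-difference approximation. A minimal sketch on a toy quadratic
(the function names here are just illustrative):

    import numpy as np
    from scipy.optimize import check_grad

    def f(x):
        return 0.5 * np.dot(x, x)

    def grad_f(x):
        return x

    x0 = np.random.randn(5)
    # Returns the 2-norm of the difference between grad_f(x0) and a
    # finite-difference estimate; it should be tiny for a correct gradient.
    print(check_grad(f, grad_f, x0))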

True, even approximating a gradient by finite differences is a subtle
thing if you want to get the most precision per time spent. Another
thing I was wondering about was periodically re-running the original
bytecode on inputs to make sure that the derived bytecode produces the
same answer (!). Those two sanity checks would detect the two errors
that scare me most as a user:
a) that autodiff got the original function wrong, and
b) that autodiff is mis-computing a gradient.
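
A sketch of check (a) could be as simple as comparing outputs on
random inputs. Here `compiled_f` just stands for whatever callable the
autodiff tool produces; it is not a specific pyautodiff API:

    import numpy as np

    def original_f(x):
        # The plain Python/numpy function that was fed to the autodiff tool.
        return np.sum(np.log1p(np.exp(-x)))

    def agrees_with_original(compiled_f, n_trials=10, dim=5, rtol=1e-6):
        # Re-run the original implementation on random inputs and check
        # that the derived/compiled version returns the same values.
        for _ in range(n_trials):
            x = np.random.randn(dim)
            if not np.allclose(original_f(x), compiled_f(x), rtol=rtol):
                return False
        return True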

- James
-- 
http://www-etud.iro.umontreal.ca/~bergstrj


