[Numpy-discussion] automatic differentiation with PyAutoDiff

srean srean.list at gmail.com
Thu Jun 14 16:22:40 EDT 2012


>
> For example, I wrote a library routine for doing log-linear
> regression. Doing this required computing the derivative of the
> likelihood function, which was a huge nitpicky hassle; took me a few
> hours to work out and debug. But it's still just 10 lines of Python
> code that I needed to figure out once and they're done forever, now.
> I'd have been perfectly happy if I could have gotten those ten lines
> by asking a random unreleased library I pulled off github, which
> depended on heavy libraries like Theano and relied on a mostly
> untested emulator for some particular version of the CPython VM. But
> I'd be less happy to ask everyone who uses my code to install that
> library as well, just so I could avoid having to spend a few hours
> doing math. This isn't a criticism or your library or anything, it's
> just that I'm always going to be reluctant to rely on an automatic
> differentiation tool that takes arbitrary code as input, because it
> almost certainly cannot be made fully robust. So it'd be nice to have
> the option to stick a human in the loop.

Log-linear models are by definition too simple to show off
auto-differentiation. Try computing the Hessian by hand on a modestly
sized multilayer neural network, or the Hessian of a large graphical
model, and you will start seeing the advantages. That said, I do have
my own reservations about auto-diff. Until we have a compiler smart
enough to do common subexpression elimination, and in fact even then,
hand-written differentiation code will often turn out to be more
efficient: terms cancel out (through subtraction or division), terms
factorize, terms can be rearranged into an efficient Horner scheme.
It would take a very smart symbolic manipulation of the parse tree to
get all of that.
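
To make the cancellation point concrete, here is a rough sketch of my
own (nothing to do with PyAutoDiff): a softmax negative
log-likelihood. Differentiated mechanically, term by term, the log and
exp get dragged through every factor, but after the algebra those
terms cancel and the hand-written gradient is just the residual
softmax(XW) - Y pushed back through X:

    import numpy as np

    def nll(W, X, Y):
        # negative log-likelihood of a softmax model; Y is one-hot, (n, k)
        Z = X.dot(W)                          # scores, shape (n, k)
        Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
        logZ = np.log(np.exp(Z).sum(axis=1, keepdims=True))
        return -np.sum(Y * (Z - logZ))

    def nll_grad(W, X, Y):
        # hand-simplified gradient: the exp/log terms cancel out and
        # what remains is just X^T (softmax(XW) - Y)
        Z = X.dot(W)
        Z = Z - Z.max(axis=1, keepdims=True)
        P = np.exp(Z)
        P = P / P.sum(axis=1, keepdims=True)
        return X.T.dot(P - Y)

A mechanical pass over nll keeps all those intermediate exp/log terms
alive; the cancellation is what the hand derivation buys you.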

So in places where I really need to optimize the derivative code I
would still do it by hand, and delegate to an AD system only when the
size gets unwieldy. In theory a good compromise is to let the AD churn
out the code and then hand-optimize it, and that is where readable
output really does help.

As far as correctness of the computed derivative is concerned,
comparing the dot product of the gradient with a direction vector
against the secant of the function computed numerically along that
direction does guard against gross errors. If I remember correctly,
the scipy optimization module already has a function for such sanity
checks (scipy.optimize.check_grad). Of course it cannot guarantee
correctness, but it usually goes a long way.
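
For what it's worth, here is a minimal sketch of both checks on a toy
function, one with check_grad and one with the directional secant
done by hand:

    import numpy as np
    from scipy.optimize import check_grad

    def f(x):
        return np.sum(x ** 2) + np.sum(np.sin(x))

    def grad_f(x):
        return 2.0 * x + np.cos(x)

    x0 = np.random.randn(5)

    # componentwise finite-difference check from scipy; the return
    # value is the norm of (analytic gradient - numerical gradient)
    print(check_grad(f, grad_f, x0))

    # the directional check described above: compare grad(x).d against
    # the secant (f(x + h d) - f(x - h d)) / (2 h) along a direction d
    d = np.random.randn(5)
    d /= np.linalg.norm(d)
    h = 1e-6
    secant = (f(x0 + h * d) - f(x0 - h * d)) / (2.0 * h)
    print(abs(grad_f(x0).dot(d) - secant))

Both numbers should be tiny; if they are not, the gradient code (or
the AD output) is wrong.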

-- srean


