[Numpy-discussion] automatic differentiation with PyAutoDiff

Nathaniel Smith njs at pobox.com
Thu Jun 14 17:53:16 EDT 2012


On Thu, Jun 14, 2012 at 9:22 PM, srean <srean.list at gmail.com> wrote:
>>
>> For example, I wrote a library routine for doing log-linear
>> regression. Doing this required computing the derivative of the
>> likelihood function, which was a huge nitpicky hassle; took me a few
>> hours to work out and debug. But it's still just 10 lines of Python
>> code that I needed to figure out once and they're done forever, now.
>> I'd have been perfectly happy if I could have gotten those ten lines
>> by asking a random unreleased library I pulled off github, which
>> depended on heavy libraries like Theano and relied on a mostly
>> untested emulator for some particular version of the CPython VM. But
>> I'd be less happy to ask everyone who uses my code to install that
>> library as well, just so I could avoid having to spend a few hours
>> doing math. This isn't a criticism of your library or anything, it's
>> just that I'm always going to be reluctant to rely on an automatic
>> differentiation tool that takes arbitrary code as input, because it
>> almost certainly cannot be made fully robust. So it'd be nice to have
>> the option to stick a human in the loop.
>
> Log-linears are by definition too simple a model to appreciate
> auto-differentiation. Try computing the Hessian by hand on a modestly
> sized multilayer neural network and you will start seeing the
> advantages.

No, I'm saying I totally see the advantages. Here's the code I'm talking about:

    def _loglik(self, params):
        # log-likelihood of the model, summed over all groups
        alpha, beta = self.used_alpha_beta(params)
        if np.any(alpha < 0):
            return 1e20  # bail out on infeasible (negative) alpha values
        total = 0
        for group in self._model._groups.itervalues():
            alpha_part = np.dot(group["alpha_matrix"], alpha)
            # scale the first nab columns of the beta matrix by log(alpha_part)
            eff_beta_matrix = group["beta_matrix"].copy()
            nab = self._model._num_alpha_betas
            eff_beta_matrix[:, :nab] *= np.log(alpha_part[:, np.newaxis])
            exponent = np.dot(eff_beta_matrix, beta)
            Z = np.exp(exponent).sum()  # normalizing constant for this group
            total += (group["counts"] * exponent).sum()
            total += group["counts"].sum() * -np.log(Z)
        return total

It's not complex, but it's complicated. Enough that propagating all
those multidimensional chain rules through it was a pain in the butt.
But not so complicated that an automatic tool couldn't have worked it
out, especially with some hand-holding (e.g. extracting the inner
loop, sticking the dict lookups into local variables, getting rid of
the .copy()).
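
Concretely, here is a sketch of the kind of hand-holding I have in mind:
pull the per-group inner loop out into a pure function, pass the dict
lookups in as arguments, and replace the .copy() by splitting the matrix
product, so a tracing AD tool only ever sees plain array expressions. (The
function name and signature below are just for illustration, not anything
from my actual code.)

    import numpy as np

    def _group_loglik(alpha, beta, alpha_matrix, beta_matrix, counts, nab):
        # per-group contribution to the log-likelihood, with no in-place
        # updates, so it is straightforward for an AD tool to trace
        alpha_part = np.dot(alpha_matrix, alpha)
        # equivalent to scaling the first nab columns of beta_matrix by
        # log(alpha_part) and then multiplying by beta, but without the copy
        exponent = (np.dot(beta_matrix[:, :nab] * np.log(alpha_part)[:, np.newaxis],
                           beta[:nab])
                    + np.dot(beta_matrix[:, nab:], beta[nab:]))
        Z = np.exp(exponent).sum()
        return (counts * exponent).sum() - counts.sum() * np.log(Z)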

Of course, maybe you were pointing out that if your derivative
calculation depends in some intrinsic way on the topology of some
graph, then your best bet is to have an automatic way to recompute it
from scratch for each new graph you see. In that case, fair enough!

> Or say computing the Hessian of a large graphical model.
> But I do have my own reservations about auto-diff. Until we have a
> smart enough compiler that does common subexpression elimination (and
> in fact even then), hand-written differentiation code will often turn
> out to be more efficient: terms cancel out (subtraction or division),
> terms factorize, and terms can be arranged into an efficient Horner's
> scheme. It will take a very smart symbolic manipulation of the parse
> tree to get all of that.
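
(Just to make the Horner point concrete for anyone following along: the
derivative of a polynomial sum(c[k] * x**k) can itself be evaluated
Horner-style, one multiply-add per term, instead of via the term-by-term
expansion a naive symbolic differentiation hands you. A toy sketch, nothing
more, follows.)

    def poly_deriv_horner(coeffs, x):
        # coeffs[k] is the coefficient of x**k; evaluates p'(x) without
        # ever materializing the individual powers of x
        d = 0.0
        for k in range(len(coeffs) - 1, 0, -1):
            d = d * x + k * coeffs[k]
        return d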
>
> So in places where I really need to optimize the derivative code, I
> would still do it by hand, and delegate to an AD system when the
> size gets unwieldy. In theory a good compromise is to let the AD tool
> churn out the code and then hand-optimize it; there, readable output
> does indeed help.
>
> As far as correctness of the computed derivative is concerned,
> computing the dot product between the gradient of a function and the
> secant computed numerically from the function does guard against gross
> errors. If I remember correctly, the scipy optimization module
> already has a function to do such sanity checks. Of course it cannot
> guarantee correctness, but it usually goes a long way.

Right, and what I want is to do those correctness checks once, and
then save the validated derivative function somewhere and know that it
won't break the next time I upgrade some library or make some
seemingly-irrelevant change to the original code.
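
The function you have in mind is presumably scipy.optimize.check_grad,
which compares an analytic gradient against a finite-difference
approximation and returns the norm of the difference. Something like the
following toy check (the objective here is made up purely for illustration)
is exactly what I'd want to run once and then keep next to the hand-written
gradient:

    import numpy as np
    from scipy.optimize import check_grad

    # toy objective f(x) = log(sum(exp(x))) and its hand-written gradient
    def f(x):
        return np.log(np.exp(x).sum())

    def f_grad(x):
        return np.exp(x) / np.exp(x).sum()

    x0 = np.random.randn(5)
    err = check_grad(f, f_grad, x0)  # 2-norm of (analytic - finite difference)
    assert err < 1e-6, err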

-N


