[Numpy-discussion] [ANN]: Taylorpoly, an implementation of vectorized Taylor polynomial operations and request for opinions

Friedrich Romstedt friedrichromstedt at gmail.com
Sun Feb 28 15:06:43 EST 2010


2010/2/28 Sebastian Walter <sebastian.walter at gmail.com>:
>> I think I can use that to make my upy accept arbitrary functions, but
>> how do you apply sin() to a TTP?
>
> perform truncated Taylor expansion of  [y]_D = sin([x]_D), i.e.
> y_d = 1/d! d^d/dt^d  sin( \sum_{k=0}^{D-1} x_k t^k) |_{t=0}
> to obtain an explicit algorithm.
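If I work that out for sin, I arrive at propagating sin and cos
jointly, since each one's derivative brings in the other.  A minimal
sketch of the recurrence as I understand it (utp_sin_cos is my own
name, not taylorpoly's API):

import math

def utp_sin_cos(x):
    """Propagate sin and cos through a truncated Taylor polynomial.

    x = [x_0, ..., x_{D-1}] are the coefficients of t^0 .. t^{D-1};
    returns (s, c) with s = sin(x), c = cos(x) as coefficient lists."""
    D = len(x)
    s, c = [0.0] * D, [0.0] * D
    s[0], c[0] = math.sin(x[0]), math.cos(x[0])
    for d in range(1, D):
        # d * s_d = \sum_{j=1}^{d} j x_j c_{d-j}, and analogously for c
        s[d] = sum(j * x[j] * c[d - j] for j in range(1, d + 1)) / d
        c[d] = -sum(j * x[j] * s[d - j] for j in range(1, d + 1)) / d
    return s, c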
>
>>
>> One more question: You said, t is an "external parameter".  I, and
>> maybe not only me, interpreted this as a complicated name for
>> "variable".  So I assumed it would be a parameter to some method of
>> the TTP.  But it isn't?  It's just the way to define the ring?

I guess you overlooked this question?

>> You could
>> define it the same in Fourier space, except that you have to make the
>> array large enough from the beginning?  Why not do that, and say
>> that your computation relies on the Fourier transform of the
>> representation?  Can this give insight into why TTPs are a ring and
>> why they have zero divisors?
> it has zero divisors because, for instance, the product of the two
> polynomials t and t^{D-1},
> truncated at t^{D-1}, is zero.
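In plain numpy that is just an ordinary convolution truncated at D
coefficients (not the taylorpoly API):

>>> import numpy
>>> D = 4
>>> a = numpy.array([0., 1., 0., 0.])   # t
>>> b = numpy.array([0., 0., 0., 1.])   # t^{D-1}
>>> numpy.convolve(a, b)[:D]            # truncate the product at t^{D-1}
array([ 0.,  0.,  0.,  0.])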

Yes, but I wanted to have a look from the Fourier-space point of view.
Because there everything is just multiplication, and one does not have
to carry out the convolution in one's head.  I have to give up here.
In fact, I do not really understand why my approach also works with
the DFT, and not only analytically with the continuous FT.  Consider
the snippet:

>>> p1 = gdft_polynomials.Polynomial([1])
>>> p1.get_dft(3)
array([ 1.+0.j,  1.+0.j,  1.+0.j])
>>> p2 = gdft_polynomials.Polynomial([0, 1])
>>> p2.get_dft(3)
array([ 1.0+0.j       , -0.5+0.8660254j, -0.5-0.8660254j])
>>> p2.get_dft(4)
array([  1.00000000e+00 +0.00000000e+00j,
         6.12303177e-17 +1.00000000e+00j,
        -1.00000000e+00 +1.22460635e-16j,  -1.83690953e-16 -1.00000000e+00j])

As one increases the number of padding zeros, one increases the
Fourier-space resolution, without affecting the result:

>>> p3 = gdft_polynomials.Polynomial([0, 1, 0, 0, 0])
>>> print p2 * p2
Polynomial(real part =
[  1.85037171e-16  -7.40148683e-17   1.00000000e+00]
imaginary part =
[  2.59052039e-16   7.40148683e-17   0.00000000e+00])
>>> print p2 * p3
Polynomial(real part =
[  1.66533454e-16   1.48029737e-16   1.00000000e+00  -7.40148683e-17
  -4.44089210e-16  -3.70074342e-17]
imaginary part =
[  9.25185854e-17   1.48029737e-16   2.96059473e-16   1.11022302e-16
  -3.70074342e-16  -1.44497045e-16])
>>>

It's a bit of a mystery to me.  Of course, one can argue, well, the
DFT preserves information, and thus one can "feel" that it should
work, but this is only a gut feeling.
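My best guess is this: the plain DFT product corresponds to a
*circular* convolution, and once the arrays are zero-padded to length
len(a) + len(b) - 1 or more, all wrap-around terms fall onto padding
zeros, so the circular convolution coincides with the ordinary one.  A
minimal sketch in plain numpy (fft_multiply is my own name, not
gdft_polynomials'):

import numpy

def fft_multiply(a, b, D):
    """Truncated polynomial product via the DFT.

    Padding to n = len(a) + len(b) - 1 makes the circular convolution
    computed by the DFT coincide with the linear one."""
    n = len(a) + len(b) - 1
    fa = numpy.fft.fft(a, n)
    fb = numpy.fft.fft(b, n)
    return numpy.fft.ifft(fa * fb).real[:D]

# fft_multiply([0., 1.], [0., 1.], 3) gives [0, 0, 1] up to rounding,
# i.e. t * t = t^2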

>>>  E.g. if you have a function
>>> f: R^N -> R
>>> x -> y=f(x)
>>> where x = [x1,...,xN]
>>>
>>> and you want to compute the gradient g(x) of f(x), then you can compute
>>> df(x)/dxn by propagating  the following array of Taylor polynomials:
>>>
>>> x = numpy.array([UTPS([x1_0, 0]), ..., UTPS([xn_0, 1]), ...,
>>> UTPS([xN_0, 0])], dtype=object)
>>> y = f(x)
>>
>> So what is the result of applying f to some UTPS instance: is it a
>> plain number, or is it a UTPS again?  How do you calculate it?
>>
>> Can one calculate the derivative of some function using your method at
>> a certain point without knowledge of the analytical derivative?  I
>> guess that's the purpose.
> Yes, that's the whole point: Obtaining (higher order) derivatives of
> functions at machine precision for which no symbolic representation is
> readily available.
> That includes computer codes with recursions (e.g. for loops) that are
> a no-go for symbolic differentiation. Supposedly (I've never done
> that) you can even differentiate Monte Carlo simulations in that way.
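To check my understanding: even a first-order-only version of this
works with a few lines of operator overloading.  A toy sketch of my
own (not Sebastian's UTPS), differentiating through a for loop:

class Dual(object):
    """Toy first-order UTPS: carries [value, first derivative]."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        if not isinstance(other, Dual):
            other = Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        if not isinstance(other, Dual):
            other = Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def f(x):
    y = x
    for _ in range(3):   # a loop: a no-go for symbolic differentiation
        y = y * y
    return y             # y = x**8

x = Dual(2.0, 1.0)       # seed derivative 1, i.e. differentiate w.r.t. x
y = f(x)
print y.val, y.dot       # 256.0 1024.0, i.e. (2**8, 8 * 2**7)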

http://en.wikipedia.org/wiki/Automatic_differentiation:
"Both classical methods have problems with calculating higher
derivatives, where the complexity and errors increase. Finally, both
classical methods are slow at computing the partial derivatives of a
function with respect to many inputs, as is needed for gradient-based
optimization algorithms. Automatic differentiation solves all of these
problems."

Yeah.

Note at this point that there is no chance for your project to be
integrated into scipy, because you maybe HAVE TO PUBLISH UNDER GPL/CPL
(ADOL-C is licensed GPL or CPL).  I cannot find CPL on
www.opensource.org, but I guess it has been renamed to CPAL?  Anyway,
CPAL looks long enough to be GPL style ;-).  I also published my
projects under GPL first, and have now switched to MIT, because
Python, numpy, scipy, matplotlib, ... are published under BSD-kind
licenses too, and in fact I like MIT/BSD more.  Please check that you
aren't violating GPL-style licenses by publishing under BSD style.

>>>  E.g. if you have a function
>>> f: R^N -> R
>>> x -> y=f(x)
>>> where x = [x1,...,xN]
>>>
>>> and you want to compute the gradient g(x) of f(x), then you can compute
>>> df(x)/dxn by propagating  the following array of Taylor polynomials:
>>>
>>> x = numpy.array([UTPS([x1_0, 0]), ..., UTPS([xn_0, 1]), ...,
>>> UTPS([xN_0, 0])], dtype=object)
>>> y = f(x)

But doesn't the call f(x) with x.shape == (N,) result in an array too?
Yet you want a scalar number?

>>> if you want to have the complete gradient, you will have to repeat N
>>> times. Each time for the same zero'th coefficients [x1,...,xN].
>>>
>>> Using the vectorized version, you would do only one propagation:
>>> x = numpy.array([UTPS([x1_0, 1,0,...,0]), ..., UTPS([xn_0,
>>> 0,...,1,...,0]), ..., UTPS([xN_0, 0,...,1])], dtype=object)
>>> y = f(x)
>>>
>>> i.e. you save the overhead of calling the same function N times.
>>
>> Ok, I understand.  Today it's too late, I will reason tomorrow about it.

I think I've grasped the idea.  But the thing is really tricky.

I thought:
So UTPS is not the single thing you implemented; rather, you
implemented the complete array.  Right?
But maybe that's wrong:

UTPS([x1_0, 1, 0, ..., 0]) is with D = 1 and P = N (f: R^N -> R).
I.e., P = N polynomials of degree 1, for calculating the first-order
derivative?  That's why your question (1) from Feb 27: what to hand
over?  I would say, make it possible to hand over a (P, N) ndarray.
It will increase the impact (and graspability) of your module
dramatically.  And you get an indication of how to interpret the
array handed over, without additional init args to UTPS().
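Concretely, for the gradient case the (P, N) array would just be the
identity seed.  Something like (hypothetical constructor, only to
illustrate the shapes, not the actual taylorpoly API):

import numpy

N = 3
x0 = numpy.array([1.0, 2.0, 3.0])   # nominal point
S = numpy.eye(N)                    # (P, N) seed, P = N unit directions

# hypothetical:
# x = UTPS(x0, seed=S)   # instead of N separate UTPS([x_i0, ..., e_i]) calls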

I think the nominal value is always stored in x_{i, 0}, am I wrong?

I'm not sure what to use as initial UTPSs.  Can you make it possible
that one doesn't have to think about that?  What would be great: if I
have a target function f(a, b, c, ...) stored, to hand over objects
from your package instead of ordinary numbers, and have everything
work out such that I end up with a result object storing both the
nominal value and the gradient.  Is that feasible?  You could also
monkey-patch numpy.sin etc. with a replacement that calls the original
numpy.sin on the nominal values but also does the TTP job.
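Roughly like this (a toy sketch reusing the Dual class from above; the
patching of the real numpy.sin is just for illustration):

import math
import numpy

_numpy_sin = numpy.sin

def patched_sin(x):
    # do the first-order TTP job for Dual objects, defer to numpy else
    if isinstance(x, Dual):
        # chain rule: d/dt sin(x(t)) = cos(x_0) * x'(t) at first order
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)
    return _numpy_sin(x)

numpy.sin = patched_sin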

>> I guess it's not overhead.  The number of calculations should break
>> even at very low D, am I wrong?  And you win by not having to
>> compile a C library, needing only plain-text Python code.
>
> Well, I'm really no expert on the DFT.  But doesn't the DFT compute on
> the complex numbers?  You'll have extra overhead (let's say a factor >= 2?)
> And as far as I can tell, you do the computations on padded arrays,
> which possibly introduces cache misses (maybe another factor of 2?)

What are "cache misses"?

> Isn't the advantage of the DFT that you can use the FFT, which
> would reduce the runtime from O(D^2) to O(D log(D))?
> I'm pretty sure that only pays off for D larger than 10.

Your algorithm stays at O(D^2) since you do the convolution by hand, no?
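One could measure the crossover directly, something like (plain-numpy
sketch, parameters arbitrary):

import timeit

setup = """
import numpy
D = %d
a = numpy.random.rand(D)
b = numpy.random.rand(D)
"""

for D in (4, 16, 64, 256):
    t_conv = timeit.timeit('numpy.convolve(a, b)[:D]',
                           setup=setup % D, number=1000)
    t_fft = timeit.timeit('numpy.fft.ifft(numpy.fft.fft(a, 2 * D - 1)'
                          ' * numpy.fft.fft(b, 2 * D - 1)).real[:D]',
                          setup=setup % D, number=1000)
    print D, t_conv, t_fft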

>> E.g., your
>> optimisation package is quite interesting for me, but I'm using
>> Windows as my main system, so it will be painful to compile.  And the
>> code is more straightforward, more concise, easier to maintain and
>> easier to understand ... :-)  I really do not want to diminish your
>> programming skills, please do not misunderstand!  I only mean the
>> subject matter.
> The project uses scons, which is available for Windows as binaries.
> I haven't tried it myself, but I'm confident that it's a 1-minute job
> on Windows.

The optimisation package or utp?  I want to give utp a try.

> I have implemented some of the algorithms just as you explained in
> another package
> (http://github.com/b45ch1/algopy/blob/master/algopy/utp/utps.py).
> But I don't think the code looks easier to maintain than the C code
> and it's also slower.

Can you explain which repo I should best clone?  What are the
differences between algopy, taylorpoly and pyadolc?

I'm slowly getting a bit confused with all the mails in this thread,
and the subject isn't that easy, either ...
http://en.wikipedia.org/wiki/Automatic_differentiation also refers to
TTPs only in a marginal note :-(

Friedrich


