[SciPy-dev] Generic polynomials class (was Re: Volunteer for Scipy Project)

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Oct 14 14:28:20 EDT 2009


On Wed, Oct 14, 2009 at 11:08 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Tue, Oct 13, 2009 at 10:52 PM, <josef.pktd at gmail.com> wrote:
>>
>> On Tue, Oct 13, 2009 at 10:38 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> >
>> >
>> > On Tue, Oct 13, 2009 at 7:24 PM, Anne Archibald
>> > <peridot.faceted at gmail.com>
>> > wrote:
>> >>
>> >> 2009/10/13 Charles R Harris <charlesr.harris at gmail.com>:
>> >> >
>> >> >
>> >> > On Tue, Oct 13, 2009 at 5:55 PM, Anne Archibald
>> >> > <peridot.faceted at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> 2009/10/13 Charles R Harris <charlesr.harris at gmail.com>:
>> >> >> >
>> >> >> >> I'm not sure I see why returning all those NotImplementedErrors
>> >> >> >> is a
>> >> >> >> problem - yes, the code needs to be written to override the
>> >> >> >> methods
>> >> >> >> that return NotImplementedError, but that code needs to be
>> >> >> >> written
>> >> >> >> no
>> >> >> >> matter how we set up an interface to it.
>> >> >> >
>> >> >> > It's not trivial to do a complete workup. My template class is 698
>> >> >> > lines
>> >> >> > long, but it exists once in one place, and that does the trick for
>> >> >> > *all*
>> >> >> > the
>> >> >> > classes.
>> >> >>
>> >> >> Aha. I think I see where the confusion is. In my proposed interface,
>> >> >> the coefficient-oriented functions would *be* the ones in the Basis
>> >> >> object. So, for example,
>> >> >>
>> >> >> chebadd(c1,c2,interval) -> ChebyshevBasis(interval).add(c1, c2)
>> >> >>
>> >> >> or, more likely, you'd have somewhere earlier
>> >> >>
>> >> >> cheb = ChebyshevBasis(interval)
>> >> >>
>> >> >> and then it's
>> >> >>
>> >> >> chebadd(c1,c2,interval) -> cheb.add(c1,c2)
>> >> >>
>> >> >> So the coefficient-oriented interface that we need to have is what
>> >> >> lives in the basis objects, and it is what the Polynomial object
>> >> >> uses
>> >> >> to implement (e.g.) p1*p2. There would be no separate chebmul,
>> >> >> polymul, lagrangemul, ....
>> >> >>
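A minimal sketch of the interface Anne describes, assuming a hypothetical
ChebyshevBasis class that carries the interval as context (the class and
method names are illustrative, not an existing API):

```python
# Hypothetical sketch: the coefficient-oriented functions live on a
# Basis object, which also holds the context (the interval) they need.
class ChebyshevBasis:
    def __init__(self, interval=(-1.0, 1.0)):
        self.interval = interval

    def add(self, c1, c2):
        # Coefficient-wise addition; pad the shorter series with zeros.
        n = max(len(c1), len(c2))
        return [(c1[i] if i < len(c1) else 0.0)
                + (c2[i] if i < len(c2) else 0.0) for i in range(n)]

cheb = ChebyshevBasis(interval=(0.0, 2.0))
print(cheb.add([1.0, 2.0], [3.0, 4.0, 5.0]))  # [4.0, 6.0, 5.0]
```

So `chebadd(c1, c2, interval)` becomes `cheb.add(c1, c2)`, with the
interval supplied once at construction instead of on every call.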
>> >> >
>> >> > But the separate functions are what we want. They provide the most
>> >> > flexible
>> >> > low level implementation and make the fewest assumptions as to what
>> >> > the
>> >> > programmer might want to use them for. You don't really have a base
>> >> > class,
>> >> > you essentially have classes derived from different base classes,
>> >> > which
>> >> > isn't so different from what I am proposing.
>> >>
>> >> Do we really want the separate functions, rather than methods on a
>> >> basis object? Why? I don't really see how they're more flexible or
>> >> make fewer assumptions, and they become very cumbersome (and prevent
>> >> some important optimizations) when working with polynomials in the
>> >> Lagrange basis. It really can be the difference between chebadd(...)
>> >> and cheb.add(...), in the (common?) case where people aren't
>> >
>> > But why do that if you already have the name space when you import the
>> > module? It's redundant.
>> >
>> >>
>> >> specifying the interval. And if people do want to specify the
>> >> interval, using a basis object makes it much easier to avoid
>> >> forgetting it and getting nonsensical results (or worse, sensible
>> >> results until you try a different interval).
>> >>
>> >
>> > That is because they are low level building blocks; they should have the
>> > absolute minimum of crud stuck on, and that means the interval is
>> > [-1, 1]. If you need more, that is what a class is for. That's how low
>> > level things should work: small functions doing the minimum amount
>> > provide the most flexibility. The programmer should be handed enough
>> > rope to hang themselves at that level if they so choose. That's why
>> > folks use C and not Pascal.
>> >
>> >>
>> >> In this respect, the current interface in chebyshev.py looks
>> >> error-prone to me: you can't supply an interval to chebadd and chebmul,
>> >> but if you forget it for chebval you get bogus answers. (And in fact
>> >> chebint and chebder are wrong, or at least misleading, if you use a
>> >> different interval.)
>> >>
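To illustrate the pitfall: Chebyshev coefficient functions implicitly assume
x in [-1, 1], so evaluating on another interval [a, b] requires mapping the
argument first. A self-contained sketch (clenshaw and chebval_on are
illustrative names, not the module's functions):

```python
def clenshaw(c, x):
    """Evaluate sum_k c[k]*T_k(x) by the Clenshaw recurrence, x in [-1, 1]."""
    b1 = b2 = 0.0
    for ck in reversed(c[1:]):
        b1, b2 = ck + 2.0 * x * b1 - b2, b1
    return c[0] + x * b1 - b2

def chebval_on(c, x, a, b):
    # Map x in [a, b] onto t in [-1, 1] before evaluating; forgetting
    # this step is exactly what produces the bogus answers above.
    t = (2.0 * x - (a + b)) / (b - a)
    return clenshaw(c, t)

# T_1(t) = t, so on [0, 2] the series [0, 1] is just x - 1.
print(chebval_on([0.0, 1.0], 1.5, 0.0, 2.0))  # 0.5
```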
>> >
>> > Tracking things like intervals is best delegated to higher level
>> > functions
>> > where you can deal with it without worrying about the low level
>> > implementation at the same time. Of course there is some loop-de-loop
>> > involved to get to the ideal dividing line, but it is important to draw
>> > that
>> > line.
>> >
>> >>
>> >> I'm not sure what you mean when you say I don't really have a base
>> >> class: Basis is literally the base class for all Basis objects, and it
>> >> provides nontrivial functionality (e.g. the power implementation) that
>> >> would otherwise need to be repeated for each representation.
>> >> Polynomial is the base class for all polynomials.
>> >>
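As an illustration of the kind of nontrivial shared functionality a Basis
base class could provide: a generic power built only on each subclass's mul,
via binary exponentiation (Basis, PowerBasis, and the method names here are
hypothetical):

```python
class Basis:
    def mul(self, c1, c2):
        raise NotImplementedError  # each representation supplies this

    def pow(self, c, n):
        """c**n by binary exponentiation, using only the subclass's mul."""
        result = [1.0]      # the constant polynomial 1
        square = list(c)
        while n > 0:
            if n & 1:
                result = self.mul(result, square)
            square = self.mul(square, square)
            n >>= 1
        return result

class PowerBasis(Basis):
    def mul(self, c1, c2):
        # Ordinary power-basis convolution.
        out = [0.0] * (len(c1) + len(c2) - 1)
        for i, a in enumerate(c1):
            for j, b in enumerate(c2):
                out[i + j] += a * b
        return out

print(PowerBasis().pow([1.0, 1.0], 3))  # [1.0, 3.0, 3.0, 1.0]
```

Every representation gets pow for free; only mul has to be written per basis.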
>> >> >> The Basis object is effectively just a namespace holding the
>> >> >> collection of functions, and possibly any data they need (interval,
>> >> >> for example). So the separation you're describing is still present
>> >> >> in
>> >> >> my proposed interface; it's the separation between Basis objects
>> >> >> (which contain the functions but no particular plumbing) and the
>> >> >> Polynomial objects (which are all plumbing).
>> >> >
>> >> > Why not let the module hold the functions? It is the more natural
>> >> > namespace
>> >> > for python.
>> >>
>> >> Because there is important context - intervals, lists of specified
>> >> points - that must be provided to all the coefficient-array functions.
>> >> And collections of methods on objects are very natural in python.
>> >>
>> >
>> > That is higher level stuff and doesn't belong at the bottom; it should
>> > be separate, IMHO. The functions are the ingredients, not the cake. I've
>> > spent
>> > too much time pulling apart classes to get to the reusable bits not to
>> > want
>> > them separated out up front. Now you could start with the separate
>> > functions
>> > and then construct a basis to pass into the initializer. At that point
>> > you
>> > and I are operating along the same lines except I don't pass them to the
>> > constructor, I put them in the local environment where the class methods
>> > will
>> > find them, that's what the closure does and Python handles that
>> > efficiently. They are consequently more like static (class) methods than
>> > instance methods. That way I end up with a whole new class that uses
>> > those
>> > functions, inherits from nothing but object, and has a minimal
>> > constructor.
>> >
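The factory idea Chuck describes, stamping out a complete class from free
functions placed in the enclosing environment (so the methods find them as
closures) and inheriting from nothing but object, might be sketched like this
(make_poly_class and the helper functions are illustrative, not the actual
template code):

```python
def make_poly_class(name, add, mul):
    # The methods pick up add/mul from the enclosing scope: a closure,
    # acting more like static methods than instance methods.
    class Poly:
        def __init__(self, coef):
            self.coef = list(coef)
        def __add__(self, other):
            return Poly(add(self.coef, other.coef))
        def __mul__(self, other):
            return Poly(mul(self.coef, other.coef))
    Poly.__name__ = name
    return Poly

def polyadd(c1, c2):
    n = max(len(c1), len(c2))
    pad = lambda c: c + [0.0] * (n - len(c))
    return [a + b for a, b in zip(pad(c1), pad(c2))]

def polymul(c1, c2):
    out = [0.0] * (len(c1) + len(c2) - 1)
    for i, a in enumerate(c1):
        for j, b in enumerate(c2):
            out[i + j] += a * b
    return out

Polynomial = make_poly_class("Polynomial", polyadd, polymul)
p = Polynomial([1.0, 1.0]) * Polynomial([1.0, -1.0])
print(p.coef)  # [1.0, 0.0, -1.0]
```

A Chebyshev class would come from the same factory with chebadd/chebmul
passed in; the class itself inherits only from object.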
>> > I'll admit that my class implementation may not be suitable for some of
>> > the
>> > other types of polynomials you are looking at, I hadn't been thinking
>> > about
>> > them yet. On the other hand it is a pretty complete implementation that
>> > requires nothing more than the functions, raises no NotImplementedErrors
>> > and
>> > should work with most of the orthogonal polynomial bases. And it can be
>> > reworked without reference to the low level functions.
>>
>> How expensive can _cseries_to_zseries, as_series, and similar helpers be?
>> Since you currently have all the main calculations in self-contained
>> functions, you need to convert and check the inputs each time.  If
>> these were expensive intermediate results, then methods could save
>> time here. For short series this doesn't seem to be a problem with
>> polycheb, but inside a loop I'm not so sure.
>> (As an aside: we were also thinking about moving some methods in
>> statsmodels to functions, but in many cases we would need a larger
>> number of function arguments for intermediate results, or duplicate
>> calculations, which makes it less useful.)
>>
>
> One reason for the low level functions is that they are short and can be
> easily converted to cython functions or be library functions written in
> fortran or whatever. The z series could be done that way, but at the cython
> level they could probably be dispensed with and the underlying Chebyshev
> series used instead, that's one reason they are private functions. What they
> brought to the python case was some vectorization of the algorithms. Another
> way to look at them is as symbolic Fourier transforms.
>
> Finding the proper collection of small routines can be hard work, which
> seems like it might be the case in statsmodels. Sometimes profiling will
> reveal small bits that are bottlenecks and they are potential candidates.
> Another view that helps is to separate the data shuffling and verification
> to the python level and think of that as an interface to some underlying
> routines in a faster language. I don't know enough about statsmodels to make
> any more useful suggestions ;)
>
>>
>> I was playing a bit with the cats, and inheritance seems to work
>> without problems.
>>
>
> Things can be done with inheritance, but in this case the need was for a
> common mass produced interface and the desire to avoid work. Inheritance
> isn't the only way to build interfaces, unless it happens to be the only
> tool at hand. Templated classes are another way that is closer to the duck
> typing model and they seem to work better in some circumstances. For
> instance, in the C++ STL templated classes are provided so that you can
> have, say, lists of different (uniform) types, but the lists all have the
> same methods. That works better than trying to use inheritance. I think that
> in general the templated class method is preferable because the design is
> flatter and more flexible, and it doesn't use hokey virtual base classes and
> such. Inheritance works best when you need to call a bunch of different
> objects through a common base class, i.e., the objects are all "is a" base
> class, and I don't think that requirement is there for the polynomial
> classes. The temptation of inheritance is to use it for implementation, and
> more often than not that leads to trouble down the line. The upshot is that
> I was looking for the python equivalent of templated classes, and closures
> looked to be the best bet, so I gave it a shot to see how it looked. I think
> it looks OK, but the functional approach is a bit strange to folks like
> myself who have spent most of our lives working with imperative languages.

<a bit off topic>

I didn't know polynomials are going into mass production.

I like subclasses, I think I would have a much harder time keeping track of
90 stats.distribution classes if they were generated with a factory function
than as subclasses. The current structure keeps all the information of each
distribution together and I don't have to chase down many individual
functions. It is also relatively easy to tell whether a problem is in the
generic superclass (interface) code or in the individual distributions.

I tried to convert the cats example to using metaclasses and it looks ok.
Your class factory approach or the metaclasses look useful for generating
non-linear transformations of distributions on the fly, where the
transformation functions are not known in advance.

Josef

>
>> Following roughly the discussion, I would find it useful if the
>> functions specify the used or assumed domain in the doc strings. Since
>> I'm not very familiar with Chebyshev polynomials, I wouldn't know
>> which functions would require rescaling of the domain.
>>
>
> Yes, that should be documented.
>
> Chuck
>
>
> _______________________________________________
> Scipy-dev mailing list
> Scipy-dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>


