[SciPy-dev] time series implementation approach

Bruce Southey bsouthey at gmail.com
Tue Dec 12 17:32:47 EST 2006


Hi,
I guess my real concern is how either of these implementations is
going to be used and abused.

How does either implementation handle unevenly spaced points (both
within and between series), and what happens when the series are in
different units (such as days versus weeks)?

I read in Pierre's email that 'the series are regularly spaced', but
I do think you need to address this sooner rather than later, because it
may expose major flaws in one implementation. With uneven spacing, you
probably need a sparse structure to avoid allocating storage for periods
that have no observations.
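To illustrate the resource point: a sparse series stores only the observed (period, value) pairs, whereas a regularly spaced array must allocate a slot for every period in its span. The `SparseSeries` class below is a hypothetical sketch, and summing only on shared periods is an assumed rule, not something either proposal specifies.

```python
from dataclasses import dataclass, field


@dataclass
class SparseSeries:
    """Hypothetical sparse series: only observed periods are stored."""
    points: dict = field(default_factory=dict)  # period index -> value

    def __add__(self, other):
        # Assumed rule: sum only where both series have an observation.
        common = self.points.keys() & other.points.keys()
        return SparseSeries({p: self.points[p] + other.points[p] for p in sorted(common)})

    def dense_length(self):
        """Slots a regularly spaced array covering the same span would need."""
        if not self.points:
            return 0
        return max(self.points) - min(self.points) + 1
```

With three observations spread over 5001 periods, the sparse form holds three entries while a dense array would need 5001 slots, most of them empty or masked.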

With different units, one of the series must be converted either by
the user or by code (which could be rather complex to get correct).
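As a small example of what such conversion code has to decide, the sketch below aggregates a daily series (modelled as a plain `{datetime.date: value}` dict, an assumption for illustration) into ISO weeks. The use of ISO weeks, the hypothetical `how` parameter, and the silent acceptance of partial weeks are all assumptions; each is a choice a library would have to make and document.

```python
import datetime
from collections import defaultdict


def daily_to_weekly(daily, how='mean'):
    """Aggregate {datetime.date: value} into {(iso_year, iso_week): value}.

    Partial weeks at either end are aggregated as-is (an assumption).
    """
    buckets = defaultdict(list)
    for day, value in daily.items():
        iso = day.isocalendar()          # (ISO year, ISO week, ISO weekday)
        buckets[(iso[0], iso[1])].append(value)
    if how == 'mean':
        return {wk: sum(v) / len(v) for wk, v in sorted(buckets.items())}
    if how == 'sum':
        return {wk: sum(v) for wk, v in sorted(buckets.items())}
    raise ValueError(f'unknown aggregation: {how!r}')


# Ten consecutive days starting Monday 2024-01-01: one full week, one partial week.
days = {datetime.date(2024, 1, 1) + datetime.timedelta(days=i): float(i) for i in range(10)}
weekly = daily_to_weekly(days)
```

The second week here contains only three days, so its mean and sum are not comparable to the first week's without some further rule; that kind of edge case is what makes conversion hard to get right.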

Also, what do you really mean by 'blah = series1 + series2'?
Do you mean concatenation (as with strings), summation (as with
numbers), or some sort of merging of values?
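The three readings give different answers on the same data, which is why the semantics of `+` need pinning down. The sketch below models a series as a `{period: value}` dict (an assumption) and implements each reading; the function names are hypothetical, and the overlap and tie-breaking rules are assumed.

```python
def concatenate(a, b):
    """Append b after a, like string concatenation; overlapping periods are an error."""
    if a.keys() & b.keys():
        raise ValueError('series overlap; concatenation is ambiguous')
    return dict(sorted({**a, **b}.items()))


def add_values(a, b):
    """Numeric summation on the periods both series share."""
    return {p: a[p] + b[p] for p in sorted(a.keys() & b.keys())}


def merge(a, b):
    """Union of periods; where both have a value, b's value wins (assumed rule)."""
    return dict(sorted({**a, **b}.items()))


series1 = {1: 1.0, 2: 2.0}
series2 = {2: 10.0, 3: 30.0}
```

Here summation yields `{2: 12.0}`, merging yields `{1: 1.0, 2: 10.0, 3: 30.0}`, and concatenation refuses because period 2 appears in both series.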

From a different angle, Implementation A looks easier to maintain
and troubleshoot than Implementation B.
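To make the maintainability comparison concrete, here is a minimal sketch of Implementation A, assuming NumPy's `numpy.ma.MaskedArray` as the base class. `Date`, `TimeSeries`, and `align` are hypothetical names for illustration, not the code in Matt's sandbox module. Almost everything is inherited; the only custom pieces are Date-based lookup and an explicit alignment step that the user must call.

```python
import datetime
from dataclasses import dataclass

import numpy as np
import numpy.ma as ma

ORIGIN = datetime.date(1850, 1, 1)  # assumed origin for daily periods


@dataclass(frozen=True)
class Date:
    """Hypothetical minimal Date: a frequency plus periods since ORIGIN."""
    freq: str
    value: int

    @classmethod
    def daily(cls, year, month, day):
        return cls('d', (datetime.date(year, month, day) - ORIGIN).days)

    def __int__(self):
        return self.value


class TimeSeries(ma.MaskedArray):
    """Masked array that remembers the Date of its first element."""

    def __new__(cls, data, start, mask=ma.nomask):
        obj = super().__new__(cls, data, mask=mask, dtype=float)
        obj.start = start
        return obj

    def __array_finalize__(self, obj):
        super().__array_finalize__(obj)
        # Views inherit the parent's start; this sketch does not shift it for slices.
        self.start = getattr(obj, 'start', None)

    def __getitem__(self, key):
        if isinstance(key, Date):
            if key.freq != self.start.freq:
                raise ValueError('Date frequency does not match the series')
            key = int(key) - int(self.start)  # position relative to the start date
            if not 0 <= key < len(self):
                raise IndexError('Date outside the series')
        return super().__getitem__(key)


def align(a, b):
    """Pad two same-frequency series to a common date range, masking the padding."""
    if a.start.freq != b.start.freq:
        raise ValueError('series must share a frequency')
    lo = min(int(a.start), int(b.start))
    hi = max(int(a.start) + len(a), int(b.start) + len(b))
    aligned = []
    for s in (a, b):
        data = np.zeros(hi - lo)
        mask = np.ones(hi - lo, dtype=bool)
        off = int(s.start) - lo
        data[off:off + len(s)] = s.filled(0)
        mask[off:off + len(s)] = ma.getmaskarray(s)
        aligned.append(TimeSeries(data, Date(s.start.freq, lo), mask=mask))
    return aligned


s1 = TimeSeries([1.0, 2.0, 3.0], Date.daily(1999, 1, 1))
s2 = TimeSeries([10.0, 20.0], Date.daily(1999, 1, 2))
x, y = align(s1, s2)
total = x + y  # masked where either input had no data
```

Because a plain slice keeps its parent's `start` in this sketch, a fuller version would need to shift it; that bookkeeping is visible and debuggable here, but the user still has to remember to call `align`.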

Also, Implementation B appears to me to have a larger overhead than
Implementation A, as many things have to happen 'behind the scenes'.
However, it may have the advantage of being slow in only a few places
and much quicker in general.
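The 'behind the scenes' work in Implementation B, and where its overhead comes from, can be sketched as a class that maps absolute integer indices onto a hidden NumPy buffer and reallocates the buffer whenever a read or write falls outside it. `ShiftingArray` borrows Matt's name, but this is a hypothetical sketch rather than his code, and representing never-set positions as NaN is an assumption.

```python
import numpy as np


class ShiftingArray:
    """Absolute integer indexing over a hidden, automatically resized buffer.

    Positions that were never set read as NaN (an assumption of this sketch).
    """

    def __init__(self, values=(), offset=0):
        self._offset = offset                        # absolute index of _data[0]
        self._data = np.asarray(values, dtype=float).copy()

    @property
    def bounds(self):
        """Absolute [start, stop) range currently held in the hidden buffer."""
        return self._offset, self._offset + len(self._data)

    def _grow(self, lo, hi):
        """Reallocate the buffer, if needed, so it covers absolute indices [lo, hi)."""
        cur_lo, cur_hi = self.bounds
        if len(self._data) == 0:
            cur_lo = cur_hi = lo
        new_lo, new_hi = min(lo, cur_lo), max(hi, cur_hi)
        if (new_lo, new_hi) == (cur_lo, cur_hi):
            return
        buf = np.full(new_hi - new_lo, np.nan)       # the cost: allocate and copy
        buf[cur_lo - new_lo:cur_hi - new_lo] = self._data
        self._offset, self._data = new_lo, buf

    def _window(self, lo, hi):
        """Copy of absolute indices [lo, hi), NaN wherever the buffer has no data."""
        out = np.full(hi - lo, np.nan)
        cur_lo, cur_hi = self.bounds
        s_lo, s_hi = max(lo, cur_lo), min(hi, cur_hi)
        if s_lo < s_hi:
            out[s_lo - lo:s_hi - lo] = self._data[s_lo - cur_lo:s_hi - cur_lo]
        return out

    def __getitem__(self, key):
        if isinstance(key, slice):
            if key.start is None or key.stop is None or key.step not in (None, 1):
                raise IndexError('this sketch supports only [start:stop] slices')
            self._grow(key.start, key.stop)          # requested bounds resize the buffer
            return ShiftingArray(self._window(key.start, key.stop), key.start)
        self._grow(key, key + 1)
        return self._data[key - self._offset]

    def __setitem__(self, key, value):
        self._grow(key, key + 1)
        self._data[key - self._offset] = value

    def __add__(self, other):
        # Align both operands on the union of their bounds, with no user intervention.
        lo = min(self.bounds[0], other.bounds[0])
        hi = max(self.bounds[1], other.bounds[1])
        return ShiftingArray(self._window(lo, hi) + other._window(lo, hi), lo)


s1 = ShiftingArray([1.0, 2.0, 3.0], offset=100)
s2 = ShiftingArray([10.0, 20.0], offset=101)
total = s1 + s2
```

Every out-of-range access pays for a reallocation in `_grow`, and every `+` builds two padded copies; in exchange, index 101 means the same period in `s1`, `s2`, and `total`.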

Many of the time series methods can be applied to series other than
time series. Are you going to allow other types of series?

Regards
Bruce

On 12/12/06, Matt Knox <mattknox_ca at hotmail.com> wrote:
>
> Hi everyone. I have been discussing the approach I've used for my time
> series module (available in the sandbox) with another reader of this mailing
> list, and there is one particular issue that we seem to disagree on that I
> would like to hear some other thoughts on, if anyone has any opinions one
> way or the other.
>
> I'm going to just outline the two proposed approaches and highlight some
> pros/cons of each. And I'm certainly open to hearing about a completely
> different approach altogether if you have ideas.
>
> Common to both implementations is a Date class, where each Date has a
> frequency (daily, monthly, business days, etc.) and a value. The value
> represents periods since the origin, where the origin is taken to be some
> chosen fixed date (currently 1st period in the year 1850). Also, every time
> series object has a frequency, and a starting date, in both proposed
> implementations.
>
> == Implementation A ==
>
> Subclass masked array. This allows use of all the currently available
> functions and methods for masked arrays and minimizes the amount of
> custom internals that actually need to be written. Indexing for the
> timeseries object is always done relative to the start and end dates of
> the series. So for example, if series1.start == '1 jan 1999' (shown as a
> string here for clarity, but not implemented as a string), and this was a
> daily frequency series, then series1[0] represents the value at Jan 1,
> 1999, series1[2] represents the value at Jan 3, 1999, etc.
>
> The __getitem__ and __setitem__ methods would be overridden to additionally
> accept a Date object of the same frequency as the series, so you could do
> something like:
>
>     jan25val = series1[Date(freq='d', year=1999, month=1, day=25)]
>
> Functions would be provided to take multiple series and align them
> appropriately so they can be added together, and so forth.
>
> The drawback of this approach (relative to the next one to be discussed) is
> that an index used for one series has no inherent meaning for any other
> series unless you have explicitly aligned them ahead of time. Doing something
> like foo = series1[5:25] + series2[5:25] doesn't make any sense unless
> you are careful to align the two series beforehand.
>
> == Implementation B ==
>
> Construct a new class (let's call it ShiftingArray) that has no inherent
> size. It stores an underlying data array that is hidden from the user, and
> when points outside the bounds of this underlying array are requested, the
> array is dynamically resized to accommodate the new bounds. Index X means
> the same thing for any ShiftingArray. If I add two shifting arrays, they are
> aligned appropriately behind the scenes with no user intervention. The
> TimeSeries class is then constructed as a subclass of ShiftingArray. This
> makes it possible to do things like the following:
>
> startDate = Date(freq='d',year=1999,month=1,day=25)
> endDate = startDate + 50
>
> mySlice = slice(int(startDate),int(endDate))
> foo1 = series1[mySlice]
> foo2 = series2[mySlice]
>
> blah = series1 + series2
>
> without worrying about where series1 and series2 start and end.
>
> A problem with this approach is that there is more overhead than with
> subclassing masked array, because the dynamic shifting has a cost and
> existing functions will have to be wrapped in order to act on the time
> series objects. The internals of the class will be more complicated, but it
> takes away some micro-management from the user.
>
> ======================
>
> So... I realize this was a bit long-winded, and for that I apologize, but if
> you have any thoughts on the subject, please share.
>
> Thanks,
>
> - Matt Knox
>
>
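The Date representation common to both proposals (a frequency plus an integer count of periods since the first period of 1850) can be sketched as below. Only daily and monthly frequencies are handled, and `Date.from_ymd` is a hypothetical constructor, not the sandbox module's API. Storing a plain integer is what lets expressions like `startDate + 50` and `int(startDate)` in the quoted example reduce to ordinary arithmetic.

```python
import datetime
from dataclasses import dataclass

ORIGIN_YEAR = 1850  # the first period of this year is period 0


@dataclass(frozen=True)
class Date:
    freq: str   # 'd' (daily) or 'm' (monthly) in this sketch
    value: int  # periods elapsed since the first period of ORIGIN_YEAR

    @classmethod
    def from_ymd(cls, freq, year, month, day=1):
        if freq == 'd':
            elapsed = datetime.date(year, month, day) - datetime.date(ORIGIN_YEAR, 1, 1)
            return cls(freq, elapsed.days)
        if freq == 'm':
            return cls(freq, (year - ORIGIN_YEAR) * 12 + (month - 1))
        raise ValueError(f'unsupported frequency: {freq!r}')

    def __add__(self, periods):
        # Adding an integer moves the date by that many periods of its own frequency.
        return Date(self.freq, self.value + periods)

    def __int__(self):
        return self.value


startDate = Date.from_ymd('d', 1999, 1, 25)
endDate = startDate + 50
mySlice = slice(int(startDate), int(endDate))
```

Because a Date is just (frequency, integer), the same integer means the same period in every series of that frequency, which is the property Implementation B relies on.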


