[SciPy-dev] time series implementation approach

David Douard david.douard at logilab.fr
Wed Dec 13 04:55:32 EST 2006


Hi,

I have not read the proposed implementations carefully enough, so I won't
give my POV on them.
I have coded a pure Python time series object for a project I'm working on
right now. I attach the code here.

The TS class I've coded is basically a non-regular TS: the time vector
is kept as a simple float numpy array, with values representing gmticks
(mx.DateTime).
A TS can be either linearly interpolated or a step TS. Arithmetic on TS does
not require them to share the same time support.
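
To give an idea of what this means, here is a minimal sketch. It is not the
attached code: the class name IrregularTS, its methods, and the choice to
clamp to the end values outside a series' time support are all assumptions
made for illustration. The times are plain floats standing in for gmticks.

```python
import numpy as np


class IrregularTS:
    """Hypothetical irregular time series: sorted float times plus values."""

    def __init__(self, times, values, kind="linear"):
        self.times = np.asarray(times, dtype=float)
        self.values = np.asarray(values, dtype=float)
        if kind not in ("linear", "step"):
            raise ValueError("kind must be 'linear' or 'step'")
        self.kind = kind

    def at(self, t):
        """Evaluate the series at times t.

        Assumption: outside the time support the end values are held.
        """
        t = np.asarray(t, dtype=float)
        if self.kind == "linear":
            return np.interp(t, self.times, self.values)
        # Step series: hold the most recent sample at or before t.
        idx = np.searchsorted(self.times, t, side="right") - 1
        return self.values[np.clip(idx, 0, len(self.values) - 1)]

    def __add__(self, other):
        # Evaluate both operands on the union of their time vectors,
        # so they do not need to share the same time support.
        times = np.union1d(self.times, other.times)
        # Assumption: the result keeps the kind of the left operand.
        return IrregularTS(times, self.at(times) + other.at(times), self.kind)


a = IrregularTS([0.0, 10.0], [0.0, 10.0])              # linear ramp
b = IrregularTS([5.0, 15.0], [1.0, 1.0], kind="step")  # constant step
c = a + b
print(c.times)   # union of both time vectors
print(c.values)
```

Resampling both operands on the union of their time vectors keeps every
original sample point in the result, at the cost of a longer time vector
after each operation.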

This is just in case someone could be interested...

Note that this is unfinished WiP, probably buggy code, etc. 

On Tue, Dec 12, 2006 at 03:15:43PM -0500, Matt Knox wrote:
> Hi everyone. I have been discussing the approach I've used for my time series module (available in the sandbox) with another reader of this mailing list, and there is one particular issue that we seem to disagree on that I would like to hear some other thoughts on, if anyone has any opinions one way or the other.
> I'm going to just outline the two proposed approaches and highlight some pros/cons of each. And I'm certainly open to hearing a completely different approach altogether if you have ideas.
> Common to both implementations is a Date class, where each Date has a frequency (daily, monthly, business days, etc) and a value. The value represents periods since the origin, where the origin is taken to be some chosen fixed date (currently 1st period in the year 1850). Also, every time series object has a frequency, and a starting date, in both proposed implementations.
> == Implementation A ==
> Sub-class masked array. This allows usage of all the currently available functions and methods for masked array and minimizes the amount of work needing to be done on actually writing any custom internals. Indexing for the timeseries object is always done relative to the start and end dates of the series.  So for example, if series1.start == '1 jan 1999' (shown as a string here for clarity, but not implemented as a string), and this was a daily frequency series, then series1[0] represents Jan 1, 1999, series1[2] represents the value at Jan 3, 1999, etc...
> The __getitem__ and __setitem__ methods would be overridden to additionally accept a Date object of the same frequency as the series, so you could do something like: jan25val = series1[Date(freq='d',year=1999,month=1,day=25)]
> Functions would be provided to take multiple series and align them appropriately so they can be added together, and so forth.
> The drawback of this approach (relative to the next one to be discussed) is that an index used for one series has no inherent meaning to any other series unless you explicitly aligned them ahead of time. Doing something like: foo = series1[5:25] + series2[5:25] doesn't make any sense unless you are careful to align the two series beforehand.
>  
> == Implementation B ==
> Construct a new Class (let's call it ShiftingArray) that has no inherent size. It stores an underlying data array that is hidden from the user, and when points outside the bounds of this underlying array are requested, the array is dynamically resized to accommodate these new bounds. Index X means the same thing for any ShiftingArray. If I add two shifting arrays, they are aligned appropriately behind the scenes with no user intervention. The TimeSeries class is then constructed as a sub-class of ShiftingArray. This makes it possible to do things like the following:
> startDate = Date(freq='d',year=1999,month=1,day=25)
> endDate = startDate + 50
> mySlice = slice(int(startDate),int(endDate))
> foo1 = series1[mySlice]
> foo2 = series2[mySlice]
> blah = series1 + series2
> without worrying about where series1 and series2 start and end.
> A problem with this approach is that there is more overhead than just subclassing masked array because the dynamic shifting has a cost, and existing functions will have to be wrapped in order to act on the time series objects. The internals of the Class will be more complicated, but it takes away some micro-management from the user.
> ======================
> So... I realize this was a bit long-winded, and for that I apologize, but if you have any thoughts on the subject, please share.
> Thanks,
> - Matt Knox
> _______________________________________________
> Scipy-dev mailing list
> Scipy-dev at scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-dev
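
The alignment step discussed in the quoted message (explicit in
Implementation A, done behind the scenes in Implementation B) could look
roughly like the sketch below. This is only an illustration under stated
assumptions: the align function and its signature are hypothetical, each
series is reduced to an integer start period plus a list of values, and
periods where a series has no data are masked.

```python
import numpy.ma as ma


def align(start1, data1, start2, data2):
    """Pad two series onto their common span of absolute period indices.

    Hypothetical helper: each series is given as (start period, values).
    Periods not covered by a series are masked in its padded copy.
    """
    start = min(start1, start2)
    end = max(start1 + len(data1), start2 + len(data2))
    padded = []
    for s, d in ((start1, data1), (start2, data2)):
        arr = ma.masked_all(end - start, dtype=float)
        # Assigning unmasked values clears the mask for those positions.
        arr[s - start:s - start + len(d)] = d
        padded.append(arr)
    return start, padded[0], padded[1]


# series1 covers periods 3..5, series2 covers periods 4..6
start, a, b = align(3, [1.0, 2.0, 3.0], 4, [10.0, 20.0, 30.0])
total = a + b   # masked wherever either series has no data
print(start, total)
```

With Implementation A the user would call something like this before adding
two series; with Implementation B the same work would happen inside the
addition itself.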


-- 
David Douard                             LOGILAB, Paris (France)
Formations Python, Zope, Plone, Debian : http://www.logilab.fr/formations
Développement logiciel sur mesure :      http://www.logilab.fr/services
Informatique scientifique :              http://www.logilab.fr/science
-------------- next part --------------
A non-text attachment was scrubbed...
Name: timeseries.py
Type: text/x-python
Size: 14200 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20061213/4143cc65/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_timeseries.py
Type: text/x-python
Size: 10180 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20061213/4143cc65/attachment-0001.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20061213/4143cc65/attachment.sig>