[SciPy-dev] time series implementation approach

Matt Knox mattknox_ca at hotmail.com
Tue Dec 12 18:20:54 EST 2006


> How does either implementation handles unevenly spaced points (both
> within and between series) and when the series are in different units
> (such as days versus weeks)?
 
> I read from Pierre's email that 'the series are regularly spaced' but
> I do think you need to address it sooner than later because it may
> show major flaws in one implementation. With uneven spacing, you
> probably need a sparse structure to avoid wasting resources.
> With different units, one of the series must be converted either by
> the user or by code (which could be rather complex to get correct).
 
This would be handled the same way in either implementation. I have already written functions in C that convert a series from one frequency to another specified frequency. Yes, these conversion functions are somewhat complex to write, but that only needs to be done once. The conversion method is user-specifiable (e.g. if going from daily to monthly, you can specify whether to average the values for each month, sum them, etc.).

In either approach the underlying data is just a masked array, so missing values would simply be masked. What the indices of the series represent depends on the frequency of the series. So if the series has monthly frequency and index x represents January 2005, then index x+1 represents February 2005, and so on.
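To make the daily-to-monthly case concrete, here is a rough sketch in Python using numpy.ma. It is not the C code mentioned above; the function name convert_daily_to_monthly, its arguments, and the integer month labels (e.g. 200501) are all made up for illustration. It only shows the idea of a user-specified reduction applied per month, with masked (missing) values ignored.

```python
import numpy as np
import numpy.ma as ma


def convert_daily_to_monthly(values, months, func=ma.mean):
    """Collapse daily observations into one value per month.

    Hypothetical helper, not part of the sandbox module.
    values : masked array of daily observations
    months : integer month label per observation (e.g. 200501)
    func   : reduction applied to each month (ma.mean, ma.sum, ...)
    """
    values = ma.asarray(values)
    months = np.asarray(months)
    labels = np.unique(months)
    # Start fully masked; a month stays masked if all its days are missing.
    out = ma.masked_all(len(labels), dtype=float)
    for i, label in enumerate(labels):
        out[i] = func(values[months == label])
    return labels, out


# Two days in January 2005, two in February 2005; one February value missing.
daily = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, False, True, False])
month_of_day = [200501, 200501, 200502, 200502]

labels, monthly_mean = convert_daily_to_monthly(daily, month_of_day, ma.mean)
_, monthly_sum = convert_daily_to_monthly(daily, month_of_day, ma.sum)
print(labels, monthly_mean, monthly_sum)
```

Passing the reduction as an argument is one simple way to let the user choose between averaging, summing, and so on; the real conversion functions may expose this differently.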
 
> Also, what you really mean by 'blah = series1 + series2'?
> Do you mean concatenation as with strings, or summation as with
> numbers, or some sort of merging of values?
 
I mean element-wise addition.
 
For example:
 
Suppose series1 has the following data:
 
jan 1, 2005 = 1
jan 2, 2005 = 1
jan 3, 2005 = 1
 
and series2 has the following data:
 
jan 2, 2005 = 1
jan 3, 2005 = 1
jan 4, 2005 = 1
 
then blah = series1 + series2 would give:
 
jan 2, 2005 = 2
jan 3, 2005 = 2
 
Behind the scenes, series1 and series2 are resized so that their indices match up, and then the underlying masked arrays are just added together as normal. If you want a clearer idea of how the current design works, you can take a look at my example script in the scipy sandbox:
http://svn.scipy.org/svn/scipy/trunk/Lib/sandbox/timeseries/examples/
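The alignment step in the example above can be sketched roughly as follows. This is only an illustration, not the sandbox code: the function add_aligned is hypothetical, and it assumes each series is daily and described by an integer index for its first point (here the proleptic ordinal from datetime.date) plus a masked array of values. It trims both series to their overlapping range, which reproduces the result shown in the example.

```python
from datetime import date

import numpy.ma as ma


def add_aligned(start1, data1, start2, data2):
    """Element-wise sum of two daily series over their common date range.

    Hypothetical helper: start1/start2 are the integer indices of the
    first point of each series; data1/data2 hold the values.
    """
    data1 = ma.asarray(data1, dtype=float)
    data2 = ma.asarray(data2, dtype=float)
    start = max(start1, start2)
    end = min(start1 + len(data1), start2 + len(data2))
    if end <= start:
        # No overlapping dates: return an empty series.
        return start, ma.masked_all(0, dtype=float)
    a = data1[start - start1:end - start1]
    b = data2[start - start2:end - start2]
    # Plain masked-array addition; a value masked in either input stays masked.
    return start, a + b


series1_start = date(2005, 1, 1).toordinal()   # jan 1 .. jan 3
series2_start = date(2005, 1, 2).toordinal()   # jan 2 .. jan 4
start, blah = add_aligned(series1_start, [1, 1, 1], series2_start, [1, 1, 1])
print(date.fromordinal(start), blah)
```

Because the indices are plain integers tied to the frequency, aligning two series reduces to slicing by offsets before the ordinary masked-array addition.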
 
> Many of the time series methods can be applied to other series than
> just time. Are you going to allow other types of series?
 
There's no reason not to, but right now I'm mainly concerned with figuring out a good structure for the TimeSeries and Date classes that will provide a good foundation to build on.
 
- Matt

