[SciPy-user] [timeseries] Missing dates

Pierre GM pgmdevlist at gmail.com
Fri Apr 3 19:23:21 EDT 2009


On Apr 3, 2009, at 6:46 PM, Christiaan Putter wrote:

> Hi guys,
>
> I've been playing with timeseries for the last hour or so and it's
> pretty cool.  Still lots of things I have to go through.

Write your experience down and keep us posted; that'd be great
material for a FAQ.

> In the one plotting example (using yahoo finance) I saw that one can
> fill missing dates before plotting so that the missing ones get
> masked.  Though when applying some moving windows functions that
> caused all periods that were affected by the missing values to also
> become masked, which isn't the behaviour I was expecting.  It does
> make sense to do it that way though.
>
> I'm working with stock prices, so the "missing" dates over the weekends
> will increase file size by more than 30%. Is there any other reason to
> fill in missing dates besides plotting?


Most functions (like convert or align_series) require consecutive
dates, hence the need for fill_missing_dates. This function ensures
that your dates are all consecutive, and that the values corresponding
to the initially missing dates are set to the constant `masked`.
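To make the idea concrete without depending on scikits.timeseries, here is a minimal sketch of what filling missing dates amounts to, assuming the dates are represented as plain integer month numbers. `fill_missing_periods` is a hypothetical helper for illustration only, not the library's implementation of fill_missing_dates.

```python
import numpy as np
import numpy.ma as ma


def fill_missing_periods(periods, values):
    """Hypothetical helper, NOT part of scikits.timeseries.

    Given strictly increasing integer periods (e.g. month numbers) and
    their values, return a consecutive range of periods and a masked
    array in which the periods that were absent are masked.
    """
    periods = np.asarray(periods)
    values = np.asarray(values)
    full = np.arange(periods[0], periods[-1] + 1)
    # Start with every entry masked...
    filled = ma.masked_all(full.shape, dtype=values.dtype)
    # ...then assigning the observed values unmasks those positions.
    filled[periods - periods[0]] = values
    return full, filled


# Months 1-7 and 10-12 observed; August and September missing.
months = [1, 2, 3, 4, 5, 6, 7, 10, 11, 12]
full, filled = fill_missing_periods(months, np.arange(10))
```

Here `full` covers months 1 through 12, and the entries of `filled` for months 8 and 9 are masked while the ten observed values are kept in order.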

For the case of a moving average (for example), if you don't fill the
dates, you may end up grouping values separated by more than your
window size, which may not give the results you'd expect.
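A small sketch of that point, again assuming integer months and using a hypothetical trailing `moving_mean` helper (not the scikits.timeseries moving-window functions). On the unfilled data a 3-month window silently averages July, October and November; on the filled data every window that touches the gap is masked, which is the behaviour described in the question.

```python
import numpy as np
import numpy.ma as ma


def moving_mean(x, window):
    """Hypothetical trailing moving average, for illustration only.

    Entry i averages x[i-window+1 : i+1]. It stays masked if the window
    is incomplete (the first window-1 entries) or contains a masked value.
    """
    x = ma.asarray(x, dtype=float)
    out = ma.masked_all(x.shape, dtype=float)
    for i in range(window - 1, len(x)):
        chunk = x[i - window + 1:i + 1]
        if not ma.getmaskarray(chunk).any():
            out[i] = chunk.mean()
    return out


values = np.arange(10.0)  # months 1-7 and 10-12, Aug/Sep missing

# Without filling: the window ending in November spans Jul, Oct, Nov.
raw = moving_mean(values, 3)

# With filling: Aug and Sep are masked placeholders in a 12-month series.
filled = ma.array(np.r_[values[:7], 0.0, 0.0, values[7:]],
                  mask=[0] * 7 + [1, 1] + [0] * 3)
smoothed = moving_mean(filled, 3)
```

In `smoothed`, August through November come out masked because each of their windows includes August or September, while December (averaging October to December) is valid again.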

For the case of plotting: when plotting a series w/ missing dates
using a line between the points, you'll connect the existing points
(e.g., Friday with Monday). If you want a separation between Fridays
and Mondays, fill the dates first.
OK, one picture is worth a lot of words, so compare the two plots:
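>>> import numpy as np, numpy.ma as ma
>>> import scikits.timeseries as ts, scikits.timeseries.lib.plotlib as tpl
>>> # Monthly data for 2001 with August and September missing
>>> s = ts.time_series(np.arange(10), dates=ts.date_array(['2001-%02i' % i for i in (1,2,3,4,5,6,7,10,11,12)], freq='M'))
>>> tpl.tsplot(s, 'o-b')  # the line joins July directly to October
>>> tpl.tsplot(s.fill_missing_dates(), 's-r')  # the line breaks over the masked months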


> The question I'm trying to get at, though, is: if I'm going to store my
> timeseries in hdf5, should I fill in the missing dates before I do so, or
> only do that whenever I plot the timeseries?

If you need to save space, don't save the series w/ missing data. You  
can use the `compressed` method to get rid of those before saving, and  
use fill_missing_dates after loading.
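A minimal sketch of that save/load round trip, assuming integer month periods and using `np.savez` into an in-memory buffer as a stand-in for the HDF5 file (an assumption for illustration; the thread doesn't say how the data is stored). Only the observed points are written, and the masked gaps are rebuilt after loading.

```python
import io

import numpy as np
import numpy.ma as ma

# A filled monthly series: periods 1..12, August and September masked.
periods = np.arange(1, 13)
series = ma.array(np.arange(12.0), mask=[0] * 7 + [1, 1] + [0] * 3)

# Before saving: keep only the observed points (what `compressed` does
# for the values; the matching periods are selected with the mask).
keep = ~ma.getmaskarray(series)
obs_periods = periods[keep]
obs_values = series.compressed()

# Save/load round trip. np.savez into memory stands in for HDF5 here
# (assumption for illustration only).
buf = io.BytesIO()
np.savez(buf, periods=obs_periods, values=obs_values)
buf.seek(0)
stored = np.load(buf)
loaded_periods = stored["periods"]
loaded_values = stored["values"]

# After loading: rebuild the consecutive range and mask the gaps again,
# the role fill_missing_dates plays for a real time series.
full = np.arange(loaded_periods[0], loaded_periods[-1] + 1)
restored = ma.masked_all(full.shape, dtype=float)
restored[loaded_periods - full[0]] = loaded_values
```

Only 10 of the 12 months are written out, and `restored` ends up with the same observed values and the same masked months as `series`.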
