[SciPy-user] [timeseries] Missing dates

Pierre GM pgmdevlist at gmail.com
Sat Apr 4 13:54:46 EDT 2009


On Apr 4, 2009, at 1:41 PM, Christiaan Putter wrote:
>
> I'm trying to wrap my head around how the different frequencies
> behave...   Correct me if I'm wrong. Using the yahoo example as a
> reference:  A timeseries with daily frequency for one year does not
> need to have a date value for every day in that year in its date
> array.

Correct. For a TimeSeries, the dates don't have to be consecutive, nor
even in chronological order...

> But when it gets plotted the index (x-axis) runs over the
> entire year, and a line plot will simply connect all the dots
> basically as if it were linearly interpolating the values for the
> missing dates.

Correct again. You are just plotting a set of (date, value) pairs.

>  Thus even though the frequency is expecting values for
> every day, the process of masking the missing dates needs to be done
> explicitly.

Correct.


> Which is done by fill_missing_dates(), which then adds a
> date to our array for every date that the frequency expects, and sets
> the mask to true for those dates that weren't in the array initially.

Correct
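
As a rough illustration of what that filling step amounts to, here is a
minimal pure-Python sketch. It is not the scikits.timeseries
implementation: the helper name fill_missing_daily is hypothetical, and
None plus a separate boolean list stand in for a masked array.

```python
from datetime import date, timedelta

def fill_missing_daily(dates, values):
    """Hypothetical helper: expand (dates, values) to every calendar day
    between the earliest and latest date, flagging the added days."""
    lookup = dict(zip(dates, values))
    start, end = min(dates), max(dates)
    n_days = (end - start).days + 1
    all_dates = [start + timedelta(days=i) for i in range(n_days)]
    # Days that were absent from the input get None and mask=True.
    all_values = [lookup.get(d) for d in all_dates]
    mask = [d not in lookup for d in all_dates]
    return all_dates, all_values, mask

# Input dates are neither consecutive nor sorted, which is allowed.
dates = [date(2009, 4, 3), date(2009, 4, 1), date(2009, 4, 6)]
values = [10.5, 10.0, 11.0]
filled_dates, filled_values, mask = fill_missing_daily(dates, values)
```

The result is sorted and has one entry per day; only the entries with a
False mask carry data that was actually observed.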

> So when I'm using a business frequency the index on plots doesn't
> behave like a calendar, but mondays immediately follow fridays.

Yes.


>  And
> the only dates 'missing' are in fact the holidays as you pointed out,
> which we need to add explicitly so that we can mask them.

Yes, and this is something *you* have to do, depending on your own
definition of holidays.
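
A small sketch of the idea, in plain Python rather than with the
timeseries package: the business-day index skips weekends by
construction, and the holidays are masked from a calendar you supply.
The business_days helper and the one-entry holiday set are assumptions
for illustration only.

```python
from datetime import date, timedelta

def business_days(start, end):
    """Hypothetical helper: Monday-to-Friday dates from start to end,
    inclusive."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if d.weekday() < 5]  # 0=Monday ... 4=Friday

# User-defined holiday calendar (assumed): Good Friday, April 10, 2009.
holidays = {date(2009, 4, 10)}

index = business_days(date(2009, 4, 6), date(2009, 4, 14))
# True marks a business day that should be masked as a holiday.
mask = [d in holidays for d in index]
```

In this index the Monday sits directly after the Friday, so the holiday
is the only position left to mask.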
>
> The way I've been working with my data up until now was purely looking
> at trading days, ignoring weekends, holidays etc.  So any analysis
> that takes time into account basically has 'trading day' as its unit
> of time.  This makes some things simpler.  close[-11] would always be
> the closing price 10 trading days ago.  Rate of change or other
> indicators aren't affected by long holidays, though that's just a
> minor issue.
>
> The disadvantage is obviously that indexing the array with an actual
> date becomes a bit harder and you always need to do a search.
>
> What would really be the advantage if I were to use 'business day' as
> my frequency?

* should you want to compute statistics over months or years
* should you want to plot some graphs where the axis ticks are  
automatically adjusted depending on the zoom level.
* should you need to find data falling between two given dates...
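
To show what two of these look like when the dates are stored next to
the values, here is a stdlib-only sketch. The function names and the
sample prices are made up for illustration; a date-aware series type
would offer this kind of lookup and grouping directly.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date

# Made-up closing prices on consecutive trading days (sorted by date).
dates = [date(2009, 3, 30), date(2009, 3, 31), date(2009, 4, 1),
         date(2009, 4, 2), date(2009, 4, 3)]
close = [10.0, 12.0, 11.0, 13.0, 12.0]

def between(dates, values, start, end):
    """Values whose date falls in [start, end]; dates must be sorted."""
    lo = bisect_left(dates, start)
    hi = bisect_right(dates, end)
    return values[lo:hi]

def monthly_mean(dates, values):
    """Mean value per (year, month)."""
    buckets = defaultdict(list)
    for d, v in zip(dates, values):
        buckets[(d.year, d.month)].append(v)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

window = between(dates, close, date(2009, 3, 31), date(2009, 4, 2))
means = monthly_mean(dates, close)
```

With a plain "trading day" integer index, both operations first need a
search to translate dates into positions, which is the cost mentioned
in the question.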


>
>
> And another question:  Do you use the closing price from yahoo or the
> adjusted closing price?  It seems they use the adjusted prices
> themselves, though I've come across one or two graphs where they
> didn't.

I'll let Matt answer that...

> Thanks for all your advice Matt and Pierre.

Quite welcome.


