[SciPy-user] [timeseries] Missing dates
Pierre GM
pgmdevlist at gmail.com
Sat Apr 4 13:54:46 EDT 2009
On Apr 4, 2009, at 1:41 PM, Christiaan Putter wrote:
>
> I'm trying to wrap my head around how the different frequencies
> behave... Correct me if I'm wrong. Using the yahoo example as a
> reference: A timeseries with daily frequency for one year does not
> need to have a date value for every day in that year in its date
> array.
Correct. For a TimeSeries, the dates don't have to be consecutive, or
even in chronological order.
> But when it gets plotted the index (x-axis) runs over the
> entire year, and a line plot will simply connect all the dots
> basically as if it were linearly interpolating the values for the
> missing dates.
Correct again. You are just plotting a set of (date, value) pairs.
> Thus even though the frequency is expecting values for
> every day, the process of masking the missing dates needs to be done
> explicitly.
Correct.
> Which is done by fill_missing_dates(), which then adds a
> date to our array for every date that the frequency expects, and sets
> the mask to true for those dates that weren't in the array initially.
Correct
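
As a rough illustration of that idea (a plain-Python sketch, not the actual
scikits.timeseries code), the following builds the full daily axis and a
matching mask. The helper name fill_missing_days and the sample data are
hypothetical, and None stands in for a masked value.

```python
from datetime import date, timedelta

def fill_missing_days(dates, values):
    """Return (all_dates, all_values, mask) covering every calendar day
    from min(dates) to max(dates). mask[i] is True where the input had
    no observation for that day."""
    observed = dict(zip(dates, values))
    start, end = min(dates), max(dates)
    n_days = (end - start).days + 1
    all_dates = [start + timedelta(days=i) for i in range(n_days)]
    all_values = [observed.get(d) for d in all_dates]  # None where missing
    mask = [d not in observed for d in all_dates]
    return all_dates, all_values, mask

# Hypothetical sample: three observations, unordered, with gaps.
dates = [date(2009, 4, 3), date(2009, 4, 1), date(2009, 4, 6)]
values = [10.5, 10.0, 11.0]
all_dates, all_values, mask = fill_missing_days(dates, values)
```

Here the input need not be sorted or consecutive; the output always is.
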
> So when I'm using a business frequency the index on plots doesn't
> behave like a calendar, but Mondays immediately follow Fridays.
Yes.
> And
> the only dates 'missing' are in fact the holidays as you pointed out,
> which we need to add explicitly so that we can mask them.
Yes, and this is something *you* have to do, depending on your own
definition of holidays.
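
A minimal sketch of that step in plain Python, assuming a user-supplied
holiday set (the helper business_days and the one-entry calendar below are
hypothetical, not part of the timeseries package):

```python
from datetime import date, timedelta

def business_days(start, end):
    """All Monday-to-Friday dates from start to end, inclusive."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if d.weekday() < 5]  # 0=Monday ... 4=Friday

# Hypothetical holiday calendar: Good Friday 2009 fell on April 10.
holidays = {date(2009, 4, 10)}

axis = business_days(date(2009, 4, 8), date(2009, 4, 14))
mask = [d in holidays for d in axis]  # True where the day should be masked
```

The weekend of April 11-12 never appears on the axis, while the holiday
does appear and is flagged by the mask.
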
>
> The way I've been working with my data up until now was purely looking
> at trading days, ignoring weekends, holidays etc. So any analysis
> that takes time into account basically has 'trading day' as its unit
> of time. This makes some things simpler. close[-11] would always be
> the closing price 10 trading days ago. Rate of change or other
> indicators aren't affected by long holidays, though that's just a
> minor issue.
>
> The disadvantage is obviously that indexing the array with an actual
> date becomes a bit harder and you always need to do a search.
>
> What would really be the advantage if I were to use 'business day' as
> my frequency?
* should you want to compute statistics over months or years
* should you want to plot some graphs where the axis ticks are
automatically adjusted depending on the zoom level.
* should you need to find data falling between two given dates...
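
For comparison, here is roughly what the first and last points look like
when done by hand on a plain positional array of trading days. This is
only a sketch using the standard library; the helpers between and
monthly_mean and the prices are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date

# Hypothetical closing prices on sorted trading dates.
dates = [date(2009, 3, 30), date(2009, 3, 31), date(2009, 4, 1),
         date(2009, 4, 2), date(2009, 4, 3)]
close = [10.0, 12.0, 11.0, 13.0, 12.0]

def between(dates, values, start, end):
    """Values whose date falls in [start, end]; dates must be sorted."""
    lo = bisect_left(dates, start)
    hi = bisect_right(dates, end)
    return values[lo:hi]

def monthly_mean(dates, values):
    """Mean value per (year, month) key."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[(d.year, d.month)].append(v)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

window = between(dates, close, date(2009, 3, 31), date(2009, 4, 2))
means = monthly_mean(dates, close)
```

With a date-aware series and a proper frequency, this bookkeeping is what
you no longer have to write yourself.
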
>
>
> And another question: Do you use the closing price from yahoo or the
> adjusted closing price? It seems they use the adjusted prices
> themselves, though I've come across one or two graphs where they
> didn't.
I'll let Matt answer that...
> Thanks for all your advice Matt and Pierre.
Quite welcome.