[SciPy-user] timeseries: logging of defective time series
Pierre GM
pgmdevlist at gmail.com
Mon Feb 2 14:02:57 EST 2009
Timmie,
Remember that the mask is an array of booleans and can therefore be used
for indexing.
I will also assume that your data is 1D.
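To illustrate the mask-as-index idea outside of scikits.timeseries, here is a minimal sketch in plain NumPy. A `datetime64[D]` array stands in for `series.dates` and a `numpy.ma` masked array stands in for the series values; both are assumptions for illustration, not the timeseries objects themselves.

```python
import numpy as np
import numpy.ma as ma

# Stand-in for series.dates (assumption: a plain datetime64 array).
dates = np.array(['2009-01-01', '2009-01-02', '2009-01-03', '2009-01-04'],
                 dtype='datetime64[D]')
# Stand-in for the series values: the third value is masked (missing).
values = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, False, True, False])

# ma.getmaskarray always returns a full boolean array, even when nothing
# is masked (in that case values.mask may be the scalar ma.nomask).
mask = ma.getmaskarray(values)

# Boolean indexing: keep only the dates whose value is masked.
masked_dates = dates[mask]
print(masked_dates)
```

Using `ma.getmaskarray` rather than `.mask` directly avoids surprises when a series has no masked values at all.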
* To find the dates corresponding to the missing (masked) values in your series:
>>> series.dates[series.mask]
* To find the missing dates, first use fill_missing_dates (to make
sure the dates are continuous), then get the missing dates from the
mask of the *filled* series:
>>> filled = ts.fill_missing_dates(series)
>>> filled.dates[filled.mask]
With your example:
>>> mser_1_filled = ts.fill_missing_dates(mser_1)
>>> missing_dates = mser_1_filled.dates[mser_1_filled.mask]
Note that if your initial `series` already has some masked values,
you'll pick up their dates as well. You should then check whether you
have masked values in the first place, find the corresponding dates,
fill the dates, recheck the masked ones, and take the difference
between the two sets of dates.
* To find duplicated dates:
Things get a tad more complicated:
1. make sure that your `series` is sorted chronologically first
2. construct the following array:
>>> d = series.dates
>>> dupcheck = np.r_[False, (d[1:]==d[:-1])]
`dupcheck` is an ndarray of booleans, True where the corresponding
date is the same as the previous one. Note that the first occurrence
of a duplicated date is flagged as False.
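The two steps above can be sketched in plain NumPy, again assuming a `datetime64[D]` array in place of `series.dates`. The stable sort and the final `np.unique` call are additions for illustration: the latter collects each duplicated date exactly once.

```python
import numpy as np

# Stand-in for series.dates (assumption: a plain datetime64 array).
d = np.array(['2009-01-03', '2009-01-01', '2009-01-02', '2009-01-02',
              '2009-01-02', '2009-01-04'], dtype='datetime64[D]')

# Step 1: sort chronologically (a stable sort keeps equal dates in order).
order = np.argsort(d, kind='stable')
d = d[order]

# Step 2: True where a date equals the date just before it.
dupcheck = np.r_[False, d[1:] == d[:-1]]
print(dupcheck)

# The first occurrence of each duplicated date is False in dupcheck,
# so the dates flagged True are enough to list every duplicated date.
duplicated_dates = np.unique(d[dupcheck])
print(duplicated_dates)
```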
Give me a few days to whip up a more usable function that does all
this (I think I already have something along those lines somewhere on
my hard drive).
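Until then, a rough stand-in: the hypothetical helper `find_defects` below (not part of scikits.timeseries) combines the three checks and logs what it finds. It is a minimal sketch assuming a daily, 1D, non-empty series whose dates are given as a NumPy `datetime64[D]` array alongside a `numpy.ma` masked array of values. Absent dates are computed with `np.setdiff1d` against the full daily range instead of calling fill_missing_dates.

```python
import logging
import numpy as np
import numpy.ma as ma

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("defects")


def find_defects(dates, values):
    """Hypothetical helper: report masked-value dates, absent dates and
    duplicated dates of a daily, 1D, non-empty series."""
    # Sort chronologically first, carrying the mask along with the dates.
    order = np.argsort(dates, kind='stable')
    dates = dates[order]
    mask = ma.getmaskarray(values)[order]

    # Dates that are present but whose value is masked.
    masked_dates = dates[mask]

    # Dates absent from the series: the full daily range minus the dates
    # we have (same result as the fill-and-recheck procedure above).
    full_range = np.arange(dates[0], dates[-1] + np.timedelta64(1, 'D'),
                           dtype='datetime64[D]')
    absent_dates = np.setdiff1d(full_range, dates)

    # Duplicated dates: True where a date equals the previous one.
    dupcheck = np.r_[False, dates[1:] == dates[:-1]]
    duplicated_dates = np.unique(dates[dupcheck])

    for label, found in (("masked values", masked_dates),
                         ("missing dates", absent_dates),
                         ("duplicated dates", duplicated_dates)):
        if found.size:
            log.info("%s: %s", label, found)
    return masked_dates, absent_dates, duplicated_dates


# Usage: one masked value, two absent days and one duplicated day.
dates = np.array(['2009-01-01', '2009-01-02', '2009-01-02', '2009-01-05'],
                 dtype='datetime64[D]')
values = ma.array([1.0, 2.0, 2.5, 4.0], mask=[False, True, False, False])
masked, absent, dups = find_defects(dates, values)
```

Returning the three arrays (rather than only logging them) keeps the helper usable for automated checks as well as for log files.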
>
> Such series may have one or more of the following
> properties:
>
> * duplicate dates (ts.time_series.has_duplicated_dates() )
> * missing dates (ts.time_series.has_missing_dates() )
> * masked values (ts.time_series.mask)
has_duplicated_dates and has_missing_dates were not really meant to be
used directly: they are used internally to keep track of some
information about the distribution of the dates.