[SciPy-user] timeseries: logging of defective time series

Tim Michelsen timmichelsen at gmx-topmail.de
Mon Feb 2 13:23:29 EST 2009


Hello,
I have a question about how to effectively log invalid time series.
Such series may have one or more of the following properties:

* duplicate dates (ts.time_series.has_duplicated_dates() )
* missing dates (ts.time_series.has_missing_dates() )
* masked values (ts.time_series.mask)

The calls in parentheses above return only True or False or, for the 
mask, a boolean array.
But I am interested in the actual dates that my series are missing, and 
in the data points that are duplicated or masked (from input).
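Because a mask is aligned element by element with the dates, the masked dates can be recovered by indexing the dates with the mask. A minimal sketch in plain NumPy, assuming the data is held in a `numpy.ma` masked array plus a separate `datetime64` date array (this is not the scikits.timeseries API; the names `dates`, `raw` and `values` and the -9999 NoData marker are hypothetical):

```python
import numpy as np

# Hypothetical monthly dates and readings; -9999 stands in for a NoData marker.
dates = np.array(['2005-01', '2005-02', '2005-03', '2005-04', '2005-05'],
                 dtype='datetime64[M]')
raw = np.array([1.0, -9999.0, -9999.0, 4.0, 5.0])
values = np.ma.masked_values(raw, -9999.0)

# getmaskarray always returns a full boolean array, even if nothing is masked.
mask = np.ma.getmaskarray(values)

masked_dates = dates[mask]   # the "opposite" of compressed(): dates of masked points
valid_dates = dates[~mask]   # dates of the points that compressed() keeps
print(masked_dates)
```

Using `getmaskarray` rather than the `.mask` attribute avoids the case where the mask collapses to a single `False` when no value is masked.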

Could you give me an example of how to retrieve these? I have put some 
demo code with comments below.

Example use cases:
Someone sends you a data file from a datalogger or sensor recording device.
* Due to battery problems, the logger stopped recording for some time 
(=> missing dates). To inspect the device setup, it is important to 
know when this happened and how long the gap lasted.
* The data file may have been reformatted or processed before being 
sent to you. Due to this processing, some timestamps have been saved 
twice or more (=> duplicated dates). To correct this, one would like to 
know where to look in the input files.
* The input file already contains NoData markers. These were used to 
mask data while loading it into Python (=> masked data). For error 
analysis, the date and length of each masked period are important.
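For the first and third use cases the useful output is a list of gaps, each with a start date and a length. A minimal sketch in plain NumPy, assuming monthly dates stored as `datetime64[M]` values (not the scikits.timeseries API; `find_gaps` is a hypothetical helper): build the complete range between the first and last date, flag the months that are absent, and collapse consecutive flags into runs. The same run-finding step works on a boolean mask to report masked periods.

```python
import numpy as np

def find_gaps(flags, full_dates):
    """Return (start_date, length) for each run of True values in `flags`.

    `flags` is a boolean array aligned with `full_dates`
    (e.g. "date is missing" or "value is masked")."""
    # Pad with False on both ends so every run has a rising and a falling edge.
    padded = np.concatenate(([False], flags, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # index where a run begins
    ends = np.flatnonzero(edges == -1)    # index one past where a run ends
    return [(full_dates[s], e - s) for s, e in zip(starts, ends)]

# Same dates as the demo code: Jan-Sep 2005, then Feb-Dec 2006.
observed = np.array(['2005-%02i' % i for i in range(1, 10)] +
                    ['2006-%02i' % i for i in range(2, 13)],
                    dtype='datetime64[M]')

# Complete monthly range from the first to the last observation (inclusive).
full = np.arange(observed.min(), observed.max() + 1)
missing = ~np.isin(full, observed)

print(full[missing])              # the missing months themselves
print(find_gaps(missing, full))   # one gap starting 2005-10, 4 months long
```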

I would appreciate a pointer here.

Regards,
Timmie


#### demo code:
### using the examples from
### http://pytseries.sourceforge.net/core/TimeSeries.html
import numpy as np
import scikits.timeseries as ts

mlist_1 = ['2005-%02i' % i for i in range(1,10)]
mlist_1 += ['2006-%02i' % i for i in range(2,13)]
mdata_1 = np.arange(len(mlist_1))
mser_1 = ts.time_series(mdata_1, mlist_1, freq='M')


mser_1.has_missing_dates()    # -> True
### how do I retrieve a new series which contains only the dates that
### are missing?


## a series with masked values: fill_missing_dates() inserts the
## missing months and masks their data
mser_1_fill = mser_1.fill_missing_dates()
mser_1_fill.mask
# I tried "mser_1_fill.mask", but it returns only the boolean mask
# array; the date information is lost here.
### how do I retrieve a new series which contains only the dates that
### are masked?
### Basically, I seem to be looking for the opposite of
### mser_1_fill.compressed()


mser_1_annual = ts.time_series(mdata_1, mlist_1, freq='A')
mser_daily = mser_1.asfreq('D')

### how do I retrieve a new series which contains only the dates that
### are duplicated?
mser_daily.has_duplicated_dates()    # -> True
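For duplicated dates, counting how often each date occurs gives both the offending dates and where they sit in the input. A minimal sketch in plain NumPy (not the scikits.timeseries API), using a hypothetical hourly logger record in which some timestamps were written more than once:

```python
import numpy as np

# Hypothetical logger timestamps: 01:00 appears twice, 03:00 three times.
stamps = np.array(['2009-01-01T00:00', '2009-01-01T01:00', '2009-01-01T01:00',
                   '2009-01-01T02:00', '2009-01-01T03:00', '2009-01-01T03:00',
                   '2009-01-01T03:00'], dtype='datetime64[m]')

# Count occurrences of each distinct timestamp.
uniq, counts = np.unique(stamps, return_counts=True)
dup_dates = uniq[counts > 1]
dup_counts = counts[counts > 1]

# Row positions of every duplicated entry, to locate them in the input file.
dup_positions = np.flatnonzero(np.isin(stamps, dup_dates))

print(dict(zip(dup_dates.astype(str), dup_counts)))
print(dup_positions)
```

Returning row positions rather than just the dates is what makes it possible to go back to the original file and correct the entries.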



