[Numpy-discussion] fixing up datetime

Wed Jun 8 06:37:38 EDT 2011

Wes McKinney <wesmckinn <at> gmail.com> writes:

> 
> >
> > - Fundamental need to be able to work with multiple time series,
> > especially performing operations involving cross-sectional data
> > - I think it's a bit hard for lay people to use (read: ex-MATLAB/R
> > users). This is just my opinion, but a few years ago I thought about
> > using it and concluded that teaching people how to properly use it (a
> > precision tool, indeed!) was going to cause me grief.
> > - The data alignment problem, best explained in code:
> >

> >
> > - Inability to derive custom offsets:
> >
> > I can do:
> >
> > In [14]: ts.shift(2, offset=2 * datetools.BDay())
> > Out[14]:
> > 2000-01-11 00:00:00    0.0503706684002
> > 2000-01-18 00:00:00    -1.7660004939
> > 2000-01-25 00:00:00    1.11716758554
> > 2000-02-01 00:00:00    -0.171029995265
> > 2000-02-08 00:00:00    -0.99876580126
> > 2000-02-15 00:00:00    -0.262729046405
> >
> > or even generate, say, 5-minutely or 10-minutely date ranges thusly:
> >
> > In [16]: DateRange('6/8/2011 5:00', '6/8/2011 12:00',
> > offset=datetools.Minute(5))
> > Out[16]:
> > <class 'pandas.core.daterange.DateRange'>
> > offset: <5 Minutes>, tzinfo: None
> > [2011-06-08 05:00:00, ..., 2011-06-08 12:00:00]
> > length: 85
> >

It would be nice to have a step argument in the date_array function. The
following works, though:

In [96]: integers = r_[ts.Date('T',"01-Aug-2011 05:00").value:
                       ts.Date('T',"06-Aug-2011 12:01").value:
                       5]

In [97]: ts.date_array(integers, freq='T')
Out[97]:
DateArray([01-Aug-2011 05:00, 01-Aug-2011 05:05, 01-Aug-2011 05:10, ...,
       06-Aug-2011 11:45, 06-Aug-2011 11:50, 06-Aug-2011 11:55, ...,
       06-Aug-2011 12:00],
          freq='T')
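
For comparison, here is a minimal sketch of the same stepping idea in plain
NumPy. It assumes datetime64/timedelta64 stand in for ts.Date; it is not the
scikits.timeseries API, and the variable names are only illustrative. The
r_[a:b:s] slice form used above is equivalent to arange(a, b, s).

```python
import numpy as np

# Assumption: NumPy datetime64 values at minute resolution replace ts.Date('T', ...).
start = np.datetime64("2011-08-01T05:00")
stop = np.datetime64("2011-08-06T12:01")  # exclusive bound, so 12:00 is the last point
step = np.timedelta64(5, "m")

# Five-minutely range built directly from dates.
five_minutely = np.arange(start, stop, step)

# The same range built from integer minute offsets, mirroring the r_ trick.
minute_offsets = np.r_[0:7621:5]
from_offsets = start + minute_offsets.astype("timedelta64[m]")
```

Both arrays should cover 01-Aug-2011 05:00 through 06-Aug-2011 12:00 in
5-minute steps.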

> > - (possible now??) Ability to have a set of frequency-naive dates
> > (possibly not in order).
> >
> > This last point actually matters. Suppose you wanted to get the 5
> > worst-performing days in the S&P 500 index:
> >
> > In [7]: spx.index
> > Out[7]:
> > <class 'pandas.core.daterange.DateRange'>
> > offset: <1 BusinessDay>, tzinfo: None
> > [1999-12-31 00:00:00, ..., 2011-05-10 00:00:00]
> > length: 2963
> >
> > # but this is OK
> > In [8]: spx.order()[:5]
> > Out[8]:
> > 2008-10-15 00:00:00    -0.0903497960942
> > 2008-12-01 00:00:00    -0.0892952780505
> > 2008-09-29 00:00:00    -0.0878970494885
> > 2008-10-09 00:00:00    -0.0761670761671
> > 2008-11-20 00:00:00    -0.0671229140321
> >
> > - W
> >

Seems like you're looking for something like the tssort function defined below:

In [90]: series = ts.time_series(randn(365),start_date='01-Jan-2011',freq='D')

In [91]: def tssort(series):
   ....:     indices = series.argsort()
   ....:     return ts.time_series(series._series[indices],
   ....:                           series.dates[indices],
   ....:                           autosort=False)
   ....:

In [92]: tssort(series)[0:5]
Out[92]:
timeseries([-2.96602612 -2.81612524 -2.61558511 -2.59522921 -2.4899447 ],
   dates = [26-Aug-2011 18-Apr-2011 27-Aug-2011 21-Aug-2011 19-Nov-2011],
   freq  = D)
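
The same argsort idea can be sketched without any time series library, using
two parallel NumPy arrays. The function name sort_by_value and the sample data
are hypothetical, chosen only for illustration:

```python
import numpy as np

def sort_by_value(values, dates):
    """Reorder both arrays so that values ascend; dates end up out of order."""
    order = np.argsort(values, kind="stable")
    return values[order], dates[order]

# Hypothetical data: five daily observations starting 01-Jan-2011.
dates = np.arange(np.datetime64("2011-01-01"),
                  np.datetime64("2011-01-06"),
                  np.timedelta64(1, "D"))
values = np.array([0.3, -1.2, 0.8, -2.5, 0.1])

sorted_values, sorted_dates = sort_by_value(values, dates)

# The two lowest observations and the days they occurred on.
worst_values, worst_dates = sorted_values[:2], sorted_dates[:2]
```

The point is the same as with autosort=False above: the dates follow the
values rather than being forced back into chronological order.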

Pandas seems to offer an even higher-level API than scikits.timeseries. I view
them as mostly complementary, but I haven't (yet) had much experience with
pandas...

-Dave