[Numpy-discussion] fixing up datetime
Dave Hirschfeld
dave.hirschfeld at gmail.com
Wed Jun 8 06:37:38 EDT 2011
Wes McKinney <wesmckinn <at> gmail.com> writes:
>
> >
> > - Fundamental need to be able to work with multiple time series,
> > especially performing operations involving cross-sectional data
> > - I think it's a bit hard for lay people to use (read: ex-MATLAB/R
> > users). This is just my opinion, but a few years ago I thought about
> > using it and concluded that teaching people how to properly use it (a
> > precision tool, indeed!) was going to cause me grief.
> > - The data alignment problem, best explained in code:
> >
> >
> > - Inability to derive custom offsets:
> >
> > I can do:
> >
> > In [14]: ts.shift(2, offset=2 * datetools.BDay())
> > Out[14]:
> > 2000-01-11 00:00:00 0.0503706684002
> > 2000-01-18 00:00:00 -1.7660004939
> > 2000-01-25 00:00:00 1.11716758554
> > 2000-02-01 00:00:00 -0.171029995265
> > 2000-02-08 00:00:00 -0.99876580126
> > 2000-02-15 00:00:00 -0.262729046405
> >
> > or even generate, say, 5-minutely or 10-minutely date ranges thusly:
> >
> > In [16]: DateRange('6/8/2011 5:00', '6/8/2011 12:00',
> > offset=datetools.Minute(5))
> > Out[16]:
> > <class 'pandas.core.daterange.DateRange'>
> > offset: <5 Minutes>, tzinfo: None
> > [2011-06-08 05:00:00, ..., 2011-06-08 12:00:00]
> > length: 85
> >
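For anyone without pandas at hand, the business-day shift in the quoted session can be approximated in plain Python. This is a minimal sketch, not the pandas implementation. `add_business_days` is a hypothetical helper that skips only Saturdays and Sundays (there is no holiday calendar), and reading `shift(2, offset=2 * datetools.BDay())` as a move of 2 x 2 = 4 business days is an assumption.

```python
from datetime import date, timedelta


def add_business_days(d: date, n: int) -> date:
    """Move d forward (n > 0) or backward (n < 0) by n weekdays.

    Hypothetical helper: only Saturdays and Sundays are skipped,
    holidays are ignored.
    """
    step = 1 if n >= 0 else -1
    remaining = abs(n)
    while remaining:
        d += timedelta(days=step)
        if d.weekday() < 5:  # Monday=0 ... Friday=4 count as business days
            remaining -= 1
    return d


# Assumed reading of shift(2, offset=2 * BDay()): 2 periods of 2 business days.
for d in [date(2000, 1, 4), date(2000, 1, 7)]:
    print(d, "->", add_business_days(d, 2 * 2))
```

A 5-minute or 10-minute range is the same idea with a fixed `timedelta` step instead of a weekday-aware one.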
It would be nice to have a step argument in the date_array function. The
following works though:
In [96]: integers = r_[ts.Date('T',"01-Aug-2011 05:00").value:
ts.Date('T',"06-Aug-2011 12:01").value:
5]
In [97]: ts.date_array(integers, freq='T')
Out[97]:
DateArray([01-Aug-2011 05:00, 01-Aug-2011 05:05, 01-Aug-2011 05:10, ...,
06-Aug-2011 11:45, 06-Aug-2011 11:50, 06-Aug-2011 11:55, ...,
06-Aug-2011 12:00],
freq='T')
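The step argument wished for above amounts to a half-open range over datetimes. Here is a minimal sketch in plain Python that assumes nothing about the scikits.timeseries API. `stepped_range` is a hypothetical helper, and it excludes `stop` just as the `r_[start:stop:5]` slice does, which is why the stop is given as 12:01 so that 12:00 is included.

```python
from datetime import datetime, timedelta
from typing import Iterator


def stepped_range(start: datetime, stop: datetime,
                  step: timedelta) -> Iterator[datetime]:
    """Yield start, start + step, ... while the value is < stop (half-open)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current = start
    while current < stop:
        yield current
        current += step


dates = list(stepped_range(datetime(2011, 8, 1, 5, 0),
                           datetime(2011, 8, 6, 12, 1),
                           timedelta(minutes=5)))
print(dates[0], dates[-1], len(dates))
```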
> > - (possible now??) Ability to have a set of frequency-naive dates
> > (possibly not in order).
> >
> > This last point actually matters. Suppose you wanted to get the 5
> > worst-performing days in the S&P 500 index:
> >
> > In [7]: spx.index
> > Out[7]:
> > <class 'pandas.core.daterange.DateRange'>
> > offset: <1 BusinessDay>, tzinfo: None
> > [1999-12-31 00:00:00, ..., 2011-05-10 00:00:00]
> > length: 2963
> >
> > # but this is OK
> > In [8]: spx.order()[:5]
> > Out[8]:
> > 2008-10-15 00:00:00 -0.0903497960942
> > 2008-12-01 00:00:00 -0.0892952780505
> > 2008-09-29 00:00:00 -0.0878970494885
> > 2008-10-09 00:00:00 -0.0761670761671
> > 2008-11-20 00:00:00 -0.0671229140321
> >
> > - W
> >
Seems like you're looking for something like the following tssort function:
In [90]: series = ts.time_series(randn(365),start_date='01-Jan-2011',freq='D')
In [91]: def tssort(series):
   ....:     indices = series.argsort()
   ....:     return ts.time_series(series._series[indices],
   ....:                           series.dates[indices],
   ....:                           autosort=False)
   ....:
In [92]: tssort(series)[0:5]
Out[92]:
timeseries([-2.96602612 -2.81612524 -2.61558511 -2.59522921 -2.4899447 ],
dates = [26-Aug-2011 18-Apr-2011 27-Aug-2011 21-Aug-2011 19-Nov-2011],
freq = D)
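The key step in tssort is that a single permutation, the argsort of the values, is applied to both the values and the dates. Each observation therefore keeps its own date even though the result is no longer in chronological order. Below is a stdlib-only sketch of that idea. The name `sort_by_value` is hypothetical, and the numbers are toy values, not real S&P 500 data.

```python
from datetime import date


def sort_by_value(dates, values):
    """Return (dates, values) reordered by ascending value.

    Computes the permutation that sorts the values (an argsort) and
    applies that same permutation to the dates.
    """
    order = sorted(range(len(values)), key=values.__getitem__)  # argsort
    return [dates[i] for i in order], [values[i] for i in order]


# Toy daily returns, not real market data.
dates = [date(2011, 1, 3), date(2011, 1, 4), date(2011, 1, 5), date(2011, 1, 6)]
returns = [0.01, -0.03, 0.02, -0.01]
worst_dates, worst_returns = sort_by_value(dates, returns)
print(worst_dates[:2], worst_returns[:2])
```

Taking the first five entries of the sorted lists gives the five worst days, which is what `order()[:5]` and `tssort(series)[0:5]` do in the sessions above.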
Pandas seems to offer an even higher-level API than scikits.timeseries. I view
them as mostly complementary, but I haven't (yet) had much experience with
pandas...
-Dave