[Numpy-discussion] fixing up datetime

Wed Jun 8 06:46:23 EDT 2011

On Wed, Jun 8, 2011 at 6:37 AM, Dave Hirschfeld
<dave.hirschfeld at gmail.com> wrote:
> Wes McKinney <wesmckinn <at> gmail.com> writes:
>
>>
>> >
>> > - Fundamental need to be able to work with multiple time series,
>> > especially performing operations involving cross-sectional data
>> > - I think it's a bit hard for lay people to use (read: ex-MATLAB/R
>> > users). This is just my opinion, but a few years ago I thought about
>> > using it and concluded that teaching people how to properly use it (a
>> > precision tool, indeed!) was going to cause me grief.
>> > - The data alignment problem, best explained in code:
>> >
>
>> >
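The code for the alignment example was snipped from the quote. The sketch below is only a rough, stdlib-only illustration of what "data alignment" usually means; it is not the original example. Two series indexed by different dates are combined on the union of their dates, and a date missing from either side yields NaN. The helper name `align_add` and the dict-of-dates representation are hypothetical.

```python
import math
from datetime import date

def align_add(a, b):
    """Add two date-indexed series (hypothetical dicts of date -> float) by label.

    The result covers the union of both indexes, like an outer join. Where a
    date exists in only one input, the missing side counts as NaN, so the sum
    is NaN as well.
    """
    dates = sorted(set(a) | set(b))
    return {d: a.get(d, math.nan) + b.get(d, math.nan) for d in dates}

# Toy data: the two series overlap on a single date.
s1 = {date(2011, 6, 6): 1.0, date(2011, 6, 7): 2.0}
s2 = {date(2011, 6, 7): 10.0, date(2011, 6, 8): 20.0}
total = align_add(s1, s2)
# total[date(2011, 6, 7)] is 12.0; 6 June and 8 June are NaN
```

Adding the two value lists position by position would pair 6 June with 7 June. Matching by label is what avoids that mistake.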
>> > - Inability to derive custom offsets:
>> >
>> > I can do:
>> >
>> > In [14]: ts.shift(2, offset=2 * datetools.BDay())
>> > Out[14]:
>> > 2000-01-11 00:00:00    0.0503706684002
>> > 2000-01-18 00:00:00    -1.7660004939
>> > 2000-01-25 00:00:00    1.11716758554
>> > 2000-02-01 00:00:00    -0.171029995265
>> > 2000-02-08 00:00:00    -0.99876580126
>> > 2000-02-15 00:00:00    -0.262729046405
>> >
>> > or even generate, say, 5-minutely or 10-minutely date ranges thusly:
>> >
>> > In [16]: DateRange('6/8/2011 5:00', '6/8/2011 12:00',
>> >                    offset=datetools.Minute(5))
>> > Out[16]:
>> > <class 'pandas.core.daterange.DateRange'>
>> > offset: <5 Minutes>, tzinfo: None
>> > [2011-06-08 05:00:00, ..., 2011-06-08 12:00:00]
>> > length: 85
>> >
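The pandas idiom quoted above treats an offset as an object that can be scaled (`2 * datetools.BDay()`) and then passed to a range constructor. Here is a minimal stdlib sketch of that design. It assumes weekends are the only non-business days (there is no holiday calendar). The `BusinessDay` class and `date_range` generator are hypothetical names, not the pandas implementation.

```python
from datetime import date, datetime, timedelta

class BusinessDay:
    """Hypothetical offset of n weekdays (Mon-Fri); holidays are ignored."""

    def __init__(self, n=1):
        self.n = n

    def __rmul__(self, k):
        # Makes `2 * BusinessDay()` return BusinessDay(2).
        return BusinessDay(k * self.n)

    def __radd__(self, d):
        # Called for `d + BusinessDay(n)`: walk one calendar day at a time
        # and count only weekdays until n of them have been passed.
        step = 1 if self.n >= 0 else -1
        remaining = abs(self.n)
        while remaining:
            d = d + timedelta(days=step)
            if d.weekday() < 5:  # Monday=0 ... Friday=4
                remaining -= 1
        return d

def date_range(start, end, offset):
    """Yield start, start + offset, ... while the value is <= end.

    `offset` may be a timedelta or a BusinessDay. It must move forward,
    otherwise the loop never terminates.
    """
    current = start
    while current <= end:
        yield current
        current = current + offset

# Friday 7 Jan 2000 plus two business days lands on Tuesday 11 Jan.
print(date(2000, 1, 7) + 2 * BusinessDay())

# Five-minute steps from 05:00 to 12:00 inclusive, as in the DateRange example.
five_min = list(date_range(datetime(2011, 6, 8, 5, 0),
                           datetime(2011, 6, 8, 12, 0),
                           timedelta(minutes=5)))
print(len(five_min))  # 85
```

Keeping the offset separate from the range generator means a single loop handles both fixed steps (`timedelta`) and calendar-aware steps (`BusinessDay`).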
>
> It would be nice to have a step argument in the date_array function. The
> following works though:
>
> In [96]: integers = r_[ts.Date('T',"01-Aug-2011 05:00").value:
>                       ts.Date('T',"06-Aug-2011 12:01").value:
>                       5]
>
> In [97]: ts.date_array(integers, freq='T')
> Out[97]:
> DateArray([01-Aug-2011 05:00, 01-Aug-2011 05:05, 01-Aug-2011 05:10, ...,
>       06-Aug-2011 11:45, 06-Aug-2011 11:50, 06-Aug-2011 11:55, ...,
>       06-Aug-2011 12:00],
>          freq='T')
>
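Dave's workaround converts the endpoints to integer minute ordinals, steps through them with a slice (`r_[a:b:5]`), and converts back. The slice stop is exclusive, which is why the stop is written as 12:01: that way 12:00 is included. Below is the same idea in plain Python. It assumes an arbitrary epoch for the integer encoding, and `to_minutes`, `from_minutes` and `minute_range` are hypothetical helpers, not scikits.timeseries functions.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # arbitrary reference point for the integer encoding

def to_minutes(dt):
    # Whole minutes since EPOCH, playing the role of a 'T'-frequency ordinal.
    return (dt - EPOCH) // timedelta(minutes=1)

def from_minutes(m):
    # Inverse of to_minutes.
    return EPOCH + timedelta(minutes=m)

def minute_range(start, stop, step):
    """Half-open [start, stop) in steps of `step` minutes, like r_[a:b:step]."""
    return [from_minutes(m)
            for m in range(to_minutes(start), to_minutes(stop), step)]

# Stop at 12:01 so that 12:00 is the last value produced.
rng = minute_range(datetime(2011, 8, 1, 5, 0), datetime(2011, 8, 6, 12, 1), 5)
print(rng[0], rng[-1], len(rng))
```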
>
>> > - (possible now??) Ability to have a set of frequency-naive dates
>> > (possibly not in order).
>> >
>> > This last point actually matters. Suppose you wanted to get the 5
>> > worst-performing days in the S&P 500 index:
>> >
>> > In [7]: spx.index
>> > Out[7]:
>> > <class 'pandas.core.daterange.DateRange'>
>> > offset: <1 BusinessDay>, tzinfo: None
>> > [1999-12-31 00:00:00, ..., 2011-05-10 00:00:00]
>> > length: 2963
>> >
>> > # but this is OK
>> > In [8]: spx.order()[:5]
>> > Out[8]:
>> > 2008-10-15 00:00:00    -0.0903497960942
>> > 2008-12-01 00:00:00    -0.0892952780505
>> > 2008-09-29 00:00:00    -0.0878970494885
>> > 2008-10-09 00:00:00    -0.0761670761671
>> > 2008-11-20 00:00:00    -0.0671229140321
>> >
>> > - W
>> >
>
> Seems like you're looking for something like this tssort function:
>
> In [90]: series = ts.time_series(randn(365),start_date='01-Jan-2011',freq='D')
>
> In [91]: def tssort(series):
>   ....:     indices = series.argsort()
>   ....:     return ts.time_series(series._series[indices],
>   ....:                           series.dates[indices],
>   ....:                           autosort=False)
>   ....:
>
> In [92]: tssort(series)[0:5]
> Out[92]:
> timeseries([-2.96602612 -2.81612524 -2.61558511 -2.59522921 -2.4899447 ],
>   dates = [26-Aug-2011 18-Apr-2011 27-Aug-2011 21-Aug-2011 19-Nov-2011],
>   freq  = D)
>
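The key step in `tssort` is that one argsort permutation reorders both the values and their dates. Passing `autosort=False` keeps the constructor from re-sorting the dates chronologically. Here is a stdlib sketch of the same idea on plain lists. The `argsort` and `sort_by_value` helpers are hypothetical, and the toy returns are made up.

```python
from datetime import date

def argsort(values):
    """Indices that would sort `values` ascending (like numpy.argsort)."""
    return sorted(range(len(values)), key=values.__getitem__)

def sort_by_value(dates, values):
    """Reorder dates and values together by value; the dates end up out of order."""
    order = argsort(values)
    return [dates[i] for i in order], [values[i] for i in order]

# Toy daily returns (made-up numbers).
dates = [date(2011, 1, 3), date(2011, 1, 4), date(2011, 1, 5), date(2011, 1, 6)]
rets = [0.01, -0.03, 0.02, -0.05]

worst_dates, worst_rets = sort_by_value(dates, rets)
print(worst_dates[:2])  # the two worst days: 6 Jan, then 4 Jan
```

The result is a set of dates that is neither regular nor in chronological order. That is the "frequency-naive" index Wes describes.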
>
> Pandas seems to offer an even higher-level API than scikits.timeseries. I view
> them as mostly complementary, but I haven't (yet) had much experience with
> pandas...
>
> -Dave
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

I admit that my partial ignorance of scikits.timeseries (I seldom use
it) makes my statements a bit unfair, but you're right that it's a
high-level versus low-level thing. Just about everything is possible,
but it may require a certain level of experience, sophistication, and
deep understanding (as you have).