[SciPy-dev] Timeseries Unusual Behaviour

Pierre GM pgmdevlist at gmail.com
Mon Oct 20 01:56:04 EDT 2008


David,
Sorry for the delayed answer.

* When you create a DateArray from a series of dates using `date_array` (the 
recommended way), the dates are automatically sorted in chronological 
order, and the original order is stored temporarily in a private attribute.

* When you create a time series using `time_series`, the same thing happens:
1. a DateArray is created from the list of dates with `date_array`,
2. it is sorted chronologically and the initial order is stored temporarily,
3. the values are then reordered to match the sorted dates,
4. the attribute storing the initial order of the dates is reset to None.
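
To make the steps above concrete, here is a minimal sketch in plain Python. It is not the scikits.timeseries code: the helper name `sort_chronologically` is made up for illustration, and the list `order` only plays the role of the temporarily stored initial order.

```python
import datetime


def sort_chronologically(dates, values):
    """Sort the dates and reorder the values with the same permutation.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Indices that put the dates in chronological order. This stands in
    # for the "initial order" that is stored temporarily.
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    sorted_dates = [dates[i] for i in order]
    # Apply the same permutation to the values so the pairs stay matched.
    sorted_values = [values[i] for i in order]
    return sorted_dates, sorted_values


dates = [datetime.datetime(2011, 5, 1),
         datetime.datetime(2011, 8, 1),
         datetime.datetime(2011, 6, 1),
         datetime.datetime(2011, 9, 1),
         datetime.datetime(2011, 7, 1)]
values = [1, 4, 2, 5, 3]

sorted_dates, sorted_values = sort_chronologically(dates, values)
print(sorted_values)
```

The point of the sketch is that dates and values must go through the same permutation; sorting only one of them breaks the pairing.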

The reason behind the automatic sorting is that in the vast majority of cases, 
one works with chronological series.

Now, let's go through your example:
>>> dates = (datetime.datetime(2011, 5, 1, 0, 0),
... datetime.datetime(2011, 8, 1, 0, 0),
... datetime.datetime(2011, 6, 1, 0, 0),
... datetime.datetime(2011, 9, 1, 0, 0),
... datetime.datetime(2011, 7, 1, 0, 0))
>>> values = [1, 4, 2, 5, 3]
>>> d1 = ts.date_array(dates)

d1 is sorted chronologically by default.

>>> zip(d1, values)
[(<U : 01-May-2011>, 1),
 (<U : 01-Jun-2011>, 4),
 (<U : 01-Jul-2011>, 2),
 (<U : 01-Aug-2011>, 5),
 (<U : 01-Sep-2011>, 3)]

Here, we're mixing d1 (sorted) and values (still in the original order), so 
the pairs no longer match.

>>> series = ts.time_series(values,dates=d1)
timeseries([1 2 3 4 5],
           dates = [01-May-2011 01-Jun-2011 01-Jul-2011 01-Aug-2011 01-Sep-2011],
           freq  = U)

d1 is sorted chronologically and values is reordered according to the initial 
order of d1. The private flag of d1 is then reset to show that the sort has 
been applied.

>>> series = ts.time_series(values,dates=d1)
timeseries([1 4 2 5 3],
           dates = [01-May-2011 01-Jun-2011 01-Jul-2011 01-Aug-2011 01-Sep-2011],
           freq  = U)

d1 had already been sorted and its flag reset, so this time the values are 
*not* reordered.
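
The difference between the two calls can be mimicked with a toy model. This is only a sketch under assumptions: `ToyDateArray`, `toy_time_series` and the `_unsorted` attribute are hypothetical names, not the actual scikits.timeseries internals, and plain month numbers stand in for the dates.

```python
class ToyDateArray:
    """Toy stand-in for a DateArray (hypothetical, for illustration only)."""

    def __init__(self, dates):
        # Permutation that sorts the input; kept only until it is first used.
        order = sorted(range(len(dates)), key=lambda i: dates[i])
        self.dates = [dates[i] for i in order]
        self._unsorted = order  # hypothetical private attribute


def toy_time_series(values, dates):
    """Toy stand-in for time_series (hypothetical, for illustration only).

    The values are reordered only while the initial order is still stored.
    """
    if dates._unsorted is not None:
        values = [values[i] for i in dates._unsorted]
        dates._unsorted = None  # the stored order is consumed here
    return list(values)


d1 = ToyDateArray([5, 8, 6, 9, 7])  # month numbers standing in for dates
values = [1, 4, 2, 5, 3]

first = toy_time_series(values, d1)   # stored order still available
second = toy_time_series(values, d1)  # stored order already reset to None
print(first, second)
```

In this model the same call gives two different results because the first call changes the state of d1, which is the source of the surprise.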

I agree that the last result is so confusing that it should be considered a bug. 
I'll try to find a workaround.
Meanwhile, a couple of pieces of advice:

* As time_series calls date_array anyway, you don't gain much by creating a 
DateArray from your dates in the first place. On the contrary, it can lead to 
the surprises we just discussed: pass the original dates instead.
* Get into the habit of always specifying a frequency. It will save you some 
headaches in the long run when converting between frequencies.


