[Numpy-discussion] fixing up datetime

Pierre GM pgmdevlist at gmail.com
Tue Jun 7 19:53:11 EDT 2011


On Jun 8, 2011, at 1:16 AM, Mark Wiebe wrote:

> Hi Dave,
> 
> Thanks for all the feedback on the datetime, it's very useful to help understand the timeseries ideas, in particular with the many examples you're sprinkling in.
> 
> One overall impression I have about timeseries in general is the use of the term "frequency" synonymously with the time unit. To me, a frequency is a numerical quantity with a unit of 1/(time unit), so while it's related to the time unit, naming it the same is something the specific timeseries domain has chosen to do. I think the numpy datetime class shouldn't have anything called "frequency" in it, and I would like to remove the current usage of that terminology from the codebase.

True. We rather abused the term in scikits.timeseries, but we meant it as "given time unit".
Matt came up with the idea of representing a series of consecutive dates as an array of consecutive integers. The integer<->datetime conversion is done internally with an epoch and a unit. Initially, we called the latter "frequency", but in the experimental git version I switched to "unit". Anyhow, each time you read 'frequency' in scikits.timeseries, think 'unit'.
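
A minimal sketch of the "epoch plus unit" idea, written against NumPy's datetime64 rather than scikits.timeseries itself. The names EPOCH, UNIT, ints_to_dates and dates_to_ints are made up for illustration, and the 1970-01-01 epoch with a daily unit is an assumption, not necessarily what scikits.timeseries uses internally.

```python
import numpy as np

# Assumed for illustration: a Unix-style epoch and a daily unit.
EPOCH = np.datetime64("1970-01-01", "D")
UNIT = "D"


def ints_to_dates(offsets, epoch=EPOCH, unit=UNIT):
    """Interpret integers as a count of `unit` steps since `epoch`."""
    steps = np.asarray(offsets, dtype=np.int64).astype(f"timedelta64[{unit}]")
    return epoch + steps


def dates_to_ints(dates, epoch=EPOCH):
    """Recover the integer offsets from datetime64 values."""
    return (np.asarray(dates) - epoch).astype(np.int64)


# Three consecutive integers become three consecutive days.
offsets = np.arange(15132, 15135)
dates = ints_to_dates(offsets)
roundtrip = dates_to_ints(dates)

print(dates)      # three consecutive days starting 2011-06-07
print(roundtrip)  # the original integers
```

The point is only that the integers carry no meaning on their own: the same array read with a different epoch or unit would denote different dates.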


> In Wes's comment, he said
> 
> I'm hopeful that the datetime64 dtype will enable scikits.timeseries
> and pandas to consolidate much of the datetime / frequency code.
> scikits.timeseries has a ton of great stuff for generating dates with
> all the standard fixed frequencies.
> 
> implying to me that the important functionality needed in time series is the ability to generate arrays of dates in specific ways. I suspect equating the specification of the array of dates and the unit of precision used to store the date isn't good for either the datetime functionality or supporting timeseries, and I'm presently trying to understand what it is that timeseries uses.

You want a series of 365 consecutive days from today? 'now' + np.arange(365). That kind of thing.
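
With NumPy's datetime64 as it exists today, the "'now' + np.arange(365)" idea can be sketched roughly as follows. A fixed start date is used here so the result is reproducible; that date is an arbitrary choice for the example.

```python
import numpy as np

# Fixed start date for reproducibility; np.datetime64("today", "D")
# would give the current day instead.
start = np.datetime64("2011-06-07", "D")

# Adding an integer array to a datetime64 treats each integer as a
# number of steps in the datetime's own unit (days here).
days = start + np.arange(365)

print(days[:3])
print(days[-1])
```

The unit of `start` decides what "+ 1" means, so the same expression with a monthly datetime would produce 365 consecutive months.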





> On Tue, Jun 7, 2011 at 7:34 AM, Dave Hirschfeld <dave.hirschfeld at gmail.com> wrote:
> 
> I think some of the complexity is coming from the definition of the timedelta.
> In the timeseries package each date simply represents the number of periods
> since the epoch and the difference between dates is therefore just an integer
> with no attached metadata - its meaning is determined by the context it's used
> in. e.g.

Exactly that.

> timeseries gets on just fine without a timedelta type - a timedelta is just an
> integer and if you add an integer to a date it's interpreted as the number of
> periods of that date's frequency. From a usability point of view M1 + 1 is
> much nicer than having to do something like M1 + ts.TimeDelta(M1.freq, 1).

Likewise, the difference between two dates is just an integer. 
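
To make the "a date is a period count, a difference is a bare integer" semantics concrete, here is a small stand-alone sketch. PeriodDate is a hypothetical class written for this example; it is not the ts.Date API, and the month count used below assumes periods are counted from 1970-01.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodDate:
    """Hypothetical date: `value` periods of `unit` since some epoch."""

    unit: str
    value: int

    def __add__(self, n: int) -> "PeriodDate":
        # An integer is read as n periods of this date's own unit.
        return PeriodDate(self.unit, self.value + n)

    def __sub__(self, other):
        if isinstance(other, PeriodDate):
            if other.unit != self.unit:
                raise ValueError("dates have different units")
            # Date minus date is a plain integer with no metadata.
            return self.value - other.value
        # Date minus integer steps back that many periods.
        return PeriodDate(self.unit, self.value - other)


m1 = PeriodDate("M", 497)  # June 2011 if months are counted from 1970-01
m2 = m1 + 3                # three months later
gap = m2 - m1              # plain int
print(m2, gap)
```

The convenience of M1 + 1 comes from the integer borrowing its unit from the date; the cost is that the result of a subtraction no longer says what it counts.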

[Mark]
> I think the timedelta is important, especially with the large number of units NumPy's datetime supports. When you're subtracting two nanosecond datetimes and two minute datetimes in the same code, having the units there to avoid confusion is pretty useful.

Indeed.
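
A short sketch of why the unit on a timedelta helps when several resolutions are mixed, using NumPy's timedelta64. The timestamps are arbitrary values picked for the example.

```python
import numpy as np

# Two nanosecond-resolution timestamps, 500 ns apart.
t0_ns = np.datetime64("2011-06-07T12:00:00.000000000", "ns")
t1_ns = np.datetime64("2011-06-07T12:00:00.000000500", "ns")

# Two minute-resolution timestamps, 30 minutes apart.
t0_m = np.datetime64("2011-06-07T12:00", "m")
t1_m = np.datetime64("2011-06-07T12:30", "m")

d_ns = t1_ns - t0_ns  # timedelta64 in nanoseconds
d_m = t1_m - t0_m     # timedelta64 in minutes

# Stripped of their units, the raw integers compare the wrong way round.
raw_ns = int(d_ns.astype(np.int64))
raw_m = int(d_m.astype(np.int64))
bare_ints_say_ns_is_longer = raw_ns > raw_m

# With units attached, NumPy converts before comparing.
typed_says_minutes_is_longer = bool(d_m > d_ns)

print(d_ns, d_m, bare_ints_say_ns_is_longer, typed_says_minutes_is_longer)
```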

>  I don't envision 'asfreq' being a datetime function, this is the kind of thing that would layer on top in a specialized timeseries library. The behavior of timedelta follows a more physics-like idea with regard to the time unit, and I don't think something more complicated belongs at the bottom layer that is shared among all datetime uses.

'asfreq' converts from one unit to another (there's another function, convert, that does not do quite the same thing, but I won't get into details here). You'll probably have to take unit conversion into account if you allow the .view() or .astype() methods on your np.datetime array...
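
As a rough illustration of the unit-conversion concern, this is how .astype() and .view() behave on datetime64 arrays in current NumPy. It is not a reimplementation of asfreq, only a sketch of how the stored integers change when the unit changes.

```python
import numpy as np

days = np.array(["2011-06-07", "2011-06-30"], dtype="datetime64[D]")

# astype converts between units: both days fall in June 2011.
months = days.astype("datetime64[M]")

# view reinterprets the stored 64-bit integers without any conversion.
raw_days = days.view(np.int64)      # days since 1970-01-01
raw_months = months.view(np.int64)  # months since 1970-01

print(months)
print(raw_days, raw_months)
```

The underlying integers differ completely between the two units, so a view that changed the unit without rescaling would silently produce unrelated dates.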

> In [80]: ts.Date('S', (_64.value + _65.value)//2)
> Out[80]: <S : 02-Jul-2011 12:00:00>
> 
> Adding dates definitely doesn't work, because datetimes have no zero, but I would express it like this:

Well, it can be argued that the epoch is 0... But in scikits.timeseries, keep in mind that underneath, a DateArray is just an array of integers.
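
For comparison, here is a hedged sketch of two ways to get a midpoint between two second-resolution dates with NumPy's datetime64: averaging the underlying integers (the "epoch is 0" view) and adding half the difference to the earlier date. The two input dates are assumptions chosen for the example; the values behind _64 and _65 in the quoted session are not shown in this thread.

```python
import numpy as np

# Assumed inputs for illustration only.
a = np.datetime64("2011-01-01T00:00:00", "s")
b = np.datetime64("2012-01-01T00:00:00", "s")

# Integer view: treat each date as seconds since the epoch and average.
ia = int(a.astype(np.int64))
ib = int(b.astype(np.int64))
mid_from_ints = np.datetime64((ia + ib) // 2, "s")

# Offset view: never add two dates, only a date and a timedelta.
mid_from_offset = a + (b - a) / 2

print(mid_from_ints, mid_from_offset)
```

Both routes give the same instant here; the second one never relies on the epoch being a meaningful zero.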

[Dave]
> I really like the idea of being able to specify multiples of the base frequency
> - e.g. [7D] is equivalent to [W], not least because it provides an easy
> way to specify quarters [3M] or seasons [6M] which are important in my work.
> NB: I also deal with half-hourly and quarter-hourly timeseries and I'm sure
> there are many other examples which are all made possible by allowing
> multipliers.

Well, the experimental version kinda allowed that...
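
For reference, NumPy's datetime64 dtype strings accept a multiplier on the base unit, and np.datetime_data reports the (unit, count) pair. The sketch below only inspects such dtypes and builds a half-hourly grid by hand; it makes no claim about how the experimental scikits.timeseries branch handled multiples.

```python
import numpy as np

# Dtypes with a multiplier on the base unit.
quarterly = np.dtype("datetime64[3M]")
seven_days = np.dtype("datetime64[7D]")

unit, count = np.datetime_data(quarterly)
print(unit, count)

# A half-hourly grid built from minute-resolution datetimes:
# six steps of 30 minutes starting at midnight.
start = np.datetime64("2011-06-07T00:00", "m")
half_hours = start + (30 * np.arange(6)).astype("timedelta64[m]")
print(half_hours)
```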

> 
> This is one of the things where I think mixing the datetime storage precision with timeseries frequency seems counterproductive. Having different origins for datetime64 starting on different weekdays near 1970-01-01 doesn't seem like the right way to tackle the problem to me. I see other valid reasons for reintroducing the origin metadata, but this one I don't really like.

We needed the concept to convert time series, for example from monthly to quarterly: which month do you want the year (as in a succession of 12 months) to start with?
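
A small sketch of why the starting month matters when going from monthly to quarterly. quarter_index and quarterly_sums are hypothetical helpers written for this example, not scikits.timeseries functions, and the monthly values are made-up data.

```python
import numpy as np


def quarter_index(months, first_month=1):
    """Running quarter number for datetime64[M] values.

    first_month (1-12) is the month on which the 12-month year starts.
    """
    # Months since 1970-01, so a value of 0 is a January.
    m = np.asarray(months, dtype="datetime64[M]").view(np.int64)
    return (m - (first_month - 1)) // 3


def quarterly_sums(months, values, first_month=1):
    """Sum monthly values within each quarter, in chronological order."""
    q = quarter_index(months, first_month)
    return np.bincount(q - q.min(), weights=values)


months = np.arange("2011-01", "2012-01", dtype="datetime64[M]")
values = np.arange(1, 13)  # made-up data: 1 for January up to 12 for December

calendar_year = quarterly_sums(months, values, first_month=1)
february_year = quarterly_sums(months, values, first_month=2)
print(calendar_year)
print(february_year)
```

With a February start, January falls into the last quarter of the previous year, so the same twelve months split into five partial or full quarters instead of four.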

>  



