[Numpy-discussion] fixing up datetime

Fri Jun 10 19:16:54 EDT 2011

On Thu, Jun 9, 2011 at 1:27 PM, Christopher Barker <Chris.Barker at noaa.gov> wrote:

> Mark Wiebe wrote:
> > Because datetime64 is a NumPy data type, it needs a well-defined rule
> > for these kinds of conversions. Treating datetimes as moments in time
> > instead of time intervals makes a very nice rule which appears to be
> > very amenable to a variety of computations, which is why I like the
> > approach.
>
> This really is the key issue that I've been harping on (sorry...) this
> whole thread:
>
> For many uses, a datetime as a moment in time is a great abstraction,
> and I think how most datetime implementations (like the std lib one) are
> used.
>
> However, when you are trying to represent/work with data like monthly
> averages and the like, you need something that represents something else
> -- and trying to use the same mechanism as for time instants, and hoping
> that the ambiguities will resolve themselves from the context is dangerous.
>

I'm trying to have clear, unambiguous meanings for all the operations on
datetime types. There are issues that make things complicated, but I think
they arise from the nature of dates and times.

> I don't work in finance, so I'm not sure about things like bi-monthly
> payments -- it seems those could well be defined as instants -- the
> payments are due on a given day each month (say the 1st and 15th), and,
> I assume that is well defined to the instant -- i.e. before the end of
> the day in some time zone. (Note that that would be time 23:59.99999
> rather than zero, however.) The trick with these comes in
> when you do math -- the timedelta issue -- what is a 1 month timedelta?
> It's NOT a given number of days, hours, etc.
>

For datetimes, converting between months/years representations and
days/hours/etc. representations is well defined, but the relationship is
non-linear. For timedeltas, I've flagged such conversions as 'unsafe'
according to the can_cast function, and this is related to my desire to
tighten up NumPy's casting rules in general. I view this issue as the same
as the silent conversion of 3.72 to 3 when you assign to an integer array,
and would like the same mechanism to stop you from easily converting a
month timedelta to a day timedelta. When the cast is forced, it currently
uses the average over 400 years (the leap-year period).
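
To make the 400-year average concrete, here is a small sketch of the
arithmetic using only the Python standard library. It illustrates the
averaging idea and is not the code in the branch; the variable names are
made up for this example.

```python
import calendar
from fractions import Fraction

# Count the days in one full Gregorian leap-year cycle (400 consecutive years).
days_in_cycle = sum(366 if calendar.isleap(y) else 365 for y in range(2000, 2400))
months_in_cycle = 400 * 12

# Exact average length of a month over the cycle, in days and in seconds.
avg_days_per_month = Fraction(days_in_cycle, months_in_cycle)
avg_seconds_per_month = avg_days_per_month * 86400

print(days_in_cycle)              # 146097
print(float(avg_days_per_month))  # 30.436875
print(avg_seconds_per_month)      # 2629746
```

One cycle has 146097 days and 4800 months, so an average month comes to
30.436875 days. No real month has that length, which is why the
conversion is only done when the cast is forced.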

> I don't know that anyone has time to do this, but it seems a written up
> set of use-cases would help focus this conversation -- I know I've
> pretty much lost track of what uses we are trying to support.
>
> another question:
>
> can you instantiate a datetime64 with something other than a string?
> i.e. a (year, [month], [day], [hour], [second], [usecond]) tuple?
>

The Python datetime.datetime, datetime.date, and datetime.timedelta objects
convert to datetime64 and timedelta64. I can add more construction methods,
but would like to get more people building the branch and trying it before
doing that. Some feedback from user experience would be helpful to suggest
what's necessary.
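
As a sketch of those conversions, assuming a NumPy build in which the
datetime64 and timedelta64 scalar constructors accept standard-library
objects (this matches released NumPy; the branch may differ in details):

```python
import datetime

import numpy as np

# Standard-library objects passed straight to the NumPy scalar constructors.
moment = np.datetime64(datetime.datetime(2011, 6, 10, 19, 16, 54))
day = np.datetime64(datetime.date(2011, 6, 10))
delta = np.timedelta64(datetime.timedelta(seconds=10))

# Each result compares equal to the value built from a string or (count, unit).
print(moment == np.datetime64("2011-06-10T19:16:54"))
print(day == np.datetime64("2011-06-10"))
print(delta == np.timedelta64(10, "s"))
```

This covers the (year, month, day, ...) case indirectly: build a
datetime.datetime from the fields and convert that.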

> > The fact that it's a NumPy dtype probably is the biggest limiting
> > factor preventing parameters like 'start' and 'end' during conversion.
> > Having a datetime represent an instant in time neatly removes any
> > ambiguity, so converting between days and seconds as a unit is
> > analogous to converting between int32 and float32.
>
> Sure, but I don't know that that is the best way to go -- integers are
> precisely defined and generally used as 3 == 3.00000000. That's not the
> case for months, at least if it's supposed to be a monthly average-type
> representation.
>

I don't think a monthly average-type representation is desirable for the
most primitive datetime type. For this I expect you would want a datetime
and a timedelta, or a pair of datetimes, representing an interval of time.
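
A minimal sketch of the "pair of datetimes" idea, using only the standard
library. The DateInterval class and its half-open [start, end) convention
are hypothetical choices for this illustration, not part of NumPy or the
branch.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DateInterval:
    """Hypothetical half-open interval [start, end) between two moments."""

    start: date
    end: date

    @property
    def length(self) -> timedelta:
        # The length is derived from the endpoints, so it is never ambiguous.
        return self.end - self.start

    def __contains__(self, day: date) -> bool:
        # start is inside the interval, end is not.
        return self.start <= day < self.end


# "June 2011" as an interval rather than as a single instant.
june_2011 = DateInterval(date(2011, 6, 1), date(2011, 7, 1))

print(june_2011.length)                   # 30 days, 0:00:00
print(date(2011, 6, 30) in june_2011)     # True
print(date(2011, 7, 1) in june_2011)      # False
```

A monthly average would then be stored alongside such an interval, with
the primitive datetime type still meaning a single moment.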

> This reminds me of a question recently on this list -- someone was using
> np.histogram() to bin integer values, and was surprised at the results
> -- what they needed to do was consider the bin intervals as floating
> point numbers to get what they wanted: 0.5, 1.5, 2.5, rather than
> 1,2,3,4, because what they really wanted was a categorical definition
> of an integer, NOT a truncated floating point number. I'm not sure how
> that informs this conversation, though...
>
>
>  > > >>> np.timedelta64(10, 's') + 10
>  > > numpy.timedelta64(20,'s')
>  >
>  > Here, the unit is defined: 's'
>  >
>  >  For the first operand, the inconsistency is with the second. Here's
>  > the reasoning I didn't spell out:
>
>  > We're adding a timedelta + int, so let's convert 10 into a timedelta.
>  > No units specified, so it's
>  > 10 microseconds, so we add 10 seconds and 10 microseconds, not 10
>  > seconds and 10 seconds.
>
> This sure seems ripe for error to me -- if a datetime and timedelta are
> going to be represented in various possible units, then I don't think
> it's a good idea to allow one to add an integer -- especially if the
> unit can be inferred from the input data, rather than specified.
>
> "Explicit is better than implicit."
>
> "In the face of ambiguity, refuse the temptation to guess."
>
> If you must allow this, then using the default for the unspecified unit
> as above is the way to go.
>

There's always a balance between how easy it is to do things at the
interactive prompt, and how tightly the system controls what you do. I'm
trying to strike that balance; people will have to use it to see if I'm
hitting the mark.
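
Whichever rule ends up applying to a bare integer, spelling out the unit
on both operands avoids the question entirely. A short sketch, assuming
released NumPy behaviour, where mixed linear units are promoted to the
finer one:

```python
import numpy as np

ten_seconds = np.timedelta64(10, "s")

# Both operands carry an explicit unit, so nothing has to be guessed.
same_unit = ten_seconds + np.timedelta64(10, "s")
mixed_unit = ten_seconds + np.timedelta64(10, "us")

print(same_unit == np.timedelta64(20, "s"))
print(mixed_unit == np.timedelta64(10_000_010, "us"))
print(mixed_unit.dtype)
```

The two readings of "+ 10" discussed above correspond to these two
results: 20 seconds, or 10 seconds plus 10 microseconds.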

> Dave Hirschfeld wrote:
> >> Here are some current behaviors that are inconsistent with the
> >> microsecond default, but consistent with the "generic time unit" idea:
> >> >>> np.timedelta64(10, 's') + 10
> >> numpy.timedelta64(20,'s')
> >
> > That is what I would expect (and hope) would happen. IMO an integer
> > should be cast to the dtype ([s]) of the datetime/timedelta.
>
> This is way too ripe for error, particularly if we have the unit
> auto-determined from input data.
>
> Not to take us back to a probably already resolved issue, but maybe all
> this unit conversion could and should be avoided by following the python
> datetime approach -- all datetimes and timedeltas are always defined
> with microsecond precision -- period.
>
> Maybe there are computational inefficiencies that we want to avoid.
>
> This would also preclude any use of these dtypes for work that required
> greater precision, but does anyone really need both year, month, day
> specification AND nanoseconds? Given all the leap-second issues, that
> seems a bit ridiculous.
>
> But it would make things easier.
>
> I note that in this entire conversation, all the talk has been about
> finance examples -- I think I'm the only one that has brought up science
> use, and that only barely (and mostly the simple cases). So do we really
> need to have the same dtype useful for finance and particle physics?
>

I'm not sure, but I think it can work out quite well. More domain-specific
feedback in other areas always helps.

-Mark

>
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov