[Numpy-discussion] Proposal: add the timestamp64 type (Noam Yorav-Raphael)

Thu Nov 12 17:45:34 EST 2020

On 12/11/2020 17:40, Matti Picus wrote:
> In a one-on-one discussion with Noam in a pre-community call (that, how 
> ironically, we had time for since we both messed up the meeting 
> time-zone change) we reached the conclusion that the request is to 
> clarify whether NumPy's datetime64 represents TAI time [0] or POSIX 
> time, with a preference for TAI time. The documentation mentions POSIX 
> time[1]. As Stefano points out, there is a couple of seconds difference 
> between POSIX (or Unix) time and TAI time. In practice numpy simply 
> stores an int64 value to represent the datetime64, and relies on others 
> to convert it. The leap-second might be getting lost in the conversions. 
> So it might make sense to clarify exactly how those conversions deal 
> with the leap-seconds and choose which one we mean when we use 
> datetime64. Noam please correct me if I am mistaken.

Unix time is a representation of the UTC timescale that counts 1-second
intervals starting from a defined epoch. It deals with leap seconds by
either skipping one interval (this has never happened so far) or
repeating an interval, so that two moments in time that are separated
by one second on the UTC timescale (for example 2016-12-31 23:59:59 and
2016-12-31 23:59:60) are represented in the same way. The conversion
from Unix time to UTC is therefore ambiguous during that one second.
This has happened 27 times since 1972.
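
A minimal sketch of the ambiguity, assuming the POSIX "seconds since the
epoch" formula (whole days times 86400 plus the time of day), which is
not spelled out above. The helper `posix_seconds` is hypothetical. Under
this formula the label 23:59:60 collides with the following 00:00:00;
systems that instead repeat 23:59:59 collide with the previous second,
but either way two distinct UTC instants share one Unix value.

```python
import datetime

EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()

def posix_seconds(year, month, day, hour, minute, second):
    """Hypothetical helper: Unix time for a UTC label, ignoring leap seconds.

    Every day is counted as exactly 86400 seconds, as in the POSIX formula.
    """
    days = datetime.date(year, month, day).toordinal() - EPOCH_ORDINAL
    return days * 86400 + hour * 3600 + minute * 60 + second

# The leap second at the end of 2016 and the midnight that follows it.
leap_second = posix_seconds(2016, 12, 31, 23, 59, 60)
next_midnight = posix_seconds(2017, 1, 1, 0, 0, 0)

# Two different UTC labels map to a single Unix time value.
print(leap_second, next_midnight, leap_second == next_midnight)
```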

This comes with the nice property that minutes, hours and days always
have the same duration (in Unix time), thus converting from the Unix
time representation to a date and hour and vice versa is fairly easy.
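
To illustrate why the conversion is easy, here is a sketch that splits a
Unix time into a date and a time of day with plain integer division. It
relies only on every day being 86400 seconds long; the function name
`unix_to_utc_fields` is made up for this example.

```python
import datetime

def unix_to_utc_fields(t):
    """Hypothetical helper: split Unix time t into (date, hour, minute, second)."""
    days, rem = divmod(t, 86400)   # whole days since the epoch, seconds into the day
    hour, rem = divmod(rem, 3600)
    minute, second = divmod(rem, 60)
    date = datetime.date(1970, 1, 1) + datetime.timedelta(days=days)
    return date, hour, minute, second

print(unix_to_utc_fields(1483228799))
```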

The drawbacks are, as seen above, an ambiguity on leap seconds and the
fact that the trivial computation of time intervals does not take leap
seconds into account and thus may be short by a few seconds (any time
interval across 2016-12-31 23:59:59 is off by at least one second if
computed by simply subtracting Unix times).
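
As a small illustration with datetime64 itself: the two instants below
straddle the leap second inserted at the end of 2016, so three SI seconds
elapsed between them (23:59:59, 23:59:60, 00:00:00, 00:00:01), yet plain
subtraction reports two. The correction of one second is added by hand
here; it is an assumption of the example, not something NumPy provides.

```python
import numpy as np

before = np.datetime64("2016-12-31T23:59:59")
after = np.datetime64("2017-01-01T00:00:01")

# Plain subtraction treats every day as 86400 seconds long.
naive_seconds = (after - before) / np.timedelta64(1, "s")

# One leap second (2016-12-31T23:59:60) lies between the two instants.
leap_seconds_between = 1  # supplied by hand for this example
elapsed_si_seconds = naive_seconds + leap_seconds_between

print(naive_seconds, elapsed_si_seconds)
```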

I don't think these two drawbacks are important for Numpy (or any other
general purpose library). As things stand, it is not even possible, in
Python, with or without Numpy, to create a datetime or datetime64 object
from the time "2016-12-31 23:59:60" (neither accepts the existence of a
minute with 61 seconds), thus the ambiguity is not an issue in
practice. The time interval issue may matter for some applications, but
the ones affected are aware of the issue and have means to deal with it
(the most common one being taking a day off on the days leap seconds are
introduced).
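
A quick check of the claim that neither constructor accepts a 61st
second. This is a sketch against the Python and NumPy versions I am
aware of, where both raise `ValueError`; the helper `accepts` is
hypothetical.

```python
import datetime
import numpy as np

def accepts(constructor):
    """Hypothetical helper: True if calling constructor() does not raise ValueError."""
    try:
        constructor()
    except ValueError:
        return False
    return True

# datetime.datetime requires 0 <= second <= 59.
stdlib_ok = accepts(lambda: datetime.datetime(2016, 12, 31, 23, 59, 60))

# The datetime64 string parser rejects a seconds field of 60 as well.
numpy_ok = accepts(lambda: np.datetime64("2016-12-31T23:59:60"))

print(stdlib_ok, numpy_ok)
```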

I think documenting that datetime64 is a representation of fixed time
intervals since a conventional epoch, neglecting leap seconds, is easy
to explain and implement and allows for easy interoperability with the
rest of the world.

What advantage would making datetime64 explicitly a representation of
TAI bring?

One disadvantage would be that `np.datetime64(datetime.now())` would be
harder to support, as we would be trying to match a point in time on the
UTC time scale to a point in time on the TAI time scale. This is trivial
for past times (just need to adjust for the right offset) but it is
impossible to do correctly for dates in the future because we cannot
predict future leap second insertions. This would, for example, make
timestamp conversions non-reproducible across announcements of leap
second insertions.
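
The reproducibility problem can be sketched with a toy leap second
table. The function `utc_to_tai` and both tables are hypothetical and
deliberately truncated to the two most recent TAI-UTC offsets (36 s from
2015-07-01, 37 s from 2017-01-01); nothing like this exists in NumPy.
The same UTC timestamp converts to two different TAI values depending on
whether the table predates the announcement of the 2016 leap second.

```python
import numpy as np

# (first UTC date the offset applies to, TAI-UTC in seconds); truncated tables.
TABLE_BEFORE_ANNOUNCEMENT = [(np.datetime64("2015-07-01"), 36)]
TABLE_AFTER_ANNOUNCEMENT = TABLE_BEFORE_ANNOUNCEMENT + [
    (np.datetime64("2017-01-01"), 37),
]

def utc_to_tai(utc, table):
    """Hypothetical helper: shift a UTC datetime64 by the latest applicable offset."""
    offset = None
    for start, seconds in table:
        if utc >= start:
            offset = seconds
    if offset is None:
        raise ValueError("timestamp predates the (truncated) table")
    return utc + np.timedelta64(offset, "s")

# A timestamp that was still in the future when the first table was current.
utc = np.datetime64("2017-06-01T12:00:00")
tai_old = utc_to_tai(utc, TABLE_BEFORE_ANNOUNCEMENT)
tai_new = utc_to_tai(utc, TABLE_AFTER_ANNOUNCEMENT)

print(tai_old, tai_new)
```

The two results differ by one second, so a value converted and stored
before the announcement no longer matches one converted afterwards.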

Cheers,
Dan