[Numpy-discussion] datetime64: Remove deprecation warning when constructing with timezone

Noam Yorav-Raphael noamraph at gmail.com
Fri Nov 6 05:47:46 EST 2020


Hi,

I actually arrived at this by first trying to use pandas.Timestamp and
getting very frustrated about it. With pandas, I get:

>>> pd.Timestamp.now()
Timestamp('2020-11-06 09:45:24.249851')

I find the whole notion of a "timezone naive timestamp" to be nearly
meaningless. A timestamp should mean a moment in time (as the current numpy
documentation defines very well). A "naive timestamp" doesn't mean
anything. It's exactly like a "unit naive length". I can have a Length type
which just takes a number, and be very happy that it works whether my "unit
zone" is inches or centimeters. So "Length(3)" will mean 3 cm in most of
the world and 3 inches in the US. But then, if I get "Length(3)" from
someone, I can't be sure what length it refers to.

So currently, this happens with pandas timestamps:

>>> os.environ['TZ'] = 'UTC'; time.tzset()
... t0 = pd.Timestamp.now()
... time.sleep(1)
... os.environ['TZ'] = 'EST-5'; time.tzset()
... t1 = pd.Timestamp.now()
... t1 - t0
Timedelta('0 days 05:00:01.001583')

This is not just theoretical - I actually need to work with data from
several devices, each in its own time zone. And I need to know that I won't
get such meaningless results.
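Here is the same failure mode as a minimal sketch, using only Python's standard `datetime` and two made-up readings taken one second apart. It mirrors the example above. In a POSIX TZ string the offset is written with the opposite sign, so 'EST-5' means UTC+05:00, which is where the extra five hours come from.

```python
from datetime import datetime, timedelta, timezone

# Two hypothetical readings, one second apart, from devices in different zones.
plus5 = timezone(timedelta(hours=5))
t0 = datetime(2020, 11, 6, 9, 0, 0, tzinfo=timezone.utc)  # device in UTC
t1 = datetime(2020, 11, 6, 14, 0, 1, tzinfo=plus5)        # device in UTC+05:00

# Naive arithmetic compares wall-clock readings and picks up the zone difference.
naive_delta = t1.replace(tzinfo=None) - t0.replace(tzinfo=None)
# Aware arithmetic compares moments, so the result is the real elapsed time.
aware_delta = t1 - t0

print(naive_delta)  # 5:00:01
print(aware_delta)  # 0:00:01
```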

And you can even get something like this:

>>> t0 = pd.Timestamp.now()
... time.sleep(10)
... t1 = pd.Timestamp.now()
... t1 - t0
Timedelta('0 days 01:00:10.001583')

if the first measurement happened to be in winter time and the second
measurement happened to be in daylight saving time.
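The daylight saving case can be reproduced with fixed offsets. This is a sketch with a hypothetical zone that moves its clocks from UTC+02:00 to UTC+03:00, and the date is made up. It also shows that counting seconds since the epoch gives the real elapsed time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zone that switches from UTC+02:00 (winter) to UTC+03:00 (summer).
winter = timezone(timedelta(hours=2))
summer = timezone(timedelta(hours=3))

t0 = datetime(2020, 3, 27, 1, 59, 55, tzinfo=winter)  # just before the switch
t1 = datetime(2020, 3, 27, 3, 0, 5, tzinfo=summer)    # ten seconds later, after it

# Wall-clock readings differ by an hour more than the time that actually passed.
wall_clock = t1.replace(tzinfo=None) - t0.replace(tzinfo=None)
# Counting seconds since the epoch removes the offset change entirely.
elapsed = t1.timestamp() - t0.timestamp()

print(wall_clock)  # 1:00:10
print(elapsed)     # 10.0
```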

The solution is simple, and it is what datetime64 used to do before the change:
have a type that just represents a moment in time. It's not "in UTC"; it
just stores the number of seconds that have passed since an agreed moment in
time (which could be written as 1970-01-01 02:00+0200, but is more commonly
written as 1970-01-01 00:00Z; it's exactly the same moment).
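You can see the "count since an agreed moment" directly. Casting a datetime64 to an integer gives the stored count, which numpy reads as measured from 1970-01-01T00:00. Writing the same moment with a +02:00 offset in Python's `datetime` gives the same count. The specific moment is only an example.

```python
from datetime import datetime, timedelta, timezone
import numpy as np

# datetime64 stores an integer count of units since 1970-01-01T00:00.
t = np.datetime64("2020-11-05T14:00", "s")
print(t.astype(np.int64))  # 1604584800

# The same moment written with a +02:00 offset yields the same count of seconds.
local = datetime(2020, 11, 5, 16, 0, tzinfo=timezone(timedelta(hours=2)))
print(int(local.timestamp()))  # 1604584800
```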

I think it would make things clearer if I mention that some operations
don't deal with timestamps at all. For example, it's meaningless to ask
which year a timestamp falls in, since the answer may depend on the time
zone. These are always *human*-related questions that depend on certain
human conventions. We can call them "calendar questions". For these types
of questions, a type that includes both a timestamp and a timezone offset
(in minutes from UTC) can be useful. Some questions even require full
timezone information, meaning a function that gives the timezone offset
for each moment. However, I don't think numpy should deal with these
calendar issues. As a very simple example, even for "timestamp+offset"
types, it's not clear how to compare them: should values with the same
timestamp and different offsets be considered equal or not? And in
virtually all of my data analysis, this calendar aspect has nothing to do
with the questions I'm trying to answer.
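Python's standard `datetime` shows both calendar questions, using made-up values. The year of a single moment changes with the offset you view it in. Python also answers the comparison question in one particular way: aware datetimes that denote the same moment compare equal even when their offsets differ. That is only Python's convention. It doesn't settle what numpy should do.

```python
from datetime import datetime, timedelta, timezone

moment = datetime(2020, 12, 31, 23, 30, tzinfo=timezone.utc)
same_moment = moment.astimezone(timezone(timedelta(hours=2)))

# The "year" is a calendar question: it depends on the chosen offset.
print(moment.year, same_moment.year)  # 2020 2021

# Python compares aware datetimes by moment and ignores the offsets...
print(moment == same_moment)  # True
# ...even though the two values carry different offset information.
print(moment.utcoffset() == same_moment.utcoffset())  # False
```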

I have a suggestion. Instead of changing datetime64 (which I consider to be
ill-defined, but never mind), add a new type called "timestamp64". It would
behave exactly as datetime64 did before the change, except that its only
allowed units would be seconds, milliseconds, microseconds and nanoseconds.
Removing the longer units will make it clear that it doesn't deal with
calendars and dates. Also, none of the business day functionality would
apply to timestamp64. To get calendar information (such as the year) from a
timestamp64, you would have to convert it explicitly to Python's datetime
(or to np.datetime64) with an explicit timezone (UTC, local, an offset, or
a timezone object).
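Here is a small pure-Python sketch that makes the proposal concrete. Everything in it is hypothetical: `Timestamp64`, `from_units` and `to_datetime` are illustrative names, not a numpy API. For simplicity the sketch normalizes every value to nanoseconds instead of keeping a per-value unit the way datetime64 does.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo

# Only sub-day units are allowed, as proposed.
_NS_PER_UNIT = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True, order=True)
class Timestamp64:
    """Hypothetical sketch: a moment as nanoseconds since 1970-01-01T00:00Z.

    It deliberately has no year/month/day attributes and no time zone.
    """
    ns: int

    @classmethod
    def from_units(cls, value: int, unit: str) -> "Timestamp64":
        if unit not in _NS_PER_UNIT:
            raise ValueError(f"unsupported unit {unit!r}; expected s, ms, us or ns")
        return cls(value * _NS_PER_UNIT[unit])

    def __sub__(self, other: "Timestamp64") -> timedelta:
        # Elapsed time needs no time zone. Floored to whole microseconds,
        # the finest resolution datetime.timedelta supports.
        return timedelta(microseconds=(self.ns - other.ns) // 1_000)

    def to_datetime(self, tz: tzinfo) -> datetime:
        # Calendar fields exist only after a zone is chosen explicitly.
        return (_EPOCH + timedelta(microseconds=self.ns // 1_000)).astimezone(tz)


t0 = Timestamp64.from_units(1_604_584_800, "s")      # 2020-11-05T14:00Z
t1 = Timestamp64.from_units(1_604_584_801_500, "ms")
print(t1 - t0)                                       # 0:00:01.500000
print(t0.to_datetime(timezone(timedelta(hours=2))))  # 2020-11-05 16:00:00+02:00
```

Because every value is normalized to nanoseconds, the generated equality and ordering compare moments directly. For example, `from_units(1, "s") == from_units(1000, "ms")`.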

What do you think?

Thanks,
Noam





On Fri, Nov 6, 2020 at 1:45 AM Stephan Hoyer <shoyer at gmail.com> wrote:

> I can try to dig up the old discussions, but datetime64 used to implement
> both (1) and (3), and this was updated in a very intentional way.
> Datetime64 now works like Python's own timezone-naive datetime.datetime
> objects. The documentation referencing "Z" should be updated -- datetime64
> can be in any timezone you like.
>
> Timezone-aware datetime objects are certainly useful, but NumPy's
> datetime64 was restricted to UTC. The consensus was that it was worse to
> have UTC-only rather than timezone-naive-only. NumPy's datetime64 is often
> used for data analysis purposes, for which automatic conversion to the
> local timezone of the computer running the analysis is often
> counter-productive.
>
> If you care about timezone conversions, I would highly recommend looking
> into pandas's Timestamp class for this purpose. In the future, this would
> be a good use-case for a new custom NumPy dtype. (The existing
> np.datetime64 code cannot easily handle multiple timezones.)
>
> On Thu, Nov 5, 2020 at 1:04 PM Eric Wieser <wieser.eric+numpy at gmail.com>
> wrote:
>
>> Without weighing in yet on how I feel about the deprecation, you can see
>> some discussion about why this was originally deprecated in the PR that
>> introduced the warning:
>>
>> https://github.com/numpy/numpy/pull/6453
>>
>> Eric
>>
>> On Thu, Nov 5, 2020, 20:13 Noam Yorav-Raphael <noamraph at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I suggest removing the deprecation warning when constructing a
>>> datetime64 with a timezone. For example, this is the current behavior:
>>>
>>> >>> np.datetime64('2020-11-05 16:00+0200')
>>> <stdin>:1: DeprecationWarning: parsing timezone aware datetimes is
>>> deprecated; this will raise an error in the future
>>> numpy.datetime64('2020-11-05T14:00')
>>>
>>> I suggest removing the deprecation warning because I find this behavior
>>> useful, and because it is correct. The manual says:
>>> "The datetime object represents a single moment in time... Datetimes are
>>> always stored based on POSIX time, with an epoch of 1970-01-01T00:00Z."
>>> So 2020-11-05T16:00+0200 is indeed the moment in time represented by
>>> np.datetime64('2020-11-05T14:00').
>>>
>>> I just used this to restrict my data set to records created after a
>>> certain moment. It was easier for me to write the moment in my local time
>>> and add "+0200" than to figure out the moment representation in UTC.
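For this use case, you can get the same result without the deprecated parsing by doing the offset arithmetic in Python's standard `datetime` before handing the value to numpy. This is a minimal sketch that assumes the stored datetime64 values are naive and meant as UTC. The record times are made up.

```python
from datetime import datetime, timezone
import numpy as np

# Write the cutoff in local notation; fromisoformat accepts "+HH:MM" offsets.
cutoff_local = datetime.fromisoformat("2020-11-05T16:00+02:00")

# datetime64 has no time zone field, so hand numpy a naive value already in UTC.
cutoff = np.datetime64(cutoff_local.astimezone(timezone.utc).replace(tzinfo=None))

# Hypothetical record creation times, stored as naive UTC datetime64 values.
created = np.array(
    ["2020-11-05T13:30", "2020-11-05T14:00", "2020-11-05T15:45"],
    dtype="datetime64[m]",
)
recent = created[created >= cutoff]
print(recent)
```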
>>>
>>> So this is my simple suggestion: remove the deprecation warning.
>>>
>>>
>>> Beyond that, I have 3 ideas for changing the repr of datetime64 that I
>>> would like to discuss.
>>>
>>> 1. Add "Z" at the end, for example,
>>> numpy.datetime64('2020-11-05T14:00Z'). This will make it clear to which
>>> moment it refers. I think this is significant - I had to dig quite a bit to
>>> realize that datetime64('2020-11-05T14:00') means 14:00 UTC.
>>>
>>> 2. Replace the 'T' with a space. I just find it much easier to read
>>> '2020-11-05 14:00Z' than '2020-11-05T14:00Z'. The long sequence of
>>> characters makes it hard for my brain to parse.
>>>
>>> 3. This will require discussion, but will be very convenient: have the
>>> repr display the time using the environment time zone, including a time
>>> offset. So, in my specific time zone (+0200), I will have:
>>>
>>> repr(np.datetime64('2020-11-05 14:00Z')) ==
>>> "numpy.datetime64('2020-11-05T16:00+0200')"
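The three repr ideas can be tried outside numpy with a formatting helper. `format_moment` is a hypothetical name, not part of numpy. It assumes naive datetime64 values represent UTC. It uses a space instead of 'T', appends 'Z' for UTC, and otherwise appends a +HHMM offset.

```python
from datetime import timedelta, timezone
import numpy as np


def format_moment(value: np.datetime64, tz=timezone.utc) -> str:
    """Hypothetical helper: show a datetime64 (read as UTC) in zone `tz`.

    Passing tz=None uses the machine's local time zone, as in idea 3.
    """
    # datetime64[us] converts to a plain datetime.datetime, which we mark as UTC.
    as_utc = value.astype("datetime64[us]").item().replace(tzinfo=timezone.utc)
    local = as_utc.astimezone(tz)
    suffix = "Z" if local.utcoffset() == timedelta(0) else local.strftime("%z")
    return local.strftime("%Y-%m-%d %H:%M") + suffix


t = np.datetime64("2020-11-05T14:00")
print(format_moment(t))                                # 2020-11-05 14:00Z
print(format_moment(t, timezone(timedelta(hours=2))))  # 2020-11-05 16:00+0200
```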
>>>
>>> I'm sure the pros and cons of having an environment-dependent repr
>>> should be discussed, but I will list some pros:
>>> 1. It's very convenient - it's immediately obvious to me to which moment
>>> 2020-11-05 16:00+0200 refers.
>>> 2. It's well defined - I may collect timestamps from machines with
>>> different time zones, and I will be able to know to which exact moment each
>>> timestamp refers.
>>> 3. It's very simple: I can compare any two timestamps without having
>>> to worry about time zones.
>>>
>>> I would be happy to hear your thoughts.
>>>
>>> Thanks,
>>> Noam
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>