[Numpy-discussion] datetime64/timedelta64 support in linspace

Tue Sep 29 16:43:34 EDT 2020

On Sat, 2020-09-26 at 09:52 -0500, Lee Johnston wrote:
> I propose adding support for datetime64/timedelta64 in linspace and
> solicit feedback on the feature. As is, linspace raises UFuncTypeError
> when parameters start and stop are datetime64/timedelta64. The
> complementary function arange supports these types. Work was started
> on this feature in PR 14700
> <https://github.com/numpy/numpy/pull/14700> but has stalled and I
> would like to complete it, but there are some issues worth getting
> feedback on.
> 
>    1. Supporting datetime64/timedelta64 will require a special case
>    code path within linspace. The code path is selected based on the
>    start parameter data type.
>    2. The output dtype has to be explicitly set.
>    3. The step size resolution is determined by the lesser resolution
>    of start and dtype.
> 
> Issue 3 may lead to an unexpected result for an end-user. For example,
> 
> >>> import numpy as np
> >>> np.linspace(np.timedelta64(0, "s"), np.timedelta64(1, "s"), 4,
> ...             dtype="timedelta64[ms]")
> array([   0,    0,    0, 1000], dtype='timedelta64[ms]')
> 
> The existing solution in PR 14700 does not override the end-user's
> start and dtype resolution. In this case, the end-user would have to
> set both start and dtype to "ms" resolution to get the expected result.
> 
> >>> np.linspace(np.timedelta64(0, "ms"), np.timedelta64(1, "s"), 4,
> ...             dtype="timedelta64[ms]")
> array([   0,  333,  666, 1000], dtype='timedelta64[ms]')
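
For illustration, the first result above is what one gets if the step is
computed at the coarser "s" resolution and only the endpoint is pinned to
stop. This is a minimal sketch of that effect, not the code from PR 14700;
the names coarse_step and values are made up here, and the way the step is
computed is an assumption.

```python
import numpy as np

start = np.timedelta64(0, "s")
stop = np.timedelta64(1, "s")
num = 4

# Assumption: the step is kept in the resolution of start ("s").
# timedelta64 / integer stays a timedelta64, so 1 s / 3 truncates to 0 s.
coarse_step = (stop - start) / (num - 1)

# Build the points from that step, then pin the last one to stop.
values = start + np.arange(num) * coarse_step
values[-1] = stop

# Only now convert to the requested output resolution.
print(values.astype("timedelta64[ms]"))
```

All intermediate points collapse onto start because the step was already
rounded to zero before the conversion to "ms".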

Thanks for taking the time and looking into this!

Can you explain why your solution of using the input units to represent
the step size is better than using the provided one?
If this turns out to be tricky, we could also make the rule: cast
everything to a single unit (as long as the cast is considered "safe").
That may force the user to do the cast in the long run, but maybe most
users are not dealing with a mix of units here to begin with?
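
To make the "safe" cast rule concrete, here is a small sketch of how it
could look with existing NumPy calls (np.can_cast, np.promote_types and
astype with casting="safe"). The variable names are only illustrative, and
this is one possible reading of the rule, not a decided design.

```python
import numpy as np

coarse = np.dtype("timedelta64[s]")
fine = np.dtype("timedelta64[ms]")

# Going to a finer unit is lossless, so NumPy treats it as "safe".
print(np.can_cast(coarse, fine, casting="safe"))
# Going to a coarser unit can drop information, so it is not "safe".
print(np.can_cast(fine, coarse, casting="safe"))

# The common unit of a mixed pair is the finer one.
common = np.promote_types(coarse, fine)

# Cast both endpoints to the common unit before doing any arithmetic.
start = np.asarray(np.timedelta64(0, "s")).astype(common, casting="safe")
stop = np.asarray(np.timedelta64(1, "s")).astype(common, casting="safe")
print(common, start, stop)
```

Under such a rule, a user asking for a coarser dtype than the inputs would
get an error and would have to cast explicitly.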

The approach in the last state of the PR had issues with the
timedelta/datetime equivalent of:

    >>> np.diff(np.linspace(0, 1000, 33, dtype='int64'))
    array([31, 31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31,
           31, 31, 32, 31, 31, 31, 32, 31, 31, 31, 32, 31, 31,
           31, 32, 31, 31, 31, 32])

which has an uneven step size (note the 32 values).  In the PR, the
uneven steps were not spread out like this.  I assume you have a
solution for that?
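
One way to get the same spreading for timedelta64 is to do the spacing on
integer ticks in the output unit and convert back. This is only a sketch
under that assumption; timedelta_linspace_sketch is a hypothetical helper,
not something in NumPy or in the PR, and it ignores NaT and overflow.

```python
import numpy as np

def timedelta_linspace_sketch(start, stop, num, unit="ms"):
    """Hypothetical helper: evenly spread points in the given unit."""
    dtype = np.dtype(f"timedelta64[{unit}]")
    # Express both endpoints as integer tick counts in the output unit.
    lo = int(np.asarray(start).astype(dtype).astype(np.int64))
    hi = int(np.asarray(stop).astype(dtype).astype(np.int64))
    # Let the existing integer linspace distribute the rounding.
    ticks = np.linspace(lo, hi, num, dtype=np.int64)
    return ticks.astype(dtype)

points = timedelta_linspace_sketch(
    np.timedelta64(0, "s"), np.timedelta64(1, "s"), 33
)
steps = np.diff(points).astype(np.int64)
print(steps)
```

With this approach the step sizes differ by at most one tick and the
larger steps are interleaved rather than collected at the end.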

Maybe it is best if you just pick up the PR and create a new one (if
possible, pull in the existing commits or tests for attribution as
well), so that we can discuss this more easily by reading the tests.

> 
> In PR 14700, there is some discussion of "NaT" handling. In my
> implementation, "NaT" works the same as "NaN" and I am not aware of
> any corner cases.

There may not be.  I think this had to do with how we approached certain
difficulties in the PR (around viewing as int64 or using floats,
probably).  We should just make sure to have tests for both start and
end being NaT.
Maybe NaT is not a big issue, because we can probably add an explicit
code path if necessary.
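
As a small illustration of why the int64 view matters for NaT: NaT is
stored as the minimum int64 value, so integer arithmetic on the view would
not propagate it on its own, while timedelta64 arithmetic does. The snippet
below only uses existing NumPy behaviour; the variable names are
illustrative.

```python
import numpy as np

nat_pair = np.array(["NaT", "NaT"], dtype="timedelta64[ms]")

# Viewed as int64, NaT is just the smallest representable integer.
as_int = nat_pair.view(np.int64)
sentinel = np.iinfo(np.int64).min
print(as_int[0] == sentinel)

# np.isnat is the explicit check a special code path could use.
print(np.isnat(nat_pair))

# Arithmetic done on timedelta64 itself keeps NaT as NaT.
step = (nat_pair[1] - nat_pair[0]) / 3
print(np.isnat(step))
```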

Cheers,

Sebastian
