[Pandas-dev] tslibs 2.0 and non-nanosecond datetime64/timedelta64

Mon Jun 1 19:36:44 EDT 2020

Before responding to questions, one topic I forgot to include in the OP:

The performance of Timestamp, Timedelta, and Period could be improved (I do
not have an estimate of how much) if they were cdef (cython) classes.  This
is not viable at the moment because they each have `__new__` methods, which
are needed because the constructors can return pd.NaT.  If we had
dtype-specific NaTs (xref #24983
<https://github.com/pandas-dev/pandas/issues/24983>) that would allow us to
make these cdef classes.
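
To make the `__new__` problem concrete, here is a minimal pure-Python sketch of the pattern. It is not the pandas implementation: `FakeNaTType`, `FakeNaT`, and `FakeTimestamp` are hypothetical names used only for illustration. The point is that the constructor sometimes hands back an object that is not an instance of the class being constructed.

```python
class FakeNaTType:
    """Hypothetical stand-in for a shared missing-value singleton like pd.NaT."""

    _instance = None

    def __new__(cls):
        # Always return the same object, so identity checks work.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "FakeNaT"


FakeNaT = FakeNaTType()


class FakeTimestamp:
    """Hypothetical scalar whose constructor may return the shared sentinel."""

    def __new__(cls, value):
        # The constructor can return something that is NOT a FakeTimestamp.
        if value is None or value == "NaT":
            return FakeNaT
        self = super().__new__(cls)
        self.value = value
        return self


ts = FakeTimestamp(1_590_000_000)
missing = FakeTimestamp("NaT")

print(type(ts).__name__)       # FakeTimestamp
print(missing is FakeNaT)      # True
```

With a dtype-specific sentinel, the missing value could be an instance of the class itself, so the constructor would no longer need to return a different type.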

---------
> Will this [casting non-nano timestamps to nano to use existing
> tz-conversion code] cause issues if the original datetime isn't in the
> bounds of a ns-precision timestamp?

Both technically and conceptually, yes.  [note to self, expand on this
before hitting send]
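
As a rough illustration of the bounds in question, the sketch below uses only the standard library and assumes nanoseconds since the 1970 epoch stored in a signed 64-bit integer. The names `fits_in_ns` and `to_ns` are hypothetical helpers, not pandas functions, and the bounds are truncated to microseconds because `datetime` has no finer resolution.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)

# Approximate ns-precision bounds, truncated inward to whole microseconds.
NS_MAX = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
NS_MIN = EPOCH - timedelta(microseconds=INT64_MAX // 1000)


def fits_in_ns(dt):
    """Return True if dt lies within the (truncated) int64-nanosecond range."""
    return NS_MIN <= dt <= NS_MAX


def to_ns(dt):
    """Convert a naive datetime to integer nanoseconds since the epoch."""
    if not fits_in_ns(dt):
        raise OverflowError(f"{dt} is outside the ns-precision bounds")
    delta = dt - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 10**9 + delta.microseconds * 1000


print(NS_MIN.year, NS_MAX.year)            # 1677 2262
print(fits_in_ns(datetime(1500, 1, 1)))    # False
```

A value such as the year 1500 is representable at second or microsecond resolution but cannot be cast to nanoseconds, which is the failure mode the question is about.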

> [...] since it represents a point in time rather than a span.

From an implementation standpoint, that distinction is meaningless; the
same conversion code (the hard part) is used for both.  Conceptually, I
think of `datetime64[minute]` as representing the same thing as
`Period[minute]` (both can be used to represent the "4:32" in the corner of
my screen).

Or for Timestamp[D] we can just call that a Date dtype instead of
re-implementing it (xref #34441
<https://github.com/pandas-dev/pandas/pull/34441>)

---------
> Personally, I don't think we necessarily need to add all units that are
> supported by numpy's datetime64/timedelta64 dtypes.

I have a strong preference against using the Year or Month units, as
converting those to/from the other units is not just
multiplication/division.  The others I don't feel as strongly about; once
nanos is no longer hard-coded, the marginal cost of adding more should be
relatively small.
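
A small sketch of the difference, using only the standard library. The `NANOS_PER_UNIT` table and the `convert` and `days_in_month` helpers are hypothetical names for illustration, not existing pandas or numpy code. Fixed-length units convert with one multiplication and one floor division; months have no fixed factor because their length depends on the calendar.

```python
import calendar

# Fixed-length units: each one is a constant number of nanoseconds.
NANOS_PER_UNIT = {
    "ns": 1,
    "us": 10**3,
    "ms": 10**6,
    "s": 10**9,
    "m": 60 * 10**9,
    "h": 3_600 * 10**9,
    "D": 86_400 * 10**9,
}


def convert(value, from_unit, to_unit):
    """Convert an integer count between fixed-length units.

    Floor division is used when going to a coarser unit.
    """
    return value * NANOS_PER_UNIT[from_unit] // NANOS_PER_UNIT[to_unit]


def days_in_month(year, month):
    """Month length depends on the calendar, so no constant factor exists."""
    return calendar.monthrange(year, month)[1]


print(convert(2, "h", "m"))        # 120
print(days_in_month(2020, 2))      # 29
print(days_in_month(2021, 2))      # 28
```

Adding another fixed-length unit only means adding an entry to the table, whereas Year and Month would need calendar-aware logic in every conversion.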

On Sat, May 30, 2020 at 12:18 PM Joris Van den Bossche <
jorisvandenbossche at gmail.com> wrote:

> Thanks for starting this discussion, Brock!
>
> On Fri, 29 May 2020 at 21:03, Tom Augspurger <tom.augspurger88 at gmail.com>
> wrote:
>
>> On Fri, May 29, 2020 at 11:37 AM Brock Mendel <jbrockmendel at gmail.com>
>> wrote:
>>
>>>
>>> We could then consider de-duplication. Tick is already redundant with
>>> Timedelta, and Timestamp[H] would render Period[H] redundant.  With
>>> appropriate deprecation cycle, we could rip out a bunch of code.
>>>
>>
>> What would be the user-facing changes that warrant deprecation? For me,
>> `Period` represents a span of time. It would make sense to implement
>> something like `pd.Timestamp("2000-01-01") in pd.Period("2000-01-01",
>> freq="H")`. But something checking whether that timestamp is in a
>> `Timestamp[H]` doesn't seem natural, since it represents a point in time
>> rather than a span.
>>
>>
> Personally, I don't think we necessarily need to add all units that are
> supported by numpy's datetime64/timedelta64 dtypes. First, because I don't
> think it is an important use case (people mostly want to be able to have
> dates outside of the range limits that nanosecond resolution gives us), and
> also because it makes it conceptually a lot more difficult. For example,
> what is a "Timestamp[H]" value? Does it represent the beginning or the end
> of the hour? Those are questions that are already handled by the Period
> dtype, and I think it is a good thing to keep those concepts separated (you
> can of course ask the same question with a millisecond resolution, but I
> think generally people don't do that).
> Further, all the resolutions from nanosecond up to second are "just"
> multiplications x1000, keeping the implementation more simple (compared to
> resolutions of hours, months, ..).
>
> So for a timestamp dtype, we could maybe only support ns / µs / ms / s
> resolutions?
>
>