[Python-Dev] Proposing an alternative to PEP 410

Sun Feb 26 00:31:56 CET 2012

On Sat, Feb 25, 2012 at 1:31 PM, Barry Warsaw <barry at python.org> wrote:
> On Feb 23, 2012, at 01:28 PM, Larry Hastings wrote:
>
>>* Improve datetime.datetime objects so they support nanosecond resolution,
>>   in such a way that it's 100% painless to make them even more precise in
>>   the future.
>
> +1

And how would you do that? Given the way the API currently works, you
pretty much have to add a separate field 'nanosecond' with a range of
0-999, leaving the microseconds field the same. (There are no
redundant fields.) This is possible but makes it quite awkward by the
time we've added picosecond and femtosecond.

>>* Add support to datetime objects that allows adding and subtracting ints
>>   and floats as seconds.  This behavior is controllable with a flag on the
>>   object--by default this behavior is off.
>
> Why conditionalize this behavior?  It should either be enabled or not, but
> making it switchable on a per-object basis seems like asking for trouble.

I am guessing that Larry isn't convinced that this is always a good
idea, but I agree with Barry that making it conditional is just too
complex.

>>* Support accepting naive datetime.datetime objects in all functions that
>>   accept a timestamp in os (utime etc).
>
> +1

What timezone would it assume? Timestamps are traditionally linked to
UTC -- but naive timestamps are most frequently used for local time.
Local time is awkward due to the ambiguities around DST transitions.

I do think we should support APIs for going back and forth between
timezone-aware datetime and timestamps.
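
As a rough sketch of what such a round trip can look like with the
UTC timezone object in the stdlib (datetime.timezone.utc) and the
fromtimestamp()/timestamp() methods of current Python versions: the
helper names timestamp_to_utc and utc_to_timestamp are invented for
this illustration, and rejecting naive datetimes is one possible
policy, not an agreed design.

```python
from datetime import datetime, timezone


def timestamp_to_utc(ts):
    """Convert a POSIX timestamp (seconds since the epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def utc_to_timestamp(dt):
    """Convert a timezone-aware datetime back to a POSIX timestamp."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Assumption of this sketch: refuse naive datetimes instead of
        # guessing between UTC and local time.
        raise ValueError("naive datetime has no unambiguous timestamp")
    return dt.timestamp()


ts = 1330212716.0           # 2012-02-25 23:31:56 UTC
dt = timestamp_to_utc(ts)   # aware datetime in UTC
roundtrip = utc_to_timestamp(dt)
```

Because the datetime carries its own UTC offset, the conversion does
not depend on the machine's local timezone or on DST rules.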

>>* Change the result of os.stat to be a custom class rather than a
>>   PyStructSequence.  Support the sequence protocol on the custom class but
>>   mark it PendingDeprecation, to be removed completely in 3.5.  (I can't
>>   take credit for this idea; MvL suggested it to me once while we were
>>   talking about this issue.  Now that the os.stat object has named fields,
>>   who uses the struct unpacking anymore?)
>
> +1

Yeah, the sequence protocol is outdated here.

Would this be a mutable or an immutable object?

>>* Add support for setting "stat_float_times=2" (or perhaps
>>   "stat_float_times=datetime.datetime" ?) to enable returning st_[acm]time as
>>   naive datetime.datetime objects--specifically, ones that allow addition and
>>   subtraction of ints and floats.  The value would be similar to calling
>>   datetime.datetime.fromtimestamp() on the current float timestamp, but
>>   would preserve all available precision.
>
> I personally don't much like the global state represented by
> os.stat_float_times() in the first place.

Agreed. We should just deprecate stat_float_times().

> Even though it also could be
> considered somewhat un-Pythonic, I think it probably would have been better
> to accept an optional argument in os.stat() to determine the return value.

I still really don't like this.

> Or maybe it would have been more acceptable to have os.stat(), os.stat_float(),
> and os.stat_datetime() methods.

But I also don't like a proliferation of functions, especially since
there are already so many stat() functions: stat(), fstat(),
fstatat().

My proposal: add extra fields that represent the time in different
types. E.g. st_atime_nsec could be an integer expressing the entire
timestamp in nanoseconds; st_atime_decimal could give as much
precision as happens to be available as a Decimal; st_atime_datetime
could be a UTC-based datetime; and in the future we could have other
forms. Plain st_atime would be a float. (It can change if and when the
default floating point type changes.)

We could make these fields lazily computed so that if you never touch
st_atime_decimal, the decimal module doesn't get loaded. (It would be
awkward if "import os" would imply "import decimal", since the latter
is huge.)
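
A minimal sketch of the lazy-field idea, for one timestamp only. The
class name StatTimes is hypothetical, and storing the value
internally as integer nanoseconds is an assumption of this sketch,
not a description of how os.stat() is implemented. The field names
follow the ones suggested above.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class StatTimes:
    """Hypothetical holder for one timestamp, kept as integer nanoseconds."""

    def __init__(self, atime_nsec):
        # The entire timestamp expressed in nanoseconds since the epoch.
        self.st_atime_nsec = atime_nsec

    @property
    def st_atime(self):
        # Plain float seconds; a double cannot hold all nine fractional digits.
        return self.st_atime_nsec / 1e9

    @property
    def st_atime_decimal(self):
        # Imported on first use, so creating the object never loads decimal.
        from decimal import Decimal
        return Decimal(self.st_atime_nsec) / 10**9

    @property
    def st_atime_datetime(self):
        # datetime stores microseconds, so the last three digits are dropped.
        return _EPOCH + timedelta(microseconds=self.st_atime_nsec // 1000)


st = StatTimes(1330212716123456789)
```

Code that only reads st.st_atime or st.st_atime_nsec never triggers
the decimal import; only touching st.st_atime_decimal does.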

>>* Add a new parameter to functions that produce stat-like timestamps to
>>   explicitly specify the type of the timestamps (float or datetime),
>>   as proposed in PEP 410.
>
> +1

No.

>>I disagree with PEP 410's conclusions about the suitability of datetime as
>>a timestamp object.  I think "naive" datetime objects are a perfect fit.
>>Specifically addressing PEP 410's concerns:
>>
>>   * I don't propose doing anything about the other functions that have no
>>     explicit start time; I'm only proposing changing the functions that deal
>>     with timestamps.  (Perhaps the right thing for epoch-less times like
>>     time.clock would be timedelta?  But I think we can table this discussion
>>     for now.)
>
> +1, and yeah, I think we've had general agreement about using timedeltas for
> epoch-less times.

Scratch that, *I* don't agree. timedelta is a pretty clumsy type to
use. Have you ever tried to compute the number of seconds between two
datetimes? You can't just use the .seconds field, you have to combine
the .days and .seconds fields. And negative timedeltas are even harder
due to the requirement that seconds and microseconds are never
negative; e.g. -1 second is represented as -1 days plus 86399 seconds.
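
To make that concrete, here is a small illustration. The helper
seconds_between is made up for this example; recent Python versions
also provide timedelta.total_seconds(), which does the same
combination.

```python
from datetime import datetime, timedelta

start = datetime(2012, 2, 23, 13, 28, 0)
end = datetime(2012, 2, 25, 13, 31, 0)
delta = end - start  # two days and three minutes

# .seconds holds only the part left over after whole days are removed.
seconds_field_only = delta.seconds


def seconds_between(td):
    """Combine the normalized fields of a timedelta into total seconds."""
    return td.days * 86400 + td.seconds + td.microseconds / 1e6


# Negative intervals are normalized so that only .days is negative.
minus_one = timedelta(seconds=-1)
minus_one_fields = (minus_one.days, minus_one.seconds, minus_one.microseconds)
```

Reading delta.seconds here gives 180, silently dropping the two whole
days, while seconds_between(delta) gives 172980.0.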

For fixed-epoch timestamps, *maybe* UTC datetime makes some sense. (We
did add the UTC timezone to the stdlib, right?) But still I think the
flexibility of floating point wins, and there are no worries about
ambiguities.

>>   * "You can't compare naive and non-naive datetimes."  So what?  The
>>     existing timestamp from os.stat is a float, and you can't compare floats
>>     and non-naive datetimes.  How is this an issue?
>
> Exactly.

The problem is with the ambiguity of naive datetimes.

>>Perhaps someone else can propose something even better,
>
> If we really feel like we need to make a change to support higher resolution
> timestamps, this comes pretty darn close to what I'd like to see.

I'm currently also engaged in an off-list discussion with Victor.

I still think that when you are actually interested in *using* times,
the current float format is absolutely fine. Anybody who thinks they
need to accurately know the absolute time that something happened with
nanosecond accuracy is out of their mind; given relativity such times
have an incredibly local significance anyway. So I don't worry about
not being able to represent a timestamp with nsec precision. For
*relative* times, nanoseconds may be useful, and a float has no
trouble representing them. (A float can represent time intervals of
many millions of seconds with nanosecond precision. There are probably
only a few clocks in the world whose drift is less than a nanosecond
over such a timespan.)
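
The resolution claim can be checked directly. This sketch assumes
IEEE 754 double-precision floats and uses math.ulp(), which is only
available in Python 3.9 and later; float_resolution is a name chosen
for this example.

```python
import math


def float_resolution(seconds):
    """Return the gap between `seconds` and the next larger float."""
    return math.ulp(seconds)


interval = 4_000_000.0      # a relative time of about 46 days
epoch_ts = 1330212716.0     # an absolute timestamp from February 2012

interval_step = float_resolution(interval)   # 2**-31 s, about 0.47 ns
epoch_step = float_resolution(epoch_ts)      # 2**-22 s, about 238 ns
```

With 53 bits of mantissa the step stays below one nanosecond for
intervals shorter than 2**23 seconds (about 97 days). An absolute
timestamp counted from 1970 is far larger, so its step is a few
hundred nanoseconds.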

The one exception here is making accurate copies of filesystem
metadata. This can be dealt with by making certain changes to
os.stat() and os.utime(). For os.stat(), adding extra fields like I
suggested above should work. For os.utime(), we could use keyword
arguments, or some other API hack.
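
One way the keyword-argument route can look, sketched with what
current Python versions (3.3 and later) provide: integer-nanosecond
fields such as st_atime_ns and st_mtime_ns on the os.stat() result,
and an ns= keyword for os.utime(). The function name copy_times is
invented for this example, and this is not a claim about what was
being proposed at the time of writing.

```python
import os


def copy_times(src, dst):
    """Copy access and modification times from src to dst without float rounding."""
    st = os.stat(src)
    # ns= takes (atime_ns, mtime_ns) as integers, so the values never
    # pass through a float on the way from stat() to utime().
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

How much of the precision survives still depends on the timestamp
resolution of the filesystem that holds dst.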

-- 
--Guido van Rossum (python.org/~guido)