[Python-Dev] PEP: New timestamp formats

Thu Feb 2 04:47:07 CET 2012

On Thu, Feb 2, 2012 at 11:03 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Even if I am not really convinced that a PEP helps to design an API,
> here is a draft of a PEP to add new timestamp formats to Python 3.3.
> Don't see the draft as a final proposition, it is just a document
> supposed to help the discussion :-)

Helping keep a discussion on track (and avoiding rehashing old ground)
is precisely why the PEP process exists. Thanks for writing this up :)

> ---
>
> PEP: xxx
> Title: New timestamp formats
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner <victor.stinner at haypocalc.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 01-February-2012
> Python-Version: 3.3
>
>
> Abstract
> ========
>
> Python 3.3 introduced functions supporting nanosecond resolutions. Python 3.3
> only supports int or float to store timestamps, but these types cannot be
> used to store a timestamp with a nanosecond resolution.
>
>
> Motivation
> ==========
>
> Python 2.3 introduced float timestamps to support subsecond resolutions,
> os.stat() uses float timestamps by default since Python 2.5. Python 3.3
> introduced functions supporting nanosecond resolutions:
>
>  * os.stat()
>  * os.utimensat()
>  * os.futimens()
>  * time.clock_gettime()
>  * time.clock_getres()
>  * time.wallclock() (reuse time.clock_gettime(time.CLOCK_MONOTONIC))
>
> The problem is that 64-bit floats are unable to store nanoseconds (10^-9)
> for timestamps bigger than 2^24 seconds (194 days 4 hours: 1970-07-14 for an
> Epoch timestamp) without losing precision.
>
> .. note::
>   A 64-bit float starts to lose precision with microsecond (10^-6) resolution
>   for timestamps bigger than 2^33 seconds (272 years: 2242-03-16 for an Epoch
>   timestamp).
>
>
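
To make the float limitation concrete, here is a small sketch (my own
illustration, not part of the draft) that measures the gap between
adjacent 64-bit floats near a present-day Epoch timestamp. The sample
timestamp and the helper name float_spacing are assumptions chosen for
the example.

```python
import math

def float_spacing(seconds):
    """Gap between float(seconds) and the next larger 64-bit float.

    Valid for positive, normal values: a double has a 53-bit significand,
    so the spacing is 2**(exponent - 53) with exponent taken from frexp().
    """
    _, exponent = math.frexp(float(seconds))
    return math.ldexp(1.0, exponent - 53)

ts = 1328154427                      # assumed sample: early February 2012
spacing = float_spacing(ts)          # 2**-22 seconds, roughly 238 ns
collides = (ts + 1e-9) == float(ts)  # adding one nanosecond changes nothing
```

At this magnitude, neighbouring floats are hundreds of nanoseconds
apart, so two distinct nanosecond readings can collapse into the same
float value.
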
> Timestamp formats
> =================
>
> Choose a new format for nanosecond resolution
> ---------------------------------------------
>
> To support nanosecond resolution, four formats were considered:
>
>  * 128 bits float
>  * decimal.Decimal
>  * datetime.datetime
>  * tuple of integers

I'd add datetime.timedelta to this list. It's exactly what timestamps
are, after all - the difference between the current time and the
relevant epoch value.

> Various kinds of tuples have been proposed. All proposals only use integers:
>
>  * a) (sec, nsec): C timespec structure, useful for os.futimens() for example
>  * b) (sec, floatpart, exponent): value = sec + floatpart * 10**exponent
>  * c) (sec, floatpart, divisor): value = sec + floatpart / divisor
>
> The format (a) only supports nanosecond resolution.
>
> The formats (a) and (b) may lose precision if the clock divisor is not a
> power of 10.
>
> The format (c) should be enough for most cases.

Format (b) only loses precision if the exponent chosen for a given
value is too small relative to the precision of the underlying timer
(it's the same as using decimal.Decimal in that respect). The problem
with (a) is that it simply cannot represent times with greater than
nanosecond precision. Since we have the opportunity, we may as well
deal with the precision question once and for all.

Alternatively, you could return a 4-tuple that specifies the base in
addition to the exponent.
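
As a rough sketch of how the candidate tuple formats compare, the
functions below evaluate each one exactly with fractions.Fraction. The
function names and the 1/1024 second clock are assumptions made up for
this illustration; nothing here is proposed API.

```python
from fractions import Fraction

def from_timespec(sec, nsec):
    """Format (a): (sec, nsec), fixed nanosecond resolution."""
    return sec + Fraction(nsec, 10**9)

def from_exponent(sec, floatpart, exponent):
    """Format (b): value = sec + floatpart * 10**exponent."""
    return sec + floatpart * Fraction(10) ** exponent

def from_divisor(sec, floatpart, divisor):
    """Format (c): value = sec + floatpart / divisor."""
    return sec + Fraction(floatpart, divisor)

def from_base_exponent(sec, floatpart, base, exponent):
    """4-tuple variant: value = sec + floatpart * base**exponent."""
    return sec + floatpart * Fraction(base) ** exponent

# Hypothetical clock ticking every 1/1024 s, read at 5 seconds + 3 ticks.
exact = from_divisor(5, 3, 1024)
same_b = from_exponent(5, 29296875, -10)   # 3/1024 == 0.0029296875 exactly
same_4 = from_base_exponent(5, 3, 2, -10)  # base 2 matches the clock directly
lossy_a = from_timespec(5, 2929687)        # 2929687.5 ns truncated to an int
```

Here (b) is exact only because ten decimal digits happen to suffice for
1/1024. A divisor with a prime factor other than 2 or 5 (say 3) has no
exact power-of-10 representation at any exponent, which is where (c) or
the 4-tuple form still works.
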

> Callback and creating a new module to convert timestamps
> --------------------------------------------------------
>
> Use a callback taking integers to create a timestamp. Example with float:
>
>    def timestamp_to_float(seconds, floatpart, divisor):
>        return seconds + floatpart / divisor
>
> The time module can provide some builtin converters, and other modules, like
> datetime, can provide their own converters. Users can define their own types.
>
> An alternative is to add a new module for all functions converting timestamps.
>
> The problem is that we have to design the API of the callback and we cannot
> change it later. We may need more information for future needs later.

I'd be more specific here - either of the 3-tuple options already
presented in the PEP, or the 4-tuple option I mentioned above, would
be suitable as the signature of an arbitrary precision callback API
that assumes timestamps are always expressed as "seconds since a
particular epoch value". Such an API could only become limiting if
timestamps ever become something other than "the difference in time
between right now and the relevant epoch value", and that's a
sufficiently esoteric possibility that it really doesn't seem
worthwhile to take it into account. The past problems with timestamp
APIs have all related to increases in precision, not timestamps being
redefined as something radically different.
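
For illustration only, here is what converters sharing the (seconds,
floatpart, divisor) signature might look like. The name read_clock and
its fixed raw reading are hypothetical stand-ins for a function such as
time.time(); only the first converter comes from the draft.

```python
from decimal import Decimal
from fractions import Fraction

def timestamp_to_float(seconds, floatpart, divisor):
    # Converter from the draft: lossy for large timestamps.
    return seconds + floatpart / divisor

def timestamp_to_decimal(seconds, floatpart, divisor):
    # Exact as long as the result fits the decimal context precision.
    return Decimal(seconds) + Decimal(floatpart) / Decimal(divisor)

def timestamp_to_fraction(seconds, floatpart, divisor):
    # Always exact, whatever the divisor.
    return seconds + Fraction(floatpart, divisor)

def read_clock(converter=timestamp_to_float):
    """Hypothetical clock function: hands a fixed raw reading to the callback."""
    raw = (1328154427, 123456789, 10**9)  # assumed sample reading
    return converter(*raw)

as_float = read_clock()
as_decimal = read_clock(timestamp_to_decimal)
as_fraction = read_clock(timestamp_to_fraction)
```

The clock function never needs to know about Decimal or Fraction; it
only promises to call the converter with three integers.
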

The PEP should also mention PJE's suggestion of creating a new named
protocol specifically for the purpose (with a signature based on one
of the proposed tuple formats), such that you could simply write:

    time.time()  # output=float by default
    time.time(output=float)
    time.time(output=int)
    time.time(output=fractions.Fraction)
    time.time(output=decimal.Decimal)
    time.time(output=datetime.timedelta)
    time.time(output=datetime.datetime)
    # (and similarly for os.stat with a timestamp=type parameter)

Rather than being timestamp specific, such a protocol would be a
general numeric protocol. If (integer, numerator, denominator) is used
(i.e. a "mixed number" in mathematical terms), then "__from_mixed__"
would be an appropriate name. If (integer, fractional, exponent) is
used (i.e. a fixed point notation), then "__from_fixed__" would work.

    # Algorithm for a "from mixed numbers" protocol, assuming division
    # doesn't lose precision...
    def __from_mixed__(cls, integer, numerator, denominator):
        return cls(integer) + cls(numerator) / cls(denominator)

    # Algorithm for a "from fixed point" protocol, assuming negative
    # exponents don't lose precision...
    def __from_fixed__(cls, integer, mantissa, base, exponent):
        return cls(integer) + cls(mantissa) * cls(base) ** cls(exponent)

From a *usage* point of view, this idea is actually the same as the
proposal currently in the PEP. The difference is that instead of
adding custom support for a few particular types directly to time and
os, it instead defines a more general purpose protocol that covers not
only this use case, but also any other situation where high precision
fractions are relevant.

One interesting question with a named protocol approach is whether
such a protocol should *require* explicit support, or if it should
fall back to the underlying mathematical operations. Since the
conversions to float and int in the timestamp case are already known
to be lossy, permitting lossy conversion via the mathematical
equivalents seems reasonable, suggesting possible protocol definitions
as follows:

    # Algorithm for a potentially precision-losing "from mixed numbers" protocol
    def from_mixed(cls, integer, numerator, denominator):
        try:
            factory = cls.__from_mixed__
        except AttributeError:
            return cls(integer) + cls(numerator) / cls(denominator)
        return factory(integer, numerator, denominator)

    # Algorithm for a potentially lossy "from fixed point" protocol
    def from_fixed(cls, integer, mantissa, base, exponent):
        try:
            factory = cls.__from_fixed__
        except AttributeError:
            return cls(integer) + cls(mantissa) * cls(base) ** cls(exponent)
        return factory(integer, mantissa, base, exponent)
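
A quick sketch of how the fallback version behaves in practice. The
Nanoseconds class is a hypothetical type invented for this example to
show explicit opt-in; float and Fraction simply take the arithmetic
fallback path.

```python
from fractions import Fraction

def from_mixed(cls, integer, numerator, denominator):
    # Prefer an explicit hook; otherwise fall back to plain arithmetic.
    try:
        factory = cls.__from_mixed__
    except AttributeError:
        return cls(integer) + cls(numerator) / cls(denominator)
    return factory(integer, numerator, denominator)

class Nanoseconds(int):
    """Hypothetical type: an integer count of nanoseconds."""

    @classmethod
    def __from_mixed__(cls, integer, numerator, denominator):
        # Scale to nanoseconds using integer arithmetic only.
        return cls(integer * 10**9 + numerator * 10**9 // denominator)

via_float = from_mixed(float, 1, 1, 2)           # fallback path
via_fraction = from_mixed(Fraction, 1, 1, 3)     # fallback path, exact
via_hook = from_mixed(Nanoseconds, 1, 1, 2)      # explicit __from_mixed__
```

One wrinkle worth noticing: with the fallback, from_mixed(int, 1, 1, 2)
returns a float rather than an int, because int / int is true division
in Python 3. Whether that is acceptable is part of the "require explicit
support or not" question.
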

> os.stat: add new fields
> -----------------------
>
> It was proposed to add 3 fields to os.stat() structure to get nanoseconds of
> timestamps.

It's worth noting that the challenge with this is that it's
potentially time consuming to populate the extra fields, and that
this approach doesn't help with the time APIs that return timestamps
directly.

> Add an argument to change the result type
> -----------------------------------------
>
> Add an argument to all functions creating timestamps, like time.time(), to
> change their result type. It was first proposed to use a string argument,
> e.g. time.time(format="decimal"). The problem is that the function has
> to import internally a module. Then it was decided to pass directly the
> type, e.g. time.time(format=decimal.Decimal). Using a type, the user has
> first to import the module. There is no direct link between a type and the
> function used to create the timestamp.
>
> By default, the float type is used to keep backward compatibility. For stat
> functions like os.stat(), the default type depends on os.stat_float_times().

There should also be a description of the "set a boolean flag to
request high precision output" approach.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia