[Numpy-discussion] fixing up datetime
Christopher Barker
Chris.Barker at noaa.gov
Tue Jun 7 12:53:04 EDT 2011
Pierre GM wrote:
> Using the ISO as reference, you have a good definition of months.
Yes, but only one. There are others. For instance, climate modelers like
to use a calendar that has 360 days a year: twelve 30-day months. That
way they get something on the same timescale as months and years, but
with nice, linear, easy-to-use units (differentiable, and all that).
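To show why such a calendar is "linear", here is a minimal sketch of 360-day calendar arithmetic. The helper names (`to_day_count`, `from_day_count`) and the epoch 0001-01-01 are illustrative assumptions, not part of numpy or any climate library:

```python
# Hypothetical helpers for a 360-day calendar: every year has twelve 30-day months.
DAYS_PER_MONTH = 30
MONTHS_PER_YEAR = 12
DAYS_PER_YEAR = DAYS_PER_MONTH * MONTHS_PER_YEAR  # 360


def to_day_count(year, month, day):
    """Days since 0001-01-01 in a 360-day calendar (month and day are 1-based)."""
    return (year - 1) * DAYS_PER_YEAR + (month - 1) * DAYS_PER_MONTH + (day - 1)


def from_day_count(n):
    """Inverse of to_day_count: returns a (year, month, day) tuple."""
    year, rem = divmod(n, DAYS_PER_YEAR)
    month, day = divmod(rem, DAYS_PER_MONTH)
    return year + 1, month + 1, day + 1


# Month and year offsets are plain integer multiples: no month-length tables needed.
start = to_day_count(1995, 1, 1)
print(from_day_count(start + 13 * DAYS_PER_MONTH))  # (1996, 2, 1)
```

Because every month has the same length, "13 months" is always exactly 390 days here, which is precisely what a real (Gregorian) calendar cannot promise.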
Mark Wiebe wrote:
> Code   Interpreted as
> Y      12M, 52W, 365D
> M      4W, 30D, 720h
>
> This is even self-inconsistent:
>
> 1Y == 365D
>
> 1Y == 12M == 12 * 30D == 360D
>
> 1Y == 12M == 12 * 4W == 12 * 4 * 7D == 336D
>
> 1Y == 52W == 52 * 7D == 364D
>
> Is it not clear from this what a mess of misinterpretation might result
> from all that?
>
>
> This part of the code is used for mapping metadata like [Y/4] -> [3M],
> or [Y/26] -> [2W]. I agree that this '/' operator in the unit metadata
> is weird, and wouldn't object to removing it.
Weird, dangerous, and unnecessary. I can see how some data may be on,
for example, quarters, but that should require a more precise definition
of quarters.
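To make the problem concrete, the sketch below first computes the length of a year under each rule in the quoted unit table (four different answers), then defines a quarter explicitly as three calendar months. `QUARTER_IN_MONTHS` and `quarter_start` are hypothetical names for illustration only, and the sketch assumes a reasonably recent numpy:

```python
import numpy as np

# 1. How long is a year? Each rule from the quoted unit table gives a different answer.
DAYS_PER_WEEK = 7
year_in_days = {
    "1Y == 365D": 365,
    "1Y == 12M, 1M == 30D": 12 * 30,
    "1Y == 12M, 1M == 4W": 12 * 4 * DAYS_PER_WEEK,
    "1Y == 52W": 52 * DAYS_PER_WEEK,
}
for rule, days in year_in_days.items():
    print(f"{rule:22} -> {days} days")

# 2. A quarter defined explicitly in calendar months never needs a
#    day-length assumption (hypothetical names, not a numpy API).
QUARTER_IN_MONTHS = 3


def quarter_start(year, quarter):
    """First month of the given 1-based quarter, as a datetime64[M] value."""
    first_month = np.datetime64(f"{year:04d}-01", "M")
    return first_month + np.timedelta64((quarter - 1) * QUARTER_IN_MONTHS, "M")


print(quarter_start(1995, 3))  # 1995-07
```

The explicit definition stays entirely in month units, so it inherits none of the 336/360/364/365-day ambiguity.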
> This goes to heck if the data is expressed in something like "months
> since 1995-01-01"
>
> Because months are only defined on a Calendar.
>
>
> Here's what the current implementation can do with that one:
>
> >>> np.datetime64('1995-01-01', 'M') + 13
> numpy.datetime64('1996-02','M')
I see -- I have a better idea of the intent here, and I can see that as
long as you keep everything in the same unit (say, months, in this
case), then this can be a clean and effective way to deal with this sort
of data.
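A minimal sketch of that distinction, assuming a reasonably recent numpy: arithmetic on `datetime64[M]` values stays exact, while numpy refuses to convert a month timedelta into days, because a month has no fixed length:

```python
import numpy as np

# Month arithmetic is exact as long as everything stays in month units.
start = np.datetime64("1995-01", "M")
later = start + 13
print(later)  # 1996-02

# Converting a month *timedelta* to days is refused with a TypeError.
converted = None
try:
    converted = np.timedelta64(np.timedelta64(13, "M"), "D")
except TypeError as exc:
    print("refused:", exc)
```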
As I said, the netCDF case is a different use case, but I think the
issue there was that the creator of the data was thinking of it as being
used like above: "months since January 1995". If the data are all
integer numbers of months, that makes perfect sense and is well defined.
The problem in that case is that the standard has no specification that
enforces that the units stay months and that the intervals are integers,
so software looked at that, converted it to, for example, Python
datetime instances (using some predefined length for a month), and got
something that misrepresented the data.
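The failure mode can be reproduced with the standard library alone. The fixed 30-day month below is an assumption chosen purely for illustration; other fixed month lengths drift in the same way, only by different amounts. `add_calendar_months` is a hypothetical helper:

```python
from datetime import date, timedelta

# Data stored as "months since 1995-01-01"; a value of 13 should mean 1996-02-01.
epoch = date(1995, 1, 1)
value = 13


def add_calendar_months(d, n):
    """Exact calendar-month offset. Keeps the day of month, so it is only safe
    for days that exist in every month (e.g. day 1)."""
    months = d.month - 1 + n
    return d.replace(year=d.year + months // 12, month=months % 12 + 1)


# What a reader that assumes a fixed 30-day month would produce.
naive = epoch + timedelta(days=30 * value)

print(add_calendar_months(epoch, value))  # 1996-02-01
print(naive)                              # 1996-01-26
```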
The numpy use case is different, but my concern is that the same kind
of thing could easily happen, because people want to write generic code
that deals with arbitrary np.datetime64 instances.
I suppose we could consider this analogous to the issues with integer and
floating-point dtypes: when you convert between those, it's
user-beware. But I think that would be clearer if we had a set of dtypes:
datetime_months
datetime_hours
datetime_seconds
But that list would get big in a hurry!
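For comparison, numpy already makes the unit part of the dtype (`timedelta64[h]`, `timedelta64[M]`, and so on), and its casting rules draw a line much like the int/float one. A minimal sketch, assuming a reasonably recent numpy: hours to seconds is a fixed-factor conversion, while months to days is not even a `same_kind` cast:

```python
import numpy as np

# Linear units convert by a fixed factor, so this counts as a same_kind cast.
hours_to_seconds = np.can_cast(np.dtype("m8[h]"), np.dtype("m8[s]"), casting="same_kind")

# Month -> day has no fixed factor, so numpy does not treat it as same_kind.
months_to_days = np.can_cast(np.dtype("m8[M]"), np.dtype("m8[D]"), casting="same_kind")

print(hours_to_seconds, months_to_days)  # True False
```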
Also, what I like about the Python datetime module, for instance, is
that I don't have to know or care how it's stored internally; all I
need to know is what range and precision it can handle. numpy has
performance constraints that may not make that possible, but I still
like it.
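The contrast is easy to see side by side. Python's `datetime` publishes its range and resolution as class attributes, while a numpy `datetime64` dtype carries its storage unit explicitly; since values are stored as 64-bit integers, that unit fixes both precision and representable range. A small sketch:

```python
from datetime import datetime, timedelta

import numpy as np

# Python's datetime exposes range and precision without exposing storage.
print(datetime.min, datetime.max)  # 0001-01-01 00:00:00 9999-12-31 23:59:59.999999
print(datetime.resolution)         # 0:00:00.000001

# numpy makes the unit part of the dtype; np.datetime_data reports (unit, count).
print(np.datetime_data(np.dtype("datetime64[s]")))   # ('s', 1)
print(np.datetime_data(np.dtype("datetime64[ns]")))  # ('ns', 1)
```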
Maybe two types:
datetime_calendar: for Calendar-type units (months, business days, ...)
datetime_continuous: for "linear units" (seconds, hours, ...)
or something like that?
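As a rough sketch of how that split might map onto numpy's existing units, the hypothetical `unit_family` function below classifies a dtype as "calendar" or "continuous". The two unit sets are assumptions for illustration. Business days are not a dtype unit in current numpy (they are handled by separate functions such as `np.busday_offset`), so they do not appear:

```python
import numpy as np

# Hypothetical split of numpy's datetime units into the two proposed families.
CALENDAR_UNITS = {"Y", "M"}  # length depends on where you are in the calendar
CONTINUOUS_UNITS = {"W", "D", "h", "m", "s", "ms", "us", "ns", "ps", "fs", "as"}


def unit_family(dtype):
    """Return 'calendar' or 'continuous' for a datetime64/timedelta64 dtype."""
    unit, _count = np.datetime_data(np.dtype(dtype))
    if unit in CALENDAR_UNITS:
        return "calendar"
    if unit in CONTINUOUS_UNITS:
        return "continuous"
    raise ValueError(f"unclassified unit: {unit!r}")


print(unit_family("datetime64[M]"), unit_family("timedelta64[h]"))  # calendar continuous
```

Generic code could then branch on the family and refuse mixed-family conversions instead of silently picking a month length.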
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov