From tim.peters at gmail.com Tue Sep 1 01:55:10 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 31 Aug 2015 18:55:10 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: [Alex] >> After some thought, I believe the way to fix the implementation is what I >> suggested at first: reset fold to 0 before calling utcoffset() in __hash__. >> A rare hash collision is a small price to pay for having datetimes with >> different timezones in the same dictionary. [Tim] > Ya, I can live with that. In effect, we give up on converting to UTC > correctly for purposes of computing hash(), but only in rare cases. > hash() doesn't really care, and it remains true that datetime equality > (which does care) still implies hash equality. The later and earlier > of ambiguous times will simply land on the same hash chain. Nope, you wore me out prematurely ;-) Consider datetimes dt1 and dt2 representing the earlier & later of an ambiguous time in their common zone (whatever it may be - doesn't matter). Then all fields are identical except for `fold`. Assume __hash__ forces `fold` to 0 before obtaining the UTC offset. Then we have: dt1 == dt2 hash(dt1) == hash(dt2) Fine so far as it goes. Now do: u1 = dt1.astimezone(timezone.utc) u2 = dt2.astimezone(timezone.utc) At this point we have: u1 == dt1 == dt2 == u2 and u1 < u2 hash(dt1) == hash(dt2) == hash(u1) (Parenthetically, note that despite the chain of equalities in the first of those lines, we do _not_ have u1 == u2 - transitivity fails, which is a bit of a wart by itself.) Since u1 == dt1, and hash(u1) == hash(dt1), no problem there either. But u1 isn't at all the same as u2, so hash(u2) can be the same as hash(u1) only by (unlikely) accident. hash(u2) is off in a world of its own. Therefore hash(dt2) can be the same as hash(u2) only by (the same unlikely) accident, despite that dt2 == u2. So, in all, __hash__ forcing fold=0 at the start hides the problem for ambiguous times in the same zone, but doesn't really touch the problem for cross-zone equivalent spellings of such times (not even if one of the zones is UTC, which is likely the most important case). One way to fix that is to have datetime.__hash__() _always_ return, say, 0 ;-) From alexander.belopolsky at gmail.com Tue Sep 1 02:16:29 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 31 Aug 2015 20:16:29 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: On Mon, Aug 31, 2015 at 7:55 PM, Tim Peters wrote: > > [Alex] > >> After some thought, I believe the way to fix the implementation is what I > >> suggested at first: reset fold to 0 before calling utcoffset() in __hash__. > >> A rare hash collision is a small price to pay for having datetimes with > >> different timezones in the same dictionary. > > [Tim] > > Ya, I can live with that. In effect, we give up on converting to UTC > > correctly for purposes of computing hash(), but only in rare cases. > > hash() doesn't really care, and it remains true that datetime equality > > (which does care) still implies hash equality. The later and earlier > > of ambiguous times will simply land on the same hash chain. > > Nope, you wore me out prematurely ;-) > It's getting late in my TZ, but what you are saying below sounds like a complaint that if you put t=second 01:30 as a key in the dictionary, you cannot later retrieve it by looking up t.astimezone(timezone.utc). 
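Tim's dt2/u2 pair above is exactly the "equal but differently hashed" situation that breaks dictionaries. A minimal, self-contained sketch of why (the Stamp class below is hypothetical, not datetime code - it just mimics "equality ignores fold, hashing doesn't"):

    class Stamp:
        """Hypothetical stand-in for an ambiguous local time."""
        def __init__(self, minute, fold):
            self.minute = minute
            self.fold = fold

        def __eq__(self, other):
            return self.minute == other.minute       # equality ignores fold ...

        def __hash__(self):
            return hash((self.minute, self.fold))    # ... but hashing does not

    earlier, later = Stamp(30, fold=0), Stamp(30, fold=1)
    print(earlier == later)              # True
    print(hash(earlier) == hash(later))  # almost certainly False

With the invariant broken, whether `later in {earlier: "x"}` finds the key depends on which hash bucket the probe happens to start in, so dict and set lookups can silently fail.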
Sorry, but PEP 495 has never promised you that: "instances that differ only by the value of fold will compare as equal. Applications that need to differentiate between such instances should check the value of fold or convert them to a timezone that does not have ambiguous times." Maybe if we decide to do something with the arithmetic, we will be able to fix this wart as well. > > Consider datetimes dt1 and dt2 representing the earlier & later of an > ambiguous time in their common zone (whatever it may be - doesn't > matter). Then all fields are identical except for `fold`. Assume > __hash__ forces `fold` to 0 before obtaining the UTC offset. Then we > have: > > dt1 == dt2 > hash(dt1) == hash(dt2) > > Fine so far as it goes. Now do: > > u1 = dt1.astimezone(timezone.utc) > u2 = dt2.astimezone(timezone.utc) > > At this point we have: > > u1 == dt1 == dt2 == u2 and u1 < u2 > hash(dt1) == hash(dt2) == hash(u1) > > (Parenthetically, note that despite the chain of equalities in the > first of those lines, we do _not_ have u1 == u2 - transitivity fails, > which is a bit of a wart by itself.) > > Since u1 == dt1, and hash(u1) == hash(dt1), no problem there either. > > But u1 isn't at all the same as u2, so hash(u2) can be the same as > hash(u1) only by (unlikely) accident. hash(u2) is off in a world of > its own. Therefore hash(dt2) can be the same as hash(u2) only by (the > same unlikely) accident, despite that dt2 == u2. > > So, in all, __hash__ forcing fold=0 at the start hides the problem for > ambiguous times in the same zone, but doesn't really touch the problem > for cross-zone equivalent spellings of such times (not even if one of > the zones is UTC, which is likely the most important case). > > One way to fix that is to have datetime.__hash__() _always_ return, say, 0 ;-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 1 02:25:36 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 31 Aug 2015 20:25:36 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: On Mon, Aug 31, 2015 at 8:16 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > Sorry, but PEP 495 has never promised you that: "instances that differ > only by the value of fold will compare as equal. Applications that need to > differentiate between such instances should check the value of fold or > convert them to a timezone that does not have ambiguous times." When I was writing some early drafts, I thought about advising users to use (local_datetime, local_datetime.fold) pairs as dictionary keys, but decided not to because using local_datetime.astimezone(timezone.utc) is a much better option. Now I think such advise may be relevant if a user truly has a need to sort out timestamps that come from many different timezones and for some reason wants to avoid conversion to UTC, but I don't think it will belong to the PEP. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Tue Sep 1 03:01:18 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 31 Aug 2015 20:01:18 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: [Alex] > It's getting late in my TZ, but what you are saying below sounds like a > complaint that if you put t=second 01:30 as a key in the dictionary, you > cannot later retrieve it by looking up t.astimezone(timezone.utc). I don't grasp that. 
What I am saying should be very clear: under all schemes so far, there are datetimes x and y such that x == y but hash(x) != hash(y). You see that or you don't. If you don't, I'll keep trying until you do ;-) So do you see that? > Sorry, but PEP 495 has never promised you that: "instances that differ > only by the value of fold will compare as equal. Applications that need to > differentiate between such instances should check the value of fold or > convert them to a timezone that does not have ambiguous times." Oh, come on. That's in the "Temporal Arithmetic" section: > There isn't a single instance of any kind of arithmetic in the example I gave, except for comparison, where I assumed only that comparison would behave the way the PEP _said_ it behaves. I'm not fighting the PEP here - I'm trying to illustrate a _consequence_ of what the PEP says. It's simply impossible to deduce from the paragraph above.that the fundamental invariant required for dict key types may fail. Here from the __hash__ docs: https://docs.python.org/3/reference/datamodel.html#object.__hash__ ... The only required property is that objects which compare equal have the same hash value; It's a violation of __hash__'s _only_ requirement, so even if there's no intent to fix it, the PEP needs to spell that out clearly. Code slinging dicts can fail in bizarre ways when the invariant is violated. > Maybe if we decide to do something with the arithmetic, we will be able to > fix this wart as well. Doubt it - this has nothing to do with arithmetic I can see. It's a consequence of wanting to ignore `fold` in contexts where it really does make a difference. __hash__() is one such place. Like I said at the start, it's a puzzle. From ethan at stoneleaf.us Tue Sep 1 03:12:15 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 31 Aug 2015 18:12:15 -0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: <55E4FB6F.4080504@stoneleaf.us> On 08/31/2015 04:55 PM, Tim Peters wrote: [...] > At this point we have: > > u1 == dt1 == dt2 == u2 and u1 < u2 > hash(dt1) == hash(dt2) == hash(u1) > > (Parenthetically, note that despite the chain of equalities in the > first of those lines, we do _not_ have u1 == u2 - transitivity fails, > which is a bit of a wart by itself.) At this point are there any other cases in the stdlib where transitivity fails? I was under the impression that such cases are to be considered bugs. I know it was a driving concern in the implementation of the enum module. -- ~Ethan~ From tim.peters at gmail.com Tue Sep 1 04:59:22 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 31 Aug 2015 21:59:22 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E4FB6F.4080504@stoneleaf.us> References: <55E4FB6F.4080504@stoneleaf.us> Message-ID: [Tim] >> [...] >> At this point we have: >> >> u1 == dt1 == dt2 == u2 and u1 < u2 >> hash(dt1) == hash(dt2) == hash(u1) >> >> (Parenthetically, note that despite the chain of equalities in the >> first of those lines, we do _not_ have u1 == u2 - transitivity fails, >> which is a bit of a wart by itself.) [Ethan Furman ] > At this point are there any other cases in the stdlib where transitivity > fails? I don't know. Python grew more features than I needed some time ago, so I'm not up to date. Did we ever implement the long-awaited RockScissorsPaper type? ;-) If not, there are none that I know of. > I was under the impression that such cases are to be considered > bugs. 
I know it was a driving concern in the implementation of the enum > module. Sorry, I just had to laugh at the notion that enums _could_ be implemented in such a convoluted way that there'd ever be the slightest possibility of transitivity failing ;-) Anyway, sure, they're considered bugs, unless there's some darned good reason for it. In this case, I'm not entirely sure. Having comparison ignore `fold` seems aimed at backward compatibility - but it's another case where a non-zero fold can't appear unless a user forces it to, until 495-compliant tzinfos appear (in which case .fromutc() may create fold=1 by magic). When that happens, it will seem strange that fold is ignored by comparisons. At a higher level, I'd say that a datetime with fold=1 is veritably _screaming_ "I'm no longer following the naive time model". But there are consequences too from following that intuition ... From rosuav at gmail.com Tue Sep 1 06:03:46 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 1 Sep 2015 14:03:46 +1000 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E4FB6F.4080504@stoneleaf.us> Message-ID: On Tue, Sep 1, 2015 at 12:59 PM, Tim Peters wrote: >> I was under the impression that such cases are to be considered >> bugs. I know it was a driving concern in the implementation of the enum >> module. > > Sorry, I just had to laugh at the notion that enums _could_ be > implemented in such a convoluted way that there'd ever be the > slightest possibility of transitivity failing ;-) >

Easy: you just declare that different enumerations are not comparable, but that all are comparable to their base type.

class Color(IntEnum):
    RED=1
    GREEN=2
    BLUE=3

class Permission(IntEnum):
    READ=1
    WRITE=2
    EXECUTE=3

What should Color.RED==Permission.READ give? True, because they're both 1? False, because they're completely different things? TypeError, because you can't logically compare colors and permissions? If you allow that Color.RED==1 (which you need to if it's going to be possible to backwardly-compatibly change raw numbers into an enum), then transitivity demands that the otherwise-illogical comparison above succeed, and be True. As a design decision, it could viably be taken either way, but once taken, it has to be maintained forever. ChrisA From tim.peters at gmail.com Tue Sep 1 07:23:46 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 1 Sep 2015 00:23:46 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E49D28.8030204@oddbird.net> <55E49F27.9000906@oddbird.net> Message-ID: [Alex] > I think the main difference between Tim's current proposal and what was > previously discussed is that all older proposals somehow required a third > value for fold. Note that there is a third variant suggested by Guido > off-list and discussed in the PEP: have fold=-1 by default, ignore it > unless it is nonnegative and design whatever you want for fold=0/1 without > concerns for backward compatibility. This effectively will give two > different datetime classes: classic and new. Both are perfectly consistent, > but if you think interoperation between naive and aware is confusing, try to > explain how new naive instances will interoperate with classic aware! It's worth some thought. I don't think interoperation between naive and aware now is confusing at all.
It's usually just plain forbidden; e.g.,

>>> import datetime
>>> x = datetime.datetime.now() # naive
>>> y = x.replace(tzinfo=datetime.timezone.utc) # aware
>>> x < y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    x < y
TypeError: can't compare offset-naive and offset-aware datetimes
>>> x - y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    x - y
TypeError: can't subtract offset-naive and offset-aware datetimes
>>> x == y
False

Only that last one may be surprising, but it's really just another way of saying "naive and aware don't mix, period". Do you have other kinds of interoperation in mind? Presumably a similarly high wall would be erected between fold < 0 and fold >= 0 instances. If this were pursued then, e.g., the seemingly intractable problem with __hash__() would go away (no more reason to _try_ to ignore fold >= 0), and, e.g., for an aware dt then dt.replace(fold=1) - dt.replace(fold=0) could return the expected result when dt specified an ambiguous time (ditto: no more reason to try to ignore fold==1), and likewise for comparing those values. I can see one kind of annoyance that would remain:

    dt2 = dt1 + a_timedelta

is currently specified to force dt2.fold==0 even if dt1.fold==1. But that may not make good sense. There's no way to know whether adding `a_timedelta` takes dt1 out of a fold without doing timeline arithmetic. The conceptual mess in my head is that "fold=1" screams "I'm no longer in naive time", but "fold=0" does not (where "in naive time" means classic arithmetic is appropriate, and "not in naive time" means timeline arithmetic is appropriate - while fold < 0 would be an explicit way to say "in naive time", it's unclear that "fold >= 0" should always mean "not in naive time", despite that fold=1 makes no sense in naive time). At least it's all clear now ;-) From alexander.belopolsky at gmail.com Tue Sep 1 16:34:45 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 10:34:45 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E49D28.8030204@oddbird.net> <55E49F27.9000906@oddbird.net> Message-ID: On Tue, Sep 1, 2015 at 1:23 AM, Tim Peters wrote: > >>> x == y > False > > Only that last one may be surprising, but it's really just another way > of saying "naive and aware don't mix, period". > This is a relatively recent feature. [1, 2, 3] Changed in version 3.3. [1]: http://mail.python.org/pipermail/python-dev/2012-June/119933.html [2]: http://bugs.python.org/issue15006 [3]: https://hg.python.org/cpython/rev/8272699973cb -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 1 16:41:28 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 10:41:28 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E49D28.8030204@oddbird.net> <55E49F27.9000906@oddbird.net> Message-ID: On Tue, Sep 1, 2015 at 1:23 AM, Tim Peters wrote: > I can see one kind of annoyance that would remain: > > dt2 = dt1 + a_timedelta > > is currently specified to force dt2.fold==0 even if dt1.fold==1. But > that may not make good sense. > Note that dt2.fold==0 even if dt1.fold==1 *and* a_timedelta==timedelta(0). This is what I call "fold-unaware" arithmetic. It is consistent with dt2==dt1. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From alexander.belopolsky at gmail.com Tue Sep 1 17:59:15 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 11:59:15 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: On Mon, Aug 31, 2015 at 9:01 PM, Tim Peters wrote: > [Alex] > > It's getting late in my TZ, but what you are saying below sounds like a > > complaint that if you put t=second 01:30 as a key in the dictionary, you > > cannot later retrieve it by looking up t.astimezone(timezone.utc). > > I don't grasp that. What I am saying should be very clear: under all > schemes so far, there are datetimes x and y such that x == y but > hash(x) != hash(y). You see that or you don't. If you don't, I'll > keep trying until you do ;-) So do you see that? > > I do (in the morning.) > .. > [Alex] > > > Maybe if we decide to do something with the arithmetic, we will be able > to > > fix this wart as well. > [Tim] > Doubt it - this has nothing to do with arithmetic I can see. It's a > consequence of wanting to ignore `fold` in contexts where it really > does make a difference. __hash__() is one such place. > Arithmetic and comparisons are intertwined as long as you require that not(a - b) ? a==b. The main problem for hash as I see it is that x == y may or may not call x.utcoffset() depending on the value of y. This is a problem for hash(x) which should decide whether > > Like I said at the start, it's a puzzle. > Let's formulate the puzzle: Define datetime.__hash__ so that given PEP 495 rules for datetime.__eq__, x == y implies hash(x) == hash(y). First (trivial) observation: a solution exists, e.g. hash(x) == 0. This is not a very practical solution, but shows that the puzzle is not a logical impossibility. Second observation: We cannot improve on hash(x) == 0 without some knowledge of what timezones are known to the system. Proof: let u1 < u2 be two arbitrary UTC times. We can always construct a timezone (FoldZone) where u1 and u2 map to the same local time. All we need to do is to create a fold of size u2 - u1 at some time u between u1 and u2. Let t1 = u1.astimezone(FoldZone) and t2 = u2.astimezone(FoldZone). By construction, t1 == t2, t1.fold = 0 and t2.fold = 1. If x == y implies hash(x) == hash(y), then u1 == t1 implies hash(u1) == hash(t1) and similarly u2 == t2 implies hash(u2) == hash(t2) and t1 == t2 implies hash(t1) == hash(t2). Since hash values are integers and == is transitive for integers, from a chain hash(u1) == hash(t1), hash(t1) == hash(t2), hash(t2) == hash(u2 )we conclude that hash(u1) == hash(u2) and therefore the only solution is hash(x) == const. This sounds discouraging, but note that the FoldZone that we constructed is rather unrealistic because depending on the values of u1 and u2, the size of the fold can range from microseconds to centuries. Third observation: If we have only one variable offset timezone (Local), then we can solve the problem by defining datetime.__hash__(self) as for example, hash(self.astimezone(Local).replace(fold=0) - datetime(1, 1, 1, tzinfo=Local)). Note that in the last expression, hash is taken of a timedelta object, so the definition is not circular. (A proof that x == y implies hash(x) == hash(y) in this case is left as an exercise for the reader.:-) Fourth observation: A solution for one variable offset timezone generalizes to the case of an arbitrary number of such timezones. 
A theoretical construction is to simply iterate x = x.astimezone(Zone).replace(fold=0) over all the zones known to the system, but certainly a more efficient algorithm can be devised to to achieve the same result in a single lookup in a specially crafted table. So the puzzle is not unsolvable, but how much of it has to be solved in PEP 495? I would say not much. I agree with Tim that non-transitivity of == and the violation of the hash invariant need to be mentioned in the PEP. However, since PEP 495 by itself does not introduce any new tzinfo implementations and the existing fixed offset timezones don't suffer from this problem, I think we can leave the final resolution to the timezone.local or the zoneinfo PEP. An important lesson is in the second observation. To solve the hash puzzle, we need to have a global view of the totality of timezones that will be supported by the system. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 1 18:06:36 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 12:06:36 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: On Tue, Sep 1, 2015 at 11:59 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > Third observation: If we have only one variable offset timezone (Local), > then we can solve the problem by defining datetime.__hash__(self) as for > example, hash(self.astimezone(Local).replace(fold=0) - datetime(1, 1, 1, > tzinfo=Local)). > Note that given classic arithmetic, .replace(fold=0) is redundant. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 1 18:12:53 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Sep 2015 09:12:53 -0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: I could not accept a PEP that leads to different datetime being considered == but having a different hash (*unless* due to a buggy tzinfo subclass implementation -- however no historical timezone data should ever depend on such a bug). I'm much less concerned about < being intransitive in edge cases. I also don't particularly care about == following from the difference being zero. Still, unless we're constrained by backward compatibility, I would rather not add equivalence between *any* two datetimes whose tzinfo is not the same object -- even if we can infer that they both must refer to the same instant. On Tue, Sep 1, 2015 at 8:59 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > > On Mon, Aug 31, 2015 at 9:01 PM, Tim Peters wrote: > >> [Alex] >> > It's getting late in my TZ, but what you are saying below sounds like a >> > complaint that if you put t=second 01:30 as a key in the dictionary, you >> > cannot later retrieve it by looking up t.astimezone(timezone.utc). >> >> I don't grasp that. What I am saying should be very clear: under all >> schemes so far, there are datetimes x and y such that x == y but >> hash(x) != hash(y). You see that or you don't. If you don't, I'll >> keep trying until you do ;-) So do you see that? >> >> > I do (in the morning.) > >> .. >> > [Alex] > >> >> > Maybe if we decide to do something with the arithmetic, we will be able >> to >> > fix this wart as well. >> > > [Tim] > >> Doubt it - this has nothing to do with arithmetic I can see. 
It's a >> consequence of wanting to ignore `fold` in contexts where it really >> does make a difference. __hash__() is one such place. >> > > Arithmetic and comparisons are intertwined as long as you require that > not(a - b) ? a==b. The main problem for hash as I see it is that x == y > may or may not call x.utcoffset() depending on the value of y. This is a > problem for hash(x) which should decide whether > >> >> Like I said at the start, it's a puzzle. >> > > Let's formulate the puzzle: Define datetime.__hash__ so that given PEP 495 > rules for datetime.__eq__, x == y implies hash(x) == hash(y). > > First (trivial) observation: a solution exists, e.g. hash(x) == 0. This > is not a very practical solution, but shows that the puzzle is not a > logical impossibility. > > Second observation: We cannot improve on hash(x) == 0 without some > knowledge of what timezones are known to the system. Proof: let u1 < u2 be > two arbitrary UTC times. We can always construct a timezone (FoldZone) > where u1 and u2 map to the same local time. All we need to do is to create > a fold of size u2 - u1 at some time u between u1 and u2. Let t1 = > u1.astimezone(FoldZone) > and t2 = u2.astimezone(FoldZone). By construction, t1 == t2, t1.fold = 0 > and t2.fold = 1. If x == y implies hash(x) == hash(y), then u1 == t1 > implies hash(u1) == hash(t1) and similarly u2 == t2 implies hash(u2) == > hash(t2) and t1 == t2 implies hash(t1) == hash(t2). Since hash values are > integers and == is transitive for integers, from a chain hash(u1) == > hash(t1), hash(t1) == hash(t2), hash(t2) == hash(u2 )we conclude that > hash(u1) == hash(u2) and therefore the only solution is hash(x) == const. > > This sounds discouraging, but note that the FoldZone that we constructed > is rather unrealistic because depending on the values of u1 and u2, the > size of the fold can range from microseconds to centuries. > > Third observation: If we have only one variable offset timezone (Local), > then we can solve the problem by defining datetime.__hash__(self) as for > example, hash(self.astimezone(Local).replace(fold=0) - datetime(1, 1, 1, > tzinfo=Local)). Note that in the last expression, hash is taken of a > timedelta object, so the definition is not circular. (A proof that x == y > implies hash(x) == hash(y) in this case is left as an exercise for the > reader.:-) > > Fourth observation: A solution for one variable offset timezone > generalizes to the case of an arbitrary number of such timezones. A > theoretical construction is to simply iterate x = > x.astimezone(Zone).replace(fold=0) over all the zones known to the system, > but certainly a more efficient algorithm can be devised to to achieve the > same result in a single lookup in a specially crafted table. > > So the puzzle is not unsolvable, but how much of it has to be solved in > PEP 495? I would say not much. I agree with Tim that non-transitivity of > == and the violation of the hash invariant need to be mentioned in the > PEP. However, since PEP 495 by itself does not introduce any new tzinfo > implementations and the existing fixed offset timezones don't suffer from > this problem, I think we can leave the final resolution to the > timezone.local or the zoneinfo PEP. > > An important lesson is in the second observation. To solve the hash > puzzle, we need to have a global view of the totality of timezones that > will be supported by the system. 
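To make the quoted "third observation" concrete, here is a rough sketch of the proposed hash (assuming Python 3.6-style fold support and a hypothetical variable-offset tzinfo called Local; this is one reading of the formula quoted above, not code from the PEP):

    from datetime import datetime

    def fold_insensitive_hash(dt, local_tz):
        # Pin the operand to the single variable-offset zone ("Local"),
        # discard fold, and hash the classic-arithmetic distance from an
        # arbitrary fixed reference carrying the same tzinfo.
        pinned = dt.astimezone(local_tz).replace(fold=0)
        return hash(pinned - datetime(1, 1, 1, tzinfo=local_tz))

Under the one-variable-zone assumption, two aware datetimes that compare equal land on the same timedelta here, so the dict invariant survives; the fourth observation is about lifting that assumption.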
> > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Tue Sep 1 18:36:05 2015 From: carl at oddbird.net (Carl Meyer) Date: Tue, 1 Sep 2015 10:36:05 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: <55E5D3F5.40600@oddbird.net> On 09/01/2015 10:12 AM, Guido van Rossum wrote: > I'm much less concerned about < being intransitive in edge cases. I also > don't particularly care about == following from the difference being > zero. Still, unless we're constrained by backward compatibility, I would > rather not add equivalence between *any* two datetimes whose tzinfo is > not the same object -- even if we can infer that they both must refer to > the same instant. I think the latter is certainly a backwards-compatibility requirement, since that equivalence is already very much present in the current implementation of datetime.__eq__ (well, datetime._cmp). If two datetimes have different tzinfo objects, they are converted to UTC and compared as instants. Following the same model would certainly imply that a fold=0 and fold=1 datetime that are otherwise identical should not be considered equal, because they clearly represent different instants. I guess Alex's opposition to that is the (very small) chance of backwards-incompatibility, since currently it is possible to take two non-equal UTC datetimes an hour apart at a fold, convert them to local time, and then have them compare equal (since pre PEP 495 the conversion to local time during a fold loses information). Personally I think that latter backwards-incompatibility would be a reasonable bugfix to make the existing semantics of datetime equality consistent in folds. Though I suppose it's possible someone somewhere is relying on that as a very strange way of detecting a fold? Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Tue Sep 1 18:37:05 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 12:37:05 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: On Tue, Sep 1, 2015 at 12:12 PM, Guido van Rossum wrote: > I could not accept a PEP that leads to different datetime being considered > == but having a different hash (*unless* due to a buggy tzinfo subclass > implementation -- however no historical timezone data should ever depend on > such a bug). > I agree, but my analysis demonstrates that we cannot fix hash to make an arbitrary tzinfo work. ("Arbitrary" includes tzinfos with leap microseconds and leap centuries.) We can probably come up with a good enough hash if we restrict fold sizes to multiples of 15 min up to 1 hour and locations to a hour boundaries. My preferred solution would be to delegate hash calculation to tzinfo and make it someone else's headache, but I know you don't like this solution. > I'm much less concerned about < being intransitive in edge cases. I also > don't particularly care about == following from the difference being zero. 
> I believe Tim does care about this. I did consider divorcing comparison and arithmetic, but I think that led to problems with the total ordering. Maybe we can make == differentiate between fold=0 and fold=1 at the expense of not(a > b) and not(b Still, unless we're constrained by backward compatibility, I would rather > not add equivalence between *any* two datetimes whose tzinfo is not the > same object -- even if we can infer that they both must refer to the same > instant. > Not even for fixed offset timezones? I am afraid this will break too many programs. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 1 18:58:34 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Sep 2015 09:58:34 -0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: On Tue, Sep 1, 2015 at 9:37 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Tue, Sep 1, 2015 at 12:12 PM, Guido van Rossum > wrote: > >> I could not accept a PEP that leads to different datetime being >> considered == but having a different hash (*unless* due to a buggy tzinfo >> subclass implementation -- however no historical timezone data should ever >> depend on such a bug). >> > > I agree, but my analysis demonstrates that we cannot fix hash to make an > arbitrary tzinfo work. ("Arbitrary" includes tzinfos with leap > microseconds and leap centuries.) We can probably come up with a good > enough hash if we restrict fold sizes to multiples of 15 min up to 1 hour > and locations to a hour boundaries. > That's bizarre. I suspect this came from assuming too much about how == must work. > My preferred solution would be to delegate hash calculation to tzinfo and > make it someone else's headache, but I know you don't like this solution. > > > >> I'm much less concerned about < being intransitive in edge cases. I also >> don't particularly care about == following from the difference being zero. >> > > I believe Tim does care about this. I did consider divorcing comparison > and arithmetic, but I think that led to problems with the total ordering. > Maybe we can make == differentiate between fold=0 and fold=1 at the expense > of not(a > b) and not(b I am not too hopeful. Messing with total ordering axioms is just as fatal > for binary searches as messing with hash invariants is for dictionary > lookups. > I think it's better to have some values that are neither < nor == nor > each other, than to have two values that are == but differ in hash. > Still, unless we're constrained by backward compatibility, I would rather >> not add equivalence between *any* two datetimes whose tzinfo is not the >> same object -- even if we can infer that they both must refer to the same >> instant. >> > > Not even for fixed offset timezones? I am afraid this will break too many > programs. > Oh, it looks like we currently allow < and > if the utcoffset() of both arguments are the same. I presume that's really a proxy for "both tzinfos have the same fixed offset" which we can't detect directly. But this is already pretty broken -- for tzinfos that don't have fixed offsets, the comparison succeeds if both datetimes happen to fall in a period where the offsets *are* the same. In any case, a broken total ordering doesn't bother me that much, except when the tzinfo is the same object. I wonder if we could cache the built-in fixed-offset timezone instances? (Currently a new instance is created each time you call astimezone(None).) 
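A quick illustration of that parenthetical, using only stdlib fixed-offset timezone objects (the identity check reflects typical CPython behavior, not anything the docs promise):

    from datetime import datetime, timezone

    a = datetime.now(timezone.utc).astimezone()   # attach the local fixed offset
    b = datetime.now(timezone.utc).astimezone()

    print(a.tzinfo == b.tzinfo)   # True: equal fixed offsets compare equal
    print(a.tzinfo is b.tzinfo)   # typically False: a fresh instance per call

So even without caching, code that compares such datetimes is unaffected; only identity-based checks on the tzinfo would notice.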
Does pytz reuse its fixed-offset objects? And given that we already have total ordering problems, from that perspective I could live with declaring that two datetimes that differ only in the fold are unequal. (Hm, aren't they already unequal because their utcoffset() differs?) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 1 19:00:19 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 13:00:19 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E5D3F5.40600@oddbird.net> References: <55E5D3F5.40600@oddbird.net> Message-ID: On Tue, Sep 1, 2015 at 12:36 PM, Carl Meyer wrote: > On 09/01/2015 10:12 AM, Guido van Rossum wrote: > > I'm much less concerned about < being intransitive in edge cases. I also > > don't particularly care about == following from the difference being > > zero. Still, unless we're constrained by backward compatibility, I would > > rather not add equivalence between *any* two datetimes whose tzinfo is > > not the same object -- even if we can infer that they both must refer to > > the same instant. > > I think the latter is certainly a backwards-compatibility requirement, > since that equivalence is already very much present in the current > implementation of datetime.__eq__ (well, datetime._cmp). If two > datetimes have different tzinfo objects, they are converted to UTC and > compared as instants. > > Following the same model would certainly imply that a fold=0 and fold=1 > datetime that are otherwise identical should not be considered equal, > because they clearly represent different instants. I guess Alex's > opposition to that is the (very small) chance of > backwards-incompatibility, since currently it is possible to take two > non-equal UTC datetimes an hour apart at a fold, convert them to local > time, and then have them compare equal (since pre PEP 495 the conversion > to local time during a fold loses information). Here is an idea that I think may work: let's consider fold=1 instances as if they have a different tzinfo instance from the other side in both datetime subtractions and comparisons. This will be consistent with the current stdlib and pytz work-arounds of representing "second" times using fictitious fixed-offset timezones. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Tue Sep 1 19:01:45 2015 From: carl at oddbird.net (Carl Meyer) Date: Tue, 1 Sep 2015 11:01:45 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: <55E5D9F9.4050700@oddbird.net> On 09/01/2015 11:00 AM, Alexander Belopolsky wrote: > Here is an idea that I think may work: let's consider fold=1 instances > as if they have a different tzinfo instance from the other side in both > datetime subtractions and comparisons. This will be consistent with the > current stdlib and pytz work-arounds of representing "second" times > using fictitious fixed-offset timezones. +1 Carl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Tue Sep 1 19:13:04 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 13:13:04 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: On Tue, Sep 1, 2015 at 12:58 PM, Guido van Rossum wrote: > And given that we already have total ordering problems, from that > perspective I could live with declaring that two datetimes that differ only > in the fold are unequal. (Hm, aren't they already unequal because their > utcoffset() differs?) They are not unequal because their tzinfos are the same. In this case __sub__ (and as a consequence __eq__) does not call utcoffset() to follow the rules of classic arithmetic. My new suggestion is to use timeline arithmetic whenever fold=1 datetime instance is involved. This should not break any programs that don't encounter fold=1 instances and in effect will make fold=1 instances behave similar to how their timezone.utc equivalents behave now. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Tue Sep 1 19:26:51 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 1 Sep 2015 12:26:51 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: [Guido] > I could not accept a PEP that leads to different datetime being considered > == but having a different hash (*unless* due to a buggy tzinfo subclass > implementation -- however no historical timezone data should ever depend on > such a bug). > > I'm much less concerned about < being intransitive in edge cases. Offhand I don't know whether it can be (probably). The case I stumbled into yesterday showed that equality ("==") could be intransitive: assert a == b == c == d and a < d While initially jarring, I called it a "minor wart", because the middle "==" there is working in classic arithmetic but the other two are working in timeline arithmetic. But _a_ wart all the same, since transitivity doesn't fail today. > I also don't particularly care about == following from the difference being zero. > Still, unless we're constrained by backward compatibility, I would rather > not add equivalence between *any* two datetimes whose tzinfo is not the same > object -- even if we can infer that they both must refer to the same > instant. Assuming "equivalent" means "compare equal", we're highly constrained. For datetimes x and y with distinct non-None tzinfos, it's always been the case that: 1. x-y effectively converted both to UTC before subtraction. 2. comparison effectively interpreted x-y as a __cmp__ result 2a. various comparison transitivities essentially followed from that 3. Because of #2, to maintain __hash__'s contract datetime.__hash__ also effectively converted to UTC before hashing All of that would (well, "should") continue to work fine, except that fold=1 is being ignored in intrazone arithmetic (subtraction and comparisons) and by hash(). Maybe there are other surprises. I just happened to notice the hash() problem, and equality intransitivity, both yesterday. via thought experiments. On the face of it, it's a conceptual mess to try to make fold=1 "mean something" in some contexts but not in others. In particular, arithmetic, comparison, and hashing are usually deeply interrelated, and have been in datetime so far. 
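The numbered points above are easy to see with nothing but fixed-offset zones, whose behavior is not in dispute here; a small sketch:

    from datetime import datetime, timedelta, timezone

    est = timezone(timedelta(hours=-5), "EST")
    cet = timezone(timedelta(hours=+1), "CET")

    x = datetime(2015, 9, 1, 7, 0, tzinfo=est)    # 12:00 UTC
    y = datetime(2015, 9, 1, 13, 0, tzinfo=cet)   # also 12:00 UTC

    print(x - y)                 # 0:00:00 -- interzone subtraction works in UTC
    print(x == y)                # True    -- comparison agrees with the subtraction
    print(hash(x) == hash(y))    # True    -- so hash() must, and does, follow suit

The quandary is precisely that a 495-style ambiguous time has two defensible UTC conversions, so there is no longer one obvious value for that last line to be computed from.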
Ignoring `fold` in single-zone arithmetic, comparisons and hashing works fine (in "naive time", where `fold` is senseless), but when going across zones `fold` cannot be ignored. That's a huge problem for hash(), because it can have no idea whether the pattern of later equality comparisons relying on hash results _will_ be using classic or timeline rules (or a mix of both). That didn't matter before, because _a_ unique UTC equivalent always existed (the possibility of ambiguous times was effectively ignored). Now it does matter, because the UTC equivalent can differ depending on the `fold` value. Ignoring it sometimes but not others leads to the current quandary. From tim.peters at gmail.com Tue Sep 1 19:35:23 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 1 Sep 2015 12:35:23 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E49D28.8030204@oddbird.net> <55E49F27.9000906@oddbird.net> Message-ID: [Tim] >> I can see one kind of annoyance that would remain: >> >> dt2 = dt1 + a_timedelta >> >> is currently specified to force dt2.fold==0 even if dt1.fold==1. But >> that may not make good sense. [Alex] > Note that dt2.fold==0 even if dt1.fold==1 *and* a_timedelta==timedelta(0). Yup. > This is what I call "fold-unaware" arithmetic. It is consistent with > dt2==dt1. Heh - setting dt2.fold = random.randrange(2) would also be consistent with dt2 == dt1. That is, "==" ignores both `fold`s entirely in this case. From tim.peters at gmail.com Tue Sep 1 19:44:47 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 1 Sep 2015 12:44:47 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: [Alex] > Here is an idea that I think may work: let's consider fold=1 instances as if > they have a different tzinfo instance from the other side in both datetime > subtractions and comparisons. This will be consistent with the current > stdlib and pytz work-arounds of representing "second" times using fictitious > fixed-offset timezones. That's what I was getting at by saying "fold=1 veritably _screams_ 'I'm no longer working in naive time'". Which implies "I need timeline arithmetic", and everything else follows from that, including hash() not ignoring fold=1 either. But then the concept of "naive time" gets muddier: sometimes, e.g., dt1 - dt2 in a common zone (same tzinfo) will use classic arithmetic, but in other cases (fold=1 in at least one) timeline arithmetic. And there's also that, after d = dt1 - dt2 I suspect it may no longer always be the case that dt1 == dt2 + d (unsure, but can't make time for it now) From alexander.belopolsky at gmail.com Tue Sep 1 20:00:31 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 14:00:31 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: On Tue, Sep 1, 2015 at 1:44 PM, Tim Peters wrote: > [Alex] > > Here is an idea that I think may work: let's consider fold=1 instances > as if > > they have a different tzinfo instance from the other side in both > datetime > > subtractions and comparisons. This will be consistent with the current > > stdlib and pytz work-arounds of representing "second" times using > fictitious > > fixed-offset timezones. > > That's what I was getting at by saying "fold=1 veritably _screams_ > 'I'm no longer working in naive time'". 
Which implies "I need > timeline arithmetic", and everything else follows from that, including > hash() not ignoring fold=1 either. > > But then the concept of "naive time" gets muddier: sometimes, e.g., > > dt1 - dt2 > > in a common zone (same tzinfo) will use classic arithmetic, but in > other cases (fold=1 in at least one) timeline arithmetic. > I don't think this is a problem as long as we disallow mixing naive and aware instances in arithmetic and ordering and keep naive ? aware always rule. > > And there's also that, after > > d = dt1 - dt2 > > I suspect it may no longer always be the case that > > dt1 == dt2 + d > > (unsure, but can't make time for it now) > That's the price we pay for classic arithmetic anyways. I am not even sure we want to trigger timeline arithmetics in dt + delta expressions when dt.fold=1. If you do, dt - hour + hour will still not take you back because the seconds + hour will be classic. I don't think we can ever get rid of all paradoxes here. Once you let your time go back, all bets are off. What we can do is to shift them from one place to another so that you only see odd behavior when a fold=1 instance is involved. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 1 21:03:59 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Sep 2015 12:03:59 -0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: As a point of order, I don't have time today (nor probably this week) to keep up with this discussion. :-( -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 1 21:50:41 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 15:50:41 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: On Tue, Sep 1, 2015 at 2:00 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > And there's also that, after >> >> d = dt1 - dt2 >> >> I suspect it may no longer always be the case that >> >> dt1 == dt2 + d >> >> (unsure, but can't make time for it now) >> > > That's the price we pay for classic arithmetic anyways. > Let me clarify what I mean by that: >>> from datetime import * >>> exec(open("Doc/includes/tzinfo-examples.py").read()) >>> t1 = datetime(2015, 10, 31, 12, tzinfo=Eastern) >>> t2 = datetime(2015, 11, 1, 12, tzinfo=Eastern) >>> u = datetime(2000, 1, 1, tzinfo=timezone.utc) >>> (t1 - u) - (t2 - u) == t2 - t1 False This is a fundamental property of classic arithmetic and the only way to prevent something like this from happening is (as Guido mentioned previously) to disallow cross-zone arithmetic. This would be quite justifiable from the relativity POV: whether or not two events occur simultaneously at two different places depends on the speed of the observer. This fact will be important when ordinary computers get clocks with nanosecond precision. Meanwhile, our governments let us enjoy the effects relativity and time travel twice a year at pedestrian speeds: if you add a day in New York, go to Paris, subtract a day there and come back to New York you may not find yourself at the same time. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Tue Sep 1 21:55:22 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 1 Sep 2015 15:55:22 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: On Tue, Sep 1, 2015 at 3:50 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > >>> (t1 - u) - (t2 - u) == t2 - t1 > False > I messed up the order. the above should have been >>> (t1 - u) - (t2 - u) == t1 - t2 False -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Tue Sep 1 23:56:23 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 1 Sep 2015 16:56:23 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: [Alex] >>> Here is an idea that I think may work: let's consider fold=1 instances >>> as if they have a different tzinfo instance from the other side in both >>> datetime subtractions and comparisons. This will be consistent with >>> the current stdlib and pytz work-arounds of representing "second" >>> times using fictitious fixed-offset timezones. [Tim] >> That's what I was getting at by saying "fold=1 veritably _screams_ >> 'I'm no longer working in naive time'". Which implies "I need >> timeline arithmetic", and everything else follows from that, including >> hash() not ignoring fold=1 either. >> >> But then the concept of "naive time" gets muddier: sometimes, e.g., >> >> dt1 - dt2 >> >> in a common zone (same tzinfo) will use classic arithmetic, but in >> other cases (fold=1 in at least one) timeline arithmetic. [Alex] > I don't think this is a problem as long as we disallow mixing naive and > aware instances in arithmetic and ordering and keep naive ? aware always > rule. "Concept gets muddier" isn't about the code, it's about the concept getting muddier ;-) That is, the number of brain cells needed for a human to grasp the model, and the number of words in the docs needed to explain it all. Paying attention to fold=1 in naive time does muddy the naive-time concept. A little. But it should hardly ever matter: even using a 495 tzinfo, there is nothing a user working _in_ naive time can do to see a fold=1 value. They have to force it by hand, or use an operation _outside_ of naive time (like .astimezone()) to get one. Doesn't really bother me. >> And there's also that, after >> >> d = dt1 - dt2 >> >> I suspect it may no longer always be the case that >> >> dt1 == dt2 + d >> >> (unsure, but can't make time for it now) > That's the price we pay for classic arithmetic anyways. Not so. Classic arithmetic obeys all the same friendly identities as do, e.g., timedelta and integer arithmetic. You gave an example in a later message, but that didn't stick to classic arithmetic. As soon as you mixed timezones, you went outside of naive time, and timeline arithmetic was used in the instances of cross-zone subtraction. Of course the classic arithmetic identities won't (can't always) apply to a _mix_ of classic and timeline arithmetic. The proposed behavior will be the first time timeline arithmetic can be used sticking to what sure looks like "naive time" operations (staying within a single zone). It's the invisible fold=1 in this case that says "not in naive time - I really want timeline arithmetic". I have little problem with that. I'm just not going to pretend it isn't _a_ change, or not _a_ muddying. 
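For concreteness, the "friendly identities" of classic arithmetic are easy to check with plain naive datetimes - no tzinfo and no fold in play; this only illustrates the status quo, not any proposed rule:

    from datetime import datetime, timedelta

    dt1 = datetime(2015, 11, 1, 1, 30)     # a wall-clock reading
    dt2 = datetime(2015, 10, 31, 12, 0)    # another one, nominally a day earlier

    d = dt1 - dt2
    print(dt2 + d == dt1)                                          # True
    print((dt1 + timedelta(hours=1)) - timedelta(hours=1) == dt1)  # True

It is only when an operation reaches outside naive time - a cross-zone subtraction, or (under the proposal) a fold=1 operand - that these identities stop being guaranteed.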
> I am not even sure we want to trigger timeline arithmetics in dt + delta expressions > when dt.fold=1. I am ;--) Leaving aside that there's no sane reason to refuse to believe a datetime _means_ fold=1 when we see it, haven't we had enough of "unintended consequences" from trying to ignore it in other contexts? And carving out an exception for "oh - except fold is ignored in datetime + timedelta, and datetime - timedelta" would be another muddying of the newly-muddied model. If there's isn't a solid reason in favor of ignoring it, that would be a gratuitous muddying. > If you do, dt - hour + hour will still not take you back because > the seconds + hour will be classic. > > I don't think we can ever get rid of all paradoxes here. Once you let your > time go back, all bets are off. What we can do is to shift them from one > place to another so that you only see odd behavior when a fold=1 instance is > involved. I agree. Where I go beyond is that they should _always_ see potentially odd (to naive-time eyes) behavior when fold=1. That's understandable. "Sometime yes, sometimes no" is unexplainable beyond exhaustive listing of the "sometimes yes" and "sometimes no" cases. Unless there's a strong reason for the distinction. From tim.peters at gmail.com Wed Sep 2 03:41:05 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 1 Sep 2015 20:41:05 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: [Guido] > As a point of order, I don't have time today (nor probably this week) to > keep up with this discussion. :-( So. short & sweet, the higher-order bit of the hash problem is easy enough to sketch. Suppose x and y represent the earlier and later of an ambiguous time in their common zone. All fields are identical except for `fold`. If intrazone comparison ignores `fold`, then x == y is true. Implying their hashes must be equal. Implying that (any non-insanely-convoluted) hash() must also ignore `fold`, to get the same UTC offset for both. All fine so far. But screws up when x and y are (for example) converted to their _real_ UTC equivalents, ux and uy. Those _aren't_ equal. hash(x) == hash(y) == hash(ux) then, but hash(uy) is almost certainly different. But y == uy is true, so we're left with two equal datetimes whose hashes are almost certainly different. Note "y == uy is true" must be so for backward compatibility (interzone comparisons have always been supported). The high-order bit of the proposed solution (to this,and to the loss of total ordering, and ..) is to stop ignoring fold=1. End of problems. Start of other problems. For why the latter are thought (so far) to be infinitely easier to live with, you would have to follow the discussion. By the time you do, there will be no problems left - or at least none we'll admit to ;-) From guido at python.org Wed Sep 2 04:30:56 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Sep 2015 19:30:56 -0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: On Tue, Sep 1, 2015 at 6:41 PM, Tim Peters wrote: > [Guido] > > As a point of order, I don't have time today (nor probably this week) to > > keep up with this discussion. :-( > > So. short & sweet, the higher-order bit of the hash problem is easy > enough to sketch. Suppose x and y represent the earlier and later of > an ambiguous time in their common zone. All fields are identical > except for `fold`. 
> > If intrazone comparison ignores `fold`, then x == y is true. Implying > their hashes must be equal. Implying that (any > non-insanely-convoluted) hash() must also ignore `fold`, to get the > same UTC offset for both. All fine so far. > > But screws up when x and y are (for example) converted to their _real_ > UTC equivalents, ux and uy. Those _aren't_ equal. hash(x) == hash(y) > == hash(ux) then, but hash(uy) is almost certainly different. But y > == uy is true, so we're left with two equal datetimes whose hashes are > almost certainly different. Note "y == uy is true" must be so for > backward compatibility (interzone comparisons have always been > supported). > Ah, now I understand why someone in desperation proposed to do make some kind of assumption about the size of DST offsets. > The high-order bit of the proposed solution (to this,and to the loss > of total ordering, and ..) is to stop ignoring fold=1. End of > problems. > > Start of other problems. For why the latter are thought (so far) to > be infinitely easier to live with, you would have to follow the > discussion. By the time you do, there will be no problems left - or > at least none we'll admit to ;-) > OK, looks like the PEP has some evolving to do! -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartbishop.net Wed Sep 2 09:42:07 2015 From: stuart at stuartbishop.net (Stuart Bishop) Date: Wed, 2 Sep 2015 14:42:07 +0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: On 1 September 2015 at 04:42, Alexander Belopolsky wrote: > This forum may not be inclusive enough for this. People in this group know > too much! Not all of us. I claim ignorance from not being able to follow this complete thread :) My naive assumptions would be that dt1 == dt2 implies that dt1.utctimetuple() == dt2.utctimetuple(). Which means the hash implementation can just be hash(dt.utctimetuple()). datetime.utctimetuple() already defines dst flag munging, which seems very similar to the fold munging suggestions I skimmed past. -- Stuart Bishop http://www.stuartbishop.net/ From tim.peters at gmail.com Wed Sep 2 18:33:59 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 2 Sep 2015 11:33:59 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: [Alex] >> This forum may not be inclusive enough for this. People in this group know >> too much! [Stuart] > Not all of us. I claim ignorance from not being able to follow this > complete thread :) Despite the Subject line, it's mostly been about consequences of PEP 495 ignoring `fold` altogether in some contexts, so as to have no visible effect whatsoever in "naive time" (even for a datetime with a zone). Your brain cells have worked in the opposite direction so far, to fight "naive time" tooth & nail inside pytz for aware datetimes. > My naive assumptions would be that dt1 == dt2 implies that > dt1.utctimetuple() == dt2.utctimetuple(). Yup! Which is another reasonable expectation that could fail under the current 495, when dt1 and dt2 share a zone. If dt1 and dt2 are the earlier and later of an ambiguous time in a common zone, they differ only in their `fold` value. Under 495, dt1 == dt2 would be true anyway, but anything related to zone _conversion_ would see the difference. So .utctimetuple() would differ. At a more basic level, utcoffset() would also differ. The proposed solution is to "simply" stop ignoring fold=1. 
Then dt1 != dt2 from the start, so no reasonable expectations are violated. Except for someone working in naive time who somehow manages to force `fold` to 1 anyway. They may be surprised to see dt1 != dt2 in the case above. But only the first time they see it ;-) > Which means the hash implementation can just be hash(dt.utctimetuple()). Yup, it could be, provided 495 is changed to stop ignoring fold=1 for intrazone comparisons. It isn't (and won't be), because that would be a poorer-quality hash implementation (nothing about the current __hash__ should need to change): - .utctimetuple() throws away dt.microsecond, so hash() would produce massive collisions in some cases of regular inputs. - There are faster ways of getting the effect of converting to UTC (including microseconds). The actual implementation isn't documented, because it doesn't need to be ;-) From alexander.belopolsky at gmail.com Wed Sep 2 18:54:56 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 2 Sep 2015 12:54:56 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: On Tue, Sep 1, 2015 at 5:56 PM, Tim Peters wrote: > Paying attention to fold=1 in naive time does muddy the naive-time > concept. A little. But it should hardly ever matter: even using a > 495 tzinfo, there is nothing a user working _in_ naive time can do to > see a fold=1 value. They have to force it by hand, or use an > operation _outside_ of naive time (like .astimezone()) to get one. > There are two more cases: (1) datetime.now() will return fold=1 instances during one hour each year; (2) datetime.fromtimestamp(s) will return fold=1 instances for some values of s. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Wed Sep 2 19:20:14 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 2 Sep 2015 12:20:14 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: [Tim] >> Paying attention to fold=1 in naive time does muddy the naive-time >> concept. A little. But it should hardly ever matter: even using a >> 495 tzinfo, there is nothing a user working _in_ naive time can do to >> see a fold=1 value. They have to force it by hand, or use an >> operation _outside_ of naive time (like .astimezone()) to get one. [Alex] > There are two more cases: > > (1) datetime.now() will return fold=1 instances during one hour each year; > (2) datetime.fromtimestamp(s) will return fold=1 instances for some values > of s. Sure - but anything reflecting how a local clock actually behaves is outside of "naive time". Clocks in naive time never jump forward or backward. Specifically, .now() and .fromtimestamp() are also operations outside of naive time. It might, of course, have helped had the docs said a word about any of this ;-) From alexander.belopolsky at gmail.com Wed Sep 2 19:40:15 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 2 Sep 2015 13:40:15 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: On Wed, Sep 2, 2015 at 1:20 PM, Tim Peters wrote: > [Alex] > > There are two more cases: > > > > (1) datetime.now() will return fold=1 instances during one hour each > year; > > (2) datetime.fromtimestamp(s) will return fold=1 instances for some > values > > of s. 
> > Sure - but anything reflecting how a local clock actually behaves is > outside of "naive time". Clocks in naive time never jump forward or > backward. Specifically, .now() and .fromtimestamp() are also > operations outside of naive time. > I agree, but the worst thing we can do to our users is to plant a time bomb that will go off once a year. Suppose someone has a program that uses naive local times and relies on t < prev_t test to detect the fall-back fold and do something about it. If we don't ignore fold in naive datetime comparisons - this program will start producing incorrect results. Fortunately, we don't need to do anything about naive times. The hash invariant is only violated by aware instances. I think what you are really fighting against is the notion that for regular times, fold=1 is just an alternative spelling for fold=0 times. It looks like you would rather see fold=1 as some different (and invalid) time. Think of the German A and B hours: are regular hours A or B? The German standard say that they are neither, but PEP 495 say that they are both: 2A is the same as 2B unless "2" in the fold and that allows you not to display A/B in those cases. Folds do not exist in naive time, so all times are regular and therefore time(h, m, s, us, fold=0) == time(h, m, s, us, fold=1) always. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Wed Sep 2 21:59:03 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 2 Sep 2015 14:59:03 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: [Alex] >>> There are two more cases: >>> >>> (1) datetime.now() will return fold=1 instances during one hour each >>> year; >>> (2) datetime.fromtimestamp(s) will return fold=1 instances for some >>> values of s. [Tim] >> Sure - but anything reflecting how a local clock actually behaves is >> outside of "naive time". Clocks in naive time never jump forward or >> backward. Specifically, .now() and .fromtimestamp() are also >> operations outside of naive time. [Alex] > I agree, but the worst thing we can do to our users is to plant a time bomb > that will go off once a year. Suppose someone has a program that uses naive > local times and relies on t < prev_t test to detect the fall-back fold and > do something about it. If we don't ignore fold in naive datetime > comparisons - this program will start producing incorrect results. Yes, but I believe it's worse: that it's impossible for PEP 495 to be wholly backward compatible regardless of whether intrazone comparison ignores `fold`. It's not just "stare at one line of code" that counts for compatibility, breaking former invariants also counts. Like Stewart mentioned just before, anyone in their right mind ;-) _implicitly_ assumed all along that x == y implies x.utctimetuple() == y.utctimetuple() and, indeed, x.astimezone(SOMETZINFO) == y.astimezone(SOMETZINFO) too for any value of SOMETZINFO. PEP 495's original form breaks those (among others) - it's not credible to claim that no existing code could possibly be relying on those (or relying on total datetime ordering, etc). That may not be reflected in any single line of code, but only in what code _didn't_ do to worm around "a problem" it reasonably - perhaps not even consciously - assumed could never happen. The only way I see to be wholly backward compatible is to default to fold = -1, where fold < 0 is wholly ignored by everything, always. 
That's the only way to be sure no code breaks, because no behaviors whatsoever change, in any context, except possibly for the datetime.__repr_() string produced. Not just in single lines of code, but no invariants break either. But that also means .now() and .fromtimestamp() and .fromutc() must set set fold = -1, lest a fold=1 sneak in (your "time bomb once a year" scenario). Then we either need different fold-aware versions of all such functions, or new optional foldaware=False arguments on all such functions. But then it's so annoying and error-prone to use, who would bother? Whoever responds with "global flag" will be shot ;-) > Fortunately, we don't need to do anything about naive times. The hash > invariant is only violated by aware instances. Proving yet again that naive time is the only way to go ;-) > I think what you are really fighting against is the notion that for regular > times, fold=1 is just an alternative spelling for fold=0 times. It looks > like you would rather see fold=1 as some different (and invalid) time. In naive time, `fold=1` is simply senseless. It "should be" ignored in naive time. But there is no wall between "naive time" and "timeline time" in datetime's design - indeed, there is no _explicit_ way to say which you have in mind. Something has to give, because an aware datetime can be _viewed_ as being either in naive time or as in timeline time. That's in the programmer's head. Since fold=1 makes no sense in naive time, the sanest thing is to take it as meaning the datetime can _only_ be viewed as being in timeline time. We already know that solves a world of problems. But it will create others. Alas, best I can see, nothing short of fold < 0 can create _no_ problems (except for making it all kinds of pain to get fold-aware behaviors instead). > Think of the German A and B hours: are regular hours A or B? The German > standard say that they are neither, but PEP 495 say that they are both: > 2A is the same as 2B unless "2" in the fold and that allows you not to display > A/B in those cases. I'm not sure appealing to German A and B hours really clarifies it ;-) > Folds do not exist in naive time, so all times are regular and therefore > time(h, m, s, us, fold=0) == time(h, m, s, us, fold=1) always. As above, we can have no real idea whether the programmer _intends_ that an aware datetime lives in naive time or timeline time. fold=1 screams "timeline". From carl at oddbird.net Wed Sep 2 22:26:10 2015 From: carl at oddbird.net (Carl Meyer) Date: Wed, 2 Sep 2015 14:26:10 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> Message-ID: <55E75B62.2060905@oddbird.net> On 09/02/2015 01:59 PM, Tim Peters wrote: [snip] > Yes, but I believe it's worse: that it's impossible for PEP 495 to be > wholly backward compatible regardless of whether intrazone comparison > ignores `fold`. It's not just "stare at one line of code" that counts > for compatibility, breaking former invariants also counts. Like > Stewart mentioned just before, anyone in their right mind ;-) > _implicitly_ assumed all along that > > x == y > > implies > > x.utctimetuple() == y.utctimetuple() > > and, indeed, > > x.astimezone(SOMETZINFO) == y.astimezone(SOMETZINFO) > > too for any value of SOMETZINFO. > > PEP 495's original form breaks those (among others) - it's not > credible to claim that no existing code could possibly be relying on > those (or relying on total datetime ordering, etc). 
That may not be > reflected in any single line of code, but only in what code _didn't_ > do to worm around "a problem" it reasonably - perhaps not even > consciously - assumed could never happen. > > The only way I see to be wholly backward compatible is to default to > fold = -1, [...] > > In naive time, `fold=1` is simply senseless. It "should be" ignored > in naive time. But there is no wall between "naive time" and > "timeline time" in datetime's design - indeed, there is no _explicit_ > way to say which you have in mind. Something has to give, because an > aware datetime can be _viewed_ as being either in naive time or as in > timeline time. That's in the programmer's head. Since fold=1 makes > no sense in naive time, the sanest thing is to take it as meaning the > datetime can _only_ be viewed as being in timeline time. We already > know that solves a world of problems. Totally in agreement with everything above. To summarize: trying to disambiguate folds leads to contradiction if the implementation doesn't fully accept a "timeline" view of tz-aware datetimes, because in a "naive" view, the two overlapping times in a fold are the _same time_. The very idea of disambiguation itself is a "timeline view" concept; it's not consistent with naive time. > But it will create others. Can we enumerate the specific problems this would create? Let's hypothesize the following proposal: * As discussed in earlier threads, datetime is taught to respect a new `strict` flag on tzinfo objects, treating aware datetimes as fully in "timeline time," including for arithmetic, (only) if it is set. If it is not set, no behavior changes from what we have today. * The `fold` flag is respected in any way (and ever set to anything other than -1 by built-in methods) _only_ if the attached tzinfo has `strict=True`. Now what problems would this cause? * Backwards compatibility is not a problem. There are no tzinfo classes currently in existence with `strict=True`. * All of PEP 495's problems with hashes, equality, and ordering that have been discussed in this thread are solved; `fold` is entirely unused with non-strict tzinfo, and entirely consistent with strict tzinfo. * Ability to work with timezone-annotated datetimes (I can't say "timezone-aware" with a straight face for datetimes that operate in naive time) in naive time, which is a use case that some people have, is preserved; just use a tzinfo with `strict=False`. * Working with a "timeline view" of tz-aware datetimes (which is also a valid use case that some people have) becomes much simpler than it is today; much simpler even than with pytz. It looks like all wins to me. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Thu Sep 3 00:26:41 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 2 Sep 2015 17:26:41 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E75B62.2060905@oddbird.net> References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> Message-ID: [Carl Meyer ] > ... > To summarize: trying to disambiguate folds leads to contradiction if the > implementation doesn't fully accept a "timeline" view of tz-aware > datetimes, because in a "naive" view, the two overlapping times in a > fold are the _same time_. The very idea of disambiguation itself is a > "timeline view" concept; it's not consistent with naive time. Fun, isn't it? 
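In code, Carl's point is just this (again sketched with the fold/zoneinfo
spelling from a later Python, plus a made-up fixed-offset EST zone - none
of it normative):

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo  # illustrative spelling only

    # Three ways to pin down the *second* 1:30 of 2015-11-01 in US/Eastern.
    eastern = ZoneInfo("America/New_York")
    as_fold = datetime(2015, 11, 1, 1, 30, fold=1, tzinfo=eastern)
    as_est = datetime(2015, 11, 1, 1, 30, tzinfo=timezone(timedelta(hours=-5), "EST"))
    as_utc = datetime(2015, 11, 1, 6, 30, tzinfo=timezone.utc)

    # They all name the same instant - but saying so is a timeline statement:
    print(as_fold.timestamp() == as_est.timestamp() == as_utc.timestamp())  # True
    # while naive time can't even tell the fold=1 spelling from fold=0:
    print(as_fold == as_fold.replace(fold=0))                               # True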
[Tim] >> But it will create others. > Can we enumerate the specific problems this would create? That use of "we" appears to mean "anyone but Carl" ;-) The problems it could create depend on the contexts in which the PEP says fold would not be ignored. While nobody has mentioned it, it _could_ be that someone working in naive time would be annoyed even by dt.utcoffset() returning different results depending on `fold`. While it's a goal of the PEP to _make_ them differ in some cases, that in itself isn't wholly backward compatible in all conceivable cases. Then why would the use a 495-compliant tzinfo to begin with? Because they'er a user, and they don't understand any of this stuff ;-) > Let's hypothesize the following proposal: > > * As discussed in earlier threads, datetime is taught to respect a new > `strict` flag on tzinfo objects, treating aware datetimes as fully in > "timeline time," including for arithmetic, (only) if it is set. If it is > not set, no behavior changes from what we have today. Why conflate this with arithmetic? It's. e.g., quite possible someone wants correct interzone conversion in all cases without getting sucked into way-slower arithmetic too. For the purposes of 495, I'm going to pretend that using fold is controlled by the presence of a new tzinfo __fold__ attribute (we can't use a flag, because _existing_ tzinfos don't already have it). Arithmetic is a different issue. Presumably a `strict` tzinfo would be required to say "fold-aware" too, but also say more than just that. > * The `fold` flag is respected in any way (and ever set to anything > other than -1 by built-in methods) _only_ if the attached tzinfo has > `strict=True`. Since there's now a way to spell "ignore fold" versus "respect fold", there's no longer any point to fold < 0. "Ignore fold" is now the default, and "respect fold" has to be explicitly requested. For simplicity, any function that knows how to set fold correctly should be _allowed_ to do so regardless. > Now what problems would this cause? > > * Backwards compatibility is not a problem. There are no tzinfo classes > currently in existence with `strict=True`. It does appear to be wholly backward compatible, and that would be great. > * All of PEP 495's problems with hashes, equality, and ordering that > have been discussed in this thread are solved; `fold` is entirely unused > with non-strict tzinfo, and entirely consistent with strict tzinfo. There are still questions, like, e.g., what fold_aware_datetime + timedelta should do when fold=1, but only in my variation of what you proposed. You proposed mixing "pay attention to fold" with "timeline arithmetic", which leaves no choice. Alex and I seem to disagree about what to do when "only pay attention to fold" is meant instead. I think it makes a difference now that they're explicitly asking to respect fold - but I'm not yet sure _what_ difference it makes ;-) > * Ability to work with timezone-annotated datetimes (I can't say > "timezone-aware" with a straight face for datetimes that operate in > naive time) in naive time, which is a use case that some people have, is > preserved; just use a tzinfo with `strict=False`. "timezone-annotated" is a winner! LOL - what a frickin ' mess ;-) > * Working with a "timeline view" of tz-aware datetimes (which is also a > valid use case that some people have) becomes much simpler than it is > today; much simpler even than with pytz. I'm still to keen to push timeline arithmetic off to a later PEP. It doesn't have to be addressed to solve 495's problems. 
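And deferring it leaves nobody stranded: anyone who wants duration-style
results can already get them by hopping to UTC, doing plain (and fast)
classic arithmetic there, and hopping back for display.  A sketch, with
dates picked to straddle the 2015 US spring-forward and a zoneinfo-style
zone used only to have something concrete to point at:

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo  # illustrative spelling only

    zone = ZoneInfo("America/New_York")
    bedtime = datetime(2015, 3, 7, 23, 30, tzinfo=zone)   # 11:30pm, night of spring-forward

    # Classic arithmetic moves the local clock hands by 8 hours:
    print(bedtime + timedelta(hours=8))
    # 2015-03-08 07:30:00-04:00  - only 7 real hours after bedtime

    # Timeline arithmetic, spelled as "do it in UTC":
    utc = timezone.utc
    print((bedtime.astimezone(utc) + timedelta(hours=8)).astimezone(zone))
    # 2015-03-08 08:30:00-04:00  - a true 8 hours later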
> It looks like all wins to me. Good food for thought. Thanks! From carl at oddbird.net Thu Sep 3 01:01:32 2015 From: carl at oddbird.net (Carl Meyer) Date: Wed, 2 Sep 2015 17:01:32 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> Message-ID: <55E77FCC.9040507@oddbird.net> [Tim] > [Carl Meyer ] >> ... >> To summarize: trying to disambiguate folds leads to contradiction if the >> implementation doesn't fully accept a "timeline" view of tz-aware >> datetimes, because in a "naive" view, the two overlapping times in a >> fold are the _same time_. The very idea of disambiguation itself is a >> "timeline view" concept; it's not consistent with naive time. > > Fun, isn't it? I think the number of people for whom this qualifies as "fun" approaches the number of people who have ever implemented a tzinfo ;) > [Tim] >>> But it will create others. > >> Can we enumerate the specific problems this would create? > > That use of "we" appears to mean "anyone but Carl" ;-) Right - I mis-read the referent of "it" above. You were talking about the proposal to make fold=1 (only) force "timeline view". I understand the problems that causes. I mis-read and thought you were suggesting the possibility of "use timeline view always," and saying _that_ "creates other problems." So I was trying to think of what problems those would be, and not thinking of any. [Carl] >> Let's hypothesize the following proposal: >> >> * As discussed in earlier threads, datetime is taught to respect a new >> `strict` flag on tzinfo objects, treating aware datetimes as fully in >> "timeline time," including for arithmetic, (only) if it is set. If it is >> not set, no behavior changes from what we have today. [Tim] > Why conflate this with arithmetic? It's. e.g., quite possible someone > wants correct interzone conversion in all cases without getting sucked > into way-slower arithmetic too. One reason to conflate with arithmetic is to limit the number of mental models people have to comprehend. If we conflate, there would be two models: "naive model" and "timeline model", and the choice between them would be controlled by one flag. I think that's already more than enough complexity for most people, but it's simplicity itself compared to the possibility that we could end up with three models: "naive model", "timeline model for conversions but still naive for arithmetic", and "timeline model". ISTM the second is too confusing and inconsistent in its view of the world to be featured as a primary mode; if someone really needs it, it'd be easy enough to write functions to do fast naive arithmetic on strict-aware datetimes (strip the tzinfo, then add it back). (The write-your-own-function argument can go both ways! ;) > For the purposes of 495, I'm going to > pretend that using fold is controlled by the presence of a new tzinfo > __fold__ attribute (we can't use a flag, because _existing_ tzinfos > don't already have it). As an API choice I think "boolean flag with default if not present" is preferable to "mere existence of an attribute causes a switch in behavior, regardless of its value." But this is definitely a low-order bit here. >> * The `fold` flag is respected in any way (and ever set to anything >> other than -1 by built-in methods) _only_ if the attached tzinfo has >> `strict=True`. > > Since there's now a way to spell "ignore fold" versus "respect fold", > there's no longer any point to fold < 0. 
"Ignore fold" is now the > default, and "respect fold" has to be explicitly requested. > > For simplicity, any function that knows how to set fold correctly > should be _allowed_ to do so regardless. Yes, good point. >> * All of PEP 495's problems with hashes, equality, and ordering that >> have been discussed in this thread are solved; `fold` is entirely unused >> with non-strict tzinfo, and entirely consistent with strict tzinfo. > > There are still questions, like, e.g., what > > fold_aware_datetime + timedelta > > should do when fold=1, but only in my variation of what you proposed. > You proposed mixing "pay attention to fold" with "timeline > arithmetic", which leaves no choice. Yes. Point for my proposal :-) The fact that that even has to be a question illustrates how "timeline-mode conversions with fold disambiguation, but naive model for arithmetic" remains a problematic split-brain model that leads to inconsistencies. [Tim] > I'm still to keen to push timeline arithmetic off to a later PEP. It > doesn't have to be addressed to solve 495's problems. I think you've convincingly demonstrated in this thread that conversions, equality, comparisons, and arithmetic _are_ all fundamentally linked. If you try to cut them apart and handle some with a timeline model and some with a naive model, you'll have to violate a reasonably-expected invariant _somewhere_. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Thu Sep 3 01:05:04 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 2 Sep 2015 19:05:04 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> Message-ID: On Wed, Sep 2, 2015 at 6:26 PM, Tim Peters wrote: > There are still questions, like, e.g., what > > fold_aware_datetime + timedelta > > should do when fold=1, but only in my variation of what you proposed. > You proposed mixing "pay attention to fold" with "timeline > arithmetic", which leaves no choice. Alex and I seem to disagree > about what to do when "only pay attention to fold" is meant instead. > This is one of those cases where I don't have a strong opinion. Unlike the datetime - datetime case where we have a strong argument to do timeline arithmetic in the presence of fold=1 (namely to preserve the hash invariant), any choice here will lead to surprises. What should [01:30/fold=1] - (1 hour) yield? Given that [01:30/fold=0] + (1 hour) = [02:30/fold=0] and [00:30/fold=0] + (1 hour) = [01:30/fold=0], both answers [01:30/fold=0] and [00:30/fold=0] are equally wrong. The third possibility, [00:30/fold=1] is probably more wrong than the first two. Whatever logic we will end up implementing will likely need to be modified by the applications to fit their needs. In this case, I think we need to provide the faster to compute option so that applications don't end up undoing some expensive operations. As Guido said, arithmetic is a way to move the hands of the clock. It does not need to be a way to mess with the fold attribute. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tim.peters at gmail.com Thu Sep 3 02:39:41 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 2 Sep 2015 19:39:41 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E77FCC.9040507@oddbird.net> References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> Message-ID: We should be clearer about something up front: there are _two_ notions of "backward compatibility" here: 1. After PEP 495 is implemented but before any 495-compliant tzinfos exist. 2. After PEP 495 is implemented and the user explicitly employs a 495-compliant tzinfo. There are few potential issues under #1, but - alas - not none, and - double alas - none of them would be altered by any kind of flag or inheritance pattern on tzinfos. Under #1, no tzinfos know anything about `fold`, so little _could_ possibly change. Users can explicitly set fold=1, and if so they deserve whatever they get ;-) But, as I read the PEP, there are 3 places Python may force fold=1 all on its own, all based on Python's own (independent of any tzinfo) idea of the local timezone: A. datetime.fromtimestamp() without a trzinfo argument. B. datetime.now(). C. datetime.today() [The PEP doesn't mention #C, but it probably should (it's defined to be equivalent to passing time.time() to #A)] Since none of these consult a tzinfo, they can start producing fold=1 immediately, and nothing can stop that. So I take almost everything back ;-) 1. No trick with tzinfos can make a lick of difference to what A/B/C will do from the start. 2. Because of #1, the idea of explicitly saying "I want a fold-aware tzinfo" is the same thing as using a 495-compliant tzinfo. Keep using pre-495 tzinfos, and A/B/C remain your only worries. But they're minor worries at worst, since in #1 no tzinfos pay any attention to `fold` - the fundamental .utcoffset() in a pre-495 tzinfo is oblivious to `fold`. BTW, it may be useful to add a standardized (by the PEP) way for a tzinfo to _say_ "I implement 495". Like a magic new attribute. Then code that cares could use hasattr() to refuse or require using 495-compliant tzinfos. [Carl] > ... > One reason to conflate with arithmetic is to limit the number of mental > models people have to comprehend. If we conflate, there would be two > models: "naive model" and "timeline model", and the choice between them > would be controlled by one flag. > > I think that's already more than enough complexity for most people, but > it's simplicity itself compared to the possibility that we could end up > with three models: "naive model", "timeline model for conversions but > still naive for arithmetic", and "timeline model". > > ISTM the second is too confusing and inconsistent in its view of the > world to be featured as a primary mode; if someone really needs it, it'd > be easy enough to write functions to do fast naive arithmetic on > strict-aware datetimes (strip the tzinfo, then add it back). (The > write-your-own-function argument can go both ways! ;) It was always intended that users who wanted timeline arithmetic work in UTC instead. Everyone agrees that's best practice for many reasons. "Even Stuart" ;-) will agree with the latter. As to using functions, they're not symmetric situations: classic arithmetic is very fast, so fast that the overheads of calling a function and mucking around with stripping/reattaching tzinfos would be a major speed hit. timeline arithmetic is so slow that hardly matters. 
But work in UTC, as intended, and timeline arithmetic is the same thing as classic arithmetic, so is also very fast when performed the intended way. > ... > The fact that that even has to be a question illustrates how > "timeline-mode conversions with fold disambiguation, but naive model for > arithmetic" remains a problematic split-brain model that leads to > inconsistencies. I'm more inclined now to see it as an illustration that Alex's view is right: datetime +/- timedelta should indeed ignore fold. If I wanted timeline arithmetic, I should have been working in UTC from the start ;-) ... >> I'm still to keen to push timeline arithmetic off to a later PEP. It >> doesn't have to be addressed to solve 495's problems. > I think you've convincingly demonstrated in this thread that > conversions, equality, comparisons, and arithmetic _are_ all > fundamentally linked. If you try to cut them apart and handle some with > a timeline model and some with a naive model, you'll have to violate a > reasonably-expected invariant _somewhere_. Python already did, using timeline arithmetic for cross-zone subtraction and comparisons, and (necessarily so) for timezone conversions, but classic arithmetic for all other intrazone computations. Mucking with that old model really does belong in a different PEP. We're having quite enough pain already just figuring out what can go wrong with a single new bit ;-) From tim.peters at gmail.com Thu Sep 3 02:48:54 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 2 Sep 2015 19:48:54 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> Message-ID: [Tim] >> There are still questions, like, e.g., what >> >> fold_aware_datetime + timedelta >> >> should do when fold=1, but only in my variation of what you proposed. >> You proposed mixing "pay attention to fold" with "timeline >> arithmetic", which leaves no choice. Alex and I seem to disagree >> about what to do when "only pay attention to fold" is meant instead. [Alex] > This is one of those cases where I don't have a strong opinion. I do: it should ignore fold=1. Precisely the opposite of what you _thought_ I've been saying ;-) > Unlike the datetime - datetime case where we have a strong argument > to do timeline arithmetic in the presence of fold=1 (namely to preserve > the hash invariant), And total ordering, and equivalence between comparison outcomes and subtraction results. There are any number of "common sense" invariants that rely on this. > any choice here will lead to surprises. Indeed so. So screw it ;-) > ... From carl at oddbird.net Thu Sep 3 05:24:25 2015 From: carl at oddbird.net (Carl Meyer) Date: Wed, 2 Sep 2015 21:24:25 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> Message-ID: <55E7BD69.3060905@oddbird.net> [Tim] >>> I'm still to keen to push timeline arithmetic off to a later PEP. It >>> doesn't have to be addressed to solve 495's problems. [Carl] >> I think you've convincingly demonstrated in this thread that >> conversions, equality, comparisons, and arithmetic _are_ all >> fundamentally linked. If you try to cut them apart and handle some with >> a timeline model and some with a naive model, you'll have to violate a >> reasonably-expected invariant _somewhere_. 
[Tim] > Python already did, using timeline arithmetic for cross-zone > subtraction and comparisons, and (necessarily so) for timezone > conversions, but classic arithmetic for all other intrazone > computations. I know :( > Mucking with that old model really does belong in a > different PEP. We're having quite enough pain already just figuring > out what can go wrong with a single new bit ;-) But the point is that changing that model (in a backwards-compatible way, by means of a tzinfo flag) to draw a clear line between timeline-mode and naive-mode, _eliminates_ almost all of that pain. All these puzzles about arithmetic, ordering, equality, and hashing go away entirely (that is, they have obvious and unsurprising answers). So doing these two things together doesn't add to the net pain; it reduces it considerably. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Thu Sep 3 05:36:21 2015 From: carl at oddbird.net (Carl Meyer) Date: Wed, 2 Sep 2015 21:36:21 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> Message-ID: <55E7C035.7080707@oddbird.net> [Tim] >>> There are still questions, like, e.g., what >>> >>> fold_aware_datetime + timedelta >>> >>> should do when fold=1, but only in my variation of what you proposed. >>> You proposed mixing "pay attention to fold" with "timeline >>> arithmetic", which leaves no choice. Alex and I seem to disagree >>> about what to do when "only pay attention to fold" is meant instead. [Alex] >> This is one of those cases where I don't have a strong opinion. [Tim] > I do: it should ignore fold=1. Precisely the opposite of what you > _thought_ I've been saying ;-) [Alex] >> Unlike the datetime - datetime case where we have a strong argument >> to do timeline arithmetic in the presence of fold=1 (namely to preserve >> the hash invariant), > > And total ordering, and equivalence between comparison outcomes and > subtraction results. There are any number of "common sense" > invariants that rely on this. IIUC, choosing this combination of behavior means that it is possible to have a datetime `dt1` (with fold=1) such that: dt1 - dt2 => delta where `fold` is respected in this case, but dt2 + delta != dt1 because fold is ignored for timedelta arithmetic (but is respected for equality-checking, because that's necessary to maintain the hashing invariant). Are we really so wedded to maintaining an unpredictable hybrid naive/aware model for timezone-annotated datetimes that we're willing to break basic invariants of arithmetic and equality to preserve it? >> any choice here will lead to surprises. > > Indeed so. So screw it ;-) An alternative to "so screw it" in the face of this puzzle would be to choose the option that preserves all the invariants and behaves predictably in all cases. But I suppose that would make things too easy... Carl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Thu Sep 3 05:47:26 2015 From: carl at oddbird.net (Carl Meyer) Date: Wed, 2 Sep 2015 21:47:26 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> Message-ID: <55E7C2CE.5030407@oddbird.net> [Tim] > [Carl Meyer ] >> ... >> To summarize: trying to disambiguate folds leads to contradiction if the >> implementation doesn't fully accept a "timeline" view of tz-aware >> datetimes, because in a "naive" view, the two overlapping times in a >> fold are the _same time_. The very idea of disambiguation itself is a >> "timeline view" concept; it's not consistent with naive time. > > Fun, isn't it? If this is a fair summary, then why are we still trying to both keep a "naive" model for aware datetimes and also disambiguate folds, when we've just accepted that the two concepts are inherently contradictory and combining them inevitably will lead to surprises? If timezone-annotated datetimes in Python are really just supposed to represent naive clock time with an associated timezone, then there is no point in trying to disambiguate at a fold; both sides of the fold are the same naive clock time in the same timezone. If timezone-annotated datetimes in Python represent an unambiguously UTC-convertible instant, then why shouldn't they consistently behave that way (and happily eliminate all the surprising corner cases from PEP 495)? If they are supposed to represent some quantum hybrid of the two, where in some situations they behave like one and in some situations like the other (that is the status quo, of course), is there a concisely-stated consistent rule by which one can predict when they will behave like one and when they will behave like the other? Will that rule still apply post-PEP-495? Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Thu Sep 3 06:02:39 2015 From: carl at oddbird.net (Carl Meyer) Date: Wed, 2 Sep 2015 22:02:39 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> Message-ID: <55E7C65F.8050106@oddbird.net> On 09/02/2015 06:39 PM, Tim Peters wrote: > We should be clearer about something up front: there are _two_ > notions of "backward compatibility" here: > > 1. After PEP 495 is implemented but before any 495-compliant tzinfos exist. > > 2. After PEP 495 is implemented and the user explicitly employs a > 495-compliant tzinfo. > > There are few potential issues under #1, but - alas - not none, and - > double alas - none of them would be altered by any kind of flag or > inheritance pattern on tzinfos. If the `fold` attribute is entirely ignored, in all cases, unless a new-style tzinfo is present, then there are no issues under #1. > [Carl] >> ... >> One reason to conflate with arithmetic is to limit the number of mental >> models people have to comprehend. If we conflate, there would be two >> models: "naive model" and "timeline model", and the choice between them >> would be controlled by one flag. 
>> >> I think that's already more than enough complexity for most people, but >> it's simplicity itself compared to the possibility that we could end up >> with three models: "naive model", "timeline model for conversions but >> still naive for arithmetic", and "timeline model". >> >> ISTM the second is too confusing and inconsistent in its view of the >> world to be featured as a primary mode; if someone really needs it, it'd >> be easy enough to write functions to do fast naive arithmetic on >> strict-aware datetimes (strip the tzinfo, then add it back). (The >> write-your-own-function argument can go both ways! ;) > > It was always intended that users who wanted timeline arithmetic work > in UTC instead. Everyone agrees that's best practice for many > reasons. "Even Stuart" ;-) will agree with the latter. For apps doing heavy datetime arithmetic, I agree that working in UTC is best (and that's what I do). It would also be reasonable to say that if you want naive arithmetic with an implied timezone shared by all instances, the best practice is to use naive datetimes and track the implied timezone separately. But given that we're not proposing to raise an exception on all arithmetic with tz-annotated datetimes, it has to behave _somehow_, and it should behave in the least-surprising and most-consistent way possible. In a post-PEP-495 world, it is abundantly clear that consistent timeline arithmetic would require fewer (that is, zero) surprising violations of invariants. If a new Python user is trying to calculate how long they slept when they went to bed at 10pm on March 2 and got up at 6am on March 3, Python should give them the right answer. Telling them "you should convert to UTC first if you want your tz-aware datetime to actually be aware of the tz transition" is going to sound a bit silly to them; they neither went to sleep in UTC nor awoke in UTC; they did both in their own timezone. > As to using functions, they're not symmetric situations: classic > arithmetic is very fast, so fast that the overheads of calling a > function and mucking around with stripping/reattaching tzinfos would > be a major speed hit. timeline arithmetic is so slow that hardly > matters. But work in UTC, as intended, and timeline arithmetic is the > same thing as classic arithmetic, so is also very fast when performed > the intended way. Ok, continue using an old-style tzinfo (without the new `strict` attribute) and you can continue to have fast classic arithmetic on tz-annotated datetimes forever. Or you can use a strict tzinfo and have tz-aware datetimes that unambiguously represent a UTC-convertible instant. But how many contortions and surprising behaviors is it worth to try to provide both of those at once, in the same object? Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Thu Sep 3 06:07:14 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 2 Sep 2015 23:07:14 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E7BD69.3060905@oddbird.net> References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> Message-ID: [Carl Meyer] > But the point is that changing that model (in a backwards-compatible > way, by means of a tzinfo flag) to draw a clear line between > timeline-mode and naive-mode, _eliminates_ almost all of that pain. 
All > these puzzles about arithmetic, ordering, equality, and hashing go away > entirely (that is, they have obvious and unsurprising answers). The puzzles about arithmetic, ordering, equality and hashing have already been resolved. The problems were all due to a single cause: ignoring fold=1 where it really matters. There remain no significant backward-compatibility issues until 495-compliant tzinfos exist. Then people can choose to use them, or not. Up to them. > So doing these two things together doesn't add to the net pain; it > reduces it considerably. You're trying to retroactively change datetime's original design. It simply won't fly. Classic arithmetic was intentional in Python. It's unreasonable to ask people to settle for arithmetic at best 10x slower just to get correct timezone conversions (your idea of "backward compatible": get both or neither, and only "neither" is _really_ backward-compatible - more below). pytz users are certainly free to chose that, but we can't inflict it on everyone. Worse for your view, Guido wouldn't _want_ to regardless. Under 495's view, you can get fast timeline arithmetic _and_ correct conversions just by working in UTC. Stop fighting the intent, and life is easy. I also need to mention that your idea requires a lot more changes to the core Python code, from implementing timeline arithmetic internally to slowing down everything all the time with "is this the right kind of tzinfo?" conditional branches. then doing entirely different things depending on the outcome. Layers of complication do not generally increase robustness ;-) Even then, it's certain to be backward _incompatible_ with mounds of code if they choose to use the "fold and timeline" option. I have, for example, previously shown pieces of Python's own datetime implementation, and of my own code, that _require_ using classic arithmetic. Python's own datetime implementation would fail in sundry miserable ways under your option. All such places could be changed to live with timeline arithmetic, but they can't find and fix themselves by magic. Neither can any other user code implicitly or explicitly relying on classic arithmetic. Since classic _has_ been used forever, it's certain that lots of code does. 495 triggers no such problems. So I await the patch ;-) In its absence, we'll likely continue taking one useful, small step at a time. From carl at oddbird.net Thu Sep 3 06:16:22 2015 From: carl at oddbird.net (Carl Meyer) Date: Wed, 2 Sep 2015 22:16:22 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> Message-ID: <55E7C996.5060009@oddbird.net> On 09/02/2015 10:07 PM, Tim Peters wrote: > [Carl Meyer] >> But the point is that changing that model (in a backwards-compatible >> way, by means of a tzinfo flag) to draw a clear line between >> timeline-mode and naive-mode, _eliminates_ almost all of that pain. All >> these puzzles about arithmetic, ordering, equality, and hashing go away >> entirely (that is, they have obvious and unsurprising answers). > > The puzzles about arithmetic, ordering, equality and hashing have > already been resolved. The problems were all due to a single cause: > ignoring fold=1 where it really matters. But aren't we still left with arithmetic that violates basic invariants in the presence of a fold=1 datetime? 
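To make the question concrete, a sketch (written with the fold spelling and
a zoneinfo-style zone, and showing the variant where intrazone operations
ignore fold entirely):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # illustrative spelling only

    zone = ZoneInfo("America/New_York")
    early = datetime(2015, 11, 1, 1, 30, tzinfo=zone)   # first 1:30 (EDT)
    late = early.replace(fold=1)                        # second 1:30 (EST)

    print(late == early)    # True     - inside the zone, fold is invisible
    print(late - early)     # 0:00:00  - ditto for subtraction
    print(late.astimezone(timezone.utc) - early.astimezone(timezone.utc))
    # 1:00:00 - but the "equal" pair is a full hour apart once UTC gets involved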
[Tim] > It's unreasonable to ask people to settle for arithmetic at best 10x > slower just to get correct timezone conversions If the intended meaning of a tz-annotated datetime is "naive clock time with an associated timezone", then we don't need PEP 495; timezone conversions are already as correct as the model allows. PEP 495 just worsens the existing "naive or aware?" identity crisis of tz-annotated datetimes. > So I await the patch ;-) Fair! I'll work on one :-) > In its absence, we'll likely continue taking one useful, small step at a time. It's no longer clear to me that PEP 495 is a useful step. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Thu Sep 3 08:17:12 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 01:17:12 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E7C65F.8050106@oddbird.net> References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> Message-ID: I'm out of time for tonight, but will try to make more tomorrow. Just one for now, because I think it cuts to the _real_ heart of this batch of messages: [Carl Meyer ] > ... > If a new Python user is trying to calculate how long they slept when > they went to bed at 10pm on March 2 and got up at 6am on March 3, Python > should give them the right answer. Telling them "you should convert > to UTC first if you want your tz-aware datetime to actually be aware of > the tz transition" is going to sound a bit silly to them; they neither went > to sleep in UTC nor awoke in UTC; they did both in their own timezone. That's the heart: you simply despise classic arithmetic. This example has nothing to with PEP 495 - it's a complaint about classic arithmetic, period. It's more likely that a new user will want to set an alarm to get up at 6am, then add timedelta(days=1) to set a new alarm for "same time next day". They'd be surprised and annoyed if that ended up at 7am or 9am just because DST switched. Their stupid alarm-setting code works fine today, and will continue to work fine with a 495-compliant tzinfo when they're available. It doesn't help to point out that "period arithmetic" _could_ be done in some other way. These particular kinds of uses already work, and always have. Different purposes require different kinds of arithmetic. Python picked one. That timeline arithmetic wasn't the choice doesn't mean Python despises it. It was just judged "probably less useful overall - and there are other, better ways to get it". You can disagree with that choice, but it can't be changed now. I know, you're not proposing to change it: you're proposing to leave it exactly the way it is, but exploit the desire for correct timezone conversions to sneak timeline arithmetic into the core - because that's "the only sane way" to do it. Tricky ;-) Once your new user understood the _potential_ problems when dealing with pseudo-real-world durations in classic arithmetic, no, for something this trivial I wouldn't advise converting to UTC explicitly. Instead I'd give them a 1-line Python function implementing timeline datetime-datetime subtraction, which they can use forever after. It can't always work right today, because conversions alone can't always work right today. 
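Something on the order of this (the helper and its name are made up on the
spot; it leans on nothing but .astimezone(), and the dates below straddle
the real 2015 spring-forward so the two answers differ):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # any correct tzinfo works; zoneinfo is just for show

    def timeline_sub(a, b):
        # "How much real time elapsed?" - compare the two moments in UTC.
        return a.astimezone(timezone.utc) - b.astimezone(timezone.utc)

    zone = ZoneInfo("America/New_York")
    bed = datetime(2015, 3, 7, 22, 0, tzinfo=zone)   # 10pm Saturday
    up = datetime(2015, 3, 8, 6, 0, tzinfo=zone)     # 6am Sunday, after the clocks jumped

    print(up - bed)               # 8:00:00 - classic: the clock-face difference
    print(timeline_sub(up, bed))  # 7:00:00 - timeline: the night really lasted 7 hours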
When the user obtains a 495-compliant tzinfo , the same function will always work right, by magic. But _only_ under 495. Under your view, timezone conversion would continue to fail in some cases, because users who didn't want to drink the timeline-all-the-time Kool-Aid would be left out. Also left out would be users who usually want classic arithmetic but _do_ convert to UTC for fancier stuff: conversion to UTC would continue to give rare wrong results for them too. So you're not really looking to do anything for anyone, _except_ for those who want the whole timeline enchilada. That's a legitimate view, but in particular it wouldn't help me a bit ;-) I _want_ what 495 is offering. I usually want classic arithmetic. When I want timeline arithmetic, I switch to UTC, or use a 1-liner, and I'd sleep a tad better if the latter two always did work correctly. BTW, if your new user is also a physicist, we'[ll _both_ need to give them a much more annoying function, in case they ask the question near the end of June or December, and need to account for a leap second that may have occurred while they were sleeping. From carl at oddbird.net Thu Sep 3 12:59:48 2015 From: carl at oddbird.net (Carl Meyer) Date: Thu, 3 Sep 2015 04:59:48 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> Message-ID: <55E82824.7020607@oddbird.net> [Tim] > I'm out of time for tonight, but will try to make more tomorrow. Just > one for now, because I think it cuts to the _real_ heart of this batch > of messages: I don't think this cuts to the heart of anything :/ I think it avoids the main point I've made (several times) to latch instead onto a tangent I should have left out. [Tim] > That's the heart: you simply despise classic arithmetic. Sorry, but no. I have nothing at all against naive arithmetic. I think both naive arithmetic and timeline arithmetic have good use cases. What I have trouble with is a tz-annotated datetime object that fundamentally can't decide whether it's living in a naive or timeline model, and thus behaves unpredictably. This is a problem today, but at least the behavior can be explained fairly simply: the model is naive when operating within the same timezone, and aware anytime you're converting between timezones or interoperating between timezones. PEP 495, AFAICS, makes the problem worse, because it introduces another bit of information that only makes sense in a timeline view. That new bit now allows round-tripping from UTC, which is great (no problem, because conversions are an area where tz-annotated datetimes already tried to behave as tz-aware instants in time). But then it can't quite decide how to rationalize that new bit of information with its naive internal view of time, so it settles on a mish-mash of inconsistent behavior that violates basic arithmetic identities we all learned in elementary school and only makes any sense if you've followed this entire thread. If you want to cut to the heart of the matter, tell me how you would write the documentation for how arithmetic works on a tz-annotated datetime post-PEP-495. Does it work on a naive "move the hands of the clock" model? (No, because I can subtract 1:30AM from 2:30AM and get "2 hours" in some cases.) Does it work on a UTC timeline model? (No, clearly not.) So what is the model, stated precisely and concisely? 
And is it actually backwards-compatible with current code that converts from UTC to local time and then does arithmetic on those local times, or compares them to each other? (Not around a DST transition, no.) Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Thu Sep 3 16:27:48 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 3 Sep 2015 10:27:48 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E82824.7020607@oddbird.net> References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> Message-ID: On Thu, Sep 3, 2015 at 6:59 AM, Carl Meyer wrote: > But then it can't quite > decide how to rationalize that new bit of information with its naive > internal view of time, so it settles on a mish-mash of inconsistent > behavior that violates basic arithmetic identities we all learned in > elementary school and only makes any sense if you've followed this > entire thread. > It is actually easier to understand if you *don't* read this thread because some of the earlier posts (including my own) are quite confusing. The rule we settled on is quite simple and consistent with the status quo. First, you need to realize that aware fold=1 times *can* be represented in the current version of datetime, but you must use a different tzinfo for that. Popular choices are timezone.utc or the fictitious fixed offset standard time zone. (I call these zones fictitious because they represent a possibly non-existing time zone which does not observe DST changes.) For example, in US/Eastern, if you want to represent [01:30/fold=1], you can either use [01:30/tzinfo=EST] or [06:30/tzinfo=UTC] which conveniently compare as equal. What 495 gives you is the third way to spell the same time: [01:30/fold=1,tzinfo=Eastern]. It is quite natural that this third spelling will have exactly the same properties as the first two: [01:30/fold=0] < [01:59/fold=0] < [06:30/tzinfo=UTC] == [01:30/tzinfo=EST] == [01:30/fold=1] < [02:00/fold=0] The only "basic arithmetic identities" that are being violated here are the ones that are already violated by aware datetimes. For example (t1 - u) - (t2 - u) is not equal to t1 - t2 if u is a tzinfo=UTC instance and t1 and t2 are two tzinfo=Eastern instances on the different sides of the gap. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Thu Sep 3 16:43:30 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 3 Sep 2015 10:43:30 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> Message-ID: On Thu, Sep 3, 2015 at 10:27 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > .. are two tzinfo=Eastern instances on the different sides of the gap. s/gap/fold/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carl at oddbird.net Thu Sep 3 16:52:06 2015 From: carl at oddbird.net (Carl Meyer) Date: Thu, 3 Sep 2015 08:52:06 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> Message-ID: <55E85E96.5050500@oddbird.net> [Alex] > The only "basic arithmetic identities" that are being violated here are > the ones that are already violated by aware datetimes. For example (t1 > - u) - (t2 - u) is not equal to t1 - t2 if u is a tzinfo=UTC instance > and t1 and t2 are two tzinfo=Eastern instances on the different sides of > the gap. Yes, you can already get such results, because aware datetimes are already sometimes aware and sometimes naive depending on context. That's a problem for learning the API, but it's at least an easily-explained problem: arithmetic within a timezone is always naive, arithmetic between timezones is always aware, if you mix the two (as your example does) you may get surprising results. I don't see any such easily comprehensible explanation for the new proposed PEP 495 behavior. It is no longer true that "arithmetic within a timezone is always naive." Now "arithmetic within a timezone is naive, unless you happen to have a particular kind of special time in a single hour once per year, in which case some kinds of arithmetic (dt/dt) are aware of the DST transition, but other kinds (dt/delta) still ignore it." Is that roughly what you propose to put in the documentation? Currently you only get results that violate arithmetic identities if you mix arithmetic within a timezone and arithmetic between timezones. Again, a simple rule. Under PEP 495, you can get such results even if you always stay within a single timezone. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Thu Sep 3 17:02:21 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 3 Sep 2015 11:02:21 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E85E96.5050500@oddbird.net> References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> Message-ID: On Thu, Sep 3, 2015 at 10:52 AM, Carl Meyer wrote: > It is no longer true that "arithmetic within a timezone is always naive." > If you like this rule, you can keep it. :-) Just note that fold=1 instances are in a different timezone. This is unavoidable because within the same timezone fold=1 instances don't exist: 01:59 is followed by 02:00 with no room for "second 01:30" in between. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Thu Sep 3 17:05:40 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 10:05:40 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> Message-ID: [Alex] >> The only "basic arithmetic identities" that are being violated here are >> the ones that are already violated by aware datetimes. 
For example >> (t1 - u) - (t2 - u) is not equal to t1 - t2 >> if u is a tzinfo=UTC instance and t1 and t2 are two tzinfo=Eastern >> instances on the different sides of the gap. [Alex] > s/gap/fold/ What you said is true either way (fold or gap); the sign of the hour difference (between the two expressions) just differs. Although _sometimes_ the expressions can be equal, if you move t1 and/or t2 far enough away from the gap/fold to encompass some number of _additional_ gaps/folds, so as to just cancel out overall. As an obvious example, pick d1 = 2000-01-01 and d2 = 2001-01-01. They're on different sides of one gap, but also on different sides of one fold. Then you get 366 days (2000 is a leap year) via either way of computing the difference. The conceptual muddying here is that this kind of stuff wasn't possible before when sticking within a _single_ zone. We are introducing oddball cases of timeline arithmetic into what used to be "surprise-free" classic arithmetic. I don't like that, but I'm not scared to death of it either. Yet ;-) From alexander.belopolsky at gmail.com Thu Sep 3 17:19:16 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 3 Sep 2015 11:19:16 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> Message-ID: On Thu, Sep 3, 2015 at 11:05 AM, Tim Peters wrote: > The conceptual muddying here is that this kind of stuff wasn't > possible before when sticking within a _single_ zone. > This is what Carl is complaining about, but once you realize that fold=1 on an ambiguous datetime instance effectively modifies the zone (changes the value returned by utcoffset()), it becomes quite natural. > We are introducing oddball cases of timeline arithmetic into what used > to be > "surprise-free" classic arithmetic. I don't like that, but I'm not > scared to death of it either. Yet ;-) > Wait for the next PEP update. :-) I am adding a section titled "An Overview of the Current State of Aware Arithmetic and Comparisons." A reader who will survive that won't be impressed by the additional PEP 495 rules. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Thu Sep 3 17:19:22 2015 From: carl at oddbird.net (Carl Meyer) Date: Thu, 3 Sep 2015 09:19:22 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> Message-ID: <55E864FA.90704@oddbird.net> On 09/03/2015 09:02 AM, Alexander Belopolsky wrote: > On Thu, Sep 3, 2015 at 10:52 AM, Carl Meyer > wrote: > > It is no longer true that "arithmetic within a timezone is always > naive." > > If you like this rule, you can keep it. :-) Just note that fold=1 > instances are in a different timezone. Ok, so for most of the year when I do utctime.astimezone(Eastern), I get a result in Eastern, but during one hour of the year I get a result in "some other timezone that isn't quite Eastern" (but its tzinfo is still the same object as all the others). That's your proposal for a _less_ surprising interpretation? ;-) > This is unavoidable because > within the same timezone fold=1 instances don't exist: 01:59 is followed > by 02:00 with no room for "second 01:30" in between. Right. 
That's an excellent statement of why respecting `fold` at all is inconsistent with how tz-annotated datetimes are designed to behave in Python (they operate internally in naive time, in which the "fold" time does not even exist). Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From chris.barker at noaa.gov Thu Sep 3 17:19:31 2015 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 3 Sep 2015 08:19:31 -0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> Message-ID: <-7656714205635425925@unknownmsgid> > It's unreasonable to ask people to settle for arithmetic at best 10x > slower just to get correct timezone conversions I'm not sure. As has been pointed out, best practice is to use UTC or naive time anyway. So if the casual user wants to compute how long s/he slept last night, it can be slow. It's easier to document "computations are much faster in UTC" than to document all the surprising inconsistencies. And as for original intent -- my understanding of the entire architecture was designed NOT to be about fast arithmetic. If you want that, use tics or numpy.datetime64. And intentional or not, "classic" arithmetic may be easy to implement and fast, but it is hard to explain, surprising, and not very useful. -Chris From alexander.belopolsky at gmail.com Thu Sep 3 17:23:44 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 3 Sep 2015 11:23:44 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <-7656714205635425925@unknownmsgid> References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> <-7656714205635425925@unknownmsgid> Message-ID: On Thu, Sep 3, 2015 at 11:19 AM, Chris Barker - NOAA Federal < chris.barker at noaa.gov> wrote: > If you want that, use tics or numpy.datetime64. > Chris, please stop promoting numpy.datetime64 here. It is definitely not a positive example of how a date/time manipulation library should be designed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Thu Sep 3 17:30:12 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 10:30:12 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E85E96.5050500@oddbird.net> References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> Message-ID: [Alex] >> The only "basic arithmetic identities" that are being violated here are >> the ones that are already violated by aware datetimes. For example (t1 >> - u) - (t2 - u) is not equal to t1 - t2 if u is a tzinfo=UTC instance >> and t1 and t2 are two tzinfo=Eastern instances on the different sides of >> the gap. [Carl] > Yes, you can already get such results, because aware datetimes are > already sometimes aware and sometimes naive depending on context. That's > a problem for learning the API, but it's at least an easily-explained > problem: arithmetic within a timezone is always naive, arithmetic > between timezones is always aware, if you mix the two (as your example > does) you may get surprising results. 
> > I don't see any such easily comprehensible explanation for the new > proposed PEP 495 behavior. It can't possibly require more confusing words than _already_ exist trying to explain the subtleties behind why timezone conversion can fail in rare cases ;-) People _expect_ the obvious roundtrip identities there too. It's a tradeoff. The doc problem here seems much simpler: in arithmetic involving two datetimes, the operands will be treated as having distinct tzinfos if at least one has fold=1. It reduces to a prior case. The equally rare conversion problems require paragraph after paragraph to explain. > .. > Currently you only get results that violate arithmetic identities if you > mix arithmetic within a timezone and arithmetic between timezones. And we currently have timeline conversions that can violate basic identities in _that_ space. It is trading one for the other. From carl at oddbird.net Thu Sep 3 17:37:11 2015 From: carl at oddbird.net (Carl Meyer) Date: Thu, 3 Sep 2015 09:37:11 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> Message-ID: <55E86927.1000103@oddbird.net> [Carl] >> Currently you only get results that violate arithmetic identities if you >> mix arithmetic within a timezone and arithmetic between timezones. [Tim] > And we currently have timeline conversions that can violate basic > identities in _that_ space. It is trading one for the other. Yes. The new proposed behavior for PEP 495 abandons the assertion that it can be "independent of arithmetic", recognizing that instead we're trading consistency of arithmetic within a timezone for consistency of round-trips between timezones. So PEP 495 is already breaking the design of datetime, that tz-annotated datetimes operate internally on a naive time model. It _has_ to break that design, because it must introduce times that don't exist in that model. But it's choosing to change that design piecemeal and inconsistently instead of thoroughly and consistently. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Thu Sep 3 17:38:52 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 3 Sep 2015 11:38:52 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E864FA.90704@oddbird.net> References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> <55E864FA.90704@oddbird.net> Message-ID: On Thu, Sep 3, 2015 at 11:19 AM, Carl Meyer wrote: > > This is unavoidable because > > within the same timezone fold=1 instances don't exist: 01:59 is followed > > by 02:00 with no room for "second 01:30" in between. > > Right. That's an excellent statement of why respecting `fold` at all is > inconsistent with how tz-annotated datetimes are designed to behave in > Python (they operate internally in naive time, in which the "fold" time > does not even exist). I wish we could have a design where fold is always ignored when you have a single tzinfo. The reason we cannot has been explained several times in this thread. The core reason is possibly a mistake in the original design that permitted cross-zone arithmetic and comparison. 
If == was defined so that no two instances with different tzinfo ever compare equal and <, - and friends are only defined for datetimes sharing the tzinfo, we would not have this problem. Recall that datetime was designed at the time when it was thought that mixing bytes and unicode was a good idea. We all know what it took to fix that wart. I don't think cross-zone datetime arithmetic is an issue of the same scale or impact. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Thu Sep 3 17:42:02 2015 From: carl at oddbird.net (Carl Meyer) Date: Thu, 3 Sep 2015 09:42:02 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> <55E864FA.90704@oddbird.net> Message-ID: <55E86A4A.6010503@oddbird.net> On 09/03/2015 09:38 AM, Alexander Belopolsky wrote: > On Thu, Sep 3, 2015 at 11:19 AM, Carl Meyer > wrote: > > This is unavoidable because > > within the same timezone fold=1 instances don't exist: 01:59 is followed > > by 02:00 with no room for "second 01:30" in between. > > Right. That's an excellent statement of why respecting `fold` at all is > inconsistent with how tz-annotated datetimes are designed to behave in > Python (they operate internally in naive time, in which the "fold" time > does not even exist). > > I wish we could have a design where fold is always ignored when you have > a single tzinfo. The reason we cannot has been explained several times > in this thread. The core reason is possibly a mistake in the original > design that permitted cross-zone arithmetic and comparison. If == was > defined so that no two instances with different tzinfo ever compare > equal and <, - and friends are only defined for datetimes sharing the > tzinfo, we would not have this problem. Yes, I understand why that doesn't work. There is an alternative solution available that avoids this problem, and all other inconsistencies. > Recall that datetime was > designed at the time when it was thought that mixing bytes and unicode > was a good idea. We all know what it took to fix that wart. I don't > think cross-zone datetime arithmetic is an issue of the same scale or > impact. True. Which should make it more feasible to fix. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Thu Sep 3 17:57:14 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 3 Sep 2015 11:57:14 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E86A4A.6010503@oddbird.net> References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> <55E864FA.90704@oddbird.net> <55E86A4A.6010503@oddbird.net> Message-ID: On Thu, Sep 3, 2015 at 11:42 AM, Carl Meyer wrote: > There is an alternative solution available that avoids this problem, and > all other inconsistencies. > Really? PEP 495 has a more or less complete reference implementation in my github fork [1] of cpython. I have recently added the hash invariant preservation rule which required a change to the grand total of two lines in datetime.py. Something that is that easy to implement cannot be too hard to explain and document. 
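In test form, the invariant in question is just this (a sketch, not a line from the patch or the test suite; "Eastern" below is a stand-in for any PEP 495-aware US/Eastern tzinfo, and replace(fold=1) is the reference implementation's spelling):

    from datetime import datetime

    # "Eastern" is a stand-in for a PEP 495-aware US/Eastern tzinfo.
    dt1 = datetime(2015, 11, 1, 1, 30, tzinfo=Eastern)   # first 01:30 (EDT)
    dt2 = dt1.replace(fold=1)                            # second 01:30 (EST)
    assert dt1 == dt2                  # fold is ignored by == within one zone
    assert hash(dt1) == hash(dt2)      # so equal values must hash equal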
I would like to specifically point out that the only existing unit test that my patch has to modify is the one which checks that astimezone() method raises an exception on a naive datetime. I have not seen any "alternative solution" implemented anywhere. If you have not tried it yourself - trust me - keeping 4000+ lines of unit tests intact while adding features to the datetime module is not an easy task. [1]: https://github.com/abalkin/cpython/tree/issue24773 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Thu Sep 3 17:56:47 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 10:56:47 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <-7656714205635425925@unknownmsgid> References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> <-7656714205635425925@unknownmsgid> Message-ID: [Tim] >> It's unreasonable to ask people to settle for arithmetic at best 10x >> slower just to get correct timezone conversions [Chris Barker] > I'm not sure. As has been pointed out, best practice is to use UTC or > naive time anyway. We're not designing a new language here. Python already has more users than an instance of numpy.datetime64 has bits ;-) As is, working in UTC does nothing to help you get correct conversions in all cases. That problem has nothing to do with arithmetic. It has entirely to do with what PEP 495 is addressing: the current inability of a local time to record _which_ UTC time it corresponds to in ambiguous cases. timeline vs classic arithmetic is irrelevant to that "in theory". In practice, it seems to be unfortunately true that resolving it in a way that plays nice with everything else requires muddying the classic arithmetic rules in some rare cases. > So if the casual user wants to compute how long s/he slept last night, > it can be slow. It's easier to document "computations are much faster > in UTC" than to document all the surprising inconsistencies. Ditto. > And as for original intent -- my understanding of the entire > architecture was designed NOT to be about fast arithmetic. Quite so. But it's been in the field for over a decade, and relatively fast arithmetic happens to a property that's been maintained all along. That's another kind of "backward compatibility" we have to respect. > If you want that, use tics or numpy.datetime64. Or just leave your already-working Python datetime "fast enough" code alone. > And intentional or not, "classic" arithmetic may be easy to implement > and fast, but it is hard to explain, surprising, and not very useful. I find it very useful. So does Guido. As to being hard to explain, you must be joking: classic arithmetic has the same semantics as doing integer arithmetic on integer POSIX timestamps (although extended to support microseconds). They're different representations of the same thing. I would have _preferred_ that an aware datetime followed timeline rules instead (or didn't support builtin arithmetic at all), but too late for that. 
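Spelled out in code, the whole rule fits in a couple of lines (a sketch; "classic_add" is a made-up name, and the fixed-offset zone is there only so the snippet stands on its own - any tzinfo behaves the same way under + and -):

    from datetime import datetime, timedelta, timezone

    def classic_add(dt, delta):
        # Strip the tzinfo, do naive arithmetic on the fields, reattach the tzinfo.
        return (dt.replace(tzinfo=None) + delta).replace(tzinfo=dt.tzinfo)

    EST = timezone(timedelta(hours=-5), "EST")
    dt = datetime(2015, 11, 1, 1, 30, tzinfo=EST)
    assert dt + timedelta(hours=1) == classic_add(dt, timedelta(hours=1))

The builtin "+" and "-" on aware datetimes sharing a tzinfo do exactly that, which is why the UTC offset never enters into it.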
From tim.peters at gmail.com Thu Sep 3 18:39:56 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 11:39:56 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E86927.1000103@oddbird.net> References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> <55E86927.1000103@oddbird.net> Message-ID: [Carl] > ... > So PEP 495 is already breaking the design of datetime, that tz-annotated > datetimes operate internally on a naive time model. It _has_ to break > that design, because it must introduce times that don't exist in that > model. But it's choosing to change that design piecemeal and > inconsistently instead of thoroughly and consistently. It was never consistent for all possible uses: as has been gone over many times before, an aware datetime _can_ be viewed as being an instant in "naive time", _or_ as an instant in civil time. That's solely in the programmer's head. They may even view a single datetime in both ways in different lines of code (I know I do - indeed, that's the norm for me). Python has no way to know which the programmer has in mind; there is no way to _spell_ "I mean naive time" versus "I mean civil time" for aware datetimes. I believe Guido thinks that's "a feature". I think it's just "good enough" ;-) Since the concept of "timezone conversion" doesn't exist in naive time, a programmer asking for a timezone conversion can only have "instant in civil time" in mind at the instant they ask for that conversion (or invoke any other tzinfo method). We're aiming to accommodate that use, in a design that never put a wall between the concepts from the start. It's not ideal, but that's not really news ;-) From tim.peters at gmail.com Thu Sep 3 18:58:22 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 11:58:22 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> <55E864FA.90704@oddbird.net> Message-ID: [Alex] > I wish we could have a design where fold is always ignored when you have a > single tzinfo. Me too - and you tried very hard to make that so. Valiant effort! > The reason we cannot has been explained several times in > this thread. The core reason is possibly a mistake in the original design > that permitted cross-zone arithmetic and comparison. If == was defined so > that no two instances with different tzinfo ever compare equal and <, - and > friends are only defined for datetimes sharing the tzinfo, we would not have > this problem. Recall that datetime was designed at the time when it was > thought that mixing bytes and unicode was a good idea. We all know what it > took to fix that wart. It was also designed at a time when Python was just starting to stop ;-) allowing comparisons between _any_ two objects. Things like 1 < "1" {10: 20} < [None] were true near that time. Why? "Because" in senseless cases (both comparands said "not implemented"), sometimes the string names of the types were compared instead, and "int" < "str" and "dict" < "list" are true. 
Compared to stuff like that, doing timeline arithmetic for interzone comparisons seemed to be a welcome case of principled sanity ;-) But there's no question (in my mind) that if datetime had been designed today, interzone comparisons would be disallowed (except for "==" always saying False and "!=" always True). From tim.peters at gmail.com Thu Sep 3 19:18:09 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 12:18:09 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> <55E864FA.90704@oddbird.net> <55E86A4A.6010503@oddbird.net> Message-ID: [Carl] >> There is an alternative solution available that avoids this problem, and >> all other inconsistencies. [Alex] > Really? Carl means ignoring `fold` everywhere, all the time, unless a datetime's tzinfo is of a new "strict" flavor that implements PEP 495 _and_ forces the datetime to use timeline arithmetic all the time. > ... > I have not seen any "alternative solution" implemented anywhere. In a sense, pytz kinda does this already (but not all by magic). > If you have not tried it yourself - trust me - keeping 4000+ lines of unit tests > intact while adding features to the datetime module is not an easy task. They would continue to pass, _until_ you used one of the new "strict" tzinfos. Then they'd barf all over the place. Indeed, it would fatally confuse Python's _implementation_ of datetime (which, as you know, currently exploits that arithmetic on aware datetimes is classic - which could be changed, but won't change itself by magic). So, assuming many changes to Python itself, this is "backward compatible" even to the extent of leaving conversions broken forever for code that wants to use classic arithmetic. From alexander.belopolsky at gmail.com Thu Sep 3 19:31:23 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 3 Sep 2015 13:31:23 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> <55E85E96.5050500@oddbird.net> <55E864FA.90704@oddbird.net> <55E86A4A.6010503@oddbird.net> Message-ID: On Thu, Sep 3, 2015 at 1:18 PM, Tim Peters wrote: > So, assuming many changes to Python itself, this is "backward > compatible" even to the extent of leaving conversions broken forever > for code that wants to use classic arithmetic. > On top of this, I think any operations that mix strict and classic datetimes will be prohibited as well. Effectively a new class is proposed. The only thing I don't understand is why would you want to call it "datetime"? mxDateTime will be a much better name. :-) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Fri Sep 4 00:16:33 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 3 Sep 2015 15:16:33 -0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> <-7656714205635425925@unknownmsgid> Message-ID: On Thu, Sep 3, 2015 at 8:23 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Thu, Sep 3, 2015 at 11:19 AM, Chris Barker - NOAA Federal < > chris.barker at noaa.gov> wrote: > >> If you want that, use tics or numpy.datetime64. >> > > Chris, please stop promoting numpy.datetime64 here. It is definitely not > a positive example of how a date/time manipulation library should be > designed. > Sorry -- didn't mean to promote -- and yes, it's actually really horrible, particularly for anything to do with time zones. The point was that there are other ways to get performance for datetime arithmetic if that's what you need. That's all. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Sep 4 00:51:30 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 3 Sep 2015 15:51:30 -0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> <-7656714205635425925@unknownmsgid> Message-ID: On Thu, Sep 3, 2015 at 8:56 AM, Tim Peters wrote: > > And intentional or not, "classic" arithmetic may be easy to implement > > and fast, but it is hard to explain, surprising, and not very useful. > > > > As to being hard to explain, > you must be joking: Sigh. Look at the length of this stinking thread! And at how much confusion there was at the beginning about what the heck the current datetime implementation actually did. Classic arithmetic may well be the best possible solution given the constraints, but it is not obvious, clear, lacking in surprises, or well documented (and no one reads docs until they run into a problem). I know I only got it when someone explained the implementation: "remove the tzinfo object, do the math, tack the tzinfo back on" Simple, elegant, and now I get it. And I get why things go wonky with datetimes with two different tzinfo objects. By the way, something like that should be in the docs. Anyway, clearly timeline math is an important use case for folks -- just as much as (maybe more than?) classic math. It would be nice to support it one way or another. Which can have nothing to do with this PEP -- so carry on. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From alexander.belopolsky at gmail.com Fri Sep 4 01:55:05 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 3 Sep 2015 19:55:05 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> <-7656714205635425925@unknownmsgid> Message-ID: On Thu, Sep 3, 2015 at 6:51 PM, Chris Barker wrote: > > I know I only got it when someone explained the implementation: > > "remove the tzinfo object, do the math, tack the tzinfo back on" > > Simple elegant, and now I get it. And get why things go wonky with datetimes with two different tzinfo objects. > > By the way, something like that should be in the docs. Doc patches from good writers are always welcome, but in this case, I don't see what needs to be added to what the reference manual already says: """ Subtraction of a datetime from a datetime is defined only if both operands are naive, or if both are aware. If one is aware and the other is naive, TypeError is raised. If both are naive, or both are aware and have the same tzinfo attribute, the tzinfo attributes are ignored, and the result is a timedelta object t such that datetime2 + t == datetime1. No time zone adjustments are done in this case. If both are aware and have different tzinfo attributes, a-b acts as if a and b were first converted to naive UTC datetimes first. The result is (a.replace(tzinfo=None) - a.utcoffset()) -(b.replace(tzinfo=None) - b.utcoffset()) except that the implementation never overflows. """ https://docs.python.org/3/library/datetime.html#datetime.datetime The only improvement that comes to mind is to make "Supported operations:" a linkable section. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Sep 4 02:03:02 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 19:03:02 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> <-7656714205635425925@unknownmsgid> Message-ID: [Chris Barker] >>> And intentional or not, "classic" arithmetic may be easy to implement >>> and fast, but it is hard to explain, surprising, and not very useful. [Tim] >> As to being hard to explain, you must be joking: [Chris] > sigh. Look at the length of this stinking thread! I don't recall any confusions in this thread about what classic arithmetic does. Do you? > and how much confusion there was at the beginning about what > the heck the current datetime implementation actually did. Which covered a world of issues. > Classic arithmetic may well be the best possible solution given > the constraints, It's impossible that this - or any other - PEP could succeed at changing the default arithmetic. > but it is not obvious, clear, lacking in surprises or well documented > ( and no one reads docs until they run into a problem) Maybe they should ;-) But, yup, the docs could be clearer. > I know I only got it when someone explained the implementation: > > "remove the tzinfo object, do the math, tack the tzinfo back on" > > Simple elegant, and now I get it. So, you start with "hard to explain", and end with "simple elegant, and now I get it" after a one-sentence explanation - yet wonder why I said "you must be joking"? I don't see how it could be all of those simultaneously. It's easy to explain. 
It just took you a while to find the simple explanation. Some people people get it instantly; others don't. For the latter, that's a doc problem, not a "hard to explain" problem. > And get why things go wonky with datetimes with two different tzinfo objects. > > By the way, something like that should be in the docs. I agree. Patches welcome ;-) > Anyway, clearly timeline math is an important use case for folks -- just as > many (more?) than classic math. It would be nice to support it one way or > another. > > Which can have nothing to do with this PEP -- so carry one. In the meantime, use UTC - you'll be much happier with that in the end (simpler, clearer, cleaner, faster, ...). That was the intent from the start, and will likely always be the best way to get timeline arithmetic (regardless of programming language too), From tim.peters at gmail.com Fri Sep 4 02:32:11 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 19:32:11 -0500 Subject: [Datetime-SIG] Another round on error-checking Message-ID: [Alex] > Doc patches from good writers are always welcome, but in this case, I don't > see what needs to be added to what the reference manual already says: The docs lack a coherent, friendly overview. For example, I don't think they even mention "naive time". The docs you quote here are "buried" in a footnote on a table of datetime operations. They're accurate, but provide no context, motivation, or exposition of the _model_. Chris's "remove the tzinfo object, do the math, tack the tzinfo back on" explains a whole lot about classic arithmetic in one brief & comprehensible sentence. > """ > Subtraction of a datetime from a datetime is defined only if both operands > are naive, or if both are aware. If one is aware and the other is naive, > TypeError is raised. I wrote almost all this stuff to begin with, but right now even I'm already half asleep ;-) > If both are naive, or both are aware and have the same tzinfo attribute, the > tzinfo attributes are ignored, and the result is a timedelta object t such > that datetime2 + t == datetime1. Assuming the reader already digested the similarly legalistic footnote just above about what "datetime + timedelta" does. In reference-manual style, you can't jump in just anywhere, because the details are too numerous and involved to keep repeating them. > No time zone adjustments are done in this case. > > If both are aware and have different tzinfo attributes, a-b acts as if a and > b were first converted to naive UTC datetimes first. The result is > (a.replace(tzinfo=None) - a.utcoffset()) -(b.replace(tzinfo=None) - > b.utcoffset()) except that the implementation never overflows. > """ And stuff like "except that the implementation never overflows" is important in a spec (it's a constraint on allowable implementations of the spec), but of approximately no interest to 99.997% of users. > https://docs.python.org/3/library/datetime.html#datetime.datetime > > The only improvement that comes to mind is to make "Supported operations:" a > linkable section. As above, it's not that the docs lack sufficient detail - they're _buried_ in detail. Something more akin to the ever-popular "binary floating-point" tutorial appendix would probably be more useful to most users. Just the high-order bits, with pragmatic advice (like "if you need timeline arithmetic, use UTC - don't be a sucker" ;-) ). 
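For instance, the kind of example such an overview could lead with (a sketch; fixed-offset zones are used only to keep it self-contained):

    from datetime import datetime, timedelta, timezone

    est = timezone(timedelta(hours=-5), "EST")
    a = datetime(2015, 9, 3, 12, 0, tzinfo=est)            # 12:00 EST == 17:00 UTC
    b = datetime(2015, 9, 3, 12, 0, tzinfo=timezone.utc)

    # Same tzinfo: the tzinfos are ignored - plain naive arithmetic.
    assert a - datetime(2015, 9, 3, 7, 0, tzinfo=est) == timedelta(hours=5)

    # Different tzinfos: both operands are converted to UTC first, which is
    # all the footnote's (x.replace(tzinfo=None) - x.utcoffset()) recipe says.
    assert a - b == timedelta(hours=5)

One example like that up front, plus the "use UTC" advice, would cover what most users ever need to know.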
From tim.peters at gmail.com Fri Sep 4 06:11:30 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 3 Sep 2015 23:11:30 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: <55E82824.7020607@oddbird.net> References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> Message-ID: [Tim] >> I'm out of time for tonight, but will try to make more tomorrow. Just >> one for now, because I think it cuts to the _real_ heart of this batch >> of messages: [Carl] > I don't think this cuts to the heart of anything :/ I think it avoids > the main point I've made (several times) to latch instead onto a tangent > I should have left out. Fair enough. I only had time for one, so latched on to the lamest one ;-) >> That's the heart: you simply despise classic arithmetic. > Sorry, but no. I have nothing at all against naive arithmetic. I think > both naive arithmetic and timeline arithmetic have good use cases. > > What I have trouble with is a tz-annotated datetime object that > fundamentally can't decide whether it's living in a naive or timeline > model, and thus behaves unpredictably. > > This is a problem today, but at least the behavior can be explained > fairly simply: the model is naive when operating within the same > timezone, and aware anytime you're converting between timezones or > interoperating between timezones. > > PEP 495, AFAICS, makes the problem worse, because it introduces another > bit of information that only makes sense in a timeline view. That new > bit now allows round-tripping from UTC, which is great (no problem, > because conversions are an area where tz-annotated datetimes already > tried to behave as tz-aware instants in time). But then it can't quite > decide how to rationalize that new bit of information with its naive > internal view of time, so it settles on a mish-mash of inconsistent > behavior that violates basic arithmetic identities we all learned in > elementary school and only makes any sense if you've followed this > entire thread. Eh. It's not perfect, but I don't know that anyone (present company excepted) will care much. It matters only for the later of ambiguous times in at worst (in common zones) one hour per year, and then only for someone using classic datetime-datetime subtraction or comparison starting in _some_ (not all) cases in such a fold. Perhaps this makes it wholly unusable. I doubt most would reach that conclusion, but it's possible. > If you want to cut to the heart of the matter, tell me how you would > write the documentation for how arithmetic works on a tz-annotated > datetime post-PEP-495. Already did in a different message ("if at least one operand has fold=1, acts as if the tzinfos were distinct" - reduced to a prior case). Of course that doesn't make _sense_ in the naive time model. Repeating that point isn't really needed ;-) > Does it work on a naive "move the hands of the > clock" model? (No, because I can subtract 1:30AM from 2:30AM and get "2 > hours" in some cases.); Assuming DST is ending and moves the clock back 1 hour, then: 1. Assuming a post-495 tzinfo: A. If 2:30AM is the later of ambiguous times with fold=1, 2 hours. B. If 2:30AM is the earlier of ambiguous times with fold=0, 1 hour. C. If 1:30AM is the later of ambiguous times with fold=1, 1 hour. D. If 1:30AM is the earlier of ambiguous times with fold=0, 1 hour. In all other cases, 1 hour. In all cases, 1:30AM will compare "less than" 2:30AM.. 
Note that classic arithmetic is still used if both operands have fold=0; so nothing _could_ change in cases B and D. Note that using US rules, it's 1 hour in all cases (2:30AM isn't ambiguous under US rules, so A and B can't apply). Switch to, e.g., 1:30AM - 12:30AM to get an "interesting" case for US rules. 2. Assuming a pre-495 tzinfo: What they see will depend on what their fold-blind tzinfo makes up for times in a fold. The choice recommended in the docs is to treat an ambiguous time as being the later. If so, cases 1A & 1C still apply, and all cases return the same results. If the tzinfo makes the opposite choice, then case 1A returns 1 hour and case 1C returns 2 hours. So after 495 is implemented, they will see a difference of 2 hours in some cases when the "real world" difference really is 2 hours, and regardless of whether they're using a pre- or post-495 tzinfo. That's not particularly surprising: nobody thinks _wholly_ in "naive time" ;-) Of course nobody will (or should even try to) remember all those cases. An app that really cares (if any exist - none of my code cares) will need to "do something" about it. Or we'll need to add code to ignore `fold` if a pre-495 tzinfo is in use (in which case nothing will change if they stick to pre-495 tzinfos). Yes, it would be better if nobody had to do anything. No, I'm not appalled, just mildly annoyed so far. > Does it work on a UTC timeline model? (No, clearly not.) So what is the > model, stated precisely and concisely? This part isn't driven by a model; it's driven by pragmatism ("practicality beats purity"). The sanest model is "it's classic unless you're near a fold, and if you care anything about what happens then when doing classic arithmetic you're wasting your time: e.g., force it out of a fold if you need to care". I've never written an app that needs to worry about this. Classic arithmetic in naive time is a simple (but highly useful) form of "period arithmetic", and things like "same time next week" are rarely (never, for me) concerned with hours near a transition time. They're usually about interacting with other people or businesses. > And is it actually backwards-compatible with current code that converts > from UTC to local time and then does arithmetic on those local times, or > compares them to each other? (Not around a DST transition, no.) You don't need any of that - the 2:30AM - 1:30AM example above already sufficed to show it's not always backward compatible. That's not surprising (as said before, I don't think anything _useful_ to existing code can be wholly backward compatible). From chris.barker at noaa.gov Fri Sep 4 17:45:07 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 4 Sep 2015 08:45:07 -0700 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E5D3F5.40600@oddbird.net> <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7BD69.3060905@oddbird.net> <-7656714205635425925@unknownmsgid> Message-ID: On Thu, Sep 3, 2015 at 4:55 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > Doc patches from good writers are always welcome, but in this case, I > don't see what needs to be added to what the reference manual already says: > Wow -- I did not find that when I went looking early in this thread -- so maybe not missing, but not in an easy-to-find place. The trick with docs is that: a) people don't read them ;-) b) OK, they do when they can't figure out how to do something -- in which case, they read as little as they can to solve their problem.
The trick with datetime arithmetic is that people come to it with an expectation of how it works, so we want to make sure they won't go away with that expectation (if it's wrong) after a quick read of the docs. This is particularly a problem because datetime arithmetic behaves like both Period and Duration arithmetic if you stay away from DST -- so folks can come in with either expectation and their code could work fine, and not find the issue until it fails next fall. And yes, I could have written some nice doc patches with the time I've spent on this thread -- but that's less fun! And maybe we should wait for post-PEP, so we know what to write (if anything different). -Chris > https://docs.python.org/3/library/datetime.html#datetime.datetime > > The only improvement that comes to mind is to make "Supported operations:" > a linkable section. > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Sep 4 18:01:19 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 4 Sep 2015 09:01:19 -0700 Subject: [Datetime-SIG] Timeline arithmetic? Message-ID: Folks, It seems to me that it's clear that timeline arithmetic will not get implemented in concert with PEP 495. So -- is the door open to a PEP that DOES implement timeline arithmetic with tz-aware datetimes in the standard lib? I would like a flag on datetime, but it seems it might be better to put that flag on a tzinfo object. But the implementation is something to argue about only if there is any chance of doing it at all. Also, particularly as PEP 495 will introduce changes to tzinfo, which will presumably lead to changes in tzinfo implementations (like pytz, etc), it seems that if other changes are afoot, now is a good time to map out how they should be done. Stuart, if you are listening: IIUC, you want "timeline" arithmetic to work with pytz tzinfo-aware datetimes, and the current implementation achieves this in a maybe "hacky", and at least inconvenient, way. So you are an obvious person to say what we might put in the stdlib that would facilitate cleaning all that up. If anything. BTW: I'll at least take it as a given that we're not breaking backward compatibility, and that arithmetic needs to stay as fast as it currently is -- at least in the cases where it currently works. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Sep 4 18:23:40 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 4 Sep 2015 12:23:40 -0400 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: On Thu, Sep 3, 2015 at 8:32 PM, Tim Peters wrote: > I wrote almost all this stuff to begin with, but right now even I'm > already half asleep ;-) > I agree that the datetime documentation is showing its age and could benefit from a face-lift, but note that being an entertaining read is not a primary goal of the reference documentation, if it is a goal at all.
The datetime documentation has evolved through a series of local patches as new features have been added to the module. At each turn, the primary goal was to have a complete and accurate documentation for each method and not as much on having the overall document well-organized. Some of the complaints expressed in this thread can be better addressed in a tutorial-style document rather than the reference documentation. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Sep 4 18:31:38 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 4 Sep 2015 12:31:38 -0400 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: Message-ID: On Fri, Sep 4, 2015 at 12:01 PM, Chris Barker wrote: > It seems to me that it's clear that timeline arithmetic will not get > implemented in concert with PEP 495. > > So -- is the door open to a PEP that DOES implement timeline arithmetic > with tz-aware datetimes in the standard lib? > The door is always open to good ideas! PEP 500 was my failed attempt to bring timeline arithmetic to aware datetime objects. I will not make another attempt before PEP 495 is finalized. Please don't interpret this as a lack of interest in the subject. I just want to focus on one issue at a time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Sep 4 18:39:38 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 4 Sep 2015 11:39:38 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: Message-ID: [Chris Barker ] > It seems to me that it's clear that timeline arithmetic will not get > implemented in concert with PEP 495. It's certainly not _part_ of 495. 495 aims to fix timezone conversions in all cases for code that's already working fine in all other respects. Tying that to timeline arithmetic would wholly miss the "for code that's already working fine" goal. Carl's scheme would tie fixing conversions _to_ using a brand new builtin implementation of timeline arithmetic, so would do nothing for existing code (would neither hurt nor help it, although all code currently doing arithmetic on aware datetimes could fail in subtle or gross ways _if_ it tried using one of Carl's new tzinfos). > So -- is the door open to a PEP that DOES implement timeline > arithmetic with tz-aware datetimes in the standard lib? I would say instead the door isn't shut ;-) Note that Guido already rejected PEP 500, which proposed one way to allow it. He didn't like its generality. A PEP concerned with timeline arithmetic alone would overcome that objection. But you have to know by now that datetime always intended that apps needing timeline arithmetic use UTC instead (or timestamps), and there's scarcely an experienced voice on the planet that would _recommend_ doing it any other way. Building in "by magic" timeline arithmetic would be fighting both datetime's design and universally recognized best practice. So I dare to say it will never be _attractive_ to Guido. At best it could get grudging acceptance. Which is possible! Just want to make clear that it's likely to be an uphill fight. Note that PEP 495 may also be rejected. "Grudging acceptance" is the best 495 can do too (always-correct conversions are an interest of mine, not particularly of Guido's - but, to be fair, at least Guido doesn't hate the idea of fixing conversions ;-) ). > ... 
> Also, particularly as PEP 495 will introduce changes to tzinfo, that will > presumable lead to changes in tzinfo implementations (like pytz, etc), it > seems that if other changes are afoot, now is a good time to map out how > they should be done. It seems 495 really doesn't do anything for pytz, so I'm not sure Stuart would bother to implement 495-conforming tzinfos. _Someone_ will, though. Eventually ;-) > Stuart, if you are listening: > > IIUC, you want "timeline" arithmetic to work with pytz tzinfo-aware > datetimes. To the extent that the current implementation functions in a > maybe "hacky", and at least inconvenient, way to achieve this. > > So you are an obvious person to say what we might put in the stdlib that > would facilitate cleaning all that up. If anything. > > BTW: I'll at least take it as a given that we're not breaking backward > compatibility, and that arithmetic needs to stay as fast as it currently is > -- at least in the cases where it currently works. A timeline arithmetic PEP would have to ensure that timeline arithmetic is never used unless a programmer explicitly asks for it. PEP 500 met that goal, and so does Carl's scheme (both via the same basic mechanism: by the user asking for a new flavor of tzinfo). From carl at oddbird.net Fri Sep 4 19:37:47 2015 From: carl at oddbird.net (Carl Meyer) Date: Fri, 4 Sep 2015 11:37:47 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: Message-ID: <55E9D6EB.2090108@oddbird.net> [Tim] > But you have to know by now that datetime always intended that apps > needing timeline arithmetic use UTC instead (or timestamps), and > there's scarcely an experienced voice on the planet that would > _recommend_ doing it any other way. Building in "by magic" timeline > arithmetic would be fighting both datetime's design and universally > recognized best practice. I find this argument a bit disingenuous - though it depends what exactly you are arguing, which isn't clear to me. All else being equal, designing a green-field datetime library, "universally recognized best practice" does not provide any argument for naive arithmetic over aware arithmetic on aware datetimes. Making the choice to implement aware arithmetic is not "fighting" a best practice, it's just providing a reasonable and fully consistent convenience for simple cases. You could perhaps argue that implementing _any_ kind of arithmetic on aware non-UTC datetimes is unnecessary and likely to give someone, at some point, results they didn't expect, and that it should instead just raise an exception telling you to convert to UTC first. The fact that best practice is to manipulate datetimes internally in UTC (meaning the use case already has a usually-better alternative) can certainly _weaken_ the argument for bothering to _change_ the behavior of arithmetic on aware datetimes, once it's been implemented otherwise for many years. That may be all you're trying to say here, in which case I fully agree. The core arguments _for_ aware arithmetic on aware datetimes are: 1) Conceptual coherence. Naive is naive, aware is aware, both models are fully internally consistent. Mixing them, as datetime does, will never be fully consistent. You may call this "purity" if you like, but the issues with PEP 495 do reveal a lack of coherence in datetime's design (that is, that it lacks a consistently-applied notion of what a tz-annotated datetime means). 
I think you've admitted this much yourself, though you suggested (in passing) that it could/should have achieved coherence in the opposite direction, by disallowing all comparisons and aware arithmetic (that is, all implicit conversions to UTC) between datetimes in different timezones. 2) Principle of least surprise for casual users. On this question, "you should use UTC for arithmetic" is equivalent to "you should use a period recurrence library for period arithmetic." Both arguments are true in principle, neither one is relevant to the question of casual users getting the results they expect. There may of course be legitimate disagreement on which behavior is less surprising for casual users. Unfortunately I don't think datetime.py (even in its many years of existence) has given us useful data on that, since it never included a timezone database and most people who need one use pytz. It's often unclear to me when you're trying to justify datetime's design choices, and when you're just pointing out that the bar is really high for changing established "good enough" behavior. If you want me to shut up and stop arguing with you (which would be an eminently reasonable desire!) clarifying that it's the latter more than the former would help tremendously, because on the latter point I agree completely. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Fri Sep 4 19:50:23 2015 From: carl at oddbird.net (Carl Meyer) Date: Fri, 4 Sep 2015 11:50:23 -0600 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: <55E75B62.2060905@oddbird.net> <55E77FCC.9040507@oddbird.net> <55E7C65F.8050106@oddbird.net> <55E82824.7020607@oddbird.net> Message-ID: <55E9D9DF.4000309@oddbird.net> [Tim] > Eh. It's not perfect, but I don't know that anyone (present company > excepted) will care much. It matters only for the later of ambiguous > times in at worst (in common zones) one hour per year, and then only > for someone using classic datetime-datetime subtraction or comparison > starting in _some_ (not all) cases in such a fold. > > Perhaps this makes it wholly unusable. I doubt most would reach that > conclusion, but it's possible. It's certainly not wholly unusable; I'd never claim that. We can have reasonable disagreement about whether it's the best option available. I think it's reasonable (in principle; pending working code) to tie fully-consistent timezone conversions to full consistency in general, and make it a migration choice, leaving existing working code entirely alone. You (and Alex and PEP 495) think consistent timezone-conversion round-trips are a valuable enough addition (even for existing code that's already working) to be worth pragmatically trading off some consistency and backwards-compatibility in other edge cases. I can see your point of view, and I think it's a reasonable disagreement to have. And I don't have much leg to stand on until I provide a working patch for my point of view, since Alex already has one for yours :-) [Tim] > Of course nobody will (or should even try to) remember all those > cases. An app that really cares (if any exist - none of my code > cares) will need to "do something" about it. Or we'll need to add > code to ignore `fold` if a pre-495 tzinfo is in use (in which case > nothing will change if they stick to pre-495 tzinfos). 
I'm actually quite curious how many homegrown tzinfo implementations exist in the wild, or if we're really just talking about "dateutil.tz users" vs "pytz users". When you talk about "your code", which bucket does it fall into? Clearly not the latter - are you a "homegrown tzinfo" user, or a dateutil.tz user? [Tim] > This part isn't driven a model; it's driven by pragmatism > ("practicality beats purity"). The sanest model is "it's classic > unless you're near a fold, and if you care anything about what happens > then when doing classic arithmetic you're wasting your time: e.g., > force it out of a fold if you need to care". Yep. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Fri Sep 4 20:11:46 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 4 Sep 2015 14:11:46 -0400 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55E9D6EB.2090108@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> Message-ID: On Fri, Sep 4, 2015 at 1:37 PM, Carl Meyer wrote: > Principle of least surprise for casual users. On this question, "you > should use UTC for arithmetic" is equivalent to "you should use a period > recurrence library for period arithmetic." > Keep in mind that the standard library should not only support "casual users", but also those who will write a "period recurrence library" for those "casual users." This is where classic arithmetic is indispensable. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Fri Sep 4 20:19:47 2015 From: carl at oddbird.net (Carl Meyer) Date: Fri, 4 Sep 2015 12:19:47 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> Message-ID: <55E9E0C3.7070003@oddbird.net> On 09/04/2015 12:11 PM, Alexander Belopolsky wrote: > Keep in mind that the standard library should not only support "casual > users", but also those who will write a "period > recurrence library" for those "casual users." This is where classic > arithmetic is indispensable. Oh, I'm well aware. But naive arithmetic is always available - on naive datetimes. Btw, I have a minor objection to the term "classic arithmetic." It's a made-up term from this mailing list, and I don't think it describes a real distinct thing, it's just a euphemism for "naive arithmetic." I'm not sure why the euphemism arose; I _think_ it arose because it sounds wrong to say that aware datetimes perform naive arithmetic. I think that sounds wrong to roughly the same extent that it is wrong, so I don't see any point in using a made-up euphemism to hide it :-) Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From guido at python.org Fri Sep 4 20:25:21 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 4 Sep 2015 11:25:21 -0700 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55E9E0C3.7070003@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9E0C3.7070003@oddbird.net> Message-ID: I made it up, in analogy to "classic classes" in Python 2. 
I did this not as a euphemism, but to avoid confusion, since in the existing docs "naive" is only ever applied to objects (meaning tzinfo-less) and I wanted to have a term that couldn't confuse anyone into thinking we were only talking about arithmetic of naive objects. On Fri, Sep 4, 2015 at 11:19 AM, Carl Meyer wrote: > On 09/04/2015 12:11 PM, Alexander Belopolsky wrote: > > Keep in mind that the standard library should not only support "casual > > users", but also those who will write a "period > > recurrence library" for those "casual users." This is where classic > > arithmetic is indispensable. > > Oh, I'm well aware. But naive arithmetic is always available - on naive > datetimes. > > Btw, I have a minor objection to the term "classic arithmetic." It's a > made-up term from this mailing list, and I don't think it describes a > real distinct thing, it's just a euphemism for "naive arithmetic." > > I'm not sure why the euphemism arose; I _think_ it arose because it > sounds wrong to say that aware datetimes perform naive arithmetic. I > think that sounds wrong to roughly the same extent that it is wrong, so > I don't see any point in using a made-up euphemism to hide it :-) > > Carl > > > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Sep 4 20:38:58 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 4 Sep 2015 11:38:58 -0700 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55E9E0C3.7070003@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9E0C3.7070003@oddbird.net> Message-ID: On Fri, Sep 4, 2015 at 11:19 AM, Carl Meyer wrote: > On 09/04/2015 12:11 PM, Alexander Belopolsky wrote: > > Keep in mind that the standard library should not only support "casual > > users", but also those who will write a "period > > recurrence library" for those "casual users." This is where classic > > arithmetic is indispensable. > I dont get that at all -- a Period recurrence lib needs to know all sorts of stuff about the timezone, and other things, like days of the week. And it needs to be able to do "timeline arithmetic", but it would presumable be able to remove and tack back on a tzinfo object all on it's own -- i.e. so the arithmetic it wants. But maybe if I tried to implement one (which I will never do) , I'd see you point. Bu tin any case, doesn't dateutils already provide this? Btw, I have a minor objection to the term "classic arithmetic." It's a > made-up term from this mailing list, and I don't think it describes a > real distinct thing, it's just a euphemism for "naive arithmetic." > well, naive arithmetic is a made-up term too. there was a lot of bandying about about terminology early on, and this seems to be what we've settled on. And unlike "Period arithmetic" or "Duration arithmetic", I haven't seen any other reference to this type of arithmetic anywhere. > I'm not sure why the euphemism arose; I _think_ it arose because it > sounds wrong to say that aware datetimes perform naive arithmetic. 
yes -- I know I, and probably other thought "naive arithmetic" meant arithmetic on naive datetimes -- there was much confusion ;-) I don't see any point in using a made-up euphemism to hide it :-) unless you can find another reference, we need to make up something. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Sep 4 21:08:02 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 4 Sep 2015 14:08:02 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55E9D6EB.2090108@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> Message-ID: [Tim] >> But you have to know by now that datetime always intended that apps >> needing timeline arithmetic use UTC instead (or timestamps), and >> there's scarcely an experienced voice on the planet that would >> _recommend_ doing it any other way. Building in "by magic" timeline >> arithmetic would be fighting both datetime's design and universally >> recognized best practice. [Carl Meyer ] > I find this argument a bit disingenuous - though it depends what exactly > you are arguing, which isn't clear to me. In the above, I'm not arguing at all. I'm trying to tell Chris in advance what the most likely fundamental objections to any "timeline arithmetic PEP" are likely to be when it comes to the one vote that matters the most: Guido's. Here I'm wearing my "attempt to channel Guido in his absence" hat. Forewarned is forearmed. In this case, it happens to be much the same as I'd say wearing several of my other hats ;-) In other contexts, I wear my "Tim as a Python user hat", "Tim as a computer `scientist'" hat, "Tim as an explainer of past decisions" hat, "Tim as an advocate for a particular change" hat, "Tim as a Python developer" hat, "Tim thinking out loud" hat, and so on. It's absurd to expect consistency among _all_ those roles. In human communication, context is necessary to distinguish, but sometimes fails. > All else being equal, designing a green-field datetime library, > "universally recognized best practice" does not provide any argument for > naive arithmetic over aware arithmetic on aware datetimes. Making the > choice to implement aware arithmetic is not "fighting" a best practice, > it's just providing a reasonable and fully consistent convenience for > simple cases. It would create an "attractive nuisance", yes ;-) It's for much the same reason, e.g., that Guido never gave a moment's serious consideration to magically making 1 + "123" return "1123" or 124. Make a dubious thing dead easy to spell, and that _implicitly_ encourages its use. That's where "best practice" comes in. Best practice when mixing ints and strings is to explicitly force the choice you intend. Best practice for timeline arithmetic in goofy timezones is to explicitly convert to a non-goofy zone first. In which case the distinction between "timeline" and "classic" arithmetic is non-existent. For that reason ("explainer of past decisions" hat), timeline arithmetic was never really on the table - it was never needed for any "best practice" use case, and Python never _intends_ to encourage poor practices. 
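To make "convert to a non-goofy zone first" concrete, here is a minimal sketch. It borrows the zoneinfo module purely for illustration (that module postdates this thread; any tzinfo with a DST transition would do), and the 2015-11-01 US fall-back is just a convenient example:

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo   # illustrative only; not available in 2015

    eastern = ZoneInfo("America/New_York")
    start = datetime(2015, 10, 31, 12, 0, tzinfo=eastern)   # noon EDT (UTC-4)

    # Classic arithmetic: same clock reading tomorrow, the transition ignored.
    classic = start + timedelta(days=1)
    print(classic)    # 2015-11-01 12:00:00-05:00

    # Timeline arithmetic, best-practice style: do it in UTC, convert back.
    timeline = (start.astimezone(timezone.utc) + timedelta(days=1)).astimezone(eastern)
    print(timeline)   # 2015-11-01 11:00:00-05:00, i.e. 24 real hours later

In a fixed-offset zone like UTC the two spellings give the same answer, which is exactly the sense in which the timeline/classic distinction evaporates once you convert first.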
> You could perhaps argue that implementing _any_ kind of arithmetic on > aware non-UTC datetimes is unnecessary and likely to give someone, at > some point, results they didn't expect, and that it should instead just > raise an exception telling you to convert to UTC first. Wearing my "Tim as computer 'scientist'" hat, that's what I would have preferred. As a plain old Python user, I'm happy enough with the status quo. It's been useful to me! > The fact that best practice is to manipulate datetimes internally in UTC > (meaning the use case already has a usually-better alternative) can > certainly _weaken_ the argument for bothering to _change_ the behavior > of arithmetic on aware datetimes, There is no argument that can possibly succeed for changing arithmetic on aware datetimes: "Tim as Python developer hat" there. That would be massively backward-incompatible. No chance whatsoever. Not even if there were 100% agreement from everyone that classic arithmetic is utterly useless for all purposes and that allowing it at all was a horrible mistake. That kind of change could only be made in Python 4. > once it's been implemented otherwise for many years. That may be all > you're trying to say here, in which case I fully agree. I wasn't saying any of that. I was telling Chris where a timeline arithmetic PEP would most likely face deepest resistance from Guido. > The core arguments _for_ aware arithmetic on aware datetimes are: > > 1) Conceptual coherence. Naive is naive, aware is aware, both models are > fully internally consistent. Mixing them, as datetime does, will never > be fully consistent. You may call this "purity" if you like, but the > issues with PEP 495 do reveal a lack of coherence in datetime's design I think making no distinction between "naive time" and "civil time" is the core of coherence glitches. An aware datetime is purely neither in the implementation, and different operations treat it in different ways. Wearing many hats, I don't like that. Wearing my "real life Python user" hat, though - eh, I can't really say it's caused me problems. > (that is, that it lacks a consistently-applied notion of what a > tz-annotated datetime means). I think you've admitted this much > yourself, though you suggested (in passing) that it could/should have > achieved coherence in the opposite direction, by disallowing all > comparisons and aware arithmetic (that is, all implicit conversions to > UTC) between datetimes in different timezones. When wearing several different hats, yes, _that's_ more appealing. But kinda pointless, since that's not what's actually done, and PEPs have to move on from what _is_ the case. > 2) Principle of least surprise for casual users. On this question, "you > should use UTC for arithmetic" is equivalent to "you should use a period > recurrence library for period arithmetic." Both arguments are true in > principle, neither one is relevant to the question of casual users > getting the results they expect. That last wasn't ever really a _driving_ force in Python's design. >From the earlier example, a great many users have complained a great many times that 1 + "123" _doesn't_ return 124. That _is_ what most casual users expect. Tough luck - Python's not for the terminally lazy. That said, Guido's belief was that "adding 24 hours" _should_ return "same clock time tomorrow" in all cases. There was extensive public review at the time, and I don't recall anyone disagreeing. 
> There may of course be legitimate disagreement on which behavior is > less surprising for casual users. > Unfortunately I don't think datetime.py (even in its many years of > existence) has given us useful data on that, since it never included a > timezone database and most people who need one use pytz. I agree, except that I'm not sure we can deduce much from pytz's experience either. Stuart has said that his _primary_ goal was to fix conversion in all cases, not really to "fix arithmetic". To fix the former, fixed-offset classes always get used (to supply the "missing bit" in a wonderfully convoluted way), and "timeline arithmetic" was the _natural_ result of doing so (because timeline and classic arithmetic are exactly the same thing in any fixed-offset zone). So, in pytz, assuming they always remember to call .normalize(), timeline arithmetic is forced. > It's often unclear to me when you're trying to justify datetime's design > choices, I'm not sure I ever try to justify them. Why bother? I do often try to explain them, and sometimes express an opinion _about_ them when wearing one hat or another. It doesn't really matter whether anyone (including me) agrees or disagrees with decisions made a decade ago - with my Python developer hat on, it's only what we do tomorrow that matters. The past can only be a constraint on, or inspiration for, future decisions. > and when you're just pointing out that the bar is really high > for changing established "good enough" behavior. If you want me to shut > up and stop arguing with you (which would be an eminently reasonable > desire!) clarifying that it's the latter more than the former would help > tremendously, because on the latter point I agree completely. Well, you can't see me, but I really do have a collection of 42 hats on the table next to me, and every time I write a reply, sentence by sentence I put on the hat most appropriate to what the current sentence intends ;-) From tim.peters at gmail.com Fri Sep 4 21:29:34 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 4 Sep 2015 14:29:34 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55E9E0C3.7070003@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9E0C3.7070003@oddbird.net> Message-ID: [Carl] > Btw, I have a minor objection to the term "classic arithmetic." It's a > made-up term from this mailing list, and I don't think it describes a > real distinct thing, it's just a euphemism for "naive arithmetic." "Naive arithmetic" is also a made-up term from this mailing list (perhaps from one of the related messages choking some other mailing list before this list was created). I know, because I'm the one who made it up :-) > I'm not sure why the euphemism arose; I _think_ it arose because it > sounds wrong to say that aware datetimes perform naive arithmetic. I > think that sounds wrong to roughly the same extent that it is wrong, so > I don't see any point in using a made-up euphemism to hide it :-) Guido made up "classic arithmetic" to replace my made-up "naive arithmetic". I think it's a good change. "naive arithmetic" currently to both "naive" and "aware" datetimes, but from "naive arithmetic" alone it's too easy to _assume_ it only applies to naive datetimes. There was also agreement that it was unfortunate the docs ever used the word "naive" anywhere for any purpose. The term "timeline arithmetic" (aka "strict arithmetic") was also made up on this mailing list, but isn't needed to describe anything Python does. 
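As a footnote to the pytz discussion earlier in this exchange, the localize/normalize dance Tim describes can be sketched in a few lines (assuming pytz is installed; the date is again the 2015 US fall-back, purely for illustration):

    from datetime import datetime, timedelta
    import pytz

    eastern = pytz.timezone("US/Eastern")
    start = eastern.localize(datetime(2015, 10, 31, 12, 0))   # pinned to a fixed EDT offset

    bumped = start + timedelta(days=1)     # plain addition; still claims EDT, now a lie
    fixed = eastern.normalize(bumped)      # repairs the offset: 2015-11-01 11:00 EST

pytz pins a fixed-offset tzinfo onto the datetime, the addition itself is ordinary classic arithmetic on that fixed-offset value, and normalize() re-localizes the result, so the net effect is timeline arithmetic - the "natural result" described above.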
From carl at oddbird.net Fri Sep 4 21:51:02 2015 From: carl at oddbird.net (Carl Meyer) Date: Fri, 4 Sep 2015 13:51:02 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> Message-ID: <55E9F626.1080906@oddbird.net> [Tim] > In other contexts, I wear my "Tim as a Python user hat", "Tim as a > computer `scientist'" hat, "Tim as an explainer of past decisions" > hat, "Tim as an advocate for a particular change" hat, "Tim as a > Python developer" hat, "Tim thinking out loud" hat, and so on. It's > absurd to expect consistency among _all_ those roles. In human > communication, context is necessary to distinguish, but sometimes > fails. I don't expect consistency from humans, it's just that my hat-intuiter doesn't always work right :-) [Carl] >> All else being equal, designing a green-field datetime library, >> "universally recognized best practice" does not provide any argument for >> naive arithmetic over aware arithmetic on aware datetimes. Making the >> choice to implement aware arithmetic is not "fighting" a best practice, >> it's just providing a reasonable and fully consistent convenience for >> simple cases. [Tim] > It would create an "attractive nuisance", yes ;-) I think that either choice of arithmetic might be an attractive nuisance; what matters is consistency with the rest of the choices in the library. If datetime did naive arithmetic on tz-annotated datetimes, and also refused to ever implicitly convert them to UTC for purposes of cross-timezone comparison or arithmetic, and included a `fold` parameter not on the datetime object itself but only as an additional input argument when you explicitly convert from some other timezone to UTC, that would be a consistent view of the meaning of a tz-annotated datetime, and I wouldn't have any problem with that. It would be a view consistent with what Guido described a few days ago, that "noon Eastern on June 3 2020" is not necessarily equivalent to a UTC instant; it means nothing more than "noon Eastern on June 3 2020" until you choose to explicitly convert it to UTC, providing a full zoneinfo definition of "Eastern" (and possibly a `fold` argument too, though it's not needed for "noon Eastern June 3 2020" unless something changes) at that moment. But that isn't datetime's view, at least not consistently. The problem isn't datetime's choice of arithmetic; it's just that sometimes it wants to treat a tz-annotated datetime as one thing, and sometimes as another. (The fact that a _person_ might also want to have one sometimes and another sometimes is not a reason for an implementation to try to guess when they want one and when they want another. It could be a reason for two different types.) [Tim] > There is no argument that can possibly succeed for changing arithmetic > on aware datetimes: "Tim as Python developer hat" there. That would > be massively backward-incompatible. No chance whatsoever. Of course! That's abundantly clear, and I'd be every bit as opposed as you are to a backwards-incompatible change. Can we just assume that if I refer to "changing arithmetic" it's short-hand for "provide an option for full consistency in a way that only occurs with an opt-in choice by the user, leaving existing code behaving identically." The latter is the only thing I've ever proposed, so your choice to assume here that I meant the former feels a bit like an intentional misunderstanding so as to provide an opportunity for unnecessary hyperbole. 
Or maybe your intuiter is just fallible too ;-) > I think making no distinction between "naive time" and "civil time" is > the core of coherence glitches. An aware datetime is purely neither > in the implementation, and different operations treat it in different > ways. Wearing many hats, I don't like that. Yes! > Wearing my "real life > Python user" hat, though - eh, I can't really say it's caused me > problems. Fair enough. I am also not sure that the consistency glitches are enough of a problem to be worth fixing. I still think it's useful to clearly identify them and understand their source. "What is the root issue" and "is the root issue practically worth fixing today" are separable questions. I'm still trying to figure out the former (but I think we're finally getting there); I'm not at all sure what I think of the latter (and won't be until I try an implementation). [Carl] >> (that is, that it lacks a consistently-applied notion of what a >> tz-annotated datetime means). I think you've admitted this much >> yourself, though you suggested (in passing) that it could/should have >> achieved coherence in the opposite direction, by disallowing all >> comparisons and aware arithmetic (that is, all implicit conversions to >> UTC) between datetimes in different timezones. [Tim] > When wearing several different hats, yes, _that's_ more appealing. > But kinda pointless, since that's not what's actually done, and PEPs > have to move on from what _is_ the case. Of course. But I don't believe at all that understanding the core issues clearly, and identifying what we'd ideally have chosen initially, is pointless. It can be very useful (even a precondition) for deciding _how_ to move on from what is the case. >> 2) Principle of least surprise for casual users. On this question, "you >> should use UTC for arithmetic" is equivalent to "you should use a period >> recurrence library for period arithmetic." Both arguments are true in >> principle, neither one is relevant to the question of casual users >> getting the results they expect. > > That last wasn't ever really a _driving_ force in Python's design. > From the earlier example, a great many users have complained a great > many times that > > 1 + "123" > > _doesn't_ return 124. That _is_ what most casual users expect. Tough > luck - Python's not for the terminally lazy. This example is a false equivalence. Clearly, trying to guess what a casual user expects to result from an ambiguous operation is a bad idea. I don't think datetime arithmetic (even on non-UTC datetimes) is an ambiguous operation, given an implementation that consistently treats all timezone-aware datetimes as unambiguous instants, or an implementation that consistently treats them as naive datetimes with a timezone annotation. Given an implementation like datetime that isn't sure what they are, _either_ choice of arithmetic is an attractive nuisance. > Well, you can't see me, but I really do have a collection of 42 hats > on the table next to me, and every time I write a reply, sentence by > sentence I put on the hat most appropriate to what the current > sentence intends ;-) That's an excellent image, and I'll keep it in mind :-) Carl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Fri Sep 4 22:50:10 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 4 Sep 2015 16:50:10 -0400 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9E0C3.7070003@oddbird.net> Message-ID: On Fri, Sep 4, 2015 at 2:38 PM, Chris Barker wrote: > On Fri, Sep 4, 2015 at 11:19 AM, Carl Meyer wrote: > >> On 09/04/2015 12:11 PM, Alexander Belopolsky wrote: >> > Keep in mind that the standard library should not only support "casual >> > users", but also those who will write a "period >> > recurrence library" for those "casual users." This is where classic >> > arithmetic is indispensable. >> > > I dont get that at all -- a Period recurrence lib needs to know all sorts > of stuff about the timezone, and other things, like days of the week. And > it needs to be able to do "timeline arithmetic", but it would presumable be > able to remove and tack back on a tzinfo object all on it's own -- i.e. so > the arithmetic it wants. > Let me try again. In my view, datetime class is a fancy way to encode 315537897600000000 integers: >>> 1 + (datetime.max - datetime.min) // datetime.resolution 315537897600000000 A timedelta class is a slightly less fancy way to encode some other 172799999913600000000 integers. The *natural* arithmetic on datetime and timedelta objects stems from the bijection between them and long integers. >>> t = datetime.now() >>> i = (t - datetime.min) // datetime.resolution >>> t == datetime.min + i * datetime.resolution True >>> d = timedelta(0, random()) >>> j = (d - timedelta.min) // timedelta.resolution >>> d == timedelta.min + j * timedelta.resolution True The "arithmetic" that datetime module implements is an efficient way to do addition and subtraction of datetime/timedelta objects without an explicit round trip to long integers (even though at the implementation level a round trip may take place). This arithmetic forms the basis for anything that you may want to do with datetimes: compute the number of business days in a year, compute the number of seconds in a century with or without the leap seconds, compute the angle in radians between the long hand and short hand of the Big Ben at 17:45:33.01 New York time. Timeline arithmetic is one of the simpler applications of the *natural* arithmetic provided by the datetime module: the timeline difference between t1 and t2 (assuming t1.tzinfo is t2.tzinfo) is just (t1 - t1.utcoffset()) - (t2 - t2.utcoffset()). Since "naive" difference between t1 and t2 that don't share tzinfo does not make sense, it was defined as a timeline difference. I think that was a mistake. I believe both Tim and Guido expressed a similar sentiment at various times. If datetime was designed today, t1 - t2 where t1.tzinfo is not t2.tzinfo would be an error and the user would have to choose between t1 - t2.astimezone(t1.tzinfo), t1.astimezone(t2.tzinfo) - t2 or t1.astimezone(utc) - t2.astimezone(utc) depending on the application need and with a full understanding that these three expressions can produce different results. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Fri Sep 4 23:54:41 2015 From: carl at oddbird.net (Carl Meyer) Date: Fri, 4 Sep 2015 15:54:41 -0600 Subject: [Datetime-SIG] Timeline arithmetic? 
In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9E0C3.7070003@oddbird.net> Message-ID: <55EA1321.8030805@oddbird.net> [Tim] > The term "timeline arithmetic" (aka "strict arithmetic") was also made > up on this mailing list, but isn't needed to describe anything Python > does. Not even the thing that Python does when you subtract two datetimes whose tzinfo differs? Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Sat Sep 5 00:02:17 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 4 Sep 2015 18:02:17 -0400 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55EA1321.8030805@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9E0C3.7070003@oddbird.net> <55EA1321.8030805@oddbird.net> Message-ID: On Fri, Sep 4, 2015 at 5:54 PM, Carl Meyer wrote: > [Tim] > > The term "timeline arithmetic" (aka "strict arithmetic") was also made > > up on this mailing list, but isn't needed to describe anything Python > > does. > > Not even the thing that Python does when you subtract two datetimes > whose tzinfo differs? > No, because in this case there is no sensible alternative other than what is implemented and making it an error. The only case where two options make sense is the t1 - t2 case where t1.tzinfo is t2.tzinfo. In this case "timeline arithmetic" is not used, so it does not need a name. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sat Sep 5 00:10:18 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 4 Sep 2015 17:10:18 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55EA1321.8030805@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9E0C3.7070003@oddbird.net> <55EA1321.8030805@oddbird.net> Message-ID: [Tim] >> The term "timeline arithmetic" (aka "strict arithmetic") was also made >> up on this mailing list, but isn't needed to describe anything Python >> does. [Carl] > Not even the thing that Python does when you subtract two datetimes > whose tzinfo differs? It's reasonable to call that "timeline arithmetic". "Need" is much stronger ;-) The docs don't give a name to it at all - they just provide a mathematical expression defining the result. Because that, and interzone comparison (which is really just a way of squashing most of the bits out of interzone subtraction), are the only instances of what's being called "timeline arithmetic" in this mailing list, the docs are better off not naming it. The docs don't give a name to what's being called "classic arithmetic" here either, but for the opposite reason: that's so _much_ the norm, there's no need to give a name to a thing with just a few exceptions explicitly defined to do their own thing. That's all about what "Python does". For talking about what some future Python _may_ do, the terms can be indispensable. From tim.peters at gmail.com Sat Sep 5 04:02:36 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 4 Sep 2015 21:02:36 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9E0C3.7070003@oddbird.net> Message-ID: [Chris Barker] > I dont get that at all -- a Period recurrence lib needs to know all sorts of > stuff about the timezone, and other things, like days of the week. 
And it > needs to be able to do "timeline arithmetic", but it would presumable be > able to remove and tack back on a tzinfo object all on it's own -- i.e. so > the arithmetic it wants. Chris, I think you must mean something quite different by "period recurrence" than others mean. I like the term "calendar operation" better. Things like "2pm the 3rd Monday of every 5th month". Nobody ever means, for example, "but change it to 1pm or maybe 3pm if daylight time starts or ends". They always mean "2pm on the local clock, regardless of how often or by how much politicians change the local clock". timeline arithmetic is horrid for this kind of thing. It's only if you _do_ use timeline arithmetic for calendar operations that you need to know about timezone rules, in order to _undo_ the damage timeline arithmetic did. Ignore the timezone entirely (classic arithmetic), and it's much easier. Indeed, if you added a dateutil relativedelta to a datetime with a tzinfo that _did_ force timeline arithmetic, nothing would blow up but the result could be dead wrong, and _would_ most likely be dead wrong whenever the input and result had an odd number of DST transitions between them. You can, of course, look at its source. While it could be rewritten to force classic arithmetic, it doesn't bother now. The relativedelta type's implementation never even checks to see whether a datetime input _has_ a tzinfo. It doesn't need to care now. It builds the result out of a mix of replacing some fields in the datetime (like the year and/or month, if required), and leaves the rest to one or more uses of Python's datetime + timedelta arithmetic. For example, "3rd Monday of the month" reduces to dateutil figuring out when the first Monday of the month is, then adding a Python timedelta with days=[the number needed to get to the first Monday of the month] and weeks=2. dateutil has lots of its own logic to implement, but it currently relies on that classic arithmetic is always in effect, and is spared from needing to duplicate the logic already implemented by Python's timedelta arithmetic. The latter is a very useful building block for these kinds of applications, directly handling all (& only) the units needed in "calendar operations" for which there is no argument about "the best" meaning. > But maybe if I tried to implement one (which I will never do) , I'd see you > point. Bu tin any case, doesn't dateutils already provide this? What a coincidence! I must have read your mind ;-) From tim.peters at gmail.com Sat Sep 5 10:06:37 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 5 Sep 2015 03:06:37 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55E9F626.1080906@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> Message-ID: [Tim, on hats] >> ... [Carl] > I don't expect consistency from humans, it's just that my hat-intuiter > doesn't always work right :-) Nor my hat-signaler! [Carl] >>> All else being equal, designing a green-field datetime library, >>> "universally recognized best practice" does not provide any argument for >>> naive arithmetic over aware arithmetic on aware datetimes. Making the >>> choice to implement aware arithmetic is not "fighting" a best practice, >>> it's just providing a reasonable and fully consistent convenience for >>> simple cases. 
>> It would create an "attractive nuisance", yes ;-) > I think that either choice of arithmetic might be an attractive > nuisance; what matters is consistency with the rest of the choices in > the library. I went on to explain why the specific case of default timeline arithmetic is an "attractive nuisance": making it dead easy to spell a poor practice. That remains poor practice forever after. "Easy to spell" makes it attractive. "Poor practice forever after" makes it a nuisance. Classic arithmetic is equivalent to doing integer arithmetic on integer POSIX timestamps (although with wider range the same across all platforms, and extended to microsecond precision). That's hardly novel - there's a deep and long history of doing exactly that in the Unix(tm) world. Which is Guido's world. There "shouldn't be" anything controversial about that. The direct predecessor was already best practice in its world. How that could be considered a nuisance seems a real strain to me. Where it gets muddy is extending classic arithmetic to aware datetimes too. Then compounding the conceptual confusion by adding timeline interzone subtraction and comparison. > If datetime did naive arithmetic on tz-annotated datetimes, and also > refused to ever implicitly convert them to UTC for purposes of > cross-timezone comparison or arithmetic, and included a `fold` parameter > not on the datetime object itself but only as an additional input > argument when you explicitly convert from some other timezone to UTC, > that would be a consistent view of the meaning of a tz-annotated > datetime, and I wouldn't have any problem with that. I would. Pure or not, it sounds unusable: when I convert _from_ UTC to a local zone, I have no idea whether I'll end up in a gap, a fold, or neither. And so I'll have no idea either what to pass _to_ .utcoffset() when I need to convert back to UTC. It doesn't solve the conversion problem. It's a do-it-yourself kit missing the most important piece. "But .fromutc() could return the right flag to pass back later" isn't attractive either. Then the user ends up needing to maintain their own (datetime, convert_back_flag) pairs. In which case, why not just store the flag _in_ the datetime? Only tzinfo methods would ever need to look at it. But note it's still not theoretically ideal: it would mean timezone conversion is not a wholly order-preserving function in all cases.. I'd much rather be drinking that poison, though :-( > It would be a view consistent with what Guido described a few days ago, > that "noon Eastern on June 3 2020" is not necessarily equivalent to a > UTC instant; it means nothing more than "noon Eastern on June 3 2020" If it wasn't obvious, "noon Eastern on June 3 2020" _is_ a "naive time" in Guido's head. One that will eventually become a civil time, but not before civil time gets close to 2020. > until you choose to explicitly convert it to UTC, providing a full > zoneinfo definition of "Eastern" (and possibly a `fold` argument too, > though it's not needed for "noon Eastern June 3 2020" unless something > changes) at that moment. > > But that isn't datetime's view, at least not consistently. The problem > isn't datetime's choice of arithmetic; it's just that sometimes it wants > to treat a tz-annotated datetime as one thing, and sometimes as another. How many times do we need to agree on this? ;-) Although the conceptual fog has not really been an impediment to using the module in my experience. In yours? Do you use datetime? If so, do you trip over this? 
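Tim's remark above that classic arithmetic "is equivalent to doing integer arithmetic on integer POSIX timestamps" is easy to check directly. A tiny sketch - EPOCH, to_us and from_us are illustrative helpers here, not anything the datetime module provides:

    from datetime import datetime, timedelta

    EPOCH = datetime(1970, 1, 1)                    # a naive epoch; the zone plays no part

    def to_us(dt):                                  # datetime -> integer microseconds
        return (dt - EPOCH) // timedelta(microseconds=1)

    def from_us(us):                                # integer microseconds -> datetime
        return EPOCH + timedelta(microseconds=us)

    dt = datetime(2015, 11, 1, 1, 30)
    assert dt + timedelta(hours=24) == from_us(to_us(dt) + 24 * 3600 * 1000000)

For aware datetimes the same thing happens: classic arithmetic manipulates the wall-clock fields exactly as above and simply carries the tzinfo along untouched.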
> (The fact that a _person_ might also want to have one sometimes and > another sometimes is not a reason for an implementation to try to guess > when they want one and when they want another. It could be a reason for > two different types.) Or three, or four, or ... but, in practice, one type has worked OK for me. Guido's "noon Eastern on June 3 2020" won't actually create any problems for him either. >> There is no argument that can possibly succeed for changing arithmetic >> on aware datetimes: "Tim as Python developer hat" there. That would >> be massively backward-incompatible. No chance whatsoever. > Of course! That's abundantly clear, and I'd be every bit as opposed as > you are to a backwards-incompatible change. Can we just assume that if I > refer to "changing arithmetic" it's short-hand for "provide an option > for full consistency in a way that only occurs with an opt-in choice by > the user, leaving existing code behaving identically." > > The latter is the only thing I've ever proposed, so your choice to > assume here that I meant the former feels a bit like an intentional > misunderstanding so as to provide an opportunity for unnecessary > hyperbole. Or maybe your intuiter is just fallible too ;-) You missed that I had my jester hat on ;-) That was intended to be comic relief, a dogmatic & rigid over-the-top rant from "a Python developer". It's a shame that you chopped part of it, because the fragment that remains doesn't do it full justice. Next time I'll try to sound even more insanely enraged ;-) > ... > "What is the root issue" and "is the root issue practically worth fixing > today" are separable questions. I'm still trying to figure out the > former (but I think we're finally getting there); I'm not at all sure > what I think of the latter (and won't be until I try an implementation). I think the root problem is that "civil time" is a frickin' mess. If you want purity on all counts, then you need an object that solely represents civil time, even to the extent of _requiring_ a non-None, fully functional tzinfo. Else you're leaving "but _whose_ civil time?" ambiguous, and your object no longer represents a single instant in UTC, and you can only possibly support classic arithmetic (if you support any arithmetic at all). But so much baggage is required to specify one of those, lots of apps will look elsewhere. So types will multiply. Maybe that's the best that can be done. > ... > Of course. But I don't believe at all that understanding the core issues > clearly, and identifying what we'd ideally have chosen initially, is > pointless. It can be very useful (even a precondition) for deciding > _how_ to move on from what is the case. Except PEPs yearn to get beyond this stage ;-) That is, there's always an early stage where everyone wants to debate every design decision that was ever made leading up to the PEP (sometimes even just vaguely related to something the PEP mentions). That's fine, but the PEP author(s) eventually tune out. They're not free to redesign anything, and are usually trying to solve a more-or-less specific problem. Like here, we're just trying to add one stinking bit ;-) If that inspires someone else to create a grander solution, that's great. I'm not sure it's ever happened, but it _could_ be great :-) >>> 2) Principle of least surprise for casual users. On this question, "you >>> should use UTC for arithmetic" is equivalent to "you should use a period >>> recurrence library for period arithmetic." 
Both arguments are true in >>> principle, neither one is relevant to the question of casual users >>> getting the results they expect. >> That last wasn't ever really a _driving_ force in Python's design. >> From the earlier example, a great many users have complained a great >> many times that >> >> 1 + "123" >> >> _doesn't_ return 124. That _is_ what most casual users expect. Tough >> luck - Python's not for the terminally lazy. > This example is a false equivalence. All equivalences are false, yes? I remain happy enough with the high-order bits of this one. > Clearly, trying to guess what a casual user expects to result from > an ambiguous operation is a bad idea. We're not guessing at all: we know darned well what most casual users expect in this case. They've been _screaming_ 124 from the start. The high-order bit is, as I said earlier, that catering to what casual users expect has never been a primary driver in Python's design. It' may be a consideration, but perhaps never at the top of the list. > I don't think datetime arithmetic (even on non-UTC datetimes) is an > ambiguous operation, The analogy wasn't about ambiguity; it was intended to be about "what a casual user expects" not being a strong argument in the context of Python's design history. > given an implementation that consistently treats all timezone-aware > datetimes as unambiguous instants, or an implementation that > consistently treats them as naive datetimes with a timezone annotation. > Given an implementation like datetime that isn't sure what they are, > _either_ choice of arithmetic is an attractive nuisance. But only if I also assume the user is terminally dense. It's like PEP 20 says: There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. datetime does Dutch arithmetic. Once a user figures that out, it's obvious _then_. And then also the only obvious way to do classic arithmetic. Guido thought using UTC for timeline arithmetic was the one obvious way to do that; The first time a user encounters datetime, they may well _think_ "OK, I'll add a tzinfo, and now I'll get timeline arithmetic!". That's why this general rule of Python design required two entire lines in PEP 20 - their thinking is flawed because they're not Dutch. But they can learn to be. Then there is indeed one - and only one - obvious way to do each flavor of arithmetic, and each way is consistent with best practices appropriate for that way. I couldn't care less whether they "get it" at once. I would care if they _never_ got it. But Guido still wouldn't care - he will always be more profoundly Dutch than me ;-) >> Well, you can't see me, but I really do have a collection of 42 hats >> on the table next to me, and every time I write a reply, sentence by >> sentence I put on the hat most appropriate to what the current >> sentence intends ;-) > That's an excellent image, and I'll keep it in mind :-) If you picture me wearing a nightcap now, you have it nailed ;-) From tim.peters at gmail.com Sun Sep 6 03:22:04 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 5 Sep 2015 20:22:04 -0500 Subject: [Datetime-SIG] Another approach to 495's glitches Message-ID: Thinking out loud. Right now, we're making interzone arithmetic consistent at the expense of making intrazone operations baffling in some fold edge cases. I'd like to see if we could reverse that. Partly because datetime "shouldn't have" supported by-magic interzone arithmetic to begin with. 
But mostly because, outside of Python's test suite, I've never seen an instance of by-magic interzone comparison or subtraction (it's certain none of my code ever used it, and I've never seen it elsewhere in real code I can recall). So, compared to what Python does today: 1. Intrazone. Go back to what the first 495 stab did: ignore fold entirely (act as if it were always 0), including in hash(). 2. Interzone. A. Subtraction. Change nothing. B. Comparison. B1. __eq__. If either operand has fold=1, return False. B2. __ne__. If either operand has fold=1, return True. B3. The others. Change nothing. The hash problem goes away, because equality transitivity is restored in the cases it matters for the hash problem (under 2B1 a datetime with fold=1 never compares equal to any datetime in a different zone). Before (first 495 stab) we had, where `early` and `late` are the same except for `fold`: uearly = early.astimezone(utc) ulate = late.astimezone(utc) and then: uearly == early == late == ulate uearly < ulate hash(uearly) == hash(early) == hash(late) hash(ulate) almost certainly != to those, despite late == ulate That made a high-quality & correct hash() exceedingly painful. Now (current 495 stab) we have: uearly == early < late == ulate hash(uearly) == hash(early) hash(ulate) == hash(late) No problem there, but "early < late" within the zone is so at odds with "naive time" that various kinds of endcase backwards incompatibilty snuck in (some of which explained in great detail in messages between Carl and me). It "looks nice" because we _are_ favoring by-magic intrazone consistency at the expense of everything else. In endcases sticking within the zone, it doesn't always "look nice" at all. Under 2B1 and 2B2: uearly == early == late != ulate uearly < ulate hash(uearly) == hash(early) == hash(late) hash(ulate) almost certainly != to those, but that's fine since late != ulate, early != ulate, and uearly != ulate What we lose is: A. trichotomy in interzone comparison in rare cases. Right above, we have late != ulate, but we do _not_ have late < ulate or late > ulate either. We're forcing __eq__ to say they're not equal, despite that otherwise comparison logic would say they are equal. B. equivalence between interzone comparison and interzone subtraction in rare cases. Right above, we have late - ulate == 0 despite that late != ulate. C. equality transitivity in rare cases that don't affect the hash problem. Right above, `late` has fold=1 so 2B2 says it's not equal to `uearly` or `ulate` (it's "not equal" to _any_ datetime in UTC). However, we also have uearly == early == late, from which we could normally infer uearly == late. D. zone conversion isn't wholly order-preserving. Right above, the ambiguous times compare equal in their own zone, but map to != values in UTC. `early` and `late` are equal in their own zone but not in any other zone where neither ends up with fold=1. So, until I find something I missed ;-) , all the rare endcase surprises are pushed into interzone operations I doubt are used much (if at all). Seems better than putting them in routinely used intrazone operations. For the docs, the spiel would be along the lines that fold=1 is a new case, and for technical reasons an aware datetime with fold=1 can't compare equal to any datetime in any other zone. That's "really" all this amounts to. Apps that need interzone comparison or subtraction should convert to UTC instead. Then everything will work fine. 
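For anyone trying to keep the `early`/`late`/`uearly`/`ulate` names straight, here is how such a quartet could be built once a 495-conforming tzinfo exists. The zoneinfo module below is only a stand-in (it did not exist at the time), and the date is the 2015 US fall-back:

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo    # stand-in for any 495-conforming tzinfo

    eastern = ZoneInfo("America/New_York")
    early = datetime(2015, 11, 1, 1, 30, tzinfo=eastern)   # first 1:30, EDT (UTC-4)
    late = early.replace(fold=1)                           # second 1:30, EST (UTC-5)

    uearly = early.astimezone(timezone.utc)    # 2015-11-01 05:30:00+00:00
    ulate = late.astimezone(timezone.utc)      # 2015-11-01 06:30:00+00:00

Whether `late` should compare equal to `ulate`, and what hash(late) should be, is precisely what the proposal above is trying to settle.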
I'd also say that by-magic interzone comparison and subtraction may be deprecated someday. Something to discourage its use. Especially because, in fact, I bet it's barely (if ever) used now. Someone else's turn now ;-) PS: not quite yet. All the examples above assumed PEP 495-compliant tzinfos were in use. As detailed in a message with Carl, there are also "backward compatibility" issues to consider after 495 is implement but pre-495 tzinfos are used. Making early < late can cause endcase surprises there too. Under the idea here, as in the first 495 stab, those surprises go away again, because _nothing_ within a zone will "see fold=1", not even the tzinfo (remember, it's a pre-495 tzinfo in this case). From alexander.belopolsky at gmail.com Sun Sep 6 03:33:59 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 5 Sep 2015 21:33:59 -0400 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: On Sat, Sep 5, 2015 at 9:22 PM, Tim Peters wrote: > B. Comparison. > B1. __eq__. If either operand has fold=1, return False. > Congratulations, you've just reinvented a NAN. Sorry, but I won't sacrifice the reflexivity of == for any other invariant. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sun Sep 6 03:38:44 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 5 Sep 2015 21:38:44 -0400 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: Sorry, I missed the " Interzone" part. Maybe you are on to something ... On Sat, Sep 5, 2015 at 9:33 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Sat, Sep 5, 2015 at 9:22 PM, Tim Peters wrote: > >> B. Comparison. >> B1. __eq__. If either operand has fold=1, return False. >> > > Congratulations, you've just reinvented a NAN. Sorry, but I won't > sacrifice the reflexivity of == for any other invariant. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Sep 6 03:38:47 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 5 Sep 2015 20:38:47 -0500 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: [Tim] >> B. Comparison. >> B1. __eq__. If either operand has fold=1, return False. [Alex] > Congratulations, you've just reinvented a NAN. Sorry, but I won't > sacrifice the reflexivity of == for any other invariant. Neither would I. I suspect you read hastily and missed that this quote, in context, is in the "interzone" section. It only applies to comparisons between _different_ zones. Of course x == x would return True for any datetime x. That case was in the earlier "intrazone" section, where it just said "do what the first stab at 495 did" (ignore fold entirely for intrazone comparisons). From tim.peters at gmail.com Sun Sep 6 03:41:57 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 5 Sep 2015 20:41:57 -0500 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: [Alex] > Sorry, I missed the " Interzone" part. Maybe you are on to something ... That's OK - I find it hard _not_ to be punch-drunk by now ;-) But if you see anything that in any way complicates intrazone behavior, I'll be appalled. The entire point here is to restore intrazone relative sanity at the expense of pushing garbage into the (possibly never used in real life) interzone by-magic operations. 
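Spelled out as code, the rule discussed in the last few messages (intrazone comparisons ignore `fold` entirely; interzone equality fails whenever either operand has fold=1) might look like the sketch below. This is hypothetical, not anything datetime actually does, and it assumes aware operands carrying a post-495 `fold` attribute:

    from datetime import timezone

    def wall_fields(dt):
        # The naive clock reading, with `fold` deliberately dropped.
        return (dt.year, dt.month, dt.day,
                dt.hour, dt.minute, dt.second, dt.microsecond)

    def proposed_eq(a, b):
        if a.tzinfo is b.tzinfo:
            # 1. Intrazone: act as if fold were always 0.
            return wall_fields(a) == wall_fields(b)
        if a.fold or b.fold:
            # 2.B1. Interzone: a fold=1 datetime equals nothing in another zone.
            return False
        # Otherwise compare UTC equivalents as usual.
        return a.astimezone(timezone.utc) == b.astimezone(timezone.utc)

Hashing can then force fold=0 unconditionally: under this rule the only equal pairs that can disagree about `fold` are intrazone pairs, and those hash identically anyway.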
From alexander.belopolsky at gmail.com Sun Sep 6 03:53:56 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 5 Sep 2015 21:53:56 -0400 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: On Sat, Sep 5, 2015 at 9:22 PM, Tim Peters wrote: > 1. Intrazone. > > Go back to what the first 495 stab did: ignore fold entirely (act as > if it were always 0), including in hash(). > > 2. Interzone. > > A. Subtraction. Change nothing. > > B. Comparison. > B1. __eq__. If either operand has fold=1, return False. > B2. __ne__. If either operand has fold=1, return True. > B3. The others. Change nothing. > I really like this solution. The reason I was procrastinating with updating the PEP to reflect the previous solution was that I really did not like the fact that it would make fold=1 times in the gap equal to the times right before the gap. Now this problem will go away with many others. Let me sleep on this, but I really think this may work. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Sep 6 04:02:33 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 5 Sep 2015 21:02:33 -0500 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: [Alex] > I really like this solution. The reason I was procrastinating with updating > the PEP to reflect the previous solution was that I really did not like the > fact that it would make fold=1 times in the gap equal to the times right > before the gap. Now this problem will go away with many others. > > Let me sleep on this, but I really think this may work. Good! I need to think on it more too. It's much more like your first 495 stab, which indeed had many nice properties. Nobody here gives a shit about the interzone by-magic operations, so I'm happy to sacrifice damn near anything in those ;-) FYI, I'm most concerned about how glibly I "sold" the idea that it really does solve the hash problem. It seems obvious to me that it does, but ... hash problems have a way of popping up in unexpected ways in unconsidered contexts :-( From tim.peters at gmail.com Sun Sep 6 08:11:33 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 6 Sep 2015 01:11:33 -0500 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: [Tim] > ... > FYI, I'm most concerned about how glibly I "sold" the idea that it > really does solve the hash problem. It seems obvious to me that it > does, but ... hash problems have a way of popping up in unexpected > ways in unconsidered contexts :-( So, after thinking about this for a few days, it's obvious after all ;-) Consider two aware datetimes that compare equal. The task is to prove they have the same hash. The subtlety is that while __eq__ and __hash__ both use a notion of "UTC equivalent", they're not always the same notion. __eq__ always uses the given values of `fold`, while __hash__ always forces fold=0. 1. Same zone. .utcoffset() isn't used for equality in this case; it's only used by hash. Equality implies they differ at most in `fold`. Since hash() forces fold=0, hash's calls to .utcoffset() see exactly the same stuff for both, so hash's force-fold-to-0 UTC equivalents are the same. Same UTC equivalents, same hashes. 2. Different zones. Equality implies fold=0 for both, and that both map to the same UTC time. 
Since we know fold=0 for both, we know __eq__ and __hash__ use the same notion of UTC equivalent for both, so __hash__ sees the same UTC equivalents __eq__ already saw and judged equal. Same UTC equivalents, same hashes. Where it failed before: `later` is the later of an ambiguous time, so has fold=1. `ulater` is its UTC equivalent (with fold=0). They compared equal before. But hash(later) computed the hash based on the force-fold-to-0 UTC equivalent, which is not the same as the fold=1 UTC equivalent `ulater`. hash(ulater) and hash(later) had no more in common than hash(math.pi) and hash("hash"). And they still won't. But in the new world later != ulater (at least one has fold=1 in a cross-zone comparison), so it no longer matters that the hashes differ. From tim.peters at gmail.com Sun Sep 6 20:58:56 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 6 Sep 2015 13:58:56 -0500 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: [Tim] > ... > Consider two aware datetimes that compare equal. The task is to prove > they have the same hash. The subtlety is that while __eq__ and > __hash__ both use a notion of "UTC equivalent", they're not always the > same notion. __eq__ always uses the given values of `fold`, while > __hash__ always forces fold=0. Which obviously ;-) suggests yet another, possibly cleaner, approach: have interzone subtraction, and all interzone comparisons, _also_ force fold to 0 (instead of having only interzone __eq__ and __ne__ special-case fold=1). There are many details about consequences for me to work out, but it sounds promising on the face of it. "The story" gets a lot more uniform then: fold=1 is simply ignored (acts as if 0) by virtually everything, except for 495 tzinfo operations, where `fold` is essential. Then we'd again have, e.g.,

    uearly == early == late != ulate
    uearly < ulate

but "late != ulate" in this variant not because __ne__ is special casing fold=1, but because all cross-zone comparisons use the force-fold-to-0 UTC equivalents for both `late` and `ulate`, and they're simply not equal (assuming 495-conforming tzinfo; for a pre-495 tzinfo, they would be equal, but in that case uearly==ulate too). We'd also have late - ulate != timedelta(0) for the same reason, and consistency between interzone comparison and subtraction would be restored. Trichotomy for cross-zone comparison would also be restored (for x and y in different zones, exactly one of x < y, x == y, x > y would be true). That zone conversion isn't always order-preserving would remain so, but it's impossible for any scheme to always preserve order so long as early == late in the source zone, and it's highly desirable that they do compare equal. The only remaining obvious glitch is that interzone by-magic subtraction and comparison would act as if fold=0 all the time, so may return wrong results in cases where fold=1, although wrong results consistent between interzone subtraction and comparison. I don't care much, for reasons explained before. Convert to UTC first if you need to care about cross-zone comparison or subtraction in cases of ambiguous times - that will always get the right answers (assuming 495-conforming tzinfos are in use). From alexander.belopolsky at gmail.com Sun Sep 6 22:53:41 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 6 Sep 2015 16:53:41 -0400 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: On Sun, Sep 6, 2015 at 2:58 PM, Tim Peters wrote: > [Tim] > > ...
> > Consider two aware datetimes that compare equal. The task is to prove > > they have the same hash. The subtlety is that while __eq__ and > > __hash__ both use a notion of "UTC equivalent", they're not always the > > same notion. __eq__ always uses the given values of `fold`, while > > __hash__ always forces fold=0. > > Which obviously ;-) suggests yet another, possibly cleaner, approach: > have interzone subtraction, and all interzone comparisons, _also_ > force fold to 0 (instead of having only interzone __eq__ and __ne__ > special-case fold=1) . > I would not go that far. While interzone subtraction between arbitrary zones is a rarely needed overkill, I find it useful to have subtraction work between a local zone and UTC. For me, subtraction in this case is similar to conversion. Fix the EPOCH and d = t - EPOCH together with t = EPOCH + d gives you a bijection between times and timedeltas. From that, you are one step away from various numeric time scales. For example (t - datetime(1, 1, 1, tzinfo=timezone.utc)) // timedelta.resolution will give you a bijection between datetimes and some range of integers. Thus if we are going to "sell" fold as a way to implement conversions that "always work", I think we should include these types of conversions as well. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Mon Sep 7 02:19:58 2015 From: carl at oddbird.net (Carl Meyer) Date: Sun, 6 Sep 2015 18:19:58 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> Message-ID: <55ECD82E.9070305@oddbird.net> Hi Tim, (tl;dr I think your latest proposal re PEP 495 is great.) I think we're still mis-communicating somewhat. Before replying point by point, let me just try to explain what I'm saying as clearly as I can. Please tell me precisely where we part ways in this analysis. Consider two models for the meaning of a "timezone-aware datetime object". Let's just call them Model A and Model B: In Model A, an aware datetime (in any timezone) is nothing more than an alternate (somewhat complexified for human use) spelling of a Unix timestamp, much like a timedelta is just a complexified spelling of some number of microseconds. In this model, there's a bijection between aware datetimes in any two timezones. (This model requires the PEP 495 flag, or some equivalent. Technically, this model _could_ be implemented by simply storing a Unix timestamp and a timezone name, and doing all date/time calculations at display time.) In this model, "Nov 2 2014 1:30am US/Eastern fold=1" and "Nov 2 2014 6:30am UTC" are just alternate spellings of the _same_ underlying timestamp. Characteristics of Model A: * There's no issue with comparisons or arithmetic involving datetimes in different timezones; they're all just Unix timestamps under the hood anyway, so ordering and arithmetic is always obvious and consistent: it's always equivalent to simple integer arithmetic with Unix timestamps. * Conversions between timezones are always unambiguous and lossless: they're just alternate spellings of the same integer, after all. * In this model, timeline arithmetic everywhere is the only option. Every non-UTC aware datetime is just an alternate spelling of an equivalent UTC datetime / Unix timestamp, so in a certain sense you're always doing "arithmetic in UTC" (or "arithmetic with Unix timestamps"), but you can spell it in whichever timezone you like. 
In this model, there's very little reason to consider arithmetic in non-UTC timezones problematic; it's always consistent and predictable and gives exactly the same results as converting to UTC first. For sizable systems it may still be good practice to do everything internally in UTC and convert at the edges, but the reasons are not strong; mostly just avoiding interoperability issues with databases or other systems that don't implement the same model, or have poor timezone handling. * In this model, "classic" arithmetic doesn't even rise to the level of "attractive nuisance," it's simply "wrong arithmetic," because you get different results if working with the "same time" represented in different timezones, which violates the core axiom of the model; it's no longer simply arithmetic with Unix timestamps. I don't believe there's anything wrong with Model A. It's not the right model for _all_ tasks, but it's simple, easy to understand, fully consistent, and useful for many tasks. On the whole, it's still the model I find most intuitive and would prefer for most of the timezone code I personally write (and it's the one I actually use today in practice, because it's the model of pytz). Now Model B. In Model B, an "aware datetime" is a "clock face" or "naive" datetime with an annotation of which timezone it's in. A non-UTC aware datetime in model B doesn't inherently know what POSIX timestamp it corresponds to; that depends on concepts that are outside of its naive model of local time, in which time never jumps or goes backwards. Model B is what Guido was describing in his email about an aware datetime in 2020: he wants an aware datetime to mean "the calendar says June 3, the clock face says noon, and I'm located in US/Eastern" and nothing more. Characteristics of Model B: * Naive (or "classic", or "move the clock hands") arithmetic is the only kind that makes sense under Model B. * As Guido described, if you store an aware datetime and then your tz database is updated before you load it again, Model A and Model B aware datetimes preserve different invariants. A Model A aware datetime will preserve the timestamp it represents, even if that means it now represents a different local time than before the zoneinfo change. A Model B aware datetime will preserve the local clock time, even though it now corresponds to a different timestamp. * You can't compare or do arithmetic between datetimes in different timezones under Model B; you need to convert them to the same time zone first (which may require resolving an ambiguity). * Maintaining a `fold` attribute on datetimes at all is a departure from Model B, because it represents a bit of information that's simply nonsense/doesn't exist within Model B's naive-clock-time model. * Under Model B, conversions between timezones are lossy during a fold in the target timezone, because two different UTC times map to the same Model B local time. These models aren't chosen arbitrarily; they're the two models I'm aware of for what a "timezone-aware datetime" could possibly mean that preserve consistent arithmetic and total ordering in their allowed domains (in Model A, all aware datetimes in any timezone can interoperate as a single domain; in Model B, each timezone is a separate domain). 
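To make the two readings concrete, here is a small sketch in today's stdlib (it assumes Python 3.9+, zoneinfo, and PEP 495's fold, all of which post-date this thread; pytz users would spell the conversions differently):

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo  # assumption: Python 3.9+, newer than this thread

    eastern = ZoneInfo("America/New_York")

    # Model A: an aware datetime is just a spelling of a POSIX timestamp, so
    # "Nov 2 2014 1:30am US/Eastern fold=1" and "Nov 2 2014 6:30am UTC" name
    # the same instant.
    local = datetime(2014, 11, 2, 1, 30, fold=1, tzinfo=eastern)
    utc = datetime(2014, 11, 2, 6, 30, tzinfo=timezone.utc)
    assert local.timestamp() == utc.timestamp()

    # Model B: an aware datetime is a clock face plus a zone name, and "+"
    # moves the clock hands (classic arithmetic).
    start = datetime(2014, 11, 1, 12, 0, tzinfo=eastern)          # noon EDT, the day before fall-back
    later = start + timedelta(days=1)
    assert later == datetime(2014, 11, 2, 12, 0, tzinfo=eastern)  # clock advanced exactly 24 hours
    assert later.timestamp() - start.timestamp() == 25 * 3600     # but 25 real hours elapsed

Model A cares about the last line; Model B treats it as out of scope.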
A great deal of this thread (including most of my earlier messages and, I think, even parts your last message here that I'm replying to) has consisted of proponents of one of these two models arguing that behavior from the other model is wrong or inferior or buggy (or an "attractive nuisance"). I now think these assertions are all wrong :-) Both models are reasonable and useful, and in fact both are capable enough to handle all operations, it's just a question of which operations they make simple. Model B people say "just do all your arithmetic and comparisons in UTC"; Model A people say "if you want Model B, just use naive datetimes and track the implied timezone separately." I came into this discussion assuming that Model A was the only sensible way for a datetime library to behave. Now (thanks mostly to Guido's note about dates in 2020), I've been convinced that Model B is also reasonable, and preferable for some uses. I've also been convinced that Model B is the dominant influence and intended model in datetime's design, and that's very unlikely to change (even in a backwards-compatible way), so I'm no longer advocating that. Datetime.py, unfortunately, has always mixed behavior from the two models (interzone operations are all implemented from a Model A viewpoint; intrazone are Model B). Part of the problem with this is that it results in a system that looks like it ought to have total ordering and consistent arithmetic, but doesn't. The bigger problem is that it has allowed people to come to the library from either a Model A or Model B viewpoint and find enough behavior confirming their mental model to assume they were right, and assume any behavior that doesn't match their model is a bug. That's what happened to Stuart, and that's why pytz implements Model A, and has thus encouraged large swathes of Python developers to even more confidently presume that Model A is the intended model. I think your latest proposal for PEP 495 (always ignore `fold` in all intra-zone operations, and push the inconsistency into inter-zone comparisons - which were already inconsistent - instead) is by far the best option for bringing loss-less timezone-conversion round-trips to Model B. Instead of saying (as earlier revisions of PEP 495 did) "we claim we're really Model B, but we're going to introduce even more Model A behaviors, breaking the consistency of Model B in some cases - good luck keeping it straight!" it says "we're sticking with Model B, in which `fold` is meaningless when you're working within a timezone, but in the name of practical usability we'll still track `fold` internally after a conversion, so you don't have to do it yourself in case you want to convert to another timezone later." If the above analysis makes any sense at all to anyone, and you think something along these lines (but shorter and more carefully edited) would make a useful addition to the datetime docs (either as a tutorial-style "intro to how datetime works and how to think about aware datetimes" or as an FAQ), I'd be very happy to write that patch. Now on to your message: [Tim] > Classic arithmetic is equivalent to doing integer arithmetic on > integer POSIX timestamps (although with wider range the same across > all platforms, and extended to microsecond precision). That's hardly > novel - there's a deep and long history of doing exactly that in the > Unix(tm) world. Which is Guido's world. There "shouldn't be" > anything controversial about that. The direct predecessor was already > best practice in its world. 
How that could be considered a nuisance > seems a real strain to me. Unless I'm misunderstanding what you are saying (always likely!), I think this is just wrong. POSIX timestamps are a representation of an instant in time (a number of seconds since the epoch _in UTC_). If you are doing any kind of "integer arithmetic on POSIX timestamps", you are _always_ doing timeline arithmetic. Classic arithmetic may be many things, but the one thing it definitively is _not_ is "arithmetic on POSIX timestamps." This is easy to demonstrate: take one POSIX timestamp, convert it to some timezone with DST, add 86400 seconds to it (using "classic arithmetic") across a DST gap or fold, and then convert back to a POSIX timestamp, and note that you don't have a timestamp 86400 seconds away from the first timestamp. If you were doing simple "arithmetic on POSIX timestamps", such a result would not be possible. In Model A (the one that Lennart and myself and Stuart and Chris have all been advocating during all these threads), all datetimes (in any timezone) are unambiguous representations of a POSIX timestamp, and all arithmetic is "arithmetic on POSIX timestamps." That right there is the definition of timeline arithmetic. So yes, I agree with you that it's hard to consider "arithmetic on POSIX timestamps" an attractive nuisance :-) > Where it gets muddy is extending classic arithmetic to aware datetimes > too. If by "muddy" you mean "not in any way 'arithmetic on POSIX timestamps' anymore." :-) I don't even know what you mean by "extending to aware datetimes" here; the concept of "arithmetic on POSIX timestamps" has no meaning at all with naive datetimes (unless you're implicitly assuming some timezone), because naive datetimes don't correspond to any particular instant, whereas a POSIX timestamp does. > Then compounding the conceptual confusion by adding timeline > interzone subtraction and comparison. Yes, that addition (of Model A behavior into a Model B world) has caused plenty of confusion! It's the root cause for most of the content on this mailing list so far, I think :-) [Carl] >> If datetime did naive arithmetic on tz-annotated datetimes, and also >> refused to ever implicitly convert them to UTC for purposes of >> cross-timezone comparison or arithmetic, and included a `fold` parameter >> not on the datetime object itself but only as an additional input >> argument when you explicitly convert from some other timezone to UTC, >> that would be a consistent view of the meaning of a tz-annotated >> datetime, and I wouldn't have any problem with that. [Tim] > I would. Pure or not, it sounds unusable: when I convert _from_ UTC > to a local zone, I have no idea whether I'll end up in a gap, a fold, > or neither. And so I'll have no idea either what to pass _to_ > .utcoffset() when I need to convert back to UTC. It doesn't solve the > conversion problem. It's a do-it-yourself kit missing the most > important piece. "But .fromutc() could return the right flag to pass > back later" isn't attractive either. Then the user ends up needing to > maintain their own (datetime, convert_back_flag) pairs. In which > case, why not just store the flag _in_ the datetime? Only tzinfo > methods would ever need to look at it. Yes, I agree with you here. I think your latest proposal for PEP 495 does a great job of providing this additional convenience for the user without killing the intra-timezone Model B consistency. 
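What that convenience buys can be sketched with a 495-style zone (the sketch assumes Python 3.9+ and zoneinfo, which post-date this thread; the point is only that the flag rides along inside the datetime, so the round trip needs no side channel):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # assumption: Python 3.9+, newer than this thread

    eastern = ZoneInfo("America/New_York")

    u1 = datetime(2014, 11, 2, 5, 30, tzinfo=timezone.utc)  # maps to the first 1:30 (EDT)
    u2 = datetime(2014, 11, 2, 6, 30, tzinfo=timezone.utc)  # maps to the second 1:30 (EST)

    l1 = u1.astimezone(eastern)  # 01:30 with fold=0
    l2 = u2.astimezone(eastern)  # 01:30 with fold=1 -- the flag lives in the datetime itself
    assert (l1.fold, l2.fold) == (0, 1)

    # Converting back is lossless; no separate (datetime, flag) pair to carry around.
    assert l1.astimezone(timezone.utc) == u1
    assert l2.astimezone(timezone.utc) == u2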
I just wish that the inconsistent inter-timezone operations weren't supported at all, but I know it's about twelve years too late to do anything about that other than document some variant of "you shouldn't compare or do arithmetic with datetimes in different timezones; if you do you'll get inconsistent results in some cases around DST transitions. Convert to the same timezone first instead." [Tim] >> But that isn't datetime's view, at least not consistently. The problem >> isn't datetime's choice of arithmetic; it's just that sometimes it wants >> to treat a tz-annotated datetime as one thing, and sometimes as another. > > How many times do we need to agree on this? ;-) Everybody all together now, one more time! :-) Until your latest proposal on PEP 495, I wasn't sure we really did agree on this, because it seemed you were still willing to break the consistency of Model B arithmetic in order to gain some of the benefits of Model A (that is, introduce _even more_ of this context-dependent ambiguity as to what a tz-annotated datetime means.) But your latest proposal fixes that in a way I'm quite happy with, given where we are. > Although the > conceptual fog has not really been an impediment to using the module > in my experience. > > In yours? Do you use datetime? If so, do you trip over this? No, because I use pytz, in which there is no conceptual fog, just strict Model A (and an unfortunate API). I didn't get to experience the joy of this conceptual fog until I started arguing with you on this mailing list! And now I finally feel like I'm seeing through that fog a bit. I hope I'm right :-) Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Mon Sep 7 02:31:12 2015 From: carl at oddbird.net (Carl Meyer) Date: Sun, 6 Sep 2015 18:31:12 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9E0C3.7070003@oddbird.net> Message-ID: <55ECDAD0.3000505@oddbird.net> On 09/04/2015 12:25 PM, Guido van Rossum wrote: > I made it up, in analogy to "classic classes" in Python 2. I did this > not as a euphemism, but to avoid confusion, since in the existing docs > "naive" is only ever applied to objects (meaning tzinfo-less) and I > wanted to have a term that couldn't confuse anyone into thinking we were > only talking about arithmetic of naive objects. Thanks for the clarification; that's reasonable. I shouldn't have presumed a reason for the term. And, as others have pointed out, "naive arithmetic" is just as invented-here as "classic arithmetic" -- perhaps more meaningful to someone not already familiar with it, but also possibly leading them to the wrong meaning. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Mon Sep 7 02:36:51 2015 From: carl at oddbird.net (Carl Meyer) Date: Sun, 6 Sep 2015 18:36:51 -0600 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: <55ECDC23.1080201@oddbird.net> On 09/06/2015 02:53 PM, Alexander Belopolsky wrote: > On Sun, Sep 6, 2015 at 2:58 PM, Tim Peters > wrote: > > [Tim] > > ... > > Consider two aware datetimes that compare equal. The task is to prove > > they have the same hash. 
The subtlety is that while __eq__ and > > __hash__ both use a notion of "UTC equivalent", they're not always the > > same notion. __eq__ always uses the given values of `fold`, while > > __hash__ always forces fold=0. > > Which obviously ;-) suggests yet another, possibly cleaner, approach: > have interzone subtraction, and all interzone comparisons, _also_ > force fold to 0 (instead of having only interzone __eq__ and __ne__ > special-case fold=1) . > > I would not go that far. While interzone subtraction between arbitrary > zones is a rarely needed overkill, I find it useful to have subtraction > work between a local zone and UTC. For me, subtraction in this case is > similar to conversion. Fix the EPOCH and d = t - EPOCH together with t > = EPOCH + d gives you a bijection between times and timedeltas. From > that, you are one step away from various numeric time scales. For > example (t - datetime(1, 1, 1, tzinfo=timezone.utc)) // > timedelta.resolution will give you a bijection between datetimes and > some range of integers. Thus if we are going to "sell" fold as a way to > implement conversions that "always work", I think we should include > these types of conversions as well. FWIW, Tim's latest proposal (either variant) resolves all my concerns with PEP 495 (as I explained at greater length in the "Timeline arithmetic" thread). Fundamentally I don't care between these two variants (because the difference between them only impacts interzone operations, and my general advice on those going forward would be "don't use them"). Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Mon Sep 7 03:05:41 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 6 Sep 2015 20:05:41 -0500 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: [Tim] >> ... >> Which obviously ;-) suggests yet another, possibly cleaner, approach: >> have interzone subtraction, and all interzone comparisons, _also_ >> force fold to 0 (instead of having only interzone __eq__ and __ne__ >> special-case fold=1) . [Alex] > I would not go that far. While interzone subtraction between arbitrary > zones is a rarely needed overkill, I find it useful to have subtraction work > between a local zone and UTC. Have you done so already in real life, or did it just occur to you that you _could_ find it useful? > For me, subtraction in this case is similar to conversion. Fix the EPOCH > and d = t - EPOCH together with t = EPOCH + d gives you a bijection between > times and timedeltas. Well, not without more words to clarify which operations are intended. For example, it's impossible to tell what "-" means there unless you spell out whether you're using classic or timeline arithmetic. In order to make your final claim true, I have to (I believe) reverse-engineer that the claim is restricted to naive EPOCH and `d`, or aware datetimes in a common fixed-offset zone. Otherwise your "-" uses timeline arithmetic and your "+" classic arithmetic, and they're different kinds of arithmetic in a non-fixed-offset zone. > From that, you are one step away from various numeric time scales. > For example (t - datetime(1, 1, 1, tzinfo=timezone.utc)) // timedelta.resolution will > give you a bijection between datetimes and some range of integers. 
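A quick check of the round trip being quoted, as a sketch with everything held in UTC (a fixed-offset zone, so none of the ambiguity discussed next arises):

    from datetime import datetime, timedelta, timezone

    EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
    t = datetime(2015, 9, 6, 16, 53, 41, tzinfo=timezone.utc)

    d = t - EPOCH                # a timedelta
    assert EPOCH + d == t        # t = EPOCH + d, the bijection Alex describes

    # The integer time scale from the quoted example: microseconds since 0001-01-01 UTC.
    origin = datetime(1, 1, 1, tzinfo=timezone.utc)
    n = (t - origin) // timedelta.resolution
    assert origin + n * timedelta.resolution == t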
In this case the ambiguity is whether, by `datetimes`, you mean `t` represents points in t.tzinfo's civil time, or points in a tzinfo-annotated naive time. I have to believe you mean the former, because converting to UTC irretrievably loses tzinfo-annotated naive times that correspond to "gap times" in that tzinfo's civil time (i.e., this code doesn't give a bijection of tzinfo-annotated naive datetimes if there are gaps in the tzinfo's civil time: more than one naive time can map to the same UTC time then, and so also to the same integer then). Replacing `t` with t.astimezone(utc) would make that obvious instead of a puzzle, making it utterly clear that you only have civil time in mind. All instances of by-magic timeline arithmetic are an "attractive nuisance" in datetime's current design :-( > Thus if we are going to "sell" fold as a way to implement conversions that > "always work", I think we should include these types of conversions as well. Unfortunately, I have to suspect _someone_ out there already has this kind of code, wrong-headed ;-) as it is. So that kills that. Unfortunately, that leaves the "special-case fold=1 in __eq__ and __ne__" idea violating enough formal properties in interzone arithmetic, albeit in rare cases, that I expect the best we can hope for this PEP is "grudging acceptance". I'll have to go back and read the "how about an insanely delicate hash() implementation instead?" messages again ;-) From tim.peters at gmail.com Mon Sep 7 03:19:08 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 6 Sep 2015 20:19:08 -0500 Subject: [Datetime-SIG] Another approach to 495's glitches In-Reply-To: References: Message-ID: [Alex] >> For me, subtraction in this case is similar to conversion. Fix the EPOCH >> and d = t - EPOCH together with t = EPOCH + d gives you a bijection between >> times and timedeltas. [Tim] > Well, not without more words to clarify which operations are intended. > For example, it's impossible to tell what "-" means there unless you > spell out whether you're using classic or timeline arithmetic. In > order to make your final claim true, I have to (I believe) > reverse-engineer that the claim is restricted to naive EPOCH and `d`, > or aware datetimes in a common fixed-offset zone. Otherwise your "-" > uses timeline arithmetic and your "+" classic arithmetic, and they're > different kinds of arithmetic in a non-fixed-offset zone. I'm missing a case there: common non-fixed-offset zone. That one doesn't fail because different kinds of arithmetic are used (classic is always used then), but because classic arithmetic ignores `fold` entirely - there's no bijection in that case if you're viewing `t` as civil time. So, your EPOCH and ` t` share a common (possibly None) tzinfo, and you're talking about a bijection in naive (not civil) time. From tim.peters at gmail.com Mon Sep 7 11:12:04 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 7 Sep 2015 04:12:04 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55ECD82E.9070305@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> Message-ID: [Carl Meyer ] > (tl;dr I think your latest proposal re PEP 495 is great.) I don't. The last two were less annoying, though ;-) > I think we're still mis-communicating somewhat. Before replying point by > point, Or it could be we have different goals here, and each keep trying to nudge the other to change the topic ;-) > let me just try to explain what I'm saying as clearly as I can. 
> Please tell me precisely where we part ways in this analysis. > > Consider two models for the meaning of a "timezone-aware datetime > object". Let's just call them Model A and Model B: In which context? Abstractly, or the context of Python's current datetime module, or in the context of some hypothetical future Python datetime module, or some datetime module that _might_ have existed instead, or ...? My only real interest here is moving the module that actually exists to one that can get conversions right in all cases, preferably in a wholly backward-compatible way. Models don't really matter to that, but specific behaviors do. > In Model A, an aware datetime (in any timezone) is nothing more than an > alternate (somewhat complexified for human use) spelling of a Unix > timestamp, much like a timedelta is just a complexified spelling of some > number of microseconds. A Python datetime is also just a complexified spelling of some number of microseconds (since the start of 1 January 1 of the proleptic Gregorian calendar). > In this model, there's a bijection between aware datetimes in any > two timezones. (This model requires the PEP 495 flag, > or some equivalent. Technically, this model _could_ be implemented by > simply storing a Unix timestamp and a timezone name, and doing all > date/time calculations at display time.) In this model, "Nov 2 2014 > 1:30am US/Eastern fold=1" and "Nov 2 2014 6:30am UTC" are just alternate > spellings of the _same_ underlying timestamp. > > Characteristics of Model A: > > * There's no issue with comparisons or arithmetic involving datetimes in > different timezones; they're all just Unix timestamps under the hood > anyway, so ordering and arithmetic is always obvious and consistent: > it's always equivalent to simple integer arithmetic with Unix timestamps. > > * Conversions between timezones are always unambiguous and lossless: > they're just alternate spellings of the same integer, after all. > > * In this model, timeline arithmetic everywhere is the only option. Why? The kind of arithmetic needed for a task depends on the task. There are no specific use cases given here, so who can say? Some tasks need to account for real-world durations; others need to overlook irregularities in real-world durations (across zone transitions) in order to maintain regularities between the before-and-after calendar notations. Timeline arithmetic is only directly useful for dealing with real-world durations as they affect civil calendar notations. Some tasks require that, other tasks can't tolerate that. That said, it would be cleanest to have distinct types for each purpose. Whether that would be more _usable_ I don't know. > Every non-UTC aware datetime is just an alternate spelling of an > equivalent UTC datetime / Unix timestamp, so in a certain sense you're > always doing "arithmetic in UTC" (or "arithmetic with Unix timestamps"), > but you can spell it in whichever timezone you like. In this model, > there's very little reason to consider arithmetic in non-UTC timezones > problematic; it's always consistent and predictable and gives exactly > the same results as converting to UTC first. For sizable systems it may > still be good practice to do everything internally in UTC and convert at > the edges, but the reasons are not strong; mostly just avoiding > interoperability issues with databases or other systems that don't > implement the same model, or have poor timezone handling. How do you think timeline arithmetic is implemented? 
datetime's motivating use cases overwhelmingly involved quick access to local calendar notation, so datetime stores local calendar notation (both in memory and in pickles) directly. Any non-toy implementation of timeline arithmetic would store time internally in UTC ticks instead, enduring expensive conversions to local calendar notation only when explicitly demanded. As is, the only way to get timeline arithmetic in datetime is to do some equivalent to converting to UTC first, doing dirt simple arithmetic in UTC, then converting back to local calendar notation. That's _horridly_ expensive in comparison. pytz doesn't avoid this. The arithmetic itself is fast, because it is in fact classic arithmetic. The expense is hidden in the .normalize() calls, which perform to-UTC-and-back "repair". Pragmatics are important here too. For many problem domains, you have to get results before the contract expires ;-) > * In this model, "classic" arithmetic doesn't even rise to the level of > "attractive nuisance," it's simply "wrong arithmetic," because you get > different results if working with the "same time" represented in > different timezones, which violates the core axiom of the model; it's no > longer simply arithmetic with Unix timestamps. Models are irrelevant to right or wrong; right or wrong can only be judged with respect to use cases (does a gimmick address the required task, or not? if so, "right"; if not, is it at least feasible to get the job done? if so, "grr - but OK"; if still not, "wrong"). Models can make _achieving_ "right" harder or easier, depending on what a use case requires. datetime's model and implementation made it relatively easy to address every use case collected across an extensive public design phase. None of them were about accounting for real-world duration delta as they affect, or are affected by, civil calendar notations. Of course those may not be _your_ use cases. > I don't believe there's anything wrong with Model A. It's not the right > model for _all_ tasks, but it's simple, easy to understand, fully > consistent, and useful for many tasks. Sure! Except for the "simple" and "easy to understand" parts ;-) People really do trip all the time over zone transitions, to the extent that no two distinct implementations of C mktime() can really be expected to set is_dst the same way in all cases, not even after decades of bug fixes. Your "poor timezone handling" is a real problem in edge cases across platforms. > On the whole, it's still the model I find most intuitive and would prefer > for most of the timezone code I personally write (and it's the one I actually > use today in practice, because it's the model of pytz). Do you do much datetime _arithmetic_ in pytz? If you don't, the kind of arithmetic you like is pretty much irrelevant ;-) But, if you do, take pytz's own docs to heart: The preferred way of dealing with times is to always work in UTC, converting to localtime only when generating output to be read by humans. Your arithmetic-intensive code would run much faster if you followed that advice, and you could throw out mountains of .normalize() calls. You're working in Python, and even the storage format of Python datetimes strongly favors classic arithmetic (as before, any serious implementation of timeline arithmetic would store UTC ticks directly instead). > Now Model B. In Model B, an "aware datetime" is a "clock face" or > "naive" datetime with an annotation of which timezone it's in. 
A non-UTC > aware datetime in model B doesn't inherently know what POSIX timestamp > it corresponds to; that depends on concepts that are outside of its > naive model of local time, in which time never jumps or goes backwards. > Model B is what Guido was describing in his email about an aware > datetime in 2020: he wants an aware datetime to mean "the calendar says > June 3, the clock face says noon, and I'm located in US/Eastern" and > nothing more. > > Characteristics of Model B: > > * Naive (or "classic", or "move the clock hands") arithmetic is the only > kind that makes sense under Model B. It again depends on which specific use cases you have in mind. Few people think inside a rigid model. Sometimes they want to break out of the model, especially when a use case requires it ;-) As you know all too well already, Python also intends to support a programmer changing their mind, to view their annotated naive datetime as a moment in civil time too, at least for zone conversion purposes. > * As Guido described, if you store an aware datetime and then your tz > database is updated before you load it again, Model A and Model B aware > datetimes preserve different invariants. A Model A aware datetime will > preserve the timestamp it represents, even if that means it now > represents a different local time than before the zoneinfo change. A > Model B aware datetime will preserve the local clock time, even though > it now corresponds to a different timestamp. > > * You can't compare or do arithmetic between datetimes in different > timezones under Model B; you need to convert them to the same time zone > first (which may require resolving an ambiguity). > > * Maintaining a `fold` attribute on datetimes at all is a departure from > Model B, because it represents a bit of information that's simply > nonsense/doesn't exist within Model B's naive-clock-time model. > > * Under Model B, conversions between timezones are lossy during a fold > in the target timezone, because two different UTC times map to the same > Model B local time. Should also note that Model B conversions to UTC can map two datetimes to the same UTC time (for times in a gap - they don't exist on the local civil clock, so have to map to the same UTC value as some other Model B time that _does_ exist on the local clock). > These models aren't chosen arbitrarily; they're the two models I'm aware > of for what a "timezone-aware datetime" could possibly mean that > preserve consistent arithmetic and total ordering in their allowed > domains (in Model A, all aware datetimes in any timezone can > interoperate as a single domain; in Model B, each timezone is a separate > domain). > > A great deal of this thread (including most of my earlier messages and, > I think, even parts your last message here that I'm replying to) has > consisted of proponents of one of these two models arguing that behavior > from the other model is wrong or inferior or buggy (or an "attractive > nuisance"). Direct overloaded-operator support for timeline arithmetic is an attractive nuisance _in datetime_, or any other Python module sharing datetime's data representation. I disagree with your "but the reasons are not strong" above. It requires relatively enormous complexity and expense to perform each lousy timeline addition, subtraction, and comparison in a non-eternally-fixed-offset zone. It's poor practice for that reason alone. Nevertheless, your code, your choice. 
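Spelled out, the expense Tim is pointing at looks something like this (a sketch assuming the modern zoneinfo module, which post-dates this thread; pytz spells the repair step normalize() instead):

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo  # assumption: Python 3.9+, newer than this thread

    eastern = ZoneInfo("America/New_York")
    d = datetime(2015, 3, 7, 12, 0, tzinfo=eastern)  # noon EST, the day before spring-forward
    step = timedelta(days=1)

    classic = d + step  # what datetime's "+" does: one cheap move of the clock hands
    timeline = (d.astimezone(timezone.utc) + step).astimezone(eastern)  # the to-UTC-and-back dance

    print(classic)   # 2015-03-08 12:00:00-04:00  (same wall-clock time the next day)
    print(timeline)  # 2015-03-08 13:00:00-04:00  (24 real hours later; an hour was skipped overnight)

Every by-magic timeline operation has to pay for those two conversions somewhere; classic arithmetic pays for neither.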
> I now think these assertions are all wrong :-) Both models > are reasonable and useful, and in fact both are capable enough to handle > all operations, it's just a question of which operations they make > simple. Model B people say "just do all your arithmetic and comparisons > in UTC"; Model A people say "if you want Model B, just use naive > datetimes and track the implied timezone separately." Do note that my _only_ complaint against timeline arithmetic is making it seductively easy to spell in Python's datetime. It's dead easy to get the same results in the intended way (or, would be, in a post-495 world). > I came into this discussion assuming that Model A was the only sensible > way for a datetime library to behave. Now (thanks mostly to Guido's note > about dates in 2020), I've been convinced that Model B is also > reasonable, and preferable for some uses. For the use cases collected when datetime was being designed, it was often the clearly better model, and was never the worse model. Where "better" and "worse" are judged relative to the model's naturalness in addressing a use case. Alas, those were collected on a public Wiki that no longer appears to exist. > I've also been convinced that Model B is the dominant influence > and intended model in datetime's design, and that's very unlikely > to change (even in a backwards-compatible way), so I'm no > longer advocating that. That's good, because futility can become tiring as the decades drag on ;-) > Datetime.py, unfortunately, has always mixed behavior from the two > models (interzone operations are all implemented from a Model A > viewpoint; intrazone are Model B). Part of the problem with this is that > it results in a system that looks like it ought to have total ordering > and consistent arithmetic, but doesn't. The bigger problem is that it > has allowed people to come to the library from either a Model A or Model > B viewpoint and find enough behavior confirming their mental model to > assume they were right, and assume any behavior that doesn't match their > model is a bug. That's what happened to Stuart, and that's why pytz > implements Model A, and has thus encouraged large swathes of Python > developers to even more confidently presume that Model A is the intended > model. Stuart would have to address that. He said earlier that his primary concern was to fix conversions in all cases, not arithmetic. Explained before that timeline arithmetic was a natural consequence of the _way_ pytz repaired conversions. It's natural enough then to assume "oh, I just fixed _two_ bugs!" ;-) As is, as Isaac noted earlier, he's had a hellish time getting, e.g., pytz and dateutil to work together. dateutil requires classic arithmetic (which is by far the more convenient for implementing almost all forms of "calendar operations"). So, e.g., take a pytz aware datetime d, and do d += relativedelta(month=12, day=1, weekday=FR(+3)) where everything on the RHS is a dateutil way to spell "same time on the 3rd Friday of this December" when added to a datetime. That's not particularly contrived - it's, e.g., a way to spell the day monthly US equity options expire in December, and a user may well need to set an alarm "at the same wall clock time" then to check their expiring December contracts before the market closes. Being an hour off _could_ be a financial disaster to them. The result is fine, until you do a pytz .normalize(). 
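For concreteness, the clash looks roughly like this (a sketch assuming pytz and dateutil are installed; the offsets shown are US/Eastern's for 2015):

    from datetime import datetime

    import pytz
    from dateutil.relativedelta import FR, relativedelta

    eastern = pytz.timezone("US/Eastern")
    d = eastern.localize(datetime(2015, 6, 15, 15, 30))         # a June afternoon, EDT (UTC-4)

    alarm = d + relativedelta(month=12, day=1, weekday=FR(+3))  # same wall time, 3rd Friday of December
    print(alarm)                     # 2015-12-18 15:30:00-04:00  (right clock time, stale EDT offset)
    print(eastern.normalize(alarm))  # 2015-12-18 14:30:00-05:00  (the "repair" moves the alarm an hour early)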
If d, e.g., started in June, then in the US the hour _will_ magically become wrong "because" there was a DST transition between the original and final times. Far worse than useless. A similar fate awaits any attempt to make timeline arithmetic a default behavior (if it changed what datetime + timedelta did directly, the dateutil result would be wrong immediately, because dateutil's relativedelta.__add__ relies in part on what `datetime + timedelta` does). "Plays nice with others" is also important unless a module is content to live in a world of its own. > I think your latest proposal for PEP 495 (always ignore `fold` in all > intra-zone operations, and push the inconsistency into inter-zone > comparisons - which were already inconsistent - instead) is by far the > best option for bringing loss-less timezone-conversion round-trips to > Model B. Instead of saying (as earlier revisions of PEP 495 did) "we > claim we're really Model B, but we're going to introduce even more Model > A behaviors, breaking the consistency of Model B in some cases - good > luck keeping it straight!" it says "we're sticking with Model B, in > which `fold` is meaningless when you're working within a timezone, but > in the name of practical usability we'll still track `fold` internally > after a conversion, so you don't have to do it yourself in case you want > to convert to another timezone later." Alas, there's still no _good_ solution to this :-( > If the above analysis makes any sense at all to anyone, and you think > something along these lines (but shorter and more carefully edited) > would make a useful addition to the datetime docs (either as a > tutorial-style "intro to how datetime works and how to think about aware > datetimes" or as an FAQ), I'd be very happy to write that patch. I've mentioned a few times before that I'd welcome something more akin to the "floating-point surprises" appendix: https://docs.python.org/3/tutorial/floatingpoint.html Most users don't want to read anything about theory, but it needs to be discussed sometimes. So in that appendix, the approach is to introduce bite-sized chunks of theory to explain concrete, visible _behaviors_, along with practical advice. The goal is to get the reader unstuck, not to educate them _too_ much ;-) Anyway, that appendix appears to have been effective at getting many users unstuck, so I think it's a now-proven approach. >> Classic arithmetic is equivalent to doing integer arithmetic on >> integer POSIX timestamps (although with wider range the same across >> all platforms, and extended to microsecond precision). That's hardly >> novel - there's a deep and long history of doing exactly that in the >> Unix(tm) world. Which is Guido's world. There "shouldn't be" >> anything controversial about that. The direct predecessor was already >> best practice in its world. How that could be considered a nuisance >> seems a real strain to me. > Unless I'm misunderstanding what you are saying (always likely!), I > think this is just wrong. POSIX timestamps are a representation of an > instant in time (a number of seconds since the epoch _in UTC_). Well, in the POSIX approximation to UTC. Strict POSIX forbids using real-world UTC (which suffers leap seconds). But, below, I won't keep making this distinction. That should be a relief ;-) > If you are doing any kind of "integer arithmetic on POSIX timestamps", you > are _always_ doing timeline arithmetic. True. 
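That much is easy to demonstrate with the stdlib alone (a sketch assuming Python 3.9+ for zoneinfo):

    from datetime import datetime
    from zoneinfo import ZoneInfo  # assumption: Python 3.9+, newer than this thread

    eastern = ZoneInfo("America/New_York")
    ts = datetime(2014, 11, 1, 12, 0, tzinfo=eastern).timestamp()  # noon EDT, the day before fall-back

    print(datetime.fromtimestamp(ts, eastern))          # 2014-11-01 12:00:00-04:00
    print(datetime.fromtimestamp(ts + 86400, eastern))  # 2014-11-02 11:00:00-05:00
    # Adding 86400 seconds to the POSIX timestamp lands an hour earlier on the
    # local clock, because the zone gained an hour overnight: integer arithmetic
    # on timestamps is timeline arithmetic.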
> Classic arithmetic may be many things, but the one thing it definitively is > _not_ is "arithmetic on POSIX timestamps." False. UTC is an eternally-fixed-offset zone. There are no transitions to be accounted for in UTC. Classic and timeline arithmetic are exactly the same thing in any eternally-fixed-offset zone. Because POSIX timestamps _are_ "in UTC", any arithmetic performed on one is being done in UTC too. Your illustration next goes way beyond anything I could possibly read as doing arithmetic on POSIX timestamps: > This is easy to demonstrate: take one POSIX timestamp, convert it to > some timezone with DST, add 86400 seconds to it (using "classic > arithmetic") across a DST gap or fold, and then convert back to a POSIX > timestamp, and note that you don't have a timestamp 86400 seconds away > from the first timestamp. If you were doing simple "arithmetic on POSIX > timestamps", such a result would not be possible. But you're cheating there. It's clear as mud what you have in mind, concretely, for the _result_ of what you get from "convert it to some timezone with DST", but the result of that can't possibly be a POSIX timestamp: as you said at the start, a POSIX timestamp denotes a number of seconds from the epoch _in UTC_ You're no longer in UTC. You left the POSIX timestamp world at your very first step. So anything you do after that is irrelevant to how arithmetic on POSIX timestamps behaves. BTW, how do you intend to do that conversion to begin with? C's localtime() doesn't return time_t (a POSIX timestamp). The standard C library supports no way to perform the conversion you described, because that's not how times are intended to work in C, because in turn the Unix world has the same approach to this as Python's datetime: all timeline arithmetic is intended to be done in UTC (equivalent to POSIX timestamps), converting to UTC first (C's mktime()), then back when arithmetic is done (C's localtime()). The only difference is that datetime spells both C library functions via .astimezone(), and is about 1000 times easier to use ;-) If you're unfamiliar with how this stuff is done in C, here's a typically incomprehensible ;-) man page briefly describing all the main C time functions: http://linux.die.net/man/3/mktime Note that mktime ("convert from local to UTC") is the _only_ one returning a timestamp (time_t). The intent is you do all arithmetic on time_t's, staying in UTC for the duration. When you're done, _then_ localtime() converts your final time_t back to local calendar notation (fills a `struct tm` for output). Exactly the same dance datetime intends. Python stole almost all of this from C best practice, except for the spelling. If by "convert it to some timezone with DST", you intended to get a struct tm (local calendar notation), then add 86400 to the tm_sec member, then that doesn't even have an hallucinogenic resemblance to doing arithmetic on POSIX timestamps. > In Model A (the one that Lennart and myself and Stuart and Chris have > all been advocating during all these threads) timezone) are unambiguous > representations of a POSIX timestamp, and all arithmetic is "arithmetic > on POSIX timestamps." That right there is the definition of timeline arithmetic. Here's an example of arithmetic on POSIX timestamps: 1 + 2 returning 3. It's not some kind of equivalence relation or bijection, it's concretely adding two integers to get a third integer. That's all I mean by "arithmetic on POSIX timestamps". It's equally useful for implementing classic or timeline arithmetic. 
The difference between those isn't in the timestamp arithmetic, it's in how conversions between integers and calendar notations are defined. There does happen to be an obvious bijection between arithmetic on (wide enough) POSIX timestamps and naive datetime arithmetic, which is in turn trivially isomorphic to aware datetime arithmetic in UTC. Although the "obvious" there depends on knowing first that, at heart, a Python datetime is an integer count of microseconds since the start of 1 January 1. It's just an integer stored in a bizarre mixed-radix notation. > So yes, I agree with you that it's hard to consider "arithmetic on POSIX > timestamps" an attractive nuisance :-) >> Where it gets muddy is extending classic arithmetic to aware datetimes >> too. > If by "muddy" you mean "not in any way 'arithmetic on POSIX timestamps' > anymore." :-) > > I don't even know what you mean by "extending to aware datetimes" here; I meant what I said: extending classic arithmetic to aware datetimes muddied the waters. Because some people do expect aware datetimes to implement timeline arithmetic instead. That's all. > the concept of "arithmetic on POSIX timestamps" has no meaning at all > with naive datetimes (unless you're implicitly assuming some timezone), > because naive datetimes don't correspond to any particular instant, > whereas a POSIX timestamp does. If you need to, implicitly assume UTC. There are no surprises at all if you want to _think_ of naive datetimes as being in (the POSIX approximation of real-world) UTC. They're identical in all visible behaviors that don't require a tzinfo. Indeed, here's how to convert a naive datetime `dt` "by hand" to an integer POSIX timestamp, pretending `dt` is a UTC time: EPOCH = datetime(1970, 1, 1) ts = (dt - EPOCH) // timedelta(seconds=1) Try it! If you don't have Python 3, it's just as trivial, but you'll have to convert the 3 timedelta attributes (days, seconds, microseconds) to seconds by hand and add them. After, do EPOCH + timedelta(seconds=ts) to get back the original dt. To get a floating POSIX timestamp instead (including microseconds): ts = (dt - EPOCH).total_seconds() Please let's not argue about trivially easy bijections. datetime's natural EPOCH is datetime(1, 1, 1), and _all_ classic arithmetic is easily defined in terms of integer arithmetic on integer-count-of-microsecond timestamps starting from there. While it would be _possible_ to think of those as denoting UTC timestamps, it wouldn't really be helpful ;-) ... >>> If datetime did naive arithmetic on tz-annotated datetimes, and also >>> refused to ever implicitly convert them to UTC for purposes of >>> cross-timezone comparison or arithmetic, and included a `fold` parameter >>> not on the datetime object itself but only as an additional input >>> argument when you explicitly convert from some other timezone to UTC, >>> that would be a consistent view of the meaning of a tz-annotated >>> datetime, and I wouldn't have any problem with that. >> I would. Pure or not, it sounds unusable: when I convert _from_ UTC >> to a local zone, I have no idea whether I'll end up in a gap, a fold, >> or neither. And so I'll have no idea either what to pass _to_ >> .utcoffset() when I need to convert back to UTC. It doesn't solve the >> conversion problem. It's a do-it-yourself kit missing the most >> important piece. "But .fromutc() could return the right flag to pass >> back later" isn't attractive either. Then the user ends up needing to >> maintain their own (datetime, convert_back_flag) pairs. 
In which >> case, why not just store the flag _in_ the datetime? Only tzinfo >> methods would ever need to look at it. > Yes, I agree with you here. I think your latest proposal for PEP 495 > does a great job of providing this additional convenience for the user > without killing the intra-timezone Model B consistency. I just wish that > the inconsistent inter-timezone operations weren't supported at all, but > I know it's about twelve years too late to do anything about that other > than document some variant of "you shouldn't compare or do arithmetic > with datetimes in different timezones; if you do you'll get inconsistent > results in some cases around DST transitions. Convert to the same > timezone first instead." Alas, I'm afraid Alex is right that people may well be using interzone subtraction to do conversions already. For example, the timestamp snippets I gave above are easily extended to convert any aware datetime to a POSIX timestamp: just slap tzinfo=utc on the EPOCH constant, and then by-magic interzone subtraction converts `dt` to UTC automatically. For that to continue to work as intended in all cases post-495, we can't change anything about interzone subtraction. Which, for consistency between them, implies we "shouldn't" change anything about interzone comparisons either. > ... > Until your latest proposal on PEP 495, I wasn't sure we really did agree > on this, because it seemed you were still willing to break the > consistency of Model B arithmetic in order to gain some of the benefits > of Model A (that is, introduce _even more_ of this context-dependent > ambiguity as to what a tz-annotated datetime means.) But your latest > proposal fixes that in a way I'm quite happy with, given where we are. I'm still not sure it's a net win to change anything. Lots of tradeoffs. I do gratefully credit our exchanges for cementing my hatred of muddying Model B: the more I had to "defend" Model B, the more intense my determination to preserve its God-given honor at all costs ;-) >> Although the conceptual fog has not really been an impediment to >> using the module in my experience. >> In yours? Do you use datetime? If so, do you trip over this? > No, because I use pytz, in which there is no conceptual fog, just strict > Model A (and an unfortunate API). And applications that apparently require no use whatsoever of dateutil operations ;-) > I didn't get to experience the joy of this conceptual fog until I > started arguing with you on this mailing list! And now I finally feel > like I'm seeing through that fog a bit. I hope I'm right :-) I doubt we'll ever know for sure ;-) From alexander.belopolsky at gmail.com Mon Sep 7 14:50:20 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 7 Sep 2015 08:50:20 -0400 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55ECD82E.9070305@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> Message-ID: On Sun, Sep 6, 2015 at 8:19 PM, Carl Meyer wrote: > In this model, there's a bijection between aware > datetimes in any two timezones. (This model requires the PEP 495 flag, > or some equivalent. > A nitpick, but since I am also guilty of such loose usage of the term "bijection", it may be worth a clarification. We often say that there is a bijection between two sets when in fact there is only a bijection between a subset of one set and a subset of another. In a particular case of aware datetimes with tzinfo=UTC and tzinfo=Local, a set U = {u ∈ datetime | u.tzinfo is UTC, u.fold=0} maps to a subset of L = {t ∈ datetime | t.tzinfo is Local}. This map creates a bijection between U and its image under the map, but we are still ignoring the possibility that timezone correction may take you out of [datetime.min, datetime.max] range. To rigorously construct a mathematical bijection, you need to account for those edge effects as well. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Mon Sep 7 15:13:39 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 7 Sep 2015 09:13:39 -0400 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> Message-ID: On Mon, Sep 7, 2015 at 5:12 AM, Tim Peters wrote: > For the use cases collected when datetime was being designed, it was > often the clearly better model, and was never the worse model. Where > "better" and "worse" are judged relative to the model's naturalness in > addressing a use case. Alas, those were collected on a public Wiki > that no longer appears to exist. > The Wayback Machine to the rescue! https://web.archive.org/web/20060504021923/http://www.zope.org/Members/fdrake/DateTimeWiki/FrontPage -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Mon Sep 7 18:20:55 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 7 Sep 2015 10:20:55 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> Message-ID: <55EDB967.2050108@oddbird.net> I'll offer another TL;DR: * You prefer Model B, and the use cases that drove the implementation of datetime favored Model B. Great! I have zero problem with that, and zero problem with datetime continuing to implement Model B (thus I agree with you completely that by-default -- operator overloaded -- timeline arithmetic in datetime would be wrong and break its model). As with any library I use, I just want its objects to implement a consistent and simple-as-possible (but no simpler!) mental model so that I can reliably predict its behavior. I understand that it's too late for datetime to do that fully, but we can still keep it in mind as a principle to help guide future changes. On 09/07/2015 03:12 AM, Tim Peters wrote: > [Carl Meyer ] >> (tl;dr I think your latest proposal re PEP 495 is great.) > > I don't. The last two were less annoying, though ;-) "Great" here is thoroughly in context of "where we are today, and where it's feasible to go from here." Isn't that the context you keep trying to get me to think in? Keep up with my hats already! ;-) More on the PEP 495 options later on. >> Consider two models for the meaning of a "timezone-aware datetime >> object". Let's just call them Model A and Model B: > > In which context? Abstractly, or the context of Python's current > datetime module, or in the context of some hypothetical future Python > datetime module, or some datetime module that _might_ have existed > instead, or ...? Any of 1, 3, or 4. But the exercise is illuminating for question 2, also. Per what you say below, it sounds like my insistence on discussing abstract mental models and their implications has already helped nudge you towards a proposal that maintains Model B consistency better.
My preference for model A vs B is negligible compared to my preference for _some_ consistently-applied mental model, so I think that's "great." > My only real interest here is moving the module that actually exists > to one that can get conversions right in all cases, preferably in a > wholly backward-compatible way. Models don't really matter to that, > but specific behaviors do. I think the two most important questions you can ask about the behavior of any library are a) Does it apply a consistent mental model of the problem domain? and b) is that mental model applicable to the problems you need to solve? (Or perhaps it may offer more than one mental model, but clearly split in the API so you can decide which one applies best to your use cases). I can't really fathom an approach to library design (even library design constrained by backwards compatibility) that honestly believes "models don't really matter, but specific behaviors do." Models are critical in order to present a consistent set of behaviors that the user of the library can successfully predict, once they understand the model. >> In Model A, an aware datetime (in any timezone) is nothing more than an >> alternate (somewhat complexified for human use) spelling of a Unix >> timestamp, much like a timedelta is just a complexified spelling of some >> number of microseconds. > > A Python datetime is also just a complexified spelling of some number > of microseconds (since the start of 1 January 1 of the proleptic > Gregorian calendar). Which is a "naive time" concept, which is a pretty good sign that Python datetime wasn't intended to implement Model A. I thought it was already pretty clear that I'd figured that out by now :-) >> In this model, there's a bijection between aware datetimes in any >> two timezones. (This model requires the PEP 495 flag, >> or some equivalent. Technically, this model _could_ be implemented by >> simply storing a Unix timestamp and a timezone name, and doing all >> date/time calculations at display time.) In this model, "Nov 2 2014 >> 1:30am US/Eastern fold=1" and "Nov 2 2014 6:30am UTC" are just alternate >> spellings of the _same_ underlying timestamp. >> >> Characteristics of Model A: >> >> * There's no issue with comparisons or arithmetic involving datetimes in >> different timezones; they're all just Unix timestamps under the hood >> anyway, so ordering and arithmetic is always obvious and consistent: >> it's always equivalent to simple integer arithmetic with Unix timestamps. >> >> * Conversions between timezones are always unambiguous and lossless: >> they're just alternate spellings of the same integer, after all. >> >> * In this model, timeline arithmetic everywhere is the only option. > > Why? Because it's the only choice that doesn't break the mental model. If "all datetimes in any timezone are really just alternate spellings of a Unix timestamp", then adding X seconds to a datetime in any timezone must result in a datetime that represents a Unix timestamp that's X seconds later. _If you're within this mental model_. You may not prefer this mental model; you may think is less useful, or slower, or whatever, and that's fine. But you have to at least acknowledge that it is internally consistent and conceptually simple; it's fundamentally nothing more than arithmetic on POSIX timestamps, all the time and everywhere. I don't know how to say this any more clearly. If you still can't acknowledge that much, I think I have to give up. > The kind of arithmetic needed for a task depends on the task. 
> There are no specific use cases given here, so who can say? Some > tasks need to account for real-world durations; others need to > overlook irregularities in real-world durations (across zone > transitions) in order to maintain regularities between the > before-and-after calendar notations. Timeline arithmetic is only > directly useful for dealing with real-world durations as they affect > civil calendar notations. Some tasks require that, other tasks can't > tolerate that. Of course! I'm describing the implications of a mental model here, not arguing that it's the best model for all tasks. >> Every non-UTC aware datetime is just an alternate spelling of an >> equivalent UTC datetime / Unix timestamp, so in a certain sense you're >> always doing "arithmetic in UTC" (or "arithmetic with Unix timestamps"), >> but you can spell it in whichever timezone you like. In this model, >> there's very little reason to consider arithmetic in non-UTC timezones >> problematic; it's always consistent and predictable and gives exactly >> the same results as converting to UTC first. For sizable systems it may >> still be good practice to do everything internally in UTC and convert at >> the edges, but the reasons are not strong; mostly just avoiding >> interoperability issues with databases or other systems that don't >> implement the same model, or have poor timezone handling. > > How do you think timeline arithmetic is implemented? datetime's > motivating use cases overwhelmingly involved quick access to local > calendar notation, so datetime stores local calendar notation (both in > memory and in pickles) directly. Any non-toy implementation of > timeline arithmetic would store time internally in UTC ticks instead, > enduring expensive conversions to local calendar notation only when > explicitly demanded. As is, the only way to get timeline arithmetic > in datetime is to do some equivalent to converting to UTC first, doing > dirt simple arithmetic in UTC, then converting back to local calendar > notation. That's _horridly_ expensive in comparison. pytz doesn't > avoid this. The arithmetic itself is fast, because it is in fact > classic arithmetic. The expense is hidden in the .normalize() calls, > which perform to-UTC-and-back "repair". Yes, of course. I know all this. In summary: "datetime wasn't intended as Model A." How many times do we need to agree on that? ;-) And I've also agreed that datetime shouldn't be converted to Model A. So what are you trying to convince me of, here? >> * In this model, "classic" arithmetic doesn't even rise to the level of >> "attractive nuisance," it's simply "wrong arithmetic," because you get >> different results if working with the "same time" represented in >> different timezones, which violates the core axiom of the model; it's no >> longer simply arithmetic with Unix timestamps. > > Models are irrelevant to right or wrong; right or wrong can only be > judged with respect to use cases (does a gimmick address the required > task, or not? if so, "right"; if not, is it at least feasible to get > the job done? if so, "grr - but OK"; if still not, "wrong"). Models > can make _achieving_ "right" harder or easier, depending on what a use > case requires. Once again, you seem to be trying to interpret every characterization of Model A as an argument that "Model A is right, other models are wrong, and datetime ought to be Model A." 
I'm not saying any of that; which model is best obviously depends on the use case (though both models are _capable_ of handling all use cases, it just may be slower and less convenient. That's a typical set of tradeoffs when choosing a model). All I'm saying is "if you accept Model A as your mental model, this is the behavior that must follow (the behavior that is _right_ _for the model_; which _is_ something that is possible to judge), else you've broken the model, and you're implementing some other model instead, or (worse) you're not implementing a consistent model at all." >> I don't believe there's anything wrong with Model A. It's not the right >> model for _all_ tasks, but it's simple, easy to understand, fully >> consistent, and useful for many tasks. > > Sure! Except for the "simple" and "easy to understand" parts ;-) Maybe not to you, I guess; though I have to suspect that you're playing a little dumb here for effect (is this the jester hat?). I think "everything is isomorphic to a Unix timestamp, just represented in different spellings, and all arithmetic is isomorphic to integer arithmetic on Unix timestamps" is pretty simple and easy to understand, personally. > People really do trip all the time over zone transitions, Of course they do, because timezones, and timezone transitions specifically, are terrible. And some will continue to trip over them, in different ways and in different scenarios, regardless of whether they work in Model A or Model B. They will trip over them _more_ if they are using a library that can't decide what mental model it implements, and tries to guess that they mean one for this operation and another for that operation, than if they are using a library that consistently implements one mental model. Do we still agree on that, or not anymore? ;-) >> On the whole, it's still the model I find most intuitive and would prefer >> for most of the timezone code I personally write (and it's the one I actually >> use today in practice, because it's the model of pytz). > > Do you do much datetime _arithmetic_ in pytz? If you don't, the kind > of arithmetic you like is pretty much irrelevant ;-) But, if you do, > take pytz's own docs to heart: > > The preferred way of dealing with times is to always work in UTC, > converting to localtime only when generating output to be read > by humans. > > Your arithmetic-intensive code would run much faster if you followed > that advice, and you could throw out mountains of .normalize() calls. > You're working in Python, and even the storage format of Python > datetimes strongly favors classic arithmetic (as before, any serious > implementation of timeline arithmetic would store UTC ticks directly > instead). I do follow that advice; I don't believe my latest heavy-datetime-using application does non-UTC timeline arithmetic anywhere. But unless a library outlaws arithmetic on non-UTC datetimes altogether, I'd like it to implement it in a way that's consistent with its mental model, whichever one it picks. Because not all little scripts need to follow the ideal best practice and squeeze out optimal performance, but they nonetheless deserve predictable behavior that consistently implements _some_ mental model of the problem domain. >> Now Model B. In Model B, an "aware datetime" is a "clock face" or >> "naive" datetime with an annotation of which timezone it's in. 
A non-UTC >> aware datetime in model B doesn't inherently know what POSIX timestamp >> it corresponds to; that depends on concepts that are outside of its >> naive model of local time, in which time never jumps or goes backwards. >> Model B is what Guido was describing in his email about an aware >> datetime in 2020: he wants an aware datetime to mean "the calendar says >> June 3, the clock face says noon, and I'm located in US/Eastern" and >> nothing more. >> >> Characteristics of Model B: >> >> * Naive (or "classic", or "move the clock hands") arithmetic is the only >> kind that makes sense under Model B. > > It again depends on which specific use cases you have in mind. Few > people think inside a rigid model. Sometimes they want to break out > of the model, especially when a use case requires it ;-) As you know > all too well already, Python also intends to support a programmer > changing their mind, to view their annotated naive datetime as a > moment in civil time too, at least for zone conversion purposes. I'm all in favor of Python supporting a programmer switching from one mental model to another. There are good ways to do that explicitly, e.g. by representing each mental model with its own type of object. See JodaTime/NodaTime for one example. I'm not in favor of Python guessing that the programmer "probably" has one mental model in mind when doing one operation, and another when doing another, on the very same object. That kind of thing leads to angry programmers who think the library is buggy. You may have seen a few of them on this mailing list ;-) I thought we agreed on this (I recall you saying "how many times do we have to agree on this?"), but then it seems like you keep waffling as to whether you actually do or not. I guess it depends which hat you're wearing at the time ;-) ... >> These models aren't chosen arbitrarily; they're the two models I'm aware >> of for what a "timezone-aware datetime" could possibly mean that >> preserve consistent arithmetic and total ordering in their allowed >> domains (in Model A, all aware datetimes in any timezone can >> interoperate as a single domain; in Model B, each timezone is a separate >> domain). >> >> A great deal of this thread (including most of my earlier messages and, >> I think, even parts your last message here that I'm replying to) has >> consisted of proponents of one of these two models arguing that behavior >> from the other model is wrong or inferior or buggy (or an "attractive >> nuisance"). > > Direct overloaded-operator support for timeline arithmetic is an > attractive nuisance _in datetime_, or any other Python module sharing > datetime's data representation. I 100% agree with you. Datetime is a Model B implementation (mostly); its data representation reflects that, and I absolutely don't think it should have operator-overloaded support for timeline arithmetic. Was I insufficiently clear about that? Actually, I think "attractive nuisance" is too weak here. I think operator-overloaded timeline arithmetic on aware datetimes in datetime would be simply wrong; it would break the mental model of what an aware datetime is, under Model B. > I disagree with your "but the reasons > are not strong" above. It requires relatively enormous complexity and > expense to perform each lousy timeline addition, subtraction, and > comparison in a non-eternally-fixed-offset zone. "In datetime or a a module sharing datetime's data representation," yes. 
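For what "representing each mental model with its own type of object," in the NodaTime spirit Carl mentions, might look like in Python, here is a rough, hypothetical sketch. The Instant class and its methods are invented for illustration and belong to no real library; it stores UTC internally, so its operator arithmetic is always timeline arithmetic, and local zones only ever affect how it is spelled:

    from datetime import datetime, timedelta, timezone

    class Instant:
        """Hypothetical Model-A value: a point on the UTC timeline."""

        def __init__(self, aware_dt):
            # Store UTC "ticks"; the zone the caller used is just a spelling.
            self._utc = aware_dt.astimezone(timezone.utc)

        def __add__(self, delta):
            # Always timeline arithmetic: shift the underlying UTC moment.
            return Instant(self._utc + delta)

        def __sub__(self, other):
            # Elapsed real time between two instants.
            return self._utc - other._utc

        def in_zone(self, tz):
            """Render this instant in some zone; a spelling, not a new value."""
            return self._utc.astimezone(tz)

    i = Instant(datetime(2021, 11, 6, 12, 0, tzinfo=timezone(timedelta(hours=-4))))
    print((i + timedelta(days=1)).in_zone(timezone.utc))   # exactly 24 real hours later

Model-B-style "clock hand" work would then be done on plain naive datetimes, with the zone carried separately, so neither model has to guess what the other meant.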
My "but the reasons are not strong" was clearly specific to Model A, which datetime is not. I tried very hard to set up a clear delineation between the two models, and be very clear that I understand datetime is Model B and should remain that way. But nonetheless, you seem very determined to blur that line and interpret all my comments about Model A as if I'm saying they should apply to datetime. Please don't do that ;-) >> I now think these assertions are all wrong :-) Both models >> are reasonable and useful, and in fact both are capable enough to handle >> all operations, it's just a question of which operations they make >> simple. Model B people say "just do all your arithmetic and comparisons >> in UTC"; Model A people say "if you want Model B, just use naive >> datetimes and track the implied timezone separately." > > Do note that my _only_ complaint against timeline arithmetic is making > it seductively easy to spell in Python's datetime. Great! Then we agree, so can we stop arguing about it? ;-) I thought I was already pretty clear that I no longer believed that timeline arithmetic should be made easy to spell in Python's datetime. I just _also_ think that there _is_ a reasonable alternative mental model in which only timeline arithmetic makes sense and classic arithmetic looks buggy, and I thought that trying to clearly outline that alternative mental model might help make sense of where the "classic arithmetic is wrong!" viewpoint originates. >> If the above analysis makes any sense at all to anyone, and you think >> something along these lines (but shorter and more carefully edited) >> would make a useful addition to the datetime docs (either as a >> tutorial-style "intro to how datetime works and how to think about aware >> datetimes" or as an FAQ), I'd be very happy to write that patch. > > I've mentioned a few times before that I'd welcome something more akin > to the "floating-point surprises" appendix: > > https://docs.python.org/3/tutorial/floatingpoint.html > > Most users don't want to read anything about theory, but it needs to > be discussed sometimes. So in that appendix, the approach is to > introduce bite-sized chunks of theory to explain concrete, visible > _behaviors_, along with practical advice. The goal is to get the > reader unstuck, not to educate them _too_ much ;-) Anyway, that > appendix appears to have been effective at getting many users unstuck, > so I think it's a now-proven approach. That's very similar to what I had in mind, actually. I'll work on a doc patch, and look forward to you tearing it apart ;-) > >>> Classic arithmetic is equivalent to doing integer arithmetic on >>> integer POSIX timestamps (although with wider range the same across >>> all platforms, and extended to microsecond precision). That's hardly >>> novel - there's a deep and long history of doing exactly that in the >>> Unix(tm) world. Which is Guido's world. There "shouldn't be" >>> anything controversial about that. The direct predecessor was already >>> best practice in its world. How that could be considered a nuisance >>> seems a real strain to me. > >> If you are doing any kind of "integer arithmetic on POSIX timestamps", you >> are _always_ doing timeline arithmetic. > > True. > >> Classic arithmetic may be many things, but the one thing it definitively is >> _not_ is "arithmetic on POSIX timestamps." > > False. UTC is an eternally-fixed-offset zone. There are no > transitions to be accounted for in UTC. 
Classic and timeline > arithmetic are exactly the same thing in any eternally-fixed-offset > zone. Because POSIX timestamps _are_ "in UTC", any arithmetic > performed on one is being done in UTC too. Your illustration next > goes way beyond anything I could possibly read as doing arithmetic on > POSIX timestamps: Translation: "I refuse to countenance the possibility of Model A." >> This is easy to demonstrate: take one POSIX timestamp, convert it to >> some timezone with DST, add 86400 seconds to it (using "classic >> arithmetic") across a DST gap or fold, and then convert back to a POSIX >> timestamp, and note that you don't have a timestamp 86400 seconds away >> from the first timestamp. If you were doing simple "arithmetic on POSIX >> timestamps", such a result would not be possible. > > But you're cheating there. It's clear as mud what you have in mind, > concretely, for the _result_ of what you get from "convert it to > some timezone with DST", but the result of that can't possibly be a > POSIX timestamp: as you said at the start, a POSIX timestamp denotes > a number of seconds from the epoch _in UTC_ You're no longer in UTC. > You left the POSIX timestamp world at your very first step. So > anything you do after that is irrelevant to how arithmetic on POSIX > timestamps behaves. Not if your mental model is that an aware datetime in some other timezone is isomorphic to a POSIX timestamp with a timezone annotation. In that case, the "timezone conversion" part is really easy and obvious; you just change the timezone annotation. > BTW, how do you intend to do that conversion to begin with? C's > localtime() doesn't return time_t (a POSIX timestamp). The standard C > library supports no way to perform the conversion you described, > because that's not how times are intended to work in C, because in > turn the Unix world has the same approach to this as Python's > datetime: all timeline arithmetic is intended to be done in UTC > (equivalent to POSIX timestamps), converting to UTC first (C's > mktime()), then back when arithmetic is done (C's localtime()). The > only difference is that datetime spells both C library functions via > .astimezone(), and is about 1000 times easier to use ;-) > > If you're unfamiliar with how this stuff is done in C, here's a > typically incomprehensible ;-) man page briefly describing all the > main C time functions: > > http://linux.die.net/man/3/mktime Thank you. In exchange, here's a reference to the ZonedDateTime object from NodaTime: http://nodatime.org/1.3.x/api/html/T_NodaTime_ZonedDateTime.htm I think (notably unlike the C libraries) NodaTime/JodaTime is an excellent example of a datetime library that maintains its mental models clearly, and provides the necessary set of objects to represent all the various concepts unambiguously and consistently. I think its usability is attested to by the fact that it's become the de facto standard in the Java world, and somebody went to the trouble of porting it to .NET, too, where it's also become quite popular. >> In Model A (the one that Lennart and myself and Stuart and Chris have >> all been advocating during all these threads) timezone) are unambiguous >> representations of a POSIX timestamp, and all arithmetic is "arithmetic >> on POSIX timestamps." That right there is the definition of timeline arithmetic. > > Here's an example of arithmetic on POSIX timestamps: > > 1 + 2 > > returning 3. It's not some kind of equivalence relation or bijection, > it's concretely adding two integers to get a third integer. 
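Carl's demonstration above can also be written out literally. This sketch uses zoneinfo (Python 3.9+) for illustration, with an arbitrary zone and a date straddling a spring-forward gap:

    from datetime import datetime, timedelta
    from zoneinfo import ZoneInfo  # Python 3.9+; illustrative only

    eastern = ZoneInfo("America/New_York")

    # A POSIX timestamp the day before the 2021 US spring-forward gap.
    ts1 = datetime(2021, 3, 13, 12, 0, tzinfo=eastern).timestamp()

    # Convert to an aware datetime, add 86400 seconds with classic arithmetic,
    # then convert back to a POSIX timestamp.
    local = datetime.fromtimestamp(ts1, eastern)
    ts2 = (local + timedelta(seconds=86400)).timestamp()

    print(ts2 - ts1)   # 82800.0, not 86400.0: the local clock skipped an hour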
That's > all I mean by "arithmetic on POSIX timestamps". It's equally useful > for implementing classic or timeline arithmetic. The difference > between those isn't in the timestamp arithmetic, it's in how > conversions between integers and calendar notations are defined. > There does happen to be an obvious bijection between arithmetic on > (wide enough) POSIX timestamps and naive datetime arithmetic, which is > in turn trivially isomorphic to aware datetime arithmetic in UTC. > Although the "obvious" there depends on knowing first that, at heart, > a Python datetime is an integer count of microseconds since the start > of 1 January 1. It's just an integer stored in a bizarre mixed-radix > notation. So, "timeline arithmetic is just arithmetic on POSIX timestamps" means viewing aware datetimes as isomorphic to POSIX timestamps. "Classic arithmetic is just arithmetic on POSIX timestamps" means viewing aware datetimes as naive datetimes which one can pretend are in a hypothetical (maybe UTC, if you like) fixed-offset timezone which is isomorphic to actual POSIX timestamps (even though their actual timezone may not be fixed-offset). I accept that those are both true and useful in the implementation of their respective model. I just don't think either one is inherently obvious or useful as a justification of their respective mental models; rather, which one you find "obvious" just reveals your preferred mental model. ... >> I think your latest proposal for PEP 495 >> does a great job of providing this additional convenience for the user >> without killing the intra-timezone Model B consistency. I just wish that >> the inconsistent inter-timezone operations weren't supported at all, but >> I know it's about twelve years too late to do anything about that other >> than document some variant of "you shouldn't compare or do arithmetic >> with datetimes in different timezones; if you do you'll get inconsistent >> results in some cases around DST transitions. Convert to the same >> timezone first instead." > > Alas, I'm afraid Alex is right that people may well be using interzone > subtraction to do conversions already. For example, the timestamp > snippets I gave above are easily extended to convert any aware > datetime to a POSIX timestamp: just slap tzinfo=utc on the EPOCH > constant, and then by-magic interzone subtraction converts `dt` to UTC > automatically. For that to continue to work as intended in all cases > post-495, we can't change anything about interzone subtraction. > Which, for consistency between them, implies we "shouldn't" change > anything about interzone comparisons either. Such code wouldn't be any _more_ broken after PEP 495 in a fold case than it is already. You can't maintain consistency everywhere, because datetime already wants to treat aware datetimes as two different things in different places. I thought we'd established that. The interzone timeline arithmetic combined with intrazone classic arithmetic already results in inconsistencies. So your choices are: a) don't do PEP 495, and leave timezone conversions lossy for everyone (except people using pytz). This effectively forces everyone who wants loss-less conversions (and doesn't want to roll their own solution) into the pytz model, which you don't like, and the pytz API, which nobody likes. 
b) add `fold` solely as an argument to `astimezone` (and maybe `combine` and the constructor too?), and maybe somehow allow users to get its value out of a conversion going the other way (no idea what API would work there) and make the user keep track of it themselves if they are working in "local" time but may want to convert back later. This option forces the inconsistency out of datetime by just making it the user's problem. Usability is pretty bad, but at least it doesn't change existing behavior, gives users _some_ way to be correct, and doesn't guess at their intentions in inconsistent cases. c) spike your intra-timezone classic arithmetic with a dash of timeline arithmetic, making datetime even more confused about its mental model than it is already. d) don't support PEP 495 in interzone operations at all, meaning code using interzone operations gains no benefit from PEP 495, but is no more broken than it is today (but code using explicit timezone conversions does benefit) e) make interzone equality weird in fold cases, but otherwise support PEP 495 in interzone operations as well as conversions. I think (d) and (e) are the best options of those, and I don't have a strong preference between them. They aren't ideal, but there is no ideal option, including the "do nothing" option. All of these cases introduce inconsistency somewhere, it's just a question of where you want to put it. I'm personally not that fussed if you decide to stick with (a) instead. > I'm still not sure it's a net win to change anything . Lots of > tradeoffs. I do gratefully credit our exchanges for cementing my > hatred of muddying Model B: the more I had to "defend" Model B, the > more intense my determination to preserve its God-given honor at all > costs ;-) My work here is done ;-) Funny how it had roughly the opposite result from what I thought I wanted when I entered the conversation, but I still think it's the right result. >>> Although the conceptual fog has not really been an impediment to >>> using the module in my experience. > >>> In yours? Do you use datetime? If so, do you trip over this? > >> No, because I use pytz, in which there is no conceptual fog, just strict >> Model A (and an unfortunate API). > > And applications that apparently require no use whatsoever of dateutil > operations ;-) Oh, I use dateutil.rrule frequently, I just separate the tzinfo from the datetime first, which makes perfect sense to me as a way to say "Ok, I want to operate in the naive time model now, please." It's really not that hard :-) Please don't take my "I use pytz, so I don't have _these_ problems" as "I use pytz, so I have _no_ problems." I fully accept that pytz is a god-awful (though very impressive!) hack to implement Model A on top of something that was always meant to be Model B, and that results in both a bad API and bad performance for some operations (though the latter really couldn't be less of an issue for my uses). I'm still not sure what's a _better_ option than pytz for someone who wants fully-correct and round-trippable timezone conversions and fully-consistent behavior from a Python datetime library _today_. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From guido at python.org Mon Sep 7 19:06:19 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 7 Sep 2015 10:06:19 -0700 Subject: [Datetime-SIG] Timeline arithmetic? 
In-Reply-To: <55EDB967.2050108@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: FYI, I am still completely overwhelmed by this discussion. I will wait until Tim and Alexander tell me there's a PEP to review and then I'll read that. Carl: if you feel your position is not represented in that PEP (even under "rejected alternatives") I recommend that you write your own PEP. But I really hope that you all will come to an agreement without competing PEPs! -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Mon Sep 7 19:28:02 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 7 Sep 2015 11:28:02 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: <55EDC922.7050103@oddbird.net> On 09/07/2015 11:06 AM, Guido van Rossum wrote: > FYI, I am still completely overwhelmed by this discussion. I will wait > until Tim and Alexander tell me there's a PEP to review and then I'll > read that. Carl: if you feel your position is not represented in that > PEP (even under "rejected alternatives") I recommend that you write your > own PEP. But I really hope that you all will come to an agreement > without competing PEPs! Sure. At the moment I think PEP 495 is headed in a direction I support, relative to the other options available. So I don't have any plans for a competing PEP. My latest couple messages in this thread are more about figuring out the right framing for a documentation addition that might help people (like me) coming from a pytz-style model understand datetime's model (and specifically understand how "classic arithmetic" is not a bug). I think I finally understand it now, so I'd like to put that understanding to good use. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Mon Sep 7 19:28:25 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 7 Sep 2015 12:28:25 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: [Guido] > FYI, I am still completely overwhelmed by this discussion. I recommend that you skip any message with "timeline" in the Subject line ;-) Nobody is actually arguing to make timeline arithmetic (beyond what already exists) any part of PEP 495. But this is a "datetime" SIG, not a "PEP 495" SIG, so it's fair game to discuss it here. > I will wait until Tim and Alexander tell me there's a PEP to review and > then I'll read that. Carl: if you feel your position is not represented in that > PEP (even under "rejected alternatives") I recommend that you write > your own PEP. But I really hope that you all will come to an agreement > without competing PEPs! Short course: Carl prefers timeline arithmetic, but is not trying to change anything about what Python's datetime does by default. 
He would like a new kind of tzinfo that simultaneously fixes the conversion endcases _and_ forces use of timeline arithmetic for all operations Current code would neither be hurt nor helped, only code using the new tzinfos would see any difference. But current code trying to use a new tzinfo could break anywhere it relied on classic arithmetic. While I'm not entirely sure, best guess is that Carl would also prefer that 495 not be implemented. But his new kind of tzinfo could be implemented regardless. They don't really compete, except in the eternal battle over theoretical purity ;-) From carl at oddbird.net Mon Sep 7 19:37:12 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 7 Sep 2015 11:37:12 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: <55EDCB48.9010900@oddbird.net> On 09/07/2015 11:28 AM, Tim Peters wrote: > Short course: Carl prefers timeline arithmetic, but is not trying to > change anything about what Python's datetime does by default. He > would like a new kind of tzinfo that simultaneously fixes the > conversion endcases _and_ forces use of timeline arithmetic for all > operations Current code would neither be hurt nor helped, only code > using the new tzinfos would see any difference. But current code > trying to use a new tzinfo could break anywhere it relied on classic > arithmetic. I did propose that a couple days ago, and found the exercise of proposing it enlightening :-) but I don't even think that's a good idea anymore (as of yesterday, when I finally got my head fully around the internal consistency of the "naive local time" model). Trying to have both mental models implemented within datetime using different types of tzinfo would just confuse matters further. Different types of datetime would be a better bet, but that can just be a different library altogether. Better to have datetime be as true to its model as it can, and improve the intro docs so people assuming a timeline-arithmetic model can also get their heads around the naive-local-time model and do things the right way for that model. > While I'm not entirely sure, best guess is that Carl would also prefer > that 495 not be implemented. But his new kind of tzinfo could be > implemented regardless. They don't really compete, except in the > eternal battle over theoretical purity ;-) No, I would (weakly) prefer for PEP 495 to be accepted, as long as it chooses to push the required inconsistency into inter-timezone operations instead of breaking the consistency of classic arithmetic. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Mon Sep 7 20:43:29 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 7 Sep 2015 13:43:29 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55EDB967.2050108@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: All this violent agreement ;-) is sucking away all the free time I have. So I'm going to try something else: focus on a single _seeming_ disagreement that makes no sense to me. 
[Carl] >>> In Model A, an aware datetime (in any timezone) is nothing more than an >>> alternate (somewhat complexified for human use) spelling of a Unix >>> timestamp, much like a timedelta is just a complexified spelling of some >>> number of microseconds. [Tim] >> A Python datetime is also just a complexified spelling of some number >> of microseconds (since the start of 1 January 1 of the proleptic >> Gregorian calendar). [Carl] > Which is a "naive time" concept, which is a pretty good sign that Python > datetime wasn't intended to implement Model A. I thought it was already > pretty clear that I'd figured that out by now :-) So: - You tell me that in model A an aware datetime is a spelling of a Unix timestamp. - I tell you that a Python datetime is a spelling of a different flavor of timestamp. - You tell me that "means" Python is using a naive time concept, and wasn't intended to implement model A. Can you see why I'm baffled? If it needs to explained, it's even more baffling to me. So here goes anyway: Model A uses a very similar concept. Not identical, because: - The Unix timestamp takes 1970-1-1 as its epoch, while Python's takes 1-1-1. They nevertheless use exactly the same proleptic calendar system. - The Unix timestamp counts seconds, but Python's counts microseconds (on a platform where time_t is a floating type, a Unix timestamp can approximate decimal microseconds too, as fractions of a second). - The resolution and range of a Unix timestamp vary across platforms, but Python defines both. Where's a theoretically _significant_ difference? It's simply not true that viewing datetimes as timestamps has anything to do with drawing a distinction between your models A and B. An implementation of model A may or may not explicitly store the Unix timestamp it has in mind. From your statement that under model A it's a "complexified" spelling of a Unix timestamp, I have to assume you have in mind implementations where it's not explicitly stored. In which case it's exactly the same as in Python today: to _find_ that Unix timestamp, you need to convert your complexified spelling to UTC first. Perhaps the distinction you have in mind is that, under Model A, it's impossible to think of an aware datetime as being anything _other_ than a Unix timestamp? That may have been what your "nothing more" meant. Then, yes, there is that difference: Python doesn't intend to force any specific interpretation of its timestamps beyond that they're instants in the proleptic Gregorian calendar. Model A also views them as instants in the proleptic Gregorian calendar, but tacks on "and that calendar must always be viewed as being in (a proleptic extension of an approximation to) UTC". So maybe I understand you now after all. But, if so, are these kinds of seeming disagreements really worth resolving? It requires a seemingly unreasonable amount of time & effort to arrive at the obvious ;-) From guido at python.org Mon Sep 7 21:04:42 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 7 Sep 2015 12:04:42 -0700 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: Again, I can't follow this because I don't recall the definition of model A. But here's a fundamental difference between a timezone-aware datetime and a POSIX stamp (apart from epoch, range and precision). 
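Tim's bullet points above are easy to verify numerically: the two flavors of timestamp differ only in origin and unit. A small check, treating both notations naively (zones ignored entirely, which is all that's needed to see the epoch/unit relationship); the example value is arbitrary:

    from datetime import datetime, timedelta

    PY_EPOCH = datetime(1, 1, 1)        # proleptic Gregorian origin of Python's count
    UNIX_EPOCH = datetime(1970, 1, 1)   # same calendar, different origin

    dt = datetime(2015, 9, 7, 20, 43, 29)   # naive, whole seconds

    py_us = (dt - PY_EPOCH) // timedelta(microseconds=1)     # microseconds since 0001-01-01
    unix_s = (dt - UNIX_EPOCH) // timedelta(seconds=1)        # seconds since 1970-01-01
    offset_s = (UNIX_EPOCH - PY_EPOCH) // timedelta(seconds=1)

    # The two "timestamps" differ only by a constant offset and a unit.
    assert py_us == (unix_s + offset_s) * 1_000_000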
The difference applies only to "political" timezones, which may change offsets or DST rules. The difference is that an aware datetime says "in timezone Z, when the local clock says T". If T is in the future, politicians may change the mapping of T to UTC in Z. However, politics can't change the meaning of a POSIX timestamp. Even for T in the (distant) past the mapping may still change, when research finds that the rules for Z were different at some year in the past than they were presumed. So, to me, an aware datetime *fundamentally* differs from a POSIX timestamp, and even from a pair composed of a POSIX timestamp plus a tzinfo object. (POSIX timestamps are however embeddable in datetimes by using a fixed-offset tzinfo.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Mon Sep 7 21:38:41 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 7 Sep 2015 13:38:41 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: <55EDE7C1.5010903@oddbird.net> On 09/07/2015 12:43 PM, Tim Peters wrote: > [Carl] >>>> In Model A, an aware datetime (in any timezone) is nothing more than an >>>> alternate (somewhat complexified for human use) spelling of a Unix >>>> timestamp, much like a timedelta is just a complexified spelling of some >>>> number of microseconds. > > [Tim] >>> A Python datetime is also just a complexified spelling of some number >>> of microseconds (since the start of 1 January 1 of the proleptic >>> Gregorian calendar). > > [Carl] >> Which is a "naive time" concept, which is a pretty good sign that Python >> datetime wasn't intended to implement Model A. I thought it was already >> pretty clear that I'd figured that out by now :-) > > So: > > - You tell me that in model A an aware datetime is a spelling of a > Unix timestamp. > > - I tell you that a Python datetime is a spelling of a different > flavor of timestamp. > > - You tell me that "means" Python is using a naive time concept, and wasn't > intended to implement model A. > > Can you see why I'm baffled? If it needs to explained, it's even more > baffling to me. So here goes anyway: Model A uses a very similar > concept. Not identical, because: > > - The Unix timestamp takes 1970-1-1 as its epoch, while Python's takes 1-1-1. > They nevertheless use exactly the same proleptic calendar system. > > - The Unix timestamp counts seconds, but Python's counts microseconds (on > a platform where time_t is a floating type, a Unix timestamp can approximate > decimal microseconds too, as fractions of a second). > > - The resolution and range of a Unix timestamp vary across platforms, but Python > defines both. Right, but (as you know) those are all incidental to the actual distinction I was trying to make. > Where's a theoretically _significant_ difference? It's simply not > true that viewing datetimes as timestamps has anything to do with > drawing a distinction between your models A and B. The key difference is that a Unix timestamp defines a single instant in "real time" (or the UTC approximation of "real time," which is good enough), because the Unix epoch is defined to be in UTC. 
The point of even _having_ representations in other timezones (under Model A) is never to change that basic "real monotonic time" model, it's solely to get or parse a representation for the sake of a human (or some other computer system) living naively in that timezone. A Python datetime "timestamp," on the other hand, is "naive" or "timezone-relative." It doesn't define a single instant in real time until you pair it with an offset. The timestamp itself is timezone-relative (it's "the number of microseconds since datetime(1, 1, 1) in naive local time in whatever timezone we're currently in"). That's why doing integer arithmetic on this kind of timestamp does classic arithmetic instead of timeline arithmetic. That's a Model B understanding of what a non-UTC aware datetime represents. > An implementation of model A may or may not explicitly store the Unix > timestamp it has in mind. From your statement that under model A it's > a "complexified" spelling of a Unix timestamp, I have to assume you > have in mind implementations where it's not explicitly stored. In > which case it's exactly the same as in Python today: to _find_ that > Unix timestamp, you need to convert your complexified spelling to UTC > first. I intentionally didn't specify any implementation. In outlining the difference between Model A and Model B, I'm not concerned about implementation details; I'm concerned about the mental model of what an "aware datetime" represents (and thus what invariants you can expect it to keep once you grasp the model.) I think Model A and Model B do represent clear alternative mental models in that respect (regardless of how they are implemented, and what e.g. speed/size tradeoffs that may involve). In Model A, an aware datetime is always a single unambiguous instant in time (that is, isomorphic to UTC), and that alone tells you a lot about how to expect it to behave in terms of arithmetic, equality, etc (or even in "being stored across a zoneinfo update"). In Model B, an aware datetime is a local-clock time annotated with a timezone, and that gives you a different set of consistent expectations about how it should behave. > Perhaps the distinction you have in mind is that, under Model A, it's > impossible to think of an aware datetime as being anything _other_ > than a Unix timestamp? Yes, that's basically right. If you're working in Model A and you want to work in "local clock time", you strip off the timezone information and use an object representing simple naive clock time, with no timezone awareness at all. > That may have been what your "nothing more" > meant. Then, yes, there is that difference: Python doesn't intend to > force any specific interpretation of its timestamps beyond that > they're instants in the proleptic Gregorian calendar. According to my use of the term (which I borrowed from J/NodaTime) datetime's "timestamps" aren't really "instants" at all, in the sense that they don't (alone) tell you when something occurred in the real world (which is another way of saying that they don't map isomorphically to UTC, or any other monotonic representation of time). They represent a point in the (abstract) proleptic Gregorian calendar, which only represents an instant once paired with a UTC offset. > Model A also > views them as instants in the proleptic Gregorian calendar, but tacks > on "and that calendar must always be viewed as being in (a proleptic > extension of an approximation to) UTC". I think I understand what you mean here. 
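Carl's "timezone-relative timestamp" description matches what the stdlib actually does: same-zone arithmetic acts only on the naive calendar fields and never consults the offset. A quick check (zoneinfo is used for illustration; any tzinfo behaves the same way here):

    from datetime import datetime, timedelta
    from zoneinfo import ZoneInfo  # illustrative only

    eastern = ZoneInfo("America/New_York")
    dt = datetime(2021, 11, 6, 12, 0, tzinfo=eastern)
    step = timedelta(days=1)

    # Classic addition is equivalent to stripping the tzinfo, adding to the
    # naive fields, and re-attaching the same tzinfo object.
    assert (dt + step).replace(tzinfo=None) == dt.replace(tzinfo=None) + step
    assert (dt + step).tzinfo is dt.tzinfo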
I would say that both Model A and Model B have an equally opinionated interpretation of what an aware datetime represents, but it's true that Model A's interpretation requires it to carry enough information (in some form) to always be isomorphic to UTC, whereas Model B doesn't require it to carry that much information. What Python actually _does_ is a bit more muddled, as we've both said many times, because sometimes it acts like Model B (intra-zone) and sometimes like Model A (inter-zone). I think that's unfortunate, because it results in arithmetic and ordering inconsistencies, and headaches like the ones you're having with PEP 495. But I've accepted that Python wants _more_ to be Model B than Model A, so it's best to just discourage use of the "magic" interzone operations and be consistently Model B everywhere else, rather than finding a way (like my earlier "strict tzinfo" proposal tried to) to arrive at an implementation that's consistently Model A. > So maybe I understand you now after all. But, if so, are these kinds > of seeming disagreements really worth resolving? It requires a > seemingly unreasonable amount of time & effort to arrive at the > obvious ;-) Well, perhaps all of this was always obvious to you, in which case I do apologize for wasting so much of your time! But it _seemed_ to me that we had proponents of both Model A and Model B in this mailing list, almost entirely talking past each other, and that trying to outline how each one is a consistent and usable model on its own terms might help proponents of both to at least understand the other better. It helped me understand the benefits of Model B, anyway. I'm curious if it made any sense to Chris, if he's still following this thread. I'm still hopeful of leveraging that understanding into something useful for the docs. Sorry if it didn't help you :/ I certainly don't want to keep wasting your time, so I'm happy to leave it here. Thanks for the discussion; it's been useful to me, and I appreciate your time. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Mon Sep 7 21:42:40 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 7 Sep 2015 13:42:40 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: <55EDE8B0.4020103@oddbird.net> On 09/07/2015 01:04 PM, Guido van Rossum wrote: > Again, I can't follow this because I don't recall the definition of > model A. But here's a fundamental difference between a timezone-aware > datetime and a POSIX stamp (apart from epoch, range and precision). The > difference applies only to "political" timezones, which may change > offsets or DST rules. The difference is that an aware datetime says "in > timezone Z, when the local clock says T". If T is in the future, > politicians may change the mapping of T to UTC in Z. However, politics > can't change the meaning of a POSIX timestamp. Even for T in the > (distant) past the mapping may still change, when research finds that > the rules for Z were different at some year in the past than they were > presumed. So, to me, an aware datetime *fundamentally* differs from a > POSIX timestamp, and even from a pair composed of a POSIX timestamp plus > a tzinfo object. 
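The "sometimes Model B, sometimes Model A" muddle Carl describes is directly observable. A sketch, again with zoneinfo and an arbitrary zone and date spanning a fall-back transition:

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo  # illustrative only

    eastern = ZoneInfo("America/New_York")
    utc = timezone.utc

    a = datetime(2021, 11, 6, 12, 0, tzinfo=eastern)
    b = a + timedelta(days=1)          # classic: same wall clock, next day

    # Intra-zone subtraction is classic (Model B): exactly "one day".
    print(b - a)                       # 1 day, 0:00:00

    # Inter-zone subtraction goes through UTC (Model A): 25 real hours.
    print(b - a.astimezone(utc))       # 1 day, 1:00:00

    # Inter-zone equality and hashing are also UTC-based.
    assert a == a.astimezone(utc)
    assert hash(a) == hash(a.astimezone(utc))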
(POSIX timestamps are however embeddable in datetimes > by using a fixed-offset tzinfo.) Yes, that's a great description of the precise difference that I've been trying to describe. Thanks. (In an attempt to use totally value-neutral terms, I called the "POSIX timestamp" model "Model A" and the "clock time plus a timezone" -- what a Python aware datetime is -- "Model B". That probably just introduced even more confusion.) Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Mon Sep 7 22:04:58 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 7 Sep 2015 15:04:58 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: [Guido] > Again, I can't follow this because I don't recall the definition of model A. Pretty much that an aware datetime is exactly and only a spelling of a POSIX timestamp. Various things follow from that, such that timeline arithmetic is overwhelmingly most natural in that model. > But here's a fundamental difference between a timezone-aware datetime and a > POSIX stamp (apart from epoch, range and precision). The difference applies > only to "political" timezones, which may change offsets or DST rules. The > difference is that an aware datetime says "in timezone Z, when the local > clock says T". If T is in the future, politicians may change the mapping of > T to UTC in Z. However, politics can't change the meaning of a POSIX > timestamp. Even for T in the (distant) past the mapping may still change, > when research finds that the rules for Z were different at some year in the > past than they were presumed. So, to me, an aware datetime *fundamentally* > differs from a POSIX timestamp, and even from a pair composed of a POSIX > timestamp plus a tzinfo object. The last is unclear to me, unless it's a conceptual distinction with no visible consequences. An aware datetime _is_ a pair, and there's a natural bijection between naive datetimes and POSIX timestamps (across all instants both can represent). That a time_t is "in UTC" is as inconsequential for this purpose as that to compute 3+1 I happen to have 3 turtles in mind rather than the distance in meters to my refrigerator ;-) I do see that it's useless conceptual baggage (even potentially misleading) to drag UTC into it at all. > (POSIX timestamps are however embeddable in datetimes by using a fixed-offset tzinfo.) Or use a naive datetime, for all practical purposes. From carl at oddbird.net Mon Sep 7 22:50:42 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 7 Sep 2015 14:50:42 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: > On Sep 7, 2015, at 2:04 PM, Tim Peters wrote: > > [Guido] >> But here's a fundamental difference between a timezone-aware datetime and a >> POSIX stamp (apart from epoch, range and precision). The difference applies >> only to "political" timezones, which may change offsets or DST rules. The >> difference is that an aware datetime says "in timezone Z, when the local >> clock says T". If T is in the future, politicians may change the mapping of >> T to UTC in Z. 
However, politics can't change the meaning of a POSIX >> timestamp. Even for T in the (distant) past the mapping may still change, >> when research finds that the rules for Z were different at some year in the >> past than they were presumed. So, to me, an aware datetime *fundamentally* >> differs from a POSIX timestamp, and even from a pair composed of a POSIX >> timestamp plus a tzinfo object. > > The last is unclear to me, unless it's a conceptual distinction with > no visible consequences. A <POSIX timestamp, tzinfo> pair is what I've been calling a "model A aware datetime." A <naive datetime, tzinfo> pair is what I've been calling a "model B aware datetime." There are many visible differences if you assume that in both cases you do simple integer arithmetic and comparisons on the time stamp component. > An aware datetime _is_ a <naive datetime, tzinfo> pair, and there's a natural bijection between naive datetimes > and POSIX timestamps (across all instants both can represent). I don't understand this, and I suspect it's at the heart of our misunderstanding. I would say there are many possible bijections between naive datetimes and posix time stamps, one corresponding to every possible UTC offset. (Or if you allow that a naive datetime may represent a time in a zone with a non-fixed offset, there may not be a bijection to posix time stamps at all). How do you decide which one is "natural"? Without the offset, you don't know how to compare a naive datetime to an instant expressed as a posix time stamp, meaning you don't actually know what instant it represents. > That a > time_t is "in UTC" is as inconsequential for this purpose as that to > compute 3+1 I happen to have 3 turtles in mind rather than the > distance in meters to my refrigerator ;-) I do see that it's useless > conceptual baggage (even potentially misleading) to drag UTC into it > at all. > > >> (POSIX timestamps are however embeddable in datetimes by using a fixed-offset tzinfo.) > > Or use a naive datetime, for all practical purposes. > Conceptually, sure, if you're willing to assume an implied fixed offset timezone. "For all practical purposes," no, because the _practical_ purpose of a model A tz-aware datetime is to always be able to easily and unambiguously ask it "how do you spell yourself in timezone X." Carl From guido at python.org Mon Sep 7 23:04:19 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 7 Sep 2015 14:04:19 -0700 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: On Mon, Sep 7, 2015 at 1:04 PM, Tim Peters wrote: > [Guido] > > Again, I can't follow this because I don't recall the definition of > model A. > > Pretty much that an aware datetime is exactly and only a spelling of a > POSIX timestamp. Various things follow from that, such that timeline > arithmetic is overwhelmingly most natural in that model. > OK. I'll just remember "model A bad, model B good." :-) Or, perhaps more fairly, "model A is how pytz thinks, model B is how the stdlib thinks." > > But here's a fundamental difference between a timezone-aware datetime > and a > > POSIX stamp (apart from epoch, range and precision). The difference > applies > > only to "political" timezones, which may change offsets or DST rules. The > > difference is that an aware datetime says "in timezone Z, when the local > > clock says T". If T is in the future, politicians may change the mapping > of > > T to UTC in Z.
However, politics can't change the meaning of a POSIX > > timestamp. Even for T in the (distant) past the mapping may still change, > > when research finds that the rules for Z were different at some year in > the > > past than they were presumed. So, to me, an aware datetime > *fundamentally* > > differs from a POSIX timestamp, and even from a pair composed of a POSIX > > timestamp plus a tzinfo object. > > The last is unclear to me, unless it's a conceptual distinction with > no visible consequences. An aware datetime _is_ a tzinfo> pair, and there's a natural bijection between naive datetimes > and POSIX timestamps (across all instants both can represent). That a > time_t is "in UTC" is as inconsequential for this purpose as that to > compute 3+1 I happen to have 3 turtles in mind rather than the > distance in meters to my refrigerator ;-) I do see that it's useless > conceptual baggage (even potentially misleading) to drag UTC into it > at all. > OK, you nerd-sniped me. :-) In my view it *is* important that a time_t references UTC. Using a time_t to store a non-UTC timestamp feels as wrong to me as using it to store a number of turtles (even though I know there is code that does this). OTOH a naive timestamp does not have this prejudice towards UTC -- it *could* refer to UTC (e.g. when it's returned from utcnow()) or to local time (e.g. from now()) or to some other timezone that is only inferred from the context. (A struct tm also doesn't have this prejudice to me.) Anyways, when I say "a (POSIX timestamp, tzinfo) tuple", the way I think of it is that when I ask "what does the local clock say" this uses a mapping from POSIX timestamp to that tzinfo. But when I say "a (naive datetime, tzinfo) tuple", I assume the naive datetime to be what the local clock says, so the tzinfo is only needed when I ask "what time is it in another timezone". Next, whatever the future of UTC relative to TAI or other time standards, I expect that UTC will continue to approximate mean solar time somewhere in Greenwich(*), and I expect that the vast majority of other timezones will continue to be defined in terms of offsets from UTC (and typically in whole hours). But I expect that the exact definition of many local timezones will continue to be modified by local politicians, and as a consequence I cannot be *sure* what UTC will be at noon on June 3rd 2020 in the US/Eastern timezone. But I *can* be (tautologically) sure what the local clock will say: 12:00:00. And what I intend by all this is that when I pickle or otherwise persist that particular datetime, I want to be sure that it records the naive local time and the timezone, not the UTC time and the timezone. (Also, I want it to record the timezone in a way that if I unpickle it years from now, it will reference the US/Eastern timezone as it is defined at that time -- I don't want it to reference a copy of the timezone rules at the time I pickled it. This is similar to how globals such as classes and functions are pickled by reference.) I should also mention that this only matters when you persist an aware datetime and restore it later. I don't think we should worry about timezone definitions to be mutable within a process (though if processes were to have expected lifetimes measured in years you might have to worry about this -- but that worry is derived from more general worries about software upgrades over such timescales). > > (POSIX timestamps are however embeddable in datetimes by using a > fixed-offset tzinfo.) 
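Guido's pickling point above appears to be what you get today from the stdlib plus a by-key zone implementation: datetime pickles its wall-clock fields along with the tzinfo object, and zoneinfo objects (Python 3.9+, PEP 615) pickle by key, so the offset is recomputed under whatever rules the unpickling process has. A sketch; the zone and date mirror his example but are otherwise arbitrary:

    import pickle
    from datetime import datetime
    from zoneinfo import ZoneInfo  # Python 3.9+; pickles by key per PEP 615

    dt = datetime(2020, 6, 3, 12, 0, tzinfo=ZoneInfo("America/New_York"))
    rt = pickle.loads(pickle.dumps(dt))

    # The wall-clock fields and the zone *name* round-trip; the UTC offset is
    # recomputed from whatever rules the unpickling process has for that zone.
    assert rt.replace(tzinfo=None) == dt.replace(tzinfo=None)
    assert rt.tzinfo.key == "America/New_York"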
> > Or use a naive datetime, for all practical purposes. > As long as the naive datetime is specified in UTC. :-) (*) I visited the Royal Observatory this summer, and learned that there are a number of different competing meridians. It's fascinating to realize that as early as the 19th century astronomers cared about the location of their telescopes to within meters: https://en.wikipedia.org/wiki/United_Kingdom_Ordnance_Survey_Zero_Meridian . -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Mon Sep 7 23:44:39 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 7 Sep 2015 16:44:39 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: [Tim] >> An aware datetime _is_ a > tzinfo> pair, and there's a natural bijection between naive datetimes >> and POSIX timestamps (across all instants both can represent). [Carl] > I don't understand this, and I suspect it's at the heart of our > misunderstanding. I would say there are many possible bijections .... "Natural" bijection. I gave you very simple Python code implementing that bijection already. A naive datetime represents an instant in the proleptic Gregorian calendar. So does a POSIX timestamp. In POSIX, the relationship between a timestamp and calendar notation is defined by the C expression ("/" is truncating integer division): timestamp = tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 + (tm_year-70)*31536000 + ((tm_year-69)/4)*86400 - ((tm_year-1)/100)*86400 + ((tm_year+299)/400)*86400 The natural bijection, between naive datetimes and POSIX timestamps, is the bijection in which a naive datetime maps to/from the POSIX timestamp such that the naive datetime's calendar notation is exactly equal to the POSIX calendar notation corresponding to that POSIX timestamp as defined by the expression above. Any other bijection is strained in comparison, hence "unnatural". Natural doesn't necessarily mean unique (although it does in this specific case - there is only one bijection satisfying the above); "natural" is more related to Occam's Razor ;-) ... >>> (POSIX timestamps are however embeddable in datetimes by using a fixed-offset tzinfo.) >> Or use a naive datetime, for all practical purposes. > Conceptually, sure, if you're willing to assume an implied > fixed offset timezone. "For all practical purposes," no, because > the _practical_ purpose of a model A tz-aware datetime is > to always be able to easily and unambiguously ask it "how > do you spell yourself in timezone X." Guido wasn't talking about any of that, and neither was I. He was talking about "embedding". That's passive with respect to thing being embedded. Of course it's possible to "embed" a POSIX timestamp in a naive datetime - for the purpose of being embedded, it's just a frickin' integer ;-) From carl at oddbird.net Tue Sep 8 01:52:28 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 7 Sep 2015 17:52:28 -0600 Subject: [Datetime-SIG] Timeline arithmetic? 
In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: <55EE233C.1020307@oddbird.net> On 09/07/2015 03:44 PM, Tim Peters wrote: > [Tim] >>> An aware datetime _is_ a >> tzinfo> pair, and there's a natural bijection between naive datetimes >>> and POSIX timestamps (across all instants both can represent). > [Carl] >> I don't understand this, and I suspect it's at the heart of our >> misunderstanding. I would say there are many possible bijections .... [Tim] > "Natural" bijection. I gave you very simple Python code implementing > that bijection already. A naive datetime represents an instant in the > proleptic Gregorian calendar. What is your definition of "instant" here? I don't think a naive datetime represents an instant at all; it represents a range of possible instants, depending which timezone that naive datetime is interpreted in. Without an offset, who knows which instant it might represent. > So does a POSIX timestamp. In POSIX, > the relationship between a timestamp and calendar notation is defined > by the C expression ("/" is truncating integer division): > > timestamp = tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 + > (tm_year-70)*31536000 + ((tm_year-69)/4)*86400 - > ((tm_year-1)/100)*86400 + ((tm_year+299)/400)*86400 > > The natural bijection, between naive datetimes and POSIX timestamps, > is the bijection in which a naive datetime maps to/from the POSIX > timestamp such that > > the naive datetime's calendar notation > is exactly equal to > the POSIX calendar notation > corresponding to that POSIX timestamp > as defined by the expression above. > > Any other bijection is strained in comparison, hence "unnatural". > Natural doesn't necessarily mean unique (although it does in this > specific case - there is only one bijection satisfying the above); > "natural" is more related to Occam's Razor ;-) Ok, sure, because POSIX is defined in terms of the Gregorian calendar in UTC, if you have(for some reason) _must_ compare a naive datetime to a POSIX timestamp, it's simplest to assume the naive datetime is also in UTC, so that their Gregorian calendars line up with no offset. I buy that's "most natural" of the available bijections in some sense, but I'm missing the "so what?" Under what circumstances is it reasonable to make that assumption about a naive datetime? Rather than saying "a naive datetime simply doesn't correspond to any particular POSIX timestamp; they aren't comparable at all unless you have additional information," which is what I'd say. I mean, I certainly hope you wouldn't want datetime to make `utcdt - naivedt` a defined operation where it's assumed the naive datetime is UTC. [Guido] >>>> (POSIX timestamps are however embeddable in datetimes by using a fixed-offset tzinfo.) [Tim] >>> Or use a naive datetime, for all practical purposes. [Carl] >> Conceptually, sure, if you're willing to assume an implied >> fixed offset timezone. "For all practical purposes," no, because >> the _practical_ purpose of a model A tz-aware datetime is >> to always be able to easily and unambiguously ask it "how >> do you spell yourself in timezone X." [Tim] > Guido wasn't talking about any of that, and neither was I. He was > talking about "embedding". That's passive with respect to thing being > embedded. Of course it's possible to "embed" a POSIX timestamp in a > naive datetime - for the purpose of being embedded, it's just a > frickin' integer ;-) Yes, of course. 
Sorry, I missed the context. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Tue Sep 8 02:00:01 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 7 Sep 2015 18:00:01 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: <55EE2501.6090901@oddbird.net> [Guido] > OK. I'll just remember "model A bad, model B good." :-) Fine by me. :-) > Or, perhaps more fairly, "model A is how pytz thinks, model B is how the > stdlib thinks." We'd be in better shape if it were that simple. pytz is strictly model A. Unfortunately the stdlib isn't consistent in how it thinks (short version: because __hash__ and cross-timezone equality and arithmetic implicitly treat aware datetimes as if they were unambiguous model A instants, when they aren't), and that's the root of all the difficulty with PEP 495. (I can give a longer explanation of _why_ that causes difficulty with PEP 495 if you want it, or you can go back and read the last few threads in detail, or you can just wait for the PEP :-) ). Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Tue Sep 8 03:54:54 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 7 Sep 2015 20:54:54 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55EE233C.1020307@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> <55EE233C.1020307@oddbird.net> Message-ID: [Tim] >>>> An aware datetime _is_ a >>> tzinfo> pair, and there's a natural bijection between naive datetimes >>>> and POSIX timestamps (across all instants both can represent). [Carl] >>> I don't understand this, and I suspect it's at the heart of our >>> misunderstanding. I would say there are many possible bijections .... [Tim] >> "Natural" bijection. I gave you very simple Python code implementing >> that bijection already. A naive datetime represents an instant in the >> proleptic Gregorian calendar. [Carl] > What is your definition of "instant" here? I didn't need one - Occam's Razor again ;-) To establish a bijection, all that's required is to show that a proposed function meets all the formal requirements. I couldn't care less whether it does or doesn't fit in with anyone's mental model, including my own. "Represents an instance" was just vague English motivation for what followed. The bijection was wholly defined by the latter, and never mentioned "instant". If it meets what someone _wants_ to think "an instant" means. fine; if not, also fine. Whether a proposed function is in fact a bijection has nothing to do with anyone's opinion of what "an instant" means, should mean, or must not mean. But if you can't leave that alone, here: by "an instant in the proleptic Gregorian calendar", I mean any 5-tuple of integers that meets the defined (by POSIX) requirements for a valid struct tm's tm_sec, tm_min, tm_hour, tm_yday. and tm_year members. > I don't think a naive datetime represents an instant at all; Fine by me - and by Python. Also fine if you _never_ use a naive datetime. 
> it represents a range of possible instants, Heh - I see you haven't defined what _you_ mean by "instant". When you do, please be sure it's consistent with what POSIX says here too: The relationship between the actual time of day and the current value for seconds since the Epoch is unspecified. How any changes to the value of seconds since the Epoch are made to align to a desired relationship with the current actual time is implementation-defined. As represented in seconds since the Epoch, each and every day shall be accounted for by exactly 86400 seconds. While you're at it, define a clean model in which all that makes a lick of sense to a casual user ;-) > depending which timezone that naive datetime is interpreted > in. Without an offset, who knows which instant it might represent. I understand much of it is at odds with Model A. I also understand that some datetime libraries for other languages supply different types for different purposes. That's fine by me too But we're on a Python datetime mailing list, so in the absence of explicit statements to the contrary, it makes most sense here to assume Python's datetime is being discussed on its own terms. >> So does a POSIX timestamp. In POSIX, >> the relationship between a timestamp and calendar notation is defined >> by the C expression ("/" is truncating integer division): >> >> timestamp = tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 + >> (tm_year-70)*31536000 + ((tm_year-69)/4)*86400 - >> ((tm_year-1)/100)*86400 + ((tm_year+299)/400)*86400 >> >> The natural bijection, between naive datetimes and POSIX timestamps, >> is the bijection in which a naive datetime maps to/from the POSIX >> timestamp such that >> >> the naive datetime's calendar notation >> is exactly equal to >> the POSIX calendar notation >> corresponding to that POSIX timestamp >> as defined by the expression above. >> >> Any other bijection is strained in comparison, hence "unnatural". >> Natural doesn't necessarily mean unique (although it does in this >> specific case - there is only one bijection satisfying the above); >> "natural" is more related to Occam's Razor ;-) > Ok, sure, because POSIX is defined in terms of the Gregorian calendar in > UTC, if you have(for some reason) _must_ compare a naive datetime to a > POSIX timestamp, it's simplest to assume the naive datetime is also in > UTC, so that their Gregorian calendars line up with no offset. It does happen to be an order-preserving bijection. But I said nothing in the quote about comparing anything apart from comparing pairs of integers (not timestamps, and not datetimes - just the little integers in the two calendar notations) for equality. > I buy that's "most natural" of the available bijections in some sense, but I'm > missing the "so what?" The "so what?", in context, was to tweak Guido about saying an aware datetime is fundamentally different from a pair, despite that the space of such pairs is isomorphic to the space of aware datetimes (which _is_ the space of pairs) under the natural naive_datetime <-> timestamp bijection. Why is that setting _you_ off? Guido handled it just fine ;-) > Under what circumstances is it reasonable to make that assumption > about a naive datetime? Any use case where it's convenient That's up to the user. not me - or you. For example, before Python grew its builtin datetime.timezone.utc implementation of a UTC class, I routinely used naive datetimes I thought of as being in UTC. I was too lazy to remember where I hid my own UTC class. No problem. 
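(For concreteness, here is a rough sketch of the kind of "very simple
Python code" that natural bijection amounts to - not necessarily the exact
code referred to earlier, and sticking to whole seconds:

    from datetime import datetime, timedelta

    _EPOCH = datetime(1970, 1, 1)  # naive; fields read per the POSIX formula

    def naive_to_timestamp(dt):
        # naive datetime -> the POSIX timestamp with the same calendar notation
        return (dt - _EPOCH) // timedelta(seconds=1)

    def timestamp_to_naive(ts):
        # POSIX timestamp -> the naive datetime with that calendar notation
        return _EPOCH + timedelta(seconds=ts)

It round-trips in both directions, and order is preserved along the way.)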
> Rather than saying "a naive datetime simply doesn't correspond to
> any particular POSIX timestamp; they aren't comparable at all unless
> you have additional information," which is what I'd say.

I'm starting to suspect you didn't design datetime ;-)  In context, I
was replying to Guido, who was talking about Python.  In Python's
datetime, naive datetimes are comparable.  Naive time has no _concept_
of time zone.  Naive datetimes nevertheless have a notion of total
order, which is isomorphic to the POSIX timestamp notion of total
order under the natural bijection.  Likewise for arithmetic, etc.
There's nothing "wrong" about exploiting any of that when it's
convenient.

> I mean, I certainly hope you wouldn't want datetime to make `utcdt -
> naivedt` a defined operation where it's assumed the naive datetime is UTC.

Certainly not.  That _would_ be wrong ;-)

From alexander.belopolsky at gmail.com Tue Sep 8 03:57:12 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 7 Sep 2015 21:57:12 -0400
Subject: [Datetime-SIG] PEP 495: What's left to resolve
Message-ID: 

The good news is that, other than a few editorial changes, there is only
one issue which keeps me from declaring PEP 495 complete.  The bad news is
that the remaining issue is subtle and, while several solutions have been
proposed, none stands out as obviously right.

The Problem
-----------

PEP 495 requires that the value of the fold attribute is ignored when two
aware datetime objects that share tzinfo are compared.  This is motivated
by backward compatibility: we want the value of fold to only matter in
conversions from one zone to another and not in arithmetic within a single
timezone.

As Tim pointed out, this rule is in conflict with the only requirement
that a hash function must satisfy: if two objects compare as equal, their
hashes should be equal as well.

Let t0 and t1 be two times in the fold that differ only by the value of
their fold attribute: t0.fold == 0, t1.fold == 1.  Let u0 =
t0.astimezone(utc) and u1 = t1.astimezone(utc).  PEP 495 requires that
u0 < u1.  (In fact, the main purpose of the PEP is to disambiguate between
t0 and t1 so that conversion to UTC is well defined.)  However, by the
current PEP 495 rules, t0 == t1 is True, and by the pre-PEP rule (and the
PEP rule that fold is ignored in comparisons) we also have t0 == u0 and
t1 == u1.  So, we have (a) a violation of the transitivity of ==:
u0 == t0 == t1 == u1 does not imply u0 == u1, which is bad enough by
itself, and (b) since hash(u0) can be equal to hash(u1) only by a lucky
coincidence, the rule "equality of objects implies equality of hashes"
leads to a contradiction: applying it to the chain u0 == t0 == t1 == u1,
we get hash(u0) == hash(t0) == hash(t1) == hash(u1), which is now a chain
of equalities of integers, and on integers == is transitive, so we have
hash(u0) == hash(u1), which as we said can only happen by a lucky
coincidence.

The Root of the Problem
-----------------------

The rules of arithmetic on aware datetime objects already cause some basic
mathematical identities to break.  The problem described above is avoided
by not having a way to represent u1 in the timezone where u0 and u1 map to
the same local time.
We still have the surprising situation that u0 < u1 while
u0.astimezone(local) == u1.astimezone(local), but it does not rise to the
level of a hash invariant violation because u0.astimezone(local) and
u1.astimezone(local) are not only equal: they are identical in all other
ways, and if we convert them back to UTC - they both convert to u0.

The root of the hash problem is not in the "t0 == t1 is True" rule.  It is
in u0 == t0.  The latter equality is just too fragile: if you add
timedelta(hours=1) to both sides of this equation, then (assuming an
ordinary 1 hour fall-back fold) you will get two datetime objects that are
no longer equal.  (Indeed, local to utc equality t == u is defined as
t - t.utcoffset() == u.replace(tzinfo=t.tzinfo), but when you add 1 hour
to t0, utcoffset() changes, so the equality that held for t0 and u0 will
no longer hold for t0 + timedelta(hours=1) and u0 + timedelta(hours=1).)

PEP 495 gives us a way to break the u0 == t0 equality by replacing t0 with
an "equal" object t1 and simultaneously have u0 == t0, t0 == t1 and
t1 != u0.

The Solutions
-------------

Tim suggested several solutions to this problem, but by his own admission
none is more than "grudgingly acceptable."  For completeness, I will also
present my "non-solution."

Solution 0: Ignore the problem.  Since PEP 495 does not by itself
introduce any tzinfo implementations with variable utcoffset(), it does
not create a hash invariant violation.  I call this a non-solution because
it would once again punt an unsolvable problem to tzinfo implementors.  It
is unsolvable for *them* because without some variant of the rejected PEP
500, they will have no control over datetime comparisons or hashing.

Solution 1: Make t1 > t0.

Solution 2: Leave t1 == t0, but make t1 != u1.

Request for Comments
--------------------

I will not discuss the pros and cons of the two solutions because my goal
here was only to state the problem, identify the root cause and indicate
the possible solutions.  Those interested in details can read Tim's
excellent explanations in the "Another round on error-checking" [1] and
"Another approach to 495's glitches" [2] threads.

I "bcc" python-dev in the hope that someone in the expanded forum will
either say "of course solution N is the right one and here is why" or
"here is an obviously right solution - how could you guys miss it."

[1]: https://mail.python.org/pipermail/datetime-sig/2015-September/000622.html
[2]: https://mail.python.org/pipermail/datetime-sig/2015-September/000716.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tim.peters at gmail.com Tue Sep 8 04:25:28 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 7 Sep 2015 21:25:28 -0500
Subject: [Datetime-SIG] Timeline arithmetic?
In-Reply-To: <55EE2501.6090901@oddbird.net>
References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net>
 <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net>
 <55EE2501.6090901@oddbird.net>
Message-ID: 

[Guido]
>> OK. I'll just remember "model A bad, model B good." :-)

[Carl]
> Fine by me. :-)

That's the spirit!  We'll have you chugging Dutch Kool-Aid yet ;-)

>> Or, perhaps more fairly, "model A is how pytz thinks, model B is how the
>> stdlib thinks."

> We'd be in better shape if it were that simple. pytz is strictly model
> A.
Unfortunately the stdlib isn't consistent in how it thinks (short > version: because __hash__ and cross-timezone equality and arithmetic > implicitly treat aware datetimes as if they were unambiguous model A > instants, when they aren't), and that's the root of all the difficulty > with PEP 495. > > (I can give a longer explanation of _why_ that causes difficulty with > PEP 495 if you want it, or you can go back and read the last few threads > in detail, or you can just wait for the PEP :-) ). Time for just the "high-order bits" again (for Guido): Last time we left off with "End of problems. Start of new problems.". You can just repeat that now. The new problems turned out to be even uglier than the earlier problems. So after going from "ignore fold as much as possible" to "pay attention to it as much as possible", we're back to "ignore it as much as possible" again. The real pain remaining is that we'd love to ignore it in interzone by-magic subtraction and comparison too, but doing so would break a weak form of backward compatibility: interzone code that already works fine would continue to work fine, but after `fold` started showing up may well no longer compute the _intended_ results in fold=1 cases. Alex made a good case for why such code may actually exist, and for why this would be a real regression for such code's intended purposes. So the best idea now is to special-case the snot out of fold==1 only in interzone __eq__ and __ne__, to say that any datetime with fold=1 is "not equal" to any datetime in any other zone. That hackery is to squash the return of "the hash problem" (without needing an insanely delicate hash() implementation). This causes annoying special-case warts in current by-magic interzone operations. For example, cross-zone comparison trichotomy could fail: if x.fold==1 and y is in a different zone, none of xy would be true. Best guess is that's of little consequence, but it's ugly. So, if your time machine is gassed up and ready to go, just remove by-magic interzone comparison and subtraction before they were added. Thanks! PEP 495 could be a delight then :-) From guido at python.org Tue Sep 8 06:21:29 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 7 Sep 2015 21:21:29 -0700 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: Maybe I should just reject PEP 495 in disgust. :-) I think #2 is the only reasonable solution (of these three). Of all the existing semantics we're trying to preserve, I find interzone comparison the unholiest. (With the possible exceptions of the case where both zones are known to be forever-fixed-offset, such as datetime.timezone instances and pytz.utc, and even possibly the fixed-offset zones that pytz returns from localize(). How exactly we're going to recognize those is a different question, though I have an opinion there too.) On Mon, Sep 7, 2015 at 6:57 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > The good news that other than a few editorial changes there is only one > issue which keeps me from declaring PEP 495 complete. The bad news is that > the remaining issue is subtle and while several solutions have been > proposed, neither stands out as an obviously right. > > The Problem > ----------- > > PEP 495 requires that the value of the fold attribute is ignored when two > aware datetime objects that share tzinfo are compared. 
This is motivated > by the reasons of backward compatibility: we want the value of fold to only > matter in conversions from one zone to another and not in arithmetic within > a single timezone. > > As Tim pointed out, this rule is in conflict with the only requirement > that a hash function must satisfy: if two objects compare as equal, their > hashes should be equal as well. > > Let t0 and t1 be two times in the fold that differ only by the value of > their fold attribute: t0.fold == 0, t1.fold == 1. Let u0 = > t0.astimezone(utc) and u1 = t1.astimezone(t1). PEP 495 requires that u0 < > u1. (In fact, this is the main purpose of the PEP to disambiguate between > t0 and t1 so that conversion to UTC is well defined.) However, by the > current PEP 495 rules, t0 == t1 is True, by the pre-PEP rule (and the PEP > rule that fold is ignored in comparisons) we also have t0 == u0 and t1 == > u1. So, we have (a) a violation of the transitivity of ==: u0 == t0 == t1 > == u1 does not imply u0 == u1 which is bad enough by itself, and (b) since > hash(u0) can be equal to hash(u1) only by a lucky coincidence, the rule > "equality of objects implies equality of hashes" leads to contradiction > because applying it to the chain u0 == t0 == t1 == u1, we get hash(u0) == > hash(t0) == hash(t1) == hash(u1) which is now a chain of equalities of > integers and on integers == is transitive, so we have hash(u0) == hash(u1) > which as we said can only happen by a lucky coincidence. > > > The Root of the Problem > ----------------------- > > The rules of arithmetic on aware datetime objects already cause some basic > mathematical identities to break. The problem described above is avoided > by not having a way to represent u1 in the timezone where u0 and u1 map to > the same local time. We still have a surprising u0 < u1, but > u0.astimezone(local) == u1.astimezone(local), but it does not rise to the > level of a hash invariant violation because u0.astimezone(local) and > u1.astimezone(local) are not only equal: they are identical in all other > ways and if we convert them back to UTC - they both convert to u0. > > The root of the hash problem is not in the t0 == t1 is True rule. It is > in u0 == t0. The later equality is just too fragile: if you add > timedelta(hour=1) to both sides to this equation, then (assuming an > ordinary 1 hour fall-back fold), you will get two datetime objects that are > no longer equal. (Indeed, local to utc equality t == u is defined as t - > t.utcoffset() == u.replace(tzinfo=t.tzinfo), but when you add 1 hour to t0, > utcoffset() changes so the equality that held for t0 and u0 will no longer > hold for t0 + timedelta(hour=1) and u0 + timedelta(hour=1).) > > PEP 495 gives us a way to break the u0 == t0 equality by replacing t0 with > an "equal" object t1 and simultaneously have u0 == t0, t0 == t1 and t1 != > u0. > > > The Solutions > ------------- > > Tim suggested several solutions to this problem, but by his own admission > neither is more than "grudgingly acceptable." For completeness, I will > also present my "non-solution." > > Solution 0: Ignore the problem. Since PEP 495 does not by itself > introduce any tzinfo implementations with variable utcoffset(), it does not > create a hash invariant violation. I call this a non-solution because it > would once again punt an unsolvable problem to tzinfo implementors. It is > unsolvable for *them* because without some variant of the rejected PEP 500, > they will have no control over datetime comparisons or hashing. 
> > Solution 1: Make t1 > t0. > > Solution 2: Leave t1 == t0, but make t1 != u1. > > > Request for Comments > -------------------- > > I will not discuss pros and cons on the two solutions because my goal here > was only to state the problem, identify the root case and indicate the > possible solutions. Those interested in details can read Tim's excellent > explanations in the "Another round on error-checking" [1] and "Another > approach to 495's glitches" [2] threads. > > I "bcc" python-dev in a hope that someone in the expanded forum will > either say "of course solution N is the right one and here is why" or "here > is an obviously right solution - how could you guys miss it." > > > [1]: > https://mail.python.org/pipermail/datetime-sig/2015-September/000622.html > [2]: > https://mail.python.org/pipermail/datetime-sig/2015-September/000716.html > > > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 8 06:43:01 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 7 Sep 2015 21:43:01 -0700 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> <55EE2501.6090901@oddbird.net> Message-ID: A bit of levity: http://penny-arcade.com/comic/2015/09/07/the-twain --Guido -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartbishop.net Tue Sep 8 06:48:53 2015 From: stuart at stuartbishop.net (Stuart Bishop) Date: Tue, 8 Sep 2015 11:48:53 +0700 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: Message-ID: On 4 September 2015 at 23:01, Chris Barker wrote: > I would like a flag on datetime, but it seems it might be better to put that > flag on a tzinfo object. But the implementation is the something to argue > about only if there is any chance of doing it at all. I would still lean towards a separate datetimetz class, but that is just semantics. > Also, particularly as PEP 495 will introduce changes to tzinfo, that will > presumable lead to changes in tzinfo implementations (like pytz, etc), it > seems that if other changes are afoot, now is a good time to map out how > they should be done. > > Stuart, if you are listening: > > IIUC, you want "timeline" arithmetic to work with pytz tzinfo-aware > datetimes. To the extent that the current implementation functions in a > maybe "hacky", and at least inconvenient, way to achieve this. > > So you are an obvious person to say what we might put in the stdlib that > would facilitate cleaning all that up. If anything. > > BTW: I'll at least take it as a given that we're not breaking backward > compatibility, and that arithmetic needs to stay as fast as it currently is > -- at least in the cases where it currently works. To clean up pytz's interface and allow it to easily bolt on timeline arithmetic to the existing datetime library, I need two hooks to replace calls to tzinfo.localize() and tzinfo.normalize(). 
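(For readers who don't live in pytz, the dance those two calls cover today
looks roughly like this - the zone and times are only examples:

    from datetime import datetime, timedelta
    import pytz

    eastern = pytz.timezone('US/Eastern')
    # localize() picks the tzinfo variant matching a wall time; here the
    # DST side of the 2002 fall-back fold:
    dt = eastern.localize(datetime(2002, 10, 27, 1, 30), is_dst=True)
    # normalize() repairs the offset after arithmetic crosses a transition;
    # 1:30 EDT + 1 hour comes back as 1:30 EST:
    later = eastern.normalize(dt + timedelta(hours=1))

The two hooks described next are aimed at making those explicit calls
unnecessary.)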
When a user does datetime.datetime(2000, 10, 9, 8, 7, 6, tzinfo=pytz.timezone('US/Eastern'), a method on the tzinfo needs to be invoked that returns the real tzinfo to be used for that datetime (ie. the tzinfo instance for Oct 2000, not the default one for January 1878). When arithmetic has been performed, a method on the resulting tzinfo needs to be invoked that returns a datetime containing the real, adjusted result. These hooks are entirely separate to PEP-495 AFAICT. PEP-495 doesn't help pytz the library much. It should help pytz users though, as most use cases can stop using pytz and switch to using stdlib. -- Stuart Bishop http://www.stuartbishop.net/ From tim.peters at gmail.com Tue Sep 8 06:50:04 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 7 Sep 2015 23:50:04 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Guido] > Maybe I should just reject PEP 495 in disgust. :-) Maybe so :-) > I think #2 is the only reasonable solution (of these three). No argument there either. > Of all the > existing semantics we're trying to preserve, I find interzone comparison the > unholiest. (With the possible exceptions of the case where both zones are > known to be forever-fixed-offset, such as datetime.timezone instances and > pytz.utc, and even possibly the fixed-offset zones that pytz returns from > localize(). How exactly we're going to recognize those is a different > question, though I have an opinion there too.) No real worries about those: if 495 is implemented, there will be two kinds of tzinfos: 1. With pre-495 semantics. Those will never even look at `fold`, let alone set it to 1. 2. With post-495 semantics. .fromutc() is the only tzinfo method that will set `fold`. Any correct implementation of .fromutc() converting to a fixed-offset zone will always set `fold` to 0 in its result, since there are no ambiguous times in a fixed-offset zone. There are two flavors of "solution 2" (which differ in how much they muck with interzone subtraction and/or comparison), but neither of those flavors changes anything about what happens when neither operand has `fold=1`. So the only way by-magic cross-zone subtraction or comparison between fixed-offset zones could cease working exactly as they do today is if the user forces `fold=1` manually. And by "the only way", I mean the only way I just happened to think of ;-) But it's certain a correct 495 .fromutc() could not screw this up. Note that intrazone arithmetic ignores `fold` in the current proposal (classic arithmetic changes in no way, ever), but always forces it to `0` when there's a datetime result. So some stray fold=1 propagating through intrazone datatime arithmetic isn't a concern either. From stuart at stuartbishop.net Tue Sep 8 06:53:43 2015 From: stuart at stuartbishop.net (Stuart Bishop) Date: Tue, 8 Sep 2015 11:53:43 +0700 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: Message-ID: On 4 September 2015 at 23:39, Tim Peters wrote: > It seems 495 really doesn't do anything for pytz, so I'm not sure > Stuart would bother to implement 495-conforming tzinfos. _Someone_ > will, though. Eventually ;-) I'll do it, but more than happy for someone else to do it first. 3.6 I guess. More support in stdlib means fewer confused pytz users. I still worry that landing real timezones in stdlib will be dropping the pants on datetime, exposing its warts for all to see. 
-- Stuart Bishop http://www.stuartbishop.net/ From tim.peters at gmail.com Tue Sep 8 06:58:55 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 7 Sep 2015 23:58:55 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Guido] > ... > (With the possible exceptions of the case where both zones are > known to be forever-fixed-offset, such as datetime.timezone instances and > pytz.utc, and even possibly the fixed-offset zones that pytz returns from > localize(). How exactly we're going to recognize those is a different > question, though I have an opinion there too.) BTW, I was looking at what it would take to do a 495-compliant wrapping of zoneinfo. That essentially hands us .fromutc(), but leaves .utcoffset() a puzzle (mktime() all over again). I found what I thought was a very happy solution: when loading the tzfile, it's easy to construct a list of every unique total UTC offset in the zone's history. Order them from most recent to least, and then .utcoffset() would typically need to try no more than the first two to find one where .fromutc() reproduced .utcoffset()'s input. In that scheme, "is this a fixed offset zone?" is the same as asking whether the zone's unique-offsets list is a singleton. That doesn't belong in 495, just noting that the recognition question you raised is dead easy to answer for the most important source of timezone info. From guido at python.org Tue Sep 8 07:44:24 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 7 Sep 2015 22:44:24 -0700 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: No, the question I care about is more like "could politicians change the utc offset", not whether they have done so in the past. So instances of datetime.timezone qualify, as do (I believe) lettered "military" zone names. On Monday, September 7, 2015, Tim Peters wrote: > [Guido] > > ... > > (With the possible exceptions of the case where both zones are > > known to be forever-fixed-offset, such as datetime.timezone instances and > > pytz.utc, and even possibly the fixed-offset zones that pytz returns from > > localize(). How exactly we're going to recognize those is a different > > question, though I have an opinion there too.) > > BTW, I was looking at what it would take to do a 495-compliant > wrapping of zoneinfo. That essentially hands us .fromutc(), but > leaves .utcoffset() a puzzle (mktime() all over again). > > I found what I thought was a very happy solution: when loading the > tzfile, it's easy to construct a list of every unique total UTC offset > in the zone's history. Order them from most recent to least, and then > .utcoffset() would typically need to try no more than the first two to > find one where .fromutc() reproduced .utcoffset()'s input. > > In that scheme, "is this a fixed offset zone?" is the same as asking > whether the zone's unique-offsets list is a singleton. > > That doesn't belong in 495, just noting that the recognition question > you raised is dead easy to answer for the most important source of > timezone info. > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tim.peters at gmail.com Tue Sep 8 08:10:20 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 01:10:20 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Guido] > No, the question I care about is more like "could politicians change the utc > offset", not whether they have done so in the past. So instances of > datetime.timezone qualify, as do (I believe) lettered "military" zone names. Ah, got it now. No, that's impossible to determine from a tzfile. Yes, the 25 {"A", "B", ... ,"Z"} - {"J"} military zones do (one for each hour offset in -12 through +12 inclusive). The military "J" zone does not (that's whatever local civil zone is implied by context - good luck programming that one ;-) ). In any case, the message before still applies: interzone subtraction and comparison for such zones would continue to work fine after 495, because their .fromutc() would never set `fold` to 1. From alexander.belopolsky at gmail.com Tue Sep 8 09:59:15 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 03:59:15 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Mon, Sep 7, 2015 at 9:57 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > Solution 1: Make t1 > t0. > > Solution 2: Leave t1 == t0, but make t1 != u1. > Solution 3: Leave t1 == t0, but make *both* t0 != u0 and t1 != u1 if t0.utcoffset() != t1.utcoffset(). In other words, def __eq__(self, other): n_self = self.replace(tzinfo=None) n_other = other.replace(tzinfo=None) if self.tzinfo is other.tzinfo: return n_self == n_other u_self = n_self - self.utcoffset() v_self = n_self - self.replace(fold=(1-self.fold)).utcoffset() u_other = n_other - other.utcoffset() v_other = n_other - other.replace(fold=(1-self.fold)).utcoffset() return u_self == u_other == v_self == v_other Before anyone complaints that this makes comparison 4x slower, I note that we can add obvious optimizations for the common tzinfo is datetime.timezone.utc and isinstance(tzinfo, datetime.timezone) cases. Users that truly want to compare aware datetime instances between two variable offset timezones, should realize that fold/gap detection in *both* r.h.s. and l.h.s. zones is part of the operation that they request. This solution has some nice properties compared to the solution 2: (1) it restores the transitivity - we no longer have u0 == t0 == t1 and t1 != u1; (2) it restores the symmetry between fold=0 and fold=1 while preserving a full backward compatibility. I also think this solution makes an intuitive sense: since we cannot decide which of the two UTC times u0 and u1 should belong in the equivalency class of t0 == t1 - neither should. "In the face of ambiguity" and all that. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 8 17:09:59 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Sep 2015 08:09:59 -0700 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 12:59 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Mon, Sep 7, 2015 at 9:57 PM, Alexander Belopolsky < > alexander.belopolsky at gmail.com> wrote: > >> Solution 1: Make t1 > t0. >> >> Solution 2: Leave t1 == t0, but make t1 != u1. >> > > Solution 3: Leave t1 == t0, but make *both* t0 != u0 and t1 != u1 if > t0.utcoffset() != t1.utcoffset(). 
> > In other words, > > def __eq__(self, other): > n_self = self.replace(tzinfo=None) > n_other = other.replace(tzinfo=None) > if self.tzinfo is other.tzinfo: > return n_self == n_other > u_self = n_self - self.utcoffset() > v_self = n_self - self.replace(fold=(1-self.fold)).utcoffset() > u_other = n_other - other.utcoffset() > v_other = n_other - other.replace(fold=(1-self.fold)).utcoffset() > return u_self == u_other == v_self == v_other > > Before anyone complaints that this makes comparison 4x slower, I note that > we can add obvious optimizations for the common tzinfo is > datetime.timezone.utc and isinstance(tzinfo, datetime.timezone) cases. > Users that truly want to compare aware datetime instances between two > variable offset timezones, should realize that fold/gap detection in *both* > r.h.s. and l.h.s. zones is part of the operation that they request. > > This solution has some nice properties compared to the solution 2: (1) it > restores the transitivity - we no longer have u0 == t0 == t1 and t1 != u1; > (2) it restores the symmetry between fold=0 and fold=1 while preserving a > full backward compatibility. > > I also think this solution makes an intuitive sense: since we cannot > decide which of the two UTC times u0 and u1 should belong in the > equivalency class of t0 == t1 - neither should. "In the face of ambiguity" > and all that. > But it breaks compatibility: it breaks the rule that for fold=0 nothing changes. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 8 17:46:58 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 11:46:58 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 11:09 AM, Guido van Rossum wrote: > But it breaks compatibility: it breaks the rule that for fold=0 nothing > changes. It preserves a "weak form" of compatibility: nothing changes in the behavior of aware datetime objects unless they use a post-PEP tzinfo. Note that Solution 2 also breaks a "strong form" of compatibility (nothing changes unless fold=1) because pre-PEP tzinfos are supposed to interpret times in the fold as STD (fold=1). Note that in my experience very few tzinfo developers understand this requirement and with a run-of-the-mill tzinfo you have a 50/50 chance that it will interpret ambiguous times as fold=0 or fold=1. Note that PEP 495 in its present form does not promise a "strong form" of compatibility. This is something you wanted to have with fold=-1, but I thought I convinced you that it was not necessary. The current compatibility promise of PEP 495 is that fold attribute is ignored unless it is explicitly checked in tzinfo.utcoffset() and friends implementations. This stays under Solution 2 because u_ and v_ conversions are always the same if utcoffset() ignores the value of fold. Once you decide to use a post-PEP tzinfo, you have no choice but to test your software on the edge cases if you care about them. (And you probably do if you bother to switch to a post-PEP tzinfo.) If you don't care about edge cases, you can continue using pre-PEP tzinfos or switch and accept a more consistent but different edge case behavior. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Tue Sep 8 18:19:59 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 12:19:59 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 11:46 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > it breaks the rule that for fold=0 nothing changes. We may need a new section in the PEP explaining the differences between pre-PEP and post-PEP tzinfo implementations. For example, it is not true that post-PEP utcoffset() will return the same value on a fold=0 instance as a pre-PEP does. The pre-PEP rule is to treat both ambiguous (fold) and missing (gap) times as "standard time". In the typical DST observing timezone that alternated between STD and DST, this means that pre-PEP rule treats fold times as fold=1 and gap times as fold=0. For more complicated situations where you can see two folds or two gaps in a row or a time shift without a DST change (a change in STD offset), no rule is currently specified. The existing rules for fold/gap disambiguation are formulated for a single purpose: to make the generic fromutc() implementation work for the US-style timezones. Since PEP 495 requires that the new tzinfo implementations reimplement their own fromutc(), we decided that we are free to formulate new gap/fold disambiguation rules. The PEP 495 rules are formulated to be more rational than those that were dictated by the fromutc() implementation. For example, defaulting to the first time in the fold seems more natural and a wise choice: it the worst case you will have to kill an hour before the odd time meeting, but you won't miss it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Tue Sep 8 18:41:02 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 11:41:02 -0500 Subject: [Datetime-SIG] Version check (was Re: PEP 495: What's left to resolve) Message-ID: [Alex] > ... > Once you decide to use a post-PEP tzinfo, you have no choice but to test > your software on the edge cases if you care about them. Which reminds me: the PEP should add a way for a post-495 tzinfo to say it supplies post-495 semantics, so users can check whether they're getting a tzinfo they require (if they need fold disambiguation) or can't tolerate (if they need folds to be ignored for legacy reasons). It's not a change to the tzinfo API, but is a change to tzinfo semantics. I guess requiring a new `__version__ = 2` attribute would be OK. Or (preferably "and") add an optional `fold=None` argument to .utcoffset() (by default, use the datetime's .fold attribute, else use the passed value). Then an obscure form of version-checking could be done by seeing whether dt.utcoffset(fold=1) blows up. That's a poor way to spell "check the version", but would at least allow checking to see what would happen if `fold` changed without the expense of creating new short-lived datetime objects. Like: > v_self = n_self - self.replace(fold=(1-self.fold)).utcoffset() becoming: v_self = n_self - self.utcoffset(fold=1-self.fold) It seems the worst way to spell "check the version" is the status quo, where it seems a user would have to contrive a case where `fold` matters. While that's usually an excellent way ("check for the behavior you actually require"), in this case it means the user would have to know too much (e.g., how do they get a tzinfo representing a multi-offset zone to begin with? 
far as I know, there's no portable way to ask for that - then, even if they solve that, they need to know exactly where to find an ambiguous time in that zone). From tim.peters at gmail.com Tue Sep 8 19:06:00 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 12:06:00 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Alex] >> Solution 1: Make t1 > t0. >> >> Solution 2: Leave t1 == t0, but make t1 != u1. > > > Solution 3: Leave t1 == t0, but make *both* t0 != u0 and t1 != u1 if > t0.utcoffset() != t1.utcoffset(). > > In other words, > > def __eq__(self, other): > n_self = self.replace(tzinfo=None) > n_other = other.replace(tzinfo=None) > if self.tzinfo is other.tzinfo: > return n_self == n_other Well, that's infinite recursion - but I know what you mean ;-) > u_self = n_self - self.utcoffset() > v_self = n_self - self.replace(fold=(1-self.fold)).utcoffset() > u_other = n_other - other.utcoffset() > v_other = n_other - other.replace(fold=(1-self.fold)).utcoffset() > return u_self == u_other == v_self == v_other More infinite recursion. > Before anyone complaints that this makes comparison 4x slower, I don't care about the speed of by-magic interzone comparison, but if someone does I'd say it's only about 2x slower. .utcoffset() is the major expense, and this only doubles the number of those. > I note that we can add obvious optimizations for the common tzinfo is > datetime.timezone.utc and isinstance(tzinfo, datetime.timezone) cases. Please no. Comparison is almost certainly almost always intrazone, and .utcoffset() isn't called at all for intrazone comparisons. > Users that truly want to compare aware datetime instances between two > variable offset timezones, should realize that fold/gap detection in *both* > r.h.s. and l.h.s. zones is part of the operation that they request. > > This solution has some nice properties compared to the solution 2: (1) it > restores the transitivity - we no longer have u0 == t0 == t1 and t1 != u1; > (2) it restores the symmetry between fold=0 and fold=1 while preserving a > full backward compatibility. > > I also think this solution makes an intuitive sense: since we cannot decide > which of the two UTC times u0 and u1 should belong in the equivalency class > of t0 == t1 - neither should. "In the face of ambiguity" and all I do like that this "breaks" interzone comparison only in cases where `fold` actually makes a difference. Certainly more principled and focused than special-casing the snot out of all and only fold=1. But I can never decide whether something really "fixes the hash problem" without a lot more thought. So far, so good :-) From alexander.belopolsky at gmail.com Tue Sep 8 19:38:11 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 13:38:11 -0400 Subject: [Datetime-SIG] Version check (was Re: PEP 495: What's left to resolve) In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 12:41 PM, Tim Peters wrote: > [Alex] > > ... > > Once you decide to use a post-PEP tzinfo, you have no choice but to test > > your software on the edge cases if you care about them. > > Which reminds me: the PEP should add a way for a post-495 tzinfo to > say it supplies post-495 semantics, so users can check whether they're > getting a tzinfo they require (if they need fold disambiguation) or > can't tolerate (if they need folds to be ignored for legacy reasons). 
> We may end up providing something like this, but I hope developing this mechanism can be left to the tzinfo implementers. (Which can as well be us, but in another PEP.) I am not sure a tzinfo object will need a persistent attribute rather than just a way to require specific capabilities at the construction time. For example, a hypothetical zoneinfo() constructor or a factory function can take a "fold_aware" boolean argument and let the user specify what kind of tzinfo is requested. It will then become a QOI issue of whether zoneinfo() supports both pre- and post-PEP semantics or not. Note that zoneinfo() providers may end up extending the tzinfo API to include queries such as give me all folds between year A and year B. The downside of a persistent run-time attribute that differentiate between pre-PEP and post-PEP tzinfos is that it may promote writing code that tries to cope with the presence of pre-PEP and post-PEP tzinfos in the same program. This is a recipe for a combinatorial disaster. Note that on top of pre-PEP/post-PEP distinction a good tzinfo() library will probably also supply a TZ database version. Imagine writing a simple "within(t, start, stop)" function that should account for the tree arguments possibly having different "fold_aware" attribute and different tzversion? > > It's not a change to the tzinfo API, but is a change to tzinfo semantics. > > I guess requiring a new `__version__ = 2` attribute would be OK. > I generally dislike "version" constants or attributes. My preferred solution would be to provide a generic PEP 495 compliant fromutc() in a tzinfo subclass and ask PEP 495 compliant implementations to derive from that. > > Or (preferably "and") add an optional `fold=None` argument to > .utcoffset() (by default, use the datetime's .fold attribute, else > use the passed value). I thought about this as an optimization. dt.utcoffset(fold=1) being an equivalent of dt.replace(fold=1).utcoffset() which avoids copying of the entire dt object into a temporary. I think this is a minor issue. I can go either way on this. > Then an obscure form of version-checking could > be done by seeing whether dt.utcoffset(fold=1) blows up. I would not add dt.utcoffset(fold=x) just for that and if we end up adding it for other reasons will probably consider such use a hack. > That's a > poor way to spell "check the version", but would at least allow > checking to see what would happen if `fold` changed without the > expense of creating new short-lived datetime objects. Yes, this is a good reason and since calling utcoffset() both ways will be typical for "careful" applications, I don't mind giving them some syntactic sugar for that. Yet again, this is not a "live or die" issue for PEP 495. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 8 19:43:51 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 13:43:51 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 1:06 PM, Tim Peters wrote: > > def __eq__(self, other): > > n_self = self.replace(tzinfo=None) > > n_other = other.replace(tzinfo=None) > > if self.tzinfo is other.tzinfo: > > return n_self == n_other > > Well, that's infinite recursion - but I know what you mean ;-) > No. You've probably missed that n_ objects are naive and naive comparison is just your plain old fold-unaware compare-all-components -except-fold operation. 
> > > > u_self = n_self - self.utcoffset() > > v_self = n_self - self.replace(fold=(1-self.fold)).utcoffset() > > u_other = n_other - other.utcoffset() > > v_other = n_other - other.replace(fold=(1-self.fold)).utcoffset() > > return u_self == u_other == v_self == v_other > > More infinite recursion. ditto -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 8 19:50:15 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 13:50:15 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 1:06 PM, Tim Peters wrote: > > I note that we can add obvious optimizations for the common tzinfo is > > datetime.timezone.utc and isinstance(tzinfo, datetime.timezone) cases. > > Please no. Comparison is almost certainly almost always intrazone, > and .utcoffset() isn't called at all for intrazone comparisons. I don't understand this comment. Solution 3 does not change anything for the intrazone (self.tzinfo is other.tzinfo) comparisons. Are you just saying that a slowdown in interzone comparison is a welcome feature to discourage bad programming practices? Sorry, I have a few ideas on how to optimize Solution 3 __eq__ even without special-casing fixed-offset tzinfos. :-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Tue Sep 8 19:50:22 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 12:50:22 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Alex] >>> def __eq__(self, other): >>> n_self = self.replace(tzinfo=None) >>> n_other = other.replace(tzinfo=None) >>> if self.tzinfo is other.tzinfo: >>> return n_self == n_other [Tim] >> Well, that's infinite recursion - but I know what you mean ;-) [Alox] > No. You've probably missed that n_ objects are naive and naive comparison > is just your plain old fold-unaware compare-all-components -except-fold > operation. I assumed you were showing an implementation of datetime.__eq__. Yes? In that case, `self` and `other` may both be naive on entry. Then the first two lines effectively make exactly copies of them. Since None is None, the `self.tzinfo is other.tzinfo` check succeeds, and so goes on to compare n_self to n_other - which are exact copies of the original inputs. Lather, rinse, repeat. >>> u_self = n_self - self.utcoffset() >>> v_self = n_self - self.replace(fold=(1-self.fold)).utcoffset() >>> u_other = n_other - other.utcoffset() >>> v_other = n_other - other.replace(fold=(1-self.fold)).utcoffset() >>> return u_self == u_other == v_self == v_other >> More infinite recursion. > ditto Ditto ;-) From alexander.belopolsky at gmail.com Tue Sep 8 19:55:06 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 13:55:06 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 1:50 PM, Tim Peters wrote: > I assumed you were showing an implementation of datetime.__eq__. Yes? > In that case, `self` and `other` may both be naive on entry. Then > the first two lines effectively make exactly copies of them. Since > None is None, the `self.tzinfo is other.tzinfo` check succeeds, and so > goes on to compare n_self to n_other - which are exact copies of the > original inputs. Lather, rinse, repeat. > Got it. 
No, I was not concerned with the naive case - I assumed that it was magically fulfilled without calling this __eq__ method. If this idea passes a sniff test - I will implement it in my fork so that we can play with a working prototype. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Tue Sep 8 20:02:39 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 13:02:39 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Alex] >>> I note that we can add obvious optimizations for the common tzinfo is >>> datetime.timezone.utc and isinstance(tzinfo, datetime.timezone) cases. [Tim] >> Please no. Comparison is almost certainly almost always intrazone, >> and .utcoffset() isn't called at all for intrazone comparisons. [Alex] > I don't understand this comment. Solution 3 does not change anything for > the intrazone (self.tzinfo is other.tzinfo) comparisons. Right. The most important cases are already as fast as they were before. > Are you just saying that a slowdown in interzone comparison is a > welcome feature to discourage bad programming practices? Sorry, > I have a few ideas on how to optimize Solution 3 __eq__ even > without special-casing fixed-offset tzinfos. :-) Premature optimization is the root of all evil. You're proposing to add even more complication to the code _solely_ to speed up cases in which you're merely guessing it really will make a lick of difference to user code. And they'll "run slow" anyway, just not _as_ slow as possible. At the start it's always best to do the simplest thing that could possibly work without inflicting _obviously_ unreasonable pain. If it turns out it really does matter to someone, they'll file a report, and then's the time to think about semantically useless complications solely for speed. Every line of code is another chance for an error to sneak in, for maintainers to puzzle over after you're gone, etc. From alexander.belopolsky at gmail.com Tue Sep 8 20:19:08 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 14:19:08 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 2:02 PM, Tim Peters wrote: > > Are you just saying that a slowdown in interzone comparison is a > > welcome feature to discourage bad programming practices? Sorry, > > I have a few ideas on how to optimize Solution 3 __eq__ even > > without special-casing fixed-offset tzinfos. :-) > > Premature optimization is the root of all evil. Agree 100%. > You're proposing to add even more complication to the code No, I actually think the code can be simpler (and without an infinite recursion.) In any case, it won't matter for the CPython users what we will ship in datetime.py, so I will write something that make the intent very clear. The type of optimization that I had in mind was that once you discover that self is in the fold/gap, you can return False without calling other.utcoffset(). The question is what is easier to understand: (a) t1 and t2 are equal if and only if t1 - t1.replace(fold=f1).utcoffset() == t2 - t2.replace(fold=f2).utcoffset() for all four possible pairs (f1, f2); or (b) t1 and t2 are equal if and only if they are unambiguous and valid in their respective zones and convert to the same UTC instant. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tim.peters at gmail.com Tue Sep 8 21:13:45 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 14:13:45 -0500 Subject: [Datetime-SIG] Version check (was Re: PEP 495: What's left to resolve) In-Reply-To: References: Message-ID: [Tim] >> Which reminds me: the PEP should add a way for a post-495 tzinfo to >> say it supplies post-495 semantics, so users can check whether they're >> getting a tzinfo they require (if they need fold disambiguation) or >> can't tolerate (if they need folds to be ignored for legacy reasons). [Alex] > We may end up providing something like this, but I hope developing this > mechanism can be left to the tzinfo implementers. Python defines the tzinfo API and minimal tzinfo semantics. If Python doesn't also resolve tzinfo discoverability issues created by its own new requirements. then tzinfo implementers will create a Tower of Babel. Far better for Python to define _the_ way to check whether a tzinfo implements 495 semantics. This should be nearly trivial, both to specify and for tzinfo authors to implement - unless we go out of our way to create complications that aren't _inherent_ to the problem at hand ("which version did I get?"). > (Which can as well be us but in another PEP.) Disagree. PEP 495 is _creating_ a new "discoverability" problem. So that's the place to fix it too, before it becomes a real problem. > I am not sure a tzinfo object will need a persistent attribute rather than > just a way to require specific capabilities at the construction time. "Tower of Babel" - Python has no business specifying how a tzinfo object "must be" obtained to begin with, and there are already multiple ways out in the field. But Python is requiring a change to semantics. Some tzinfo authors may choose to provide an explicit way to ask for PEP 495 semantics, while others may not, etc. User code needs a uniform way to ask whether what they get in the end meets their requirements. When their requirements depend only on things where Python itself changed its mind, it's Python's proper responsibility to give the user a way to tell which they got. > For example, a hypothetical zoneinfo() constructor or a > factory function can take a "fold_aware" boolean argument and let the user > specify what kind of tzinfo is requested. It will then become a QOI issue > of whether zoneinfo() supports both pre- and post-PEP semantics or not. Yes, Tower of Babel. There's no need to inflict this potential confusion on users. Just specify a way to check. that _all_ post-495 tzinfos must support. > Note that zoneinfo() providers may end up extending the tzinfo API to > include queries such as give me all folds between year A and year B. Different issue, because _Python_ isn't specifying anything about that. We can't do anything about Towers of Babel tzinfo authors choose to create on their own. We can do something about new semantics Python is forcing them to supply. BTW, I've never yet seen a tzinfo that supplied any functionality beyond the minimum required by the docs. > The downside of a persistent run-time attribute that differentiate between > pre-PEP and post-PEP tzinfos is that it may promote writing code that tries > to cope with the presence of pre-PEP and post-PEP tzinfos in the same > program. This is a recipe for a combinatorial disaster. If a user chooses to embrace that, that's on them. Far better to give them a uniform way to check the tzinfos they get so they can absolutely avoiding mixing pre-495 and post-495 tzinfos to begin with. 
> Note that on top of pre-PEP/post-PEP distinction a good tzinfo() library > will probably also supply a TZ database version. Imagine writing a > simple "within(t, start, stop)" function that should account for the > tree arguments possibly having different "fold_aware" attribute > and different tzversion? Again, how can a sane user ensure they're _not_ getting into a such a mess if they can't even ask "is this a pre- or post-495 tzinfo?" in a uniform way? Assume 495 is successful. Some general-purpose library code will be _passed_ datetimes with tzinfos it had nothing to do with creating, and general-purpose libraries can't assume more than the minimum the Python docs require. The library has no control at all over the tzinfos it sees, but may _need_ to know whether they're pre- or post-495. 495 can make that simple instead of nearly impossible. >> I guess requiring a new `__version__ = 2` attribute would be OK. > I generally dislike "version" constants or attributes. Me too, but far better than nothing. > My preferred solution would be to provide a generic PEP 495 compliant > fromutc() in a tzinfo subclass and ask PEP 495 compliant implementations > to derive from that. That would be fine, except it's no longer trivial - for us. It would be better to supply a new marker class in the stdlib a PEP 495 compliant tzinfo had to derive from, but whose .fromutc() _must_ be overridden. All the industrial-strength zone wrappings are dealing with databases for which overriding .fromutc() is by far the best approach anyway. So, if we wanted to be _useful_, it would do more good for more people if we supplied a horridly slow default .utcoffset() instead. But this is "creating complications that aren't _inherent_ to the problem at hand". And if this isn't the last change Python ever makes to tzinfo semantics, a plain integer version number is probably easier for most people to grasp and live with than a graph of marker classes anyway. >> Or (preferably "and") add an optional `fold=None` argument to >> .utcoffset() (by default, use the datetime's .fold attribute, else >> use the passed value). > I thought about this as an optimization. dt.utcoffset(fold=1) being an > equivalent of dt.replace(fold=1).utcoffset() which avoids copying of the > entire dt object into a temporary. I think this is a minor issue. I can go > either way on this. It's a poor way to do version-checking, so I shouldn't have mentioned it. Alas, Guido's time machine is tied up preventing by-magic interzone comparison from ever being implemented :-( From carl at oddbird.net Tue Sep 8 21:34:23 2015 From: carl at oddbird.net (Carl Meyer) Date: Tue, 8 Sep 2015 13:34:23 -0600 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> <55EE233C.1020307@oddbird.net> Message-ID: <55EF383F.1020705@oddbird.net> > [Tim] >>>>> An aware datetime _is_ a >>>> tzinfo> pair, and there's a natural bijection between naive datetimes >>>>> and POSIX timestamps (across all instants both can represent). > > [Carl] >>>> I don't understand this, and I suspect it's at the heart of our >>>> misunderstanding. I would say there are many possible bijections .... > > [Tim] >>> "Natural" bijection. I gave you very simple Python code implementing >>> that bijection already. A naive datetime represents an instant in the >>> proleptic Gregorian calendar. > > [Carl] >> What is your definition of "instant" here? 
[Tim] > I didn't need one - Occam's Razor again ;-) To establish a bijection, > all that's required is to show that a proposed function meets all the > formal requirements... "Represents an > instance" was just vague English motivation for what followed. Of course. I never expressed any doubt that you had established _a_ bijection. It was the motivation I was trying to understand. >> I don't think a naive datetime represents an instant at all; > > Fine by me - and by Python. Also fine if you _never_ use a naive datetime. > >> it represents a range of possible instants, > > Heh - I see you haven't defined what _you_ mean by "instant". I already gave my definition earlier in this thread. It's borrowed from NodaTime/JodaTime: an instant is a unique and unambiguous point on a single global non-relativistic monotonic time line. Since I don't care about leap seconds, this definition is satisfied equally well for my purposes by a POSIX timestamp or a UTC datetime, among many other possible representations. I find this definition of instant _useful_ because it means that all instants, no matter their representations, are always convertible to integers on the same scale. That's not true of naive datetimes, without making an additional assumption of timezone. > When > you do, please be sure it's consistent with what POSIX says here too: > > The relationship between the actual time of day and the current > value for seconds since the Epoch is unspecified. > > How any changes to the value of seconds since the Epoch are > made to align to a desired relationship with the current actual time > is implementation-defined. As represented in seconds since the > Epoch, each and every day shall be accounted for by exactly > 86400 seconds. AFAICT that's just a bit of beating around the bush about not supporting leap seconds. I don't care :-) > While you're at it, define a clean model in which all that makes a > lick of sense to a casual user ;-) Actually, I think Model A _is_ such a clean model (if we can presume that the casual user in question also doesn't care about leap seconds or relativistic effects). I've taught many Python users how to use pytz, and my experience has been that the concept of a single global monotonic timeline, where all aware datetimes are simply variant spellings of some unambiguous point on that timeline, but (other than in their representation as a Gregorian date/time) behave the same no matter which timezone you spell them in, is quite easy to explain and grasp, even for people who've never worked with timezones before. Part of my dismay in this thread has been realizing now that I've mis-educated all these users about how datetime is really supposed to work :-) Like Stuart, I'm a bit concerned that a whole lot of pytz users are going to be very confused if or when they try to switch to PEP 495 style tzinfo's instead. I think in some ways Model B is really more powerful than Model A, because it lets you work in any number of different "local time" models, rather than requiring that you always work on the same single global timeline. And there are definitely cases where you need that. > The "so what?", in context, was to tweak Guido about saying an aware > datetime is fundamentally different from a pair, > despite that the space of such pairs is isomorphic to the space of > aware datetimes (which _is_ the space of > pairs) under the natural naive_datetime <-> timestamp bijection. > > Why is that setting _you_ off? Guido handled it just fine ;-) Heh. 
Just the urge to understand things, that's all. I'm just slower than Guido :-) but I think I get your point now; it was a narrower point than I'd realized. is only "fundamentally different" from in that they imply different mental models about what they are supposed to represent; mathematically they are no different. >> Under what circumstances is it reasonable to make that assumption >> about a naive datetime? > > Any use case where it's convenient That's up to the user. not me - > or you. For example, before Python grew its builtin > datetime.timezone.utc implementation of a UTC class, I routinely used > naive datetimes I thought of as being in UTC. I was too lazy to > remember where I hid my own UTC class. No problem. Sure, of course. As a pytz user, I'm forced to do the same thing (use naive datetimes and track an implied timezone separately) anytime I need to work in a "local clock time" model. >> Rather than saying "a naive datetime simply doesn't correspond to >> any particular POSIX timestamp; they aren't comparable at all unless >> you have additional information," which is what I'd say. > > I'm starting to suspect you didn't design datetime ;-) In context, I > was replying to Guido, who was talking about Python. In Python's > datetime, naive datetimes are comparable. Naive time has no > _concept_ of time zone. Naive datetimes nevertheless have a notion > of total order, which is isomorphic to the POSIX timestamp notion of > total order under the natural bijection. Likewise for arithmetic, > etc. There's nothing "wrong" about exploiting any of that when it's > convenient. This is simply a mis-understanding. I certainly do consider naive datetimes comparable to other naive datetimes, and I'm well aware (and glad) that Python does too. The referent of "they" above was "naive datetimes and POSIX timestamps." I don't consider those comparable _to each other_ unless you bring an additional assumption about the implied timezone of the naive datetime. And Python agrees. >> I mean, I certainly hope you wouldn't want datetime to make `utcdt - >> naivedt` a defined operation where it's assumed the naive datetime is UTC. > > Certainly not. That _would_ be wrong ;-) Violent agreement again in the end once again, then... Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Tue Sep 8 21:45:16 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 15:45:16 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 1:06 PM, Tim Peters wrote: > But I can never decide whether something really "fixes the hash > problem" without a lot more thought. > Let me try to outline a formal proof. Definitions: An aware datetime value t is called "regular" if t.utcoffset() does not depend on the value of the fold attribute. All other values are called "special". A binary relation "==" is defined by the following rules: (a) two special values s1 and s2 satisfy the "==" relation if they are the same (all component are equal) or they differ only by the value of fold; (b) for any special value s and regular value r, both r == s and s == r are False; and (c) for two regular values r1 and r2, r1 == r2 is equivalent to r1 - r1.utcoffset() and r2 - r2.utcoffset() having the same components. 
(Recall that according to PEP 495, dt - delta always has fold=0.) It will also be useful to define a "naive" equivalence: t1 ~ t2 if t1.tzinfo is t2.tzinfo and all their components except fold (year through microseconds) are equal. We will assume that ~ being an equivalence relation is well known. Lemma: The "==" relation defined above is an equivalence relation. Proof: We need to prove reflexivity (t == t for any t), symmetry (t1 == t2 => t2 == t1) and transitivity (t1 == t2 and t2 == t3 implies t1 == t3). Note that because of rule (b) it is enough to prove that == is equivalence separately for regular and special values. The complete proof is a rather tedious analysis of six propositions: three properties for each regular/special case. I'll present the two least trivial ones. 1. Let's show that == is transitive on the regular datetimes. Indeed, let r1, r2 and r3 are regular datetimes and o1, o2, and o3 are their utcoffset() values. Then r1 == r2 and r2 == r3 implies that r1 - o1 ~ r2 - o2 and r2 - o2 ~ r3 - o3, which in turn implies that r1 - o1 ~ r3 - o3 by transitivity of ~, which in turn implies r1 == r3 by transitivity of ~. QED. 2. Let's show that == is transitive on the special datetimes. This case is even simpler because s1 == s2 implies s1 ~ s2 (s1 and s2 differ only by fold), s2 == s3 implies s2 ~ s3 and thus s1 ~ s3 by transitivity of ~ and s1 == s3 by rule (a). Lemma: A function that is constant on equivalence classes satisfies the hash invariant. Proof: This is a tautology. Proposition: newhash(t) = oldhash(t.replace(fold=0)) satisfies the hash invariant. Proof: If t is special, its equivalence class consists of itself and a value with the complement value of fold. Since we force fold=0 before computing the hash values, it is trivially the same for both values in the same class. If t is regular, since oldhash is defined as a hash of t - t.utcoffset() components, the hash values of r1 and r2 are equal if r1 - r1.utcoffset() ~ r2 - r2.utcoffset() which follows from r1 == r2 by rule (c). > > So far, so good :-) > Except for headache. :-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Tue Sep 8 23:22:28 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 16:22:28 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Guido] >> But it breaks compatibility: it breaks the rule that for fold=0 nothing >> changes. [Alex] > It preserves a "weak form" of compatibility: nothing changes in the behavior > of aware datetime objects unless they use a post-PEP tzinfo. In that specific way, it's "more backward compatible" than the "special-case-the-snot-out-of-fold=1 in interzone __eq__ and __ne__", but it's subtle: Some datetime class constructors, like .now() and today(), can set fold=1 even if no post-495 tzinfos exist, based on Python's own idea of what the system zone rules are. If one of those happens to be generated during a repeated time in the system zone, and a pre-495 tzinfo is attached, then special-casing fold=1 makes it "not equal" to anything in any other zone despite that a pre-495 tzinfo is in use. That's certainly breaking _some_ form of backward compatibility, however obscure. But under Alex's latest idea, that wouldn't break: the pre-495 tzinfo's .utcoffset() would return the same thing regardless of `fold`, so the new __eq__ wouldn't see any problem with it. 
The latest idea is based on determining whether a time is _really_ "a problem case", and to a pre-495 tzinfo nothing is. Just staring at `fold` without consulting the tzinfo is guessing at whether it _might_ be a real problem for the tzinfo in use, and in fact always guesses wrong when fold=1 and a pre-495 tzinfo is in use. > Note that Solution 2 also breaks a "strong form" of compatibility (nothing > changes unless fold=1) because pre-PEP tzinfos are supposed to interpret > times in the fold as STD (fold=1). Note that in my experience very few > tzinfo developers understand this requirement and with a run-of-the-mill > tzinfo you have a 50/50 chance that it will interpret ambiguous times as > fold=0 or fold=1. Well, if they copied the Python doc examples, they got this "right". If they're using dateutil's wrappings, they also got this right. And it's a non-issue in pytz, because that only ever uses fixed-offset classes. The three users who remain will just have to eat their own hasty cooking ;-) > ... > Once you decide to use a post-PEP tzinfo, you have no choice but to test > your software on the edge cases if you care about them. (And you probably > do if you bother to switch to a post-PEP tzinfo.) If you don't care about > edge cases, you can continue using pre-PEP tzinfos or switch and accept a > more consistent but different edge case behavior. Yup! The new idea is cleaner and clearer. But runs slower ;-) From rosuav at gmail.com Wed Sep 9 03:49:15 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 9 Sep 2015 11:49:15 +1000 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Wed, Sep 9, 2015 at 5:45 AM, Alexander Belopolsky wrote: > Definitions: An aware datetime value t is called "regular" if t.utcoffset() > does not depend on the value of the fold attribute. One point to clarify here. Is the definition of "regular" based on the timezone alone (that is to say, a UTC datetime is regular, and an Australia/Brisbane datetime is regular, but anything in a region with DST is always special), or are "special" datetimes only those in the fold period? The former is easily identified. As the zoneinfo file is parsed, it'll be obvious which ones can ever have times that differ only in fold, and they get flagged as "special". The check is simple - ask the timezone object whether it's regular or special. The latter, perhaps not so much. Given a particular datetime, can you easily and reliably ascertain whether or not there is any other section of time which can "look like" this one? Maybe I've missed something, having been skimming rather than reading every post in detail. (There have been rather a lot of them, and here I am making that worse...) ChrisA From alexander.belopolsky at gmail.com Wed Sep 9 04:02:42 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 8 Sep 2015 22:02:42 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 9:49 PM, Chris Angelico wrote: > On Wed, Sep 9, 2015 at 5:45 AM, Alexander Belopolsky > wrote: > > Definitions: An aware datetime value t is called "regular" if > t.utcoffset() > > does not depend on the value of the fold attribute. > > One point to clarify here. 
Is the definition of "regular" based on the > timezone alone (that is to say, a UTC datetime is regular, and an > Australia/Brisbane datetime is regular, but anything in a region with > DST is always special), or are "special" datetimes only those in the > fold period? It is what the definition says. If you want to know whether t is regular you have to compare t.utcoffset() and t.replace(fold=1-t.fold).utcoffset(). If they are the same, t is regular. If not - t is special. If tzinfo is a fixed offset timezone, all times with such tzinfo are regular. If tzinfo is a typical DST observing timezone, then times in the fold and in the gap are special and the rest are regular. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Wed Sep 9 04:10:49 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 21:10:49 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Alex] >> Definitions: An aware datetime value t is called "regular" if t.utcoffset() >> does not depend on the value of the fold attribute. [Chris Angelico] > One point to clarify here. Is the definition of "regular" based on the > timezone alone (that is to say, a UTC datetime is regular, and an > Australia/Brisbane datetime is regular, but anything in a region with > DST is always special), or are "special" datetimes only those in the > fold period? It applies to "an aware datetime value t". That's clear already ;-) Everything about `t` matters. In plain English `t` is "regular" if and only if `t` is in neither a fold nor a gap. So, e.g., all `t` in UTC are regular. In most zones with a notion of DST, there are exactly 2 wall-clock hours per year that are not regular (in the gap at the start of DST, and in the fold at DST end). > The former is easily identified. As the zoneinfo file is parsed, it'll > be obvious which ones can ever have times that differ only in fold, > and they get flagged as "special". The check is simple - ask the > timezone object whether it's regular or special. What's actually needed isn't that simple. > The latter, perhaps not so much. Given a particular datetime, can you > easily and reliably ascertain whether or not there is any other > section of time which can "look like" this one? Impossible to answer "easily" without knowing all the details of a specific tzinfo's internal data representation. For, e.g., a timezone _defined_ by a POSIX TZ rule, it's trivial, since those explicitly spell out the "problem hours" as local wall-clock times. For a tzfile, I posted pseudo-code a while back showing how to determine whether a UTC time corresponds to a fold in the zone, using a few simple calculations after doing a binary search across the zone's transition list to locate where the input UTC time belongs. A fold exists if and only if the current total UTC offset is less than the previous transition's total UTC offset. The opposite for a gap. However, this may mishandle cases (if any exist - I don't know) where consecutive transitions have exactly the same total UTC offset. So there are details left to flesh out, but it's conceptually easy enough ;-) For other zone sources, who knows? 
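Pulling Alex's check and Tim's fold-versus-gap observation together, here
is a small sketch of the classification.  The function name and string
results are purely illustrative (no such method exists in datetime), and
it assumes a post-495 tzinfo whose utcoffset() honors fold:

def classify(t):
    # 'regular' if t's UTC offset does not depend on fold (Alex's test);
    # otherwise decide fold vs. gap from how the offset changes.
    off_this = t.utcoffset()
    off_other = t.replace(fold=1 - t.fold).utcoffset()
    if off_this == off_other:
        return 'regular'
    # Per PEP 495, fold=0 gives the offset in force before the
    # transition and fold=1 the offset in force after it.
    off_before = off_this if t.fold == 0 else off_other
    off_after = off_other if t.fold == 0 else off_this
    # The clock is set back in a fold (total offset shrinks) and set
    # ahead in a gap (total offset grows) - the same test as in the
    # tzfile pseudo-code mentioned above, phrased through utcoffset().
    return 'fold' if off_after < off_before else 'gap'

With a post-495 US/Eastern implementation, for example, 01:30 on the
morning the clocks fall back would classify as 'fold', and 02:30 on the
morning they spring forward as 'gap'.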
From rosuav at gmail.com Wed Sep 9 04:55:17 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 9 Sep 2015 12:55:17 +1000 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Wed, Sep 9, 2015 at 12:10 PM, Tim Peters wrote: > [Alex] >>> Definitions: An aware datetime value t is called "regular" if t.utcoffset() >>> does not depend on the value of the fold attribute. > > [Chris Angelico] >> One point to clarify here. Is the definition of "regular" based on the >> timezone alone (that is to say, a UTC datetime is regular, and an >> Australia/Brisbane datetime is regular, but anything in a region with >> DST is always special), or are "special" datetimes only those in the >> fold period? > > It applies to "an aware datetime value t". That's clear already ;-) > Everything about `t` matters. In plain English `t` is "regular" if > and only if `t` is in neither a fold nor a gap. So, e.g., all `t` in > UTC are regular. In most zones with a notion of DST, there are > exactly 2 wall-clock hours per year that are not regular (in the gap > at the start of DST, and in the fold at DST end). Okay, that's what I thought it meant. And it's easy enough to see if two datetimes differ only in fold. The problem I was seeing was a difficulty in recognizing whether a single datetime is special or not, which is answered here: On Wed, Sep 9, 2015 at 12:02 PM, Alexander Belopolsky wrote: > If you want to know whether t is regular you have to compare t.utcoffset() > and t.replace(fold=1-t.fold).utcoffset(). If they are the same, t is > regular. If not - t is special. Thanks Alex! (I can imagine pushing this to the timezone object as a primitive, which will allow it to be optimized down to "t is regular" for timezones that are always regular, but that's an optimization only.) ChrisA From tim.peters at gmail.com Wed Sep 9 05:10:56 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 22:10:56 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Alex] >> If you want to know whether t is regular you have to compare t.utcoffset() >> and t.replace(fold=1-t.fold).utcoffset(). If they are the same, t is >> regular. If not - t is special. [Chris] > (I can imagine pushing this to the timezone object as a primitive, Hey, I'm listed as a PEP co-author, and even I can't get Alex to budge on adding my utterly sensible new ".classify()" tzinfo method ;-) Instead zone-wrapping tzinfo authors will likely write one anyway for their internal use, but not expose it (e.g., a tzinfo's .fromutc() needs to compute "is this in a fold? if so, earlier or later time?:" each time it's called - and .utcoffset() needs to worry about both folds and gaps on each call). > which will allow it to be optimized down to "t is regular" for > timezones that are always regular, but that's an optimization only.) Any sensible wrapping of a fixed-offset ("always regular") zone will have a 1-line .utcoffset() implementation, simply returning that zone's constant offset. It will be cheap enough. From tim.peters at gmail.com Wed Sep 9 06:25:08 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 8 Sep 2015 23:25:08 -0500 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: [Alex] > ... 
> The question is what is easier to understand: (a) t1 and t2 are equal if and > only if t1 - t1.replace(fold=f1).utcoffset() == t2 - > t2.replace(fold=f2).utcoffset() for all four possible pairs (f1, f2); Infinite recursion again ;-) , but this time because interzone equality is being defined in terms of 4 more interzone equalities. > or (b) > t1 and t2 are equal if and only if they are unambiguous and valid in their > respective zones and convert to the same UTC instant. The docs generally do both, when feasible: an English description, followed by a Python expression to resolve the inherent imprecision of English. The intent is for the English to give the high-order bits, and for the Python expression to leave no possible misunderstanding. For this case, the clearest Python I can think of is: def toutc(t, fold): return (t - t.replace(fold=fold).utcoffset()).replace(tzinfo=None) Then t1 == t2, when t1 and t2 are aware datetimes in different zones, if and only if: toutc(t1, 0) == toutc(t1, 1) == toutc(t2, 0) == toutc(t2, 1) Then there's no English remaining to be misread. As a side benefit, the correctness of short-circuiting if t1 is a problem case becomes dead obvious on the face of it ;-) From ischwabacher at wisc.edu Wed Sep 9 07:18:33 2015 From: ischwabacher at wisc.edu (Isaac J Schwabacher) Date: Wed, 09 Sep 2015 05:18:33 +0000 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: I stop following for the week and the world goes mad. I've lost count of the number of times I've thought, "Are you out of your *mind*!?" while reading this thread. You actually considered breaking the __hash__ invariant? [Guido] > > I could not accept a PEP that leads to different datetime being considered > > == but having a different hash (*unless* due to a buggy tzinfo subclass > > implementation -- however no historical timezone data should ever depend on > > such a bug). > > > > I'm much less concerned about < being intransitive in edge cases. [Tim] > Offhand I don't know whether it can be (probably). The case I > stumbled into yesterday showed that equality ("==") could be > intransitive: > > assert a == b == c == d and a < d > > While initially jarring, I called it a "minor wart", because the > middle "==" there is working in classic arithmetic but the other two > are working in timeline arithmetic. But _a_ wart all the same, since > transitivity doesn't fail today. I'm assuming that the moment of temporary insanity has passed and you consider the __hash__ invariant to be sacrosanct. The problem here is that someone (Alexander, I think?) demonstrated a method of producing a tzinfo class and b and c to make this true, *given arbitrary a and d*. Equality may not be transitive, but equality of hashes is, which means that __hash__ must be constant over equivalence classes in the transitive closure of the relation defined by __eq__. In this case, this boils down to "if __hash__ ignores fold, all datetime objects must have the same hash". I imagine the performance implications of this are not acceptable. There is no satisfactory way of weaseling out of this; datetime equality is timeline equality now and forever, unless you're willing to give up one of backward compatibility, the __hash__ invariant, or the ability to implement new tzinfo classes. (The tzinfo in the example was contrived but not buggy.) > > I also don't particularly care about == following from the difference being zero. 
> > Still, unless we're constrained by backward compatibility, I would rather > > not add equivalence between *any* two datetimes whose tzinfo is not the same > > object -- even if we can infer that they both must refer to the same > > instant. > > Assuming "equivalent" means "compare equal", we're highly constrained. > For datetimes x and y with distinct non-None tzinfos, it's always been > the case that: > > 1. x-y effectively converted both to UTC before subtraction. > > 2. comparison effectively interpreted x-y as a __cmp__ result > 2a. various comparison transitivities essentially followed from that > > 3. Because of #2, to maintain __hash__'s contract datetime.__hash__ > also effectively converted to UTC before hashing > > All of that would (well, "should") continue to work fine, except that > fold=1 is being ignored in intrazone arithmetic (subtraction and > comparisons) and by hash(). Maybe there are other surprises. I just > happened to notice the hash() problem, and equality intransitivity, > both yesterday. via thought experiments. > > On the face of it, it's a conceptual mess to try to make fold=1 "mean > something" in some contexts but not in others. In particular, > arithmetic, comparison, and hashing are usually deeply interrelated, > and have been in datetime so far. Ignoring `fold` in single-zone > arithmetic, comparisons and hashing works fine (in "naive time", where > `fold` is senseless), but when going across zones `fold` cannot be > ignored. > > That's a huge problem for hash(), because it can have no idea whether > the pattern of later equality comparisons relying on hash results > _will_ be using classic or timeline rules (or a mix of both). > > That didn't matter before, because _a_ unique UTC equivalent always > existed (the possibility of ambiguous times was effectively ignored). > > Now it does matter, because the UTC equivalent can differ depending on > the `fold` value. Ignoring it sometimes but not others leads to the > current quandary. The last time I made an argument like this, Guido called me the *very loyal* opposition. :) ijs From tim.peters at gmail.com Wed Sep 9 08:34:48 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 01:34:48 -0500 Subject: [Datetime-SIG] Another round on error-checking In-Reply-To: References: Message-ID: [ijs] > I stop following for the week and the world goes mad. I've > lost count of the number of times I've thought, "Are you > out of your *mind*!?" while reading this thread. You actually > considered breaking the __hash__ invariant? It went unnoticed for some time that the original PEP 495 _did_ break it. Not intentionally. "Unintended consequence." Alex resisted accepting that it was a fatal problem at first, but was converted to One Of Us after a single night's intense torture ;-) ... > I'm assuming that the moment of temporary insanity has > passed and you consider the __hash__ invariant to be sacrosanct. Of course! > The problem here is that someone (Alexander, I think?) > demonstrated a method of producing a tzinfo class and b > and c to make this true, *given arbitrary a and d*. Equality > may not be transitive, but equality of hashes is, which > means that __hash__ must be constant over equivalence > classes in the transitive closure of the relation defined by > __eq__. In this case, this boils down to "if __hash__ ignores > fold, all datetime objects must have the same hash". 
Alex also sketched an approach to constructing a far higher-quality hash (than a constant function), but it required having, in advance (of the first hash() call), all tzinfos that could possibly be used across a program's run. For example, if we knew in advance there was only one possible non-fixed-offset zone Z, hash(x) could convert x to zone Z. then convert the result of that (ignoring its `fold`) to a timestamp (as a timedelta object) relative to 0001-01-01 00:00:00 in Z, then hash the timestamp. Then all spellings in all zones of one of the times in a Z fold would have the same hash. It's clever, but can't see a way to make it practical. There's nothing, e.g., to stop code from building a brand new tzinfo as a big string containing Python code, and compiling the string at runtime. > I imagine the performance implications of this are not acceptable. Heh. We could try a constant hash function and see whether anyone noticed. That would be fun :-) > There is no satisfactory way of weaseling out of this; _Something_ has to give, yes. "Satisfactory" is Guido's call. Weaseling is our job. I already did a small test to convince myself people _would_ notice if we removed dicts from the language. They're the real source of this problem ;-) > datetime equality is timeline equality now and forever, unless > you're willing to give up one of backward compatibility, the > __hash__ invariant, or the ability to implement new tzinfo classes. > (The tzinfo in the example was contrived but not buggy.) No tzinfo contrivance is necessary. The hash problem in the original PEP could be provoked using any zone whatsoever in which there's a fold (like, say, US/Eastern). I think you have in mind part of Alex's sketch of a better-than-constant hash, where zones were indeed contrived just to illustrate how nasty it _could_ get. Guido is least fond of by-magic interzone comparison, and that's what we've been picking on. All worm-arounds so far would sacrifice trichotomy in some (or all) cases of "problem times", by declaring that some problem times wouldn't compare equal to any datetime in any other zone. In the latest version of that, there would be no change to comparison results so long as pre-495 tzinfos were used. If you started to use post-495 tzinfos, that's your choice: then you get by-magic `fold` set correctly in all cases, correct zone conversions in all cases, and correct by-magic interzone subtraction in all cases - at the cost of living with that all problem times (whether in a gap or a fold) would compare "not equal" to all datetimes in all other zones. My own code couldn't care less (I've never used an interzone comparison outside of lines in datetime's test suite). You _could_ still compare them, but you'd either have to convert to a zone in which they were not problem times (timezone.utc would always work for this) first, or use by-magic interzone subtraction and check the sign of the result. So, given that a user would have to "do something" to have even the possibility of suffering a surprise that will probably never happen in their life, "not satisfactory" isn't a slam dunk. Luckily, PEP 20 is crystal clear about the right decision in this case. 
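The two escape hatches mentioned above are easy to spell.  A minimal
sketch, with helper names invented here:

from datetime import timedelta, timezone

def same_instant(t1, t2):
    # Convert both to UTC first - UTC has no folds or gaps, so this
    # sidesteps the "problem time" special cases entirely.
    return t1.astimezone(timezone.utc) == t2.astimezone(timezone.utc)

def instant_cmp(t1, t2):
    # Or rely on by-magic interzone subtraction and look at the sign.
    diff = t1 - t2
    return (diff > timedelta(0)) - (diff < timedelta(0))  # -1, 0 or 1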
From alexander.belopolsky at gmail.com Wed Sep 9 17:44:40 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 9 Sep 2015 11:44:40 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: On Tue, Sep 8, 2015 at 3:59 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > On Mon, Sep 7, 2015 at 9:57 PM, Alexander Belopolsky < > alexander.belopolsky at gmail.com> wrote: > >> Solution 1: Make t1 > t0. >> >> Solution 2: Leave t1 == t0, but make t1 != u1. >> > > Solution 3: Leave t1 == t0, but make *both* t0 != u0 and t1 != u1 if > t0.utcoffset() != t1.utcoffset(). I've implemented [1] Solution 3 in my Github fork. [1]: https://github.com/abalkin/cpython/commit/aac301abe89cad2d65633df98764e5b5704f7629 -------------- next part -------------- An HTML attachment was scrubbed... URL: From berker.peksag at gmail.com Wed Sep 9 17:49:52 2015 From: berker.peksag at gmail.com (=?UTF-8?Q?Berker_Peksa=C4=9F?=) Date: Wed, 9 Sep 2015 18:49:52 +0300 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional Message-ID: The idea was came up when I reviewed issue 22241 [1] and Alexander said "This is a reasonable request": http://bugs.python.org/review/22241/ Currently, we have tests like (see Lib/test/datetimetester.py) self.assertEqual('UTC', timezone.utc.tzname(None)) self.assertEqual('UTC', timezone(ZERO).tzname(None)) self.assertEqual('UTC-05:00', timezone(-5 * HOUR).tzname(None)) self.assertEqual('UTC+09:30', timezone(9.5 * HOUR).tzname(None)) Can we just make dt optional and set its default value to None in Python 3.6? So timezone.utc.tzname(None) and timezone.utc.tzname() will both return "UTC". It's a small change, but I think it will make the API cleaner. --Berker [1] http://bugs.python.org/issue22241 From guido at python.org Wed Sep 9 18:19:09 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Sep 2015 09:19:09 -0700 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional In-Reply-To: References: Message-ID: +1, just submit a patch and mark it for 3.6. On Wed, Sep 9, 2015 at 8:49 AM, Berker Peksa? wrote: > The idea was came up when I reviewed issue 22241 [1] and Alexander > said "This is a reasonable request": > > http://bugs.python.org/review/22241/ > > Currently, we have tests like (see Lib/test/datetimetester.py) > > self.assertEqual('UTC', timezone.utc.tzname(None)) > self.assertEqual('UTC', timezone(ZERO).tzname(None)) > self.assertEqual('UTC-05:00', timezone(-5 * HOUR).tzname(None)) > self.assertEqual('UTC+09:30', timezone(9.5 * HOUR).tzname(None)) > > Can we just make dt optional and set its default value to None in Python > 3.6? So > > timezone.utc.tzname(None) and timezone.utc.tzname() > > will both return "UTC". It's a small change, but I think it will make > the API cleaner. > > --Berker > > [1] http://bugs.python.org/issue22241 > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
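The change proposed above is small.  As a rough sketch of the idea - a toy
fixed-offset class standing in for datetime.timezone, not the actual
patch:

from datetime import timedelta, tzinfo

class FixedOffset(tzinfo):
    # Toy stand-in for datetime.timezone, showing dt defaulting to None.
    def __init__(self, offset, name):
        self._offset = offset
        self._name = name

    def utcoffset(self, dt=None):
        return self._offset

    def tzname(self, dt=None):
        # A fixed-offset zone's name never depends on dt, so requiring
        # the argument buys nothing here.
        return self._name

    def dst(self, dt=None):
        return timedelta(0)

# FixedOffset(timedelta(hours=-5), 'UTC-05:00').tzname() == 'UTC-05:00'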
URL: From tim.peters at gmail.com Wed Sep 9 18:33:28 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 11:33:28 -0500 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional In-Reply-To: References: Message-ID: [Berker Peksa? ] > The idea was came up when I reviewed issue 22241 [1] and Alexander > said "This is a reasonable request": > > http://bugs.python.org/review/22241/ > > Currently, we have tests like (see Lib/test/datetimetester.py) > > self.assertEqual('UTC', timezone.utc.tzname(None)) > self.assertEqual('UTC', timezone(ZERO).tzname(None)) > self.assertEqual('UTC-05:00', timezone(-5 * HOUR).tzname(None)) > self.assertEqual('UTC+09:30', timezone(9.5 * HOUR).tzname(None)) > > Can we just make dt optional and set its default value to None in Python 3.6? So > > timezone.utc.tzname(None) and timezone.utc.tzname() > > will both return "UTC". It's a small change, but I think it will make > the API cleaner. > > --Berker > > [1] http://bugs.python.org/issue22241 +0. The base (tzinfo) class requires the datetime argument because, in general, a zone's name depends on the datetime (like "is it in the zone's "daylight" time"?). A subclass (like `timezone`) is free to override it to remove that requirement, but then code relying on the simplification is also relying on that it will only ever see instances of that subclass. General code can't make that assumption and get away with it. The lines from the test suite can't possibly ever see anything except the exact instance each is testing, so can't ever suffer that kind of problem. But it's also no real burden to add a few "None"s in the test suite. Snippets from test suites rarely make compelling examples either way - they're so very specific to the tiny bit of behavior they're probing. From alexander.belopolsky at gmail.com Wed Sep 9 19:24:04 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 9 Sep 2015 13:24:04 -0400 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional In-Reply-To: References: Message-ID: On Wed, Sep 9, 2015 at 12:33 PM, Tim Peters wrote: > +0. The base (tzinfo) class requires the datetime argument because, > in general, a zone's name depends on the datetime (like "is it in the > zone's "daylight" time"?). > I was thinking of returning the "zoneinfo" name such as America/New_York in this case. This would end the debate about what is the "proper" timezone name: if you know the date and time - you can get a specific EST/EDT abbreviation. If not - you'll just get whatever the zoneinfo calls itself. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Sep 9 19:38:17 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Sep 2015 10:38:17 -0700 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional In-Reply-To: References: Message-ID: <55F06E89.7050805@stoneleaf.us> On 09/09/2015 10:24 AM, Alexander Belopolsky wrote: > On Wed, Sep 9, 2015 at 12:33 PM, Tim Peters wrote: >> >> +0. The base (tzinfo) class requires the datetime argument because, >> in general, a zone's name depends on the datetime (like "is it in the >> zone's "daylight" time"?). > > I was thinking of returning the "zoneinfo" name such as America/New_York > in this case. This would end the debate about what is the "proper" > timezone name: if you know the date and time - you can get a specific > EST/EDT abbreviation. If not - you'll just get whatever the zoneinfo > calls itself. 
+1 -- ~Ethan~ From guido at python.org Wed Sep 9 19:43:20 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Sep 2015 10:43:20 -0700 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional In-Reply-To: References: Message-ID: On Wed, Sep 9, 2015 at 10:24 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Wed, Sep 9, 2015 at 12:33 PM, Tim Peters wrote: > >> +0. The base (tzinfo) class requires the datetime argument because, >> in general, a zone's name depends on the datetime (like "is it in the >> zone's "daylight" time"?). >> > > I was thinking of returning the "zoneinfo" name such as America/New_York > in this case. This would end the debate about what is the "proper" > timezone name: if you know the date and time - you can get a specific > EST/EDT abbreviation. If not - you'll just get whatever the zoneinfo calls > itself. > But that's not directly related to the proposal, is it? The proposal is to treat tz.tzname() the same as tz.tzname(None) -- not to give the former a different meaning. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Wed Sep 9 19:51:24 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 9 Sep 2015 13:51:24 -0400 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional In-Reply-To: References: Message-ID: On Wed, Sep 9, 2015 at 1:43 PM, Guido van Rossum wrote: > On Wed, Sep 9, 2015 at 10:24 AM, Alexander Belopolsky < > alexander.belopolsky at gmail.com> wrote: > >> >> On Wed, Sep 9, 2015 at 12:33 PM, Tim Peters wrote: >> >>> +0. The base (tzinfo) class requires the datetime argument because, >>> in general, a zone's name depends on the datetime (like "is it in the >>> zone's "daylight" time"?). >>> >> >> I was thinking of returning the "zoneinfo" name such as America/New_York >> in this case. This would end the debate about what is the "proper" >> timezone name: if you know the date and time - you can get a specific >> EST/EDT abbreviation. If not - you'll just get whatever the zoneinfo calls >> itself. >> > > But that's not directly related to the proposal, is it? The proposal is to > treat tz.tzname() the same as tz.tzname(None) -- not to give the former a > different meaning. > > Right. That's an independent proposal. I was mostly responding to Tim's comment. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Wed Sep 9 19:58:54 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 12:58:54 -0500 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional In-Reply-To: References: Message-ID: [Tim] >> +0. The base (tzinfo) class requires the datetime argument because, >> in general, a zone's name depends on the datetime (like "is it in the >> zone's "daylight" time"?). [Alex] > I was thinking of returning the "zoneinfo" name such as America/New_York in > this case. This would end the debate about what is the "proper" timezone > name: if you know the date and time - you can get a specific EST/EDT > abbreviation. If not - you'll just get whatever the zoneinfo calls itself. That's fine, and even desirable ;-) Just saying it's too late to change that the _base_ trzinfo class has always had a documented requirement for a datetime argument to tzinfo.tzname(). General code slinging trzinfos can only assume what's promised, and must supply what's required, by the base class. 
Subclasses are free to promise more (but not less) and/or require less (but not more), and code is free to rely on that, but such code is no longer general. Since that's just a _potential_ problem, and Python is for consenting adults, +0 on the original proposal (doesn't really matter to me either way, but I have a mild preference for allowing a simplification ("require less") in the `timezone` subclass). From alexander.belopolsky at gmail.com Wed Sep 9 20:30:21 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 9 Sep 2015 14:30:21 -0400 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional In-Reply-To: References: Message-ID: On Wed, Sep 9, 2015 at 1:58 PM, Tim Peters wrote: > +0 on the original proposal (doesn't really matter to me > either way, but I have a mild preference for allowing a simplification > ("require less") in the `timezone` subclass). > What would you say for the following proposal: leave tzinfo.tzname() signature as is, but add def name(self, dt=None): return self.tzname(dt) to the base tzinfo class. Now `tzname()` is a hook for tzinfo implementers, but name() is the higher level function for the users. (Note that I never liked that datetime.tzname() and tzinfo.tzname() had the same method name, so my proposal may reflect a personal bias.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Wed Sep 9 23:10:07 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 16:10:07 -0500 Subject: [Datetime-SIG] Making dt parameter of timezone.tzname(dt) optional In-Reply-To: References: Message-ID: [Alex] > What would you say for the following proposal: leave tzinfo.tzname() > signature as is, but add > > def name(self, dt=None): > return self.tzname(dt) > > to the base tzinfo class. Now `tzname()` is a hook for tzinfo implementers, > but name() is the higher level function for the users. (Note that I never > liked that datetime.tzname() and tzinfo.tzname() had the same method name, > so my proposal may reflect a personal bias.) Only if PEP 495 adds an obviously needed tzifno.classify(self, dt) method so that ordinary users don't have to become implementation experts to answer questions about datetimes that aren't about implementation details ;-) Short of that, I think I'd be happier if we changed tzinfo.tzname's signature to `dt=None`. No existing code would be harmed. New code would have to realize that any exploitation of the relaxed requirement could fail if an older tzinfo object is used. Same as that new code would have to realize that any exploitation of a new tzinfo.name() method could fail if an older tzinfo object is used. That's what "Version changed in" notes are for. I don't much care because I can't believe anyone uses a mix of tzinfos obtained from dozens of suppliers. BTW, this would be another use for starting to require that a tzinfo reveal a "version number" (however it's spelled). PEP 495 is a good place to start that too: any time Python changes tzinfo semantics, Python is creating _potential_ problems for everyone. PEP 495 is the first time we're proposing to change anything. 
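If a plain integer version attribute (the `__version__ = 2` spelling
suggested earlier) were adopted, the uniform check could be as dull as the
sketch below.  The attribute name, value, and helper are illustrative
only; nothing like this is specified by PEP 495 today:

def supports_pep495(tz):
    # Hypothetical uniform capability check: a post-495 tzinfo would
    # advertise itself with a class attribute that pre-495 tzinfos
    # simply lack.
    return getattr(tz, '__version__', 1) >= 2

# Library code that needs fold disambiguation could then fail fast:
#
#     if not supports_pep495(dt.tzinfo):
#         raise TypeError("a PEP 495 aware tzinfo is required")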
From alexander.belopolsky at gmail.com Thu Sep 10 00:19:08 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 9 Sep 2015 18:19:08 -0400 Subject: [Datetime-SIG] PEP 495: The classify() method Message-ID: On Wed, Sep 9, 2015 at 5:10 PM, Tim Peters wrote: [in the "Making dt parameter of timezone.tzname(dt) optional" thread] > > Only if PEP 495 adds an obviously needed tzifno.classify(self, dt) > method so that ordinary users don't have to become implementation > experts to answer questions about datetimes that aren't about > implementation details ;-) Deal! But the return values of classify() should be -1 (for gap), 0 (for regular) and 1 (for fold). And while we are at it, let's bring back the builtin cmp() method because all these cryptic >, < and == are just too confusing. :-) Seriously, though, I have no objection to the classify() method, but someone else will have to design it and carry through the unavoidable bikeshedding rounds. My goal in PEP 495 is to draw a straight line between the current state of affairs and a lossless astimezone(). Niceties like classify() are just a little off that path. I had no illusions when I started PEP 495 that it would be as easy as it sounds (just add one measly bit!) Still, I did not anticipate all the subtle issues that would have to be resolved. So rather than proposing more features that are not strictly necessary, I would like to ask the group to start kicking the tires on the reference implementation. [1] [1]: https://github.com/abalkin/cpython/tree/issue24773-s3 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Sep 10 21:25:55 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 10 Sep 2015 12:25:55 -0700 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: References: Message-ID: On Mon, Sep 7, 2015 at 9:48 PM, Stuart Bishop wrote: > On 4 September 2015 at 23:01, Chris Barker wrote: > > > I would like a flag on datetime, but it seems it might be better to put > that > > flag on a tzinfo object. But the implementation is the something to argue > > about only if there is any chance of doing it at all. > > I would still lean towards a separate datetimetz class, but that is > just semantics. As this conversation has progressed, it seems the way forward, if anyone wants to go there, is a new datetime class that conforms to Carls "Model A" -- is that what you mean? For my part, it would be cool if such a class could use the same tzinfo objects as datetime.datetime, and maybe the same timedelta. But as Carl suggested -- that would be a job for a new library anyway. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Sep 11 04:41:59 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 10 Sep 2015 21:41:59 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55EDB967.2050108@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: It's become beyond obvious that I'll never be able to make enough time to respond to all of these, so I'll address just this for now. 
because it's impossible to make progress on anything unless there's
agreement on what technical terms mean:

[Carl Meyer ]
>>> If you are doing any kind of "integer arithmetic on POSIX timestamps",
>>> you are _always_ doing timeline arithmetic.

[Tim]
>> True.

[Carl]
>>> Classic arithmetic may be many things, but the one thing it
>>> definitively is _not_ is "arithmetic on POSIX timestamps."

[Tim]
>> False.  UTC is an eternally-fixed-offset zone.  There are no
>> transitions to be accounted for in UTC.  Classic and timeline
>> arithmetic are exactly the same thing in any eternally-fixed-offset
>> zone.  Because POSIX timestamps _are_ "in UTC", any arithmetic
>> performed on one is being done in UTC too.  Your illustration next
>> goes way beyond anything I could possibly read as doing arithmetic on
>> POSIX timestamps:

[Carl]
> Translation: "I refuse to countenance the possibility of Model A."

Not at all.  I've tried several times to get it across in English, so this
time I'll try code instead:

def dt_add(dt, td, timeline=False):
    ofs = dt.utcoffset()
    as_utc = dt.replace(tzinfo=timezone.utc)
    # and the following is identical to converting to
    # a timestamp, "using POSIX timestamp arithmetic",
    # then converting back to calendar notation
    as_utc -= ofs
    as_utc += td
    if timeline:
        return as_utc.astimezone(dt.tzinfo)
    else:  # classic
        return (as_utc + ofs).replace(tzinfo=dt.tzinfo)

That adds an aware datetime to a timedelta, doing either classic or
timeline arithmetic depending on the optional flag.  If you want to claim
this doesn't do either kind of arithmetic correctly, prove it with a
specific example (of course cases where it's impossible to do
_conversions_ correctly today would be off-point).

Here's a variant of an earlier specific example:

from datetime import datetime, timedelta, timezone
from pytz.reference import Eastern

turkey_in = datetime(2004, 10, 30, 15, tzinfo=Eastern)
DAY = timedelta(days=1)
turkey_out1 = dt_add(turkey_in, DAY, timeline=True)
turkey_out2 = dt_add(turkey_in, DAY, timeline=False)
print(turkey_in)
print(turkey_out1)
print(turkey_out2)

and its output:

2004-10-30 15:00:00-04:00  # start
2004-10-31 14:00:00-05:00  # "a day later" in timeline
2004-10-31 15:00:00-05:00  # "a day later" in classic

"Timeline" arithmetic accounts for that an hour was inserted when DST
ended, and "classic" does not.  The "POSIX timestamp arithmetic" part is
identical across both cases.  The only difference is in how the POSIX
timestamp - which is always and only a count of seconds in UTC (which
isn't my definition - it's POSIX's) - is converted back to local calendar
notation at the very end.

I believe you have _pictured_ the POSIX timestamp number line annotated
with local calendar notations in your head, but those labels have nothing
to do with the timestamp arithmetic.  The labels have only to do with the
functions used to map local calendar notations to and from POSIX
timestamps.  Those labelings are the difference between "timeline" and
"classic" arithmetic at the higher level of aware datetime arithmetic.  At
the POSIX timestamp level, an integer is just an integer, with no defined
meaning of any kind beyond a count of seconds in UTC, and a POSIX-defined
mapping to and from proleptic Gregorian calendar notation.

That said, two things to note:

1. The "as_utc -= ofs" line is theoretically impure, because it's treating
a local time _as if_ it were a UTC time.  There's no real way around that.
We have to convert from local to UTC _somehow_, and POSIX dodges the issue by providing mktime() to do that "by magic". Here we're _inside_ the sausage factory, doing it ourselves. Some rat guts are visible at this level. If you look inside a C mktime() implementation, you'll find rat guts all over that too. But it's no problem for Guido ;-) We just set the hands on a UTC clock to match the local clock, then move the hands on the UTC clock by the amount the local clock is "ahead of" or "behind" UTC. In that way you can indeed picture the operation as being entirely "in UTC". 2. This would be a foolish _implementation_ of classic arithmetic, but not for semantic reasons. It's just grossly inefficient. Stare at the code, and in the classic case it subtracts the UTC offset at first only to add the same offset back later. Those cancel out, so there's no _semantic_ need to do either.. It's only excessive concern for theoretical purity that could stop one from spelling it as return dt + td from the start. That's technically absurd, since it's doing POSIX timestamp arithmetic on a timestamp that's _not_ a UTC seconds count. Its only virtue is that it gets the same answer far faster ;-) BTW, the same kind of reasoning shows why the value of the `timeline=` flag makes no difference in any case a fixed-offset zone is being used. Which is, concretely, what I mean by saying that timeline and classic arithmetic are exactly the same thing in any fixed-offset zone. From random832 at fastmail.com Sat Sep 12 20:23:12 2015 From: random832 at fastmail.com (Random832) Date: Sat, 12 Sep 2015 14:23:12 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? Message-ID: I was trying to find out how arithmetic on aware datetimes is "supposed to" work, and tested with pytz. When I posted asking why it behaves this way I was told that pytz doesn't behave correctly according to the way the API was designed. The tzlocal module, on the other hand, appears to simply defer to pytz on Unix systems. My question is, _are_ there any correct reference implementations that demonstrate the proper behavior in the presence of a timezone that has daylight saving time transitions? From tim.peters at gmail.com Sat Sep 12 20:53:06 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 12 Sep 2015 13:53:06 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: Message-ID: > I was trying to find out how arithmetic on aware datetimes is "supposed > to" work, and tested with pytz. When I posted asking why it behaves this > way I was told that pytz doesn't behave correctly according to the way > the API was designed. You were told (by me) that its implementation of tzinfos was not the _intended_ way. Which is another way of saying it was an unanticipated way. "Correctly" is a whole different kind of judgment. pytz users who faithfully follow the docs seem happy with it. > The tzlocal module, on the other hand, appears to > simply defer to pytz on Unix systems. > > My question is, _are_ there any correct reference implementations that > demonstrate the proper behavior in the presence of a timezone that has > daylight saving time transitions? Which specific "proper behaviors"? :"Hybrid" tzinfos following the recommendations in the Python docs, including the sample implementations in the docs, correctly mimic local clock behavior (skipping the clock ahead when DST starts, and moving the clock back when DST ends) when converting from UTC. 
It's impossible now to do local -> UTC conversions correctly in all cases, because it's impossible now to know which UTC time was intended for a local time in a fold.  For the same reason, it's impossible now to know whether a local time in a fold is intended to be viewed as being in daylight time or standard time.

But do note limitations of the default .fromutc() implementation: it only guarantees correct mimic-the-local-clock behavior when total-offset transitions are solely due to a notion of "daylight time" that strictly alternates between .dst() returning zero and non-zero values.  Transitions due to any other reason may or may not be reflected in .fromutc()'s treatment of the local clock.  Most importantly, a transition due to a zone changing its base ("standard") UTC offset is a possibility the default .fromutc() knows nothing about.

The wrapping of the IANA ("Olson") zoneinfo database in dateutil uses hybrid tzinfos (the intended way of wrapping zones with multiple UTC offsets), and inherits the default .fromutc(), so all the above applies to it.  Including all behaviors stemming from the impossibility of disambiguating local times in a fold.  That's not a bug in dateutil.  It's a gap in datetime's design.  It was an intentional gap at the time, but that pytz went to such heroic lengths to fill it suggests PEP 495 may well be overdue ;-)

From random832 at fastmail.com  Sat Sep 12 21:16:02 2015
From: random832 at fastmail.com (random832 at fastmail.com)
Date: Sat, 12 Sep 2015 15:16:02 -0400
Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo?
In-Reply-To: References: Message-ID: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com>

Oops, pressed the wrong reply button and it didn't include the datetime list.

On Sat, Sep 12, 2015, at 14:53, Tim Peters wrote:
> > I was trying to find out how arithmetic on aware datetimes is
> > "supposed to" work, and tested with pytz.  When I posted asking why
> > it behaves this way I was told that pytz doesn't behave correctly
> > according to the way the API was designed.
>
> You were told (by me) that its implementation of tzinfos was not the
> _intended_ way.  Which is another way of saying it was an
> unanticipated way.  "Correctly" is a whole different kind of judgment.
> pytz users who faithfully follow the docs seem happy with it.

My context is that I am working on an idea to include UTC offsets in datetime objects (or on a similar object in a new module), as an alternative to something like a "fold" attribute, and since "classic arithmetic" is apparently so important, I'm trying to figure out how "classic arithmetic" _is actually supposed to work_ when adding a timedelta to a time lands it on the opposite side of a transition (or in the middle of a "spring forward" gap).

If there is a "fall back" transition tonight, then adding a day to a time of 12 noon today could end up as:

12 noon tomorrow, offset still DST.
12 noon tomorrow, offset in standard time, 25 hours from now in real time.
11 AM tomorrow, offset in standard time, 24 hours from now in real time.

Which one of these is "classic arithmetic"?  Pytz (if you don't explicitly call a "normalize" function) results in something that looks like the first.  In one of the models I've thought of, you can get the second by replacing the tzinfo again, or the third by doing astimezone, but the first preserves "exactly 24 hours in the future" in both the UTC moment and the naive interpretation by leaving the offset alone even if it is an "unnatural" offset.
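For concreteness, a small sketch of that first outcome, assuming pytz and US/Eastern around the 2015 fall-back (the dates and zone are only illustrative):

    import pytz
    from datetime import datetime, timedelta

    eastern = pytz.timezone('US/Eastern')
    noon_today = eastern.localize(datetime(2015, 10, 31, 12, 0))  # noon EDT; DST ends overnight
    a_day_later = noon_today + timedelta(days=1)   # plain datetime arithmetic, tzinfo untouched

    print(noon_today)    # 2015-10-31 12:00:00-04:00
    print(a_day_later)   # 2015-11-01 12:00:00-04:00  <- noon tomorrow, still wearing the EDT offset

The -04:00 attached to the result is "unnatural" in the sense that no Eastern wall clock actually reads noon at UTC-4 on November 1.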
The second one above is what you get when you call normalize. My question was whether there are any real implementations that work the intended way. If there are not, maybe the intended semantics should go by the wayside and be replaced by what pytz does. From tim.peters at gmail.com Sat Sep 12 21:41:15 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 12 Sep 2015 14:41:15 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: [] > My context is that I am working on an idea to include utc offsets in > datetime objects (or on a similar object in a new module), as an > alternative to something like a "fold" attribute. and since "classic > arithmetic" is apparently so important, Love it or hate it, it's flatly impossible to change anything about it now, for backward compatibility. > I'm trying to figure out how > "classic arithmetic" _is actually supposed to work_ when adding a > timedelta to a time lands it on the opposite side of a transition (or in > the middle of a "spring forward" gap). datetime arithmetic is defined in the Python docs. > If there is a "fall back" transition tonight, then adding a day to a > time of 12 noon today could end up as: > > 12 noon tomorrow, offset still DST. > 12 noon tomorrow, offset in standard time, 25 hours from now in real > time. > 11 AM tomorrow, offset in standard time, 24 hours from now in real time > > Which one of these is "classic arithmetic"? 12 noon tomorrow in every case, regardless of tzinfo and regardless of whether any kind of transition may or may not have occurred. Whether it is or isn't in DST in this specific case isn't defined by Python - that's entirely up to what the tzinfo implementation says. The _intended_ way of implementing tzinfos would say it was in standard time. > Pytz (if you don't > explicitly call a "normalize" function) results in something that looks > like the first. Yes, because pytz always uses a fixed-offset tzinfo. There is no difference between timeline arithmetic and classic arithmetic in any fixed-offset zone. > In one of the models I've thought of, you can get the > second by replacing the tzinfo again, or the third by doing astimezone, > but the first preserves "exactly 24 hours in the future" in both the UTC > moment and the naive interpretation by leaving the offset alone even if > it is an "unnatural" offset. > > The second one above is what you get when you call normalize. Yes. .normalize() effectively converts to UTC and back again In fact, this is all it does: def normalize(self, dt, is_dst=False): if dt.tzinfo is self: return dt if dt.tzinfo is None: raise ValueError('Naive time - no tzinfo set') return dt.astimezone(self) .fromutc() is called as the last step of .astimezone(), and .pytz overrides the default .fromutc() to plug "the appropriate" fixed-offset pytz tzinfo into the result. > My question was whether there are any real implementations that work the > intended way. dateutil, plus all implementations anyone may have written for themselves based on the Python doc examples. When datetime was originally released, there were no concrete tzinfo implementations in the world, so lots of people wrote their own for the zones they needed by copy/paste/edit of the doc examples. > If there are not, maybe the intended semantics should go > by the wayside and be replaced by what pytz does. 
Changing anything about default arithmetic behavior is not a possibility. This has been beaten to death multiple times on this mailing list already, and I'm not volunteering for another round of it ;-) From ethan at stoneleaf.us Sat Sep 12 21:38:10 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 12 Sep 2015 12:38:10 -0700 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: <55F47F22.5080802@stoneleaf.us> On 09/12/2015 12:16 PM, random832 at fastmail.com wrote: > If there is a "fall back" transition tonight, then adding a day to a > time of 12 noon today could end up as: > > 12 noon tomorrow, offset still DST. > 12 noon tomorrow, offset in standard time, 25 hours from now in real > time. > 11 AM tomorrow, offset in standard time, 24 hours from now in real time I believe option 2 is the intended semantics. -- ~Ethan~ From alexander.belopolsky at gmail.com Sat Sep 12 21:53:38 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 12 Sep 2015 15:53:38 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: On Sat, Sep 12, 2015 at 3:41 PM, Tim Peters wrote: > > If there are not, maybe the intended semantics should go > > by the wayside and be replaced by what pytz does. > > Changing anything about default arithmetic behavior is not a > possibility. This has been beaten to death multiple times on this > mailing list already, and I'm not volunteering for another round of it > ;-) Tim and Guido only grudgingly accept it, but datetime already gives you "the pytz way" and PEP 495 makes a small improvement to it. The localize/normalize functionality is provided by the .astimezone() method which when called without arguments will attach an appropriate fixed offset timezone to a datetime object. You can then add timedeltas to the result and stay within a "fictitious" fixed offset timezone that extends indefinitely in both directions. To get back to the actual civil time - you call .astimezone() again. This gives you what we call here a "timeline" arithmetic and occasionally it is preferable to doing arithmetic in UTC. (Effectively you do arithmetic in local standard time instead of UTC.) Using a fixed offset timezone other than UTC for timeline arithmetic is preferable in timezones that are far enough from UTC that business hours straddle UTC midnight. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sat Sep 12 21:55:58 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 12 Sep 2015 15:55:58 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <55F47F22.5080802@stoneleaf.us> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <55F47F22.5080802@stoneleaf.us> Message-ID: On Sat, Sep 12, 2015 at 3:38 PM, Ethan Furman wrote: > On 09/12/2015 12:16 PM, random832 at fastmail.com wrote: > > If there is a "fall back" transition tonight, then adding a day to a >> time of 12 noon today could end up as: >> >> (1) 12 noon tomorrow, offset still DST. >> (2) 12 noon tomorrow, offset in standard time, 25 hours from now in real >> time. 
>> (3) 11 AM tomorrow, offset in standard time, 24 hours from now in real >> time >> > > I believe option 2 is the intended semantics. This is correct. We call this behavior "classic arithmetic" on this list. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sat Sep 12 22:10:29 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 12 Sep 2015 15:10:29 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: >>> If there are not, maybe the intended semantics should go >> > by the wayside and be replaced by what pytz does. >> Changing anything about default arithmetic behavior is not a >> possibility. This has been beaten to death multiple times on this >> mailing list already, and I'm not volunteering for another round of it >> ;-) [Alex] > Tim and Guido only grudgingly accept it, but datetime already gives you "the > pytz way" and PEP 495 makes a small improvement to it. To be clear, "Tim and Guido" have nothing at all against timeline arithmetic. Sometimes it's exactly what you need. But the _intended_ way to get it was always to convert to UTC first, or to just use plain old timestamps. Classic arithmetic was very intentionally the default. The only "grudgingly accepted" part is that .astimezone() grew a special case later, to make the absence of an argument "mean something": > The localize/normalize functionality is provided by the .astimezone() > method which when called without arguments will attach an appropriate > fixed offset timezone to a datetime object. You can then add timedeltas > to the result and stay within a "fictitious" fixed offset timezone that extends > indefinitely in both directions. To get back to the actual civil time - you > call .astimezone() again. This gives you what we call here a "timeline" > arithmetic and occasionally it is preferable to doing arithmetic in UTC. > (Effectively you do arithmetic in local standard time instead of UTC.) > Using a fixed offset timezone other than UTC for timeline arithmetic is > preferable in timezones that are far enough from UTC that business hours > straddle UTC midnight. The distance from UTC can't make any difference to the end result, although if you're working in an interactive shell "it's nice" to see intermediate results near current wall-clock time. "A potential problem" with .astimezone()'s default is that it _does_ create a fixed-offset zone. It's not at all obvious that it should do so. First time I saw it, my initial _expectation_ was that it "obviously" created a hybrid tzinfo reflecting the system zone's actual daylight rules, as various "tzlocal" implementations outside of Python do. From alexander.belopolsky at gmail.com Sat Sep 12 23:24:37 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 12 Sep 2015 17:24:37 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: On Sat, Sep 12, 2015 at 4:10 PM, Tim Peters wrote: > "A potential problem" with .astimezone()'s default is that it _does_ > create a fixed-offset zone. It's not at all obvious that it should do > so. 
First time I saw it, my initial _expectation_ was that it > "obviously" created a hybrid tzinfo reflecting the system zone's > actual daylight rules, as various "tzlocal" implementations outside of > Python do. > The clue should have been that .astimezone() is an instance method and you don't need to know time to create a hybrid tzinfo. If a Local tzinfo was available, it could just be passed to the .astimezone() method as an argument. You would not need .astimezone() to both create a tzinfo and convert the datetime instance to it. Still, I agree that this was a hack and a very similar hack to the one implemented by pytz. Hopefully once PEP 495 is implemented we will shortly see "as intended" tzinfos to become more popular. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Sep 13 00:24:58 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 12 Sep 2015 15:24:58 -0700 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: On Sat, Sep 12, 2015 at 2:24 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Sat, Sep 12, 2015 at 4:10 PM, Tim Peters wrote: > >> "A potential problem" with .astimezone()'s default is that it _does_ >> create a fixed-offset zone. It's not at all obvious that it should do >> so. First time I saw it, my initial _expectation_ was that it >> "obviously" created a hybrid tzinfo reflecting the system zone's >> actual daylight rules, as various "tzlocal" implementations outside of >> Python do. >> > > The clue should have been that .astimezone() is an instance method and > you don't need to know time to create a hybrid tzinfo. If a Local tzinfo > was available, it could just be passed to the .astimezone() method as an > argument. You would not need .astimezone() to both create a tzinfo and > convert the datetime instance to it. > > Still, I agree that this was a hack and a very similar hack to the one > implemented by pytz. Hopefully once PEP 495 is implemented we will > shortly see "as intended" tzinfos to become more popular. > The repeated claims (by Alexander?) that astimezone() has the power of pytz's localize() need to stop. Those pytz methods work for any (pytz) timezone -- astimezone() with a default argument only works for the local time zone. (And indeed what it does is surprising, except perhaps to pytz users.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sun Sep 13 02:46:45 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 12 Sep 2015 20:46:45 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: On Sat, Sep 12, 2015 at 6:24 PM, Guido van Rossum wrote: > The repeated claims (by Alexander?) that astimezone() has the power of > pytz's localize() need to stop. Prove me wrong! :-) > Those pytz methods work for any (pytz) timezone -- astimezone() with a > default argument only works for the local time zone. That's what os.environ['TZ'] = zonename is for. The astimezone() method works for every timezone installed on your system. Try it - you won't even need to call time.tzset()! > (And indeed what it does is surprising, except perhaps to pytz users.) That I agree with. 
Which makes it even more surprising that I often find myself and pytz advocates on the opposite sides of the fence. Granted, setting TZ is a silly trick, but one simple way to bring a full TZ database to Python is to allow .astimezone() take a zonename string like 'Europe/Amsterdam' or 'America/Montevideo' as an argument and act as os.environ['TZ'] = zonename; t.astimezone() does now, but without messing with global state. I made this suggestion before, but I find it inferior to "as intended" tzinfos. The only real claim that I am making is that fictitious fixed offset timezones are useful and we already have some support for them in stdlib. The datetime.timezone instances that .astimezone() attaches as tzinfo are not that different from the instances that are attached by pytz's localize and normalize methods. In fact, the only major differences between datetime.timezone instances and those used by pytz is that pytz's EST and EDT instances know that they come from America/New_York, while datetime.timezone instances don't. That's why once you specify America/New_York in localize, your tzinfo.normalize knows it implicitely, while in the extended .astimezone() solution you will have to specify it again. This is not a problem when you only support one local timezone, but comes with a different set of tradeoffs when you have multiple timezones. One advantage of not carrying the memory of the parent zoneinfo in the fixed offset tzinfo instance is that pickling of datetime objects and their interchange between different systems becomes simpler. A pickle of a datetime.timezone instance is trivial - same as that of a tuple of timedelta and a short string, but if your fixed offset tzinfo carries a reference to a potentially large zoneinfo structure, you get all kinds of interesting problems when you share them between systems that have different TZ databases. In any case, there are three approaches to designing a TZ database interface in the datetime module: the "as intended" approach, the pytz approach and the astimezone(zonename:str) approach. The last two don't require a fold attribute to disambiguate end-of-dst times and the first one does. With respect to arithmetic, the last two approaches are equivalent: both timeline and classic arithmetics are possible, but neither is painless. The "as intended" approach comes with classic arithmetic that "just works" and encourages the best practice for timeline arithmetic: do it in UTC. That's why I believe PEP 495 followed by the implementation of fold-aware "as intended" tzinfos (either within stdlib or by third parties) is the right approach. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Sep 13 03:58:48 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 12 Sep 2015 20:58:48 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: [Guido] >> Those pytz methods work for any (pytz) timezone -- astimezone() with a >> default argument only works for the local time zone. {Alex] > That's what os.environ['TZ'] = zonename is for. The astimezone() method > works for every timezone installed on your system. Try it - you won't even > need to call time.tzset()! I tried it. It makes no difference to anything for me. I stay on Windows to remind people that millions of Python users don't see any of the horrid nonsense Linuxish systems force on poor users ;-) > ... 
> In any case, there are three approaches to designing a TZ database interface > in the datetime module: the "as intended" approach, the pytz approach and > the astimezone(zonename:str) approach. Portability rules out #3, unless Python bundles its own zoneinfo wrapping. pytk's approach has many attractions, like no need for `fold` and no breakage of anything, and blazing fast .utcoffset(). Except at least arithmetic would have to be patched to do a `normalize` variant by magic (to attach the now-appropriate fixed-offset tzinfo, but without changing the clock in the process). Alas, that would be a huge speed hit for classic arithmetic. So, as always, the original intent is the only one that makes sense in the end ;-) > ... > That's why I believe PEP 495 followed by the implementation > of fold-aware "as intended" tzinfos (either within stdlib or by third > parties) is the right approach. Me too - except I think acceptance of 495 should be contingent upon someone first completing a fully functional (if not releasable) fold-aware zoneinfo wrapping. Details have a way of surprising, and we should learn from the last time we released a tzinfo spec in the absence of any industrial-strength wrappings using it. From guido at python.org Sun Sep 13 04:13:04 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 12 Sep 2015 19:13:04 -0700 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: On Sat, Sep 12, 2015 at 5:46 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Sat, Sep 12, 2015 at 6:24 PM, Guido van Rossum > wrote: > >> The repeated claims (by Alexander?) that astimezone() has the power of >> pytz's localize() need to stop. > > > Prove me wrong! :-) > > >> Those pytz methods work for any (pytz) timezone -- astimezone() with a >> default argument only works for the local time zone. > > > That's what os.environ['TZ'] = zonename is for. The astimezone() method > works for every timezone installed on your system. Try it - you won't even > need to call time.tzset()! > That's global state. Doesn't count. > (And indeed what it does is surprising, except perhaps to pytz users.) > > > That I agree with. Which makes it even more surprising that I often find > myself and pytz advocates on the opposite sides of the fence. > > Granted, setting TZ is a silly trick, but one simple way to bring a full > TZ database to Python is to allow .astimezone() take a zonename string like > 'Europe/Amsterdam' or 'America/Montevideo' as an argument and act as > os.environ['TZ'] = zonename; t.astimezone() does now, but without messing > with global state. > It might as well be a different method then though. > I made this suggestion before, but I find it inferior to "as intended" > tzinfos. > > The only real claim that I am making is that fictitious fixed offset > timezones are useful and we already have some support for them in stdlib. > The datetime.timezone instances that .astimezone() attaches as tzinfo are > not that different from the instances that are attached by pytz's localize > and normalize methods. > And it has the same defect. > In fact, the only major differences between datetime.timezone instances > and those used by pytz is that pytz's EST and EDT instances know that they > come from America/New_York, while datetime.timezone instances don't. 
> That's why once you specify America/New_York in localize, your > tzinfo.normalize knows it implicitely, while in the extended .astimezone() > solution you will have to specify it again. This is not a problem when you > only support one local timezone, but comes with a different set of > tradeoffs when you have multiple timezones. > > One advantage of not carrying the memory of the parent zoneinfo in the > fixed offset tzinfo instance is that pickling of datetime objects and their > interchange between different systems becomes simpler. A pickle of a > datetime.timezone instance is trivial - same as that of a tuple of > timedelta and a short string, but if your fixed offset tzinfo carries a > reference to a potentially large zoneinfo structure, you get all kinds of > interesting problems when you share them between systems that have > different TZ databases. > The pickling should be careful to pickle by reference (on the timezone name). That its meaning depends on the tz database is a feature. > In any case, there are three approaches to designing a TZ database > interface in the datetime module: the "as intended" approach, the pytz > approach and the astimezone(zonename:str) approach. The last two don't > require a fold attribute to disambiguate end-of-dst times and the first one > does. With respect to arithmetic, the last two approaches are equivalent: > both timeline and classic arithmetics are possible, but neither is > painless. The "as intended" approach comes with classic arithmetic that > "just works" and encourages the best practice for timeline arithmetic: do > it in UTC. That's why I believe PEP 495 followed by the implementation of > fold-aware "as intended" tzinfos (either within stdlib or by third parties) > is the right approach. > Right. So please focus on this path and don't try to pretend to pytz users that hacks around astimezone() make pytz redundant, because they don't. There are other ways to fix the damage that pytz has done. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sun Sep 13 04:15:02 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 12 Sep 2015 22:15:02 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: On Sat, Sep 12, 2015 at 9:58 PM, Tim Peters wrote: > > That's why I believe PEP 495 followed by the implementation > > of fold-aware "as intended" tzinfos (either within stdlib or by third > > parties) is the right approach. > > Me too - except I think acceptance of 495 should be contingent upon > someone first completing a fully functional (if not releasable) > fold-aware zoneinfo wrapping. Good idea. How far are you from completing that? > Details have a way of surprising, and > we should learn from the last time we released a tzinfo spec in the > absence of any industrial-strength wrappings using it. I completely agree. That's why I am adding test cases like Lord Hope Island and Vilnius to datetimetester. I will try to create a zoneinfo wrapping prototype as well, but I will probably "cheat" and build it on top of pytz. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tim.peters at gmail.com Sun Sep 13 04:25:19 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 12 Sep 2015 21:25:19 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: [Tim] >> Me too - except I think acceptance of 495 should be contingent upon >> someone first completing a fully functional (if not releasable) >> fold-aware zoneinfo wrapping. [Alex] > Good idea. How far are you from completing that? In my head, it was done last week ;-) In real life, I'm running out of spare time for much of anything anymore. I don't expect to be able to resume zoneinfo fiddling for at least 2 weeks. >> Details have a way of surprising, and >> we should learn from the last time we released a tzinfo spec in the >> absence of any industrial-strength wrappings using it. > I completely agree. That's why I am adding test cases like Lord Hope Island > and Vilnius to datetimetester. That helps a lot, but "industrial-strength" implies "by algorithm". There are far too many zones to deal with by crafting a hand-written class for each. > I will try to create a zoneinfo wrapping prototype as well, but I will > probably "cheat" and build it on top of pytz. It would be crazy not to ;-) Note that Stuart got to punt on "the hard part": .utcoffset(), since pytz only uses fixed-offset classes. For a prototype - and possibly forever after - I'd be inclined to create an exhaustive list of transition times in local time, parallel to the list of such times already there in UTC. An index into either list then gives an index into the other, and into the list of information about the transition (total offset, is_dst, etc). From alexander.belopolsky at gmail.com Sun Sep 13 04:40:33 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 12 Sep 2015 22:40:33 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: On Sat, Sep 12, 2015 at 10:25 PM, Tim Peters wrote: > > I will try to create a zoneinfo wrapping prototype as well, but I will > > probably "cheat" and build it on top of pytz. > > It would be crazy not to ;-) Note that Stuart got to punt on "the > hard part": .utcoffset(), since pytz only uses fixed-offset classes. > For a prototype - and possibly forever after - I'd be inclined to > create an exhaustive list of transition times in local time, parallel > to the list of such times already there in UTC. Yes. The only complication is that you need four transition points instead of two per year in a regular DST case: (1) start of gap; (2) end of gap; (3) start of fold; and (4) end of fold. Once you know where you are with respect to those points, figuring out utcoffset(), dst() and tzname() for either value of fold is trivial. > An index into either > list then gives an index into the other, and into the list of > information about the transition (total offset, is_dst, etc). Right. It's a shame though to work from a transitions in UTC list because most of DST rules are expressed in local times and then laboriously converted into UTC. I think I should also implement the POSIX TZ spec tzinfo. This is where the advantage of the "as intended" approach will be obvious. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.com Sun Sep 13 04:54:09 2015 From: random832 at fastmail.com (random832 at fastmail.com) Date: Sat, 12 Sep 2015 22:54:09 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: <1442112849.1530082.382090617.0F961872@webmail.messagingengine.com> On Sat, Sep 12, 2015, at 22:25, Tim Peters wrote: > That helps a lot, but "industrial-strength" implies "by algorithm". > There are far too many zones to deal with by crafting a hand-written > class for each. It occurs to me that though it's written in C, the zdump utility included in the tz code is implementation-agnostic w.r.t. what algorithm is used by the localtime function being tested. It's algorithm could probably be adapted to python. From alexander.belopolsky at gmail.com Sun Sep 13 05:42:10 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 12 Sep 2015 23:42:10 -0400 Subject: [Datetime-SIG] PEP 495: What's left to resolve In-Reply-To: References: Message-ID: I have now rewritten the "Temporal Arithmetic" section of the PEP to reflect "Solution 3." Hg commit: https://hg.python.org/peps/rev/3dc0382326de Rendered PEP section: https://www.python.org/dev/peps/pep-0495/#temporal-arithmetic-and-comparison-operators In addition to a general review of the rewritten section, I would like to ask the group to comment on the following part specifically: "The result of addition (subtraction) of a timedelta to (from) a datetime will always have fold set to 0 even if the original datetime instance had fold=1." There are two "obvious" choices here: (t + d).fold == 0 and (t + d).fold == t.fold. My original motivation for the rule above was to minimize the chances that a user would ever see a fold=1 instance. However, I now think that preserving the value of fold may be a better option. For example, an application that needs to iterate over minutes in the repeated hour will not need to adjust the fold attribute after each addition. On the other hand, there is little harm from accidentally "leaking" fold=1 into the regular zone where fold value makes no difference. On Wed, Sep 9, 2015 at 11:44 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: >> > >Solution 1: Make t1 > t0. >> >> Solution 2: Leave t1 == t0, but make t1 != u1. >> >> >> Solution 3: Leave t1 == t0, but make *both* t0 != u0 and t1 != u1 if t0.utcoffset() != t1.utcoffset(). > > > I've implemented [1] Solution 3 in my Github fork. > > [1]: https://github.com/abalkin/cpython/commit/aac301abe89cad2d65633df98764e5b5704f7629 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Sep 13 05:54:47 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 12 Sep 2015 22:54:47 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> Message-ID: [Alex] >>> I will try to create a zoneinfo wrapping prototype as well, but I will >>> probably "cheat" and build it on top of pytz. [Tim] >> It would be crazy not to ;-) Note that Stuart got to punt on "the >> hard part": .utcoffset(), since pytz only uses fixed-offset classes. >> For a prototype - and possibly forever after - I'd be inclined to >> create an exhaustive list of transition times in local time, parallel >> to the list of such times already there in UTC. [Alex] > Yes. 
The only complication is that you need four transition points instead > of two per year in a regular DST case: (1) start of gap; (2) end of gap; (3) > start of fold; and (4) end of fold. Once you know where you are with > respect to those points, figuring out utcoffset(), dst() and tzname() for > either value of fold is trivial. I wouldn't call those extras transitions - they're just warts hanging off of actual transitions. Earlier I showed Stuart how to determine everything about a possible fold from a UTC time using pytz's internal info, in PEP-431/495 Fri, 28 Aug 2015 01:01:06 -0500 He didn't reply that I saw, so it was either obvious or incomprehensible to him ;-) In any case, it just takes some very simple code once the transition record the UTC time belongs in is found. I'd be surprised if it weren't similarly easy to determine everything about a possible gap. At least in a zoneinfo wrapping, a hybrid tzinfo's .utcoffset() has to (at least internally) find "the transition record the UTC time belongs in" regardless. > ... > It's a shame though to work from a transitions in UTC list But that's what tzfiles store. It would be insane for a zoneinfo wrapping not to take advantage of that. For which reason, I do consider dateutil's zoneinfo wrapping to be insane ;-) (It inherits the default .fromutc()) Ah, BTW, I think dateutil's zoneinfo's wrapping also misunderstood some of what's actually in a tzfile. Specifically, a tzfile's " UTC/local indicators" and " standard/wall indicators" are 100% useless for anything we need, and shouldn't even be read from the file(*) (seek over 'em). > because most of DST rules are expressed in local times and then > laboriously converted into UTC. It's just a few lines of code in zoneinfo's zic.c. Nobody is doing it "by hand" there. > I think I should also implement the POSIX TZ spec tzinfo. For that you really should grab dateutil. It has a full implementation of POSIX TZ rules, as hybrid tzinfos; here from its docs: >>> tz1 = tzstr('EST+05EDT,M4.1.0,M10.5.0') >>> tz2 = tzstr('AEST-10AEDT-11,M10.5.0,M3.5.0') >>> dt = datetime(2003, 5, 8, 2, 7, 36, tzinfo=tz1) >>> dt.strftime('%X %x %Z') '02:07:36 05/08/03 EDT' >>> dt.astimezone(tz2).strftime('%X %x %Z') '16:07:36 05/08/03 AEST' Of course this implementation is tied into dateutil's rich supply of "calendar operations" too. > This is where the advantage of the "as intended" approach will be obvious. ? "As intended" is all about (to me) using hybrid tzinfos. And those are far richer in tzfiles than from POSIX rules. The latter only present us with simplest-possible DST transitions; tzfiles present us with every absurd political manipulation yet inflicted on humankind ;-) ------- (*) Long boring story. Short course: those indicators are only needed, on some systems, if a POSIZ TZ rule specifies a zone offset but gives no rules at all for daylight transitions, _and_ the system has a "posixrules" tzfile. Then an insane scheme is used to make up daylight rules "as if" the source file from which the posixrules tzfile was created had been for a zone with the TZ-specified standard offset instead, and these absurd indicators are used to figure out whether the posixrules source file specified _its_ daylight rules using UTC or local times, and if the later case then whether using standard time or wall-clock time instead. It's completely nuts. From carl at oddbird.net Sun Sep 13 08:16:23 2015 From: carl at oddbird.net (Carl Meyer) Date: Sun, 13 Sep 2015 00:16:23 -0600 Subject: [Datetime-SIG] Timeline arithmetic? 
In-Reply-To: References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> Message-ID: <55F514B7.60004@oddbird.net> Hi Tim, On 09/10/2015 08:41 PM, Tim Peters wrote: > It's become beyond obvious that I'll never be able to make enough time > to respond to all of these, so I'll address just this for now. because > it's impossible to make progress on anything unless there's agreement > on what technical terms mean: > > > [Carl Meyer ] >>>> If you are doing any kind of "integer arithmetic on POSIX timestamps", you >>>> are _always_ doing timeline arithmetic. > > [Tim] >>> True. > > [Carl] >>>> Classic arithmetic may be many things, but the one thing it definitively is >>>> _not_ is "arithmetic on POSIX timestamps." > > [Tim] >>> False. UTC is an eternally-fixed-offset zone. There are no >>> transitions to be accounted for in UTC. Classic and timeline >>> arithmetic are exactly the same thing in any eternally-fixed-offset >>> zone. Because POSIX timestamps _are_ "in UTC", any arithmetic >>> performed on one is being done in UTC too. Your illustration next >>> goes way beyond anything I could possibly read as doing arithmetic on >>> POSIX timestamps: > > [Carl] >> Translation: "I refuse to countenance the possibility of Model A." > > Not at all. I've tried several times to get it across in English, so > this time I'll try code instead: > > def dt_add(dt, td, timeline=False): > ofs = dt.utcoffset() > as_utc = dt.replace(tzinfo=timezone.utc) > > # and the following is identical to converting to > # a timestamp, "using POSIX timestamp arithmetic", > # then converting back to calendar notation > as_utc -= ofs > as_utc += td > > if timeline: > return as_utc.astimezone(dt.tzinfo) > else: # classic > return (as_utc + ofs).replace(tzinfo=dt.tzinfo) Well, sure. Of course it is possible to use "arithmetic on POSIX timestamps" within an implementation of either kind of arithmetic, if you try hard enough; I've never said anything to the contrary (that would be a provably silly thing to say). What your code does make clear is that if you convert from a DST-using timezone to a POSIX timestamp, do "arithmetic on POSIX timestamps" and then do a normal (what you would in any other context call a "correct") conversion back to the first timezone afterwards, the result you get is timeline arithmetic. Sure, if you do a specific sort of weird (what you would in any other context call "wrong") conversion from the POSIX timestamp back to the other timezone afterward, then you can get classic arithmetic instead. I'm not sure what you think that demonstrates. I think it demonstrates that both timeline and classic arithmetic _can_ be described in terms that include "arithmetic on POSIX timestamps," but timeline arithmetic is much more naturally seen that way. Your original assertion was that "Classic arithmetic is equivalent to doing integer arithmetic on integer POSIX timestamps" as a justification for why datetime chose classic arithmetic, implying that classic arithmetic is somehow _more_ or _more naturally_ seen as "equivalent to integer arithmetic on integer POSIX timestamps" than timeline arithmetic. I found that assertion puzzling, and I still do. I'd still conclude the same thing I already said in an earlier reply: """ So, "timeline arithmetic is just arithmetic on POSIX timestamps" means viewing all aware datetimes as isomorphic to POSIX timestamps. 
"Classic arithmetic is just arithmetic on POSIX timestamps" means viewing aware datetimes as naive datetimes which one can pretend are in a hypothetical (maybe UTC, if you like) fixed-offset timezone which is isomorphic to actual POSIX timestamps (even though their actual timezone may not be fixed-offset). I accept that those are both true and useful in the implementation of their respective model. I just don't think either one is inherently obvious or useful as a justification of their respective mental models; rather, which one you find "obvious" just reveals your preferred mental model. """ > That adds an aware datetime to a timedelta, doing either classic or > timeline arithmetic depending on the optional flag. If you want to > claim this doesn't do either kind of arithmetic correctly, prove it > with a specific example I'm not sure why you'd think I'd have any issue with that code, or any desire to prove it wrong. [...] > I believe you have _pictured_ the POSIX timestamp number line > annotated with local calendar notations in your head, but those labels > have nothing to do with the timestamp arithmetic. It would be more accurate to say that a Model A view pictures only a single timeline, which is physical (Newtonian) time. A point on that timeline is an instant. Any given instant is annotated with any number of labels, each one a unique and unambiguous description of that instant in some labeling system. A labeling system can be very simple (e.g. POSIX timestamps), less simple (proleptic Gregorian in UTC, or to a lesser extent any fixed-offset timezone), or slightly ridiculous (timezones with folds and gaps, where now we need a `fold` attribute or an explicit offset at each instant or something similar to keep each label unique and unambiguous). This mental model implies (and requires) that all of these labeling systems are isomorphic to each other and to the physical-time timeline, and that arithmetic in any of them is isomorphic to arithmetic in any other (and is thus obviously timeline arithmetic). Really my only point in this entire thread has been that this model (contrary to some of the denigration of it on this mailing list) is actually quite intuitive, not difficult to teach, and possible to do all sorts of useful work in (_even_ when you have to also teach pytz's unfortunate API for it). If you can agree with that - great, we're done here. If you don't agree with that, we may as well still be done, because I have too much personal experience suggesting it to be true for you to be likely able to convince me otherwise :-) I've also come to recognize, through this thread, that Model B (where the "local clock time in a given timezone" "timeline" is elevated to sort-of-equal status with the physical timeline, rather than just considered a weird complex labeling system for physical time) is also useful (more useful for some tasks) and makes intuitive sense too. [...] > 1. The "as_utc -= ofs" line is theoretically impure, because it's > treating a local time _as if_ it were a UTC time. There's no real way > around that. We have to convert from local to UTC _somehow_, and > POSIX dodges the issue by providing mktime() to do that "by magic". > Here we're _inside_ the sausage factory, doing it ourselves. Some rat > guts are visible at this level. If you look inside a C mktime() > implementation, you'll find rat guts all over that too. This seems like a really hand-wavy rationalization of an operation that can only really be described as an incorrect timezone conversion. 
Of course that incorrect timezone conversion operation is useful for implementing classic arithmetic in the way you've implemented it, but taken out of that context it's just an incorrect conversion. The reason you _need_ that incorrect conversion is because for some reason you're really wanting to do your arithmetic in terms of POSIX timestamps (which are defined as being in UTC), but you don't _really_ want correct conversion to UTC and back (because if you do that, you'll get timeline arithmetic). > But it's no problem for Guido ;-) We just set the hands on a UTC > clock to match the local clock, then move the hands on the UTC clock > by the amount the local clock is "ahead of" or "behind" UTC. In that > way you can indeed picture the operation as being entirely "in UTC". Sure, you can, if you're motivated enough :-) > 2. This would be a foolish _implementation_ of classic arithmetic, but > not for semantic reasons. It's just grossly inefficient. Stare at > the code, and in the classic case it subtracts the UTC offset at first > only to add the same offset back later. Those cancel out, so there's > no _semantic_ need to do either.. It's only excessive concern for > theoretical purity that could stop one from spelling it as > > return dt + td > > from the start. That's technically absurd, since it's doing POSIX > timestamp arithmetic on a timestamp that's _not_ a UTC seconds count. > Its only virtue is that it gets the same answer far faster ;-) I actually think this implementation would be _less_ technically absurd. I'm not sure why you'd insist that any arithmetic on a count of seconds must be "POSIX timestamp arithmetic." In this case you're just doing integer arithmetic on a naive count of seconds since some point in the local timezone clock, rather than on a count of seconds in UTC. That's a much more natural way to view classic arithmetic, and also happens to be the way datetime actually does it (where "some point" is datetime(1, 1, 1)). Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Sun Sep 13 17:27:30 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Sep 2015 10:27:30 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <201509131224.t8DCOXHO004891@fido.openend.se> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> Message-ID: [Alex] >>I will try to create a zoneinfo wrapping prototype as well, but I will >>probably "cheat" and build it on top of pytz. [Laura Creighton] > My question, is whether it will handle Creighton, Saskatchewan, Canada? > Creighton is an odd little place. Like all of Saskatchewan, it is > in the Central time zone, even though you would expect it to be > in the Mountain time zone based on its location on the globe. > The people of Saskatchewan have decided not to adopt Daylight > Savings time. Except for the people of Creighton (and > nearby Denare Beach) -- who _do_ observe Daylight savings time. > > makes for an interesting corner case, one that I remember for > personal (and not economic, or professional) reasons. Hi, Laura! By "zoneinfo" here, we mean the IANA (aka "Olson") time zone database, which is ubiquitous on (at least) Linux: https://www.iana.org/time-zones So "will a wrapping of zoneinfo handle XYZ?" 
isn't so much a question about the wrapping as about what's in the IANA database. Best guess is that Creighton's rules are covered by that database's America/Winnipeg entries. It's generally true that the database makes no attempt to name every location on the planet. Instead it uses names of the form "general/specific" where "general" limits the scope to some large area of the Earth (here "America" really means "North America"), and "specific" names a well-known city within that area. For example, I live in Ashland, Wisconsin (extreme far north in that state, on Lake Superior), but so far as IANA is concerned my time zone rules are called "America/Chicago" (some 460 air miles SSE, in a different state). Just for fun, I'll paste in the comments from the Saskatchewan section of IANA's "northamerica" data file (a plain text source file from which binary tzfiles like America/Chicago and America/Winnipeg are generated). You'll see Creighton mentioned if you stay alert ;-) # Saskatchewan # From Mark Brader (2003-07-26): # The first actual adoption of DST in Canada was at the municipal # level. As the [Toronto] Star put it (1912-06-07), "While people # elsewhere have long been talking of legislation to save daylight, # the city of Moose Jaw [Saskatchewan] has acted on its own hook." # DST in Moose Jaw began on Saturday, 1912-06-01 (no time mentioned: # presumably late evening, as below), and would run until "the end of # the summer". The discrepancy between municipal time and railroad # time was noted. # From Paul Eggert (2003-07-27): # Willett (1914-03) notes that DST "has been in operation ... in the # City of Moose Jaw, Saskatchewan, for one year." # From Paul Eggert (2006-03-22): # Shanks & Pottenger say that since 1970 this region has mostly been as Regina. # Some western towns (e.g. Swift Current) switched from MST/MDT to CST in 1972. # Other western towns (e.g. Lloydminster) are like Edmonton. # Matthews and Vincent (1998) write that Denare Beach and Creighton # are like Winnipeg, in violation of Saskatchewan law. # From W. Jones (1992-11-06): # The. . .below is based on information I got from our law library, the # provincial archives, and the provincial Community Services department. # A precise history would require digging through newspaper archives, and # since you didn't say what you wanted, I didn't bother. # # Saskatchewan is split by a time zone meridian (105W) and over the years # the boundary became pretty ragged as communities near it reevaluated # their affiliations in one direction or the other. In 1965 a provincial # referendum favoured legislating common time practices. # # On 15 April 1966 the Time Act (c. T-14, Revised Statutes of # Saskatchewan 1978) was proclaimed, and established that the eastern # part of Saskatchewan would use CST year round, that districts in # northwest Saskatchewan would by default follow CST but could opt to # follow Mountain Time rules (thus 1 hour difference in the winter and # zero in the summer), and that districts in southwest Saskatchewan would # by default follow MT but could opt to follow CST. # # It took a few years for the dust to settle (I know one story of a town # on one time zone having its school in another, such that a mom had to # serve her family lunch in two shifts), but presently it seems that only # a few towns on the border with Alberta (e.g. Lloydminster) follow MT # rules any more; all other districts appear to have used CST year round # since sometime in the 1960s. 
# From Chris Walton (2006-06-26): # The Saskatchewan time act which was last updated in 1996 is about 30 pages # long and rather painful to read. # http://www.qp.gov.sk.ca/documents/English/Statutes/Statutes/T14.pdf From tim.peters at gmail.com Sun Sep 13 19:25:35 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Sep 2015 12:25:35 -0500 Subject: [Datetime-SIG] Timeline arithmetic? In-Reply-To: <55F514B7.60004@oddbird.net> References: <55E9D6EB.2090108@oddbird.net> <55E9F626.1080906@oddbird.net> <55ECD82E.9070305@oddbird.net> <55EDB967.2050108@oddbird.net> <55F514B7.60004@oddbird.net> Message-ID: [Carl Meyer] > Well, sure. Of course it is possible to use "arithmetic on POSIX > timestamps" within an implementation of either kind of arithmetic, if > you try hard enough; I've never said anything to the contrary (that > would be a provably silly thing to say). """ >>>> Classic arithmetic may be many things, but the one thing it definitively is >>>> _not_ is "arithmetic on POSIX timestamps." """ """ >> Translation: "I refuse to countenance the possibility of Model A." """ And for "try hard enough" here, "hard enough" amounted to "trivial" ;-) > What your code does make clear is that if you convert from a DST-using > timezone to a POSIX timestamp, do "arithmetic on POSIX timestamps" and > then do a normal (what you would in any other context call a "correct") > conversion back to the first timezone afterwards, the result you get is > timeline arithmetic. How else can you do timelime arithmetic? Zones are _defined_ as offsets from UTC now. > Sure, if you do a specific sort of weird (what you would in any other > context call "wrong") conversion from the POSIX timestamp There are only two contexts: Model A and Model B. So your "any other context" means simply "Model A", and, yes, a Model B conversion looks "wrong" to your Model A eyes. It's equally true that a Model A conversion looks "wrong" to Model B eyes. The code shows concretely how arbitrary this choice is. It's just a difference in how POSIX timestamps are _labelled_. It has nothing to do with the low-level arithmetic itself. > back to the other timezone afterward, then you can get classic > arithmetic instead. I'm not sure what you think that demonstrates """ >>>> Classic arithmetic may be many things, but the one thing it definitively is >>>> _not_ is "arithmetic on POSIX timestamps." """ """ >> Translation: "I refuse to countenance the possibility of Model A." """ > I think it demonstrates that both timeline and classic arithmetic _can_ be > described in terms that include "arithmetic on POSIX timestamps," but > timeline arithmetic is much more naturally seen that way. To you, obviously. But _on its own_ (devoid of any imposed labellings), POSIX timestamp arithmetic is _solely_ arithmetic on seconds-counts in UTC. There is no distinction between classic and timeline arithmetic in UTC (or in any other fixed-offset zone). Classic arithmetic is no more than "let's just pretend our clock is already showing UTC, do the arithmetic, then stop pretending". By Occam's Razor, that's as "natural" as anything gets ;-) > Your original assertion was that "Classic arithmetic is equivalent to > doing integer arithmetic on integer POSIX timestamps" It is. So is timeline arithmetic. The difference is in labeling, not in the arithmetic. 
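As a concrete sketch of that claim (assuming dateutil's America/New_York wrapping and the 2015 fall-back; the variable names are only illustrative), the timestamp addition is the same single line for both results - only the final relabeling back into local time differs:

    from datetime import datetime, timedelta, timezone
    from dateutil import tz

    eastern = tz.gettz('America/New_York')
    start = datetime(2015, 10, 31, 12, 0, tzinfo=eastern)   # noon EDT

    ts = start.timestamp() + 24 * 3600        # the POSIX timestamp arithmetic itself

    # Relabel the very same timestamp two different ways:
    timeline = datetime.fromtimestamp(ts, eastern)            # honest conversion back
    classic = (datetime.fromtimestamp(ts, timezone.utc)       # pretend local was UTC,
               + start.utcoffset()).replace(tzinfo=eastern)   # then stop pretending

    print(timeline)   # 2015-11-01 11:00:00-05:00
    print(classic)    # 2015-11-01 12:00:00-05:00, same as start + timedelta(days=1)

Both results come from the identical `ts` value; which one you end up with is decided entirely by the relabeling step, not by the arithmetic.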
> as a justification for why datetime chose classic arithmetic, Sorry, I don't recall trying to "justify" that choice beyond noting that there _was_ a choice, and one was overwhelmingly better suited to Guido's novel "naive time" model, while best practice for the other was already established in C via converting to UTC and back (whether spelled via a UTC tzinfo or via POSIX timestamps). There was no agonizing over that decision: the best way to proceed was obvious _given that_ "naive time" was the primary model in mind. > implying that classic arithmetic is somehow _more_ or _more naturally_ > seen as "equivalent to integer arithmetic on integer POSIX timestamps" than > timeline arithmetic. I found that assertion puzzling, and I still do. To me, it's dead easy to implement either kind of higher-level arithmetic via POSIX timestamp arithmetic, although it's easi_est_ to implement classic via the "just pretend at both ends" trick - no conversions are actually needed on either end. > I'd still conclude the same thing I already said in an earlier reply: > > """ > So, "timeline arithmetic is just arithmetic on POSIX timestamps" means > viewing all aware datetimes as isomorphic to POSIX timestamps. You're missing here that there isn't a _unique_ isomorphism. The code concretely showed that, at the higher level of datetime arithmetic, you can get either timeline or classic arithmetic depending on _which_ isomorphism you pick. The isomorphism is about the labeling, not about the POSIX timestamp arithmetic. > "Classic arithmetic is just arithmetic on POSIX timestamps" means > viewing aware datetimes as naive datetimes which one can pretend are in > a hypothetical (maybe UTC, if you like) fixed-offset timezone which is > isomorphic to actual POSIX timestamps (even though their actual timezone > may not be fixed-offset). That's why I wanted to show code ;-) The entire distinction is in the single if/else clause at the end. It doesn't require piles of words. > I accept that those are both true and useful in the implementation of > their respective model. I just don't think either one is inherently > obvious or useful as a justification of their respective mental models; > rather, which one you find "obvious" just reveals your preferred mental > model. > """ I'm not trying to "justify" anything. I'm trying to say that "POSIX timestamp arithmetic" on its own says nothing about which kind of higher-level arithmetic one sees. That's in the lableling. Which labeling you need _becomes_ obvious only after you identify the higher-level model you want. >> That adds an aware datetime to a timedelta, doing either classic or >> timeline arithmetic depending on the optional flag. If you want to >> claim this doesn't do either kind of arithmetic correctly, prove it >> with a specific example > I'm not sure why you'd think I'd have any issue with that code, or any > desire to prove it wrong. """ >>>> Classic arithmetic may be many things, but the one thing it definitively is >>>> _not_ is "arithmetic on POSIX timestamps." """ """ >> Translation: "I refuse to countenance the possibility of Model A." """ [...] >> I believe you have _pictured_ the POSIX timestamp number line >> annotated with local calendar notations in your head, but those labels >> have nothing to do with the timestamp arithmetic. > It would be more accurate to say that a Model A view pictures only a > single timeline, which is physical (Newtonian) time. A point on that > timeline is an instant. 
Any given instant is annotated with any number > of labels, each one a unique and unambiguous description of that instant > in some labeling system. A labeling system can be very simple (e.g. > POSIX timestamps), less simple (proleptic Gregorian in UTC, or to a > lesser extent any fixed-offset timezone), or slightly ridiculous > (timezones with folds and gaps, where now we need a `fold` attribute or > an explicit offset at each instant or something similar to keep each > label unique and unambiguous). This mental model implies (and requires) > that all of these labeling systems are isomorphic to each other and to > the physical-time timeline, and that arithmetic in any of them is > isomorphic to arithmetic in any other (and is thus obviously timeline > arithmetic). Regardless, the "labels have nothing to do with the timestamp arithmetic". > Really my only point in this entire thread has been that this model > (contrary to some of the denigration of it on this mailing list) is > actually quite intuitive, not difficult to teach, and possible to do all > sorts of useful work in (_even_ when you have to also teach pytz's > unfortunate API for it). If you can agree with that - great, we're done > here. If you don't agree with that, we may as well still be done, > because I have too much personal experience suggesting it to be true for > you to be likely able to convince me otherwise :-) If that was indeed your only point, then yes - there was again no need for any of this ;-) > I've also come to recognize, through this thread, that Model B (where > the "local clock time in a given timezone" "timeline" is elevated to > sort-of-equal status with the physical timeline, rather than just > considered a weird complex labeling system for physical time) is also > useful (more useful for some tasks) and makes intuitive sense too. It does suffer the drawback of not matching how clocks in the real world actually behave ;-) [...] >> 1. The "as_utc -= ofs" line is theoretically impure, because it's >> treating a local time _as if_ it were a UTC time. There's no real way >> around that. We have to convert from local to UTC _somehow_, and >> POSIX dodges the issue by providing mktime() to do that "by magic". >> Here we're _inside_ the sausage factory, doing it ourselves. Some rat >> guts are visible at this level. If you look inside a C mktime() >> implementation, you'll find rat guts all over that too. > This seems like a really hand-wavy rationalization of an operation that > can only really be described as an incorrect timezone conversion. Perhaps you missed that "as_utc -= ofs" is _also_ needed to implement timeline arithmetic? In fact, it's not _necessary_ to get the effect of classic arithmetic. It is necessary to implement timeline arithmetic: zones are defined as offsets from UTC, and doing POSIX timestamp arithmetic _requires_ converting to UTC first. How else are you going to do that, other than by subtracting the zone's UTC offset to convert to UTC? > Of course that incorrect timezone conversion operation is useful for > implementing classic arithmetic in the way you've implemented it, but > taken out of that context it's just an incorrect conversion. Nonsense: it is exactly the conversion "you" need at the start to correctly convert to UTC in Model A. Unless you do that first, you can't use "POSIX timestamp arithmetic" at all. 
> The reason you _need_ that incorrect conversion is because for some
> reason you're really wanting to do your arithmetic in terms of POSIX
> timestamps

I needed it for two reasons.  First, to implement timeline arithmetic
using POSIX timestamps (a problem you seem to wish away by viewing the
labels you want as being _inherently_ attached to the POSIX timestamp
number line - but they're not - the only labels defined by POSIX are to
and from the proleptic Gregorian calendar viewed in UTC).

Second, to address your:

"""
>>>> Classic arithmetic may be many things, but the one thing it definitively is
>>>> _not_ is "arithmetic on POSIX timestamps."
"""

> (which are defined as being in UTC), but you don't _really_
> want correct conversion to UTC and back (because if you do that, you'll
> get timeline arithmetic).

As above, it's really Model A that needs that conversion.  Model B can
live without it (and, in the actual Python implementation of classic
arithmetic, doesn't bother with conversion on either end).

As to "correct" conversion, that depends on which model you intend to
implement.  The "right" conversion at the end is "wrong" for the other
model.

>> But it's no problem for Guido ;-)  We just set the hands on a UTC
>> clock to match the local clock, then move the hands on the UTC clock
>> by the amount the local clock is "ahead of" or "behind" UTC.  In that
>> way you can indeed picture the operation as being entirely "in UTC".

> Sure, you can, if you're motivated enough :-)

>> 2. This would be a foolish _implementation_ of classic arithmetic, but
>> not for semantic reasons.  It's just grossly inefficient.  Stare at
>> the code, and in the classic case it subtracts the UTC offset at first
>> only to add the same offset back later.  Those cancel out, so there's
>> no _semantic_ need to do either.  It's only excessive concern for
>> theoretical purity that could stop one from spelling it as
>>
>> return dt + td
>>
>> from the start.  That's technically absurd, since it's doing POSIX
>> timestamp arithmetic on a timestamp that's _not_ a UTC seconds count.
>> Its only virtue is that it gets the same answer far faster ;-)

> I actually think this implementation would be _less_ technically absurd.
> I'm not sure why you'd insist that any arithmetic on a count of seconds
> must be "POSIX timestamp arithmetic."

Because I was addressing _your_ claims about POSIX timestamp arithmetic, like:

"""
>>>> Classic arithmetic may be many things, but the one thing it definitively is
>>>> _not_ is "arithmetic on POSIX timestamps."
"""

To address that specific claim, I stuck solely to "arithmetic on POSIX
timestamps".

> In this case you're just doing integer arithmetic on a naive count of seconds
> since some point in the local timezone clock, rather than on a count of
> seconds in UTC. That's a much more natural way to view classic arithmetic,
> and also happens to be the way datetime actually does it (where "some
> point" is datetime(1, 1, 1)).

It can be viewed either way.  A count of microseconds since
0001-01-01 00:00:00 is certainly more natural given knowledge of Python
internals, but it's just a linear transformation between that notion and
viewing it as a POSIX timestamp instead.  As shown before, that's why "by
hand" code to convert a UTC datetime to or from a POSIX timestamp (either
integer or floating) is so trivial to write.
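That "by hand" conversion is small enough to spell out; here is a sketch
(helper names invented for this example), assuming an aware UTC datetime
on one side and a POSIX timestamp on the other:

    from datetime import datetime, timedelta, timezone

    EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def utc_to_timestamp(dt_utc):
        # aware UTC datetime -> POSIX timestamp as a float
        # (use // timedelta(seconds=1) instead for an integer count)
        return (dt_utc - EPOCH) / timedelta(seconds=1)

    def timestamp_to_utc(ts):
        # POSIX timestamp (int or float seconds) -> aware UTC datetime
        return EPOCH + timedelta(seconds=ts)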
From tim.peters at gmail.com Sun Sep 13 21:00:33 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Sep 2015 14:00:33 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <201509131600.t8DG07e0025688@fido.openend.se> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> Message-ID: [Tim] >> Hi, Laura! By "zoneinfo" here, we mean the IANA (aka "Olson") time >> zone database, which is ubiquitous on (at least) Linux: >> >> https://www.iana.org/time-zones >> >>So "will a wrapping of zoneinfo handle XYZ?" isn't so much a question >>about the wrapping as about what's in the IANA database. [Laura] > Then we had better be able to override it when it is wrong. Anyone can write their own tzinfo implementing any rules they like, and nobody is required to use anyone else's tzinfos. That said, zoneinfo is the most extensive collection of time zone info there is, so most people will probably use only that. And that said, zoneinfo is inordinately concerned with recording highly dubious guesses about what "the rules" were even over a century ago. Most people would probably be happy with a tzinfo that _only_ knew what "today's rules" are. POSIX TZ rules give a simple way to spell exactly that. Simple, but annoyingly cryptic. Gustavo's `dateutil` already supplies a way to magically build a tzinfo implementing a zone specified by a POSIX TZ rule string. More obvious ways to spell that are surely possible (like, for example, the obvious ways). Patches welcome ;-) >> Best guess is that Creighton's rules are covered by that database's >> America/Winnipeg entries. >> >> # Saskatchewan >> # Other western towns (e.g. Lloydminster) are like Edmonton. >> # Matthews and Vincent (1998) write that Denare Beach and Creighton >> # are like Winnipeg, in violation of Saskatchewan law. > I think that this will work. > Creighton is just across the border from Flin Flan, Manitoba. Indeed I think > the problem of 'drunken people from Manitoba trying to get one hours more > drinking done and being a menace on the highway' may have fueled the > 'we are going to have DST in violation of the law' movement in Creighton. :-) > But I am not sure how it is that a poor soul who just wants to print a > railway schedule 'in local time' is supposed to know that Creighton is > using Winnipeg time. I'm not sure how that poor soul would get a railway schedule manipulable in Python to begin with ;-) If it's "a problem" for "enough" users of a computer system, a Linux admin could simply make "America/Creighton" a link to the "America/Winnipeg" tzfile. But doing that for every nameable place on Earth might be considered annoying. To cover everyone, you may even need to specify a street address within "a city": http://www.quora.com/Are-there-any-major-cities-divided-by-two-time-zones Blame politicians for this. I can assure you Guido is not responsible for creating this mess ;-) From tim.peters at gmail.com Sun Sep 13 22:13:53 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Sep 2015 15:13:53 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? 
In-Reply-To: <201509131940.t8DJe36w015280@fido.openend.se> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509131940.t8DJe36w015280@fido.openend.se> Message-ID: [Laura] >>> But I am not sure how it is that a poor soul who just wants to print a >>> railway schedule 'in local time' is supposed to know that Creighton is >>> using Winnipeg time. [Tim] >> I'm not sure how that poor soul would get a railway schedule >> manipulable in Python to begin with ;-) [Laura] > Via Rail will give you a schedule when you book your tickets. But I > am wrong, it gives it to you in local time, which you can scrape or > even use the via rail api. So it is the person getting off in > Creighton who wants to tell his relatives back in Halifax what > time he is arriving (in their time) (so they can call him and > avoid the hellish hotel surtax on long distance calls) who will > have the problem. Whatever time zone the traveler's railroad schedule uses, so long as it sticks to just one the traveler subtracts the departure time from the arrival time to determine how long the trip takes. They add that to the Halifax time at which they depart, and tell their Halifax relatives the result. They don't need to know anything about the destination's time zone to do this, unless a daylight transition occurs between departure and arrival, and the schedule itself remembered to account for it. In which case, pragmatically, they can just add an hour "to be safe" ;-) > And this is the sort of use case I think we will see a lot of. But there's nothing new here: datetime has been around for a dozen years already, and nobody is proposing to add any new basic functionality to tzinfos. PEP 495 is only about adding a flag to allow correct conversion of ambiguous local times (typically at the end of DST, when the local clock repeats a span of times) to UTC. So if this were a popular use case, I expect we would already have heard of it. Note that Python zoneinfo wrappings are already available via, at least, the pytz and dateutil packages. From tim.peters at gmail.com Sun Sep 13 23:58:09 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Sep 2015 16:58:09 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <201509132031.t8DKVTwJ028027@fido.openend.se> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> Message-ID: [Tim] >> Whatever time zone the traveler's railroad schedule uses, so long as >> it sticks to just one [Laura] > This is what does not happen. Which is why I have written a python > app to perform conversions for my parents, in the past. So how did they get the right time zone rules for Creighton? >>But there's nothing new here: datetime has been around for a dozen >>years already, and nobody is proposing to add any new basic >>functionality to tzinfos. PEP 495 is only about adding a flag to >>allow correct conversion of ambiguous local times (typically at the >>end of DST, when the local clock repeats a span of times) to UTC. So >>if this were a popular use case, I expect we would already have heard >>of it. Note that Python zoneinfo wrappings are already available via, >>at least, the pytz and dateutil packages. > I am a happy user of pytz. 
> On the other hand, I think this means that
> my brain has gone through some sort of non-reversible transformation
> which makes me accurate, but not exactly sane on the issue.

pytz made some strange decisions, from the POV of datetime's intended
tzinfo design.  But it also solved a problem datetime left hanging: how
to disambiguate ambiguous local times.  The _intended_ way to model zones
with UTC offset transitions was via what the docs call a "hybrid" tzinfo:
a single object smart enough on its own to figure out, e.g., whether a
datetime's date and time are in "daylight" or "standard" time.  However,
there's currently no way for such a tzinfo to know whether an ambiguous
local time is intended to be the earlier or the later of repeated times.
PEP 495 aims to plug that hole.

pytz solves it by _never_ creating a hybrid tzinfo.  It only uses
eternally-fixed-offset tzinfos.  For example, for a conceptual zone with
two possible total UTC offsets (one for "daylight", one for "standard"),
there are two distinct eternally-fixed-offset tzinfo objects in pytz.
Then an ambiguous time is resolved by _which_ specific tzinfo object is
attached.  Typically the "daylight" tzinfo for the first time a repeated
local time appears, and the "standard" tzinfo for its second appearance.

In return, you have to use .localize() and .normalize() at various times,
because pytz's tzinfo objects themselves are completely blind to the
possibility of the total UTC offset changing.  .localize() and
.normalize() are needed to possibly _replace_ the tzinfo object in use,
depending on the then-current date and time.

OTOH, `dateutil` does create hybrid tzinfo objects.  No dances are ever
needed to possibly replace them.  But it's impossible for dateutil's
tzinfos to disambiguate times in a fold.  Incidentally, dateutil also
makes no attempt to account for transitions other than DST (e.g.,
sometimes a zone may change its _base_ ("standard") offset from UTC).

So, yup, if you're thoroughly indoctrinated in pytz behavior, you will be
accurate but appear insane to Guido ;-)  At a semantic level, a pytz
tzinfo doesn't capture the notion of a zone with offset changes - it
doesn't even try to.  All knowledge about offset changes is inside the
.localize() and .normalize() dances.

> I think I have misunderstood Alexander Belopolsky as saying that
> datetime had functionality which I don't think it has. Thus I thought
> we must be planning to add some functionality here. Sorry about this.

Guido told Alex to stop saying that ;-)

You can already get eternally-fixed-offset classes, like pytz does, on
(at least) Linux systems by setting os.environ['TZ'] and then exploiting
that .astimezone() without an argument magically synthesizes an
eternally-fixed-offset tzinfo for the current total UTC offset of "the
system zone" (which the TZ envar specifies).  That's not really
comparable to what pytz does, except at a level that makes a lot of sense
in theory but not much at all in practice ;-)

> However, people do need to be aware, if they are not already, that
> people with 3 times in 3 different tz will want to sort them. Telling
> them that they must convert them to UTC before they do so is, in my
> opinion, a very fine idea. Expecting them to work this out by themselves
> via an assertion that the comparison operator is not transitive, is,
> I think, asking a lot of them.

Of course.  Note that it's _not_ a problem in pytz, though: there are no
sorting (or transitivity) problems if the only tzinfos you ever use have
eternally fixed UTC offsets.
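For readers who haven't used pytz, a short sketch of the
.localize()/.normalize() dance described above (illustrative only; the
zone and times are chosen arbitrarily):

    import pytz
    from datetime import datetime, timedelta

    eastern = pytz.timezone('US/Eastern')

    # .localize() attaches the fixed-offset tzinfo (EDT here) appropriate
    # for this particular local clock reading.
    dt = eastern.localize(datetime(2015, 11, 1, 0, 30))

    # Plain datetime arithmetic leaves the old fixed-offset tzinfo attached,
    # even though the result has crossed the fall-back transition...
    later = dt + timedelta(hours=2)

    # ...so .normalize() is needed to swap in the now-correct fixed-offset
    # tzinfo (EST), adjusting the local clock reading to match.
    later = eastern.normalize(later)
    print(later)   # 2015-11-01 01:30:00-05:00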
There are no gaps or folds then, and everything works in an utterly obvious way - except that you have to keep _replacing_ tzinfos when they become inappropriate for the current dates and times in the datetimes they're attached to. From guido at python.org Mon Sep 14 00:21:45 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 13 Sep 2015 15:21:45 -0700 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <201509131224.t8DCOXHO004891@fido.openend.se> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> Message-ID: On Sun, Sep 13, 2015 at 5:24 AM, Laura Creighton wrote: > My question, is whether it will handle Creighton, Saskatchewan, Canada? > Creighton is an odd little place. Like all of Saskatchewan, it is > in the Central time zone, even though you would expect it to be > in the Mountain time zone based on its location on the globe. > The people of Saskatchewan have decided not to adopt Daylight > Savings time. Except for the people of Creighton (and > nearby Denare Beach) -- who _do_ observe Daylight savings time. > > makes for an interesting corner case, one that I remember for > personal (and not economic, or professional) reasons. > Hi Laura! Wouldn't it be sufficient for people in Creighton to set their timezone to US/Central? IIUC the Canadian DST rules are the same as the US ones. Now, the question may remain how do people know what to set their timezone to. But neither pytz nor datetime can help with that -- it is up to the sysadmin. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Mon Sep 14 02:13:19 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Sep 2015 19:13:19 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> Message-ID: [Guido] > Wouldn't it be sufficient for people in Creighton to set their timezone to > US/Central? IIUC the Canadian DST rules are the same as the US ones. Now, > the question may remain how do people know what to set their timezone to. > But neither pytz nor datetime can help with that -- it is up to the > sysadmin. As Laura's use case evolved, it seems it was more that a train traveler from Halifax to Creighton wants to tell their Halifax relatives when they'll arrive in Creighton, but (of course) expressed in Halifax time. Nobody in this case knows anything about Creighton's rules, except the traveler may be staring at a train schedule giving arrival in Creighton time anyway. While this may be beyond pytz's wizardy, nothing is too hard for datetime ;-) datetime.timezone.setcontext("datetime-sig messages from mid-Sep 2015") arrivaltime = datetime.strptime(scraped_arrival_time, "") arrivaltime = datetime.replace(arrivaltime, tzinfo=gettz("Context/Creighton")) print(arrivaltime.astimezone(gettz("Context/Halifax")) From alexander.belopolsky at gmail.com Mon Sep 14 05:54:42 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 13 Sep 2015 23:54:42 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? 
In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> Message-ID: On Sun, Sep 13, 2015 at 6:21 PM, Guido van Rossum wrote: > > Now, the question may remain how do people know what to set their timezone to. But neither pytz nor datetime can help with that -- it is up to the sysadmin. Note that this question is also out of the scope of "tzdist", IETF Time Zone Data Distribution Service Working Group: """ The following are Out of scope for the working group: ... - Lookup protocols or APIs to map a location to a time zone. """ I am not aware of any effort to develop such service. On the other hand, stationary ISPs have means to distribute TZ information to the hosts. See for example, RFC 4833 ("Timezone Options for DHCP"). -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Mon Sep 14 21:13:16 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 15:13:16 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> Message-ID: <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 14:53, Tim Peters wrote: > So, on your own machine, whenever daylight time starts or ends, you > manually change your TZ environment variable to specify the newly > appropriate eternally-fixed-offset zone? Of course not. No, but the hybrid zone isn't what gets attached to the individual struct tm value when you convert a time from utc (or from a POSIX timestamp) to a timezone local value. A single fixed utc offset is (along with the name and, yes, isdst flag). And pytz doesn't involve manually changing anything, it involves (as best it can) automatically applying the value to attach to each individual datetime value. > A datetime object is the Python spelling of a C struct tm, but never > included the tm_isdst flag. And no-one behind this proposal seems to be contemplating adding an equivalent to tm_gmtoff, despite that it would serve the same disambiguation purpose and make it much cheaper to maintain global invariants like a sort order according to the UTC value (No, I don't *care* how that's not how it's defined, it is *in fact* true for the UTC value that you will ever actually get from converting the values to UTC *today*, and it's the only total ordering that actually makes any sense) From alexander.belopolsky at gmail.com Mon Sep 14 21:25:55 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Sep 2015 15:25:55 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? 
In-Reply-To: <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> Message-ID: On Mon, Sep 14, 2015 at 3:13 PM, Random832 wrote: > (No, I don't > *care* how that's not how it's defined, it is *in fact* true for the UTC > value that you will ever actually get from converting the values to UTC > *today*, and it's the only total ordering that actually makes any sense) > This is a fine attitude when you implement your own brand new datetime library. As an author of a new library you have freedoms that developers of a 12 years old widely deployed code don't have. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Mon Sep 14 21:30:58 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Sep 2015 14:30:58 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> Message-ID: [Tim] >> So, on your own machine, whenever daylight time starts or ends, you >> manually change your TZ environment variable to specify the newly >> appropriate eternally-fixed-offset zone? Of course not. [Random832 ] > No, but the hybrid zone isn't what gets attached to the individual > struct tm value when you convert a time from utc (or from a POSIX > timestamp) to a timezone local value. A single fixed utc offset is > (along with the name and, yes, isdst flag). You're assuming much more than POSIX - and the ISO C standard - requirs. My description was quite explicitly about how POSIX has done it all along. tm_gmtoff and tm_zone are extensions to the standards, introduced (IIRC) by BSD. Portable code (including Python's implementation) can't assume they're available. > And pytz doesn't involve manually changing anything, it involves (as > best it can) automatically applying the value to attach to each > individual datetime value. .normalize() is a manual step. It doesn't invoke itself by magic (although I believe Stuart would like Python to add internal hooks so pytz _could_ get it invoked by magic). >> A datetime object is the Python spelling of a C struct tm, but never >> included the tm_isdst flag. > And no-one behind this proposal seems to be contemplating adding an > equivalent to tm_gmtoff, It was off the table because, for backward compatibility, we need to mess with the pickle format as little as possible. It's vital that datetimes obtained from old pickles continue to work fine, and that pickles obtained from new datetime objects work fine when loaded by older Pythons unless they actually require the new fold=1 possibility. > despite that it would serve the same disambiguation purpose and > make it much cheaper to maintain global invariants like a sort order > according to the UTC value It would be nice to have! 
.utcoffset() is an expensive operation as-is, and being able to rely on tm_gmtoff would make that dirt-cheap instead. > (No, I don't *care* how that's not how it's defined, ? How what is defined?: > it is *in fact* true for the UTC value that you will ever actually get > from converting the values to UTC *today*, and it's the only total > ordering that actually makes any sense) Well, you lost me there. In a post-495 world, conversion to UTC will work correctly in all cases. It cannot today.; From alexander.belopolsky at gmail.com Mon Sep 14 21:44:25 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Sep 2015 15:44:25 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> Message-ID: On Mon, Sep 14, 2015 at 3:30 PM, Tim Peters wrote: > > make it much cheaper to maintain global invariants like a sort order > > according to the UTC value > > It would be nice to have! .utcoffset() is an expensive operation > as-is, and being able to rely on tm_gmtoff would make that dirt-cheap > instead. If it is just a question of optimization, datetime objects can be extended to cache utcoffset. Note that PyPy have recently added caching of the hash values in datetime objects. I merged their changes in our datetime.py, but it did not look like C implementation would benefit from it as much as pure python did. I expect something similar from caching utcoffset: a measurable improvement for tzinfos implemented in Python and a wash for those implemented in C. (A more promising optimization approach is to define a C API for tzinfo interface.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Mon Sep 14 21:48:08 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Sep 2015 15:48:08 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com> Message-ID: On Mon, Sep 14, 2015 at 3:44 PM, Random832 wrote: > It is an > invariant that is true today, and therefore which you can't rely on any > of the consumers of this 12 years old widely deployed code not to assume > will remain true. > Sorry, this sentence does not parse. You are missing a "not" somewhere. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Mon Sep 14 21:49:53 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Sep 2015 14:49:53 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? 
In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> Message-ID: [Tim] >> It would be nice to have! .utcoffset() is an expensive operation >> as-is, and being able to rely on tm_gmtoff would make that dirt-cheap >> instead. [Alex] > If it is just a question of optimization, Yes. If it's more than just that, then 495 doesn't actually solve the problem of getting the correct UTC offset in all cases. > datetime objects can be extended to cache utcoffset. Note that PyPy > have recently added caching of the hash values in datetime objects. I > merged their changes in our datetime.py, but it did not look like C > implementation would benefit from it as much as pure python did. I > expect something similar from caching utcoffset: a measurable > improvement for tzinfos implemented in Python and a wash for those > implemented in C. (A more promising optimization approach is to define a C > API for tzinfo interface.) There's no answer to this. It depends on how expensive .utcoffset() is, which in turn depends on how the tzinfo author implements it. I don't care now fast it is. But, even if I did, "premature optimization" applies at this time ;-) From random832 at fastmail.com Mon Sep 14 21:44:12 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 15:44:12 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> Message-ID: <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 15:25, Alexander Belopolsky wrote: > This is a fine attitude when you implement your own brand new datetime > library. As an author of a new library you have freedoms that developers > of a 12 years old widely deployed code don't have. I'm talking about the real behavior of datetime as it exists *today*, and has existed for the past 12 years, before any of this "add fold flag but sort 2:15 fold1 before 2:45 fold0" nonsense gets in. It is an invariant that is true today, and therefore which you can't rely on any of the consumers of this 12 years old widely deployed code not to assume will remain true. Enforcing an invariant that all ordering is done according to UTC timestamps would not break any backward compatibility, because there is not a *single* pair of timestamps that can be represented today with any *remotely* plausible tzinfo whose order is different from that. For that matter, a tzinfo where two possible values for fold aren't sufficient to disambiguate timestamps is *more* plausible than one where the naive ordering of any two non-fold timestamps is reversed from the UTC order, yet that case apparently isn't being considered. From random832 at fastmail.com Mon Sep 14 21:58:34 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 15:58:34 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? 
In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> Message-ID: <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 15:30, Tim Peters wrote: > You're assuming much more than POSIX - and the ISO C standard - > requirs. My description was quite explicitly about how POSIX has done > it all along. tm_gmtoff and tm_zone are extensions to the standards, > introduced (IIRC) by BSD. Portable code (including Python's > implementation) can't assume they're available. No, but that doesn't mean it's not in fact true (what was under discussion was "your own machine", not "a minimal POSIX implementation"). And it doesn't mean it's not a best practice that python can and should copy. I'm not talking about *using* it, I'm talking about working the same way independently, so this has nothing to do with assuming it's available. > It was off the table because, for backward compatibility, we need to > mess with the pickle format as little as possible. It's vital that > datetimes obtained from old pickles continue to work fine, and that > pickles obtained from new datetime objects work fine when loaded by > older Pythons unless they actually require the new fold=1 possibility. I don't see how this would prevent that. Aware datetimes have a tzinfo *right there* that can be asked for a value to populate utcoffset with if there isn't a pickled one. > > (No, I don't *care* how that's not how it's defined, > > ? How what is defined?: Just trying, unsuccessfully apparently, to head off the "no, it's defined as working the same as a naive datetime if the tzinfo values are the same" argument that got brought up the *last* time I made this claim. > > it is *in fact* true for the UTC value that you will ever actually get > > from converting the values to UTC *today*, and it's the only total > > ordering that actually makes any sense) > > Well, you lost me there. In a post-495 world, conversion to UTC will > work correctly in all cases. It cannot today.; It'll provide *a* value in all cases. The sort order today is equivalent to using that value in all cases unless you've got a pathological tzinfo specifically crafted to break it. I think that's an important enough invariant to be worth keeping, since it is the only possible way to provide a total order in the presence of interzone comparisons. From alexander.belopolsky at gmail.com Mon Sep 14 22:01:03 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Sep 2015 16:01:03 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> Message-ID: On Mon, Sep 14, 2015 at 3:49 PM, Tim Peters wrote: > It depends on how expensive .utcoffset() > is, which in turn depends on how the tzinfo author implements it. > No, it does not. In most time zones, UTC offset in seconds can be computed by C code as a 4-byte integer faster than CPython can look up the .utcoffset method. 
(At least for times within a few years around now.) A programmer who makes it slower should be fired. Yet I agree, "'premature optimization' applies at this time." -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Mon Sep 14 22:08:50 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 16:08:50 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com> Message-ID: <1442261330.265134.383487745.4E9C3005@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 15:48, Alexander Belopolsky wrote: > On Mon, Sep 14, 2015 at 3:44 PM, Random832 > wrote: > > > It is an > > invariant that is true today, and therefore which you can't rely on any > > of the consumers of this 12 years old widely deployed code not to assume > > will remain true. > > > > Sorry, this sentence does not parse. You are missing a "not" somewhere. Nope. I am asserting that: This invariant is true today. Therefore, it is likely that at least some consumers of datetime will assume it is true. Therefore, you cannot rely on there not being any consumers which assume it will remain true. It's awkward, since when I go back to analyze it it turns out that the "not" after 'code' actually technically modifies "any" earlier in the sentence, but the number of negatives is correct. (Though, it actually works out even without that change, since the question of *which* consumers rely on the invariant is unknown.) From tim.peters at gmail.com Mon Sep 14 22:15:35 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Sep 2015 15:15:35 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> Message-ID: [Random832 ] Whether or not datetimes stored tm_gmtoff and tm_zone workalikes has no effect on semantics I can see. If, in your view, they're purely an optimization, they're just a distraction for now. If you're proposing to add them _instead_ of adding `fold`, no, that can't work, for the pickle compatibility reasons already explained. Whether something is in a fold needs to preserved across pickling, but "almost all" pickles need to be readable by older Pythons too. This is doable adding one bit, but not doable at all if we need to add arbitrary timedelta and string objects _instead_ of that bit. ... >>> (No, I don't *care* how that's not how it's defined, >> ? How what is defined?: > Just trying, unsuccessfully apparently, to head off the "no, it's > defined as working the same as a naive datetime if the tzinfo values are > the same" argument that got brought up the *last* time I made this > claim. Sorry, I still don't know what this is about. 
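As an aside on the pickle point above, one way to picture "adding one
bit" without growing the fixed-size payload (an illustration only, not
necessarily the encoding PEP 495 ends up using): the month byte of the
packed payload only ever holds 1..12, so its high bit is free.

    def pack_month(month, fold):
        # fold=0 payloads stay byte-for-byte identical to what older
        # Pythons produce and accept; only the rare fold=1 payloads
        # become unreadable to them.
        return month + 128 if fold else month

    def unpack_month(byte):
        # returns (month, fold)
        return (byte - 128, 1) if byte > 127 else (byte, 0)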
>>> it is *in fact* true for the UTC value that you will ever actually get >>> from converting the values to UTC *today*, and it's the only total >>> ordering that actually makes any sense) >> Well, you lost me there. In a post-495 world, conversion to UTC will >> work correctly in all cases. It cannot today.; > It'll provide *a* value in all cases. It will provide the correct UTC offset in all cases. > The sort order today is equivalent to using that value in all > cases unless you've got a pathological tzinfo > specifically crafted to break it. I think that's an important enough > invariant to be worth keeping, since it is the only possible way to > provide a total order in the presence of interzone comparisons. Show some code? I don't know what you're talking about. It is true that the earlier and later of an ambiguous time in a fold will compare equal in their own zone, but compare not equal after conversion to UTC (or to any other zone in which they're not in one of the latter zone's folds). Is that what you're talking about? From tim.peters at gmail.com Mon Sep 14 22:22:50 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Sep 2015 15:22:50 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> Message-ID: [Tim] >> It depends on how expensive .utcoffset() >> is, which in turn depends on how the tzinfo author implements it. [Alex] > No, it does not. In most time zones, UTC offset in seconds can be computed > by C code as a 4-byte integer Which is a specific implementation of .utcoffset(). Which likely has nothing to do with how most tzinfo authors will implement _their_ .utcoffset(). For example, look at any tzinfo.utcoffset() implementation that currently exists ;-) > faster > than CPython can look up the .utcoffset method. (At least for times > within a few years around now.) A programmer who makes it slower should > be fired. So any programmer who implements .utcoffset() in Python should be fired? That's the only way I can read that. > Yet I agree, "'premature optimization' applies at this time." I'm more worried now about premature firing ;-) From random832 at fastmail.com Mon Sep 14 22:27:05 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 16:27:05 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> Message-ID: <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 16:15, Tim Peters wrote: > [Random832 ] > > Whether or not datetimes stored tm_gmtoff and tm_zone workalikes has > no effect on semantics I can see. If, in your view, they're purely an > optimization, they're just a distraction for now. 
If you're proposing > to add them _instead_ of adding `fold`, no, that can't work, for the > pickle compatibility reasons already explained. Whether something is > in a fold needs to preserved across pickling, but "almost all" pickles > need to be readable by older Pythons too. This is doable adding one > bit, but not doable at all if we need to add arbitrary timedelta and > string objects _instead_ of that bit. A) I'm still not sure why, but I was talking about adding an int, not a timedelta and a string. B) Older python versions can't make use of either utcoffset or fold, but can ignore either of them. I don't even see why they couldn't ignore a timedelta and a string if we felt like adding those. C) What value fold "should" have can be inferred from the time, the utcoffset, and the tzinfo. > >> Well, you lost me there. In a post-495 world, conversion to UTC will > >> work correctly in all cases. It cannot today.; > > > It'll provide *a* value in all cases. > > It will provide the correct UTC offset in all cases. I'm saying that *today*, even with no 495, it does provide *a* value in all cases (even if that's sometimes the "wrong" value for an ambiguous time). And that value is, for any plausible tzinfo, ordered the same for any given pair of datetimes with the same tzinfo as the datetimes considered as naive datetimes. There is, or appears to be, a faction that is proposing to change that by sorting fold=1 2:15 before fold=0 2:45 even though the former is *actually* 30 minutes later than the latter, and I am *utterly baffled* at why they think this is a good idea. > It is true that the earlier and later of an ambiguous time in a fold > will compare equal in their own zone, but compare not equal after > conversion to UTC (or to any other zone in which they're not in one of > the latter zone's folds). Is that what you're talking about? Yes. Or two different ambiguous times, where the properly earlier one compares greater and vice versa. I have no idea why anyone thinks this is reasonable or desirable behavior. From alexander.belopolsky at gmail.com Mon Sep 14 22:27:26 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Sep 2015 16:27:26 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442261330.265134.383487745.4E9C3005@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com> <1442261330.265134.383487745.4E9C3005@webmail.messagingengine.com> Message-ID: On Mon, Sep 14, 2015 at 4:08 PM, Random832 wrote: > On Mon, Sep 14, 2015, at 15:48, Alexander Belopolsky wrote: > > On Mon, Sep 14, 2015 at 3:44 PM, Random832 > > wrote: > > > > > It is an > > > invariant that is true today, and therefore which you can't rely on any > > > of the consumers of this 12 years old widely deployed code not to > assume > > > will remain true. > > > > > > > Sorry, this sentence does not parse. You are missing a "not" somewhere. > > Nope. I am asserting that: > > This invariant is true today. > You've never specified "this invariant", but I'll assume you are talking about "a < b implies a.astimezone(UTC) < b.astimezone(UTC)." 
This is *not* true today:

>>> from datetime import *
>>> from datetimetester import Eastern
>>> UTC = timezone.utc
>>> a = datetime(2002, 4, 7, 1, 40, tzinfo=Eastern)
>>> b = datetime(2002, 4, 7, 2, 20, tzinfo=Eastern)
>>> a < b
True
>>> a.astimezone(UTC) < b.astimezone(UTC)
False

> Therefore, it is likely that at least some consumers of datetime will
> assume it is true.

Obviously, if Random832 is a real person, the last statement is true.
This does not make the assumption true, just proves that at least one
user is confused about the current behavior. :-)

> Therefore, you cannot rely on there not being any consumers which assume
> it will remain true.

That's where we are now.  Some users make baseless assumptions.  This
will probably remain true. :-(

> It's awkward, since when I go back to analyze it it turns out that the
> "not" after 'code' actually technically modifies "any" earlier in the
> sentence, but the number of negatives is correct.

Writing in shorter sentences may help.

> (Though, it actually
> works out even without that change, since the question of *which*
> consumers rely on the invariant is unknown.)

True.  We will never know how many users rely on false assumptions.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alexander.belopolsky at gmail.com  Mon Sep 14 22:39:14 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 14 Sep 2015 16:39:14 -0400
Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo?
In-Reply-To: 
References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com>
 <201509131224.t8DCOXHO004891@fido.openend.se>
 <201509131600.t8DG07e0025688@fido.openend.se>
 <201509132031.t8DKVTwJ028027@fido.openend.se>
 <201509140827.t8E8RPqb001076@fido.openend.se>
 <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com>
Message-ID: 

On Mon, Sep 14, 2015 at 4:22 PM, Tim Peters wrote:

> > faster
> > than CPython can look up the .utcoffset method. (At least for times
> > within a few years around now.) A programmer who makes it slower should
> > be fired.
>
> So any programmer who implements .utcoffset() in Python should be
> fired?  That's the only way I can read that.

No, no!  I've already conceded that caching UTC offset will probably help
pure Python implementations.  PyPy folks have established this fact for
hash and I am willing to extrapolate their results to UTC offset.  I am
only trying to say that if we decide to bring a fast TZ database to
CPython, pure python tzinfo interface will likely become our main
bottleneck, not the speed with which C code can compute the offset value.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tim.peters at gmail.com  Mon Sep 14 22:45:17 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 14 Sep 2015 15:45:17 -0500
Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo?
In-Reply-To: <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> Message-ID: [Random832 ] > A) I'm still not sure why, but I was talking about adding an int, not a > timedelta and a string. > > B) Older python versions can't make use of either utcoffset or fold, but > can ignore either of them. I don't even see why they couldn't ignore a > timedelta and a string if we felt like adding those. Because all versions of Python expect a very specific pickle layout for _every_ kind of pickled object (including datetimes).. Make any change to the pickle format of any object, and older Pythons will simply blow up (raise an exception) when trying to load the new pickle - or do something insane with the pickle bits. It's impossible for older Pythons to know anything about what "the new bits" are supposed to mean, and there is no way to spell, in the pickle engine, "but if you're an older version, skip over the next N bytes". > C) What value fold "should" have can be inferred from the time, the > utcoffset, and the tzinfo. So you are proposing to add ... something ... _instead_ of adding `fold`. Already addressed that. See above. > I'm saying that *today*, even with no 495, it [utcoffset] does provide > *a* value in all cases (even if that's sometimes the "wrong" value > for an ambiguous time). Sure. > And that value is, for any plausible tzinfo, ordered the same for > any given pair of datetimes with the same tzinfo as the datetimes > considered as naive datetimes. Yes. > There is, or appears to be, a faction that is proposing to change that > by sorting fold=1 2:15 before fold=0 2:45 even though the former is > *actually* 30 minutes later than the latter, and I am *utterly baffled* > at why they think this is a good idea. It's not so much a "good idea" as that it's the only idea consistent with Python's "naive time" model. Folds and gaps don't exist in naive time. Indeed, the _concept_ of "time zone" doesn't really exist in naive time. There's _inherent_ tension between the naive time model and the way multi-offset time zones actually behave. So it goes. >> It is true that the earlier and later of an ambiguous time in a fold >> will compare equal in their own zone, but compare not equal after >> conversion to UTC (or to any other zone in which they're not in one of >> the latter zone's folds). Is that what you're talking about? > Yes. Or two different ambiguous times, where the properly earlier one > compares greater and vice versa. I have no idea why anyone thinks this > is reasonable or desirable behavior. >From which I can guess, without asking, that you think "naive time" itself is unreasonable and undesirable ;-) From random832 at fastmail.com Mon Sep 14 22:54:56 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 16:54:56 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? 
In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com> <1442261330.265134.383487745.4E9C3005@webmail.messagingengine.com> Message-ID: <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 16:27, Alexander Belopolsky wrote: > You've never specified "this invariant", I have specified it numerous times. > but I'll assume you are talking > about "a < b implies a.astimezone(UTC) < b.astimezone(UTC)." This is > *not* > true today: In my first few posts about the issue I did note "mid-spring-forward" times as an exception (and I assert that they are the *only* exception). But repetition and having to keep explaining this has worn me down. > >>> from datetime import * > >>> from datetimetester import Eastern > >>> UTC = timezone.utc > >>> a = datetime(2002, 4, 7, 1, 40, tzinfo=Eastern) > >>> b = datetime(2002, 4, 7, 2, 20, tzinfo=Eastern) > >>> a < b > True > >>> a.astimezone(UTC) < b.astimezone(UTC) > False I don't know how your datetimetester works, so this is a bit of a black box to me - correct me if any of the below is wrong: I assume that 2002-04-07 is the morning of the "spring forward" transition of that year. Therefore, it's worth noting, the time in "b" is one that doesn't actually exist. I actually did mention, in one of my messages on the subject, that "spring forward" times were an exception - the *only* exception, to the invariant, but that's been lost in a few of my repetitions. I'm going to assume that the interpretations that led to your results are: a = 2002-04-07 01:40:00 -0500 = 2002-04-07 06:40:00 Z b = 2002-04-07 02:20:00 -0400 = 2002-04-07 06:20:00 Z I don't think this is a reasonable value for b.astimezone(UTC) to have. But anyway, none of this is actually relevant to my claims about how the times near "fall back" transitions (i.e. with different fold values) should be sorted. I wasn't at any point proposing *actually* converting to UTC as part of the mechanism for comparing times. Just that having times near "fold" points ordered in any other way would be surprising and unreasonable. From alexander.belopolsky at gmail.com Mon Sep 14 23:01:15 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Sep 2015 17:01:15 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? 
In-Reply-To: <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com>
References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com>
 <201509131224.t8DCOXHO004891@fido.openend.se>
 <201509131600.t8DG07e0025688@fido.openend.se>
 <201509132031.t8DKVTwJ028027@fido.openend.se>
 <201509140827.t8E8RPqb001076@fido.openend.se>
 <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com>
 <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com>
 <1442261330.265134.383487745.4E9C3005@webmail.messagingengine.com>
 <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com>
Message-ID: 

On Mon, Sep 14, 2015 at 4:54 PM, Random832 wrote:

> I don't know how your datetimetester works

Please educate yourself:
https://hg.python.org/cpython/file/tip/Lib/test/datetimetester.py#l3539

Some familiarity with the CPython test suite is pretty much a
pre-requisite to make a meaningful contribution to PEP 495 discussions at
this stage.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alexander.belopolsky at gmail.com  Mon Sep 14 23:10:47 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 14 Sep 2015 17:10:47 -0400
Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo?
In-Reply-To: <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com>
References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com>
 <201509131224.t8DCOXHO004891@fido.openend.se>
 <201509131600.t8DG07e0025688@fido.openend.se>
 <201509132031.t8DKVTwJ028027@fido.openend.se>
 <201509140827.t8E8RPqb001076@fido.openend.se>
 <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com>
 <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com>
 <1442261330.265134.383487745.4E9C3005@webmail.messagingengine.com>
 <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com>
Message-ID: 

On Mon, Sep 14, 2015 at 4:54 PM, Random832 wrote:

> I'm going to assume that the interpretations that led to your results
> are:
> a = 2002-04-07 01:40:00 -0500 = 2002-04-07 06:40:00 Z
> b = 2002-04-07 02:20:00 -0400 = 2002-04-07 06:20:00 Z

Looks right:

>>> print(a)
2002-04-07 01:40:00-05:00
>>> print(a.astimezone(UTC))
2002-04-07 06:40:00+00:00
>>> print(b)
2002-04-07 02:20:00-04:00
>>> print(b.astimezone(UTC))
2002-04-07 06:20:00+00:00

> I don't think this is a reasonable value for b.astimezone(UTC) to have.

You would have to go back in time to 2002-2003 and argue with Tim and
Guido about that.  Trust me - you would lose.  Arguing about it today is
even more futile.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From random832 at fastmail.com  Mon Sep 14 23:23:20 2015
From: random832 at fastmail.com (Random832)
Date: Mon, 14 Sep 2015 17:23:20 -0400
Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo?
In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> Message-ID: <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 16:45, Tim Peters wrote: > Because all versions of Python expect a very specific pickle layout > for _every_ kind of pickled object (including datetimes).. Make any > change to the pickle format of any object, and older Pythons will > simply blow up (raise an exception) when trying to load the new pickle > - or do something insane with the pickle bits. It's impossible for > older Pythons to know anything about what "the new bits" are supposed > to mean, and there is no way to spell, in the pickle engine, "but if > you're an older version, skip over the next N bytes". Well, you could have put some reserved bits in the original pickle format for datetime back when it was first defined, or even just allowed passing in a longer string for future extension purposes. That you didn't makes me wonder just where you're finding the space to put the fold bit. > It's not so much a "good idea" as that it's the only idea consistent > with Python's "naive time" model. Folds and gaps don't exist in naive > time. Indeed, the _concept_ of "time zone" doesn't really exist in > naive time. There's _inherent_ tension between the naive time model > and the way multi-offset time zones actually behave. So it goes. But why does it need to be consistent? You can't compare naive datetimes with aware ones. If you want to sort/bisect a list of datetimes, they have to either all be naive or all be aware. So when we're talking about how ordering works, we're fundamentally talking about how it works for aware datetimes. From alexander.belopolsky at gmail.com Mon Sep 14 23:23:34 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Sep 2015 17:23:34 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com> <1442261330.265134.383487745.4E9C3005@webmail.messagingengine.com> <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com> Message-ID: On Mon, Sep 14, 2015 at 4:54 PM, Random832 wrote: > But anyway, none of this is actually relevant to my claims about how the > times near "fall back" transitions (i.e. with different fold values) > should be sorted. > Current behavior for gap times is relevant because it shows that you do get surprising results when you step out of the naive time model. The gap times can be created now and they violate astimezone(utc) monotonicity. PEP 495 allows more times that are outside of the naive time model: fold=1 times in the fall-back fold. 
It is unavoidable that astimezone(utc) is non-monotonic in this case as well. After all, why does it concern you more than the non-monotonicity of astimezone(local)? > I wasn't at any point proposing *actually* converting > to UTC as part of the mechanism for comparing times. In this case what were you *actually* proposing? > Just that having > times near "fold" points ordered in any other way would be surprising > and unreasonable. "Other" than what? In the previous sentence you said that converting to UTC to compare was not your proposal. Please let us know what your proposal is rather than what it isn't. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Mon Sep 14 23:30:29 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 17:30:29 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com> <1442261330.265134.383487745.4E9C3005@webmail.messagingengine.com> <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com> Message-ID: <1442266229.281937.383569841.7079F391@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 17:23, Alexander Belopolsky wrote: > In this case what were you *actually* proposing? My point is that I'm not proposing a specific mechanism. Just saying that the order that other people are claiming is somehow necessary for consistency with naive datetimes (that you can't actually compare these values with) is not necessary *and* not reasonable, and whatever is implemented should put them in the right order by whatever mechanism is determined to be best. From tim.peters at gmail.com Mon Sep 14 23:34:07 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Sep 2015 16:34:07 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> Message-ID: [Tim] >> Because all versions of Python expect a very specific pickle layout >> for _every_ kind of pickled object (including datetimes).. Make any >> change to the pickle format of any object, and older Pythons will >> simply blow up (raise an exception) when trying to load the new pickle >> - or do something insane with the pickle bits. It's impossible for >> older Pythons to know anything about what "the new bits" are supposed >> to mean, and there is no way to spell, in the pickle engine, "but if >> you're an older version, skip over the next N bytes".
[Random832 ] > Well, you could have put some reserved bits in the original pickle > format for datetime back when it was first defined, or even just allowed > passing in a longer string for future extension purposes. Yes, we "could have" done that for all pickle formats for all types. But why on Earth would we? Pickle size is important to many apps (e.g., Zope applications can store billions of pickles in databases. and it may not be purely coincidence ;-) that Zope Corp paid for datetime development), and there would have been loud screaming about any "wasted" bytes. > That you didn't makes me wonder just where you're finding the space to put the > fold bit. PEP 495 gives all the details. Short course: there are bits that are _always_ 0 now within some datetime pickle bytes. `fold` will abuse one of those always-0-now pickle bits. >> It's not so much a "good idea" as that it's the only idea consistent >> with Python's "naive time" model. Folds and gaps don't exist in naive >> time. Indeed, the _concept_ of "time zone" doesn't really exist in >> naive time. There's _inherent_ tension between the naive time model >> and the way multi-offset time zones actually behave. So it goes. > But why does it need to be consistent? You can't compare naive datetimes > with aware ones. If you want to sort/bisect a list of datetimes, they > have to either all be naive or all be aware. So when we're talking about > how ordering works, we're fundamentally talking about how it works for > aware datetimes. Aware datetimes _within_ a zone also follow the naive time model. It's unfortunate that they're nevertheless called "aware" datetimes. So, sorry, even when sorting a list of aware datetimes, if they share a common zone it is wholly intended that they all work in naive time. Apps that can't tolerate naive time should convert to UTC first. End of problems. From carl at oddbird.net Mon Sep 14 23:39:28 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 14 Sep 2015 15:39:28 -0600 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> References: <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> Message-ID: <55F73E90.2040600@oddbird.net> On 09/14/2015 03:23 PM, Random832 wrote: > Well, you could have put some reserved bits in the original pickle > format for datetime back when it was first defined, or even just allowed > passing in a longer string for future extension purposes. That you > didn't makes me wonder just where you're finding the space to put the > fold bit. By exploiting the currently-always-0 first bit in the "minutes" byte. See https://www.python.org/dev/peps/pep-0495/#pickles It might be useful to read PEP 495 before commenting on it ;-) >> It's not so much a "good idea" as that it's the only idea consistent >> with Python's "naive time" model. Folds and gaps don't exist in naive >> time. Indeed, the _concept_ of "time zone" doesn't really exist in >> naive time. There's _inherent_ tension between the naive time model >> and the way multi-offset time zones actually behave. So it goes. 
> > But why does it need to be consistent? You can't compare naive datetimes > with aware ones. If you want to sort/bisect a list of datetimes, they > have to either all be naive or all be aware. So when we're talking about > how ordering works, we're fundamentally talking about how it works for > aware datetimes. What you're missing (and I was missing too, before going around in some lengthy earlier threads on this mailing list, which you may -- or may not -- find it worth your time to read) is that even "aware datetimes" in Python's datetime library always operate in "naive local clock time" for whatever timezone they are in; they aren't just alternate notations for the corresponding UTC time. This is why if you add timedelta(hours=24) to datetime(2014, 11, 2, 12, tzinfo=Eastern), you get datetime(2014, 11, 3, 12, tzinfo=Eastern), even though the difference between those two datetimes in UTC is 25 hours, not 24. In order to stay consistent with that "naive local clock time" model, all operations within a time zone must ignore the `fold` value. The `fold` value really doesn't exist at all in the naive clock time model, it's only tracked as a convenience for correct round-tripping. This implies that 1:30am fold=0 and 1:30am fold=1 are equal, and also that 1:20am fold=1 is "earlier" than 1:40am fold=0 (as long as you stay within the naive clock time model -- if you don't want to, you should convert to UTC). You may want to rail against that model. I (and some others) already did. You can go back in the archives here and read our efforts. Perhaps you'll have better luck if you try; I doubt it. But given that model, this is the only approach that makes sense. And you can get the same work done in that model. If you want to operate on the physical-time timeline, just always operate in UTC internally and only translate to "aware datetimes" at display time. That's what you probably should be doing anyway. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Mon Sep 14 23:48:11 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Sep 2015 16:48:11 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <55F73E90.2040600@oddbird.net> References: <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> <55F73E90.2040600@oddbird.net> Message-ID: [Carl Meyer , on "aware" datetimes following the "naive time" model] > ... > You may want to rail against that model. I (and some others) already > did. You can go back in the archives here and read our efforts. Perhaps > you'll have better luck if you try; I doubt it. There are two ways Random832 might have better luck: 1. Making Guido regret naive time. 2. Making datetime change what it's done for the last dozen years. I'd say the chance of #1 is one in a billion. But that's a lot better than the chance of #2 ;-) > But given that model, this is the only approach that makes sense. 
We should also note that we already _tried_ paying attention to fold within a single zone. Besides being even more of a conceptual mess, as you and I batted examples back & forth it became clear that it broke various other kinds of backward compatibility. > And you can get the same work done in that model. If you want to operate > on the physical-time timeline, just always operate in UTC internally and > only translate to "aware datetimes" at display time. That's what you > probably should be doing anyway. Alas, sanity is the last thing any good programmer will yield to ;-) From random832 at fastmail.com Mon Sep 14 23:53:55 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 17:53:55 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> Message-ID: <1442267635.287083.383576201.0990DAA7@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 17:34, Tim Peters wrote: > Yes, we "could have" done that for all pickle formats for all types. > But why on Earth would we? Pickle size is important to many apps > (e.g., Zope applications can store billions of pickles in databases. > and it may not be purely coincidence ;-) that Zope Corp paid for > datetime development), and there would have been loud screaming about > any "wasted" bytes. Would allowing a 16-byte string in the future have increased the storage occupied by a 10-byte string today? Would allowing a third argument in the future have increased the storage occupied by two arguments today? As far as I can tell the pickle format for non-primitive types isn't _that_ fixed-width. > > That you didn't makes me wonder just where you're finding the space to put the > > fold bit. > > PEP 495 gives all the details. Short course: there are bits that are > _always_ 0 now within some datetime pickle bytes. `fold` will abuse > one of those always-0-now pickle bits. And what happens to older implementations if that bit is 1? > Aware datetimes _within_ a zone also follow the naive time model. > It's unfortunate that they're nevertheless called "aware" datetimes. > > So, sorry, even when sorting a list of aware datetimes, if they share > a common zone it is wholly intended that they all work in naive time. And if some of them share a common zone, then some of them will work in naive time, and some of them will work in aware time, and some pairs (well, triples) of them will cause problems for sort/bisect algorithms. Maybe it'd be best to simply ban interzone comparisons. Or have some sort of context manager to determine how arithmetic and comparisons work. From carl at oddbird.net Tue Sep 15 00:00:32 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 14 Sep 2015 16:00:32 -0600 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? 
In-Reply-To: <1442267635.287083.383576201.0990DAA7@webmail.messagingengine.com> References: <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> <1442267635.287083.383576201.0990DAA7@webmail.messagingengine.com> Message-ID: <55F74380.2030202@oddbird.net> On 09/14/2015 03:53 PM, Random832 wrote: > And if some of them share a common zone, then some of them will work in > naive time, and some of them will work in aware time, and some pairs > (well, triples) of them will cause problems for sort/bisect algorithms. Yep, if you're working with a heterogenous-tzinfo set of aware datetimes in Python, there may not be a total ordering (and you'll get violations of various other arithmetic identities, too). Best available option: don't do that. > Maybe it'd be best to simply ban interzone comparisons. Yes, you've got it. Interzone comparisons and arithmetic are the real wart in the datetime module, once you accept its intended mental model. If the time machine were in working order, they ought to be banned and require explicit conversion to the same timezone instead. > Or have some > sort of context manager to determine how arithmetic and comparisons > work. Ouch, please no. If there were a strong desire to support _both_ mental models of an aware datetime in the Python datetime library, there would be several better ways to do it (like two different classes for aware datetimes, or a flag on tzinfo classes, or the - rejected by Guido - PEP 500). But given the option to "just work in UTC" when you want timeline arithmetic, and the potential for just multiplying confusion by providing more mental models, I don't think there's sufficient desire for that. At least, I've lost such desire as I once may have had ;-) Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From tim.peters at gmail.com Tue Sep 15 00:09:47 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Sep 2015 17:09:47 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442267635.287083.383576201.0990DAA7@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> <1442267635.287083.383576201.0990DAA7@webmail.messagingengine.com> Message-ID: [Random832 ] > Would allowing a 16-byte string in the future have increased the storage > occupied by a 10-byte string today? Would allowing a third argument in > the future have increased the storage occupied by two arguments today? > As far as I can tell the pickle format for non-primitive types isn't > _that_ fixed-width. Sorry, I'm not arguing about this any more. 
Pickle doesn't work at all at the level of "count of bytes followed by a string". If you want to make a pickle argument that makes sense, I'm afraid you'll need to become familiar with how pickle works first. This is not the place for a pickle tutorial. Start by learning what a datetime pickle actually is. pickletools.dis() will be very helpful. >>> That you didn't makes me wonder just where you're finding the space to put the >>> fold bit. >> PEP 495 gives all the details. Short course: there are bits that are >> _always_ 0 now within some datetime pickle bytes. `fold` will abuse >> one of those always-0-now pickle bits. > And what happens to older implementations if that bit is 1? Unpickling will raise an exception, complaining that the minute value is out of range. >> Aware datetimes _within_ a zone also follow the naive time model. >> It's unfortunate that they're nevertheless called "aware" datetimes. >> >> So, sorry, even when sorting a list of aware datetimes, if they share >> a common zone it is wholly intended that they all work in naive time. > And if some of them share a common zone, then some of them will work in > naive time, and some of them will work in aware time, and some pairs > (well, triples) of them will cause problems for sort/bisect algorithms. All sorts of things may happen, yes. As I said, if you need to care, convert to UTC first. Most apps do nothing like this. > Maybe it'd be best to simply ban interzone comparisons. We cannot. Backward compatibility. It would have been better had interzone comparisons and subtraction not been supported from the start. Too late to change that. > Or have some sort of context manager to determine how arithmetic and comparisons > work. Write a PEP ;-) From tim.peters at gmail.com Tue Sep 15 02:31:24 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Sep 2015 19:31:24 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442259852.259192.383467881.5156BE88@webmail.messagingengine.com> <1442261330.265134.383487745.4E9C3005@webmail.messagingengine.com> <1442264096.274955.383520321.20C18E75@webmail.messagingengine.com> Message-ID: [Alexander Belopolsky] >> ... >> >>> from datetime import * >> >>> from datetimetester import Eastern >> >>> UTC = timezone.utc >> >>> a = datetime(2002, 4, 7, 1, 40, tzinfo=Eastern) >> >>> b = datetime(2002, 4, 7, 2, 20, tzinfo=Eastern) >> >>> a < b >> True >> >>> a.astimezone(UTC) < b.astimezone(UTC) >> False [Random832 ] > ... > I don't know how your datetimetester works, so this is a bit of a black > box to me - correct me if any of the below is wrong: > > I assume that 2002-04-07 is the morning of the "spring forward" > transition of that year. Therefore, it's worth noting, the time in "b" > is one that doesn't actually exist. I actually did mention, in one of my > messages on the subject, that "spring forward" times were an exception - > the *only* exception, to the invariant, but that's been lost in a few of > my repetitions.
> > I'm going to assume that the interpretations that led to your results > are: > a = 2002-04-07 01:40:00 -0500 = 2002-04-07 06:40:00 Z > b = 2002-04-07 02:20:00 -0400 = 2002-04-07 06:20:00 Z > > I don't think this is a reasonable value for b.astimezone(UTC) to have. I can explain the thinking here: in "naive time", there's no such thing as "missing time". Indeed, if you watch an old-fashioned mechanical clock near the time DST starts, you'll see it change from 1:59 to 2:00 to 2:01 ... to 2:20. Since it's now ">= 2:00" on the local clock, US rules say you're now in daylight time. So the only UTC offset that _does_ make sense is the US/Eastern daylight offset: -4. "But you forgot to set the clock ahead, so this should _really_ be considered as still being in standard time!" is an argument outside the naive time model. "Set the clock ahead? That's insane! My clock keeps perfect time - why would I break it?" ;-) The bottom-line lesson being the same as always: if you need to care about folds and gaps, in datetime it's intended that you work in UTC instead (or some other fixed-offset zone). From random832 at fastmail.com Tue Sep 15 03:19:56 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 21:19:56 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> <1442267635.287083.383576201.0990DAA7@webmail.messagingengine.com> Message-ID: <1442279996.198469.383712497.36F9DE26@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 18:09, Tim Peters wrote: > Sorry, I'm not arguing about this any more. Pickle doesn't work at > all at the level of "count of bytes followed by a string". The SHORT_BINBYTES opcode consists of the byte b'C', followed by *yes indeed* "count of bytes followed by a string". > If you > want to make a pickle argument that makes sense, I'm afraid you'll > need to become familiar with how pickle works first. This is not the > place for a pickle tutorial. > > Start by learning what a datetime pickle actually is. > pickletools.dis() will be very helpful. 0: \x80 PROTO 3 2: c GLOBAL 'datetime datetime' 21: q BINPUT 0 23: C SHORT_BINBYTES b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00' 35: q BINPUT 1 37: \x85 TUPLE1 38: q BINPUT 2 40: R REDUCE 41: q BINPUT 3 43: . STOP The payload is ten bytes, and the byte immediately before it is in fact 0x0a. If I pickle any byte string under 256 bytes long by itself, the byte immediately before the data is the length. This is how I initially came to the conclusion that "count of bytes followed by a string" was valid. I did, before writing my earlier post, look into the high-level aspects of how datetime pickle works - it uses __reduce__ to create up to two arguments, one of which is a 10-byte string, and the other is the tzinfo. Those arguments are passed into the date constructor and detected by that constructor - for example, I can call it directly with datetime(b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00') and get the same result as unpickling. 
At the low level, the part that represents that first argument does indeed appear to be "count of bytes followed by a string". I can add to the count, add more bytes, and it will call the constructor with the longer string. If I use pickletools.dis on my modified value the output looks the same except for, as expected, the offsets and the value of the argument to the SHORT_BINBYTES opcode. So, it appears that, as I was saying, "wasted space" would not have been an obstacle to having the "payload" accepted by the constructor (and produced by __reduce__ ultimately _getstate) consist of "a byte string of >= 10 bytes, the first 10 of which are used and the rest of which are ignored by python <= 3.5" instead of "a byte string of exactly 10 bytes", since it would have accepted and produced exactly the same pickle values, but been prepared to accept larger arguments pickled from future versions. For completeness: Protocol version 2 and 1 use BINUNICODE on a latin1-to-utf8 version of the byte string, with a similar "count of bytes followed by a string" (though the count of bytes is of UTF-8 bytes). Protocol version 0 uses UNICODE, terminated by \n, and a literal \n is represented by \\u000a. In all cases some extra data around the value sets it up to call "codecs.encode(..., 'latin1')" upon unpickling. So have I shown you that I know enough about the pickle format to know that permitting a longer string (and ignoring the extra bytes) would have had zero impact on the pickle representation of values that did not contain a longer string? I'd already figured out half of this before writing my earlier post; I just assumed *you* knew enough that I wouldn't have to show my work. Extra credit: 0: \x80 PROTO 3 2: c GLOBAL 'datetime datetime' 21: q BINPUT 0 23: ( MARK 24: M BININT2 2014 27: K BININT1 9 29: K BININT1 14 31: K BININT1 21 33: K BININT1 6 35: K BININT1 42 37: t TUPLE (MARK at 23) 38: q BINPUT 1 40: R REDUCE 41: q BINPUT 2 43: . STOP From alexander.belopolsky at gmail.com Tue Sep 15 03:42:00 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Sep 2015 21:42:00 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442279996.198469.383712497.36F9DE26@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> <1442267635.287083.383576201.0990DAA7@webmail.messagingengine.com> <1442279996.198469.383712497.36F9DE26@webmail.messagingengine.com> Message-ID: No credit for anything other than the "extra credit" section. Partial credit for that. Study that printout and you should understand what Tim was saying. > On Sep 14, 2015, at 9:19 PM, Random832 wrote: > >> On Mon, Sep 14, 2015, at 18:09, Tim Peters wrote: >> Sorry, I'm not arguing about this any more. Pickle doesn't work at >> all at the level of "count of bytes followed by a string". > > The SHORT_BINBYTES opcode consists of the byte b'C', followed by *yes > indeed* "count of bytes followed by a string". 
> >> If you >> want to make a pickle argument that makes sense, I'm afraid you'll >> need to become familiar with how pickle works first. This is not the >> place for a pickle tutorial. >> >> Start by learning what a datetime pickle actually is. >> pickletools.dis() will be very helpful. > > 0: \x80 PROTO 3 > 2: c GLOBAL 'datetime datetime' > 21: q BINPUT 0 > 23: C SHORT_BINBYTES b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00' > 35: q BINPUT 1 > 37: \x85 TUPLE1 > 38: q BINPUT 2 > 40: R REDUCE > 41: q BINPUT 3 > 43: . STOP > > The payload is ten bytes, and the byte immediately before it is in fact > 0x0a. If I pickle any byte string under 256 bytes long by itself, the > byte immediately before the data is the length. This is how I initially > came to the conclusion that "count of bytes followed by a string" was > valid. > > I did, before writing my earlier post, look into the high-level aspects > of how datetime pickle works - it uses __reduce__ to create up to two > arguments, one of which is a 10-byte string, and the other is the > tzinfo. Those arguments are passed into the date constructor and > detected by that constructor - for example, I can call it directly with > datetime(b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00') and get the same result > as unpickling. > > At the low level, the part that represents that first argument does > indeed appear to be "count of bytes followed by a string". I can add to > the count, add more bytes, and it will call the constructor with the > longer string. If I use pickletools.dis on my modified value the output > looks the same except for, as expected, the offsets and the value of the > argument to the SHORT_BINBYTES opcode. > > So, it appears that, as I was saying, "wasted space" would not have been > an obstacle to having the "payload" accepted by the constructor (and > produced by __reduce__ ultimately _getstate) consist of "a byte string > of >= 10 bytes, the first 10 of which are used and the rest of which are > ignored by python <= 3.5" instead of "a byte string of exactly 10 > bytes", since it would have accepted and produced exactly the same > pickle values, but been prepared to accept larger arguments pickled from > future versions. > > For completeness: Protocol version 2 and 1 use BINUNICODE on a > latin1-to-utf8 version of the byte string, with a similar "count of > bytes followed by a string" (though the count of bytes is of UTF-8 > bytes). Protocol version 0 uses UNICODE, terminated by \n, and a literal > \n is represented by \\u000a. In all cases some extra data around the > value sets it up to call "codecs.encode(..., 'latin1')" upon unpickling. > > So have I shown you that I know enough about the pickle format to know > that permitting a longer string (and ignoring the extra bytes) would > have had zero impact on the pickle representation of values that did not > contain a longer string? I'd already figured out half of this before > writing my earlier post; I just assumed *you* knew enough that I > wouldn't have to show my work. > > Extra credit: > 0: \x80 PROTO 3 > 2: c GLOBAL 'datetime datetime' > 21: q BINPUT 0 > 23: ( MARK > 24: M BININT2 2014 > 27: K BININT1 9 > 29: K BININT1 14 > 31: K BININT1 21 > 33: K BININT1 6 > 35: K BININT1 42 > 37: t TUPLE (MARK at 23) > 38: q BINPUT 1 > 40: R REDUCE > 41: q BINPUT 2 > 43: . 
STOP > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: https://www.python.org/psf/codeofconduct/ From tim.peters at gmail.com Tue Sep 15 03:56:47 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 14 Sep 2015 20:56:47 -0500 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo? In-Reply-To: <1442279996.198469.383712497.36F9DE26@webmail.messagingengine.com> References: <1442085362.324875.381920729.5E7A6DCE@webmail.messagingengine.com> <201509131224.t8DCOXHO004891@fido.openend.se> <201509131600.t8DG07e0025688@fido.openend.se> <201509132031.t8DKVTwJ028027@fido.openend.se> <201509140827.t8E8RPqb001076@fido.openend.se> <1442257996.253100.383441705.7A0986C7@webmail.messagingengine.com> <1442260714.263025.383475777.4728D768@webmail.messagingengine.com> <1442262425.268793.383506657.0443601E@webmail.messagingengine.com> <1442265800.280460.383547057.16B65298@webmail.messagingengine.com> <1442267635.287083.383576201.0990DAA7@webmail.messagingengine.com> <1442279996.198469.383712497.36F9DE26@webmail.messagingengine.com> Message-ID: [Tim] >> Sorry, I'm not arguing about this any more. Pickle doesn't work at >> all at the level of "count of bytes followed by a string". [Random832 ] > The SHORT_BINBYTES opcode consists of the byte b'C', followed by *yes > indeed* "count of bytes followed by a string". Yes, some individual opcodes do work that way. >> If you >> want to make a pickle argument that makes sense, I'm afraid you'll >> need to become familiar with how pickle works first. This is not the >> place for a pickle tutorial. >> >> Start by learning what a datetime pickle actually is. >> pickletools.dis() will be very helpful. > 0: \x80 PROTO 3 > 2: c GLOBAL 'datetime datetime' > 21: q BINPUT 0 > 23: C SHORT_BINBYTES b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00' > 35: q BINPUT 1 > 37: \x85 TUPLE1 > 38: q BINPUT 2 > 40: R REDUCE > 41: q BINPUT 3 > 43: . STOP > > The payload is ten bytes, and the byte immediately before it is in fact > 0x0a. If I pickle any byte string under 256 bytes long by itself, the > byte immediately before the data is the length. This is how I initially > came to the conclusion that "count of bytes followed by a string" was > valid. Ditto. > I did, before writing my earlier post, look into the high-level aspects > of how datetime pickle works - it uses __reduce__ to create up to two > arguments, one of which is a 10-byte string, and the other is the > tzinfo. Those arguments are passed into the date constructor and > detected by that constructor - for example, I can call it directly with > datetime(b'\x07\xdf\t\x0e\x15\x06*\x00\x00\x00') and get the same result > as unpickling. Good job! That abuse of the constructor was supposed to remain a secret ;-) > At the low level, the part that represents that first argument does > indeed appear to be "count of bytes followed by a string". I can add to > the count, add more bytes, and it will call the constructor with the > longer string. If I use pickletools.dis on my modified value the output > looks the same except for, as expected, the offsets and the value of the > argument to the SHORT_BINBYTES opcode. 
> So, it appears that, as I was saying, "wasted space" would not have been > an obstacle to having the "payload" accepted by the constructor (and > produced by __reduce__ ultimately _getstate) consist of "a byte string > of >= 10 bytes, the first 10 of which are used and the rest of which are > ignored by python <= 3.5" instead of "a byte string of exactly 10 > bytes", since it would have accepted and produced exactly the same > pickle values, but been prepared to accept larger arguments pickled from > future versions. Yes, if we had done things differently from the start, things would work differently today. But what's the point? We have to live now with what _was_ done. A datetime pickle carrying a string payload with anything other than exactly 10 bytes will almost always blow up under older Pythons, and would be considered "a bug" if it didn't. Pickles are not at all intended to be forgiving (they're enough of a potential security hole without going out of their way to ignore random mysteries). It may be nicer if Python had a serialization format more deliberately designed for evolution of class structure - but it doesn't. Classes that need such a thing now typically store their own idea of a "version" number as part of their pickled state; datetime never did. > ... > So have I shown you that I know enough about the pickle format to know > that permitting a longer string (and ignoring the extra bytes) would > have had zero impact on the pickle representation of values that did not > contain a longer string? Yes. If we had a time machine, it might even have proved useful ;-) > I'd already figured out half of this before > writing my earlier post; I just assumed *you* knew enough that I > wouldn't have to show my work. It's always best to show your work on a public list. Thanks for finally ;-) doing so! From random832 at fastmail.com Tue Sep 15 04:08:58 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 22:08:58 -0400 Subject: [Datetime-SIG] Are there any "correct" implementations of tzinfo?
In such a case, the actual pickle format would _still_ have consisted of __reduce__() == (datetime, (b"..........", [optional tzinfo])), just with the option of accepting (and ignoring) longer byte strings encoded by later versions of the datetime class. The pickle format is versatile enough to pass any (pickleable) value at all to a constructor (or to __setstate__). Designing the datetime constructor/setstate in the past to be able to accept a byte string of a length other than exactly 10 would have allowed the representation to be extended in the present, rather than smuggling a single extra bit into one of the existing bytes. But it would not have changed the actual representation that would have been produced by pickle back then, not one bit. And, now, to answer my own question from a previous message... >>> class C(): ... def __reduce__(self): ... return (datetime, (b"\x07\xdf\t\x0e\x155'\rA\xb2",)) ... >>> pickle.loads(pickle.dumps(C())) datetime.datetime(2015, 9, 14, 21, 53, 39, 868786) >>> class C(): ... def __reduce__(self): ... return (datetime, (b"\x07\xdf\t\x0e\x955'\rA\xb2",)) ... >>> pickle.loads(pickle.dumps(C())) datetime.datetime(2015, 9, 14, 149, 53, 39, 868786) >>> datetime.strftime(pickle.loads(pickle.dumps(C())), '%Y%m%d%H%M%S') Traceback (most recent call last): File "", line 1, in ValueError: hour out of range That was the bit we were talking about, right? From alexander.belopolsky at gmail.com Wed Sep 16 04:53:21 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 15 Sep 2015 22:53:21 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil Message-ID: On Sat, Sep 12, 2015 at 9:58 PM, Tim Peters wrote: > I think acceptance of 495 should be contingent upon > someone first completing a fully functional (if not releasable) > fold-aware zoneinfo wrapping. > After studying both pytz and dateutil offerings, I decided that it is easier to add "fold-awareness" to the later. I created a fork [1] on Github and added [2] fold-awareness logic to the tzrange class that appears to be the base class for most other tzinfo implementations. I was surprised how few test cases had to be changed. It looks like dateutil test suit does not test questionable (in the absence of fold) behavior. I will need to beef up the test coverage. I am making all development public early on and hope to see code reviews and pull requests from interested parties. Pull requests with additional test cases are most welcome. [1]: https://github.com/abalkin/dateutil/tree/pep-0495 [2]: https://github.com/abalkin/dateutil/commit/57ecdbf481de7e21335ece8fcc5673d59252ec3f -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Sep 18 04:47:44 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 17 Sep 2015 22:47:44 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: Message-ID: [Tim Peters] > > I think acceptance of 495 should be contingent upon > someone first completing a fully functional (if not releasable) > fold-aware zoneinfo wrapping. [Alexander Belopolsky] > > I am making all development public early on and hope to see code reviews and pull requests from interested parties. Pull requests with additional test cases are most welcome. I've made some additional progress in my dateutil fork [1]. The tzfile class is now fold-aware. The tzfile implementation of tzinfo takes the history of local time type changes from a binary zoneinfo file. 
These files are installed on the majority of UNIX platforms. More testing is needed, but I think my fork is now close to meeting Tim's challenge. Please note that you need to run the modified dateutil fork [1] code under PEP 495 fork of CPython. [2] [1]: https://github.com/abalkin/dateutil/tree/pep-0495 [2]: https://github.com/abalkin/cpython -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at ganssle.io Fri Sep 18 16:23:30 2015 From: paul at ganssle.io (Paul Ganssle) Date: Fri, 18 Sep 2015 10:23:30 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil Message-ID: <55FC1E62.6060202@ganssle.io> > After studying both pytz and dateutil offerings, I decided that it is > easier to add "fold-awareness" to the later. I created a fork [1] on > Github and added [2] fold-awareness logic to the tzrange class that > appears to be the base class for most other tzinfo implementations. I > was surprised how few test cases had to be changed. It looks like > dateutil test suit does not test questionable (in the absence of fold) > behavior. I will need to beef up the test coverage. Just to clarify on the point of test coverage, I think one of the main reasons for this is that, at the moment, dateutil doesn't handle ambiguous times well (see Issue #57[1] and Issue #112[2]), so any such tests would likely be failing tests. At the moment, I can't comment on how easy this will be to implement in a release version of dateutil if PEP 495 is accepted because I haven't looked into it enough, but one thing to be aware of is that backwards-compatibility is a high priority here (we'll continue to support python 2.6+ for the foreseeable future), so any changes need to fall back to sane behavior. Preferably, they would fall back to the exact /same/ behavior, regardless of platform and python version. Of course, it doesn't seem like your goal right now is to build something that can roll out right away as soon as PEP 495 is integrated, so there's plenty of time to clean it up and possibly build in a compatibility module, I just thought I'd bring that up so you're aware. [1] https://github.com/dateutil/dateutil/issues/57 [2] https://github.com/dateutil/dateutil/issues/112 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Sep 18 17:56:15 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 18 Sep 2015 10:56:15 -0500 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: <55FC1E62.6060202@ganssle.io> References: <55FC1E62.6060202@ganssle.io> Message-ID: [Alex] >> After studying both pytz and dateutil offerings, I decided that it is >> easier to add "fold-awareness" to the later. I created a fork [1] on >> Github and added [2] fold-awareness logic to the tzrange class that appears >> to be the base class for most other tzinfo implementations. I was >> surprised how few test cases had to be changed. It looks like dateutil >> test suit does not test questionable (in the absence of fold) behavior. I >> will need to beef up the test coverage. [Paul Ganssle ] > Just to clarify on the point of test coverage, I think one of the main > reasons for this is that, at the moment, dateutil doesn't handle > ambiguous times well (see Issue #57[1] and Issue #112[2]), so any such > tests would likely be failing tests. Because dateutil inherits the default .fromutc(), it's all but certain it can't handle cases in the IANA database where a zone's base ("standard") offset changed either. 
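(For context on that last point: the default tzinfo.fromutc() that dateutil inherits is documented in the datetime docs as roughly equivalent to the pure-Python sketch below. It derives a presumed-constant "standard offset" as utcoffset() - dst() and only re-applies dst(), so a zone whose base offset changed over history falls outside what the algorithm can express. This is an illustrative sketch of the documented algorithm, not dateutil's actual code; the error messages are made up for the example.)

    def fromutc(self, dt):
        # dt is an aware datetime whose tzinfo is self, but whose
        # date/time fields hold the UTC value to be converted.
        if dt.tzinfo is not self:
            raise ValueError("dt.tzinfo is not self")
        dtoff = dt.utcoffset()
        dtdst = dt.dst()
        if dtoff is None or dtdst is None:
            raise ValueError("fromutc() requires non-None utcoffset() and dst()")
        delta = dtoff - dtdst      # presumed-constant "standard" offset
        if delta:
            dt += delta            # convert to "standard" local time
            dtdst = dt.dst()
        if dtdst:
            return dt + dtdst      # add the DST correction, if any
        return dt
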
But it's handling gaps & folds due to DST transitions as well as is _possible_ for a hybrid tzinfo given datetime's original design. There was no provision in datetime to make it possible for a hybrid tzinfo to know whether the earlier or later of an ambiguous local time is intended. That's not dateutil's fault, and not something any hybrid tzinfo can solve before PEP 495 is implemented. dateutil is following the doc's advice to consider an ambiguous time to be the later (in "standard time"), which in combination with inheriting the default .fromutc() is enough to ensure that UTC->local conversion at least mimics the hands on the local clock (skipping local times at DST start, and repeating some at DST end). So it's doing the best it _can_ do now in those respects. > At the moment, I can't comment on how easy this will be to implement in > a release version of dateutil if PEP 495 is accepted because I haven't > looked into it enough, but one thing to be aware of is that > backwards-compatibility is a high priority here (we'll continue to > support python 2.6+ for the foreseeable future), so any changes need to > fall back to sane behavior. Preferably, they would fall back to the > exact /same/ behavior, regardless of platform and python version. > > Of course, it doesn't seem like your goal right now is to build > something that can roll out right away as soon as PEP 495 is integrated, > so there's plenty of time to clean it up and possibly build in a > compatibility module, I just thought I'd bring that up so you're aware. The goal of PEP 495 is to make it possible for hybrid tzinfos to handle all cases of gaps and folds due to any cause whatsoever (provided that folds are never worse than 2-to-1), What Alex is really after here is to kick the tires on PEP 495, to make sure: 1. All cases in the IANA database are in fact solved (that database is the richest source of the goofiest zone changes to date). 2. That it's not only possible, but implementable with reasonable effort and performance. dateutil was "the obvious" base to start from, since it's the only widely used wrapping of the IANA database using hybrid tzinfos (pytz took a very different path). Whether dateutil can make _use_ of this experiment is up to you ;-) In cases where results differ from the current implementation, the latter results can only be called "wrong". Which you may well need to preserve. In which case, I'd suggest leaving the current implementation alone, and _adding_ a new wrapping of tzfiles based on Alex's code. dateutil's get-a-zone factory functions would need to grow some way to spell "I want a pre-495 tzinfo" or "I want a post-495 tzinfo". New functions, optional function flags, global setting ... whatever you think works best. Of course this would apply to wrappings of other sources of zone info too, but the IANA database must be by far the hardest (e.g., fold and gap times can be deduced directly from a POSIX TZ string rule, which are only subject to twice-a-year DST changes at worst). From paul at ganssle.io Fri Sep 18 19:05:40 2015 From: paul at ganssle.io (Paul Ganssle) Date: Fri, 18 Sep 2015 13:05:40 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: <55FC1E62.6060202@ganssle.io> Message-ID: <55FC4464.9050201@ganssle.io> On 9/18/2015 11:56, Tim Peters wrote: > Because dateutil inherits the default .fromutc(), it's all but certain > it can't handle cases in the IANA database where a zone's base > ("standard") offset changed either. 
> > But it's handling gaps & folds due to DST transitions as well as is > _possible_ for a hybrid tzinfo given datetime's original design. There > was no provision in datetime to make it possible for a hybrid tzinfo > to know whether the earlier or later of an ambiguous local time is > intended. That's not dateutil's fault, and not something any hybrid > tzinfo can solve before PEP 495 is implemented. > > dateutil is following the doc's advice to consider an ambiguous time > to be the later (in "standard time"), which in combination with > inheriting the default .fromutc() is enough to ensure that UTC->local > conversion at least mimics the hands on the local clock (skipping > local times at DST start, and repeating some at DST end). So it's > doing the best it _can_ do now in those respects. This is quite possibly true, and is roughly in line with my thinking on the matter to date, but in my mind the behavior of dateutil with respect to ambiguous times is undefined, so I'm not going to add tests that enforce an arbitrary implementation choice as it's not behavior I want to lock down. It's a separate question as to whether it can or cannot do better in some cases. The issues I linked to are both cases where an unambiguously specified time ("now" or a time specified in UTC with an IANA time zone) is incorrectly converted into local time. It is almost certainly true that enough information is available to properly localize these datetimes, but at least in the case of localizing "now" the cost in doing so is additional complexity on the back-end. > > The goal of PEP 495 is to make it possible for hybrid tzinfos to > handle all cases of gaps and folds due to any cause whatsoever > (provided that folds are never worse than 2-to-1), What Alex is > really after here is to kick the tires on PEP 495, to make sure: > > 1. All cases in the IANA database are in fact solved (that database > is the richest source of the goofiest zone changes to date). > > 2. That it's not only possible, but implementable with reasonable effort > and performance. > > dateutil was "the obvious" base to start from, since it's the only > widely used wrapping of the IANA database using hybrid tzinfos (pytz > took a very different path). Yes, this was more or less my understanding. I just thought I'd put it out there in case the more complex nature of the actual implementation had some bearing on the thinking about the implementation. For example, these tests could be problematic from a backwards compatibility standpoint. I haven't had time to read the PEP or the discussion on the matter, so maybe this has already been considered, but it would make for a simpler interface if an unspecified value for fold left the old behavior intact. I'll definitely read these things when I have time, so if it's already been discussed no need to re-hash on my behalf. > Whether dateutil can make _use_ of this experiment is up to you ;-) > > In cases where results differ from the current implementation, the > latter results can only be called "wrong". Which you may well need to > preserve. > In which case, I'd suggest leaving the current implementation alone, > and _adding_ a new wrapping of tzfiles based on Alex's code. > dateutil's get-a-zone factory functions would need to grow some way to > spell "I want a pre-495 tzinfo" or "I want a post-495 tzinfo". New > functions, optional function flags, global setting ... whatever you > think works best.
> > Of course this would apply to wrappings of other sources of zone info > too, but the IANA database must be by far the hardest (e.g., fold and > gap times can be deduced directly from a POSIX TZ string rule, which > are only subject to twice-a-year DST changes at worst). I think it's likely premature (and the wrong forum) to discuss such downstream implementation details, but I imagine that it won't be difficult to devise some scheme that by default gives the right answer where possible, as long as there's a relatively straightforward way of wrapping datetimes such that it provides a consistent /interface/ across various platforms. As for the question of whether to preserve the "wrong" values for the sake of backwards compatibility, I'm not likely to sacrifice maximum /accuracy/ across platforms for maximum /consistency/ across platforms. But again, this is somewhat off-topic. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 834 bytes Desc: OpenPGP digital signature URL: From guido at python.org Fri Sep 18 19:07:38 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Sep 2015 10:07:38 -0700 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: <55FC4464.9050201@ganssle.io> References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> Message-ID: On Fri, Sep 18, 2015 at 10:05 AM, Paul Ganssle wrote: > On 9/18/2015 11:56, Tim Peters wrote: > > Because dateutil inherits the default .fromutc(), it's all but certain it > can't handle cases in the IANA database where a zone's base ("standard") > offset changed either. > > But it's handling gaps & folds due to DST transitions as well as is > _possible_ for a hybrid tzinfo given datetime's original design. There was > no provision in datetime to make it possible for a hybrid tzinfo to know > whether the earlier or later of an ambiguous local time is intended. That's > not dateutil's fault, and not something any hybrid tzinfo can solve before > PEP 495 is implemented. > > dateutil is following the doc's advice to consider an ambiguous time to be > the later (in "standard time"), which in combination with inheriting the > default .fromutc() is enough to ensure that UTC->local conversion at least > mimics the hands on the local clock (skipping local times at DST start, and > repeating some at DST end). So it's doing the best it _can_ do now in those > respects. > > This is quite possibly true, and is roughly in line with my thinking on > the matter to date, but in my mind the behavior of dateutil with respect to > ambiguous times is undefined, so I'm not going to add tests that enforce an > arbitrary implementation choice as it's not behavior I want to lock down. > Could you at least lock down that ambiguous times return *something* rather than raising an exception? Or perhaps even that they return one of two valid alternatives? > It's a separate question as to whether it can or cannot do better in some > cases. The issues I linked to are both cases where an unambiguously > specified time ("now" or a time specified in UTC with an IANA time zone) > are incorrectly converted into local time. It is almost certainly true > that enough information is available to properly localize these datetimes, > but at least in the case of localizing "now" the cost in doing so is > additional complexity on the back-end. 
> > > The goal of PEP 495 is to make it possible for hybrid tzinfos to > handle all cases of gaps and folds due to any cause whatsoever > (provided that folds are never worse than 2-to-1), What Alex is > really after here is to kick the tires on PEP 495, to make sure: > > 1. All cases in the IANA database are in fact solved (that database > is the richest source of the goofiest zone changes to date). > > 2. That it's not only possible, but implementable with reasonable effort > and performance. > > dateutil was "the obvious" base to start from, since it's the only > widely used wrapping of the IANA database using hybrid tzinfos (pytz > took a very different path). > > > Yes, this was more or less my understanding. I just thought I'd put it out > there in case the fact that the more complex nature of the actual > implementation had some bearing on the thinking about the implementation. > For example, these tests > > could be problematic from a backwards compatibility standpoint. I haven't > had time to read the PEP or the discussion on the matter, so maybe this has > already been considered, but would make for a simpler interface if an > unspecified value for fold left the old behavior intact. > > I'll definitely read these things when I have time, so if it's already > been discussed no need to re-hash on my behalf. > > Whether dateutil can make _use_ of this experiment is up to you ;-) > > In cases where results differ from the current implementation, the > latter results can only be called "wrong". Which you may well need to > preserve. > > In which case, I'd suggest leaving the current implementation alone, > and _adding_ a new wrapping of tzfiles based on Alex's code. > dateutil's get-a-zone factory functions would need to grow some way to > spell "I want a pre-495 tzinfo" or "I want a post-495 tzinfo". New > functions, optional function flags, global setting ... whatever you > think works best. > > Of course this would apply to wrappings of other sources of zone info > too, but the IANA database must be by far the hardest (e.g., fold and > gap times can be deduced directly from a POSIX TZ string rule, which > are only subject to twice-a-year DST changes at worst). > > I think it's likely premature (and the wrong forum) to discuss such > downstream implementation details, but I imagine that it won't be difficult > to devise some scheme that by default gives the right answer where > possible, as long as there's a relatively straightforward way of wrapping > datetimes such that it provides a consistent *interface* across various > platforms. > > As for the question of whether to preserve the "wrong" values for the sake > of backwards compatibility, I'm not likely to sacrifice maximum *accuracy* > across platforms for maximum *consistency* across platforms. But again, > this is somewhat off-topic. > > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Fri Sep 18 19:36:30 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 18 Sep 2015 13:36:30 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: <55FC4464.9050201@ganssle.io> References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> Message-ID: On Fri, Sep 18, 2015 at 1:05 PM, Paul Ganssle wrote: > I haven't had time to read the PEP or the discussion on the matter, so > maybe this has already been considered, but would make for a simpler > interface if an unspecified value for fold left the old behavior intact. > Yes this have been considered and there is a section [1] on this in the PEP. TL;DR: There will be no way to spell "fold=unspecified." We decided to change the current disambiguation rule (default to STD) because it does not work for roll-back transitions that don't change isdst. Furthermore, this rule is only needed to make default fromutc() work, but post-PEP tzinfos will have to override that method anyways. [1]: https://www.python.org/dev/peps/pep-0495/#backward-compatibility -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at ganssle.io Fri Sep 18 19:32:44 2015 From: paul at ganssle.io (Paul Ganssle) Date: Fri, 18 Sep 2015 13:32:44 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> Message-ID: <55FC4ABC.7020207@ganssle.io> This is a reasonable point that I'll have to mull over. I think by and large I'd prefer to be agnostic about these sorts of things if they don't have any bearing on my contribution to the interface. Everything below datetime uses native python modules, so making explicit guarantees about the results of what is essentially undefined behavior above and beyond the guarantees the interpreter / standard is already making seems unnecessary to me. That said, I'll have to go over the module in more detail and see places where the dateutil interface /should/ be making some guarantees about the behavior. On 9/18/2015 13:07, Guido van Rossum wrote: > On Fri, Sep 18, 2015 at 10:05 AM, Paul Ganssle > wrote: > > On 9/18/2015 11:56, Tim Peters wrote: >> Because dateutil inherits the default .fromutc(), it's all but >> certain it can't handle cases in the IANA database where a zone's >> base ("standard") offset changed either. >> >> But it's handling gaps & folds due to DST transitions as well as >> is _possible_ for a hybrid tzinfo given datetime's original >> design. There was no provision in datetime to make it possible >> for a hybrid tzinfo to know whether the earlier or later of an >> ambiguous local time is intended. That's not dateutil's fault, >> and not something any hybrid tzinfo can solve before PEP 495 is >> implemented. >> >> dateutil is following the doc's advice to consider an ambiguous >> time to be the later (in "standard time"), which in combination >> with inheriting the default .fromutc() is enough to ensure that >> UTC->local conversion at least mimics the hands on the local >> clock (skipping local times at DST start, and repeating some at >> DST end). So it's doing the best it _can_ do now in those respects. 
> This is quite possibly true, and is roughly in line with my > thinking on the matter to date, but in my mind the behavior of > dateutil with respect to ambiguous times is undefined, so I'm not > going to add tests that enforce an arbitrary implementation choice > as it's not behavior I want to lock down. > > > Could you at least lock down that ambiguous times return *something* > rather than raising an exception? Or perhaps even that they return one > of two valid alternatives? > > > It's a separate question as to whether it can or cannot do better > in some cases. The issues I linked to are both cases where an > unambiguously specified time ("now" or a time specified in UTC > with an IANA time zone) are incorrectly converted into local time. > It is//almost certainly true that enough information is available > to properly localize these datetimes, but at least in the case of > localizing "now" the cost in doing so is additional complexity on > the back-end. > >> >> The goal of PEP 495 is to make it possible for hybrid tzinfos to >> handle all cases of gaps and folds due to any cause whatsoever >> (provided that folds are never worse than 2-to-1), What Alex is >> really after here is to kick the tires on PEP 495, to make sure: >> >> 1. All cases in the IANA database are in fact solved (that database >> is the richest source of the goofiest zone changes to date). >> >> 2. That it's not only possible, but implementable with reasonable effort >> and performance. >> >> dateutil was "the obvious" base to start from, since it's the only >> widely used wrapping of the IANA database using hybrid tzinfos (pytz >> took a very different path). > > Yes, this was more or less my understanding. I just thought I'd > put it out there in case the fact that the more complex nature of > the actual implementation had some bearing on the thinking about > the implementation. For example, these tests > > could be problematic from a backwards compatibility standpoint. I > haven't had time to read the PEP or the discussion on the matter, > so maybe this has already been considered, but would make for a > simpler interface if an unspecified value for fold left the old > behavior intact. > > I'll definitely read these things when I have time, so if it's > already been discussed no need to re-hash on my behalf. > >> Whether dateutil can make _use_ of this experiment is up to you ;-) >> >> In cases where results differ from the current implementation, the >> latter results can only be called "wrong". Which you may well need to >> preserve. >> In which case, I'd suggest leaving the current implementation alone, >> and _adding_ a new wrapping of tzfiles based on Alex's code. >> dateutil's get-a-zone factory functions would need to grow some way to >> spell "I want a pre-495 tzinfo" or "I want a post-495 tzinfo". New >> functions, optional function flags, global setting ... whatever you >> think works best. >> >> Of course this would apply to wrappings of other sources of zone info >> too, but the IANA database must be by far the hardest (e.g., fold and >> gap times can be deduced directly from a POSIX TZ string rule, which >> are only subject to twice-a-year DST changes at worst). 
> I think it's likely premature (and the wrong forum) to discuss > such downstream implementation details, but I imagine that it > won't be difficult to devise some scheme that by default gives the > right answer where possible, as long as there's a relatively > straightforward way of wrapping datetimes such that it provides a > consistent /interface/ across various platforms. > > As for the question of whether to preserve the "wrong" values for > the sake of backwards compatibility, I'm not likely to sacrifice > maximum /accuracy/ across platforms for maximum /consistency/ > across platforms. But again, this is somewhat off-topic. > > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > > > > > -- > --Guido van Rossum (python.org/~guido ) -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 834 bytes Desc: OpenPGP digital signature URL: From alexander.belopolsky at gmail.com Fri Sep 18 20:06:45 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 18 Sep 2015 14:06:45 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: <55FC4ABC.7020207@ganssle.io> References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <55FC4ABC.7020207@ganssle.io> Message-ID: On Fri, Sep 18, 2015 at 1:32 PM, Paul Ganssle wrote: > Everything below datetime uses native python modules, so making explicit > guarantees about the results of what is essentially undefined behavior > above and beyond the guarantees the interpreter / standard is already > making seems unnecessary to me. If you look at the standard library tests [1], you will see that we guarantee consistency in all but the most extreme edge cases. (E.g., conversion between timezones with overlapping but not equal folds is one such case. [2]) I've found that dateutil test coverage is very good for the utcoffset()/tzname()/dst() triad, but it is less thorough for anything that involves fromutc(). (I believe I had to adjust only one test case when I added fold-awareness to fromutc().) This is understandable because you rely on the stdlib version of fromutc(), but this is a problem in itself. We know that the default fromutc() is only adequate for tzrange and very simple tzfile cases. I suspect dateutil has problems that are not limited to ambiguous datetimes in some IANA time zones. [1]: https://hg.python.org/cpython/file/v3.5.0/Lib/test/datetimetester.py#l2818 [2]: https://hg.python.org/cpython/file/v3.5.0/Lib/test/datetimetester.py#l3640 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Sep 18 20:42:59 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 18 Sep 2015 13:42:59 -0500 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: <55FC4464.9050201@ganssle.io> References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> Message-ID: [Paul Ganssle ] > .. > It's a separate question as to whether it can or cannot do better in some > cases. The issues I linked to are both cases where an unambiguously > specified time ("now" or a time specified in UTC with an IANA time zone) are > incorrectly converted into local time. 
It is almost certainly true that > enough information is available to properly localize these datetimes, but at > least in the case of localizing "now" the cost in doing so is additional > complexity on the back-end. When converting from UTC to a local ambiguous time, you obviously know which UTC time you started with. The problem is that it's impossible to _record_ which UTC time you started with. The date and time attributes of the local datetimes are (must be) identical, so the only way you _could_ record it is by overriding .fromutc() to attach a different tzinfo object (the only bits of a datetime object that could possibly differ between the earlier and later of an ambiguous local time). Which is what pytz does. But then the semantics of arithmetic changes too, because datetime subtraction and comparison do different things depending on whether or not the datetimes' tzinfo objects are identical (same object). This is why POSIX has a tm_isdst flag in a struct tm (the POSIX spelling of a Python datetime), to record whether an ambiguous local time is intended to be the earlier or later. PEP 495's new `fold` flag is the same so far as DST transitions go, but is also clearly applicable to all possible causes of folds (including a zone's "standard" offset changing). From tim.peters at gmail.com Fri Sep 18 21:00:57 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 18 Sep 2015 14:00:57 -0500 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <55FC4ABC.7020207@ganssle.io> Message-ID: [Alex] > ... > I suspect dateutil has problems that are not limited to ambiguous > datetimes in some IANA time zones. For pytz, Stuart said he ran zdump across all zones in the database, to drive exhaustive tests of all transition instants in every zone. That's an excellent idea :-) I strongly suspect dateutil will get some cases wrong simply because it's paying attention to the gmt/std/wall indicators in tzfiles. Those have no meaning for anything a tzinfo is trying to accomplish - it's an "attractive nuisance" that they're even stored in a tzfile. To convert transition times from UTC to local times (as dateutil appears to want to do), it should simply add the current total UTC offset, ignoring the gmt/std/wall indicators entirely. All transition times in tzfiles are recorded in UTC, regardless of what the gmt/std/wall indicators say. That won't make any difference for "most" zones because it just so happens that the "wall" indicator is set for most transitions and the "std" indicator is not (reflecting that most zoneinfo _source_ files record DST transition points in local wall-clock time). An exhaustive test would stumble into the exceptions. The way to fix broken cases discovered this way is to just ignore gmt/std/wall (better, seek over 'em when reading the file - they're useless). From alexander.belopolsky at gmail.com Fri Sep 18 21:01:16 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 18 Sep 2015 15:01:16 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> Message-ID: On Fri, Sep 18, 2015 at 2:42 PM, Tim Peters wrote: > > When converting from UTC to a local ambiguous time, you obviously know > which UTC time you started with. The problem is that it's impossible > to _record_ which UTC time you started with. 
The date and time > attributes of the local datetimes are (must be) identical, so the only > way you _could_ record it is by overriding .fromutc() to attach a > different tzinfo object (the only bits of a datetime object that could > possibly differ between the earlier and later of an ambiguous local > time). > > Which is what pytz does. The pytz hack is in violation of the strict reading of the reference manual [1] which says "The purpose of fromutc() is to adjust the date and time data ...". I think it is in the spirit, if not in the letter, of the datetime module design that fromutc(dt) should not change dt.tzinfo. In any case, I think we have concluded on this list that the pytz approach is not an example to be followed. I just wanted to mention for Paul's benefit that it is not just the arithmetic that is affected by the pytz hack. The changes in arithmetic are themselves consequences of the violation of the "fromutc(dt).tzinfo is dt.tzinfo" invariant. [1]: https://docs.python.org/3/library/datetime.html#datetime.tzinfo.fromutc -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at ganssle.io Fri Sep 18 21:16:20 2015 From: paul at ganssle.io (Paul Ganssle) Date: Fri, 18 Sep 2015 15:16:20 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <55FC4ABC.7020207@ganssle.io> Message-ID: Appreciate the advice, I have to admit that these edge cases seem rare enough that they haven't been a priority for me (I'm still trying to wrap up a release that doesn't occasionally break the parser for certain strings on the 29th-31st of every month, for example). Much to think about here. I like that zdump idea as a method for test discovery. On Sep 18, 2015 3:01 PM, "Tim Peters" wrote: > [Alex] > > ... > > I suspect dateutil has problems that are not limited to ambiguous > > datetimes in some IANA time zones. > > For pytz, Stuart said he ran zdump across all zones in the database, > to drive exhaustive tests of all transition instants in every zone. > That's an excellent idea :-) > > I strongly suspect dateutil will get some cases wrong simply because > it's paying attention to the gmt/std/wall indicators in tzfiles. > Those have no meaning for anything a tzinfo is trying to accomplish - > it's an "attractive nuisance" that they're even stored in a tzfile. > To convert transition times from UTC to local times (as dateutil > appears to want to do), it should simply add the current total UTC > offset, ignoring the gmt/std/wall indicators entirely. All transition > times in tzfiles are recorded in UTC, regardless of what the > gmt/std/wall indicators say. > > That won't make any difference for "most" zones because it just so > happens that the "wall" indicator is set for most transitions and the > "std" indicator is not (reflecting that most zoneinfo _source_ files > record DST transition points in local wall-clock time). An exhaustive > test would stumble into the exceptions. The way to fix broken cases > discovered this way is to just ignore gmt/std/wall (better, seek over > 'em when reading the file - they're useless). > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From 4kir4.1i at gmail.com Fri Sep 18 22:59:50 2015 From: 4kir4.1i at gmail.com (Akira Li) Date: Fri, 18 Sep 2015 23:59:50 +0300 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: (Alexander Belopolsky's message of "Fri, 18 Sep 2015 15:01:16 -0400") References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> Message-ID: <87a8sj4f5l.fsf@gmail.com> Alexander Belopolsky writes: > On Fri, Sep 18, 2015 at 2:42 PM, Tim Peters wrote: >> >> When converting from UTC to a local ambiguous time, you obviously know >> which UTC time you started with. The problem is that it's impossible >> to _record_ which UTC time you started with. The date and time >> attributes of the local datetimes are (must be) identical, so the only >> way you _could_ record it is by overriding .fromutc() to attach a >> different tzinfo object (the only bits of a datetime object that could >> possibly differ between the earlier and later of an ambiguous local >> time). >> >> Which is what pytz does. > > The pytz hack is in violation of the strict reading of the reference manual > [1] which says "The purpose of fromutc() is to adjust the date and time > data ...". I think it is in the spirit if not in the letter of datetime > module design that fromutc(dt) should not change dt.tzinfo. pytz's fromutc() returns the correct* result. dateutil can't do it (at the moment) https://github.com/dateutil/dateutil/issues/112 * The word "correct" here does not depend on the programming language specification and/or its implementation, e.g., from the bug description: Input: 2011-11-06 05:30:00 UTC+0000, America/Toronto Expected: 2011-11-06 01:30:00 EDT-0400 If the reference manual mandates a different result then it is wrong. > In any case, I think we have concluded on this list that pytz approach is > not an example to be followed. I just wanted to mention for Paul's benefit > that it is not just the arithmetic that is affected by the pytz hack. The > changes in arithmetic are themselves consequences of the violation of the > "fromutc(dt).tzinfo is dt.tzinfo" invariant. Consider the following (natural) equality: tz.fromutc(utc_time) == utc_time.replace(tzinfo=utc_tz).astimezone(tz) The right side allows *utc_time.tzinfo* to be None, or *utc_time.tzinfo* may be some equivalent of *timezone.utc*. It is confusing that the method named *fromutc()* (its stdlib implementation) rejects *utc_time* if it is in utc timezone. stdlib's behavior that mandates utc_time.tzinfo == tz where tz may have non-zero utc offset is weird (mind-bending -- input time must be utc but tzinfo is not utc -- wtf). There is no need to attach *tz* before calling *tz.fromutc()* -- tz is passed as *self* anyway. > [1]: https://docs.python.org/3/library/datetime.html#datetime.tzinfo.fromutc From alexander.belopolsky at gmail.com Fri Sep 18 23:03:35 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 18 Sep 2015 17:03:35 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: <87a8sj4f5l.fsf@gmail.com> References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <87a8sj4f5l.fsf@gmail.com> Message-ID: On Fri, Sep 18, 2015 at 4:59 PM, Akira Li <4kir4.1i at gmail.com> wrote: > stdlib's behavior that mandates utc_time.tzinfo == tz where tz may have > non-zero utc offset is weird (mind-bending -- input time must be utc but > tzinfo is not utc -- wtf). > Please stop fighting decisions that were made 12 years ago. You cannot win regardless of the merits of your arguments. 
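For readers trying to follow the convention being argued over, here is a small stdlib-only illustration; a fixed-offset zone stands in for a real hybrid tzinfo, and the variable names are arbitrary:

    from datetime import datetime, timedelta, timezone

    est = timezone(timedelta(hours=-5), "EST")   # fixed offset, illustration only
    u = datetime(2015, 9, 18, 17, 0, tzinfo=timezone.utc)

    # The intended, convenient spelling:
    local = u.astimezone(est)

    # What astimezone() does under the hood: the UTC date/time is first
    # re-labelled with the *target* tzinfo, and fromutc() then adjusts
    # the date and time members.
    local2 = est.fromutc(u.replace(tzinfo=est))

    assert local == local2   # both say 12:00 EST

Calling est.fromutc(u) directly, without the replace(), raises ValueError because u.tzinfo is not est -- that is the calling convention the stdlib documentation requires.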
-------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Fri Sep 18 23:29:20 2015 From: random832 at fastmail.com (Random832) Date: Fri, 18 Sep 2015 17:29:20 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <55FC4ABC.7020207@ganssle.io> Message-ID: <1442611760.2536451.387678633.3DAFF41C@webmail.messagingengine.com> On Fri, Sep 18, 2015, at 15:00, Tim Peters wrote: > Those have no meaning for anything a tzinfo is trying to accomplish - > it's an "attractive nuisance" that they're even stored in a tzfile. For background information: the purpose of storing them in a tzfile is to allow that tzfile to be used as a template for dynamically creating timezones with the same rules but other offsets. This is used for the timezone named "posixrules" - which is a US timezone (America/New_York) by default - to generate timezones for POSIX timezone strings that don't explicitly specify their daylight rules. They should not be used for normal interpretation of a timezone. From tim.peters at gmail.com Sat Sep 19 03:02:21 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 18 Sep 2015 20:02:21 -0500 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> Message-ID: [Tim] >> When converting from UTC to a local ambiguous time, you obviously know >> which UTC time you started with. The problem is that it's impossible >> to _record_ which UTC time you started with. The date and time >> attributes of the local datetimes are (must be) identical, so the only >> way you _could_ record it is by overriding .fromutc() to attach a >> different tzinfo object (the only bits of a datetime object that could >> possibly differ between the earlier and later of an ambiguous local >> time). >> >> Which is what pytz does. [Alex] > The pytz hack is in violation of the strict reading of the reference manual > [1] which says "The purpose of fromutc() is to adjust the date and time data > ...". I think it is in the spirit if not in the letter of datetime module > design that fromutc(dt) should not change dt.tzinfo. It's certainly "in the spirit" not to change it. I wrote that part of the docs, and it never occurred to me that anyone would even _consider_ changing it ;-) > In any case, I think we have concluded on this list that pytz approach is > not an example to be followed. Well, it was dead easy to establish it wasn't Guido's intent as the primary original designer, or my intent as the primary original implementer & doc author - all anyone ever had to do to establish _that_ was to ask us ;-) I happen to still believe that a "hybrid" tzinfo is the best approach, but appreciate that pytz solved a world of problems with its approach (while creating others). I really can't tell if a consensus has been reached among the relative handful of datetime-SIG participants. Which means there is no consensus. > I just wanted to mention for Paul's benefit > that it is not just the arithmetic that is affected by the pytz hack. The > changes in arithmetic are themselves consequences of the violation of the > "fromutc(dt).tzinfo is dt.tzinfo" invariant. Paul, something else you should know: you don't _have_ to change anything if PEP 495 is implemented. That alone shouldn't change any results dateutil computes in any case. 
dateutil will simply ignore `fold` then, and compute the same results it computes today. The intent is to make it _possible_ for dateutil to get conversions exactly right in every case, which it cannot do today. From tim.peters at gmail.com Sat Sep 19 03:10:26 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 18 Sep 2015 20:10:26 -0500 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <55FC4ABC.7020207@ganssle.io> Message-ID: [Paul Ganssle ] > Appreciate the advice, I have to admit that these edge cases seem rare > enough that they haven't been a priority for me On a third look, I think you can ignore my rant about the gmt/std/wall indicators: those don't appear to be _used_ at all in the current dateutil code. I was either hallucinating, or (mis)remembering some older version of the code. But since they're not used, you could save some memory space & cycles by not bothering to read them from the tzfile to begin with. About edge cases, as before it's simply not possible to get them all right today, nor to get as many right as _is_ possible for IANA zones today without overriding .fromutc(). If I were you I'd wait to see PEP 495's fate. Then "always right all the time" could become possible. > (I'm still trying to wrap up a release that doesn't occasionally break the > parser for certain strings on the 29th-31st of every month, for example). Just fix the 30th, and call it progress ;-) From tim.peters at gmail.com Sat Sep 19 03:33:48 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 18 Sep 2015 20:33:48 -0500 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: <87a8sj4f5l.fsf@gmail.com> References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <87a8sj4f5l.fsf@gmail.com> Message-ID: [Akira Li <4kir4.1i at gmail.com>] > pytz's fromutc() returns the correct* result. dateutil can't do it (at > the moment) https://github.com/dateutil/dateutil/issues/112 Do you understand why PEP 495 is being proposed? > ... > Consider the following (natural) equality: > > tz.fromutc(utc_time) == utc_time.replace(tzinfo=utc_tz).astimezone(tz) Clear as mud to me ;-) > The right side allows *utc_time.tzinfo* being None or *utc_time.tzinfo* > may be some equivalent of *timezone.utc*. Since the RHS replaces utc_time.tzinfo before using utc_time, the RHS "allows" utc_time.tzinfo to be anything whatsoever at the start. > It is confusing that the method named *fromutc()* (its stdlib implementation) > rejects *utc_time* if it is in utc timezone. But your use, despite your claim of being "natural", is highly _un_natural. The natural use of .astimezone() is to invoke it _from_ a datetime object:

    a_datetime.astimezone(tz)

.fromutc() was rarely intended to be invoked directly, except perhaps by tzinfo authors. In that context, its real use is to help implement .astimezone(). And its calling conventions are natural in that context:

    def datetime.astimezone(self, tz):
        myoffset = self.utcoffset()
        utc = (self - myoffset).replace(tzinfo=tz)
        return tz.fromutc(utc)

> stdlib's behavior that mandates utc_time.tzinfo == tz Not "==", "is". > where tz may have non-zero utc offset is weird (mind-bending -- > input time must be utc but tzinfo is not utc -- wtf). There is no > need to attach *tz* before calling *tz.fromutc()* -- tz is passed > as *self* anyway. Redundancy helps catch programming errors. 
I know darned well this check helped catch errors I made when implementing this stuff to begin with. There's always potential confusion when one object delegates operations to operations of the same names implemented by a contained object. If you don't like it, tough ;-) Stick to using astimezone() and leave the internals alone. If you are going to play with the internals, follow the rules, It's not like they weren't documented ;-) From 4kir4.1i at gmail.com Sat Sep 19 05:19:33 2015 From: 4kir4.1i at gmail.com (Akira Li) Date: Sat, 19 Sep 2015 06:19:33 +0300 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: (Tim Peters's message of "Fri, 18 Sep 2015 20:33:48 -0500") References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <87a8sj4f5l.fsf@gmail.com> Message-ID: <87wpvn2j0a.fsf@gmail.com> Tim Peters writes: > [Akira Li <4kir4.1i at gmail.com>] >> pytz's fromutc() returns the correct* result. dateutil can't do it (at >> the moment) https://github.com/dateutil/dateutil/issues/112 > > Do you understand why PEP 495 is being proposed? > Yes, that is why I said "at the moment" https://www.python.org/dev/peps/pep-0495/#rationale https://github.com/python/peps/blob/70c78c6c48f9f025f0485f4a756b313d414b5786/pep-0495.txt#L31-L54 >> ... >> Consider the following (natural) equality: >> >> tz.fromutc(utc_time) == utc_time.replace(tzinfo=utc_tz).astimezone(tz) > > Clear as mud to me ;-) > > >> The right side allows *utc_time.tzinfo* being None or *utc_time.tzinfo* >> may be some equivalent of *timezone.utc*. > > Since the RHS replaces utc_time.tzinfo before using utc_time, the RHS > "allows" utc_time.tzinfo to be anything whatsoever at the start. "anything whatsover" would conflict with the _name_ *utc_time*. If *utc_time* is a naive datetime object then it may be interpreted as utc time in a given program. If *utc_time* is timezone-aware then utc_time.tzinfo being an equivalent of timezone.utc is not surprising too. >> It is confusing that the method named *fromutc()* (its stdlib implementation) >> rejects *utc_time* if it is in utc timezone. > > But your use, despite your claim of being "natural", is highly > _un_natural. The natural use of .astimezone() is to invoke it _from_ > a datetime object: > > a_datetime.astimezone(tz) > > .fromutc() was rarely intended to be invoked directly, except perhaps > by tzinfo authors. In that context, its real use is to help implement > .astimezone(), And its calling conventions are natural in that > context: > > def datetime.astimezone(self, tz): > myoffset = self.utcoffset() > utc = (self - myoffset).replace(tzinfo=tz) > return tz.fromutc(utc) > > > >> stdlib's behavior that mandates utc_time.tzinfo == tz > > Not "==", "is". Yes, it was an error. Though it does not change the meaning of the sentence i.e., any value except None or timezone.utc analog is surprising for utc_time.tzinfo >> where tz may have non-zero utc offset is weird (mind-bending -- >> input time must be utc but tzinfo is not utc -- wtf). There is no >> need to attach *tz* before calling *tz.fromutc()* -- tz is passed >> as *self* anyway. > > Redundancy helps catch programming errors. I know darned well this > check helped catch errors I made when implementing this stuff to begin > with. There's always potential confusion when one object delegates > operations to operations of the same names implemented by a contained > object. > > If you don't like it, tough ;-) Stick to using astimezone() and leave > the internals alone. 
If you are going to play with the internals, > follow the rules, It's not like they weren't documented ;-) To be clear, it is not a suggestion to change anything in stdlib. It was a reaction to the earlier message in this thread, to point out why stdlib's fromutc() API is not the example that should be followed. Thank you for providing the explicit reasons for the specific choices in the API design: "redundency helps" and fromutc() is semi-private. I can't remember when I've used fromutc() directly (It is used indirectly via datetime.now(tz), datetime.fromtimestamp(ts, tz), d.astimezone(tz), tz.normalize()). From tim.peters at gmail.com Sat Sep 19 05:25:29 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 18 Sep 2015 22:25:29 -0500 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: <87wpvn2j0a.fsf@gmail.com> References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <87a8sj4f5l.fsf@gmail.com> <87wpvn2j0a.fsf@gmail.com> Message-ID: [Akira Li <4kir4.1i at gmail.com>] > ... > To be clear, it is not a suggestion to change anything in stdlib. It was > a reaction to the earlier message in this thread, to point out why > stdlib's fromutc() API is not the example that should be followed. Thank > you for providing the explicit reasons for the specific choices in the > API design: "redundency helps" and fromutc() is semi-private. I can't > remember when I've used fromutc() directly (It is used indirectly via > datetime.now(tz), datetime.fromtimestamp(ts, tz), d.astimezone(tz), Which are part of Python. > tz.normalize()). Which is unique to pytz. So, yes, it's used as intended, by _implementations_ of higher-level methods. In those contexts, "convenience" is of no importance, but the value of catching errors (by implementers!) is of supreme importance. From alexander.belopolsky at gmail.com Sat Sep 19 07:09:01 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 19 Sep 2015 01:09:01 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: <87wpvn2j0a.fsf@gmail.com> References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> <87a8sj4f5l.fsf@gmail.com> <87wpvn2j0a.fsf@gmail.com> Message-ID: On Fri, Sep 18, 2015 at 11:19 PM, Akira Li <4kir4.1i at gmail.com> wrote: > It was > a reaction to the earlier message in this thread, to point out why > stdlib's fromutc() API is not the example that should be followed. > You don't have to "follow" it but you must understand what datetime module expects from you as a tzinfo implementer if you decide to override the default fromutc() implementation. What Stuart did in pytz was a hack that the authors of the original design did not expect. I think you find fromutc() design unnatural because you have a different view of what datetime instances are. I believe for you, datetime instances are labels on a time line, but they are not. They are more like clock faces. Aware datetimes are clocks with stickers that say "New York", "Madrid", etc. The label tells you how to interpret the time that the clock shows, but that time does not have to be "current" or "accurate" time at the location written on the label. You can take a "Madrid" clock and set it to show "current" New York time. Nothing in datetime module will stop you even if you set the time that falls in Madrid "gap" and makes no sense there. The fromutc() method helps you to set your New York clock if you know "current" UTC time. 
The instructions are simple: set "current" UTC time on your New York clock and call fromutc(). If you adopt this mental picture, then the idea of replacing tzinfo on a datetime becomes absurd. Why would you want ruin a perfectly good "New York" clock simply because it comes from Geneva showing time that is 5 hours ahead? You don't rip off the "New York" label - you just wind the clock back 5 hours. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sat Sep 19 22:16:01 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 19 Sep 2015 16:16:01 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta Message-ID: The datetime.dst() and its namesake tzinfo.dst() [1] methods are required to return a timedelta object that represents a quantity added to standard time in a spring-forward transition. As explained in documentation, the dst() value is already incorporated in the value returned by utcoffset() and is not needed in typical calculations. Therefore, it is not surprising that both dateutil and pytz get it wrong in some cases. [2,3] While pytz does slightly better than dateutil, it looks like it may not be possible to derive the correct value of dst() from the compiled binary tzfiles alone in all cases. The problematic cases are transitions that involve a simultaneous change in standard time and a DST transition. For example, Portugal switching from CET to WEST in 1996. [2] While the "SAVE" amount can be found in the raw tzdist files, this information is lost when the raw files are compiled. The transition information includes only the full new UTC offset and a boolean isdst flag. If the transition is a pure DST transition, then dst() is just the difference between the new UTC offset and the old, but if the standard time offset changes at the time of the DST transition, there is no information in the binary tzfile to split the full difference into standard time change and DST adjustment. Unless I miss something, it looks like a high-quality tzinfo implementation should extract the "SAVE" information from the raw files. [1]: https://docs.python.org/3/library/datetime.html#datetime.tzinfo.dst [2]: https://github.com/dateutil/dateutil/issues/128 [3]: https://bugs.launchpad.net/pytz/+bug/1497619 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Sep 20 00:01:59 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 19 Sep 2015 17:01:59 -0500 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: Message-ID: [Alexander Belopolsky ] > The datetime.dst() and its namesake tzinfo.dst() [1] methods are required to > return a timedelta object that represents a quantity added to standard time > in a spring-forward transition. > > As explained in documentation, the dst() value is already incorporated in > the value returned by utcoffset() and is not needed in typical calculations. > Therefore, it is not surprising that both dateutil and pytz get it wrong in > some cases. [2,3] Ya, the docs over-promised here ;-) I think the only "important" invariant to maintain is that _some_ kind of DST is in effect if and only if .dst() != timedelta(0). > While pytz does slightly better than dateutil, it looks like it may not be > possible to derive the correct value of dst() from the compiled binary > tzfiles alone in all cases. 
You're right, it can't, but for a more general reason than what you give next: at base, it's impossible to always know what a zone's "standard offset" is from what a tzfile stores, even though the zoneinfo source (text) files do spell that out. > The problematic cases are transitions that involve a simultaneous change in > standard time and a DST transition. For example, Portugal switching from > CET to WEST in 1996. [2] Specifically, on 1996-03-31 that simultaneously switched from CET (standard time) to WEST (daylight time), yes? The total UTC offset was 1:00:00 both before and after. In cases "like this", you can search either backward or forward in the transition list, to find a closest _different_ DST switch, and calculate a change of 1 hour either way. So it's "almost certain" that the DST offset is an hour in this case too. A case where that doesn't work, unless squinting: that place in Antarctica with two kinds of DST each year. The total UTC offset increases by 1 when the first DST kicks in, and by 1 again when the second kicks in. So, in the second case, the delta between adjacent total UTC offsets is just 1, despite that the (total) DST offset is actually 2. Which suggests a more general "good guess": If the transition record says DST is not in effect, dst() should return timedelta(0). Else it says DST is in effect. If the prior transition record says it was not in effect and the total UTC offsets differ, .dst() should return their difference. Else the total offsets are the same, or DST is in effect for both. Search back to find the closest preceding time DST switched. Use the total UTC offset from the "not DST" half of that switch instead. If none can be found going backward, go forward instead. And if both searches fail, return timedelta(hours=1). > While the "SAVE" amount can be found in the raw tzdist files, this > information is lost when the raw files are compiled. The transition > information includes only the full new UTC offset and a boolean isdst flag. > If the transition is a pure DST transition, then dst() is just the > difference between the new UTC offset and the old, but if the standard time > offset changes at the time of the DST transition, there is no information in > the binary tzfile to split the full difference into standard time change and > DST adjustment. > > Unless I miss something, it looks like a high-quality tzinfo implementation > should extract the "SAVE" information from the raw files. I will continue to draw a distinction between "high quality" and "timezone wonk" quality ;-) > [1]: https://docs.python.org/3/library/datetime.html#datetime.tzinfo.dst > [2]: https://github.com/dateutil/dateutil/issues/128 > [3]: https://bugs.launchpad.net/pytz/+bug/1497619 From tim.peters at gmail.com Sun Sep 20 02:11:11 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 19 Sep 2015 19:11:11 -0500 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: Message-ID: [Alex] > I wonder, what's the point of saving daylight at the place where sun does > not set? (or does not rise depending on the time of the year?) What's the point of DST _anywhere_? Politics :-) But in Antarctica, the base notion of "time zone" itself is essentially senseless: https://en.wikipedia.org/wiki/Time_in_Antarctica """ Antarctica sits on every line of longitude, due to the South Pole being situated near the middle of the continent. 
Theoretically Antarctica would be located in all time zones; however, areas south of the Antarctic Circle experience extreme day-night cycles near the times of the June and December solstices, making it difficult to determine which time zone would be appropriate. For practical purposes time zones are usually based on territorial claims; however, many stations use the time of the country they are owned by or the time zone of their supply base (e.g. McMurdo Station and Amundsen?Scott South Pole Station use New Zealand time due to their main supply base beingChristchurch, New Zealand).[1] Nearby stations can have different time zones, due to their belonging to different countries. Many areas have no time zone since nothing is decided and there are not even any temporary settlements that have any clocks. They are simply labeled with UTC time.[2] """ Then there's a list of "standard" UTC offsets for various Antarctica locations, varying from -4 to +12. From alexander.belopolsky at gmail.com Sun Sep 20 02:14:56 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 19 Sep 2015 20:14:56 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: Message-ID: On Sat, Sep 19, 2015 at 8:04 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > > On Sat, Sep 19, 2015 at 6:01 PM, Tim Peters wrote: >> >> that place in Antarctica with two kinds of DST each year. > > > I wonder, what's the point of saving daylight at the place where sun does not set? (or does not rise depending on the time of the year?) Tim, are you referring to the "Troll" rule? [1] That's a strange beast indeed and a comment above it says: # The CET-switching Troll rules require zic from tzcode 2014b or later, so as # suggested by Bengt-Inge Larsson comment them out for now, and approximate # with only UTC and CEST. Uncomment them when 2014b is more prevalent. On the other hand, I don't see any challenges to PEP 495 there other than finding means to extract the relevant information. Maybe I should hand-code this rule as demo/test case. [1]: https://github.com/eggert/tz/blob/master/antarctica#L217 -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sun Sep 20 02:04:47 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 19 Sep 2015 20:04:47 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: Message-ID: On Sat, Sep 19, 2015 at 6:01 PM, Tim Peters wrote: > that place in Antarctica with two kinds of DST each year. > I wonder, what's the point of saving daylight at the place where sun does not set? (or does not rise depending on the time of the year?) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Sep 20 02:30:52 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 19 Sep 2015 19:30:52 -0500 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: Message-ID: [Alex] > Tim, are you referring to the "Troll" rule? [1] That's a strange beast > indeed and a comment above it says: > > # The CET-switching Troll rules require zic from tzcode 2014b or later, so > as > # suggested by Bengt-Inge Larsson comment them out for now, and approximate > # with only UTC and CEST. Uncomment them when 2014b is more prevalent. Yes, and this was pointed out some time ago. 
These really are the rules they use: http://www.timeanddate.com/time/zone/antarctica/troll > On the other hand, I don't see any challenges to PEP 495 there other than > finding means to extract the relevant information. The only problem is figuring out how to handle .dst() - which is a problem regardless of whether 495 is implemented. I remain unclear as to why it broke zic.c, though! > Maybe I should hand-code this rule as demo/test case. This is soooo sad - you're clearly becoming a timezone wonk ;-) > [1]: https://github.com/eggert/tz/blob/master/antarctica#L217 From alexander.belopolsky at gmail.com Sun Sep 20 02:51:00 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 19 Sep 2015 20:51:00 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: Message-ID: On Sat, Sep 19, 2015 at 8:30 PM, Tim Peters wrote: > I remain unclear as to why it broke zic.c, though! > http://mm.icann.org/pipermail/tz/2014-March/020758.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sun Sep 20 02:43:17 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 19 Sep 2015 20:43:17 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: Message-ID: On Sat, Sep 19, 2015 at 8:30 PM, Tim Peters wrote: > > Maybe I should hand-code this rule as demo/test case. > > This is soooo sad - you're clearly becoming a timezone wonk ;-) No, I just want to close the timezone issue in Python once and for all. Despite its reputation, the issue is trivial: its all about a bunch of piecewise constant functions and very simple expressions like x + f(x). BTW, how do you like my new algorithm for inverting x + f(x)? https://github.com/abalkin/cpython/blob/7c30620c1789ee6ecead945513e2b34ce0c24d26/Lib/test/datetimetester.py#L4328 -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Mon Sep 21 05:49:22 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 20 Sep 2015 23:49:22 -0400 Subject: [Datetime-SIG] PEP 495 (Local Time Disambiguation) is ready for pronouncement In-Reply-To: References: Message-ID: On Sat, Aug 15, 2015 at 8:49 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > PEP 495 [1] is a deliberately minimalistic proposal to remove an > ambiguity in representing some local times as datetime.datetime > objects. A major issue has come up since my announcement above. Tim Peters have noticed that PEP 495 would violate the "hash invariant" unless the fold attribute is accounted for in inter-zone comparisons. See [2] for details. This issue has been resolved by modifying the definition [3] of the "==" operator for aware datetimes with post-PEP tzinfo. Note that no program will be affected by this change unless it uses a post-PEP tzinfo implementation. I made some smaller changes [4] to the PEP as well and it should finally be ready for pronouncement. [1]: https://www.python.org/dev/peps/pep-0495 [2]: https://mail.python.org/pipermail/datetime-sig/2015-September/000625.html [3]: https://www.python.org/dev/peps/pep-0495/#aware-datetime-equality-comparison [4]: https://hg.python.org/peps/log/39b7c1da05a2/pep-0495.txt -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Mon Sep 21 06:55:16 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Sep 2015 00:55:16 -0400 Subject: [Datetime-SIG] Adding PEP 495 support to dateutil In-Reply-To: References: <55FC1E62.6060202@ganssle.io> <55FC4464.9050201@ganssle.io> Message-ID: On Fri, Sep 18, 2015 at 9:02 PM, Tim Peters wrote: > > I happen to still believe that a "hybrid" tzinfo is the best approach, > but appreciate that pytz solved a world of problems with its approach > (while creating others). I really can't tell if a consensus has been > reached among the relative handful of datetime-SIG participants. > Which means there is no consensus. If "consensus" means "absence of sustained opposition" [1], it looks like we either weared out or intimidated the "opposition" enough for it not to be "sustained" anymore. :-) Luckily, PEP 495 solves at least one problem that has nothing to do with a choice of tzinfo: it makes the result of datetime.now() unambiguous. Also, the PEP does not take a position on what approach is better - it just makes both equally feasible. [1] https://lehors.wordpress.com/2008/08/07/what-consensus-means/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Mon Sep 21 14:44:23 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 21 Sep 2015 14:44:23 +0200 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: Message-ID: <55FFFBA7.80905@egenix.com> On 20.09.2015 02:30, Tim Peters wrote: > [Alex] >> Tim, are you referring to the "Troll" rule? [1] That's a strange beast >> indeed and a comment above it says: >> >> # The CET-switching Troll rules require zic from tzcode 2014b or later, so >> as >> # suggested by Bengt-Inge Larsson comment them out for now, and approximate >> # with only UTC and CEST. Uncomment them when 2014b is more prevalent. > > Yes, and this was pointed out some time ago. These really are the > rules they use: > > http://www.timeanddate.com/time/zone/antarctica/troll Interesting. It lists two "DST"s per year: they first go from GMT to CET, then to CEST, and then back to CET and GMT. I guess they switched to CET when the station was used and to GMT for the instruments during winter when it was not used. But this is not consistent with what the Norwegians report on their Troll station website: http://www.npolar.no/en/about-us/stations-vessels/troll/index.html """ The time zone Troll is located in, UTC +0, is 1 hour behind Norwegian time (2 hours during summer time in Norway). Photo: Norwegian Polar Institute """ The webcam confirms this: ftp://ftp.npolar.no/Out/TrollWebCam/TrollPublic.jpg -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 21 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-26: Python Meeting Duesseldorf Sprint 2015 5 days to go 2015-10-21: Python Meeting Duesseldorf ... 30 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From alexander.belopolsky at gmail.com Mon Sep 21 17:01:20 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Sep 2015 11:01:20 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: <55FFFBA7.80905@egenix.com> References: <55FFFBA7.80905@egenix.com> Message-ID: On Mon, Sep 21, 2015 at 8:44 AM, M.-A. Lemburg wrote: > Interesting. It lists two "DST"s per year: they first > go from GMT to CET, then to CEST, and then back to CET and GMT. > I guess they switched to CET when the station was used and > to GMT for the instruments during winter when it was not used. > > But this is not consistent with what the Norwegians report on their > Troll station website: > > http://www.npolar.no/en/about-us/stations-vessels/troll/index.html > > """ > The time zone Troll is located in, UTC +0, is 1 hour behind Norwegian > time (2 hours during summer time in Norway). Photo: Norwegian Polar > Institute > """ > > The webcam confirms this: > ftp://ftp.npolar.no/Out/TrollWebCam/TrollPublic.jpg > For those who care about this kind of timezone trivia, apparently [1] Troll station is inhabited by Norwegians only during the colder months (March?October) who use Norwegian time (TZ=Europe/Oslo). During the warmer ("busy") season, the station switches to UTC as an option that is equally annoying to all international inhabitants. The specific rules that appear in some versions of the IANA zone info files are pure fantasy. [1]: http://mm.icann.org/pipermail/tz/2014-March/020705.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Mon Sep 21 17:49:45 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 21 Sep 2015 17:49:45 +0200 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> Message-ID: <56002719.8090404@egenix.com> On 21.09.2015 17:01, Alexander Belopolsky wrote: > On Mon, Sep 21, 2015 at 8:44 AM, M.-A. Lemburg wrote: > >> Interesting. It lists two "DST"s per year: they first >> go from GMT to CET, then to CEST, and then back to CET and GMT. >> I guess they switched to CET when the station was used and >> to GMT for the instruments during winter when it was not used. >> >> But this is not consistent with what the Norwegians report on their >> Troll station website: >> >> http://www.npolar.no/en/about-us/stations-vessels/troll/index.html >> >> """ >> The time zone Troll is located in, UTC +0, is 1 hour behind Norwegian >> time (2 hours during summer time in Norway). Photo: Norwegian Polar >> Institute >> """ >> >> The webcam confirms this: >> ftp://ftp.npolar.no/Out/TrollWebCam/TrollPublic.jpg >> > > For those who care about this kind of timezone trivia, apparently [1] Troll > station is inhabited by Norwegians only during the colder months > (March?October) > who use Norwegian time (TZ=Europe/Oslo). During the warmer ("busy") > season, the station switches to UTC as an option that is equally annoying > to all international inhabitants. > > The specific rules that appear in some versions of the IANA zone info files > are pure fantasy. > > [1]: http://mm.icann.org/pipermail/tz/2014-March/020705.html Looks like assigning a "time zone" to the place is simply conceptually wrong and was just done to make some tz folks happy. 
Anyway, the main takeaway for me is that it is obviously possible to have more than two DST switches during the year, which is something I wasn't aware of before seeing this example. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 21 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-26: Python Meeting Duesseldorf Sprint 2015 5 days to go 2015-10-21: Python Meeting Duesseldorf ... 30 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tim.peters at gmail.com Mon Sep 21 18:04:21 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 21 Sep 2015 11:04:21 -0500 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: <56002719.8090404@egenix.com> References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> Message-ID: [Marc-Andre, on Antarctica/Troll] > Looks like assigning a "time zone" to the place is simply conceptually > wrong and was just done to make some tz folks happy. There are many "time zones" in Antarctica (theoretically, it's in all time zones). They're all senseless ;-) https://en.wikipedia.org/wiki/Time_in_Antarctica > Anyway, the main takeaway for me is that it is obviously possible > to have more than two DST switches during the year, which is > something I wasn't aware of before seeing this example. The Brits beat 'em to it, but a long time ago: https://en.wikipedia.org/wiki/British_Summer_Time In 1940, during the Second World War, the clocks in Britain were not put back by an hour at the end of Summer Time. In subsequent years, clocks continued to be advanced by one hour each spring and put back by an hour each autumn until July 1945. During these summers, therefore, Britain was two hours ahead of GMT and operating on British Double Summer Time (BDST). The clocks were brought back in line with GMT at the end of summer in 1945. In 1947, due to severe fuel shortages, clocks were advanced by one hour on two occasions during the spring, and put back by one hour on two occasions during the autumn, meaning that Britain was back on BDST during that summer. These may be the corresponding lines in IANA's "europe" file: Rule GB-Eire 1947 only - Mar 16 2:00s 1:00 BST Rule GB-Eire 1947 only - Apr 13 1:00s 2:00 BDST Rule GB-Eire 1947 only - Aug 10 1:00s 1:00 BST Rule GB-Eire 1947 only - Nov 2 2:00s 0 GMT From alexander.belopolsky at gmail.com Mon Sep 21 18:20:23 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Sep 2015 12:20:23 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: <56002719.8090404@egenix.com> References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> Message-ID: On Mon, Sep 21, 2015 at 11:49 AM, M.-A. Lemburg wrote: > > Looks like assigning a "time zone" to the place is simply conceptually > wrong and was just done to make some tz folks happy. 
Yes, the best definition of "time zone" in computing contexts is the one given by the tzdist group: "A description of the past and predicted future timekeeping practices of a collection of clocks that are intended to agree." Apparently, there is no concerted effort at the Troll station to have a station-specific set of timekeeping rules. They just use either UTC or Europe/Oslo depending on the needs of the current expedition. > > Anyway, the main takeaway for me is that it is obviously possible > to have more than two DST switches during the year, which is > something I wasn't aware of before seeing this example. The March 1st switch at Troll from UTC to CET is not really a DST transition. It is a transition that changes the standard time. (The value of isdst does not change in the transition.) Even more exotic things can happen if one would try to model a ship's clock using a tzinfo instance. By convention, ships use the time of the closest port or whatever the captain feels appropriate in international waters. Since ship logs are usually reliable and ship speed is low, such specialty application will probably work in most cases. Note that faster vehicles such as the ISS use UTC these days, but I think the Apollo program used the Houston time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Sep 21 19:02:56 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 22 Sep 2015 03:02:56 +1000 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> Message-ID: On Tue, Sep 22, 2015 at 2:20 AM, Alexander Belopolsky wrote: > Note that faster vehicles such as the ISS use UTC these days... Isn't the ISS fast enough that relativity starts getting in the way? I am *so* glad the Python datetime module doesn't have to concern itself with that... ChrisA From alexander.belopolsky at gmail.com Mon Sep 21 19:10:53 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Sep 2015 13:10:53 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> Message-ID: On Mon, Sep 21, 2015 at 1:02 PM, Chris Angelico wrote: > On Tue, Sep 22, 2015 at 2:20 AM, Alexander Belopolsky > wrote: > > Note that faster vehicles such as the ISS use UTC these days... > > Isn't the ISS fast enough that relativity starts getting in the way? > Not fast enough for an astronaut to miss a wedding anniversary. I am *so* glad the Python datetime module doesn't have to concern itself > with that... > Neither do the astronauts when they schedule a video conference with a family at home and that's the only time they need to worry about civil time zones on the Earth. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Mon Sep 21 19:23:29 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Sep 2015 13:23:29 -0400 Subject: [Datetime-SIG] PEP 495 (Local Time Disambiguation) is ready for pronouncement In-Reply-To: References: Message-ID: For those who prefer using Github's review tools, I have republished the PEP at . Comments and pull requests are welcome. 
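To make the fold attribute concrete before the quoted announcement below: with a PEP-495-aware tzinfo, the repeated local hour is told apart by fold alone. The sketch uses the zoneinfo module that only arrived later (Python 3.9, with a system tz database or the tzdata package available); any tzinfo that implements the PEP, such as the reference implementation discussed in this thread, would behave the same way.

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo  # Python 3.9+; illustrative only

    tz = ZoneInfo("America/New_York")
    # 01:30 on 2015-11-01 occurs twice in this zone; fold selects the repeat.
    earlier = datetime(2015, 11, 1, 1, 30, tzinfo=tz)   # fold=0 is the default
    later = earlier.replace(fold=1)

    print(earlier.utcoffset(), later.utcoffset())   # EDT (-4) for fold=0, EST (-5) for fold=1
    print(earlier == later)                         # True: same zone, fold is ignored
    print(earlier.astimezone(timezone.utc))         # 2015-11-01 05:30:00+00:00
    print(later.astimezone(timezone.utc))           # 2015-11-01 06:30:00+00:00

Conversion to UTC is where the two instances finally diverge.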
On Sun, Sep 20, 2015 at 11:49 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > On Sat, Aug 15, 2015 at 8:49 PM, Alexander Belopolsky < > alexander.belopolsky at gmail.com> wrote: > > > > PEP 495 [1] is a deliberately minimalistic proposal to remove an > > ambiguity in representing some local times as datetime.datetime > > objects. > > A major issue has come up since my announcement above. Tim Peters have > noticed that PEP 495 would violate the "hash invariant" unless the fold > attribute is accounted for in inter-zone comparisons. > See [2] for details. This issue has been resolved by modifying the > definition [3] of the "==" operator for aware datetimes with post-PEP > tzinfo. Note that no program will be affected by this change unless it > uses a post-PEP tzinfo implementation. > > I made some smaller changes [4] to the PEP as well and it should finally > be ready for pronouncement. > > [1]: https://www.python.org/dev/peps/pep-0495 > [2]: > https://mail.python.org/pipermail/datetime-sig/2015-September/000625.html > [3]: > https://www.python.org/dev/peps/pep-0495/#aware-datetime-equality-comparison > [4]: https://hg.python.org/peps/log/39b7c1da05a2/pep-0495.txt > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Sep 21 23:54:41 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Sep 2015 14:54:41 -0700 Subject: [Datetime-SIG] PEP 495 (Local Time Disambiguation) is ready for pronouncement In-Reply-To: References: Message-ID: On Sun, Sep 20, 2015 at 8:49 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > On Sat, Aug 15, 2015 at 8:49 PM, Alexander Belopolsky < > alexander.belopolsky at gmail.com> wrote: > > > > PEP 495 [1] is a deliberately minimalistic proposal to remove an > > ambiguity in representing some local times as datetime.datetime > > objects. > > A major issue has come up since my announcement above. Tim Peters have > noticed that PEP 495 would violate the "hash invariant" unless the fold > attribute is accounted for in inter-zone comparisons. > See [2] for details. This issue has been resolved by modifying the > definition [3] of the "==" operator for aware datetimes with post-PEP > tzinfo. Note that no program will be affected by this change unless it > uses a post-PEP tzinfo implementation. > > I made some smaller changes [4] to the PEP as well and it should finally > be ready for pronouncement. > > [1]: https://www.python.org/dev/peps/pep-0495 > [2]: > https://mail.python.org/pipermail/datetime-sig/2015-September/000625.html > [3]: > https://www.python.org/dev/peps/pep-0495/#aware-datetime-equality-comparison > [4]: https://hg.python.org/peps/log/39b7c1da05a2/pep-0495.txt > I've reviewed this latest version and I am hereby accepting it. The topic is both controversial and yawn-inducing, so I think it's better not to give the usual one-day warning on python-dev -- I'll just post my decision there. Alexander and Tim, thank for all your work on this! It's been a wild, wild ride. (And no, I am not going to make a joke about leap seconds here. :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Tue Sep 22 18:12:20 2015 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 22 Sep 2015 18:12:20 +0200 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> Message-ID: <56017DE4.9030806@egenix.com> On 21.09.2015 18:20, Alexander Belopolsky wrote: > On Mon, Sep 21, 2015 at 11:49 AM, M.-A. Lemburg wrote: >> >> Looks like assigning a "time zone" to the place is simply conceptually >> wrong and was just done to make some tz folks happy. > > > Yes, the best definition of "time zone" in computing contexts is the one > given by the tzdist group: "A description of the past and predicted future > timekeeping practices of a collection of clocks that are intended to > agree." Apparently, there is no concerted effort at the Troll station to > have a station-specific set of timekeeping rules. They just use either UTC > or Europe/Oslo depending on the needs of the current expedition. Ah, the joys of freedom of choice :-) >> Anyway, the main takeaway for me is that it is obviously possible >> to have more than two DST switches during the year, which is >> something I wasn't aware of before seeing this example. > > The March 1st switch at Troll from UTC to CET is not really a DST > transition. It is a transition that changes the standard time. (The value > of isdst does not change in the transition.) > > Even more exotic things can happen if one would try to model a ship's clock > using a tzinfo instance. By convention, ships use the time of the closest > port or whatever the captain feels appropriate in international waters. > Since ship logs are usually reliable and ship speed is low, such specialty > application will probably work in most cases. Note that faster vehicles > such as the ISS use UTC these days, but I think the Apollo program used the > Houston time. Time on ships seems to depend on what the captain and company think is the right way: http://travel.stackexchange.com/questions/43245/what-time-is-used-on-board-a-cruise-ship even though there is a standard called "Nautical time" for this: https://en.wikipedia.org/wiki/Nautical_time """ In practice, nautical times are used only for radio communication, etc. Aboard the ship, e.g. for scheduling work and meal times, the ship may use a suitable time of its own choosing. The captain is permitted to change his or her clocks at a chosen time following the ship's entry into another time zone, typically at midnight. Ships on long-distance passages change time zone on board in this fashion. On short passages the captain may not adjust clocks at all, even if they pass through different time zones, for example between the UK and continental Europe. Passenger ships often use both nautical and on-board time zones on signs. When referring to time tables and when communicating with land, the land time zone must be employed. """ On planes, the situation seems to be similar. I've not been on a flight yet where the captain announces new time zones midway :-) I guess even though the approach to use location names for time zones creates a more or less sane system on the ground, it doesn't really address the changes in authority when things start moving. Perhaps we should just standardize on UTC world-wide and then instead have the work day begin at different times depending on location. Crazy idea, but then it'd safe us all a lot of work :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 22 2015) >>> Python Projects, Coaching and Consulting ... 
http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-26: Python Meeting Duesseldorf Sprint 2015 4 days to go 2015-10-21: Python Meeting Duesseldorf ... 29 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From alexander.belopolsky at gmail.com Tue Sep 22 18:32:11 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 12:32:11 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: <56017DE4.9030806@egenix.com> References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Tue, Sep 22, 2015 at 12:12 PM, M.-A. Lemburg wrote: > Perhaps we should just standardize on UTC world-wide and then > instead have the work day begin at different times depending > on location. Crazy idea, but then it'd safe us all a lot of > work :-) > Publishers of daily planners should lobby for this. They will be able to sell planners customized for each location. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Sep 22 18:44:38 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 23 Sep 2015 02:44:38 +1000 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: <56017DE4.9030806@egenix.com> References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Wed, Sep 23, 2015 at 2:12 AM, M.-A. Lemburg wrote: > On planes, the situation seems to be similar. I've not been on a flight > yet where the captain announces new time zones midway :-) Me neither. Usually what I see is "Time at origin" and "Time at destination", and occasionally a few other time points, but nobody really cares about "time right underneath us". > Perhaps we should just standardize on UTC world-wide and then > instead have the work day begin at different times depending > on location. Crazy idea, but then it'd safe us all a lot of > work :-) > Yes! Yes, a hundred times yes! Whenever possible, I try to synchronize on UTC with everyone. Our Dungeons & Dragons campaigns are all scheduled that way - eg I run one at 2AM UTC every Sunday. It's fair on everyone, that way; nobody has to cope with more than one timezone's DST changes, and only ever their own. ChrisA From alexander.belopolsky at gmail.com Tue Sep 22 19:13:30 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 13:13:30 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Tue, Sep 22, 2015 at 12:44 PM, Chris Angelico wrote: > > Yes! Yes, a hundred times yes! Whenever possible, I try to synchronize > on UTC with everyone. Our Dungeons & Dragons campaigns are all > scheduled that way - eg I run one at 2AM UTC every Sunday. It's fair > on everyone, that way; nobody has to cope with more than one > timezone's DST changes, and only ever their own. I call UTC "make it equally annoying to everyone choice." 
It is tolerable within (Western) Europe, but when your team is more geographically diverse, 2AM UTC may still be Saturday for some. Our job as programmers is to teach computers how to understand humans, not the other way around. A good scheduling application should allow you to set up the schedule in any way you want including your local time, your standard time or any other time zone that is "special" for your particular case (Dragonlance Mean Time, perhaps:-). Once scheduled, the times should be displayed to your team members in their local time. If you make a schedule relative to a time zone with DST transitions, some of your team members may be surprised by the apparent changes of the schedule. That's a human problem - whatever compromise you come up with - your computer should be helpful in implementing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Sep 22 19:40:06 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 23 Sep 2015 03:40:06 +1000 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Wed, Sep 23, 2015 at 3:13 AM, Alexander Belopolsky wrote: > I call UTC "make it equally annoying to everyone choice." It is tolerable > within (Western) Europe, but when your team is more geographically diverse, > 2AM UTC may still be Saturday for some. It may indeed - that's actually a feature (most of our American players like it being a Saturday evening). > Our job as programmers is to teach computers how to understand humans, not > the other way around. A good scheduling application should allow you to set > up the schedule in any way you want including your local time, your standard > time or any other time zone that is "special" for your particular case > (Dragonlance Mean Time, perhaps:-). Once scheduled, the times should be > displayed to your team members in their local time. If you make a schedule > relative to a time zone with DST transitions, some of your team members may > be surprised by the apparent changes of the schedule. That's a human > problem - whatever compromise you come up with - your computer should be > helpful in implementing. The trouble is the exact same thing that we were discussing with the beginning of this mailing list. Allow me to spin you a few scenarios. "Hi folks! We're starting a D&D campaign, and we'll be meeting up every week." 1) "It'll be at 9PM every Saturday in your time zone (Chicago)." -- different for everyone 2) "It'll be at noon every Sunday for the Dungeon Master (Melbourne)." 3) "It'll be at 2AM every Sunday in UTC." I can easily write a program that does the conversions - in fact, I have one built into the MUD client that we use for actually playing the game. The trouble is, people will expect these recurring events to repeat on a cycle based on the displayed time - what's been referred to as "classic" or "naive" arithmetic. The result is: 1) Game time is every Saturday when my clock shows 9PM. 2) Game time is ... uhh ... I dunno, I'll just show up some time and hope. 3) Game time is 168 hours after the previous game started. #1 fundamentally can't work, because we have to sync up around the globe. Either that, or the program has to recalculate "it'll be 9PM this week, but 8PM next week" every time, and it would have to do that on the basis of #2 or #3. 
Even so, it's confusing to have to go and check it every time; the clock time for the game might change unpredictably, depending on the fundamental timezone. #2 inflicts double DST confusion on everyone that isn't in the same time zone as the Dungeon Master. This is how Threshold RPG works - the official timezone is EST (though I prefer to describe it by its tzdata name, America/New_York), so anyone in the US east coast states has it easy, and other people in contiguous USA are doing reasonably alright; folks in Australia [1] have to cope with two hour DST changes each year, and folks in Europe have to worry about temporary desynchronizations each year as DST stabilizes. #3 works for everyone. Again, papering over the difference slightly can help (which is why the MUD client has a time converter in it), but it's much easier to explain: when you go onto Daylight Saving Time, your clock moves forward, which means the next session will happen at 9PM on your clock instead of 8PM. There's exactly one clock shift for every DST transition, and all you have to do is explain to people that DST doesn't change anything except your clock. That's why we schedule things in UTC. Showing the time in your local timezone as an abstraction over a UTC fundamental is nice and safe. We *know* we can always do that unambiguously, and it's easy to explain what's going on. It's still a leaky abstraction, though, and I prefer to explicitly tell people that it's scheduled in UTC, but that they can see what UTC time translates to what local time. That's why it's safest to be clear about UTC usage, and honestly, this has nothing to do with what a computer can and can't be taught to do - it's all about what humans can get their heads around. ChrisA [1] Except Queensland, where they're smart. From alexander.belopolsky at gmail.com Tue Sep 22 20:02:39 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 14:02:39 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Tue, Sep 22, 2015 at 1:40 PM, Chris Angelico wrote: > > That's why it's safest to be clear about UTC > usage, and honestly, this has nothing to do with what a computer can > and can't be taught to do - it's all about what humans can get their > heads around. UTC is no better than DMT (Dragonlance Mean Time). In fact, I think I will have easier time explaining DMT to a ten year old than explaining UTC. If your team can agree on a natural language, they can agree on a timescale. It does not matter what it is. If uniformity was a universal virtue, we would all be speaking Esperanto by now. -------------- next part -------------- An HTML attachment was scrubbed... URL: From 4kir4.1i at gmail.com Tue Sep 22 20:35:18 2015 From: 4kir4.1i at gmail.com (Akira Li) Date: Tue, 22 Sep 2015 21:35:18 +0300 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: (Alexander Belopolsky's message of "Tue, 22 Sep 2015 14:02:39 -0400") References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: <87d1xa2tg9.fsf@gmail.com> Alexander Belopolsky writes: > On Tue, Sep 22, 2015 at 1:40 PM, Chris Angelico wrote: >> >> That's why it's safest to be clear about UTC >> usage, and honestly, this has nothing to do with what a computer can >> and can't be taught to do - it's all about what humans can get their >> heads around. 
> > UTC is no better than DMT (Dragonlance Mean Time). In fact, I think I will > have easier time explaining DMT to a ten year old than explaining UTC. If > your team can agree on a natural language, they can agree on a timescale. > It does not matter what it is. If uniformity was a universal virtue, we > would all be speaking Esperanto by now. We are already speaking Esperanto. It is just called English. If we want the same time moment in real life then scheduling in UTC is the default. Particular time you could display using any label you like as long as the corresponding UTC time is easily available. It is trivial to find out UTC time on the computer. From alexander.belopolsky at gmail.com Tue Sep 22 20:40:53 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 14:40:53 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: <87d1xa2tg9.fsf@gmail.com> References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> <87d1xa2tg9.fsf@gmail.com> Message-ID: On Tue, Sep 22, 2015 at 2:35 PM, Akira Li <4kir4.1i at gmail.com> wrote: [Alexander Belopolsky] > If uniformity was a universal virtue, we would all be speaking Esperanto > by now. > > We are already speaking Esperanto. It is just called English. I should have said "exclusively." -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 22 20:43:30 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 14:43:30 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: <87d1xa2tg9.fsf@gmail.com> References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> <87d1xa2tg9.fsf@gmail.com> Message-ID: On Tue, Sep 22, 2015 at 2:35 PM, Akira Li <4kir4.1i at gmail.com> wrote: > It is trivial to find out UTC time on the computer. Unless it is shut down in an anticipation of a leap second. :-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Sep 23 01:05:07 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 23 Sep 2015 09:05:07 +1000 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Wed, Sep 23, 2015 at 4:02 AM, Alexander Belopolsky wrote: > On Tue, Sep 22, 2015 at 1:40 PM, Chris Angelico wrote: >> >> That's why it's safest to be clear about UTC >> usage, and honestly, this has nothing to do with what a computer can >> and can't be taught to do - it's all about what humans can get their >> heads around. > > UTC is no better than DMT (Dragonlance Mean Time). In fact, I think I will > have easier time explaining DMT to a ten year old than explaining UTC. If > your team can agree on a natural language, they can agree on a timescale. > It does not matter what it is. If uniformity was a universal virtue, we > would all be speaking Esperanto by now. If I were creating my own standard out of thin air, then yes, it wouldn't make a lot of difference, and I could pick anywhere. (There are a few invariants that I'd maintain, such as that it should "tick" the same way our civil clocks do - one second equals one civil second, and they're packaged up into hours and days the same way - but it doesn't matter what the exact offset is.) But UTC already exists, and that gives it an inherent advantage. 
I've never tried to explain DMT to anyone, but explaining a simplified form of GMT/UTC (ignore leap seconds, ignore relativity, ignore UT0/UT1 etc) is pretty easy - it's just a well-known time zone that has no DST. ChrisA From alexander.belopolsky at gmail.com Wed Sep 23 02:12:10 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 20:12:10 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Tue, Sep 22, 2015 at 7:05 PM, Chris Angelico wrote: > But UTC already exists, and > that gives it an inherent advantage. I've never tried to explain DMT > to anyone, but explaining a simplified form of GMT/UTC (ignore leap > seconds, ignore relativity, ignore UT0/UT1 etc) is pretty easy - it's > just a well-known time zone that has no DST. > It is not as well-known as you might think. I, for one, don't even know how to translate it in my native Russian. I bet people in Russia who know what Moscow time is outnumber those who know what UTC is at least 100 to 1. I bet you will get a similar ratio in California between UTC and say Eastern Standard Time. No TV station in Russia or in the US will ever announce its schedule in UTC. They will use Moscow time in Russia and EST in the US. Occasionally, national TV networks in the US will announce the show time in two or three major time zones, but never in UTC. In Russia, time zones are identified as Moscow+HH much more often than UTC+HH. The only place you will see clocks showing UTC time in Russia is the space command center. GMT is popular in Western Europe because of geographical proximity and the ubiquitous BBC broadcasts, but it is not as well-known elsewhere. Let's have a show of hands here: how many people know what "C" stands for in UTC and what "M" stands in GMT and what is the significance of these letters? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Sep 23 02:57:05 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 23 Sep 2015 10:57:05 +1000 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Wed, Sep 23, 2015 at 10:12 AM, Alexander Belopolsky wrote: > It is not as well-known as you might think. I, for one, don't even know how > to translate it in my native Russian. I bet people in Russia who know what > Moscow time is outnumber those who know what UTC is at least 100 to 1. I > bet you will get a similar ratio in California between UTC and say Eastern > Standard Time. Of course. Local time is always better known than UTC. But any given local time is only going to be known in its own locality. I would bet that the people in Russia who know Eastern Standard Time, or the people in California who know Moscow time, would be quite low. > Let's have a show of hands here: how many people know what "C" stands for in > UTC and what "M" stands in GMT and what is the significance of these > letters? I know, on both counts, because I'm a wonk. But those specifics are part of what I would elide, along with leap seconds and relativity, when explaining a scheduling system. (Let's face it - nobody's going to schedule a meeting to such accuracy that any of it will matter.) Time is a lot messier than most people need to care about. 
ChrisA From alexander.belopolsky at gmail.com Wed Sep 23 04:16:07 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 22:16:07 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Tue, Sep 22, 2015 at 8:57 PM, Chris Angelico wrote: > [ Alexander Belopolsky] I bet people in Russia who know what > Moscow time is outnumber those who know what UTC is at least 100 to 1. I > > bet you will get a similar ratio in California between UTC and say > Eastern > > Standard Time. > > Of course. Local time is always better known than UTC. Moscow Time is hardly local for Russian Anadyr or Petropavlovsk-Kamchatsky, but people still use Moscow Time for train schedules there. In fact, those places are closer to California than they are to Moscow. > But any given local time is only going to be known in its own locality. Depends on a locality. Local time at the village of Greenwich is fairly well-known. :-) > I would bet > that the people in Russia who know Eastern Standard Time, or the > people in California who know Moscow time, would be quite low. > I suspect that anyone who knows about UTC would know about both Moscow and New York. > > Let's have a show of hands here: how many people know what "C" stands > for in > > UTC and what "M" stands in GMT and what is the significance of these > > letters? > > I know, on both counts, because I'm a wonk. Well, in this case you know more than I do. I know that "M" stands for "mean" (I've heard that on BBC:-) and that it has something to do with the solar time, but I cannot tell you "mean" of what it is or whether BBC's fifth beep comes on a UTC or GMT second. > But those specifics are > part of what I would elide, along with leap seconds and relativity, > when explaining a scheduling system. Right, but most people (myself included) only learn about UTC when they learn about those complications. I would say in New York, Eastern Time is for most people, EST is for nerds and UTC is for wonks. (Let's face it - nobody's going > to schedule a meeting to such accuracy that any of it will matter.) > Time is a lot messier than most people need to care about. Right. So let them use the time that their wall clocks are showing. When a New Yorker calls Cupertino, they have three options: Eastern, Pacific and UTC. The first two are a slight inconvenience for one of them and the third is a major annoyance for both. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Sep 23 04:27:52 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 23 Sep 2015 12:27:52 +1000 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Wed, Sep 23, 2015 at 12:16 PM, Alexander Belopolsky wrote: > > On Tue, Sep 22, 2015 at 8:57 PM, Chris Angelico wrote: >> >> [ Alexander Belopolsky] I bet people in Russia who know what >> >> > Moscow time is outnumber those who know what UTC is at least 100 to 1. >> > I >> > bet you will get a similar ratio in California between UTC and say >> > Eastern >> > Standard Time. >> >> Of course. Local time is always better known than UTC. > > > Moscow Time is hardly local for Russian Anadyr or Petropavlovsk-Kamchatsky, > but people still use Moscow Time for train schedules there. 
In fact, those > places are closer to California than they are to Moscow. "Close" doesn't necessarily have anything to do with geographic location. I'm fairly sure Troll Research Station isn't physically close to Norway, but when it's being operated solely by Norwegians, it's politically very close. I've no idea how the trains operate, but it's a lot more likely that they're politically near Moscow than California. >> I would bet >> that the people in Russia who know Eastern Standard Time, or the >> people in California who know Moscow time, would be quite low. > > I suspect that anyone who knows about UTC would know about both Moscow and > New York. Know about, yes, but they won't necessarily know the DST rules etc. >> > Let's have a show of hands here: how many people know what "C" stands >> > for in >> > UTC and what "M" stands in GMT and what is the significance of these >> > letters? >> >> I know, on both counts, because I'm a wonk. > > Well, in this case you know more than I do. I know that "M" stands for > "mean" (I've heard that on BBC:-) and that it has something to do with the > solar time, but I cannot tell you "mean" of what it is or whether BBC's > fifth beep comes on a UTC or GMT second. Yes, it's because GMT is based on the average solar noon. If you have an actual sundial, you can observe actual solar noon, but to convert that to civil time, you need a table of translations that takes seasonal variation into account. In theory, Greenwich Time would show noon when the sun is directly overhead, but that would mean that successive days vary in length; Greenwich Mean Time averages it all out so you get a consistent 86400-second day. UTC is defined by the coordination of a bunch of clocks around the world. There are a few different forms, most of which never go more than one second away from each other. GMT is usually defined as being equal to one or other of them, but which one is not entirely standardized, so if you need subsecond accuracy, don't use GMT at all. For scheduling events, though, GMT == UTC == TIA == Unix time. >> But those specifics are >> part of what I would elide, along with leap seconds and relativity, >> when explaining a scheduling system. > > > Right, but most people (myself included) only learn about UTC when they > learn about those complications. I would say in New York, Eastern Time is > for most people, EST is for nerds and UTC is for wonks. > >> (Let's face it - nobody's going >> to schedule a meeting to such accuracy that any of it will matter.) >> Time is a lot messier than most people need to care about. > > > Right. So let them use the time that their wall clocks are showing. When a > New Yorker calls Cupertino, they have three options: Eastern, Pacific and > UTC. The first two are a slight inconvenience for one of them and the third > is a major annoyance for both. Sure. If you're scheduling a one-off event, that's no problem. But when you schedule a recurring event, suddenly the first two become major annoyances and the third becomes much more minor. (With the possible exception that different states of the US can probably cheat, since there's federal US legislation about DST. But if your examples were New York and Sydney, then my point stands.) 
ChrisA From alexander.belopolsky at gmail.com Wed Sep 23 04:45:19 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 22:45:19 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Tue, Sep 22, 2015 at 10:27 PM, Chris Angelico wrote: > [ Alexander Belopolsky] but I cannot tell you "mean" of what it is or > whether BBC's > > fifth beep comes on a UTC or GMT second. > > Yes, it's because GMT is based on the average solar noon. If you have > an actual sundial, you can observe actual solar noon, but to convert > that to civil time, you need a table of translations that takes > seasonal variation into account. In theory, Greenwich Time would show > noon when the sun is directly overhead, but that would mean that > successive days vary in length; Greenwich Mean Time averages it all > out so you get a consistent 86400-second day. > > UTC is defined by the coordination of a bunch of clocks around the > world. There are a few different forms, most of which never go more > than one second away from each other. GMT is usually defined as being > equal to one or other of them, but which one is not entirely > standardized, so if you need subsecond accuracy, don't use GMT at all. > For scheduling events, though, GMT == UTC == TIA == Unix time. > Thanks for the lecture, but I still don't know what BBC broadcasts. :-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Wed Sep 23 04:53:23 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 22:53:23 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Tue, Sep 22, 2015 at 10:27 PM, Chris Angelico wrote: > (With the > possible exception that different states of the US can probably cheat, > since there's federal US legislation about DST. But if your examples > were New York and Sydney, then my point stands.) > What would be your guess for the ratio between the number of calls between New York and say San Francisco to that between New York and Sydney? For the latter, I'll concede: UTC makes sense because it is somewhere in the middle. :-) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Sep 23 04:58:28 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 23 Sep 2015 12:58:28 +1000 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Wed, Sep 23, 2015 at 12:53 PM, Alexander Belopolsky wrote: > On Tue, Sep 22, 2015 at 10:27 PM, Chris Angelico wrote: >> >> (With the >> possible exception that different states of the US can probably cheat, >> since there's federal US legislation about DST. But if your examples >> were New York and Sydney, then my point stands.) > > > What would be your guess for the ratio between the number of calls between > New York and say San Francisco to that between New York and Sydney? For > the latter, I'll concede: UTC makes sense because it is somewhere in the > middle. :-) Heh. It's not really a matter of being in the middle, though - I would advocate UTC for any recurring event that involves different DST rules. 
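Sketching that advice in code may help here: keep the recurrence anchored in UTC and convert only for display. The zone choices and the zoneinfo module (Python 3.9+) are mine, so treat this as an illustration of the idea rather than anyone's actual scheduler.

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo  # Python 3.9+; illustrative only

    # Weekly game anchored at 02:00 UTC on Sundays; the UTC anchor never
    # moves, only the displayed local wall times drift across DST changes.
    anchor = datetime(2015, 9, 27, 2, 0, tzinfo=timezone.utc)
    zones = [ZoneInfo("Australia/Melbourne"), ZoneInfo("America/Chicago")]

    for week in range(8):
        occurrence = anchor + timedelta(weeks=week)
        shown = ", ".join(f"{occurrence.astimezone(z):%a %H:%M %Z}" for z in zones)
        print(f"{occurrence:%Y-%m-%d %H:%M} UTC -> {shown}")

The recurring rule lives in UTC; DST only changes what gets printed.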
UTC isn't mid-way between, say, Sydney and Warsaw, but if you want to phone someone in the opposite hemisphere every week, it'd be best to schedule it in UTC so you don't have to worry about four different offsets (you could both be on DST, or either of you could, or neither). Of course, if DST were abolished world-wide, then everything would be easy, and we could happily schedule things in each other's timezones without any confusion. I could key in "11PM" and the program would interpret that as being UTC+10, and then my friend in Florida could see it as "9AM", and nobody would be confused at all. Alas, I fear 'tis a vain hope... ChrisA From alexander.belopolsky at gmail.com Wed Sep 23 05:15:57 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Sep 2015 23:15:57 -0400 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: On Tue, Sep 22, 2015 at 10:58 PM, Chris Angelico wrote: > Of course, if DST were abolished world-wide, then everything would be > easy, and we could happily schedule things in each other's timezones > without any confusion. I could key in "11PM" and the program would > interpret that as being UTC+10, and then my friend in Florida could > see it as "9AM", and nobody would be confused at all. Alas, I fear > 'tis a vain hope... > I think all these DST-related scheduling problems are highly exaggerated. My kids go to a school in New York with a European curriculum. Apparently, schoolchildren in Europe study six days a week, so the program is organized on a 6-days cycle. This means that the first Monday is day 1, the second is day 6, the third is ... I am lost already. Guess what: the kids don't complain. Figuring out timezone difference between New York and Sydney is easy. Try to match up the school holiday schedules between New York and New Jersey! -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Wed Sep 23 09:43:52 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 23 Sep 2015 09:43:52 +0200 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> Message-ID: <56025838.1010702@egenix.com> I think I only got part of my tongue-in-cheek suggestion across :-) The idea was to drop local time altogether and instead use UTC everywhere. Wall clocks would all show UTC. Instead of switching time zones, you'd adapt your schedule as needed and this could be as flexible as you want. People would just have to get used to having dinner at e.g. 03:00 UTC instead of 8pm [add some timezone here] and you would be able to enjoy sunset at 10:00 UTC in some places. Ain't going to happen, but it would allow people to gain back some more freedom in scheduling their lives. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 23 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-26: Python Meeting Duesseldorf Sprint 2015 3 days to go 2015-10-21: Python Meeting Duesseldorf ... 
28 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From rosuav at gmail.com Wed Sep 23 13:33:50 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 23 Sep 2015 21:33:50 +1000 Subject: [Datetime-SIG] Computing .dst() as a timedelta In-Reply-To: <56025838.1010702@egenix.com> References: <55FFFBA7.80905@egenix.com> <56002719.8090404@egenix.com> <56017DE4.9030806@egenix.com> <56025838.1010702@egenix.com> Message-ID: On Wed, Sep 23, 2015 at 5:43 PM, M.-A. Lemburg wrote: > I think I only got part of my tongue-in-cheek suggestion across :-) > > The idea was to drop local time altogether and instead use UTC > everywhere. Wall clocks would all show UTC. Instead of switching > time zones, you'd adapt your schedule as needed and this could > be as flexible as you want. > > People would just have to get used to having dinner at e.g. > 03:00 UTC instead of 8pm [add some timezone here] and you > would be able to enjoy sunset at 10:00 UTC in some places. > > Ain't going to happen, but it would allow people to gain back > some more freedom in scheduling their lives. *looks at left wrist* Current civil time is 9:33PM. *looks at right wrist* Current UTC time is 11:33. I wear two watches for that exact reason. Bring it on! ChrisA From alexander.belopolsky at gmail.com Wed Sep 23 23:00:32 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 23 Sep 2015 17:00:32 -0400 Subject: [Datetime-SIG] IANA TZ database statistics Message-ID: I added a method to datetimetester [1] to compute some overall statistics on tzfiles. My code ignores "version 2" data, so I include only transitions that fall within 32-bit time_t (1900 to 2038 range). Here are the results for the default Mac OSX system files and the most recent Github version of tz [2]: >>> from datetimetester import * >>> ZoneInfo.stats() Number of zones: 584 Number of transitions: 38510 = 19058 (gaps) + 19008 (folds) + 444 (zeros) Min gap: 0:00:16 at 1935-01-01 03:40:52 in America/Paramaribo Max gap: 1 day, 0:00:00 at 2011-12-30 10:00:00 in Pacific/Apia Min fold: 0:01:31 at 1932-01-01 03:58:29 in America/Barbados Max fold: 10:00:00 at 1952-01-13 14:00:00 in Antarctica/DumontDUrville >>> ZoneInfo.zoneroot = '/usr/local/etc/zoneinfo' >>> ZoneInfo.stats() Number of zones: 585 Number of transitions: 39018 = 19434 (gaps) + 19131 (folds) + 453 (zeros) Min gap: 0:00:04 at 1914-01-01 04:00:04 in America/Manaus Max gap: 1 day, 0:00:00 at 2011-12-30 11:00:00 in Pacific/Fakaofo Min fold: 0:00:10 at 1906-06-30 16:53:20 in Asia/Ho_Chi_Minh Max fold: 23:00:00 at 1969-09-30 13:00:00 in Kwajalein [1]: https://github.com/abalkin/cpython/commit/fa4f8055ac6723d4d0940ea141e05f931c718a2c [2]: https://github.com/eggert/tz -------------- next part -------------- An HTML attachment was scrubbed... 
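For anyone who wants to reproduce rough counts like these without the test-suite helper, the offset changes can be found by brute force with the stdlib zoneinfo module that appeared later in Python 3.9 (no relation to the ZoneInfo test class used above). A coarse, hour-resolution sketch:

    from datetime import datetime, timedelta, timezone
    from zoneinfo import ZoneInfo  # Python 3.9+ stdlib module, not datetimetester's helper

    def transitions(zone_name, start_year, end_year):
        """Yield (utc_instant, change) for each UTC-offset change found by an
        hour-by-hour scan; change > 0 is a gap, change < 0 is a fold."""
        tz = ZoneInfo(zone_name)
        t = datetime(start_year, 1, 1, tzinfo=timezone.utc)
        end = datetime(end_year, 1, 1, tzinfo=timezone.utc)
        prev = t.astimezone(tz).utcoffset()
        while t < end:
            t += timedelta(hours=1)
            off = t.astimezone(tz).utcoffset()
            if off != prev:
                yield t, off - prev
                prev = off

    for when, change in transitions("Pacific/Apia", 2011, 2013):
        kind = "gap" if change > timedelta(0) else "fold"
        print(when, kind, abs(change))   # the 24-hour gap of 2011-12-30 shows up here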
URL: From rosuav at gmail.com Thu Sep 24 00:16:44 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 24 Sep 2015 08:16:44 +1000 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Thu, Sep 24, 2015 at 7:00 AM, Alexander Belopolsky wrote: > Here are the results for the default Mac OSX system files and the most > recent Github version of tz [2]: > >>>> from datetimetester import * >>>> ZoneInfo.stats() > Number of zones: 584 > Number of transitions: 38510 = 19058 (gaps) + 19008 (folds) + 444 (zeros) > Min gap: 0:00:16 at 1935-01-01 03:40:52 in > America/Paramaribo > Max gap: 1 day, 0:00:00 at 2011-12-30 10:00:00 in Pacific/Apia > Min fold: 0:01:31 at 1932-01-01 03:58:29 in America/Barbados > Max fold: 10:00:00 at 1952-01-13 14:00:00 in > Antarctica/DumontDUrville >>>> ZoneInfo.zoneroot = '/usr/local/etc/zoneinfo' >>>> ZoneInfo.stats() > Number of zones: 585 > Number of transitions: 39018 = 19434 (gaps) + 19131 (folds) + 453 (zeros) > Min gap: 0:00:04 at 1914-01-01 04:00:04 in America/Manaus > Max gap: 1 day, 0:00:00 at 2011-12-30 11:00:00 in Pacific/Fakaofo > Min fold: 0:00:10 at 1906-06-30 16:53:20 in Asia/Ho_Chi_Minh > Max fold: 23:00:00 at 1969-09-30 13:00:00 in Kwajalein Neat! (Is that meant to be "from test.datetimetester import *", or was I loading this up the wrong way? Anyway, not significant.) A lot of the small numbers are going to be when different places adopted standard time, and such. To get a better handle on what's happening _now_, I added an option [1] to your stats function for a starting year: >>> ZoneInfo.stats(start_year=1970) Number of zones: 1790 = 46266 (gaps) + 46130 (folds) + 843 (zeros) Min gap: 0:15:00 at 1985-12-31 18:30:13 in right/Asia/Kathmandu Max gap: 1 day, 0:00:00 at 2011-12-30 10:00:24 in right/Pacific/Apia Min fold: 0:30:00 at 2037-04-04 15:00:26 in right/Australia/Lord_Howe Max fold: 3:00:00 at 2012-02-21 17:00:24 in right/Antarctica/Casey >>> ZoneInfo.stats() Number of zones: 1790 = 58914 (gaps) + 58777 (folds) + 1363 (zeros) Min gap: 0:00:16 at 1935-01-01 03:40:52 in posix/America/Paramaribo Max gap: 1 day, 0:00:00 at 2011-12-30 10:00:24 in right/Pacific/Apia Min fold: 0:01:31 at 1932-01-01 03:58:29 in posix/America/Barbados Max fold: 10:00:00 at 1952-01-13 14:00:00 in posix/Antarctica/DumontDUrville I'm not sure whether this actually helps anything or not, but hey, cool stats :) ChrisA [1] https://github.com/Rosuav/cpython/commit/ed51575f7ffe7ba98bfad58a43602cb8f74cfe2a From alexander.belopolsky at gmail.com Thu Sep 24 00:44:26 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 23 Sep 2015 18:44:26 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Wed, Sep 23, 2015 at 6:16 PM, Chris Angelico wrote: > A lot of the small numbers are going to be when different places > adopted standard time, and such. 
To get a better handle on what's > happening _now_, I added an option [1] to your stats function for a > starting year: > > >>> ZoneInfo.stats(start_year=1970) > Number of zones: 1790 = 46266 (gaps) + 46130 (folds) + 843 (zeros) > Min gap: 0:15:00 at 1985-12-31 18:30:13 in > right/Asia/Kathmandu > Max gap: 1 day, 0:00:00 at 2011-12-30 10:00:24 in > right/Pacific/Apia > Min fold: 0:30:00 at 2037-04-04 15:00:26 > in right/Australia/Lord_Howe > Max fold: 3:00:00 at 2012-02-21 17:00:24 > in right/Antarctica/Casey > >>> ZoneInfo.stats() > Number of zones: 1790 = 58914 (gaps) + 58777 (folds) + 1363 (zeros) > Min gap: 0:00:16 at 1935-01-01 03:40:52 in > posix/America/Paramaribo > Max gap: 1 day, 0:00:00 at 2011-12-30 10:00:24 in > right/Pacific/Apia > Min fold: 0:01:31 at 1932-01-01 03:58:29 in > posix/America/Barbados > Max fold: 10:00:00 at 1952-01-13 14:00:00 in > posix/Antarctica/DumontDUrville > It looks like you've got double counts because you included both "posix" and "right" tzfiles in the search. (I don't think the data that I actually read is different between the two sets.) > I'm not sure whether this actually helps anything or not, but hey, cool > stats :) > If we can make some simplified assumptions about transition locations and sizes, we can avoid a binary search over seconds to locate the transitions via POSIX localtime/mktime APIs. I am considering making the cut-off at 1970 and assume 1970 standard time for all times before that year. I think this is best we can do on Windows where (IIRC) mktime does not work for times before epoch. (What about localtime?) In any case, TZ data before 1970 is highly suspect so we will probably do our users a favor by assuming standard time and letting those with historical timeseries figure out the transitions by themselves. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Sep 24 00:49:15 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 24 Sep 2015 08:49:15 +1000 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Thu, Sep 24, 2015 at 8:44 AM, Alexander Belopolsky wrote: > In any case, TZ data before 1970 is highly suspect so we will probably do > our users a favor by assuming standard time and letting those with > historical timeseries figure out the transitions by themselves. Yeah. Originally I made a boolean to suppress pre-1970 data, before settling on the arbitrary starting year option. I expect that 1970 will be the most common year to use as the base. FWIW: https://github.com/abalkin/cpython/pull/1 ChrisA From tim.peters at gmail.com Thu Sep 24 02:47:28 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 23 Sep 2015 19:47:28 -0500 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: [Alex] > I added a method to datetimetester [1] to compute some overall statistics on > tzfiles. My code ignores "version 2" data, so I include only transitions > that fall within 32-bit time_t (1900 to 2038 range). >From staring at zic.c, looks like (so far) the data in the version 2 section is identical to that in the version 1 section, except written out in wider data formats. The pretty clear intent is that they never intend to generate explicit transitions beyond 2037 in any version, until it's after 2037 in the real world and they need to do so because a POSIX TZ rule can't handle some new goofy exception (and version 2 also contains a POSIX TZ rule at the end, when possible). 
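That trailing POSIX TZ rule is easy to peek at, because a version-2+ TZif file simply ends with the rule on a line of its own. A sketch (the zoneinfo path is an assumption for a typical Linux or macOS box, and the zone must actually ship a rule):

    from pathlib import Path

    # Version-2+ TZif files end with b"\n<POSIX TZ string>\n" describing what
    # happens after the last explicit transition in the data block.
    raw = Path("/usr/share/zoneinfo/America/New_York").read_bytes()
    posix_rule = raw.rsplit(b"\n", 2)[-2].decode()
    print(posix_rule)   # e.g. EST5EDT,M3.2.0,M11.1.0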
Then they'll need to add new transitions in the version 2 section only (version 1 data formats are too narrow to record them). > I am considering making the cut-off at 1970 and assume 1970 standard > time for all times before that year. Is there a real need for a "high performance" tzinfo? That is, who cares? ;-) It would sure be _surprising_ if a Python wrapping of zoneinfo returned different results than native Linux tools wrapping the same thing. > I think this is best we can do on Windows Of course not, if by "best" we mean "gets the same answers everyone else gets". In that case, "best" is returning what the IANA database says should be returned in all cases. > where (IIRC) mktime does not work for times before epoch. (What about > localtime?) Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32 >>> import time >>> time.localtime(0) time.struct_time(tm_year=1969, tm_mon=12, tm_mday=31, tm_hour=18, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=365, tm_isdst=0) >>> time.localtime(-1) Traceback (most recent call last): File "", line 1, in time.localtime(-1) OSError: [Errno 22] Invalid argument Which is another meaning for "best": avoid flaky C library functions altogether. >>> epoch = datetime(1970, 1, 1) >>> epoch + timedelta(seconds=1e11) datetime.datetime(5138, 11, 16, 9, 46, 40) >>> import time >>> time.localtime(1e11) Traceback (most recent call last): File "", line 1, in time.localtime(1e11) OSError: [Errno 22] Invalid argument From random832 at fastmail.com Thu Sep 24 03:12:59 2015 From: random832 at fastmail.com (Random832) Date: Wed, 23 Sep 2015 21:12:59 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: <1443057179.975535.392060185.41F67F5C@webmail.messagingengine.com> On Wed, Sep 23, 2015, at 20:47, Tim Peters wrote: > [Alex] > > I added a method to datetimetester [1] to compute some overall statistics on > > tzfiles. My code ignores "version 2" data, so I include only transitions > > that fall within 32-bit time_t (1900 to 2038 range). > > From staring at zic.c, looks like (so far) the data in the version 2 > section is identical to that in the version 1 section, except written > out in wider data formats. The pretty clear intent is that they never > intend to generate explicit transitions beyond 2037 in any version, They do have transitions for before 1901, though. > > I think this is best we can do on Windows > > where (IIRC) mktime does not work for times before epoch. (What about > > localtime?) Windows has its own mechanism for storing timezone information, not tzdata, but no, none of the MSVCRT functions work for times before 1970. From tim.peters at gmail.com Thu Sep 24 03:22:20 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 23 Sep 2015 20:22:20 -0500 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: [Alex] > ... > If we can make some simplified assumptions about transition locations and > sizes, we can avoid a binary search over seconds to locate the transitions > via POSIX localtime/mktime APIs. BTW, "the obvious" way to almost always avoid binary search is for a tzinfo to remember the index of the last transition it had to use, then next time start a linear search from there. It should usually succeed in 1 or 2 tries. Programs in real life don't jump around across all possible times at random. To be truly insane, it could meld linear search with binary search, like Python's listsort.c's "gallop" functions. 
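Leaving the gallop refinement aside (Tim's verdict on it follows below), the plain remember-the-last-index lookup is only a few lines. A sketch against a hypothetical sorted list of transition instants; the names are invented here, not taken from any existing tzinfo class:

    from bisect import bisect_right

    class TransitionIndex:
        """Search-finger sketch: remember the interval used last time and try a
        short linear walk from there before falling back to binary search.
        `transitions` is a sorted list of transition instants (e.g. POSIX
        timestamps); index i means transitions[i] <= t < transitions[i + 1],
        with index 0 also covering times before the first transition."""

        def __init__(self, transitions):
            self.transitions = transitions
            self.finger = 0

        def lookup(self, t):
            trans, i = self.transitions, self.finger
            for _ in range(2):                      # usually succeeds in 1-2 tries
                if i + 1 < len(trans) and trans[i + 1] <= t:
                    i += 1
                elif i > 0 and trans[i] > t:
                    i -= 1
                else:
                    break
            if not (trans[i] <= t and (i + 1 == len(trans) or t < trans[i + 1])):
                i = bisect_right(trans, t) - 1      # rare: caller jumped far away
            self.finger = max(i, 0)
            return self.finger

    finger = TransitionIndex([0, 100, 200, 300])
    print(finger.lookup(150), finger.lookup(160), finger.lookup(450))   # 1 1 3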
I'd say melding in galloping is far more trouble than it's worth in this
context, though. Simple linear search with a search finger (index saved
across searches) should do fine.

From alexander.belopolsky at gmail.com Thu Sep 24 05:58:18 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 23 Sep 2015 23:58:18 -0400
Subject: [Datetime-SIG] IANA TZ database statistics
In-Reply-To: 
References: 
Message-ID: 

> [Alex]
> > My code ignores "version 2" data, so I include only transitions
> > that fall within 32-bit time_t (1900 to 2038 range).

> [Tim]
> From staring at zic.c, ..

I get a pounding headache. :-(

> looks like (so far) the data in the version 2
> section is identical to that in the version 1 section, except written
> out in wider data formats. The pretty clear intent is that they never
> intend to generate explicit transitions beyond 2037 in any version,
>

I compiled the latest Github version on my Mac and I get

$ /usr/local/etc/zdump -V 'America/New_York'| tail -4
America/New_York Sun Mar 8 06:59:59 2499 UT = Sun Mar 8 01:59:59 2499 EST isdst=0 gmtoff=-18000
America/New_York Sun Mar 8 07:00:00 2499 UT = Sun Mar 8 03:00:00 2499 EDT isdst=1 gmtoff=-14400
America/New_York Sun Nov 1 05:59:59 2499 UT = Sun Nov 1 01:59:59 2499 EDT isdst=1 gmtoff=-14400
America/New_York Sun Nov 1 06:00:00 2499 UT = Sun Nov 1 01:00:00 2499 EST isdst=0 gmtoff=-18000

> until it's after 2037 in the real world and they need to do so because
> a POSIX TZ rule can't handle some new goofy exception (and version 2
> also contains a POSIX TZ rule at the end, when possible).

What they do is a so-called 400-year hack: since the Gregorian calendar
repeats itself every 400 years, any regular calendar-based rule will
generate transitions with a 400-year period. This observation allows them
to generate 400+ years of explicit transitions through 2499 and extend
that through eternity by periodicity.

> Then they'll need to add new transitions in the version 2 section only
> (version 1 data formats are too narrow to record them).
>
>
They already do that for transitions both before EPOCH - 2**31 seconds and
after EPOCH + 2**31 seconds.

$ /usr/local/etc/zdump -V 'America/New_York'| head -2
America/New_York Sun Nov 18 16:59:59 1883 UT = Sun Nov 18 12:03:57 1883 LMT isdst=0 gmtoff=-17762
America/New_York Sun Nov 18 17:00:00 1883 UT = Sun Nov 18 12:00:00 1883 EST isdst=0 gmtoff=-18000

[Alex]
> I am considering making the cut-off at 1970 and assume 1970 standard
> time for all times before that year.
>
[Tim]
> Is there a real need for a "high performance" tzinfo?

This is not about new tzinfos. This is about implementing PEP 495's
.astimezone().

> That is, who cares? ;-)

I do. :-)

> It would sure be _surprising_ if a Python wrapping of
> zoneinfo returned different results than native Linux tools wrapping
> the same thing.
>
This is not about wrapping IANA's tzdist. This is about implementing PEP
495 features using POSIX APIs.

> [Alex]
> I think this is best we can do on Windows
>
[Tim]
> Of course not, if by "best" we mean "gets the same answers everyone
> else gets". In that case, "best" is returning what the IANA database
> says should be returned in all cases.
>
Which version of IANA database?

> [Alex]
> where (IIRC) mktime does not work for times before epoch. (What about
> > localtime?)
>
[Tim]
> >>> time.localtime(-1) .. OSError: [Errno 22] Invalid argument
>
That's what I thought.

>
> Which is another meaning for "best": avoid flaky C library functions
> altogether.
>
> >>> time.localtime(1e11) ..
> OSError: [Errno 22] Invalid argument > I don't want to try to figure out how to access tzfiles in a portable way. We need another PEP for this because I don't see any better solution than to repackage IANA files as a pip-installable package. Such PEP should probably be discussed on distutils-sig first. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Thu Sep 24 07:37:48 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 24 Sep 2015 00:37:48 -0500 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: [Alex] > ... > This is not about new tzinfos. This is about implementing PEP 495's > .astimezone(). Ah. You realize that's the first time that's been mentioned in this thread? It's been a total mystery until now ;-) > ... > This is not about wrapping IANA's tzdist. This is about implementing PEP > 495 features using POSIX APIs. Specifically which features? Do you just mean .astimezone() treating a naive datetime as being in the system zone, and the absence of any argument implying the system zone? Or more than just that? >> ... >> In that case, "best" is returning what the IANA database >> says should be returned in all cases. > Which version of IANA database? If it's still relevant, the only version any user cares about: the one that happens to be installed on their machine ;-) > ... > I don't want to try to figure out how to access tzfiles in a portable way. > We need another PEP for this because I don't see any better solution than to > repackage IANA files as a pip-installable package. Such PEP should probably > be discussed on distutils-sig first. Sorry, since this thread started by presenting statistics about the contents of the IANA database, I three-quarters assumed that _was_ what this was about. I agree that needs a whole different PEP. I also agree figuring out the system zone's rules is a puzzle using POSIX. Note that Gustavo gave up on trying to use mktime() in dateutil's tzlocal class. You could say time.timezone and time.altzone define the only two (or one, if time.daylight is 0) possible total UTC offsets, and assume that's always been, and always will be, the case. But I don't think even `altzone` is actually required by POSIX - it's of little help :-( From alexander.belopolsky at gmail.com Thu Sep 24 17:11:48 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 24 Sep 2015 11:11:48 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: > [Alex] > This is not about wrapping IANA's tzdist. This is about implementing PEP > > 495 features using POSIX APIs. > [Tim] > Specifically which features? Do you just mean .astimezone() treating > a naive datetime as being in the system zone, and the absence of any > argument implying the system zone? Or more than just that? > Also, .timestamp() respecting the fold attribute and datetime.now() and datetime.fromtimestamp() setting the fold attribute appropriately. In all these cases one needs to know how far the transition point is from a given time. > > >> [Tim] > >> In that case, "best" is returning what the IANA database > >> says should be returned in all cases. > The database itself does not say anything about what should be returned by various tools, but I would interpret that as "whatever zdump returns." > > > [Alex] > Which version of IANA database? 
[Tim] > If it's still relevant, the only version any user cares about: the > one that happens to be installed on their machine ;-) I don't think Windows comes with any, but I know close to nothing about Windows. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Thu Sep 24 17:26:34 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 24 Sep 2015 11:26:34 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Thu, Sep 24, 2015 at 1:37 AM, Tim Peters wrote: > I also agree figuring out the system zone's rules is a puzzle using > POSIX. Note that Gustavo gave up on trying to use mktime() in > dateutil's tzlocal class. > I think he was bitten by the flaky behavior of mktime() when tm_isdst is passed as -1. I intend calling mktime twice with tm_isdst=0 and tm_isdst=1 and detect fold/gap by what mktime that does to the tm structure. If we discover that some systems misbehave even in tm_isdst>=0 cases, we can roll out our own mktime() that probes localtime() multiple times. > You could say time.timezone and > time.altzone define the only two (or one, if time.daylight is 0) > possible total UTC offsets, and assume that's always been, and always > will be, the case. > Linux (glibc) updates timezone, altzone and tzname whenever localtime() is called. I think this is a horrible hack, but it does not seem to be in violation of POSIX. > But I don't think even `altzone` is actually > required by POSIX - it's of little help :-( > I don't want to rely on any of these variables. For offsets, I would just compute the timestamp on the output of localtime_r (the reentrant version does not mess with globals) and compare that to the input timestamp. For tzname, I would use strftime("%Z") which seems to be rather portable. Of course, on platforms where localtime and mktime fill in tm_gmtoff and tm_zone, I can just use those. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Thu Sep 24 19:06:48 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 24 Sep 2015 12:06:48 -0500 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: [Tim] >> ... >> Specifically which features? Do you just mean .astimezone() treating >> a naive datetime as being in the system zone, and the absence of any >> argument implying the system zone? Or more than just that? [Alex] > Also, .timestamp() respecting the fold attribute and datetime.now() and > datetime.fromtimestamp() setting the fold attribute appropriately. In all > these cases one needs to know how far the transition point is from a given > time. Got it. I should have known that the first time - sorry ;-) >> In that case, "best" is returning what the IANA database >> says should be returned in all cases. > The database itself does not say anything about what should be returned by > various tools, but I would interpret that as "whatever zdump returns." Gimme a break. > ... > I don't think Windows comes with any, but I know close to nothing about > Windows. Windows has minimal (compared to IANA) time zone info stored in the registry. You can look at dateutil's tzwin.py for code accessing it. Zones generally store no historical info, and assume a zone switches DST zero or two times per year.. 
In the latter case, the registry essentially stores a compiled version of
the "n'th weekday of the month" flavor of POSIX TZ string rules, so code
can compute when DST starts and ends each year. tzwin.py's tzwinlocal
class implements a hybrid tzinfo appropriate for the current system zone
(although it never worked for me :-( ). So, ironically enough, this could
all be relatively straightforward on Windows: in return for sticking to
regular rules, you get to know the rules up front.

For portable code, think of Windows as implementing as little as POSIX
requires of localtime() and mktime(). While it uses a 64-bit type for
time_t, values must be >= 0 and are documented as working only through
31 December 3000 23:59:59 UTC. On my Windows 10 box, it actually goes 21
whole hours ;-) beyond that:

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
...
>>> datetime.utcfromtimestamp(32535215999)
datetime.datetime(3000, 12, 31, 23, 59, 59)
>>> datetime.utcfromtimestamp(32535215999 + 21 * 3600)
datetime.datetime(3001, 1, 1, 20, 59, 59)
>>> datetime.utcfromtimestamp(32535215999 + 21 * 3600 + 1)
Traceback (most recent call last):
  File "", line 1, in 
    datetime.utcfromtimestamp(32535215999 + 21 * 3600 + 1)
OSError: [Errno 22] Invalid argument

>> I also agree figuring out the system zone's rules is a puzzle using
>> POSIX. Note that Gustavo gave up on trying to use mktime() in
>> dateutil's tzlocal class.

> I think he was bitten by the flaky behavior of mktime() when tm_isdst
> is passed as -1.

Good point!

> I intend calling mktime twice with tm_isdst=0 and tm_isdst=1 and detect
> fold/gap by what mktime that does to the tm structure. If we discover
> that some systems misbehave even in tm_isdst>=0 cases, we can roll
> out our own mktime() that probes localtime() multiple times.

Just one suggestion: force the year/timestamp into a 400-year span
starting at 1971 first (via adding/subtracting multiples of 400
years). Then not even Windows will blow up ;-)

> ...

From alexander.belopolsky at gmail.com Thu Sep 24 19:38:17 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 24 Sep 2015 13:38:17 -0400
Subject: [Datetime-SIG] IANA TZ database statistics
In-Reply-To: 
References: 
Message-ID: 

On Thu, Sep 24, 2015 at 1:06 PM, Tim Peters wrote:

> Just one suggestion: force the year/timestamp into a 400-year span
> starting at 1971 first (via adding/subtracting multiples of 400
> years). Then not even Windows will blow up ;-)
>

This will work for the future dates (and I think I should use 2100 through
2399 range to avoid extending not-regular rules into the far future). For
the far in the past dates, I still think the earliest transition to
standard time should be used as the "big bang" transition. Note that the
400 year hack does not work for systems with 32-bit time_t. I think it is
ok to just raise OverflowError on those whenever a timezone operation is
requested on a date outside of EPOCH ± 2**31 range. That's about 140 years.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tim.peters at gmail.com Thu Sep 24 20:16:59 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Thu, 24 Sep 2015 13:16:59 -0500
Subject: [Datetime-SIG] IANA TZ database statistics
In-Reply-To: 
References: 
Message-ID: 

[Tim]
>> Just one suggestion: force the year/timestamp into a 400-year span
>> starting at 1971 first (via adding/subtracting multiples of 400
>> years). Then not even Windows will blow up ;-)

[Alex]
> This will work for the future dates (and I think I should use 2100 through
> 2399 range to avoid extending not-regular rules into the far future).

That's fine.

> For the far in the past dates, I still think the earliest transition to standard
> time should be used as the "big bang" transition.

I'm not sure exactly what that means - I'm just trying to worm around
that time_t values less than 0 aren't supported on all systems.

> Note that the 400 year
> hack does not work for systems with 32-bit time_t. I think it is ok to just
> raise OverflowError on those whenever a timezone operation is requested on a
> date outside of EPOCH ± 2**31 range. That's about 140 years.

The 400-year hack is just mindlessly simple. It's possible to do far
better, since there are only 14 possible yearly calendars (which day of
the week is January first, and is it a leap year? 7*2 = 14). So a table
with 14 entries, mapping (weekday_of_1_Jan, is_leap) -> fixed canonical
year is sufficient. Nothing in that depends on the time zone - it can be
precomputed as a static table equally applicable to all time zones (for
years in which "normalization" is desired).

In general, most (*) 28-year spans contain at least one of each possible
yearly calendar. So a 32-bit time_t isn't a real problem here. For
example, any system capable of representing the years from 1972 through
1996 inclusive covers all possible yearly calendars.

(*) Exceptions can occur when the span crosses a century.

From alexander.belopolsky at gmail.com Thu Sep 24 20:29:58 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 24 Sep 2015 14:29:58 -0400
Subject: [Datetime-SIG] IANA TZ database statistics
In-Reply-To: 
References: 
Message-ID: 

On Thu, Sep 24, 2015 at 2:16 PM, Tim Peters wrote:

> > For the far in the past dates, I still think the earliest transition to
> standard
> > time should be used as the "big bang" transition.
>
> I'm not sure exactly what that means - I'm just trying to worm around
> that time_t values less than 0 aren't supported on all systems.

I should have said "earliest *discoverable* transition." For systems with
non-negative time_t, that would be somewhere in the 1970s. The key
decision here is that regular DST transitions are extended into the future
but not into the past. For the far past, utcoffset will be fixed at the
earliest standard time offset that can be fished out from localtime/mktime
calls.

Does Python support any systems with 32-bit time_t?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tim.peters at gmail.com Thu Sep 24 20:40:43 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Thu, 24 Sep 2015 13:40:43 -0500
Subject: [Datetime-SIG] IANA TZ database statistics
In-Reply-To: 
References: 
Message-ID: 

[Alex]
> Does Python support any systems with 32-bit time_t?

Not that I use ;-) But I'm sure there are many - pick some random 32-bit
box. The move to 64-bit time_t appears to be relatively recent even on
Linux systems. I don't know when it happened on Windows, but 32-bit
Windows XP boxes definitely use 32 bits (and there are still flags to
allow switching back to that).
http://stackoverflow.com/questions/14361651/is-there-any-way-to-get-64-bit-time-t-in-32-bit-program-in-linux From random832 at fastmail.com Thu Sep 24 20:42:18 2015 From: random832 at fastmail.com (Random832) Date: Thu, 24 Sep 2015 14:42:18 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: <1443120138.1452862.392770233.43DB6E32@webmail.messagingengine.com> On Thu, Sep 24, 2015, at 14:29, Alexander Belopolsky wrote: > Does Python support any systems with 32-bit time_t? Uh... Linux/i386 comes to mind. I still don't see the logic in doing any of this rather than parsing zoneinfo files directly on systems that use it; Get[Dynamic]TimeZoneInformation[ForYear] on Windows, etc. From tim.peters at gmail.com Thu Sep 24 20:48:19 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 24 Sep 2015 13:48:19 -0500 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: <1443120138.1452862.392770233.43DB6E32@webmail.messagingengine.com> References: <1443120138.1452862.392770233.43DB6E32@webmail.messagingengine.com> Message-ID: [Random832 ] > ... > I still don't see the logic in doing any of this rather than parsing > zoneinfo files directly on systems that use it; > Get[Dynamic]TimeZoneInformation[ForYear] on Windows, etc. Presumably Alex doesn't want to devote his life to fleshing out "etc" on endless platforms he doesn't use. If you think it's simple, _you_ write the code. Start by writing code to answer the question "well, _does_ this system use zoneinfo files?" on all possible Python platforms. Thanks ;-) From alexander.belopolsky at gmail.com Thu Sep 24 20:50:39 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 24 Sep 2015 14:50:39 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: <1443120138.1452862.392770233.43DB6E32@webmail.messagingengine.com> References: <1443120138.1452862.392770233.43DB6E32@webmail.messagingengine.com> Message-ID: On Thu, Sep 24, 2015 at 2:42 PM, Random832 wrote: > I still don't see the logic in doing any of this rather than parsing > zoneinfo files directly on systems that use it; > There is no portable way to even discover the location of the zoneinfo files. (The default location when installing from the source is the incredible /usr/local/etc/zoneinfo! ) If some location is guessed by searching some likely candidates, there is no guarantee that this is what system tzset() is using. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Thu Sep 24 20:57:53 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 24 Sep 2015 14:57:53 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: <1443120138.1452862.392770233.43DB6E32@webmail.messagingengine.com> Message-ID: On Thu, Sep 24, 2015 at 2:48 PM, Tim Peters wrote: > > Get[Dynamic]TimeZoneInformation[ForYear] on Windows, etc. > > Presumably Alex doesn't want to devote his life to fleshing out "etc" > on endless platforms he doesn't use. It's worse than that. I have no desire to learn even what "Get[Dynamic]TimeZoneInformation[ForYear]" is. :-( -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From 4kir4.1i at gmail.com Fri Sep 25 01:51:55 2015 From: 4kir4.1i at gmail.com (Akira Li) Date: Fri, 25 Sep 2015 02:51:55 +0300 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: (Alexander Belopolsky's message of "Thu, 24 Sep 2015 14:50:39 -0400") References: <1443120138.1452862.392770233.43DB6E32@webmail.messagingengine.com> Message-ID: <87oagr1ilg.fsf@gmail.com> Alexander Belopolsky writes: > On Thu, Sep 24, 2015 at 2:42 PM, Random832 wrote: > >> I still don't see the logic in doing any of this rather than parsing >> zoneinfo files directly on systems that use it; >> > > There is no portable way to even discover the location of the zoneinfo > files. (The default location when installing from the source is the > incredible /usr/local/etc/zoneinfo! ) If some location is guessed by > searching some likely candidates, there is no guarantee that this is what > system tzset() is using. tzlocal module by Lennart Regebro might be good enough in practice https://github.com/regebro/tzlocal From alexander.belopolsky at gmail.com Fri Sep 25 02:18:35 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 24 Sep 2015 20:18:35 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: <87oagr1ilg.fsf@gmail.com> References: <1443120138.1452862.392770233.43DB6E32@webmail.messagingengine.com> <87oagr1ilg.fsf@gmail.com> Message-ID: On Thu, Sep 24, 2015 at 7:51 PM, Akira Li <4kir4.1i at gmail.com> wrote: > tzlocal module by Lennart Regebro might be good enough in practice > https://github.com/regebro/tzlocal > How well has it been tested on say FreeBSD or Solaris? -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Sep 25 03:11:08 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 24 Sep 2015 20:11:08 -0500 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: <1443120138.1452862.392770233.43DB6E32@webmail.messagingengine.com> <87oagr1ilg.fsf@gmail.com> Message-ID: [Akira Li <4kir4.1i at gmail.com>] >> tzlocal module by Lennart Regebro might be good enough in practice >> https://github.com/regebro/tzlocal [Alex] > How well has it been tested on say FreeBSD or Solaris? I'm not sure it's relevant to what you're trying to do now. Lennart's tzlocal is intended to work with pytz, and its unix.py just searches all over creation for "the local" IANA tzfile to pass to pytz. My guess is that unless/until Python ships IANA files itself, we're best off sticking to standard C functions. Those have _some_ scant chance of working everywhere ;-) From alexander.belopolsky at gmail.com Fri Sep 25 21:57:36 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 25 Sep 2015 15:57:36 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Wed, Sep 23, 2015 at 6:16 PM, Chris Angelico wrote: > I'm not sure whether this actually helps anything or not, ... Based on the fold size statistic, I have implemented [1] a more robust fold-detection algorithm that passes the exhaustive test. The only assumption that it requires is that no fold is bigger than 24 hours. [1]: https://github.com/abalkin/cpython/commit/54d3596b0180512c68c91e8308665c0a9e61c9eb -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From tim.peters at gmail.com Fri Sep 25 23:32:50 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Fri, 25 Sep 2015 16:32:50 -0500
Subject: [Datetime-SIG] IANA TZ database statistics
In-Reply-To: 
References: 
Message-ID: 

[Alex]
> Based on the fold size statistic, I have implemented [1] a more robust
> fold-detection algorithm that passes the exhaustive test. The only
> assumption that it requires is that no fold is bigger than 24 hours.
>
> [1]:
> https://github.com/abalkin/cpython/commit/54d3596b0180512c68c91e8308665c0a9e61c9eb

Wondering whether this line:

    if probe2 != result + trans:

could be replaced with:

    if probe2 == result:

I'm not sure what the first line is saying ;-) The second line says to me
"this is the later time in a fold if and only if subtracting the width of
the fold from the starting timestamp converts to the same local time -
that's what 'the later time in a fold' means".

From alexander.belopolsky at gmail.com Sat Sep 26 02:25:19 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Fri, 25 Sep 2015 20:25:19 -0400
Subject: [Datetime-SIG] IANA TZ database statistics
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 25, 2015 at 5:32 PM, Tim Peters wrote:
>
> >
> https://github.com/abalkin/cpython/commit/54d3596b0180512c68c91e8308665c0a9e61c9eb
>
> Wondering whether this line:
>
>     if probe2 != result + trans:
>
> could be replaced with:
>
>     if probe2 == result:
>

Yes, it can. Thanks for the suggestion.

>
> I'm not sure what the first line is saying ;-)

It says that probe2 and result are on the opposite sides of the
transition, but your test is simpler and easier to understand.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tim.peters at gmail.com Sat Sep 26 04:28:20 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Fri, 25 Sep 2015 21:28:20 -0500
Subject: [Datetime-SIG] IANA TZ database statistics
In-Reply-To: 
References: 
Message-ID: 

>>> https://github.com/abalkin/cpython/commit/54d3596b0180512c68c91e8308665c0a9e61c9eb

[Tim]
>> Wondering whether this line:
>>
>>     if probe2 != result + trans:
>>
>> could be replaced with:
>>
>>     if probe2 == result:

[Alex]
> Yes, it can. Thanks for the suggestion.

Good! So you have a simple, cross-platform solution now, at least for
timestamps localtime() doesn't barf on.

> The only assumption that it requires is that no fold is bigger than 24 hours.

Well, it does rely on more than just that. For example, if there's a gap
where the clock jumps from 2 to 3, followed soon after by a fold of an
hour repeating times of the form 4:MM, then the second occurrence of 4:30
won't be detected as such - the fold and the gap "cancel out" with respect
to subtracting 24 hours in either naive time or timestamp time.

So it seems a sufficient condition is that there's at most one UTC offset
change in the last 24 hours. I wouldn't be surprised if that's always
true now - or that it won't be after Kim Jong-un learns he could annoy us
by making it false ;-)

From alexander.belopolsky at gmail.com Sat Sep 26 04:50:19 2015
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Fri, 25 Sep 2015 22:50:19 -0400
Subject: [Datetime-SIG] IANA TZ database statistics
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 25, 2015 at 10:28 PM, Tim Peters wrote:
>
> So it seems a sufficient condition is that there's at most one UTC
> offset change in the last 24 hours.

Yes. That's the condition I've been talking for months about.
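(For the record, the whole check boils down to something like the sketch
below - illustrative only, with made-up helper names, built directly on
time.localtime and leaning on exactly that assumption of at most one
transition in the preceding 24 hours:)

    import time
    from datetime import datetime, timedelta

    def local_naive(ts):
        # Naive local datetime for a POSIX timestamp, via the C library.
        return datetime(*time.localtime(ts)[:6])

    def fold_of_timestamp(ts, window=24 * 3600):
        # 1 if ts maps to the later of two repeated local times, else 0.
        result = local_naive(ts)
        probe1 = local_naive(ts - window)
        # If the clock was set back by d seconds somewhere in the window,
        # only window - d seconds of local time elapsed across it.
        shift = timedelta(seconds=window) - (result - probe1)
        if shift <= timedelta(0):
            return 0  # no fold ended in the window (perhaps a gap instead)
        # The later of two repeated readings is the one where stepping back
        # by the width of the fold lands on the same local time.
        probe2 = local_naive(ts - int(shift.total_seconds()))
        return 1 if probe2 == result else 0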
If you have a chance, please take a look at https://github.com/abalkin/cpython/commit/d146830e70a1fda22380c5ba0d9592c16acd23de It fails on Europe/Tallinn which seems to have transitions separated by 22 hours with the *same* utcoffset. I don't understand why zic would ever produce something like this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sat Sep 26 05:03:29 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 25 Sep 2015 23:03:29 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Fri, Sep 25, 2015 at 10:50 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > It fails on Europe/Tallinn which seems to have transitions separated by 22 > hours with the *same* utcoffset. > > I don't understand why zic would ever produce something like this. > Interestingly, system zdump misses the problem transition: $ zdump -v Europe/Tallinn | grep 1999 Europe/Tallinn Sun Mar 28 00:59:59 1999 UTC = Sun Mar 28 02:59:59 1999 EET isdst=0 Europe/Tallinn Sun Mar 28 01:00:00 1999 UTC = Sun Mar 28 04:00:00 1999 EEST isdst=1 Europe/Tallinn Sun Oct 31 00:59:59 1999 UTC = Sun Oct 31 03:59:59 1999 EEST isdst=1 Europe/Tallinn Sun Oct 31 01:00:00 1999 UTC = Sun Oct 31 03:00:00 1999 EET isdst=0 You need to use my zdump.py tool [1] to see it: $ ./python.exe Tools/tz/zdump.py Europe/Tallinn | grep 1999 1999-03-28 01:00:00 UTC = 1999-03-28 04:00:00 EEST isdst=1 +1 1999-10-31 01:00:00 UTC = 1999-10-31 03:00:00 EET isdst=0 -1 1999-10-31 22:00:00 UTC = 1999-11-01 00:00:00 EET isdst=0 +0 [1] : https://github.com/abalkin/cpython/blob/issue24773-s3/Tools/tz/zdump.py -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sat Sep 26 05:12:26 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 25 Sep 2015 23:12:26 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Fri, Sep 25, 2015 at 11:03 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > You need to use my zdump.py tool [1] to see it: > > $ ./python.exe Tools/tz/zdump.py Europe/Tallinn | grep 1999 > 1999-03-28 01:00:00 UTC = 1999-03-28 04:00:00 EEST isdst=1 +1 > 1999-10-31 01:00:00 UTC = 1999-10-31 03:00:00 EET isdst=0 -1 > 1999-10-31 22:00:00 UTC = 1999-11-01 00:00:00 EET isdst=0 +0 > > [1] : > https://github.com/abalkin/cpython/blob/issue24773-s3/Tools/tz/zdump.py > It looks like this problem has been fixed [2] in the 2015f release: $ ./python.exe Tools/tz/zdump.py /usr/local/etc/zoneinfo/Europe/Tallinn | grep 1999 1999-03-28 01:00:00 UTC = 1999-03-28 04:00:00 EEST isdst=1 +1 1999-10-31 01:00:00 UTC = 1999-10-31 03:00:00 EET isdst=0 -1 [2]: https://github.com/eggert/tz/commit/cf8df34364ffc9bd4eaddc5ff0d6bcdbd699893b -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sat Sep 26 05:36:45 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 25 Sep 2015 23:36:45 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Fri, Sep 25, 2015 at 10:28 PM, Tim Peters wrote: > So it seems a sufficient condition is that there's at most one UTC > offset change in the last 24 hours. > Apparently [1] we are not alone in wanting this condition. 
[1]: http://mm.icann.org/pipermail/tz/2015-June/022309.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sat Sep 26 05:36:52 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 25 Sep 2015 22:36:52 -0500 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: [Tim] >> So it seems a sufficient condition is that there's at most one UTC >> offset change in the last 24 hours. [Alex] > Yes. That's the condition I've been talking for months about. ? > If you have a chance, please take a look at > > https://github.com/abalkin/cpython/commit/d146830e70a1fda22380c5ba0d9592c16acd23de > > It fails on Europe/Tallinn which seems to have transitions separated by 22 > hours with the *same* utcoffset. > > I don't understand why zic would ever produce something like this. Well, the Tallinn source rules I see include: 2:00 EU EE%sT 1999 Nov 1 2:00 - EET 2002 Feb 21 That is, they decided to stop messing with DST at all effective the start of November, 1999. But until then, they were following "EU" daylight rules. Which ends DST on the last Sunday of October, which in 1999 happened to be Oct 31. So the first switch to EET local Sunday morning was due to EU daylight time ending, and then the second "switch" to EET at local midnight was due to Tallinn opting out of DST rules altogether. Which didn't change the zone name, DST status, or UTC offset. The output of zic doesn't appear nearly well defined enough to say whether that's "a bug" or "a feature", though :-( From alexander.belopolsky at gmail.com Sat Sep 26 05:44:09 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 25 Sep 2015 23:44:09 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Fri, Sep 25, 2015 at 11:36 PM, Tim Peters wrote: > > [Alex] > > Yes. That's the condition I've been talking for months about. > > ? "if we (generously) allow utcoffset to vary from -24h to +24h, then a "sane" zone can be defined as the one where utcoffset changes at most once in any 48 hour period." https://mail.python.org/pipermail/python-dev/2015-April/139171.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Sat Sep 26 05:46:47 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 25 Sep 2015 23:46:47 -0400 Subject: [Datetime-SIG] IANA TZ database statistics In-Reply-To: References: Message-ID: On Fri, Sep 25, 2015 at 11:36 PM, Tim Peters wrote: > Well, the Tallinn source rules I see include: > > 2:00 EU EE%sT 1999 Nov 1 > 2:00 - EET 2002 Feb 21 > That's a bug that has been fixed in 2015f. See < http://mm.icann.org/pipermail/tz/2015-June/022309.html> for details. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 29 03:21:51 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 28 Sep 2015 21:21:51 -0400 Subject: [Datetime-SIG] Making tm_gmtoff and tm_zone available on all platforms Message-ID: Most UNIX platforms extend struct tm to include tm_gmtoff and tm_zone fields that contain the current UTC offset in seconds and the zone abbreviation. Python has been making these fields available as attributes of time.struct_time [1] since version 3.3, but only on platforms that support them in the C library. 
>>> import time >>> t = time.localtime() >>> t.tm_gmtoff -14400 >>> t.tm_zone 'EDT' I propose that we make these attributes available on all platforms by computing their values when they are not available in struct tm. The tm_gmtoff value is easy to compute by comparing localtime() to gmtime(): >>> u = time.gmtime(time.mktime(t)) >>> from calendar import timegm >>> timegm(t) - timegm(u) -14400 and tm_zone can be computed by calling strftime() with a '%Z' directive. >>> time.strftime('%Z', t) 'EDT' What does the group think? [1]: https://docs.python.org/3/library/time.html#time.struct_time -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 29 05:04:31 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 20:04:31 -0700 Subject: [Datetime-SIG] Making tm_gmtoff and tm_zone available on all platforms In-Reply-To: References: Message-ID: I had been wondering about that myself. But your implementation proposal sounds kind of expensive, doesn't it? On Mon, Sep 28, 2015 at 6:21 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > Most UNIX platforms extend struct tm to include tm_gmtoff and tm_zone > fields that contain the current UTC offset in seconds and the zone > abbreviation. > > Python has been making these fields available as attributes of > time.struct_time [1] since version 3.3, but only on platforms that support > them in the C library. > > >>> import time > >>> t = time.localtime() > >>> t.tm_gmtoff > -14400 > >>> t.tm_zone > 'EDT' > > I propose that we make these attributes available on all platforms by > computing their values when they are not available in struct tm. > > The tm_gmtoff value is easy to compute by comparing localtime() to > gmtime(): > > >>> u = time.gmtime(time.mktime(t)) > >>> from calendar import timegm > >>> timegm(t) - timegm(u) > -14400 > > and tm_zone can be computed by calling strftime() with a '%Z' directive. > > >>> time.strftime('%Z', t) > 'EDT' > > What does the group think? > > [1]: https://docs.python.org/3/library/time.html#time.struct_time > > _______________________________________________ > Datetime-SIG mailing list > Datetime-SIG at python.org > https://mail.python.org/mailman/listinfo/datetime-sig > The PSF Code of Conduct applies to this mailing list: > https://www.python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 29 05:29:59 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 28 Sep 2015 23:29:59 -0400 Subject: [Datetime-SIG] Making tm_gmtoff and tm_zone available on all platforms In-Reply-To: References: Message-ID: On Mon, Sep 28, 2015 at 11:04 PM, Guido van Rossum wrote: > > I had been wondering about that myself. But your implementation proposal sounds kind of expensive, doesn't it? It could be with a naive implementation that would simply fill additional fields in the existing time.struct_time object, but we can also modify the struct_time class to compute the additional attributes only when they are requested. (I believe struct_time is currently implemented as PyStructSequence, so we will probably need to subclass that somehow.) On the other hand, I would start with a naive implementation and worry about the optimizations later. 
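(For concreteness, the lazy variant could eventually look something like
this rough pure-Python sketch - the wrapper class is invented for
illustration, assumes the wrapped value came from time.localtime(), and a
real implementation would extend the C struct sequence instead:)

    import calendar
    import time

    class StructTimeWithZone:
        """Fill tm_gmtoff/tm_zone only when they are first requested."""

        def __init__(self, st):
            self._st = st  # a plain time.struct_time from localtime()

        def __getattr__(self, name):  # called only for missing attributes
            st = self._st
            if name == 'tm_gmtoff':
                # Same arithmetic as in the proposal: local reading minus
                # UTC reading for the same moment.
                u = time.gmtime(time.mktime(st))
                return calendar.timegm(st) - calendar.timegm(u)
            if name == 'tm_zone':
                return time.strftime('%Z', st)
            return getattr(st, name)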
As far as I know, the POSIX layer on Windows (which is the main platform that will be affected) is already very slow, so the price of cross-platform portability may be within user expectations in this case. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Tue Sep 29 06:08:21 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 29 Sep 2015 00:08:21 -0400 Subject: [Datetime-SIG] PEP 495 implementation Message-ID: I have completed a pure python implementation of PEP 495 and the patch is ready for review. [1] If you prefer the Github interface, please review the pull request from my cpython clone. [2] Finally, please add yourself as "nosy" to issue #24773 [3] if you would like to follow future developments. [1]: http://bugs.python.org/review/24773/#ps15654 [2]: https://github.com/python/cpython/pull/20 [3]: http://bugs.python.org/issue24773 -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 29 23:03:14 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Sep 2015 14:03:14 -0700 Subject: [Datetime-SIG] Making tm_gmtoff and tm_zone available on all platforms In-Reply-To: References: Message-ID: OK, I think this is fine then. On Mon, Sep 28, 2015 at 8:29 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Mon, Sep 28, 2015 at 11:04 PM, Guido van Rossum > wrote: > > > > I had been wondering about that myself. But your implementation proposal > sounds kind of expensive, doesn't it? > > > It could be with a naive implementation that would simply fill additional > fields in the existing time.struct_time object, but we can also modify the > struct_time class to compute the additional attributes only when they are > requested. (I believe struct_time is currently implemented as > PyStructSequence, so we will probably need to subclass that somehow.) > > On the other hand, I would start with a naive implementation and worry > about the optimizations later. As far as I know, the POSIX layer on > Windows (which is the main platform that will be affected) is already very > slow, so the price of cross-platform portability may be within user > expectations in this case. > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Wed Sep 30 18:54:56 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 30 Sep 2015 12:54:56 -0400 Subject: [Datetime-SIG] Making tm_gmtoff and tm_zone available on all platforms In-Reply-To: References: Message-ID: On Tue, Sep 29, 2015 at 5:03 PM, Guido van Rossum wrote: > OK, I think this is fine then. Implementation will be tracked at . -------------- next part -------------- An HTML attachment was scrubbed... URL: