[Datetime-SIG] Another round on error-checking

Isaac J Schwabacher ischwabacher at wisc.edu
Wed Sep 9 07:18:33 CEST 2015


I stop following for the week and the world goes mad. I've lost count of the number of times I've thought, "Are you out of your *mind*!?" while reading this thread. You actually considered breaking the __hash__ invariant?

[Guido]
> > I could not accept a PEP that leads to different datetime being considered
> > == but having a different hash (*unless* due to a buggy tzinfo subclass
> > implementation -- however no historical timezone data should ever depend on
> > such a bug).
> >
> > I'm much less concerned about < being intransitive in edge cases.

[Tim]
> Offhand I don't know whether it can be (probably).  The case I
> stumbled into yesterday showed that equality ("==") could be
> intransitive:
> 
>     assert a == b == c == d  and  a < d
> 
> While initially jarring, I called it a "minor wart", because the
> middle "==" there is working in classic arithmetic but the other two
> are working in timeline arithmetic.  But _a_ wart all the same, since
> transitivity doesn't fail today.

I'm assuming that the moment of temporary insanity has passed and you consider the __hash__ invariant to be sacrosanct.

The problem here is that someone (Alexander, I think?) demonstrated a method of producing a tzinfo class, along with b and c, that makes a == b == c == d hold *given arbitrary a and d*. Equality may not be transitive, but equality of hashes is, which means that __hash__ must be constant over each equivalence class in the transitive closure of the relation defined by __eq__. In this case, this boils down to "if __hash__ ignores fold, all datetime objects must have the same hash".

I imagine the performance implications of this are not acceptable.
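To make the transitive-closure argument concrete, here is a toy sketch. It does not use datetime at all: `near` is a made-up intransitive equality standing in for the mixed classic/timeline "==", and `hash_classes` is a hypothetical helper that groups items into the sets on which any __hash__ honoring the invariant would have to be constant.

```python
def hash_classes(items, eq):
    """Group items into classes on which a valid __hash__ must be constant.

    Computes the transitive closure of eq with a small union-find.
    """
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the class representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if eq(items[i], items[j]):
                # a == b forces hash(a) == hash(b), so merge their classes.
                parent[find(i)] = find(j)

    classes = {}
    for i, item in enumerate(items):
        classes.setdefault(find(i), []).append(item)
    return list(classes.values())


def near(a, b):
    # Toy intransitive equality (an assumption for illustration only):
    # near(0, 1) and near(1, 2) hold, but near(0, 2) does not.
    return abs(a - b) <= 1


items = list(range(6))
classes = hash_classes(items, near)              # one class containing everything
strict = hash_classes(items, lambda a, b: a == b)  # six singleton classes
```

With the intransitive relation, every item ends up in a single class, so the only hash satisfying the invariant is a constant one. With an ordinary transitive equality, each item keeps its own class.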

There is no satisfactory way of weaseling out of this; datetime equality is timeline equality now and forever, unless you're willing to give up one of backward compatibility, the __hash__ invariant, or the ability to implement new tzinfo classes. (The tzinfo in the example was contrived but not buggy.)

> > I also don't particularly care about == following from the difference being zero.
> > Still, unless we're constrained by backward compatibility, I would rather
> > not add equivalence between *any* two datetimes whose tzinfo is not the same
> > object -- even if we can infer that they both must refer to the same
> > instant.
> 
> Assuming "equivalent" means "compare equal", we're highly constrained.
> For datetimes x and y with distinct non-None tzinfos, it's always been
> the case that:
> 
> 1. x-y effectively converted both to UTC before subtraction.
> 
> 2. comparison effectively interpreted x-y as a __cmp__ result
> 2a.  various comparison transitivities essentially followed from that
> 
> 3. Because of #2, to maintain __hash__'s contract datetime.__hash__
>     also effectively converted to UTC before hashing
> 
> All of that would (well, "should") continue to work fine, except that
> fold=1 is being ignored in intrazone arithmetic (subtraction and
> comparisons) and by hash().  Maybe there are other surprises.  I just
> happened to notice the hash() problem, and equality intransitivity,
> both yesterday, via thought experiments.
> 
> On the face of it, it's a conceptual mess to try to make fold=1 "mean
> something" in some contexts but not in others.  In particular,
> arithmetic, comparison, and hashing are usually deeply interrelated,
> and have been in datetime so far.  Ignoring `fold` in single-zone
> arithmetic, comparisons and hashing works fine (in "naive time", where
> `fold` is senseless), but when going across zones `fold` cannot be
> ignored.
> 
> That's a huge problem for hash(), because it can have no idea whether
> the pattern of later equality comparisons relying on hash results
> _will_ be using classic or timeline rules (or a mix of both).
> 
> That didn't matter before, because _a_ unique UTC equivalent always
> existed (the possibility of ambiguous times was effectively ignored).
> 
> Now it does matter, because the UTC equivalent can differ depending on
> the `fold` value.  Ignoring it sometimes but not others leads to the
> current quandary.
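
For reference, points 1 through 3 in Tim's list can be checked directly in the fold-free case. This is only a sketch: the two fixed-offset zones and the particular instant are my own illustrative choices, and nothing here exercises `fold`.

```python
from datetime import datetime, timedelta, timezone

# Two distinct fixed-offset tzinfo objects (illustrative; no DST, so no fold).
plus_two = timezone(timedelta(hours=2))
minus_five = timezone(timedelta(hours=-5))

# The same instant (05:18:33 UTC) expressed in each zone.
x = datetime(2015, 9, 9, 7, 18, 33, tzinfo=plus_two)
y = datetime(2015, 9, 9, 0, 18, 33, tzinfo=minus_five)

diff = x - y                        # 1. both sides converted to UTC first
same_instant = (x == y)             # 2. comparison agrees with the difference
same_hash = hash(x) == hash(y)      # 3. hash is taken on the UTC equivalent
```

Here the difference is zero, the two values compare equal, and their hashes match, which is the behavior that has to survive once the UTC equivalent can depend on `fold`.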

The last time I made an argument like this, Guido called me the *very loyal* opposition. :)

ijs

