ANN: intervalset Was: Set type for datetime intervals

Tue Apr 5 09:44:04 EDT 2016

On Tue, Apr 5, 2016, at 01:44, Nagy László Zsolt wrote:
> If you want to manipulate an interval set, you cannot just "add another
> interval" to it. Adding an interval may result in unifying thousands of
> other intervals. It is possible that you add a wide interval to a set
> with 1000 elements, and as a result the set will have only a single
> element.

Yeah, but it's not clear why the add method can't do the same
unification that the constructor does.

> How about creating two classes for this? One that supports zero sized
> intervals, and another that doesn't?

If you don't want zero sized intervals, just don't put any in it. You
don't have a separate list type to support even integers vs one that
supports all floats. What operations are made different in
implementation by this restriction?

> Well, here is the problem: floats, ints, decimals and datetimes are all
> ordered types. But they all have a different resolution, and some of
> them are implementation dependent.

All real number types should be considered a single type for the purpose
of implementing something like this. There's no reason you shouldn't be
able to mix floats, ints, Decimals, and Fractions (the last of which has
conceptually infinite resolution).

> We must know and use the smallest
> resolution, even if we use boolean flags to denote openness. For
> example: if we use integer values, then the closed interval [1,9] is
> equal to the open interval (0,10) or the left-open interval (0,10]. We
> want to be able to compare them, so we must know the resolution!

There's no reason to consider (0, 10) equal to _any_ closed interval.

> If we
> stay at the generalized form of the class that can be used on any
> ordered type, then we cannot avoid using the concept of the "resolution
> of the type". Here are the choices:
> 
>   * Use boolean flags to denote openness:
>       o We do not support testing for interval equality - I think that
>         missing this fundamental operator would make the implementation
>         useless.

I don't see how considering some intervals to not be equal to others
means not supporting testing interval equality.

>       o Support testing for internal equality - this requires knowledge
>         about the smallest resolution, so in this case using flags to
>         denote openness makes it possible to represent the same interval
>         in 4 different ways! (left opened/closed, right opened/closed).
>         I think this is also a bad idea.
>       o We may have an agreement about how we should represent certain
>         intervals from the 4 different possible opportunities, but the
>         simplest way would be to always store them as closed intervals -
>         which makes these boolean flags unneccessary
>   * Do not use boolean flags to denote openness - I don't see any
>     problem here.
> 
> 
> I think the problem is that you are thinking the mathematical way, where
> an open interval can never be equal to a closed interval. I was thinking
> the IT way, where everything is represented with bits, and the "open
> interval" is just a concept that always has a biggest and a smallest
> possible value.

The smallest possible value of a Fraction above zero is limited only by
available memory (its denominator is the largest possible int). You
could also reasonably implement a "higher-resolution datetime class"
which can be interchangeably and compared with normal datetimes... maybe
even with a Fraction for fractional seconds.

> Well, I have also implemented __gt__ but its interpretation is equally
> blurry. When two intervals overlap, which is the bigger?

With set types, the comparison operations are is-subset-of operations,
i.e. a < b returns true if a is _completely contained within_ b, and
false if they do not intersect or they overlap differently on both
sides. It does not imply that all values in a are less than all values
in b (that would be a different operation)

> My original
> task was to compare working hours (intervals) of employees and in that
> interpretation, the above "contains" operation was useful.

Of course the operation is useful. My entire point is that it should be
spelled "a < b" not "a in b".

Look at sets to see what I'm talking about:

{1, 2} < {1, 2, 3}: True
{2, 3} < {1, 2, 3}: True
{1, 3} < {1, 2, 3}: True

{1, 2} < {2, 3}: False
{1, 2} > {2, 3}: False

{1, 2} < {1, 2}: False
{1, 2} <= {1, 2}: True