ANN: intervalset Was: Set type for datetime intervals

Nagy László Zsolt gandalf at shopzeus.com
Tue Apr 5 01:44:58 EDT 2016


> I don't know if I like it being immutable. Maybe have separate mutable
> and immutable versions.
The reason for making it immutable is implementation specific. These
intervals are stored in an ordered list. Overlapping intervals are
unified by the constructor. This makes sure that sets with equal
elements are represented equally, and makes it possible to hash the set.
Comparing them by these hashes is fast and easy.

If you want to manipulate an interval set, you cannot just "add another
interval" to it. Adding an interval may result in unifying thousands of
other intervals. It is possible that you add a wide interval to a set
with 1000 elements, and as a result the set will have only a single
element.
> Like I said before, I don't think the set-like operations on Intervals
> are useful - what can you accomplish with them rather than by making a
> set consisting of only one interval and doing operations on that?
Well, then don't use them. I believe that the intersection of two
intervals is especially useful. Easy to check if two intervals have a
common non-empty part or not. You can also check this with a condition,
but that condition is a long one.
>>>> element ::= (start_point_in_time, end_point_in_time)
>>>> intervalset ::= { element1, element2, .... }
>>> Are these open intervals or closed intervals?
>> Closed. In my particular implementation, there is a singleton for all
>> empty intervals. It is not possible to create an arbitrary interval with
>> the same starting and ending time (the singleton being the only
>> exception).
> An "arbitrary interval with the same starting and ending time" would be
> more like an isolated datetime (i.e. 00:00 is in the set but 23:59 or
> 00:01 is not)
Yes, that is true. Again: my task was to represent non-empty intervals.
We can say that a zero sized closed interval is also a special interval
that identifies a single point of time. Since the base class has been
rewritten to support any ordered type, it makes sense. Probably this is
a good idea to do. There are applications where the user is not
interested in zero sized intervals. For example, if you want to store
the working time of an employee then obviously you don't want to store
zero sized intervals.

How about creating two classes for this? One that supports zero sized
intervals, and another that doesn't?
Or maybe create a method that removes all zero sized intervals?
>> I think that implementing open intervals would be much more
>> difficult, and we would have to know and use the smallest possible
>> increment (resolution) of the date/time type. Which may be platform
>> dependent.
> My suggestion was to use boolean flags, and to use that to control
> whether to use < or <= to test for membership.
>
> An open interval is more like an hour where 00:00 is included and
> 00:59:59.999999 is included but 01:00 is not. With discrete resolution
> it's as simple as just moving the endpoint off by one unit, but I think
> it'd be cleaner to use flags (you'd need it for numeric intervals, since
> numeric types can have any resolution).
Well, here is the problem: floats, ints, decimals and datetimes are all
ordered types. But they all have a different resolution, and some of
them are implementation dependent. We must know and use the smallest
resolution, even if we use boolean flags to denote openness. For
example: if we use integer values, then the closed interval [1,9] is
equal to the open interval (0,10) or the left-open interval (0,10]. We
want to be able to compare them, so we must know the resolution! If we
stay at the generalized form of the class that can be used on any
ordered type, then we cannot avoid using the concept of the "resolution
of the type". Here are the choices:

  * Use boolean flags to denote openness:
      o We do not support testing for interval equality - I think that
        missing this fundamental operator would make the implementation
        useless.
      o Support testing for internal equality - this requires knowledge
        about the smallest resolution, so in this case using flags to
        denote openness makes it possible to represent the same interval
        in 4 different ways! (left opened/closed, right opened/closed).
        I think this is also a bad idea.
      o We may have an agreement about how we should represent certain
        intervals from the 4 different possible opportunities, but the
        simplest way would be to always store them as closed intervals -
        which makes these boolean flags unneccessary
  * Do not use boolean flags to denote openness - I don't see any
    problem here.


I think the problem is that you are thinking the mathematical way, where
an open interval can never be equal to a closed interval. I was thinking
the IT way, where everything is represented with bits, and the "open
interval" is just a concept that always has a biggest and a smallest
possible value.

I'm not saying that your suggestion is wrong. I'm just saying that it
raises all kinds of implementation specific questions that are hard to
answer - at least for me.

>>    def __contains__(self, other):
>>        """Containment relation.
>>
>>        Tell if `other` is contained *completely* in this set. The argument can either be an Interval or an
>>        IntervalSet.
>>        """
> I don't think this is appropriate for this operation; this should be
> __gt__. __contains__ should test a datetime (or whatever type item), not
> another Interval/IntervalSet.
Well, I have also implemented __gt__ but its interpretation is equally
blurry. When two intervals overlap, which is the bigger? My original
task was to compare working hours (intervals) of employees and in that
interpretation, the above "contains" operation was useful. In other
cases, using a set intersection is more meaningful ("what is the common
part in them").

Cheers,

   Laszlo




More information about the Python-list mailing list