Python Module Exposure

George Sakkis gsakkis at rutgers.edu
Sat Jul 9 11:55:07 EDT 2005


"Jacob Page" <apoco at cox.net> wrote:

> George Sakkis wrote:
> > 1. As already noted, ISet is not really descriptive of what the class
> > does. How about RangeSet ? It's not that long and I find it pretty
> > descriptive. In this case, it would be a good idea to change Interval
> > to Range to make the association easier.
>
> The reason I decided not to use the term Range was because that could be
> confused with the range function, which produces a discrete set of
> integers.  Interval isn't a term used in the standard/built-in library,
> AFAIK, so I may stick with it.  Is this sound reasoning?

Yes, it is not unreasonable; I can't argue strongly against Interval.
Still I'm a bit more in favor of Range and I don't think it is
particularly confusing with range() because:
1. Range has to be either qualified with the name of the package (e.g.
rangesets.Range) or imported as "from rangesets import Range", so one
cannot mistake it for the range builtin.
2. Most popular naming conventions use lower first letter for functions
and capital for classes.
3. If you insist that only RangeSet should be exposed from the module's
interface and Range (Interval) should be hidden, the potential conflict
between range() and RangeSet is even less.

> > 2. The module's "helper functions" -- which are usually called factory
> > functions/methods because they are essentially alternative constructors
> > of ISets -- would perhaps better be defined as classmethods of ISet;
> > that's a common way to define instance factories in python. Except for
> > 'eq' and 'ne', the rest define an ISet of a single Interval, so they
> > would rather be classmethods of Interval. Also the function names could
> > take some improvement; at the very least they should not be 2-letter
> > abbreviations. Finally I would omit 'eq', an "interval" of a single
> > value; single values can be given in ISet's constructor anyway. Here's
> > a different layout:
>
> First, as far as having the factory functions create Interval instances
> (Range instances in your examples), I was hoping to avoid Intervals
> being publically "exposed."  I think it's cleaner if the end-programmer
> only has to deal with one object interface.

First off, the python convention for names you intend to be 'private'
is to prefix them with a single underscore, i.e. _Interval, so it was
not obvious at all by reading the documentation that this was your
intention. Assuming that Interval is to be exposed, I  found
Interval.lowerThan(5) a bit more intuitive than
IntervalSet.lowerThan(5). The only slight problem is the 'ne'/
allExcept factory which doesn't return a continuous interval and
therefore cannot be a classmethod in Interval.

On whether Interval should be exposed or not: I believe that interval
is a useful abstraction by itself and has the important property of
being continuous, which IntervalSet doesn't. Having a simple
single-class interface is a valid argument, but it may turn out to be
restricted later. For example, I was thinking that a useful method of
IntervalSet would be an iterator over its Intervals:
    for interval in myIntervalSet:
        print interval.min, interval.max
There are several possible use cases where dealing directly with
intervals would be appropriate or necessary, so it's good to have them
supported directly by the module.

> I like the idea of lengthening the factory function names, but I'm not
> sure that the factory methods would be better as class methods.
> Consider a use-case:
>
>  >>> import iset
>  >>> interval = iset.ISet.lowerThan(4)
>
> as opposed to:
>
>  >>> import iset
>  >>> interval = iset.lowerThan(4)

I was thinking along the lines of:

>>> from rangesets import RangeSet, Range
>>> mySet = RangeSet(Range.lowerThan(4), Range.greaterThan(5))

That's similar to "from sets import Set" before set became builtin.

> This is less typing and probably eliminates some run-time overhead.  Can
> you list any advantages of making the factory functions class methods?

One of the main reason for introducing classmethods was to allow
alternate constructors that play well with subclasses. So if Interval
is subclassed, say for a FrozenInterval class,
FrozenInterval.lowerThan() returns FrozenInterval instances
automatically.

> > class Range(object):     # Interval
> >
> >     @classmethod
> >     def lowerThan(cls, value, closed=False):
> >         # lt, for closed==False
> >         # le, for closed==True
> >         return cls(None, False, value, closed)
> >
> >     @classmethod
> >     def greaterThan(cls, value, closed=False):
> >         # gt, for closed==False
> >         # ge, for closed==True
> >         return cls(value, closed, None, False)
> >
> >     @classmethod
> >     def between(cls, min, max, closed=False):
> >         # exInterval, for closed==False
> >         # incInterval, for closed==True
> >         return cls(min, closed, max, closed)
>
> I like the function names, but in my dialect, lessThan would be more proper.

Either (lowerThan,greaterThan) or (lessThan,moreThan) are fine with me;
I'm very slightly in favor of the second for brevity.

> > class RangeSet(object):  # ISet
> >
> >     @classmethod
> >     def allExcept(cls, value): # ne
> >         return cls(Range.lowerThan(value), Range.greaterThan(value))
> >
> > 3. Having more than one names for the same thing is good to be avoided
> > in general. So keep either "none" or "empty" (preferably "empty" to
> > avoid confusion with None) and remove ISet.__add__ since it is synonym
> > to ISet.__or__.
>
> I agree that "none" should be removed.  However, some programmers may
> prefer to use +, some |, so I'd like to provide both.

Interval.__add__ is ok, but IntervalSet should be as close to regular
sets as possible; anyway, that's a minor point.

> > 4. Intervals shouldn't be comparable; the way __cmp__ works is
> > arbitrary and not obvious.
>
> I agree that __cmp__ is being used arbitrarily.  I wanted sorting to be
> easy, but there's other ways of doing that.  I'll take your suggestion.
>
> > 5. Interval.adjacentTo() could take an optional argument to control
> > whether an endpoint is allowed to be in both ranges or not (e.g.
> > whether (1,3], [3, 7] are adjacent or not).
>
> Interval objects weren't meant to be public; adjacentTo is used by ISet
> for combinatory functions.  I could look into ways to hide the class
> from the public, but hiding things never seemed Pythonic to me; someone
> may want to use them for some reason.  Maybe pydoc can be made to not
> document certain objects.

I mentioned already the arguments for keeping it public. If you insist
though, several tools recognize the single underscore convention as
meaning 'private'; epydoc does, I don't  know about pydoc. Also, "from
module import *" imports only the top level names that don't start with
an undescore.

> > 6. Possible ideas in your TODO list:
> >     - Implement the whole interface of sets for ISet's so that a client
> > can use either or them transparently.
>
> That was my intention.  I'll have to see which interface functions I'm
> missing.
>
> >     - Add an ISet.remove() for removing elements, Intervals, ISets as
> > complementary to ISet.append().
> >     - More generally, think about mutable vs immutable Intervals and
> > ISets. The sets module in the standard library will give you a good
> > idea of  the relevant design and implementation issues.
>
> Both good ideas.
>
> > After I look into the module's internals, I'll try to make some changes
> > and send it back to you for feedback.
>
> Thanks for your feedback, George.  I look forward to any additional
> comments you come up with.

Regards,
George




More information about the Python-list mailing list