operators and datatypes for sets

Steve Holden sholden at cox.rr.com
Mon Apr 23 17:55:35 EDT 2001


"Anton Vredegoor" <anton at vredegoor.doge.nl> wrote in message
news:9c1jdj$phb$1 at news.hccnet.nl...
>
> There has been some discussion about sets some time ago, but I could
> find no conclusive information. The consensus seemed to be to use a
> dictionary as the underlying data type. In order to revive the
> discussion (or to make it explicit here, if it was not dead but only
> invisible to me) I would like to present some test code I have made. I
> would post it here but since it is 142 lines long I will just give a
> link:
>
> http://home.hccnet.nl/a.vredegoor/smallset/smallset.py
>
> or the html version:
>
> http://home.hccnet.nl/a.vredegoor/smallset/smallset.html
>
> The basic idea is to use lists as the underlying datatype and to do
> member administration by setting bits to '1' or '0' in a longint. This
> has the advantage of doing very fast operations. There are also some
> disadvantages, one of them is having to specify a 'universe'.
>
So you are talking about finite sets, with underlying base sets specifying
the universe.

Had you thought about mapping the members on to the positions with a
dictionary to allow more flexibility? Having to compute the element's
position outside the set is inelegant.

> There is a also a PEP about this,
>
> http://python.sourceforge.net/peps/pep-0218.html
>
> but this PEP is about a language change, which is way out of my league
> since I am only talking about a module.
>
> I have used the 'in' operator for determining if a set is a subset of
> another set. This is different from its normal use where it is used
> for checking if an element is in a set.

This is likely to be VERY confusing. Although "contains" is ambiguous in
English for subset and element set membership, "in" is pretty much expected
to refer to element membership, so you might want to think about providing
an "is_subset" method ... but don't ask me which way around the arguments
should go, that could be a three-week thread all on its own!

>                                                       Further I have used
the '/'
> operator for 'set1 without the elements in set2'. This is also
> unusual.

Subtraction would seem the more normal operation.

>            I have also added a new operator '~' which can be available
> because the universe is specified. It gives all elements that are not
> in the set. These are just suggestions.
>
> Please let me know what you think or show me some better code,
> improvements would also be very welcome.
>
> Anton.
>
There are a couple of things ...

Firstly, have you made any provision for different instances to deal with
different universes? It seems to me that it would be more flexible if the
smallset.__init__() method were to record some information about the
universe of that set, so there was a tighter coupling between smallsets and
the objects they were dealing with.

That way, compress(), toindex() and expand() would be methods of whichever
type of object were being stored in the smallset (in your example, point),
and they would be called from within the smallset methods through a
self.universe instance variable.

For example, your __repr__() mthod might then change from

    def __repr__(self):
        #delegate printing to objects in the set
        return '%s' %expand(self.data)

to

    def __repr__(self):
        #delegate printing to objects in the set
        return '%s' %self.data.expand()

Hope this helps.

regards
 sTeVe





More information about the Python-list mailing list