Py2.3: Feedback on Sets

Istvan Albert ialbert at mailblocks.com
Tue Aug 12 12:24:40 EDT 2003


Raymond Hettinger wrote:

First of all, thanks for the work on it. I need to use sets
in my work all the time. I had written my own
(simplistic) implementation, but that adds another layer
of headaches when distributing programs, since I then
have to distribute multiple modules.

Sometimes I ended up with a little set function in every
big module. Pretty silly. For me sets are a greatly useful
addition.

> * Is the support for sets of sets necessary for your work
>    and, if so, then is the implementation sufficiently
>    powerful?

One pattern that I constantly need is to remove duplicates from
a sequence. I don't know if this is a common enough pattern to
warrant an API change, but for me it would be most useful if I could
get the contents of a set as a sequence right away, without having to
explicitly code it.
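
A minimal sketch of the duplicate-removal pattern described above. It
assumes the built-in set type of later Python versions rather than the
Py2.3 sets.Set class, and the helper names unique() and unique_ordered()
are made up for illustration.

```python
# Sketch only: built-in set, not the Py2.3 sets module.

def unique(seq):
    """Return the distinct items of seq as a list (order not guaranteed)."""
    return list(set(seq))


def unique_ordered(seq):
    """Return the distinct items of seq, keeping first-seen order."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


words = ["b", "a", "b", "c", "a"]
print(sorted(unique(words)))
print(unique_ordered(words))
```

Passing the set to list() is the explicit step mentioned above. The
second helper is only needed when the original order matters.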

> * Are you overjoyed/outraged by the choice of | and & as
>    set operators (instead of + and *)?

I think that since you have - as a difference operator, it
would make sense to also have + as a union operator. That takes nothing
away from |. The & operator is the right one; * would not be appropriate
IMO.
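
To make the operator question concrete, here is a small sketch of what
a + alias for union could look like. PlusSet is a hypothetical subclass
written for this example and is not part of any library. It is built on
the built-in set of later Python versions, which I am assuming here in
place of sets.Set.

```python
class PlusSet(set):
    """Hypothetical subclass: '+' as an alias for union ('|')."""

    def __add__(self, other):
        # Delegate to the existing union operator and re-wrap the result.
        return PlusSet(self | other)


a = PlusSet([1, 2, 3])
b = PlusSet([3, 4])

union_bar = a | b      # existing spelling of union
union_plus = a + b     # proposed alias
common = a & b         # intersection
only_a = a - b         # difference
```

Both spellings of union give the same elements, so the alias would not
take anything away from |.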

> * Do you care that sets can only contain hashable elements?

I don't really care. On the other hand, it might be better to call the
class HashSet, so that it conveys right away that it uses hashing
to store the elements.
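
A short sketch of what the hashability requirement means in practice,
again assuming the built-in set and frozenset types of later Python
versions rather than the Py2.3 classes. A mutable list or set cannot be
an element, while an immutable frozenset can, which is how a set of
sets is built there.

```python
# An immutable frozenset is hashable, so it can be an element of a set.
inner = frozenset([1, 2])
outer = {inner, frozenset([3])}

# A list is not hashable, so using it as an element fails.
try:
    bad = {[1, 2]}
    list_rejected = False
except TypeError:
    list_rejected = True

# A mutable set is not hashable either.
try:
    bad = {set([1, 2])}
    set_rejected = False
except TypeError:
    set_rejected = True
```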

> * Are the docs clear?  Can you suggest improvements?

I wondered whether it would be better to specify the immutability
of the class at the constructor level.

Then there is the update() method. It feels a little bit redundant,
since there is an add() method that seems to do the same thing,
except that add() adds only one element at a time.
Would it be possible to have add() handle all additions, iterable or
not, and then scrap update() altogether?
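
One thing a merged add() would have to decide is what to do with an
argument that is both hashable and iterable, such as a tuple or a
string. The sketch below shows how the two methods differ on the
built-in set of later Python versions (an assumption; it is not the
Py2.3 sets.Set class).

```python
s = set()
s.add((1, 2))       # the tuple itself becomes a single element
s.update((1, 2))    # each item of the tuple becomes an element

t = set()
t.add("ab")         # the whole string is one element
t.update("ab")      # each character becomes an element

print(len(s), len(t))
```

With only one method, the caller would need some other way to say which
of the two behaviours is wanted.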

Then just by looking at the docs, it feels a little bit confusing to
have discard() and remove() do essentially the same thing but only one 
of them raising an exception. Which one? I already forgot. I don't know 
which one I would prefer though.
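
For reference, a quick sketch of the difference, shown on the built-in
set of later Python versions (assumed here in place of sets.Set):
remove() raises KeyError for a missing element, discard() stays silent.

```python
s = {1, 2}

s.discard(99)            # missing element: nothing happens
after_discard = set(s)

try:
    s.remove(99)         # missing element: raises KeyError
    remove_raised = False
except KeyError:
    remove_raised = True

s.remove(1)              # present element: removed by either method
s.discard(2)
```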

Another aspect that I did not understand: what is the difference between
update() and union_update()?
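
This sketch cannot answer the question for the Py2.3 sets module. It
only shows the situation on the built-in set of later Python versions,
which has update() but no union_update(). There, the one visible
difference between the method and the |= operator is that update()
takes any iterable, while |= insists on a set.

```python
s = {1}

s.update([2, 3])     # method: any iterable is accepted
s |= {4}             # operator: another set is fine

try:
    s |= [5]         # operator: a plain list is rejected
    operator_rejected_list = False
except TypeError:
    operator_rejected_list = True

has_union_update = hasattr(s, "union_update")
```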

The long-winded method names, such as difference_update(), also feel
redundant when one can achieve the same thing with the -= operator. I
would drop these and instead show in the docs how to accomplish the
same with the operators. That would considerably cut down on the
documentation and apparent complexity.
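
A sketch of the method/operator pairing, using the built-in set of
later Python versions as a stand-in for sets.Set (so the union method
is spelled update() here). When both operands are sets, each in-place
method gives the same result as its operator.

```python
import operator

a = {1, 2, 3, 4}
b = {3, 4, 5}

# (method name, equivalent in-place operator function)
pairs = [
    ("update", operator.ior),                        # |=
    ("intersection_update", operator.iand),          # &=
    ("difference_update", operator.isub),            # -=
    ("symmetric_difference_update", operator.ixor),  # ^=
]

results = {}
for method_name, inplace_op in pairs:
    via_method = set(a)
    getattr(via_method, method_name)(b)      # e.g. via_method.difference_update(b)
    via_operator = inplace_op(set(a), b)     # e.g. tmp = set(a); tmp -= b
    results[method_name] = (via_method, via_operator)

for name, (m, o) in results.items():
    print(name, m == o)
```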

I'm a big fan of having the minimal number of methods, as long as it is
easy to obtain the result.

For example, a method like x.issubset(y) is the same as not bool(x-y), so
it may not be all that necessary. Just a thought.
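
A quick check of how issubset() relates to the difference operator,
run against the built-in set of later Python versions (an assumption,
not the Py2.3 class): x is a subset of y exactly when x - y is empty,
that is, when bool(x - y) is false.

```python
y = {1, 2, 3}
x = {1, 2}      # subset of y
z = {1, 9}      # not a subset of y

x_is_subset = x.issubset(y)
x_diff_empty = not (x - y)      # x - y == set(), which is falsy

z_is_subset = z.issubset(y)
z_diff_empty = not (z - y)      # z - y == {9}, which is truthy

print(x_is_subset, x_diff_empty, z_is_subset, z_diff_empty)
```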

> * Are sets helpful in your daily work or does the need arise
>    only rarely?

I use them very often and they are extremely useful.

thanks again,

Istvan.




