[Python-Dev] API for binary operations on Sets

Raymond Hettinger raymond.hettinger at gmail.com
Thu Sep 30 05:50:32 CEST 2010


I would like to solicit this group's thoughts on how to reconcile the Set abstract base class with the API for built-in set objects (see http://bugs.python.org/issue8743 ).  I've been thinking about this issue for a good while and the RightThingToDo(tm) isn't clear.

Here's the situation:

Binary operators for the built-in set object restrict their "other" argument to instances of set, frozenset, or one of their subclasses.  Otherwise, they return NotImplemented.  This design was intentional (i.e. it was part of the original pure Python version, it is unit-tested behavior, and it is a documented restriction).  It allows other classes to "see" the NotImplemented and have a chance to take over using __ror__, __rand__, etc.  Also, by not accepting any iterable, it prevents little coding atrocities or possible mistakes like "s | 'abc'".  This is a break with what is done for lists (Guido has previously lamented that list.__add__ accepting any iterable is one of his "regrets").  This design has been in place for several years and so far everyone has been happy with it (no bug reports, feature requests, or discussions on the newsgroup, etc).  If someone needs to process a non-set iterable, the named set methods (like intersection, update, etc) all accept any iterable value, which provides an immediate, usable alternative.
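To make the behavior described above concrete, here is a minimal sketch.  The Tagged class is hypothetical and exists only to show a reflected method taking over after NotImplemented; everything else uses the built-in set as it stands.

```python
s = {1, 2}

# Calling the operator method directly shows the refusal of a non-set operand.
direct = set.__or__(s, "abc")

# str has no __ror__, so the operator expression ends in a TypeError.
try:
    s | "abc"
    operator_failed = False
except TypeError:
    operator_failed = True

# The named method accepts any iterable.
merged = s.union("abc")


class Tagged:
    """Hypothetical non-set class that wants to take part in `set | Tagged`."""

    def __init__(self, items):
        self.items = list(items)

    def __ror__(self, other):
        # Reached only because set.__or__ returned NotImplemented.
        return Tagged(list(other) + self.items)


combined = {1} | Tagged([2])
```

Because set.__or__ declines rather than iterating over its argument, Tagged gets to decide what "set | Tagged" means.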

In contrast, the Set and MutableSet abstract base classes in Lib/_abcoll.py take a different approach.  They specify that something claiming to be set-like will accept any iterable for a binary operator (IOW, the builtin set object does not comply).  The provided mixins (such as __or__, __and__, etc) are implemented that way and it works fine.  Also, the Set and MutableSet APIs do not provide named methods such as update, intersection, difference, etc.  They aren't really needed because the operator methods already provide the functionality, and omitting them keeps the Set API to a reasonable minimum.
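A small sketch of the ABC side, for comparison.  It assumes a current Python 3, where the ABCs are imported from collections.abc rather than Lib/_abcoll.py; ListBasedSet is an illustrative class, not part of the standard library, and it supplies only the three abstract methods so that the operators come from the mixins.

```python
from collections.abc import Set


class ListBasedSet(Set):
    """Illustrative set-like class; operators are inherited from the Set ABC."""

    def __init__(self, iterable=()):
        self.elements = []
        for value in iterable:
            if value not in self.elements:
                self.elements.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)


# The mixin operators take a plain string or list as the "other" argument.
union_result = ListBasedSet("ab") | "bc"
intersection_result = ListBasedSet("abc") & ["b", "c", "d"]

# The ABC provides no named union method.
has_union_method = hasattr(ListBasedSet, "union")
```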

All of this is well and good, but the two don't interoperate.  You can't get an instance of the Set ABC to work with a regular set, nor do regular sets comply with the ABC.  These are problems because they defeat some of the design goals for ABCs.

We have a few options:

1. Liberalize setobject.c binary operator methods to accept anything registered to the Set ABC and add a backwards incompatible restriction to the Set ABC binary operator methods to only accept Set ABC instances (they currently accept any iterable).   

This approach has a backwards incompatible tightening of the Set ABC, but that will probably affect very few people.  It also has the disadvantage of not providing a straightforward way to handle general iterable arguments (either the implementer needs to write named binary methods like update, difference, etc for that purpose or the user will need to cast the iterable to a set before operating on it).  The positive side of this option is that it keeps the current advantages of the setobject API and its NotImplemented return value.

1a.  Liberalize setobject.c binary operator methods, restrict SetABC methods, and add named methods (like difference, update, etc) that accept any iterable.

2. We could liberalize builtin set objects to accept any iterable as an "other" argument to a binary set operator.  This choice is not entirely backwards compatible because it would break code that depends on being able to run __ror__, __rand__, etc after a NotImplemented value is returned.  That being said, I think it unlikely that such code exists.  The real disadvantage is that it replicates the problems with list.__add__ and Guido has said before that he doesn't want to do that again.
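As a rough illustration of what option 1a would ask of a set-like class, the sketch below restricts the operator to Set instances and adds a named method for general iterables.  StrictSet and its union method are hypothetical names chosen for this example; this is not the actual setobject.c or ABC code, and it assumes a current Python 3 with collections.abc.

```python
from collections.abc import Set


class StrictSet(Set):
    """Hypothetical set-like class in the style of option 1a."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, value):
        return value in self._items

    def __len__(self):
        return len(self._items)

    def __or__(self, other):
        # Operators accept only Set instances and decline everything else,
        # so the other operand still gets a chance via its reflected method.
        if not isinstance(other, Set):
            return NotImplemented
        return StrictSet(self._items | frozenset(other))

    __ror__ = __or__

    def union(self, *others):
        # Named method: the escape hatch for arbitrary iterables.
        result = set(self._items)
        for other in others:
            result.update(other)
        return StrictSet(result)


left = StrictSet([1]) | {2}       # a built-in set is a registered Set
right = {2} | StrictSet([1])      # set declines, StrictSet.__ror__ runs

try:
    StrictSet([1]) | "ab"
    rejected = False
except TypeError:
    rejected = True

named = StrictSet([1]).union("ab")
```

Under option 2, by contrast, the operator itself would consume the string, and a reflected method on the right-hand operand would never be consulted.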

I was leaning towards #1 or #1a and the guys on IRC thought #2 would be better.  Now I'm not sure and would like additional input so I can get this bug closed for 3.2.  Any thoughts on the subject would be appreciated.

Thanks,


Raymond


P.S. I also encountered a small difficulty in implementing #2 that would still need to be resolved if that option is chosen.