[Python-ideas] Add orderedset as set(iterable, *, ordered=False) and similarly for frozenset.

Sun Feb 8 04:26:01 CET 2015

Resending this earlier post.  It was mistakenly sent to python-ideas at googlegroups.com instead of python-ideas at python.org. 

Sorry for the confusion and for the posts arriving out-of-sequence.

----------------------

> On Feb 4, 2015, at 1:14 PM, Neil Girdhar <mistersheik at gmail.com> wrote:
> 
> Hundreds of people want an orderedset (http://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set) and yet the ordered sets that are in PyPI (oset, ordered-set) are incomplete.  Please consider adding an ordered set that has *all* of the same functionality as set.
> 
> The only major design decision is (I think) whether to store the elements in a linked list or dynamic vector.
> 
> My personal preference is to specify the ordered set via a flag to the constructor, which can be intercepted by __new__ to return a different object type as necessary.
> 
> (If anyone wants to add a complete ordered set to PyPI that would also be very useful for me!)

Long ago, I ago considered adding an OrderedSet() and decided to hold-off for several reasons:

* The primary use cases for sets don't need order.  The main use cases being: uniquification, fast membership testing, and classic set-to-set operations (union, intersection, and difference).  

* As far as I can tell, the primary benefit of an ordered set is that it is easier on the eyes when interactively experimenting with trivial schoolbook examples.  That said, an OrderedSet __repr__ looks uglier because it would lack the {10,20,30} style literals we have for regular sets).

* It isn't entirely clear what the set-to-set operations should do to maintain order, whether equality tests should require matching order, whether we care about loss of commutativity, and whether we're willing to forgo the usual optimizations on set-to-set operations (i.e. the intersection algorithm loops over the smaller of the two input sets and checks membership in the larger set -- that is fast, but it would lose order if the second set were the smaller of the two).

* The standard library should provide fast, clean fundamental building blocks and defer all of the myriad of variants and derivations to PyPI.   We already provide OrderedDict and MutableSet so that rolling your own OrderedSet isn't hard (see the recipe below).

* We're likely to soon have an OrderedDict written in C (I believe Eric Snow has been working on that for some time), so anything build on top of an OrderedDict will inherit all the speed benefits.   Also, there is some chance that PyPy folks will contribute their work making regular sets and dicts ordered by default; in which case, having a separate OrderedSet would have been a waste and the question would be moot.

* Also, I wouldn't read too much in the point score on the StackOverflow question.  First, the question is very old and predates simpler solutions now available using OrderedDict and MutableMapping.  Second, StackOverflow tends to award high scores to the simplest questions and answers (such as http://stackoverflow.com/a/8538295/1001643 ).  A high score indicates interest but not a need to change the language.  A much better indicator would be the frequent appearance of ordered sets in real-world code or high download statistics from PyPI or ActiveState's ASPN cookbook.

* One thing I've used to test the waters is to include a link in the documentation to an OrderedSet recipe (it's in the "See also" section at the very bottom of the page at https://docs.python.org/2/library/collections.html ).  As far as I can tell, there has been nearly zero uptake based on this link.  That gives us a good indication that there is very little real-world need for an OrderedSet.

What I suggest is adding the recipe (like the one shown below) directly to the docs in this section:
https://docs.python.org/2/library/collections.html#ordereddict-examples-and-recipes
There we already have code for a LastUpdatedOrderedDict() and an OrderedCounter().

In the past, I've used these recipes as a way to gauge interest.  When the interest was sufficient, it got promoted from a recipe in the docs to a full implementation living in a standard library module.  For example, itertools.combinations_with_replacement() and itertools.accumulate() started out as recipes in the docs.

In general, there shouldn't be a rush to push more things in the standard library.  We get many suggestions in the form, "it might be nice to have XXX in the standard library", however, in many cases there has been so little uptake in real-world code that it is isn't worth the maintenance burden or for the burden on the working programmer for having too many choices in the standard library (I encounter very few people who actually know most of what is available in the standard library).

PyPI is a successful secondary source of code and has become part of the working programmer's life.  We're gradually making it easier and easier to pull that code on-demand.  IMO, that is where most of the collections variants should live. Otherwise, it would be easy to fill the collections module with dozens of rarely used dict/set variants, red-black trees, pairing heaps, tries, graph implementations, bitarrays, bitsets, persistent collections, etc.)

my-two-cents-ly,

Raymond

------- Rolling your own from existing parts --------------

from collections import OrderedDict, MutableSet

class OrderedSet(MutableSet):

   def __init__(self, iterable=()):
       self._data = OrderedDict.fromkeys(iterable)

   def add(self, key):
       self._data[key] = None

   def discard(self, key):
       self._data.pop(key, None)

   def __len__(self):
       return len(self._data)

   def __contains__(self, key):
       return key in self._data

   def __iter__(self):
       return iter(self._data)

   def __repr__(self):
       return '%s(%r)' % (self.__class__.__name__, set(self))