[Python-ideas] Add orderedset as set(iterable, *, ordered=False) and similarly for frozenset.

Sun Feb 8 04:21:44 CET 2015

> * It isn't entirely clear what the set-to-set operations should do to maintain order, whether equality tests should require matching order, whether we care about loss of commutativity, and whether we're willing to forgo the usual optimizations on set-to-set operations (i.e. the intersection algorithm loops over the smaller of the two input sets and checks membership in the larger set -- that is fast, but it would lose order if the second set were the smaller of the two).
> 
>> I agree it's not obvious, but I think that with a little debate we could come together on a reasonable choice.

"In the face of ambiguity, refuse the temptation to guess."   Some of the participants on python-ideas need to develop a much stronger aversion to opening a can of worms ;-)

>  You could even block the set-to-set operations for now.  The main use of OrderedSet just requires add, update, and iteration as your recipe illustrates.

Since you don't seem have any specific requirements for the order created by set-to-set operations, then just use the existing set operations on OrderedDict.keys():

    >>> a = OrderedDict.fromkeys(['raymond', 'rachel', 'matthew'])
    >>> b = OrderedDict.fromkeys(['matthew', 'calvin', 'rachel'])
    >>> a.keys() & b.keys()
    {'rachel', 'matthew'}
    >>> a.keys() | b.keys()
    {'rachel', 'calvin', 'matthew', 'raymond'}
    >>> a.keys() - b.keys()
    {'raymond'}
    >>> a.keys() ^ b.keys()
    {'calvin', 'raymond'}
    >>> list(a.keys())
    ['raymond', 'rachel', 'matthew']

> 
> * The standard library should provide fast, clean fundamental building blocks and defer all of the myriad of variants and derivations to PyPI.   We already provide OrderedDict and MutableSet so that rolling your own OrderedSet isn't hard (see the recipe below).
> 
> I think that OrderedSet is a fundamental building block :)

The rule for "fundamental" that I've used for itertools is whether the thing you want can easily be built out of what you've already got.   For example, you can't make accumulate() out of the other itertools, but you can make tabulate(func) from map(func, count()).  Likewise, you can easily make an OrderedSet from an OrderedDict and MutableSet, or if you're needs are less, just use the existing set operations on dict views as shown above. 

> * We're likely to soon have an OrderedDict written in C (I believe Eric Snow has been working on that for some time), so anything build on top of an OrderedDict will inherit all the speed benefits. 
> 
> Sure.  It's also easy to have an OrderedSet written in C (especially once OrderedDict is done.)

You're profoundly under-estimating how much work it is to build and maintain a separate C implementation.  I've maintained the code for collections and sets for over a decade and can assure you that the maintenance burden has been enormous.

> Also, there is some chance that PyPy folks will contribute their work making regular sets and dicts ordered by default; in which case, having a separate OrderedSet would have been a waste and the question would be moot.
> 
> I think that using dicts and sets and assuming that they're ordered because you plan on using PyPy is a terrible error.

Perhaps, I wasn't clear.  The PyPy folks have offered to port their work back into Python.  They're offering a nice combination of faster performance (especially for iteration), better memory utilization, and as a by-product retaining insertion order.  If their offer comes to fruition, then this whole discussion is moot.

> 
> To be honest, my main issue is my personal need.  

The usual solution for personal needs is to have a personal utils module (it sounds like you've been using my recipe for over a year).  Decisions about what to put in the standard library have far reaching impact and we need to consider what is best for most of the users most of the time (and to consider our own maintenance burden).

I thought ordered sets were interesting enough to write an implementation six years ago, but frankly there are many other other things that have a much better case for being added to collections (for example, some sort of graph structure, some kind of binary tree, some kind of trie, or perhaps a bloom filter).

> If there was a complete PyPI recipe, I would be very happy.  

Then, that is what we should do.

> I don't think I'm the only one, but maybe you're right and I am.
>  
> 
> * One thing I've used to test the waters is to include a link in the documentation to an OrderedSet recipe (it's in the "See also" section at the very bottom of the page at https://docs.python.org/2/library/collections.html ).  As far as I can tell, there has been nearly zero uptake based on this link.  That gives us a good indication that there is very little real-world need for an OrderedSet.
> 
> I've been using your recipe for at least a year (thank you).  How are you measuring uptake?

That is a very difficult question to answer (there is no precise tool for saying X people care about thing Y).  The inputs I do have are imprecise and incomplete but never-the-less still informative.  When google's code search was still alive, there was a direct way of seeing what people used in published code, and I frequently searched to see what people were using in the python world and how they were using it (for example, that is how I was able to determine that str.swapcase() was nearly useless and propose that it be removed for Python 3).

Since then, I rely on a number of other sources such as reading the python bug tracker every day for the last 15 years to know what people are requesting.  Likewise, as frequent contributor to stack overflow you can see what people are asking about.  As a person who does python code reviews and teaches python courses for a living, I get to see into the code base for many companies.  Also, I spend time reading the source for popular python tools to see what their requirements are. Another source is reviewing what is included in toolsets from Continuum and Enthought to see what their client's have asked for.  None of this is precise or sufficiently inclusive, but it does give me some basis for finding out what people care about.

>> 
>> What I suggest is adding the recipe (like the one shown below) directly to the docs in this section:
>> https://docs.python.org/2/library/collections.html#ordereddict-examples-and-recipes
>> There we already have code for a LastUpdatedOrderedDict() and an OrderedCounter().
>> 
>> In the past, I've used these recipes as a way to gauge interest.  When the interest was sufficient, it got promoted from a recipe in the docs to a full implementation living in a standard library module.  For example, itertools.combinations_with_replacement() and itertools.accumulate() started out as recipes in the docs.
>> 
>> In general, there shouldn't be a rush to push more things in the standard library.  We get many suggestions in the form, "it might be nice to have XXX in the standard library", however, in many cases there has been so little uptake in real-world code that it is isn't worth the maintenance burden or for the burden on the working programmer for having too many choices in the standard library (I encounter very few people who actually know most of what is available in the standard library).
> 
> I agree.

Sounds good.  I think we can put an OrderedSet() recipe in the docs and another out on PyPI, and be done with it.

>  
> 
> PyPI is a successful secondary source of code and has become part of the working programmer's life.  We're gradually making it easier and easier to pull that code on-demand.  IMO, that is where most of the collections variants should live. Otherwise, it would be easy to fill the collections module with dozens of rarely used dict/set variants, red-black trees, pairing heaps, tries, graph implementations, bitarrays, bitsets, persistent collections, etc.)
> 
> my-two-cents-ly,
> 
> 
> 
> Agreed.
> 
> It would be nice to complete one of the recipes (oset or ordered-set) — at least for me.

Completing a recipe is more reasonable than pushing for this to be in the standard library.

Raymond