Is there a canonical way to check whether an iterable is ordered?

Chris Angelico rosuav at gmail.com
Thu Sep 18 09:33:25 EDT 2014


On Thu, Sep 18, 2014 at 10:58 PM, Roy Smith <roy at panix.com> wrote:
> I suspect what he meant was "How can I tell if I'm iterating over an
> ordered collection?", i.e. iterating over a list vs. iterating over a
> set.

Right, which is what I meant by asking if the order mattered. When you
iterate over a set, you'll get some kind of order (because iterables
have to have an order), but it won't mean anything.

> Is there anything which requires an iterator to be deterministic?  For
> example, let's say I have an iterable, i, and I do:
>
> list1 = [item for item in i]
> list2 = [item for item in i]
>
> am I guaranteed that list1 == list2?  It will be for all the collections
> I can think of in the standard library, but if I wrote my own class with
> an __iter__() which yielded the items in a non-deterministic order,
> would I be violating something other than the principle of least
> astonishment?

It's not guaranteed. If you do exactly that, with no operations in
between, then yes, all the stdlib collections will (AFAIK) give you
matching lists, but that's definitely not required by iterator
protocol.

The one thing you can rely on (and therefore must comply with, when
you design an iterable) is that iteration will hit every element
exactly once. Implementing that on most collections means returning
the values in some internal representational order, something that's
consistent across the lifetime of the iterator; having that not be
consistent across multiple iterators would be the exception, not the
rule.

There would be a few special cases where you specifically *want to*
have the lists differ, though. Imagine DNS records: perhaps you have
four A records for some name, and you want to distribute load between
them [1]. You could have your name server do something like this:

class _cl_iter:
    def __init__(self, lst, start):
        self._data = lst
        self._start = self._next = start
    def __iter__(self): return self
    def __next__(self):
        if self._next is None: raise StopIteration
        val = self._data[self._next]
        self._next = (self._next + 1) % len(self._data)
        if self._next == self._start: self._next = None
        return val

class CircularList:
    def __init__(self, it):
        self._data = list(it)
        self._next = -1
    def __iter__(self):
        self._next = (self._next + 1) % len(self._data)
        return _cl_iter(self._data, self._next)

Every time you iterate over a given CircularList, you'll get the same
results, but starting at a different point:

>>> lst = CircularList(("192.0.2.1","192.0.2.2","192.0.2.3","192.0.2.4"))
>>> list(lst)
['192.0.2.1', '192.0.2.2', '192.0.2.3', '192.0.2.4']
>>> list(lst)
['192.0.2.2', '192.0.2.3', '192.0.2.4', '192.0.2.1']
>>> list(lst)
['192.0.2.3', '192.0.2.4', '192.0.2.1', '192.0.2.2']
>>> list(lst)
['192.0.2.4', '192.0.2.1', '192.0.2.2', '192.0.2.3']

So if you have a whole bunch of these for your different A records,
and you return them in iteration order to each client, you'll end up
with different clients getting them in a different order, with minimal
extra storage space. (This is Py3 code; to make it Py2 compatible,
probably all you need to do is subclass object and rename __next__ to
next.)

But this is a pretty unusual case. I would expect that most objects
will either iterate consistently until mutated, or return only what
wasn't previously consumed (like an iterator, which is itself
iterable).

ChrisA

[1] This isn't true load-balancing, of course, but it's a simple way
to distribute requests a bit. It's better than nothing.



More information about the Python-list mailing list