[Python-Dev] [Python-3000] Iterators for dict keys, values, and items == annoying :)

Nick Coghlan ncoghlan at gmail.com
Wed Mar 29 13:15:21 CEST 2006


Paul Moore wrote:
> On 3/29/06, Brett Cannon <brett at python.org> wrote:
>> Without a direct reason in terms of the language needing a
>> standardization of an interface, perhaps we just don't need views.  If
>> people want their iterator to have a __len__ method, then fine, they
>> can add it without breaking anything, just realize it isn't part of
>> the iterator protocol and thus may limit what objects a function can
>> accept, but that is there choice.
> 
> Good point. I think we need to start from strong use cases. With
> these, I agree that the view concept is a good implementation
> technique to consider. But let's not implement views just for the sake
> of having them - I'm pretty sure that was never Guido's intention.

There are three big use cases:

   dict.keys
   dict.values
   dict.items

Currently these all return lists, which may be expensive in terms of copying.
They all have iter* variants which while memory efficient, are far less 
convenient to work with.

For Py3k, the intent is to have only one version which produces a view with 
the memory efficiency of an iterator, but the convenience of a list.

To give these views the benefits of having a real list, the following is all 
that's really needed:

   1. implement __len__ (allows bool() and len() to work)
       - all delegate to dict.__len__

   2. implement __contains__ (allows containment tests to work)
       - delegate to dict.__contains__ for dict.keys()
       - use (or fallback to) linear search for dict.values()
       - check "dict[item[0]] == item[1]" for dict.items()

   3. implement __iter__ (allows iteration to work)
       - make iter(dict.keys()) equivalent to current dict.iterkeys()
       - make iter(dict.values()) equivalent to current dict.itervalues()
       - make iter(dict.items()) equivalent to current dict.iteritems()

For an immutable view, that's all you need. IOW, take the iterable protocol 
(an __iter__ that returns a new iterator when invoked) and add __len__ and 
__contains__ to get a "container" protocol. Given that containment falls back 
on __iter__ anyway, __len__ is the only essential addition to turn an iterable 
into a container.

Note that adding __len__ to an *iterator* does NOT give you something that 
would satisfy such a container protocol - invoking __iter__ again does not 
give you a fresh iterator, so you can't easily iterate repeatedly.

With reiterability as a defining characteristic, other niceties become 
possible (potentially available as a mixin):

   1. a generic container __str__ (not __repr__!) implementation:

       def __str__(self):
           # keep default __repr__ since eval(repr(x)) won't round trip
           name = self.__name__
           guts = ", ".join(repr(x) for x in self)
           return "%s([%s])" % guts

   2. generic container value based equality testing:
       def __eq__(self, other):
           if len(self) != len(other):
               return False
           for this, that in izip(self, other):
               if this != that:
                   return False
           return True

Further refinement of such a container protocol to the minimal requirements 
for a sequence protocol is already defined by such things as the requirements 
of the reversed() builtin:

   for i, x in enumerate(seq):
      assert seq[i] == x

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org


More information about the Python-Dev mailing list