Help with sets

Mon Oct 11 11:09:19 EDT 2010

On 10/11/10 6:11 AM, Lawrence D'Oliveiro wrote:
> In message <8h9ob9FkurU1 at mid.individual.net>, Gregory Ewing wrote:
>
>> Lawrence D'Oliveiro wrote:
>>
>>> Did you know that applying the “set” or “frozenset” functions to a dict
>>> returns a set of its keys?
>>
>>> Seems a bit dodgy, somehow.
>>
>> That's just a consequence of the fact that dicts produce their
>> keys when iterated over, and the set constructor iterates over
>> whatever you give it.
>
> Hmm. It seems that “iter(<dict>)” iterating over the keys has been around a
> long time. But a dict has both keys and values: why are language constructs
> treating them so specially as to grab the keys and throw away the values?

Language constructs are not treating anything specially, much less "throwing
away" anything. The language construct in question does exactly the same thing
with every object nowadays: it calls the .__iter__() method to get the iterator
and calls .next() on that iterator (.__next__() in Python 3) until it raises
StopIteration. It is the responsibility of the dict object itself to decide how
it wants to be iterated over.
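
To make that concrete, here is a minimal sketch of what a constructor like
set() effectively does with its argument. The ValueBag class is hypothetical,
made up only to show that the object, not set(), chooses what gets produced.
The next() builtin assumes Python 2.6 or later.

```python
d = {"a": 1, "b": 2}

# Roughly what set(d) does: ask the object for an iterator, then drain it.
it = iter(d)                      # calls d.__iter__()
collected = []
while True:
    try:
        # next(it) calls it.next() in Python 2, it.__next__() in Python 3
        collected.append(next(it))
    except StopIteration:
        break

keys_via_protocol = set(collected)
keys_via_set = set(d)             # same result: the dict yields its keys


class ValueBag(object):
    """Hypothetical container that chooses to iterate over its values."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def __iter__(self):
        return iter(self._mapping.values())


# set() treats ValueBag exactly like the dict; only __iter__ differs.
values_via_set = set(ValueBag(d))
```

Both calls to set() go through the same protocol. The dict hands back keys
because its own __iter__ says so, and ValueBag hands back values for the same
reason.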

The reasoning for this decision is spelled out in the PEP introducing the 
iterator feature:

   http://www.python.org/dev/peps/pep-0234/

"""
     - There has been a long discussion about whether

           for x in dict: ...

       should assign x the successive keys, values, or items of the
       dictionary.  The symmetry between "if x in y" and "for x in y"
       suggests that it should iterate over keys.  This symmetry has been
       observed by many independently and has even been used to "explain"
       one using the other.  This is because for sequences, "if x in y"
       iterates over y comparing the iterated values to x.  If we adopt
       both of the above proposals, this will also hold for
       dictionaries.

       The argument against making "for x in dict" iterate over the keys
       comes mostly from a practicality point of view: scans of the
       standard library show that there are about as many uses of "for x
       in dict.items()" as there are of "for x in dict.keys()", with the
       items() version having a small majority.  Presumably many of the
       loops using keys() use the corresponding value anyway, by writing
       dict[x], so (the argument goes) by making both the key and value
       available, we could support the largest number of cases.  While
       this is true, I (Guido) find the correspondence between "for x in
       dict" and "if x in dict" too compelling to break, and there's not
       much overhead in having to write dict[x] to explicitly get the
       value.

       For fast iteration over items, use "for key, value in
       dict.iteritems()".  I've timed the difference between

           for key in dict: dict[key]

       and

           for key, value in dict.iteritems(): pass

       and found that the latter is only about 7% faster.

       Resolution: By BDFL pronouncement, "for x in dict" iterates over
       the keys, and dictionaries have iteritems(), iterkeys(), and
       itervalues() to return the different flavors of dictionary
       iterators.
"""

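The symmetry the PEP describes can be checked directly. This is only an
illustrative sketch with made-up data. It uses items() because that spelling
exists in both Python 2 and Python 3, whereas iteritems() is Python 2 only.

```python
d = {"spam": 1, "eggs": 2}

# Everything produced by "for x in d" also satisfies "x in d".
iterated = [x for x in d]
all_members = all(x in d for x in iterated)

# Values are not members in that sense: "in" tests keys only.
value_is_member = 1 in d

# When both key and value are wanted, ask for the items explicitly,
# or look the value up with d[key] as the PEP suggests.
pairs = sorted(d.items())
looked_up = sorted((key, d[key]) for key in d)
```

Nothing is thrown away here: the values stay reachable through d[key] or
d.items(), and the two spellings give the same pairs.
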
-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco