Python 3: dict & dict.keys()

Thu Jul 25 10:57:10 EDT 2013

On Thu, 25 Jul 2013 20:34:23 +1000, Chris Angelico wrote:

> On Thu, Jul 25, 2013 at 7:44 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Thu, 25 Jul 2013 18:15:22 +1000, Chris Angelico wrote:
>>> That's true, but we already have that issue with sets. What's the
>>> union of {0} and {0.0}? Python's answer: It depends on the order of
>>> the operands.
>>
>> That's a side-effect of how numeric equality works in Python. Since 0
>> == 0.0, you can't have both as keys in the same dict, or set. Indeed,
>> the same numeric equality issue occurs here:
>>
>> py> from fractions import Fraction
>> py> [0, 2.5] == [0.0, Fraction(5, 2)]
>> True
>>
>> So nothing really to do with sets or dicts specifically.
> 
> Here's how I imagine set/dict union:
> 1) Take a copy of the first object
> 2) Iterate through the second. If the key doesn't exist in the result,
> add it.

That's because you're too much of a programmer to step away from the 
implementation. Fundamentally, set union has nothing to do with objects, 
or bit strings, or any concrete implementation. Sets might be infinite, 
and "take a copy" impossible or meaningless.

Logically, the union of set A and set B is the set containing every
element which is in A, every element which is in B, and no element which
is in neither.
How you assemble those elements in a concrete implementation is, in a 
sense, irrelevant. In old-school Pascal, the universe of possible 
elements is taken from the 16-bit, or 32-bit if you're lucky, integers; 
in Python, it's taken from hashable objects. Even using your suggested 
algorithm above, since union is symmetric, it should make no difference 
whether you start with the first, or with the second.

> This works just fine even when "add it" means "store this value against
> this key". The dict's value and the object's identity are both ignored,
> and you simply take the first one you find.

I don't believe that "works", since the whole point of dicts is to store 
the values. In practice, the values are more important than the keys. The 
key only exists so you can get to the value -- the key is equivalent to 
the index in a list, the value to the value at that index. We normally 
end up doing something like "print adict[key]", not "print key". So 
throwing away the values just because they happen to have the same key is 
a fairly dubious thing to do, at least for union or intersection.

(In contrast, that's exactly what you want an update method to do. 
Different behaviour for different methods.)
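
To make the contrast concrete, here is a minimal sketch. The name
strict_union is hypothetical (dict has no such method), and treating
"same key, different value" as the error condition is an assumption made
for illustration only.

```python
# Hypothetical helper, not part of dict's API: a union that refuses to
# guess which value to keep when both dicts define the same key.
def strict_union(a, b):
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            raise ValueError(f"conflicting values for key {key!r}")
        result.setdefault(key, value)
    return result

# update: the newer value silently replaces the older one.
d = {"spam": 1}
d.update({"spam": 2})

# No shared keys, so there is nothing to guess about.
merged = strict_union({"spam": 1}, {"eggs": 2})

# Shared key with different values: refuse rather than pick one.
try:
    strict_union({"spam": 1}, {"spam": 2})
except ValueError as err:
    conflict = str(err)
```

Whether equal values under the same key should also raise is a separate
design choice; this sketch lets them through.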

[...]
>>> Raising an error would work, but is IMO unnecessary.
>>
>> I believe that's the only reasonable way for a dict union method to
>> work. As the Zen says:
>>
>> In the face of ambiguity, refuse the temptation to guess.
>>
>> Since there is ambiguity which value should be associated with the key,
>> don't guess.
> 
> There's already ambiguity as to which of two equal values should be
> retained by the set. 

In an ideal world of Platonic Ideals, it wouldn't matter, since 
everything is equal to itself, and to nothing else. There's only one 
number "two", whether you write it as 2 or 2.0 or 800/400 or Ⅱ or 0b10, 
and it is *impossible even in principle* to distinguish them since there 
is no "them" to distinguish between. Things that are equal shouldn't be 
distinguishable, not by value, not by type, not by identity.
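
In Python the various "twos" are equal and hash alike, yet they remain
distinguishable by type. A small sketch of that, using only the builtin
and standard library numeric types (the variable names are mine):

```python
from decimal import Decimal
from fractions import Fraction

twos = [2, 2.0, Fraction(2), Decimal(2), 2 + 0j]

# Every one of them compares equal to the int 2 and hashes like it...
all_equal = all(x == 2 for x in twos)
same_hash = all(hash(x) == hash(2) for x in twos)

# ...so a set has room for only one of them...
distinct_in_set = len(set(twos))

# ...and yet their types still tell them apart.
type_names = [type(x).__name__ for x in twos]
```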

But that would be *too abstract* to be useful, and so we allow some of
the abstractness to leak away, to the benefit of all. But the consequence
of this is that we sometimes have to make hard decisions, like, which one
of these various "twos" do we want to keep? Or more often, we stumble into
a decision by allowing the implementation to specify the behaviour, rather
than choosing the behaviour and then finding an implementation to match
it. Given the two behaviours:

{2} | {2.0} => {2} or {2.0}, which should it be? Why not Fraction(2) or 
Decimal(2) or 2+0j?

there's no reason to prefer Python's answer, "the value on the left",
except that it simplifies the implementation. The union operator ought to
be symmetrical: a ∪ b should be identical to b ∪ a, but it isn't. Another
leaky abstraction.
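
A quick check of the asymmetry, as a sketch. The behaviour shown is what
I understand CPython to do; I'm not aware of the language reference
promising which of two equal elements survives, so treat the kept types
as an implementation detail.

```python
left_first = {2} | {2.0}
right_first = {2.0} | {2}

# As sets the two results compare equal, since 2 == 2.0...
equal_as_sets = left_first == right_first

# ...but the element each one actually holds differs: on CPython the
# left operand's element is the one that is kept.
kept_types = (type(next(iter(left_first))), type(next(iter(right_first))))
```

So the results are equal but not identical: the first holds an int, the
second a float.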

-- 
Steven