Python 3: dict & dict.keys()

Thu Jul 25 06:34:23 EDT 2013

On Thu, Jul 25, 2013 at 7:44 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Thu, 25 Jul 2013 18:15:22 +1000, Chris Angelico wrote:
>> That's true, but we already have that issue with sets. What's the union
>> of {0} and {0.0}? Python's answer: It depends on the order of the
>> operands.
>
> That's a side-effect of how numeric equality works in Python. Since 0 ==
> 0.0, you can't have both as keys in the same dict, or set. Indeed, the
> same numeric equality issue occurs here:
>
> py> from fractions import Fraction
> py> [0, 2.5] == [0.0, Fraction(5, 2)]
> True
>
> So nothing really to do with sets or dicts specifically.

Here's how I imagine set/dict union:
1) Take a copy of the first object
2) Iterate through the second. If the key doesn't exist in the result, add it.

This works just fine even when "add it" means "store this value
against this key". The dict's value and the object's identity are both
ignored, and you simply take the first one you find.
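
As a minimal sketch of those two steps (the function name union_keep_first
is made up for illustration; nothing like it exists in the standard
library):

```python
def union_keep_first(first, second):
    """Union of two dicts where the first key/value pair seen wins."""
    result = dict(first)                # step 1: copy the first object
    for key, value in second.items():   # step 2: iterate through the second
        if key not in result:           # membership is decided by equality
            result[key] = value
    return result

merged = union_keep_first({0: 'int'}, {0.0: 'float', 1: 'one'})
print(merged)  # {0: 'int', 1: 'one'}
```

Since 0 == 0.0, the second dict's 0.0 entry is treated as already
present, so both the int key and its value from the first dict survive.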

> Aside: I think the contrary behaviour is, well, contrary. It would be
> strange and disturbing to do this:
>
> for key in some_dict:
>     if key == 0:
>         print("found")
>         print(some_dict[key])
>
> and have the loop print "found" and then have the key lookup fail, but
> apparently that's how things work in Pike :-(

I agree, that would be very strange and disturbing. I mentioned that
aspect merely in passing, but the reason for the difference is not an
oddity of key lookup, but a different decision about float and int: in
Pike, 0 and 0.0 are not equal. (Nor are 1 and 1.0, in case you thought
this was a weirdness of zero.) It's a debatable point; are we trying
to say that all numeric types represent real numbers, and are equal if
they represent the same real number? Or are different representations
distinct, just as much as the string "0" is different from the integer
0? Pike took the latter approach. PHP took the former approach to its
illogical extreme, that the string "0001E1" is equal to "000010" (both
strings). No, the dictionary definitely needs to use object equality
to do its lookup, although I could well imagine an implementation that
runs orders of magnitude faster when object identity can be used.
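
To illustrate the Python side of that decision, here is a small sketch
showing that lookup goes by equality across numeric types, while a
string that merely looks like a number stays distinct:

```python
d = {0: 'stored under int 0'}

# Equal numbers hash equal, so the float finds the int's entry.
print(hash(0) == hash(0.0))   # True
print(d[0.0])                 # stored under int 0

# A string is never equal to a number, so it is a different key.
print('0' in d)               # False
```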

>> I would say that Python can freely pick from the first two options you
>> offered (either keep-first or keep-last), most likely the first one, and
>> it'd make good sense. Your third option would be good for a few specific
>> circumstances, but then you probably would also want the combination of
>> {1:'a'} and {1:'a'} to be {1:['a','a']} for consistency.
>
> Okay, that's six variations. And no, I don't think the "consistency"
> argument is right -- the idea is that you can have multiple values per
> key. Since 'a' == 'a', that's only one value, not two.

Well, it depends what you're doing with the merging of the dicts. But
all of these extra ways to do things would be explicitly-named
functions with much rarer usage, and quite possibly not part of the
standard library; they'd be snippets shared around and put directly in
application code.
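
One such snippet might look like this. It is only a sketch: the name
merge_collect is hypothetical, and it follows the "distinct values per
key" reading from the quoted reply, so equal values are stored once.

```python
def merge_collect(first, second):
    """Hypothetical helper: map each key to the list of distinct values seen."""
    result = {}
    for source in (first, second):
        for key, value in source.items():
            values = result.setdefault(key, [])
            if value not in values:   # 'a' == 'a' counts as one value
                values.append(value)
    return result

print(merge_collect({1: 'a', 2: 'b'}, {1: 'a', 2: 'c'}))
# {1: ['a'], 2: ['b', 'c']}
```

Dropping the "not in" check gives the other reading, where {1:'a'} and
{1:'a'} combine to {1: ['a', 'a']}.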

>> Raising an error would work, but is IMO unnecessary.
>
> I believe that's the only reasonable way for a dict union method to work.
> As the Zen says:
>
> In the face of ambiguity, refuse the temptation to guess.
>
> Since there is ambiguity which value should be associated with the key,
> don't guess.

There's already ambiguity as to which of two equal values should be
retained by the set. Python takes the first. Is that guessing? Is that
violating the Zen? I don't see a problem with the current set
implementation, and I also don't see a problem with using that for
dict merging.
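
For comparison, the error-raising variant could be sketched like this.
The name merge_strict is hypothetical, and I'm assuming that a repeated
key only counts as a conflict when the two values differ; a stricter
version could reject any repeated key.

```python
def merge_strict(first, second):
    """Hypothetical helper: refuse to merge when a key has two different values."""
    result = dict(first)
    for key, value in second.items():
        if key in result and result[key] != value:
            raise ValueError("conflicting values for key %r" % (key,))
        result.setdefault(key, value)   # keeps the first key object and value
    return result

print(merge_strict({1: 'a'}, {1: 'a', 2: 'b'}))  # {1: 'a', 2: 'b'}

try:
    merge_strict({1: 'a'}, {1: 'b'})
except ValueError as exc:
    print(exc)  # conflicting values for key 1
```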

> Object identity is a red herring. It would be perfectly valid for a
> Python implementation to create new instances of each element in the set
> union, assuming such creation was free of side-effects (apart from memory
> usage and time, naturally). set.union() makes no promise about the
> identity of elements, and it is defined the same way for languages where
> object identity does not exist (say, old-school Pascal).

That still doesn't deal with the "which type should the new object
be". We're back to this question: What is the union of {0} and {0.0}?

>>> {0} | {0.0}
{0}
>>> {0.0} | {0}
{0.0}

Maybe Python could create a brand new object, but would it be an int
or a float? The only way I could imagine this working is with a
modified-set class that takes an object constructor, and passes every
object through it. That way, you could have set(float) that coerces
everything to float on entry, which would enforce what you're saying
(even down to potentially creating a new object with a new id, though
float() seems to return a float argument unchanged in CPython 3.3).
Would that really help anything, though? Do we gain anything by not
simply accepting, in the manner of Colonel Fairfax, the first that
comes?
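
For what it's worth, here is a rough sketch of that modified-set idea.
The class name CoercingSet and its constructor signature are invented
for illustration, and only add() and union are covered; a real version
would need to wrap update() and the other mutating methods too.

```python
class CoercingSet(set):
    """Hypothetical set that passes every element through a constructor."""

    def __init__(self, coerce, iterable=()):
        super().__init__(coerce(item) for item in iterable)
        self.coerce = coerce

    def add(self, item):
        super().add(self.coerce(item))

    def union(self, *others):
        result = CoercingSet(self.coerce, self)
        for other in others:
            for item in other:
                result.add(item)
        return result

    # Cover both "CoercingSet | set" and "set | CoercingSet".
    __or__ = union
    __ror__ = union

floats = CoercingSet(float, [0]) | {0, 1}
print(sorted(floats))                        # [0.0, 1.0]
print(all(type(x) is float for x in floats)) # True
```

With this, the operand order no longer decides the element type, at the
cost of a constructor call for every element that goes in.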

ChrisA