[Python-ideas] Add dict.getkey() and set.get()

Mon Sep 16 03:34:53 CEST 2013

From: Steven D'Aprano <steve at pearwood.info>

Sent: Sunday, September 15, 2013 5:30 PM

> On Sun, Sep 15, 2013 at 09:02:33PM +0100, Oscar Benjamin wrote:
> 
>>  I don't know whether this is relying on undefined behaviour but the
>>  following is O(1) and seems to work:
>> 
>>  >>> def canonical_key(d, k):
>>  ...     k, = {k} & d.keys()
>>  ...     return k
>>  ...
> 
> I'm pretty sure that (1) it relies on implementation-specific behaviour, 
> and (2) it's O(N), not O(1).
> 
> The implementation-specific part is whether & takes the key from the 
> left-hand or right-hand operand when the keys are equal. For example, in 
> Python 3.3:
> 
> py> {1} & {1.0}
> {1.0}
> py> {1} & {1.0, 2.0}
> {1}

Actually, this isn't strictly relevant, because he's calling dict_keys.__rand__, not set.__and__. There's no reason they have to do the same thing—and, in fact, they don't, which is why it works in the first place.

But the larger point is valid: both methods are implementation-specific—and the fact that they produce opposite results is a nice illustration of that.

> And surely it's O(N) -- to be precise, O(M+N) -- because & has to walk 
> all the keys in both operands? I suppose technically & could special 
> case "one of the operands has length 1" and optimize it, but that too 
> would be an implementation detail.

No it doesn't. In fact, it _can't_ walk all the keys in both operands; unless they were sorted (and they aren't), there's no way to make that work. Instead, it does an O(N) or O(M) walk over one operand, calling the other operand's __contains__ method for each.

But again, the larger point is valid: whichever one it returns the values from is obviously the one it's walking. So, in his case, it's calling {k}.__contains__(n) for each key in d.keys(), which is O(N).