[Python-ideas] dictionary constructor should not allow duplicate keys

Steven D'Aprano steve at pearwood.info
Tue May 3 21:09:20 EDT 2016


On Mon, May 02, 2016 at 02:36:35PM -0700, Luigi Semenzato wrote:

> The original problem description:
> 
> lives_in = { 'lion': ['Africa', 'America'],
>              'parrot': ['Europe'],
>              #... 100+ more rows here
>              'lion': ['Europe'],
>              #... 100+ more rows here
>            }
> 
> The above constructor overwrites the first 'lion' entry silently,
> often causing unexpected behavior.

Did your colleague really have 200+ items in the dict? No matter, I 
suppose. The same principle applies.
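
For reference, this is the behaviour in question: in a dict literal a 
repeated key is not an error; the last value simply wins, silently:

>>> {'lion': ['Africa', 'America'], 'lion': ['Europe']}
{'lion': ['Europe']}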

When you have a significant amount of data in a dict (or any other data 
structure, such as a list, tree, whatever), the programmer has to take 
responsibility for the data validation. Not the compiler. Out of all the 
possible errors, why is "duplicate key" so special? Your colleague could 
have caused unexpected behaviour in many ways:

lives_in = { # missed value
             'lion': ['Africa', 'America'],
             # misspelled value
             'parrot': ['Eruope'],
             # misspelled key
             'kangeroo': ['Australia'],
             # invalid key
             'kettle': ['Arctic'],
             # invalid value
             'aardvark': 'South Africa',
             # missed key
             # oops, forgot 'tiger' altogether
           }


Where was your colleague's data validation? I'm sorry that your 
colleague lost a lot of time debugging this failure, but you might have 
had exactly the same result from any of the above errors.

Unless somebody can demonstrate that "duplicate keys" is a systematic 
and common failure among Python programmers, I think that it is 
perfectly reasonable to put the onus of detecting duplicates on the 
programmer, just like all those other data errors.

The data validation need not be a big burden. In my own code, unless the 
dict is so small that I can easily see that it is correct with my own 
eyes, I always follow it with an assertion:

assert len(lives_in) == 250

which is a cheap test for at least some duplicate, missed or extra keys. 
But depending on your use-case, it may be that a dict is the wrong data 
structure to use, and you need something that will validate items as 
they are entered. Unless your dict is small enough that you can see it 
is correct by eye, you need some sort of check that your data is valid, 
that you haven't forgotten keys, or misspelled them.

The dict constructor won't and can't do that for you, so you need to do 
it yourself. Once you're doing that, then it is no extra effort to check 
for duplicates.
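
For example, here is one way you might do it -- just a sketch, and 
"checked_dict" is a name I've made up, not something in the stdlib:

def checked_dict(pairs):
    # Build a dict from (key, value) pairs, refusing duplicate keys.
    d = {}
    for key, value in pairs:
        if key in d:
            raise ValueError('duplicate key: %r' % (key,))
        d[key] = value
    return d

lives_in = checked_dict([
    ('lion', ['Africa', 'America']),
    ('parrot', ['Europe']),
    # ... more rows here ...
    ])

The misspelled and missing keys still get through, of course, which is 
rather the point.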

So unless you have a good answer to the question "Why are duplicate keys 
so special that the dict constructor has to guard against them, when it 
doesn't guard against all the other failures we have to check for?", I 
think the status quo should stand.

There is one obvious answer:

Duplicate keys are special because, unlike the other errors, the dict 
constructor CAN guard against them.

That would be a reasonable answer. But I'm not sure it is special 
*enough* to justify violating the Zen of Python ("Special cases aren't 
special enough to break the rules"). That may be a matter of opinion and 
taste.



-- 
Steve

