Are dicts supposed to raise comparison errors

Richard Damon Richard at Damon-Family.org
Wed Aug 1 07:56:11 EDT 2018


On 8/1/18 4:36 AM, Robin Becker wrote:
> On 31/07/2018 16:52, Chris Angelico wrote:
>> On Wed, Aug 1, 2018 at 1:28 AM, MRAB <python at mrabarnett.plus.com> wrote:
>>> On 2018-07-31 08:40, Robin Becker wrote:
>>>>
>>>> A bitbucket user complains that python 3.6.6 with -Wall -b prints
>>>> warnings
> .............
>>> The warning looks wrong to me.
>>>
>>> In Python 2, u'a' and b'a' would be treated as the same key, but in
>>> Python 3
>>> they are distinct and can co-exist.
>>>
>>> Something for Python's bug tracker, I think!
>>
>> It's a warning specifically requested by the -b option. The two keys
>> in question have the same hash, which means they have to be compared
>> directly; they will compare unequal, but because of the -b flag, the
>> comparison triggers a warning. If that warning is spurious, *don't use
>> the -b option*.
>>
>> ChrisA
>>
>
> I looked at the language documentation for 3.7 mappings
>
>> These represent finite sets of objects indexed by nearly arbitrary
>> values. The only types of values not acceptable as keys are values
>> containing lists or dictionaries or other mutable types that are
>> compared by value rather than by object identity, the reason being
>> that the efficient implementation of dictionaries requires a key’s
>> hash value to remain constant. Numeric types used for keys obey the
>> normal rules for numeric comparison: if two numbers compare equal
>> (e.g., 1 and 1.0) then they can be used interchangeably to index the
>> same dictionary entry.
>
>
>
> It says explicitly that numeric keys use numeric comparison, but makes
> no mention of strings/bytes etc., and there's an implication that
> object identity is used rather than comparison. In Python 3.x b'a' is
> not the same as 'a', so either the documentation is lacking some
> reference to strings/bytes or the warning is wrong. Using the excuse
> that normal comparison is being used seems a bit feeble. It would
> clearly improve speed if object identity were used, but I suppose
> that's not the case for other reasons.

One problem with using identity for strings is that strings that have
the same value may not have the same identity. I believe very short
strings, like small integers, are cached, so short strings of the same
value (and type) will often share an identity, but longer strings, and
strings built at run time, generally will not. It might work for 'a',
but might not for 'employee'.
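
A minimal sketch of that point, assuming CPython (object identity is an
implementation detail, so other interpreters may behave differently).
The names first, second, staff and mixed are made up for illustration.
The strings are built with str.join so that the compiler cannot share a
single literal between them:

```python
import sys

# Build two equal strings at run time so they are separate objects.
# (Literals in the same source file may be shared by the compiler.)
first = "".join(["emp", "loyee"])
second = "".join(["emp", "loyee"])

print(first == second)   # True: same value
print(first is second)   # False in CPython: two distinct objects

# Dict lookup falls back to == when the hashes match, so a key that is
# equal but not identical still finds the entry.
staff = {first: "found"}
print(staff[second])     # found

# sys.intern() returns one canonical object per distinct string value.
print(sys.intern(first) is sys.intern(second))  # True

# 'a' and b'a' never compare equal in Python 3, so both can be keys.
mixed = {"a": "text", b"a": "bytes"}
print(len(mixed))        # 2
```

If dicts matched keys by identity alone, the staff[second] lookup would
fail even though the two strings are equal, which is presumably why
equality is used once the hashes match.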

-- 
Richard Damon



