[Python-Dev] unicode hell/mixing str and unicode as dictionary keys

Fri Aug 4 03:37:54 CEST 2006

On 8/3/06, M.-A. Lemburg <mal at egenix.com> wrote:
> > ...but in the case of dictionaries this behaviour has changed and in
> > prior versions of python dictionaries did work as I expected them to.
> > Now they don't.
>
> Let's put it this way: Python 2.5 uncovered a bug in your
> application that has always been there. It's better to
> fix your application than arguing to cover up the bug again.

I would understand this assertion if Ralf were expecting dictionaries
to treat the following as true:
    { u'm\xe1s': 1, 'm\xe1s': 1 } == { u'm\xe1s': 1 } == { 'm\xe1s': 1 }
This is clearly a mess waiting to explode.

But that's not what he said. He expects, as is the case in Python 2.4,
    len({ u'm\xe1s': 1, 'm\xe1s': 1 }) == 2
because u'm\xe1s' clearly does not equal 'm\xe1s'. Since comparing the
two raises an exception, the dictionary shouldn't consider them equal,
so there should be two keys which happen to be somewhat equivalent.

While this is in fact in the NEWS (Patch #1497053 & bug #1275608), I
think this should be raised for further discussion. Raising the
exception is good for debugging mistakes, but bad for dictionaries
holding unequal objects that happen to hash to the same value and
correctly raise exceptions on comparison.
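
To make the case concrete, here is a minimal sketch of the situation I
mean. The Key class is a hypothetical stand-in, not the real str/unicode
pair (on Python 3, str and bytes simply compare unequal, so the original
literals no longer reproduce this). Two keys hash alike, and comparing
them raises:

```python
class Key(object):
    """Hypothetical key: hashes by text, refuses to compare across kinds."""

    def __init__(self, kind, text):
        self.kind = kind
        self.text = text

    def __hash__(self):
        # Same text means same hash, whatever the kind is.
        return hash(self.text)

    def __eq__(self, other):
        if not isinstance(other, Key):
            return NotImplemented
        if self.kind != other.kind:
            # Stand-in for the error raised when str and unicode are compared.
            raise ValueError("cannot compare %s with %s" % (self.kind, other.kind))
        return self.text == other.text


d = {}
d[Key("unicode", "mas")] = 1
try:
    # Same hash as the first key, so the dict has to call __eq__.
    d[Key("bytes", "mas")] = 1
    outcome = "inserted"
except ValueError:
    outcome = "raised"
```

With the exception propagated out of the lookup, the second insert fails
and the dictionary is left with one key. If the exception were swallowed
and read as "not equal", both keys would be stored.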

When we thought it was just a debugging tool, it made sense to put it
straight into 2.5. Since it can actually change behavior in only
slightly edgy cases, perhaps it should go through a warning phase
(which ideally could show the exception that was thrown, thus yielding
most or all of the intended debugging advantage).
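
As a rough illustration of what such a warning phase might look like
from user code, here is a sketch under my own assumptions, not a proposed
patch to the dict implementation. Touchy and WarnOnCompareError are
hypothetical names. The wrapper turns a failed comparison into a warning
that carries the exception, then reports "not equal":

```python
import warnings


class Touchy(object):
    """Hypothetical key: hashes by text, raises when compared across kinds."""

    def __init__(self, kind, text):
        self.kind = kind
        self.text = text

    def __hash__(self):
        return hash(self.text)

    def __eq__(self, other):
        if not isinstance(other, Touchy):
            return NotImplemented
        if self.kind != other.kind:
            raise ValueError("cannot compare %s with %s" % (self.kind, other.kind))
        return self.text == other.text


class WarnOnCompareError(object):
    """Hypothetical wrapper: a comparison error becomes a warning plus False."""

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        if not isinstance(other, WarnOnCompareError):
            return NotImplemented
        try:
            return self.key == other.key
        except Exception as exc:
            # Show the original exception so the debugging value is kept.
            warnings.warn("key comparison raised %r; treating keys as unequal"
                          % (exc,), RuntimeWarning)
            return False


def demo():
    # Record warnings instead of printing them, so they can be counted.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        d = {}
        d[WarnOnCompareError(Touchy("unicode", "mas"))] = 1
        d[WarnOnCompareError(Touchy("bytes", "mas"))] = 1
    return len(d), len(caught)


result = demo()
```

Here both keys end up in the dictionary, as they did in 2.4, and the
warning still points at the comparison that failed.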

Michael
-- 
Michael Urman  http://www.tortall.net/mu/blog