unicode keys in dicts

Peter Hansen peter at engcorp.com
Thu Jan 8 10:20:51 EST 2004


Jiba wrote:
> 
> is the following behaviour normal :
> 
> >>> d = {"é" : 1}
> >>> d["é"]
> 1
> >>> d[u"é"]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> KeyError: u'\xe9'
> 
> it seems that "é" and u"é" are not considered as the same key (in Python
> 2.3.3). Though they have the same hash code (returned by hash()).
> 
> And "e" and u"e" (unaccented characters) are considered the same!

Well, "e" and u"e" _are_ the same character, while the unicode value you
get from decoding the "é" byte string depends entirely on which codec
you use for the decoding.  It is only equal to u"é" when decoded with
certain codecs.  ASCII is 7-bit only, so the "é" byte is not legal in
ASCII, which is likely your default encoding.

For example, try "é".decode('iso-8859-1') and you will probably get the 
unicode value you were expecting.
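
To make the codec dependence concrete, here is a minimal sketch. It is
written for Python 3, where `bytes` plays the role that the plain string
played in Python 2.3, so it is a translation of the situation rather than
the original session. The byte value 0xE9 assumes a Latin-1 terminal, and
the choice of cp1251 as the "other" codec is just for illustration.

```python
# Python 3 sketch: bytes stands in for the Python 2 byte string "é".
raw = b"\xe9"  # assumption: the byte a Latin-1 terminal produces for "é"

# The same byte decodes to different text depending on the codec.
latin1_text = raw.decode("iso-8859-1")  # U+00E9, the accented e
cyrillic_text = raw.decode("cp1251")    # U+0439, a Cyrillic letter

# ASCII is 7-bit, so the byte 0xE9 cannot be decoded at all.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# A dict keyed by the raw bytes does not match the text key ...
d = {raw: 1}
text_key_present = "\u00e9" in d

# ... but decoding the keys explicitly makes the lookup succeed.
normalized = {key.decode("iso-8859-1"): value for key, value in d.items()}
found = normalized["\u00e9"]
```

The point is only that the decoding step has to be explicit and has to
name the codec the bytes were really written in; nothing here can guess
that for you.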

I'm not the best person to answer this, but I would at least say that
the above behaviour is considered "normal", though it can be surprising
to those of us who are not experts in Unicode issues...

-Peter


