[Python-Dev] unicode hell/mixing str and unicode as dictionary keys

Jean-Paul Calderone exarkun at divmod.com
Fri Aug 4 06:43:23 CEST 2006


On Thu, 03 Aug 2006 21:34:04 -0700, Josiah Carlson <jcarlson at uci.edu> wrote:
>
>Bob Ippolito <bob at redivi.com> wrote:
>> On Aug 3, 2006, at 6:51 PM, Greg Ewing wrote:
>>
>> > M.-A. Lemburg wrote:
>> >
>> >> Perhaps we ought to add an exception to the dict lookup mechanism
>> >> and continue to silence UnicodeErrors ?!
>> >
>> > Seems to me that comparison of unicode and non-unicode
>> > strings for equality shouldn't raise exceptions in the
>> > first place.
>>
>> Seems like a slightly better idea than having dictionaries suppress
>> exceptions. Still not ideal though because sticking non-ASCII strings
>> that are supposed to be text and unicode in the same data structures
>> is *probably* still an error.
>
>If/when 'python -U -c "import test.testall"' runs without unexpected
>error (I doubt it will happen prior to the "all strings are unicode"
>conversion), then I think that we can say that there aren't any
>use-cases for strings and unicode being in the same dictionary.
>
>As an alternate idea, rather than attempting to .decode('ascii') when
>strings and unicode compare, why not .decode('latin-1')?  We lose the
>unicode decoding error, but "the right thing" happens (in my opinion)
>when u'\xa1' and '\xa1' compare.
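A rough Python 3 sketch of the difference between the two decodes, assuming bytes stands in for Python 2's str and str for unicode (in Python 2 this decode happened implicitly during the comparison). Decoding a byte >= 0x80 as ASCII raises, while Latin-1 maps every byte to the code point of the same value and never fails:

```python
# Python 3 sketch: bytes plays the role of Python 2's str, str plays unicode.
raw = b'\xa1'
text = '\xa1'

# The implicit ASCII decode that Python 2 attempts raises for bytes >= 0x80.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Latin-1 maps each byte 0x00-0xff to the code point with the same value,
# so this decode cannot fail and b'\xa1' matches '\xa1'.
latin1_equal = raw.decode('latin-1') == text

print(ascii_ok, latin1_equal)  # False True
```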

It might be right for Latin-1 strings.

However, it would be even *more* surprising for the person who has to
figure out why his program works when one user sends a string containing
'\xc0' but fails when another user sends '\xe3\x81\x82' (the UTF-8
encoding of u'\u3042').
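A Python 3 sketch of that failure mode, again assuming bytes stands in for Python 2's str; the helper `latin1_equal` is hypothetical and just models the proposed comparison rule. Latin-1 input compares as expected, but UTF-8 input silently compares unequal instead of raising:

```python
def latin1_equal(raw: bytes, text: str) -> bool:
    """Hypothetical comparison rule: decode the bytes as Latin-1, then compare."""
    return raw.decode('latin-1') == text

# A user whose data happens to be Latin-1: the comparison works.
print(latin1_equal('\xc0'.encode('latin-1'), '\xc0'))  # True

# A user sending UTF-8: no exception, just a wrong answer.
utf8_bytes = '\u3042'.encode('utf-8')  # b'\xe3\x81\x82'
print(latin1_equal(utf8_bytes, '\u3042'))  # False

# Latin-1 turns the one UTF-8 character into three unrelated code points.
print(len(utf8_bytes.decode('latin-1')))  # 3
```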

I like the exception that Python 2.5 raises.  I only wish it raised by
default when using 'ascii' and u'ascii' as keys in the same dictionary. ;)
Oh, and that equal str and unicode values did not hash to the same value.  ;)
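For comparison, a minimal Python 3 sketch: there, bytes and str never compare equal, so b'ascii' and 'ascii' end up as two distinct dictionary keys rather than one. Running the interpreter with -bb turns bytes/str comparisons into BytesWarning errors, which is closer to the exception wished for above:

```python
# Python 3: bytes and str with the same ASCII content are never equal,
# so they occupy separate dict entries instead of colliding.
d = {}
d[b'ascii'] = 'from bytes'
d['ascii'] = 'from text'

print(len(d))               # 2
print(b'ascii' == 'ascii')  # False
```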

Jean-Paul

