[Python-Dev] decoding errors when comparing strings
Fredrik Lundh <effbot@telia.com>
Wed, 26 Jul 2000 09:49:44 +0200
guido wrote:
> > summary: the current interpreter throws an "ASCII decoding
> > error" exception if you compare 8-bit and unicode strings, and
> > the 8-bit string happens to contain a character in the 128-255
> > range.
>
> Doesn't bother me at all. If I write a user-defined class that raises
> an exception in __cmp__ you can get the same behavior. The fact that
> the hashes were the same is a red herring; there are plenty of values
> with the same hash that aren't equal.
>
> I see the exception as a useful warning that the program isn't
> sufficiently Unicode aware to work correctly. That's a *good* thing
> in my book -- I'd rather raise an exception than silently fail.
I assume that means you're voting for alternative 3:
"a third alternative would be to keep the exception, and make
the dictionary code exception proof."
because the following isn't exactly good behaviour:
>>> a = "\x84"
>>> b = unicode(a, "iso-8859-1")
>>> d = {}
>>> d[a] = "a"
>>> d[b] = "b"
>>> len(d)
UnicodeError: ASCII decoding error: ordinal not in range(128)
>>> len(d)
2
(in other words, the dictionary implementation misbehaves if items
with the same hash value cannot be successfully compared)
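For comparison, here is a minimal sketch of what "exception proof" dictionary code looks like from the user's side. It assumes a modern Python 3 interpreter rather than the 2.0 pre-release used in the session above, and BadKey is a hypothetical class invented for the illustration: two keys share a hash value, and comparing them raises.

```python
# Sketch only: assumes Python 3; BadKey is a hypothetical illustration class.
class BadKey:
    """Key that collides with every other BadKey and cannot be compared."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # force every instance onto the same hash value

    def __eq__(self, other):
        raise ValueError("cannot compare %r" % (self.name,))


d = {}
d[BadKey("a")] = "a"      # first insert: nothing to compare against yet

try:
    d[BadKey("b")] = "b"  # same hash, so the dict has to call __eq__
except ValueError as exc:
    error = str(exc)      # the failure is reported by the insert itself
else:
    error = None

size = len(d)             # len() works normally; the failed insert added nothing
```

The point of the sketch is that the error surfaces at the statement that triggered the comparison, and the dictionary is left in a consistent state, instead of the exception showing up at some later, unrelated call.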
</F>