Unicode and dictionaries

Sat Jan 16 23:42:21 EST 2010

On Jan 16, 7:06 pm, Ben Finney <ben+pyt... at benfinney.id.au> wrote:
> Carl Banks <pavlovevide... at gmail.com> writes:
> > On Jan 16, 3:56 pm, Ben Finney <ben+pyt... at benfinney.id.au> wrote:
> > > gizli <mehm... at gmail.com> writes:
> > > > >>> test_dict = {u'öğe':1}
> > > > >>> u'öğe' in test_dict.keys()
> > > > True
> > > > >>> 'öğe' in test_dict.keys()
> > > > True
>
> > > I would call this a bug. The two objects are different, so the latter
> > > expression should return ‘False’.
>
> > Except the two objects are not different if default encoding is utf-8.
>
> They are different, because a Unicode object is *not* encoded in any
> character encoding, whereas the byte string object is.

Of course they're different, but that's not relevant to this situation.
What matters is whether they compare equal, which is the only criterion
for whether an object is found in a list.  x in s is true if there is
some object m in s for which m == x.
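
A rough sketch of that rule, for illustration only: the helper name
contains() is made up here, and the real list lookup is implemented in C
(it also short-circuits on identity before trying ==).

```python
def contains(s, x):
    """Hypothetical stand-in for `x in s` on a list."""
    for m in s:
        # Identity is checked first, then equality; the types of m and x
        # play no part in the decision.
        if m is x or m == x:
            return True
    return False


# An int and a float are different objects of different types,
# yet they compare equal, so the float is "found" in the list.
found_float = contains([1], 1.0)
found_missing = contains([1], 2)
agrees_with_builtin = (1.0 in [1]) == found_float
```

The int/float case is the same situation as the unicode/str one: the two
objects are not the same thing, but membership only asks about ==.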

If the default encoding and the terminal encoding are both UTF-8 (or
both ISO-8859-9), then u'öğe' == 'öğe'.  This behavior is documented
(PEP 100) and therefore not a bug.  Relevant lines:

"Unicode objects should compare equal to other objects after these
other objects have been coerced to Unicode.  For strings this means
that they are interpreted as Unicode string using the <default
encoding>."
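
A minimal sketch of that rule, written so it also runs on Python 3, where
bytes and text never compare equal on their own.  The helper
py2_style_equal() is hypothetical and only emulates the coercion by
decoding explicitly; it is not how the interpreter implements it.

```python
def py2_style_equal(text, raw, default_encoding):
    """Hypothetical helper: compare text to bytes the way PEP 100 describes,
    by first interpreting the bytes using the given default encoding."""
    try:
        return text == raw.decode(default_encoding)
    except UnicodeDecodeError:
        # Assumption: bytes that cannot be decoded count as unequal.
        return False


text = u'\xf6\u011fe'                    # the key from the example, u'öğe'
raw_utf8 = text.encode('utf-8')          # bytes a UTF-8 terminal would produce
raw_turkish = text.encode('iso-8859-9')  # bytes an ISO-8859-9 terminal would produce

same_utf8 = py2_style_equal(text, raw_utf8, 'utf-8')              # encodings match
same_turkish = py2_style_equal(text, raw_turkish, 'iso-8859-9')   # encodings match
ascii_default = py2_style_equal(text, raw_utf8, 'ascii')          # default is ascii
mismatched = py2_style_equal(text, raw_turkish, 'utf-8')          # encodings differ
```

Only the first two comparisons come out equal, which is the case described
above: the default encoding has to match the encoding the terminal used
to produce the byte string.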

Carl Banks