Problem with sets and Unicode strings

Wed Jun 28 12:59:31 EDT 2006

> But <http://docs.python.org/ref/comparisons.html> says:
> 
> Strings are compared lexicographically using the numeric equivalents
> (the result of the built-in function ord()) of their characters. Unicode
> and 8-bit strings are fully interoperable in this behavior.
> 
> Doesn't this mean that Unicode and 8-bit strings can be compared and
> this comparison is well defined? (even if it is not meaningful)

Obviously not - otherwise you wouldn't have the problems you've observed,
would you?

What happens, of course, is that in a string-to-unicode comparison the
string gets coerced to a unicode value - using the default encoding
(normally ascii)!

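# -*- coding: latin1 -*-

# explicit decode with the right encoding: both sides are unicode, prints True
print "ö".decode("latin1") == u"ö"
# implicit coercion with the default encoding (normally ascii): decoding
# the non-ascii byte fails
print "ö" == u"ö"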

So - they are fully interoperable and the comparison is well defined - when
the coercion is successful.
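
To make the coercion step visible, here is a minimal sketch that does by hand
what the comparison does implicitly. The helper name compare_after_coercion
is made up for illustration, and returning None on a failed decode is my own
choice for the sketch, not what the interpreter does. The literals are written
with escapes so the snippet does not depend on the source file encoding.

```python
# Hypothetical helper: spells out the implicit coercion explicitly.
def compare_after_coercion(raw, text, encoding="ascii"):
    """Decode the byte string `raw` with `encoding`, then compare to `text`.

    Returns None when the decode fails (an assumption made for this sketch).
    """
    try:
        return raw.decode(encoding) == text
    except UnicodeDecodeError:
        return None

raw = b"\xf6"    # the single latin1 byte for o-umlaut
text = u"\xf6"   # the unicode character o-umlaut

print(compare_after_coercion(raw, text, "latin1"))  # True
print(compare_after_coercion(raw, text, "ascii"))   # None
print(compare_after_coercion(b"abc", u"abc"))       # True
```

Pure-ascii byte strings always survive the coercion, which is why the problem
only shows up once non-ascii data is involved.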

Diez