Problem with sets and Unicode strings

Diez B. Roggisch deets at nospam.web.de
Wed Jun 28 12:59:31 EDT 2006


> But <http://docs.python.org/ref/comparisons.html> says:
> 
> Strings are compared lexicographically using the numeric equivalents
> (the result of the built-in function ord()) of their characters. Unicode
> and 8-bit strings are fully interoperable in this behavior.
> 
> Doesn't this mean that Unicode and 8-bit strings can be compared and
> this comparison is well defined? (even if it is not meaningful)

Obviously not - otherwise you wouldn't have the problems you observed,
would you?

What happens, of course, is that in a string-to-unicode comparison the
string gets coerced to a unicode value - using the default encoding! If the
string contains bytes that the default encoding can't decode, the coercion
fails.


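# -*- coding: latin1 -*-

# Explicit decode with the correct codec: the comparison is True.
print "ö".decode("latin1") == u"ö"
# Implicit coercion via the default encoding (normally ASCII) cannot decode "ö".
print "ö" == u"ö"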



So - they are fully interoperable and the comparison is well defined - when
the coercion is successful.
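
To make the coercion step visible, here is a rough sketch that does by hand
what the comparison does implicitly: decode the 8-bit string first, then
compare. The helper name coerce_and_compare is made up for illustration, and
"ascii" as the default encoding is an assumption (it is what
sys.getdefaultencoding() normally reports unless your site changed it).

```python
# -*- coding: utf-8 -*-
# Sketch only: coerce_and_compare is a hypothetical helper, not a Python API.
# It assumes the default encoding is ASCII, which is the usual setting.


def coerce_and_compare(raw_bytes, text, encoding="ascii"):
    """Decode raw_bytes with the given encoding, then compare to text.

    Returns True or False when the decode succeeds, and None when the
    bytes cannot be decoded (i.e. the coercion itself fails).
    """
    try:
        return raw_bytes.decode(encoding) == text
    except UnicodeDecodeError:
        return None


# Pure ASCII data: the coercion succeeds, so the comparison is well defined.
ascii_result = coerce_and_compare(b"abc", u"abc")

# b"\xf6" is "ö" in Latin-1; ASCII cannot decode it, so the coercion fails.
failed_result = coerce_and_compare(b"\xf6", u"\xf6")

# Naming the right codec explicitly makes the comparison work again.
latin1_result = coerce_and_compare(b"\xf6", u"\xf6", encoding="latin-1")
```

The point of the third call is the same as the explicit .decode("latin1")
above: once you say which encoding the bytes are in, nothing is left to the
default.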

Diez


