Problem with sets and Unicode strings
Diez B. Roggisch
deets at nospam.web.de
Wed Jun 28 12:59:31 EDT 2006
> But <http://docs.python.org/ref/comparisons.html> says:
>
> Strings are compared lexicographically using the numeric equivalents
> (the result of the built-in function ord()) of their characters. Unicode
> and 8-bit strings are fully interoperable in this behavior.
>
> Doesn't this mean that Unicode and 8-bit strings can be compared and
> this comparison is well defined? (even if it is not meaningful)
Obviously not - otherwise you wouldn't have the problems you observed,
would you?
What happens, of course, is that in a string-to-unicode comparison the
string gets coerced to a unicode value - using the default encoding!
That default is normally ASCII, so any byte above 127 makes the coercion
fail.
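# -*- coding: latin1 -*-
# Explicit decode with the matching encoding: prints True.
print "ö".decode("latin1") == u"ö"
# Implicit coercion with the default (ASCII) encoding fails on the byte for "ö".
print "ö" == u"ö"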
So - they are fully interoperable and the comparison is well defined - when
the coercion is successful.
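
To make the coercion step visible, here is a rough sketch that writes the decode out by hand. Note that coerced_equal is a made-up helper, not anything in Python itself, and ASCII as the default encoding is an assumption (it is the usual setting). Returning None for a failed coercion is just my way of marking "not comparable" in this sketch.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of Python): spells out the implicit
# byte string -> unicode coercion as an explicit decode step.

def coerced_equal(byte_string, text, default_encoding="ascii"):
    """Decode byte_string with default_encoding, then compare with text."""
    try:
        decoded = byte_string.decode(default_encoding)
    except UnicodeDecodeError:
        # The coercion failed, so there is no meaningful comparison.
        return None
    return decoded == text

# The single byte 0xF6, which is "ö" in latin1.
latin1_bytes = u"\xf6".encode("latin1")

# Pure ASCII data: the coercion succeeds and the values are equal.
print(coerced_equal(b"abc", u"abc"))

# Non-ASCII byte with the assumed ASCII default: the coercion fails.
print(coerced_equal(latin1_bytes, u"\xf6"))

# Same byte decoded with the encoding it was written in: equal.
print(coerced_equal(latin1_bytes, u"\xf6", "latin1"))
```

The safe habit is the third call: decode byte strings explicitly with the encoding you know they are in, and only then compare them to unicode values.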
Diez