[ python-Bugs-1564763 ] Unicode comparison change in 2.4 vs. 2.5

SourceForge.net noreply at sourceforge.net
Mon Sep 25 23:33:43 CEST 2006


Bugs item #1564763, was opened at 2006-09-24 23:43
Message generated for change (Comment added) made by arigo
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1564763&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Joe Wreschnig (piman)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Unicode comparison change in 2.4 vs. 2.5

Initial Comment:
Python 2.5 changed the behavior of unicode comparisons
in a significant way from Python 2.4, causing a test
case failure in a module of mine. All tests passed with
an earlier version of 2.5, though unfortunately I don't
know what version in particular it started failing with.

The following code prints out all True on Python 2.4;
the strings are compared case-insensitively, whether
they are my lowerstr class, real strs, or unicodes. On
Python 2.5, the comparison between lowerstr and unicode
is false, but only in one direction.

If I make lowerstr inherit from unicode rather than
str, all comparisons are true again. So at the very
least, this is internally inconsistent. I also think
changing the behavior between 2.4 and 2.5 constitutes a
serious bug.

----------------------------------------------------------------------

>Comment By: Armin Rigo (arigo)
Date: 2006-09-25 21:33

Message:
Logged In: YES 
user_id=4771

Sorry, I missed your comment: if lowerstr inherits from
unicode then it just works.  The reason is that
'abc'.__eq__(u'abc') returns NotImplemented, but
u'abc'.__eq__('abc') returns True.

This is only inconsistent because of the asymmetry between
strings and unicodes: strings can be transparently turned
into unicodes but not the other way around -- so
unicode.__eq__(x) can accept a string as the argument x
and convert it to a unicode transparently, but str.__eq__(x)
does not try to convert x to a string if it is a unicode.

It's not a completely convincing explanation, but I think it
shows at least why we got at the current situation of Python
2.5.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2006-09-25 21:11

Message:
Logged In: YES 
user_id=4771

This is an artifact of the change in the unicode class, which
now has the proper __eq__, __ne__, __lt__, etc. methods
instead of the semi-deprecated __cmp__.  The mixture of
__cmp__ and the other methods is not very well-defined.  This
is why your code worked in 2.4: a bit by chance.

Indeed, in theory it should not, according to the language
reference.  So what I am saying is that although it is a
behavior change from 2.4 to 2.5, I would argue that it is not
a bug but a bug fix...

The reason is that if we ignore the __eq__ vs __cmp__ issues,
the operation 'a == b' is defined as: Python tries
a.__eq__(b); if this returns NotImplemented, then Python
tries b.__eq__(a).  As an exception, if type(b) is a strict
subclass of type(a), then Python tries in the other order. 
This is why you get the 2.5 behavior: if lowerstr inherits
from str, it is not a subclass of unicode, so u'abc' ==
lowerstr() tries u'abc'.__eq__(), which works immediately. 
On the other hand, if lowerstr inherits from unicode, then
Python tries first lowerstr().__eq__(u'abc').

This part of the Python object model - when to reverse the
order or not - is a bit obscure and not completely helpful...
Subclassing built-in types generally only works a bit.  In
your situation you should use a regular class that behaves in
a string-like fashion, with an __eq__() method doing the
case-insensitive comparison... if you can at all - there are
places where you need a real string, so this "solution" might
not be one either, but I don't see a better one :-(

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1564763&group_id=5470


More information about the Python-bugs-list mailing list